Now that we have our Arc discrete GPU set up on Linux, let’s try running the Stable Diffusion model on it.
A quick recap / updated steps to set up Arc on Linux
Intel has now published documentation on how to set up Arc on Linux. I tried it today, and it worked beautifully.
Steps to configure Arc
- Install the 5.17 OEM kernel
- Install the kernel-mode drivers and GPU firmware
- Install the user-mode drivers for compute, 3D graphics, and media
- Add your user to the `render` group
- Install oneAPI 2022.3 (the latest as of this write-up); see the sketch after this list for these last two steps
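A minimal sketch of what I mean by the last two steps; package names and paths vary by distribution, so treat Intel’s documentation as the source of truth:

```shell
~ → sudo gpasswd -a ${USER} render       # add yourself to the render group
~ → newgrp render                        # pick up the new group without re-logging in
~ → source /opt/intel/oneapi/setvars.sh  # put the oneAPI 2022.3 tools on your environment
```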
Stable Diffusion
Stable Diffusion is a fully open-source (thank you, Stability.ai) deep-learning text-to-image and image-to-image model. For more information on the model, check out the Wikipedia entry on it.
PyTorch
To use PyTorch on Intel GPUs, we need to install the Intel Extension for PyTorch (IPEX). Let’s get the latest releases of the PyTorch and IPEX wheels.
- Create a conda environment with Python 3.9 and install both wheels.

```shell
~ → conda create -n ipex python=3.9 -y
~ → conda activate ipex
~ → pip install ~/Downloads/*.whl
```
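Before moving on, it’s worth checking that the GPU is actually visible from Python. A minimal sketch, assuming IPEX exposes Intel GPUs through the `torch.xpu` backend once it is imported:

```python
import torch
import intel_extension_for_pytorch  # registers the "xpu" device with PyTorch

# assumption: IPEX exposes availability/count via torch.xpu
print(torch.xpu.is_available())   # expect True on a working Arc setup
print(torch.xpu.device_count())   # expect >= 1
```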
Let’s see how to run the model using PyTorch first.
- Install the diffusers library and dependencies

```shell
~ → pip install diffusers ftfy transformers Pillow
```
- Run Stable Diffusion
We will use a model from 🤗 maintained by runwayml, runwayml/stable-diffusion-v1-5. To use the model, you will have to generate a user access token for the 🤗 model hub. Once generated, we can easily download the model using the diffusers API. Now that we have installed all the required packages and have the user token, let’s try it out:
```python
import intel_extension_for_pytorch
import torch
from diffusers import StableDiffusionPipeline

model_id = "runwayml/stable-diffusion-v1-5"
prompt = "vivid red hot air ballons over paris in the evening"

pipe = StableDiffusionPipeline.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # this can be torch.float32 as well
    revision="fp16",
    use_auth_token="<the token you generated>")
pipe = pipe.to("xpu")

image = pipe(prompt).images[0]
image.save(f"{prompt[:5]}.png")
```
Executing this, we get the result:
```
In [8]: image = pipe(prompt).images[0]
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 51/51 [00:35<00:00, 1.43it/s]

In [9]: image = pipe(prompt).images[0]
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 51/51 [00:09<00:00, 5.20it/s]
```
As you can see, the first run takes about 35 seconds, and subsequent runs take about 10 seconds; you can expect these numbers to roughly double when using fp32.
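If you want to reproduce these numbers, crude wall-clock timing around the pipeline call is enough at this granularity (reusing `pipe` and `prompt` from the snippet above):

```python
import time

# the first call is slower because of one-time warm-up costs
start = time.time()
image = pipe(prompt).images[0]
print(f"generation took {time.time() - start:.1f}s")
```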
TensorFlow
Moving on to TensorFlow, we have this awesome repo from divamgupta.
- Install the stable_diffusion_tensorflow package and dependencies

```shell
~ → pip install git+https://github.com/divamgupta/stable-diffusion-tensorflow ftfy pillow tqdm regex tensorflow-addons
```
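As a quick sanity check (my assumption, not something from the repo’s README): the intel_extension_for_tensorflow plugin should register Arc GPUs under the XPU device type, which you can list before running anything heavy:

```python
import tensorflow as tf
import intel_extension_for_tensorflow  # loads the XPU plugin

# assumption: ITEX registers Intel GPUs as "XPU" physical devices
print(tf.config.list_physical_devices("XPU"))
```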
- Run Stable Diffusion
Running the TensorFlow model is straightforward, as no user access tokens or anything like that are required.
```python
import intel_extension_for_tensorflow
import tensorflow
from stable_diffusion_tf.stable_diffusion import StableDiffusion
from PIL import Image

prompt = "vivid red hot air ballons over paris in the evening"
generator = StableDiffusion(
    img_height=512,
    img_width=512,
    jit_compile=False,
)
img = generator.generate(
    prompt,
    num_steps=50,
    unconditional_guidance_scale=7.5,
    temperature=1,
    batch_size=1,
)
Image.fromarray(img[0]).save("sd_tf_fp32.png")
```
Executing this, we get the result:
```
2022-11-06 23:00:51.948547: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type XPU is enabled.
0 1: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [01:00<00:00, 1.21s/it]
2022-11-06 23:01:55.103111: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type XPU is enabled.
0 1: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [00:29<00:00, 1.67it/s]
```
As you can see, the first run takes about 60 seconds, and subsequent runs take about 30 seconds. One thing to note here is that the TensorFlow version used FP32, not FP16 as in the PyTorch case.
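If you want to try half precision on the TensorFlow side as well, Keras mixed precision is the obvious knob to turn. A hedged sketch, untested here; whether the generator’s layers and the XPU plugin behave correctly under it is an assumption:

```python
from tensorflow import keras

# assumption: StableDiffusion and the XPU plugin tolerate mixed precision;
# set the policy BEFORE constructing the generator so its layers pick it up
keras.mixed_precision.set_global_policy("mixed_float16")
```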
Repo
You can find the full code and other related materials here.