Ollama with Intel GPU acceleration on Proxmox

Run Ollama with Intel GPU acceleration on Proxmox by exposing the iGPU to a Linux container or VM and enabling Intel’s oneAPI/SYCL or OpenVINO backend to offload model layers to the Xe iGPU. Below is a clean, step‑by‑step path using a privileged Ubuntu LXC with GPU device mapping, plus an optional OpenVINO backend route for higher throughput on Intel hardware.

Step 1: Prep Proxmox host (IOMMU + i915 GuC)
sudo nano /etc/default/grub
# Ensure this includes:
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"
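For a non-interactive edit, the same change can be scripted with GNU sed; this is a sketch, shown against a scratch copy rather than the live /etc/default/grub:

```shell
# Append the IOMMU flags to GRUB_CMDLINE_LINUX_DEFAULT.
# Demonstrated on a temp copy; point f at /etc/default/grub on the real host.
f=$(mktemp)
echo 'GRUB_CMDLINE_LINUX_DEFAULT="quiet"' > "$f"
sed -i 's/^GRUB_CMDLINE_LINUX_DEFAULT="\([^"]*\)"/GRUB_CMDLINE_LINUX_DEFAULT="\1 intel_iommu=on iommu=pt"/' "$f"
cat "$f"
# -> GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"
```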

Then apply and reboot:

sudo update-grub
sudo reboot

Validate after reboot:

dmesg | grep -e DMAR -e IOMMU

(You should see IOMMU/DMAR enabled messages.)[^1][^4]

Next, enable GuC submission and HuC firmware loading for the iGPU:

echo "options i915 enable_guc=3" | sudo tee /etc/modprobe.d/i915.conf
sudo update-initramfs -u -k all
sudo reboot

After reboot, confirm the render node exists:

ls -l /dev/dri
# Expect renderD128 at minimum
Step 2: Create a privileged Ubuntu LXC and map the GPU

Create a privileged Ubuntu container, then add the following lines to its config on the host (/etc/pve/lxc/<CTID>.conf); 226:0 is card0 and 226:128 is renderD128:

lxc.cgroup2.devices.allow: c 226:0 rwm
lxc.cgroup2.devices.allow: c 226:128 rwm
lxc.mount.entry: /dev/dri/renderD128 dev/dri/renderD128 none bind,optional,create=file

Start the container and verify /dev/dri/renderD128 exists inside it.
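The allow rules above follow DRM’s fixed device numbering: character major 226, with card0 at minor 0 and renderD128 at minor 128. A small sketch that derives an allow line from a node name (allow_line is an illustrative helper, not a Proxmox tool):

```shell
# DRM device nodes use char major 226; renderD<m> has minor m, card<n> minor n.
allow_line() {
    name=$1                # e.g. renderD128 or card0
    minor=${name#renderD}  # strip the renderD prefix if present
    minor=${minor#card}    # otherwise strip the card prefix
    printf 'lxc.cgroup2.devices.allow: c 226:%s rwm\n' "$minor"
}
allow_line renderD128   # -> lxc.cgroup2.devices.allow: c 226:128 rwm
allow_line card0        # -> lxc.cgroup2.devices.allow: c 226:0 rwm
```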

Step 3: Install Intel runtimes and Ollama (inside the LXC)
sudo apt update
sudo apt install -y python3.11-venv
python3.11 -m venv ~/llm_env
source ~/llm_env/bin/activate
pip install --pre --upgrade ipex-llm[cpp]
mkdir -p ~/llama-cpp && cd ~/llama-cpp
# Initialize Ollama with Intel GPU support via IPEX-LLM helper
init-ollama

Intel’s guide uses a Python venv, installs ipex-llm[cpp], then initializes an Ollama build configured for Intel GPUs.

Step 4: Run Ollama accelerated on the Intel iGPU
# From inside the LXC (activate venv if needed)
export OLLAMA_NUM_GPU=999
export no_proxy=localhost,127.0.0.1
export ZES_ENABLE_SYSMAN=1
export SYCL_CACHE_PERSISTENT=1
# If oneAPI setvars is installed/system-wide, source it:
source /opt/intel/oneapi/setvars.sh || true
# For certain kernels/GPUs this can help:
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
# Optional: listen on all interfaces if exposing to LAN
# export OLLAMA_HOST=0.0.0.0
# init-ollama creates an ollama symlink in ~/llama-cpp; run it from there
./ollama serve

OLLAMA_NUM_GPU=999 forces all layers that can run on the GPU to offload, while the Level Zero variables improve device telemetry and command submission for Intel GPUs.
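To avoid retyping the exports after every reboot, they can be collected into a small wrapper function; in this sketch, serve_intel and the OLLAMA_BIN override are illustrative names, not part of Ollama:

```shell
# serve_intel: set the Intel-specific environment, then start "ollama serve".
# OLLAMA_BIN defaults to "ollama" on PATH; point it at ~/llama-cpp/ollama
# for the init-ollama build, or at a stub binary for a dry run.
serve_intel() {
    export OLLAMA_NUM_GPU=999        # offload every layer that fits
    export no_proxy=localhost,127.0.0.1
    export ZES_ENABLE_SYSMAN=1       # Level Zero device telemetry
    export SYCL_CACHE_PERSISTENT=1   # keep JIT-compiled kernels across runs
    export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
    # Pull in the oneAPI environment if it is installed system-wide.
    [ -f /opt/intel/oneapi/setvars.sh ] && . /opt/intel/oneapi/setvars.sh
    "${OLLAMA_BIN:-ollama}" serve
}
```

Dry-run check: `OLLAMA_BIN=echo serve_intel` prints `serve` without starting a server.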

In another shell (from ~/llama-cpp, with the venv active), pull and run a model:

./ollama run llama3.1:8b

The IPEX‑LLM integration steers the underlying llama.cpp execution toward Intel’s SYCL/Level Zero path when possible.

Option B: OpenVINO backend for Ollama (higher throughput path)

1. Download and initialize the OpenVINO GenAI runtime, and set GODEBUG=cgocheck=0 for the Ollama executable that uses this backend.
2. Obtain an OpenVINO IR model (e.g., the int4-quantized DeepSeek‑R1‑Distill‑Qwen‑7B), then package it as a tarball.
3. Write a Modelfile declaring ModelType “OpenVINO” and InferDevice “GPU,” then create the Ollama model image.

# 1) Prepare OpenVINO GenAI runtime (env example)
export GODEBUG=cgocheck=0
# Source the OpenVINO/GenAI setup if provided by your runtime package
# source setupvars.sh

# 2) Download an OpenVINO IR model (example uses ModelScope tools)
pip install modelscope
modelscope download --model zhaohb/DeepSeek-R1-Distill-Qwen-7B-int4-ov --local_dir ./DeepSeek-R1-Distill-Qwen-7B-int4-ov
tar -zcvf DeepSeek-R1-Distill-Qwen-7B-int4-ov.tar.gz DeepSeek-R1-Distill-Qwen-7B-int4-ov

# 3) Modelfile (OpenVINO backend)
cat > Modelfile << 'EOF'
FROM DeepSeek-R1-Distill-Qwen-7B-int4-ov.tar.gz
ModelType "OpenVINO"
InferDevice "GPU"
PARAMETER stop "</User|>"
PARAMETER stop "<|end_of_sentence|>"
PARAMETER stop "</|"
PARAMETER max_new_token 4096
PARAMETER stop_id 151643
PARAMETER stop_id 151647
PARAMETER repeat_penalty 1.5
PARAMETER top_p 0.95
PARAMETER top_k 50
PARAMETER temperature 0.8
EOF

# Create and run the model with the OpenVINO backend
ollama create DeepSeek-R1-Distill-Qwen-7B-int4-ov:v1 -f Modelfile
ollama run DeepSeek-R1-Distill-Qwen-7B-int4-ov:v1
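Once ollama serve is running, the model is also reachable over Ollama’s REST API. A sketch that assembles the /api/generate request body with printf (the endpoint and fields are standard Ollama API; the naive quoting assumes the prompt contains no JSON-special characters):

```shell
# Build a JSON body for Ollama's /api/generate endpoint.
model="DeepSeek-R1-Distill-Qwen-7B-int4-ov:v1"
prompt="Why is the sky blue?"
body=$(printf '{"model":"%s","prompt":"%s","stream":false}' "$model" "$prompt")
echo "$body"
# Send it to the server (default port 11434):
# curl -s http://localhost:11434/api/generate -d "$body"
```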

Ins0mniA


Revision #1
Created 2025-08-30 15:02:57 EEST by Green
Updated 2025-09-04 02:07:44 EEST by Green