
[Bug]: Does not run gpt-oss:20b model on GPU in Ollama #19799

@IgnatovD


Bug Description

The gpt-oss:20b model does not run on the GPU on the Ollama server; it works only on the CPU.

Version

llama-index==0.13.3
llama-index-llms-ollama==0.7.1

Steps to Reproduce

from llama_index.llms.ollama import Ollama

llm = Ollama(
    model="gpt-oss:20b",
    base_url="<ip:port>",  # placeholder for the server address
    request_timeout=360,
    temperature=0.8,
    thinking=True
)

llm.complete('Hello!')
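A possible workaround sketch (an assumption on my part, not confirmed for this issue): the llama-index `Ollama` class accepts `additional_kwargs`, which are forwarded to the server as request `options`, so an explicit `num_gpu` (number of layers to offload) can be passed through. The value `99` here is illustrative, and `<ip:port>` remains a placeholder.

```python
# Hypothetical workaround: pass Ollama server options explicitly via
# additional_kwargs so num_gpu requests GPU layer offload.
options = {
    "temperature": 0.8,
    "num_gpu": 99,  # assumption: offload as many layers as fit on the GPU
}

# from llama_index.llms.ollama import Ollama
# llm = Ollama(
#     model="gpt-oss:20b",
#     base_url="<ip:port>",
#     request_timeout=360,
#     thinking=True,
#     additional_kwargs=options,
# )

# The same options travel in the JSON body of a raw /api/generate request:
payload = {"model": "gpt-oss:20b", "prompt": "Hello!", "options": options}
```

If the model loads onto the GPU with this payload but not without it, that would point at llama-index dropping or renaming the options on the way to the server.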

While this runs, I monitor the load on the video card:

Fri Sep  5 15:09:16 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.127.05             Driver Version: 550.127.05     CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Tesla V100S-PCIE-32GB          Off |   00000000:95:00.0 Off |                    0 |
| N/A   48C    P0             43W /  250W |     310MiB /  32768MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA GeForce RTX 3090        Off |   00000000:CB:00.0 Off |                  N/A |
|  0%   35C    P8             16W /  350W |       4MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A    350067      C   /usr/bin/ollama                               306MiB |
+-----------------------------------------------------------------------------------------+

If I make a direct request via curl, the GPU is loaded:

curl http://<ip:port>/api/generate -d '{
  "model": "gpt-oss:20b",
  "prompt": "Hello!",
  "options": {
    "temperature": 0.8,
    "thinking": true
  }
}'

Fri Sep  5 15:15:17 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.127.05             Driver Version: 550.127.05     CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Tesla V100S-PCIE-32GB          Off |   00000000:95:00.0 Off |                    0 |
| N/A   50C    P0             44W /  250W |   14882MiB /  32768MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA GeForce RTX 3090        Off |   00000000:CB:00.0 Off |                  N/A |
|  0%   35C    P8             16W /  350W |       4MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A    384635      C   /usr/bin/ollama                             14878MiB |
+-----------------------------------------------------------------------------------------+
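Besides nvidia-smi, the CPU/GPU split can be checked from the Ollama server itself: `GET /api/ps` reports each loaded model's total `size` and the portion resident in VRAM (`size_vram`). A small helper, assuming that response shape:

```python
def gpu_fraction(ps_response: dict) -> float:
    """Return the fraction of the first loaded model held in VRAM
    (1.0 = fully on GPU, 0.0 = fully on CPU), given an Ollama
    GET /api/ps response body."""
    models = ps_response.get("models", [])
    if not models:
        return 0.0  # nothing is loaded
    model = models[0]
    # Guard against a missing/zero size to avoid division by zero.
    return model.get("size_vram", 0) / max(model.get("size", 1), 1)

# Usage against a live server (<ip:port> is a placeholder):
# import json, urllib.request
# resp = json.load(urllib.request.urlopen("http://<ip:port>/api/ps"))
# print(f"{gpu_fraction(resp):.0%} of the model is in VRAM")
```

Running this right after the llama-index call versus right after the curl call should show 0% versus a high percentage, matching the nvidia-smi snapshots above.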

Relevant Logs/Tracebacks


Labels: bug (Something isn't working), triage (Issue needs to be triaged/prioritized)
