Bug Description
The gpt-oss:20b model does not run on the GPU when called through the Ollama server via llama-index; it runs only on the CPU.
Version
llama-index==0.13.3
llama-index-llms-ollama==0.7.1
Steps to Reproduce
from llama_index.llms.ollama import Ollama

llm = Ollama(
    model="gpt-oss:20b",
    base_url="<ip:port>",  # redacted
    request_timeout=360,
    temperature=0.8,
    thinking=True,
)
llm.complete("Hello!")
While this runs, I monitor the GPU load with nvidia-smi:
Fri Sep 5 15:09:16 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.127.05             Driver Version: 550.127.05     CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Tesla V100S-PCIE-32GB          Off |   00000000:95:00.0 Off |                    0 |
| N/A   48C    P0             43W /  250W |     310MiB /  32768MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA GeForce RTX 3090        Off |   00000000:CB:00.0 Off |                  N/A |
|  0%   35C    P8             16W /  350W |       4MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A          350067      C   /usr/bin/ollama                          306MiB |
+-----------------------------------------------------------------------------------------+
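To see what the wrapper actually sends, here is a minimal diagnostic sketch (not from the original report, and assuming llama-index talks to Ollama over plain HTTP JSON): point base_url at a throwaway logging server and diff the captured body against the curl payload shown next.

# Diagnostic sketch: a throwaway HTTP server that prints whatever JSON body
# llama-index posts, so it can be compared with the working curl request.
# llm.complete() will raise, because we answer with a 500 instead of a real
# Ollama response, but the payload is printed first.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class LogHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        print(self.path)                               # which endpoint is hit
        print(json.dumps(json.loads(body), indent=2))  # full request body
        self.send_response(500)
        self.end_headers()

HTTPServer(("127.0.0.1", 11435), LogHandler).serve_forever()

Run this, set base_url="http://127.0.0.1:11435" in the snippet above (the port is arbitrary), and compare the logged endpoint and options against the curl request below.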
If I make the same request directly via curl, the GPU is loaded:
curl http://<ip:port>/api/generate -d '{
  "model": "gpt-oss:20b",
  "prompt": "Hello!",
  "options": {
    "temperature": 0.8,
    "thinking": true
  }
}'
Fri Sep 5 15:15:17 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.127.05             Driver Version: 550.127.05     CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Tesla V100S-PCIE-32GB          Off |   00000000:95:00.0 Off |                    0 |
| N/A   50C    P0             44W /  250W |   14882MiB /  32768MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA GeForce RTX 3090        Off |   00000000:CB:00.0 Off |                  N/A |
|  0%   35C    P8             16W /  350W |       4MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A          384635      C   /usr/bin/ollama                        14878MiB |
+-----------------------------------------------------------------------------------------+
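One way to narrow this down, offered as a sketch rather than a confirmed fix: replay the request outside llama-index and add the wrapper's extra fields back one at a time, watching nvidia-smi (or `ollama ps`, which reports whether a loaded model sits on CPU or GPU) after each call. The num_ctx option below is a hypothetical example of a setting that could push the model off the GPU by inflating VRAM requirements; the value is an assumption, not something taken from the report.

# Bisection sketch (assumptions flagged inline): replay /api/generate with
# the working payload, then add candidate options one at a time to find
# which one triggers the CPU fallback. Requires the `requests` package.
import requests

BASE = "http://<ip:port>"  # same redacted placeholder as in the report

def generate(options):
    resp = requests.post(
        f"{BASE}/api/generate",
        json={
            "model": "gpt-oss:20b",
            "prompt": "Hello!",
            "stream": False,  # single JSON response instead of NDJSON chunks
            "options": options,
        },
        timeout=360,
    )
    resp.raise_for_status()
    return resp.json()

generate({"temperature": 0.8})  # baseline: loads onto the GPU per the report
# Hypothetical candidate: a large context window can overflow VRAM and force
# CPU placement. The value below is illustrative, not from the report.
generate({"temperature": 0.8, "num_ctx": 131072})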
Relevant Logs/Tracebacks