add the codes to check AMD Instinct GPU number #22367

zhangnju · 2025-08-06T15:10:52Z

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

Purpose

The LMCache PD (disaggregated prefill) example needs at least 2 GPUs, but the existing code counts them with "nvidia-smi", which doesn't work on AMD GPUs. This PR adds code to count AMD Instinct GPUs as well.

Test Plan

Run the check_num_gpus function from the shell script on both AMD and NVIDIA platforms.
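The test plan can also be rehearsed on a machine with no GPUs by putting a stub in front of the vendor tool. The sketch below is a hypothetical harness, not the script from this PR: `require_gpus`, the stub directory, and the fake GPU names are all illustrative assumptions. It only exercises the NVIDIA counting pipeline and the two-GPU gate.

```shell
#!/bin/sh
# Hypothetical harness: rehearses the GPU-count gate without real hardware.
# require_gpus and the stub below are illustrative, not part of this PR.

# require_gpus MIN COUNT: succeed only when COUNT >= MIN.
require_gpus() {
    if [ "$2" -lt "$1" ]; then
        echo "Need at least $1 GPUs, found $2." >&2
        return 1
    fi
    echo "Found $2 GPUs."
}

# Put a fake nvidia-smi first on PATH. It ignores its arguments and
# prints one made-up GPU name per line, like the csv,noheader query.
stub_dir=$(mktemp -d)
cat > "$stub_dir/nvidia-smi" <<'EOF'
#!/bin/sh
printf 'Fake GPU A\nFake GPU B\n'
EOF
chmod +x "$stub_dir/nvidia-smi"
PATH="$stub_dir:$PATH"

# Same counting pipeline as the NVIDIA branch of the script under test.
# The arithmetic expansion strips any padding some wc builds emit.
num_gpus=$(( $(nvidia-smi --query-gpu=name --format=csv,noheader | wc -l) ))
require_gpus 2 "$num_gpus"

# Remove the stub so later commands see the real tool again.
rm -rf "$stub_dir"
```

Using a stub executable on PATH rather than a shell function keeps the harness portable across shells and also satisfies a `command -v nvidia-smi` check. It does not replace a run on real AMD and NVIDIA hardware.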

Test Result

The check_num_gpus function returns the correct number of GPUs on both AMD and NVIDIA platforms.

(Optional) Documentation Update

github-actions · 2025-08-06T15:11:00Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

gemini-code-assist

Code Review

This pull request adds support for detecting AMD Instinct GPUs to ensure the environment has at least two GPUs. The implementation has been updated to check for AMD GPUs by looking for a ROCm installation and using lspci. My feedback focuses on making this detection more robust by using the standard rocm-smi utility instead of relying on installation paths and parsing lspci output.

gemini-code-assist · 2025-08-06T15:12:11Z

examples/others/lmcache/disagg_prefill_lmcache_v1/disagg_example_nixl.sh

+    if [ ! -d "/opt/rocm" ]; then
+	num_gpus=$(nvidia-smi --query-gpu=name --format=csv,noheader | wc -l)
+    else
+	num_gpus=$(lspci | grep Instinct | wc -l)
+    fi


The current method for detecting AMD GPUs has a few potential issues that could make it unreliable:

Dependency on /opt/rocm: It assumes ROCm is always installed in /opt/rocm. This might not be true for all installations or distributions.

Fragile lspci parsing: Relying on lspci | grep Instinct is brittle. The output format of lspci can change, and the string "Instinct" might not be present for all AMD data center GPUs, or it could accidentally match other devices.

A more robust approach is to use rocm-smi, which is the AMD equivalent of nvidia-smi. We can check for the availability of nvidia-smi or rocm-smi and then use the appropriate command to count GPUs. This avoids hardcoded paths and fragile text parsing.

Suggested change

-    if [ ! -d "/opt/rocm" ]; then
-	num_gpus=$(nvidia-smi --query-gpu=name --format=csv,noheader | wc -l)
-    else
-	num_gpus=$(lspci | grep Instinct | wc -l)
-    fi
+if command -v nvidia-smi &> /dev/null; then
+    num_gpus=$(nvidia-smi --query-gpu=name --format=csv,noheader | wc -l)
+elif command -v rocm-smi &> /dev/null; then
+    # rocm-smi --showid lists the GPU IDs, one per line.
+    num_gpus=$(rocm-smi --showid | wc -l)
+else
+    echo "Error: Neither nvidia-smi nor rocm-smi found. Cannot determine GPU count."
+    exit 1
+fi
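A possible caveat on the rocm-smi branch of this suggestion: `rocm-smi --showid` output usually carries banner and separator lines around the per-GPU rows, so piping it straight into `wc -l` may overcount. Below is a minimal sketch of a stricter count, assuming each device row starts with a `GPU[n]` prefix. That format is an assumption worth checking against the installed ROCm version, `count_rocm_gpus` is a hypothetical helper, and the sample text is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: count distinct GPU[n] prefixes read from stdin.
# Assumes device rows look like "GPU[0] : ..." (format not verified here).
count_rocm_gpus() {
    grep -oE '^GPU\[[0-9]+\]' | sort -u | wc -l
}

# Made-up sample shaped like rocm-smi output: banner, device rows, footer.
sample='======= ROCm System Management Interface =======
GPU[0] : Device ID: 0x0000
GPU[1] : Device ID: 0x0000
======= End of ROCm SMI Log ======='

# Counting every line includes the banner and footer.
naive=$(printf '%s\n' "$sample" | wc -l)
# Counting distinct GPU[n] prefixes ignores them.
strict=$(printf '%s\n' "$sample" | count_rocm_gpus)
echo "naive=$naive strict=$strict"

# Real use (requires rocm-smi):
#   num_gpus=$(rocm-smi --showid | count_rocm_gpus)
```

Deduplicating the `GPU[n]` prefixes with `sort -u` keeps the count stable even if the tool prints several rows per device.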

DarkLight1337

LGTM, thanks

add the codes to check AMD Instinct GPU number

8888753

mergify bot added the documentation and rocm labels Aug 6, 2025

gemini-code-assist bot reviewed Aug 6, 2025

zhangnju added 2 commits August 6, 2025 23:16

Update disagg_example_nixl.sh

edd3d5a

Signed-off-by: Zhang Jason <[email protected]>

Update disagg_example_nixl.sh

11fb25e

Signed-off-by: Zhang Jason <[email protected]>

DarkLight1337 approved these changes Aug 6, 2025

vllm-bot merged commit b4b9813 into vllm-project:main Aug 6, 2025
4 of 8 checks passed

avtc pushed a commit to avtc/vllm that referenced this pull request Aug 6, 2025

add the codes to check AMD Instinct GPU number (vllm-project#22367)

7fa3d69

Signed-off-by: Zhang Jason <[email protected]> Signed-off-by: avtc <[email protected]>

jinzhen-lin pushed a commit to jinzhen-lin/vllm that referenced this pull request Aug 9, 2025

add the codes to check AMD Instinct GPU number (vllm-project#22367)

8b14e38

Signed-off-by: Zhang Jason <[email protected]> Signed-off-by: Jinzhen Lin <[email protected]>

noamgat pushed a commit to noamgat/vllm that referenced this pull request Aug 9, 2025

add the codes to check AMD Instinct GPU number (vllm-project#22367)

7c93fee

Signed-off-by: Zhang Jason <[email protected]> Signed-off-by: Noam Gat <[email protected]>

wuhang2014 pushed a commit to wuhang2014/vllm that referenced this pull request Aug 12, 2025

add the codes to check AMD Instinct GPU number (vllm-project#22367)

7fc4e9a

Signed-off-by: Zhang Jason <[email protected]>

paulpak58 pushed a commit to paulpak58/vllm that referenced this pull request Aug 13, 2025

add the codes to check AMD Instinct GPU number (vllm-project#22367)

6a55383

Signed-off-by: Zhang Jason <[email protected]> Signed-off-by: Paul Pak <[email protected]>

diegocastanibm pushed a commit to diegocastanibm/vllm that referenced this pull request Aug 15, 2025

add the codes to check AMD Instinct GPU number (vllm-project#22367)

18abf3b

Signed-off-by: Zhang Jason <[email protected]> Signed-off-by: Diego-Castan <[email protected]>

epwalsh pushed a commit to epwalsh/vllm that referenced this pull request Aug 28, 2025

add the codes to check AMD Instinct GPU number (vllm-project#22367)

df1ecce

Signed-off-by: Zhang Jason <[email protected]>

xiao-llm pushed a commit to xiao-llm/vllm that referenced this pull request Aug 28, 2025

add the codes to check AMD Instinct GPU number (vllm-project#22367)

ff3a8ff

Signed-off-by: Zhang Jason <[email protected]> Signed-off-by: Xiao Yu <[email protected]>

zhewenl pushed a commit to zhewenl/vllm that referenced this pull request Aug 28, 2025

add the codes to check AMD Instinct GPU number (vllm-project#22367)

090779c

Signed-off-by: Zhang Jason <[email protected]>
