Update release pipeline post PyTorch 2.8.0 update #24073

youkaichao · 2025-09-02T03:43:37Z

Purpose

Redo #23960 on an upstream branch so that we can trigger the build with custom environment variables.

Test Plan

Test Result


Signed-off-by: Huy Do <[email protected]>

gemini-code-assist

Code Review

This pull request updates the release pipeline to use CUDA 12.9, which is necessary for the PyTorch 2.8.0 update. The changes correctly update the CUDA versions and corresponding build identifiers in the Buildkite pipeline configuration. The wheel upload script has also been modified to handle CUDA 12.9 as the new default version for publishing wheel indices. My review includes suggestions to improve the robustness of the wheel upload script to make it more maintainable and prevent potential issues in future releases.

gemini-code-assist · 2025-09-02T03:44:59Z

.buildkite/scripts/upload-wheels.sh

+if [[ $normal_wheel == *"cu126"* ]]; then
    # if $normal_wheel matches cu126, do not upload the index.html
    echo "Skipping index files for cu126 wheels"
+elif [[ $normal_wheel == *"cu128"* ]]; then
+    # if $normal_wheel matches cu128, do not upload the index.html
+    echo "Skipping index files for cu128 wheels"
 else
-    # only upload index.html for cu128 wheels (default wheels)
+    # only upload index.html for cu129 wheels (default wheels) as it
+    # is available on both x86 and arm64
    aws s3 cp index.html "s3://vllm-wheels/$BUILDKITE_COMMIT/vllm/index.html"
    aws s3 cp "s3://vllm-wheels/nightly/index.html" "s3://vllm-wheels/$BUILDKITE_COMMIT/index.html"
 fi


The current if/elif/else structure for determining whether to upload index files is fragile. It assumes that any wheel not matching cu126 or cu128 is the default cu129. This could lead to incorrect behavior if other wheel versions are added to the build pipeline in the future.

Using a case statement would make this logic more explicit and robust. It clearly defines the action for each known wheel type and can provide a warning for unexpected versions, preventing silent failures in the release process.

Suggested change

-if [[ $normal_wheel == *"cu126"* ]]; then
-    # if $normal_wheel matches cu126, do not upload the index.html
-    echo "Skipping index files for cu126 wheels"
-elif [[ $normal_wheel == *"cu128"* ]]; then
-    # if $normal_wheel matches cu128, do not upload the index.html
-    echo "Skipping index files for cu128 wheels"
-else
-    # only upload index.html for cu128 wheels (default wheels)
-    # only upload index.html for cu129 wheels (default wheels) as it
-    # is available on both x86 and arm64
-    aws s3 cp index.html "s3://vllm-wheels/$BUILDKITE_COMMIT/vllm/index.html"
-    aws s3 cp "s3://vllm-wheels/nightly/index.html" "s3://vllm-wheels/$BUILDKITE_COMMIT/index.html"
-fi
+case "$normal_wheel" in
+    *cu129*)
+        # only upload index.html for cu129 wheels (default wheels) as it
+        # is available on both x86 and arm64
+        aws s3 cp index.html "s3://vllm-wheels/$BUILDKITE_COMMIT/vllm/index.html"
+        aws s3 cp "s3://vllm-wheels/nightly/index.html" "s3://vllm-wheels/$BUILDKITE_COMMIT/index.html"
+        ;;
+    *cu126*|*cu128*)
+        # if $normal_wheel matches cu126 or cu128, do not upload the index.html
+        echo "Skipping index files for $normal_wheel wheels"
+        ;;
+    *)
+        echo "WARNING: Unhandled wheel version for index upload: $normal_wheel. Not uploading index."
+        ;;
+esac
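
The matching behaviour of the suggested case statement can be checked without touching S3. The sketch below is a minimal dry run and not part of the PR: `wheel_index_action` and the sample filenames are hypothetical, and the function only reports which arm a given name would select.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in upload-wheels.sh): report which arm of the
# suggested case statement a wheel filename would select.
wheel_index_action() {
    case "$1" in
        *cu129*)          echo "upload" ;;   # default wheels: publish index.html
        *cu126*|*cu128*)  echo "skip" ;;     # non-default CUDA builds
        *)                echo "unknown" ;;  # anything else: warn, do not upload
    esac
}

# Sample filenames are made up for illustration.
for wheel in \
    "vllm-1.0.0+cu129-cp38-abi3-manylinux1_x86_64.whl" \
    "vllm-1.0.0+cu126-cp38-abi3-manylinux1_x86_64.whl" \
    "vllm-1.0.0-cp38-abi3-manylinux1_x86_64.whl"; do
    printf '%s -> %s\n' "$wheel" "$(wheel_index_action "$wheel")"
done
```

The third name carries no cuXXX tag and lands in the catch-all arm. Under the if/elif/else version the same name would fall through to `else` and upload the index, so it is worth confirming that the default wheel's filename really contains `cu129` before switching to the explicit match.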

gemini-code-assist · 2025-09-02T03:44:59Z

.buildkite/scripts/upload-wheels.sh

+if [[ $normal_wheel == *"cu126"* ]]; then
    # if $normal_wheel matches cu126, do not upload the index.html
    echo "Skipping index files for cu126 wheels"
+elif [[ $normal_wheel == *"cu128"* ]]; then
+    # if $normal_wheel matches cu128, do not upload the index.html
+    echo "Skipping index files for cu128 wheels"
 else
-    # only upload index.html for cu128 wheels (default wheels)
+    # only upload index.html for cu129 wheels (default wheels) as it
+    # is available on both x86 and arm64
    aws s3 cp index.html "s3://vllm-wheels/nightly/vllm/index.html"
 fi


Similar to the block above, this if/elif/else structure is fragile. It implicitly assumes any wheel not matching cu126 or cu128 is the default cu129 for which the nightly index should be updated.

Refactoring to a case statement will make the logic more robust and maintainable, explicitly handling known cases and warning about unknown ones. This is important for the stability of the release pipeline.

Suggested change

-if [[ $normal_wheel == *"cu126"* ]]; then
-    # if $normal_wheel matches cu126, do not upload the index.html
-    echo "Skipping index files for cu126 wheels"
-elif [[ $normal_wheel == *"cu128"* ]]; then
-    # if $normal_wheel matches cu128, do not upload the index.html
-    echo "Skipping index files for cu128 wheels"
-else
-    # only upload index.html for cu128 wheels (default wheels)
-    # only upload index.html for cu129 wheels (default wheels) as it
-    # is available on both x86 and arm64
-    aws s3 cp index.html "s3://vllm-wheels/nightly/vllm/index.html"
-fi
+case "$normal_wheel" in
+    *cu129*)
+        # only upload index.html for cu129 wheels (default wheels) as it
+        # is available on both x86 and arm64
+        aws s3 cp index.html "s3://vllm-wheels/nightly/vllm/index.html"
+        ;;
+    *cu126*|*cu128*)
+        # if $normal_wheel matches cu126 or cu128, do not upload the index.html
+        echo "Skipping index files for $normal_wheel wheels"
+        ;;
+    *)
+        echo "WARNING: Unhandled wheel version for nightly index upload: $normal_wheel. Not uploading index."
+        ;;
+esac
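
Both blocks in upload-wheels.sh make the same decision on `$normal_wheel` and differ only in which files they copy. One way to avoid maintaining the pattern list twice is to move the decision into a single function and call it at each upload site. The following is a sketch under assumptions, not the script's actual structure: `copy_index_if_default`, `COPY_CMD`, and the default values for `normal_wheel` and `BUILDKITE_COMMIT` are hypothetical, and `COPY_CMD` defaults to an `echo` so nothing is sent to S3.

```shell
#!/usr/bin/env bash
# Hypothetical refactor sketch: keep the wheel-version patterns in one place
# and reuse them for every index upload.

# COPY_CMD is an assumption for dry runs; set it to "aws s3 cp" to copy for real.
COPY_CMD="${COPY_CMD:-echo aws s3 cp}"

# copy_index_if_default WHEEL SRC DST
# Copies SRC to DST only when WHEEL is the default (cu129) build.
copy_index_if_default() {
    case "$1" in
        *cu129*)
            # Unquoted on purpose so "echo aws s3 cp" splits into words.
            $COPY_CMD "$2" "$3"
            ;;
        *cu126*|*cu128*)
            echo "Skipping index files for $1"
            ;;
        *)
            echo "WARNING: Unhandled wheel version for index upload: $1. Not uploading index." >&2
            ;;
    esac
}

# Made-up values so the sketch runs outside Buildkite.
normal_wheel="${normal_wheel:-vllm-1.0.0+cu129-cp38-abi3-manylinux1_x86_64.whl}"
BUILDKITE_COMMIT="${BUILDKITE_COMMIT:-abc123}"

# Per-commit index files (first block in the script).
copy_index_if_default "$normal_wheel" index.html "s3://vllm-wheels/$BUILDKITE_COMMIT/vllm/index.html"
copy_index_if_default "$normal_wheel" "s3://vllm-wheels/nightly/index.html" "s3://vllm-wheels/$BUILDKITE_COMMIT/index.html"

# Nightly index (second block in the script).
copy_index_if_default "$normal_wheel" index.html "s3://vllm-wheels/nightly/vllm/index.html"
```

Keeping the call sites separate preserves the order of the original uploads, so any steps that run between the two blocks in the real script would be unaffected.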

Signed-off-by: youkaichao <[email protected]>

nWEIdia · 2025-09-02T16:06:35Z

I can see this PR pushed a new aarch64 image from the second-to-last commit of this branch, great! (See: https://gallery.ecr.aws/q9t5s3a7/vllm-release-repo 43a6f63d0860ef2f545122332d7a377e99010838-aarch64)
Do you know why the top commit did not generate a new aarch64 build?

youkaichao · 2025-09-02T16:24:53Z

> Do you know why the top commit did not generate a new aarch64 build?

Because I did not trigger a release build for the latest commit.

nWEIdia

Looks great to me, thanks!

huydhn

Stamped! Retrying on #23960 also worked for me and avoided the timeout in the release build: https://buildkite.com/vllm/release/builds/7828. So I think both PRs are fine.

youkaichao · 2025-09-03T02:10:18Z

> Retrying on #23960 seems to work for me too to avoid timing out in the release build

Yeah, that's because I manually triggered a 10-hour build. Now that the compilation cache is populated, later release builds can be much faster.


huydhn and others added 8 commits August 29, 2025 13:15

Update release pipeline post PyTorch 2.8.0 update

7db334d

Signed-off-by: Huy Do <[email protected]>

Address review comments

f106b84

Signed-off-by: Huy Do <[email protected]>

Another tweak to build deepgemm

646428f

Signed-off-by: Huy Do <[email protected]>

Is this working now?

87a4a5c

Signed-off-by: Huy Do <[email protected]>

Build CUDA aarch64 on 12.9

7b7f903

Signed-off-by: Huy Do <[email protected]>

Update cu129 wheel to nightly

8988fc1

Signed-off-by: Huy Do <[email protected]>

Revert libnuma change to see if it passes on cu129

b72ebd5

Signed-off-by: Huy Do <[email protected]>

Merge branch 'main' into update-release-pipeline-2.8.0-release

a067974

mergify bot added the ci/build label Sep 2, 2025

gemini-code-assist bot reviewed Sep 2, 2025


youkaichao added the ready ONLY add when PR is ready to merge/full CI is needed label Sep 2, 2025

youkaichao added 4 commits September 2, 2025 12:34

fix on build 12.9 by default

f3bd441

Signed-off-by: youkaichao <[email protected]>

manual timeout

22864ff

Signed-off-by: youkaichao <[email protected]>

fix depends on

43a6f63

Signed-off-by: youkaichao <[email protected]>

remove timeout

db3dcc8

Signed-off-by: youkaichao <[email protected]>

nWEIdia approved these changes Sep 2, 2025

huydhn approved these changes Sep 2, 2025

khluu approved these changes Sep 2, 2025

youkaichao merged commit 42dc59d into main Sep 3, 2025
18 checks passed

youkaichao deleted the update-release-pipeline-2.8.0-release branch September 3, 2025 02:09

youkaichao mentioned this pull request Sep 3, 2025

Update release pipeline post PyTorch 2.8.0 update #23960

Closed

2 tasks
