Update release pipeline post PyTorch 2.8.0 update #24073
Signed-off-by: Huy Do <[email protected]>
Code Review
This pull request updates the release pipeline to use CUDA 12.9, which is necessary for the PyTorch 2.8.0 update. The changes correctly update the CUDA versions and corresponding build identifiers in the Buildkite pipeline configuration. The wheel upload script has also been modified to handle CUDA 12.9 as the new default version for publishing wheel indices. My review includes suggestions to improve the robustness of the wheel upload script to make it more maintainable and prevent potential issues in future releases.
if [[ $normal_wheel == *"cu126"* ]]; then
    # if $normal_wheel matches cu126, do not upload the index.html
    echo "Skipping index files for cu126 wheels"
elif [[ $normal_wheel == *"cu128"* ]]; then
    # if $normal_wheel matches cu128, do not upload the index.html
    echo "Skipping index files for cu128 wheels"
else
    # only upload index.html for cu128 wheels (default wheels)
    # only upload index.html for cu129 wheels (default wheels) as it
    # is available on both x86 and arm64
    aws s3 cp index.html "s3://vllm-wheels/$BUILDKITE_COMMIT/vllm/index.html"
    aws s3 cp "s3://vllm-wheels/nightly/index.html" "s3://vllm-wheels/$BUILDKITE_COMMIT/index.html"
fi
The current `if/elif/else` structure for determining whether to upload index files is fragile. It assumes that any wheel not matching `cu126` or `cu128` is the default `cu129`. This could lead to incorrect behavior if other wheel versions are added to the build pipeline in the future.

Using a `case` statement would make this logic more explicit and robust. It clearly defines the action for each known wheel type and can provide a warning for unexpected versions, preventing silent failures in the release process.
case "$normal_wheel" in
    *cu129*)
        # only upload index.html for cu129 wheels (default wheels) as it
        # is available on both x86 and arm64
        aws s3 cp index.html "s3://vllm-wheels/$BUILDKITE_COMMIT/vllm/index.html"
        aws s3 cp "s3://vllm-wheels/nightly/index.html" "s3://vllm-wheels/$BUILDKITE_COMMIT/index.html"
        ;;
    *cu126*|*cu128*)
        # if $normal_wheel matches cu126 or cu128, do not upload the index.html
        echo "Skipping index files for $normal_wheel wheels"
        ;;
    *)
        echo "WARNING: Unhandled wheel version for index upload: $normal_wheel. Not uploading index."
        ;;
esac
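Both upload blocks in the script make the same decision, so one way to keep them in sync would be to factor the classification into a small function. The following is a minimal sketch, not the script's actual code: `index_action` is a hypothetical helper name and the filename is made up. It also assumes that the default wheel's filename really contains `cu129`; if the default build carries no CUDA tag in its name, the fallback branch would have to upload rather than warn.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the original script): map a wheel
# filename to the action the index-upload step should take.
index_action() {
    local wheel="$1"
    case "$wheel" in
        *cu129*)         echo "upload" ;;   # assumed to be the default wheels
        *cu126*|*cu128*) echo "skip" ;;     # known non-default wheels
        *)               echo "unknown" ;;  # anything unexpected
    esac
}

# Example with a made-up filename
index_action "vllm-1.0.0.dev+cu126-cp38-abi3-manylinux1_x86_64.whl"   # prints "skip"
```

Each upload block could then branch on the returned word instead of repeating the patterns.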
if [[ $normal_wheel == *"cu126"* ]]; then
    # if $normal_wheel matches cu126, do not upload the index.html
    echo "Skipping index files for cu126 wheels"
elif [[ $normal_wheel == *"cu128"* ]]; then
    # if $normal_wheel matches cu128, do not upload the index.html
    echo "Skipping index files for cu128 wheels"
else
    # only upload index.html for cu128 wheels (default wheels)
    # only upload index.html for cu129 wheels (default wheels) as it
    # is available on both x86 and arm64
    aws s3 cp index.html "s3://vllm-wheels/nightly/vllm/index.html"
fi
Similar to the block above, this `if/elif/else` structure is fragile. It implicitly assumes any wheel not matching `cu126` or `cu128` is the default `cu129`, for which the nightly index should be updated.

Refactoring to a `case` statement will make the logic more robust and maintainable, explicitly handling known cases and warning about unknown ones. This is important for the stability of the release pipeline.
case "$normal_wheel" in
    *cu129*)
        # only upload index.html for cu129 wheels (default wheels) as it
        # is available on both x86 and arm64
        aws s3 cp index.html "s3://vllm-wheels/nightly/vllm/index.html"
        ;;
    *cu126*|*cu128*)
        # if $normal_wheel matches cu126 or cu128, do not upload the index.html
        echo "Skipping index files for $normal_wheel wheels"
        ;;
    *)
        echo "WARNING: Unhandled wheel version for nightly index upload: $normal_wheel. Not uploading index."
        ;;
esac
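The suggested fallback only prints a warning to stdout, so the step would still exit successfully. If the goal is to avoid silent failures, a variant could write the warning to stderr and return a non-zero status. The sketch below illustrates that idea under stated assumptions: `upload_nightly_index` and `DRY_RUN` are hypothetical names that do not appear in the original script, and the dry-run default is there only so the sketch never touches S3.

```shell
#!/usr/bin/env bash
# Sketch only: DRY_RUN and upload_nightly_index are hypothetical names.
DRY_RUN="${DRY_RUN:-1}"   # default to a dry run so nothing is uploaded

upload_nightly_index() {
    local wheel="$1"
    case "$wheel" in
        *cu129*)
            # assumed default wheels: update the nightly index
            if [[ "$DRY_RUN" == "1" ]]; then
                echo "would run: aws s3 cp index.html s3://vllm-wheels/nightly/vllm/index.html"
            else
                aws s3 cp index.html "s3://vllm-wheels/nightly/vllm/index.html"
            fi
            ;;
        *cu126*|*cu128*)
            echo "Skipping index files for $wheel"
            ;;
        *)
            # warn on stderr and signal failure to the caller
            echo "WARNING: Unhandled wheel version for nightly index upload: $wheel" >&2
            return 1
            ;;
    esac
}
```

Whether an unknown variant should fail the release step or only warn is a judgment call for the maintainers. If the calling script runs with `set -e` and invokes the function as a plain command, the non-zero return would stop the upload at that point.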
Signed-off-by: youkaichao <[email protected]>
I can see this PR pushed a new aarch64 image from the second-to-last commit of this branch, great! (See: https://gallery.ecr.aws/q9t5s3a7/vllm-release-repo
because I don't trigger a release build for the latest commit.
Looks great to me, thanks!
Stamped! Retrying on #23960 also seems to work for me and avoids timing out in the release build: https://buildkite.com/vllm/release/builds/7828. So I think both PRs are fine.
Yeah, that's because I manually triggered a 10-hour build. Now that the compilation cache is populated, later release builds can be much faster.
Purpose
Redo #23960 in an upstream branch so that we can trigger build with custom env vars.
Test Plan
Test Result