
@tjtanaavllm commented on Sep 11, 2025

Purpose

Sync with upstream to pick up the latest changes and to fix the compressed-tensors FP8 weight-loading accuracy issue.

Upgrade the vLLM version to 0.10.2rc2.dev+ge408272.
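
As a quick post-upgrade sanity check (not part of the original description), the installed build can be confirmed before serving. A minimal sketch, assuming a standard vLLM install in the serving environment:

# Illustrative only: confirm the environment is running the upgraded build.
python -c "import vllm; print(vllm.__version__)"  # expected to report 0.10.2rc2.dev+ge408272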

Test Plan

Validated on the models of interest. A sketch of the matching accuracy-evaluation command is included after the serve commands below.

  1. Qwen/Qwen3-235B-A22B-FP8
MODEL=Qwen/Qwen3-235B-A22B-FP8


VLLM_USE_V1=1 \
VLLM_ROCM_USE_AITER=1 \
vllm serve $MODEL \
--tensor-parallel-size 8 \
--max-num-batched-tokens 32768 \
--disable-log-requests \
--kv-cache-dtype fp8 \
--compilation-config '{"cudagraph_mode": "FULL_AND_PIECEWISE"}' \
--trust-remote-code \
--enable_expert_parallel \
--port 6789 \
> server-Qwen_Qwen3-235B-A22B-FP8-aiter-v1-fp8-cudagraph_FULL.log 2>&1
  2. Qwen/Qwen2.5-VL-72B-Instruct
VLLM_RPC_TIMEOUT=1800000 \
VLLM_USE_V1=1 \
VLLM_ROCM_USE_AITER=1 \
VLLM_ROCM_USE_AITER_MHA=1 \
VLLM_USE_TRITON_FLASH_ATTN=0 \
SAFETENSORS_FAST_GPU=1 \
vllm serve Qwen/Qwen2.5-VL-72B-Instruct \
 -tp 8 \
 --compilation-config '{"cudagraph_mode": "FULL_AND_PIECEWISE"}' \
 --trust_remote_code \
 --mm-processor-kwargs '{"max_pixels":802816,"min_pixels":3136}' \
 --limit-mm-per-prompt='{"image": 64}' \
 --mm-encoder-tp-mode "data" \
> server_Qwen_Qwen2.5-VL-72B-Instruct-aiter-v1-tp8-dp8-cudagraph_FULL_AND_PIECEWISE.log 2>&1
  3. Qwen/Qwen2.5-VL-3B-Instruct
VLLM_RPC_TIMEOUT=1800000 \
VLLM_USE_V1=1 \
VLLM_ROCM_USE_AITER=1 \
VLLM_ROCM_USE_AITER_MHA=1 \
VLLM_USE_TRITON_FLASH_ATTN=0 \
SAFETENSORS_FAST_GPU=1 \
vllm serve Qwen/Qwen2.5-VL-3B-Instruct \
 -tp 1 \
 --compilation-config '{"cudagraph_mode": "FULL_AND_PIECEWISE"}' \
 --trust_remote_code \
 --mm-processor-kwargs '{"max_pixels":802816,"min_pixels":3136}' \
 --limit-mm-per-prompt='{"image": 64}' \
> server_Qwen_Qwen2.5-VL-3B-Instruct-aiter-v1-tp1-cudagraph_FULL_AND_PIECEWISE.log 2>&1
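
The GSM8K numbers in the results below were collected with lm-evaluation-harness against the OpenAI-compatible endpoint started above. The exact invocation is not part of the original description; a minimal sketch reconstructed from the result header (local-completions backend, base_url, 5-shot, batch size 100) could look like:

# Illustrative sketch: score the served Qwen3-235B-A22B-FP8 endpoint on GSM8K with lm-eval.
lm_eval --model local-completions \
  --model_args model=Qwen/Qwen3-235B-A22B-FP8,base_url=http://127.0.0.1:6789/v1/completions \
  --tasks gsm8k \
  --num_fewshot 5 \
  --batch_size 100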

Test Result

  1. Qwen/Qwen3-235B-A22B-FP8
local-completions (model=Qwen/Qwen3-235B-A22B-FP8,base_url=http://127.0.0.1:6789/v1/completions), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 100
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.8423|±  |0.0100|
|     |       |strict-match    |     5|exact_match|↑  |0.8150|±  |0.0107|

  2. Qwen/Qwen2.5-VL-72B-Instruct (ChartQA; see the reproduction sketch after the results list)
For detailed information on this command, run:
  run.py eval_vllm --model_name Qwen/Qwen2.5-VL-72B-Instruct --url http://0.0.0.0:8000 --output_dir ./chartqa --eval_name chartqa - --help
================================================================================
Metrics:
{
    "explicit_prompt_relaxed_correctness": 0.8684,
    "anywhere_in_answer_relaxed_correctness": 0.8852
}
================================================================================
  3. Qwen/Qwen2.5-VL-3B-Instruct
For detailed information on this command, run:
  run.py eval_vllm --model_name Qwen/Qwen2.5-VL-3B-Instruct --url http://0.0.0.0:8000 --output_dir ./chartqa --eval_name chartqa - --help
================================================================================
Metrics:
{
    "explicit_prompt_relaxed_correctness": 0.8104,
    "anywhere_in_answer_relaxed_correctness": 0.8144
}
================================================================================
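
The ChartQA runs above use an eval_vllm harness whose help banner is quoted in the output, but the invocation itself is not spelled out. A hedged sketch reconstructed from that banner, assuming run.py is the harness entry point (e.g. a mistral-evals style eval_vllm runner) and the server is listening on port 8000:

# Illustrative sketch reconstructed from the help banner above; swap in the 3B model name for the second run.
python run.py eval_vllm \
  --model_name Qwen/Qwen2.5-VL-72B-Instruct \
  --url http://0.0.0.0:8000 \
  --output_dir ./chartqa \
  --eval_name chartqa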


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing the test commands.
  • The test results, such as pasting a results comparison (before and after) or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

tdoublep and others added 30 commits August 30, 2025 00:16

tlrmchlsmth and others added 17 commits September 10, 2025 00:32

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small, essential subset of tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

@tjtanaavllm merged commit 84bf287 into llama_fp8_03122025 on Sep 12, 2025
7 checks passed