[CI] Enable all hf transformers baselines in test_hybrid #23936
Conversation
Signed-off-by: Thomas Parnell <[email protected]>
Code Review
This pull request enables Hugging Face Transformers baselines for all hybrid models in the test suite. This is made possible by a fix in transformers v4.55.3 that resolves issues with Mamba-related models. The changes remove the `HF_UNSUPPORTED_MODELS` list and update the test conditions so the baseline comparison always runs. Additionally, the minimum required transformers version for `BambaForCausalLM` and `JambaForCausalLM` has been updated to 4.55.3. The changes are straightforward, correct, and improve test coverage.
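To illustrate the shape of the change: before, the HF baseline was skipped for models in `HF_UNSUPPORTED_MODELS`; after, it is computed unconditionally. The sketch below uses hypothetical helper names (`run_hf_baseline`, `run_vllm`, `run_hybrid_test`) that stand in for vLLM's actual test utilities, with stub generators in place of real model inference:

```python
# Illustrative sketch only; function names are assumptions, not vLLM's
# actual test helpers. The stubs stand in for real model inference.

def run_hf_baseline(model: str, prompts: list[str]) -> list[str]:
    # Stand-in for generating reference outputs with HF transformers.
    return [f"{model}:{p}" for p in prompts]

def run_vllm(model: str, prompts: list[str]) -> list[str]:
    # Stand-in for generating outputs with vLLM.
    return [f"{model}:{p}" for p in prompts]

def run_hybrid_test(model: str, prompts: list[str]) -> list[str]:
    # Before this PR, the baseline was skipped when `model` appeared in
    # HF_UNSUPPORTED_MODELS; now the comparison always runs.
    hf_outputs = run_hf_baseline(model, prompts)
    vllm_outputs = run_vllm(model, prompts)
    assert hf_outputs == vllm_outputs
    return vllm_outputs
```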
Can you also remove the `if hf_outputs is not None` checks?
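Once the baseline is always computed, the `None` guard around the comparison becomes dead code. A minimal before/after sketch (function names are hypothetical, not the test suite's actual helpers):

```python
# Hedged sketch of the requested simplification; names are illustrative.

def compare_with_guard(hf_outputs, vllm_outputs):
    # Old pattern: comparison only ran when a baseline existed.
    if hf_outputs is not None:
        assert hf_outputs == vllm_outputs

def compare_unconditional(hf_outputs, vllm_outputs):
    # New pattern: hf_outputs is guaranteed, so compare unconditionally.
    assert hf_outputs == vllm_outputs
```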
@heheda12345 done
LGTM! Thank you!
Purpose
HF transformers recently released v4.55.3, which contains a fix for the Mamba-related issues that prevented us from comparing against transformers as a baseline in the hybrid model tests. I also checked that the two models previously listed in `HF_UNSUPPORTED_MODELS` now work fine. This is a useful step towards removing V0 code: at that point we will no longer be able to use V0 output as a baseline for the V1 output, so we need to be able to rely on transformers for that.
cc @heheda12345
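The version requirement for the Bamba/Jamba baselines can be expressed as a simple minimum-version gate. A sketch of the idea (helper names are hypothetical; this simplified parser handles plain numeric versions like `4.55.3` only, not dev/rc suffixes):

```python
# Illustrative version gate; not vLLM's actual implementation.

def parse_version(v: str) -> tuple[int, ...]:
    # Simplified: assumes purely numeric dotted versions.
    return tuple(int(part) for part in v.split("."))

def hf_baseline_supported(installed: str, minimum: str = "4.55.3") -> bool:
    # The mamba fix landed in transformers v4.55.3, so Bamba/Jamba
    # baselines require at least that version.
    return parse_version(installed) >= parse_version(minimum)
```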
Test Plan
I will trigger the Hybrid test in CI.
Test Result
Passing.
Essential Elements of an Effective PR Description Checklist
- Update `supported_models.md` and `examples` for a new model.