Fix weights loading for Apertus #24100
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger a full CI run by default. Instead, it would only run fastcheck CI, which starts running only a small and essential subset of CI tests to quickly catch errors. You can ask your reviewers to trigger select CI tests on top of fastcheck CI. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge. If you have any questions, please reach out to us on Slack at https://slack.vllm.ai. 🚀
Code Review
This pull request correctly addresses the need to load buffers for the XIELU activation function in the Apertus model. The implementation adds the necessary buffers to the `params_dict` used for weight loading. However, the method used to identify these buffers by their name suffix is slightly brittle and could lead to issues in the future. I've suggested a more robust approach that directly targets `XIELU` modules, which will be more resilient to future changes in the model architecture.
```python
for name, buffer in self.named_buffers():
    if name.endswith(".beta") or name.endswith(".eps"):
        params_dict[name] = buffer
```
The current approach of identifying buffers to load by checking whether their names end with `.beta` or `.eps` is a bit brittle. It could unintentionally match buffers from other parts of the model that happen to use the same suffix (for example, a hypothetical normalization layer registering its own `eps` buffer would be picked up too). This could lead to incorrect weights being loaded, or to errors if shapes don't match.

A more robust approach would be to explicitly iterate over `XIELU` modules and add their buffers to `params_dict`. This ensures that you are only targeting the intended buffers and makes the code more resilient to future changes.
```diff
-for name, buffer in self.named_buffers():
-    if name.endswith(".beta") or name.endswith(".eps"):
-        params_dict[name] = buffer
+for module_name, module in self.named_modules():
+    if isinstance(module, XIELU):
+        for buffer_name, buffer in module.named_buffers(recurse=False):
+            full_name = f'{module_name}.{buffer_name}'
+            params_dict[full_name] = buffer
```
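To make the difference concrete, here is a minimal, self-contained sketch. The `XIELU` and `Norm` classes below are hypothetical stand-ins (not vLLM's actual implementations) that reproduce the naming collision described above; the suffix check also picks up the unrelated `norm.eps` buffer, while the `isinstance` check does not:

```python
import torch
import torch.nn as nn

class XIELU(nn.Module):
    """Hypothetical stand-in: registers per-layer values as buffers."""
    def __init__(self):
        super().__init__()
        self.register_buffer("beta", torch.tensor(1.0))
        self.register_buffer("eps", torch.tensor(1e-6))

class Norm(nn.Module):
    """Unrelated module whose buffer happens to share the .eps suffix."""
    def __init__(self):
        super().__init__()
        self.register_buffer("eps", torch.tensor(1e-5))

model = nn.ModuleDict({"act": XIELU(), "norm": Norm()})

# Suffix-based collection: also grabs the unrelated norm.eps buffer.
by_suffix = {name: buf for name, buf in model.named_buffers()
             if name.endswith(".beta") or name.endswith(".eps")}

# Type-based collection: only buffers owned by XIELU modules.
by_type = {}
for module_name, module in model.named_modules():
    if isinstance(module, XIELU):
        for buffer_name, buffer in module.named_buffers(recurse=False):
            by_type[f"{module_name}.{buffer_name}"] = buffer

print(sorted(by_suffix))  # ['act.beta', 'act.eps', 'norm.eps']
print(sorted(by_type))    # ['act.beta', 'act.eps']
```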
Signed-off-by: Nathan Ranchin <[email protected]>
…e buffer Signed-off-by: Nathan Ranchin <[email protected]>
Force-pushed from 797e773 to 7bab46c
Hi,
Retrying the failing test
Signed-off-by: Nathan Ranchin <[email protected]>
* 'main' of https://github.com/845473182/vllm: (457 commits)
  [BugFix] Fix routed_scaling_factor double mul for dots1 and glm4 MoE models (vllm-project#24132)
  [Misc] Add check for dual_chunk_attention (vllm-project#24070)
  [Doc]: fix typos in Python comments (vllm-project#24115)
  [Doc]: fix typos in Python comments (vllm-project#24093)
  [Compile] Fix Compile Warning for `w4a8_mm_entry.cu` (vllm-project#23660)
  fix some typos (vllm-project#24071)
  [V1] Wrapper which plumbs request-level logits processors into vLLM batch-level logits processing (vllm-project#23656)
  Upgrade xgrammar to 0.1.23 (vllm-project#22988)
  Update release pipeline post PyTorch 2.8.0 update (vllm-project#24073)
  [XPU] Fix the bug of LoRA logits on the XPU platform (vllm-project#24081)
  [CI/Build] Disable SiluMul NVFP4 quant fusion tests (vllm-project#24121)
  [Bug] R1 Accuracy: Fix `routed_scaling_factor` Double Mul Issue (vllm-project#24119)
  [AMD][Kernel][Bugfix] Cast offsets tensor bn to tl.int64 to avoid GPU segfault (vllm-project#23692)
  [CI] Enable all hf transformers baselines in test_hybrid (vllm-project#23936)
  [Log] Only Print Profiler Results on Rank 0 (vllm-project#23370)
  Fix weights loading for Apertus (vllm-project#24100)
  [Metrics] Deprecate TPOT in favor of ITL (vllm-project#24110)
  [Bugfix] Fix packed_factor missing attribute error (vllm-project#23902)
  Run ruff format on a few files. (vllm-project#24075)
  [Bugfix] Fix transform_config parsing in Compressed Tensors (vllm-project#23945)
  ...
Signed-off-by: Nathan Ranchin <[email protected]> Signed-off-by: 子悬 <[email protected]>
Signed-off-by: Nathan Ranchin <[email protected]> Signed-off-by: Matthew Bonanni <[email protected]>
Signed-off-by: Nathan Ranchin <[email protected]> Signed-off-by: Shiyan Deng <[email protected]>
Signed-off-by: Nathan Ranchin <[email protected]> Signed-off-by: LopezCastroRoberto <[email protected]>
Purpose
This PR fixes the weight-loading mechanism for the Apertus models. The model uses a custom XIELU activation function that requires custom parameters for each layer. These parameters are stored alongside the model's weights, so the corresponding buffers need to be loaded too.
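As a rough sketch of where the fix lives (assuming a `load_weights()` structured like other vLLM models; the surrounding lines are illustrative, not the exact Apertus source), the buffers are added to the `params_dict` that maps checkpoint names to their destinations:

```python
# Minimal sketch, assuming a load_weights() shaped like other vLLM models;
# the surrounding code is illustrative rather than the exact Apertus source.
params_dict = dict(self.named_parameters())

# XIELU keeps some of its per-layer values (beta, eps) as buffers rather
# than parameters, so named_parameters() misses them and their checkpoint
# entries would have nowhere to land without this addition.
for name, buffer in self.named_buffers():
    if name.endswith(".beta") or name.endswith(".eps"):
        params_dict[name] = buffer
```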
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.