[GGUF] Revert GGUF WA for GPU #2392
Conversation
…, which was fixed by PR30698 & PR30941
The same test that was skipped for Mac also fails on Linux. Resolving it for Linux may fix it for Mac as well.

Although it is the same test, I will take a look at a proper fix for the test case.
Pull Request Overview
This PR reverts previous workarounds (WA) for GPU plugin issues in the GGUF model handling code. The changes remove GPU-specific fixes that were implemented to address accuracy and compilation issues on MTL/LNL GPU platforms.
- Removes shared embedding parameter and logic from language model creation
- Reverts dynamic quantization group size from 0 (disabled) back to 64 (enabled)
- Removes zero point array modification workaround for Q4_0 weights
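For context on the second bullet, the dynamic quantization group size controls how many consecutive weights share one dynamically computed scale, where 0 means the feature is disabled. The sketch below is illustrative only (it is not the GPU plugin's actual kernel, and the int8 scaling choice is an assumption): it shows how a group size of 64 yields one scale per 64-element chunk, while 0 collapses to a single scale per row.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative only: per-group absolute-max scales for dynamic quantization.
// group_size == 0 is treated as "disabled" (one scale for the whole row),
// matching the 0 -> 64 revert described above.
std::vector<float> group_scales(const std::vector<float>& row,
                                std::size_t group_size) {
    if (group_size == 0) group_size = row.size();  // disabled: single group
    std::vector<float> scales;
    for (std::size_t start = 0; start < row.size(); start += group_size) {
        const std::size_t end = std::min(start + group_size, row.size());
        float absmax = 0.0f;
        for (std::size_t i = start; i < end; ++i)
            absmax = std::max(absmax, std::fabs(row[i]));
        scales.push_back(absmax / 127.0f);  // assuming an int8 target range
    }
    return scales;
}
```

Smaller groups track local weight magnitude more closely (better accuracy) at the cost of storing more scales, which is the usual trade-off behind picking a value like 64.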
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/cpp/src/gguf_utils/gguf_modeling.cpp | Removes the shared_embedding parameter, reverts dynamic quantization settings, and cleans up GPU-related comments |
| src/cpp/src/gguf_utils/building_blocks.hpp | Updates the make_lm_head function signature to remove the shared_embedding parameter |
| src/cpp/src/gguf_utils/building_blocks.cpp | Removes the shared embedding logic and the zero-point array modification workaround |
```cpp
w_f32 = make_weights_subgraph(key, consts, lm_qtype, false, -1);
} else {
    w_f32 = embeddings_node;
if (consts.count(key + ".weight")) {
```
[nitpick] The logic structure has changed after removing the shared_embedding condition, but the original fallback logic (using embeddings_node when key + ".weight" doesn't exist) is preserved. Consider adding a comment to clarify this fallback behavior for future maintainers.
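The fallback the reviewer describes can be sketched as follows. This is a hypothetical stand-in, not the repository's actual code: the `Node` type, the map contents, and the simplified `make_weights_subgraph` signature are all placeholders; only the branching mirrors the snippet above.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for an OpenVINO graph node.
struct Node { std::string origin; };

// Hypothetical, simplified stand-in for make_weights_subgraph.
Node make_weights_subgraph(const std::string& key) {
    return Node{key + ".weight"};
}

// Mirrors the fallback: build a dedicated lm_head weight subgraph when
// "<key>.weight" exists in consts; otherwise reuse the embedding weights
// (tied embeddings, as found in some GGUF models).
Node make_lm_head_weights(const std::string& key,
                          const std::map<std::string, int>& consts,
                          const Node& embeddings_node) {
    if (consts.count(key + ".weight")) {
        return make_weights_subgraph(key);
    }
    return embeddings_node;  // fallback: share the embedding matrix
}
```

Documenting this branch with a comment, as the reviewer suggests, would make it clear that reusing `embeddings_node` is intentional weight tying rather than a leftover from the removed workaround.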
@Wovchena, I updated the GGUF CI tests and all tests passed; the local GPU test also passed. Can we merge this PR?
Details:
Revert GGUF Reader WA for OV GPU plugin: #2110
Ticket:
CVS-169891