Skip to content

Conversation

sammysun0711
Copy link

Details:
Revert GGUF Reader WA for OV GPU plugin: #2110

Ticket:
CVS-169891

@github-actions github-actions bot added the category: GGUF GGUF file reader label Jun 30, 2025
@sammysun0711 sammysun0711 changed the title [GGUF] Revert GGUF GPU WA [GGUF] Revert GGUF WA for GPU Jun 30, 2025
@sammysun0711 sammysun0711 requested a review from Copilot June 30, 2025 06:30
Copilot

This comment was marked as outdated.

@Wovchena
Copy link
Collaborator

The same test that was skipped for Mac fails for Linux. Maybe resolving it for Linux fixes it for Mac as well...

@sammysun0711
Copy link
Author

The same test that was skipped for Mac fails for Linux. Maybe resolving it for Linux fixes it for Mac as well...

Although same test test_full_gguf_pipeline failed on Mac and Linux. But the issue are different.

Will take a look for proper fix the test case.

@Wovchena Wovchena requested a review from Copilot July 18, 2025 06:30
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR reverts previous workarounds (WA) for GPU plugin issues in the GGUF model handling code. The changes remove GPU-specific fixes that were implemented to address accuracy and compilation issues on MTL/LNL GPU platforms.

  • Removes shared embedding parameter and logic from language model creation
  • Reverts dynamic quantization group size from 0 (disabled) back to 64 (enabled)
  • Removes zero point array modification workaround for Q4_0 weights

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File Description
src/cpp/src/gguf_utils/gguf_modeling.cpp Removes shared_embedding parameter, reverts dynamic quantization settings, and cleans up GPU-related comments
src/cpp/src/gguf_utils/building_blocks.hpp Updates make_lm_head function signature to remove shared_embedding parameter
src/cpp/src/gguf_utils/building_blocks.cpp Removes shared embedding logic and zero point array modification workaround

w_f32 = make_weights_subgraph(key, consts, lm_qtype, false, -1);
} else {
w_f32 = embeddings_node;
if (consts.count(key + ".weight")) {
Copy link
Preview

Copilot AI Jul 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The logic structure has changed after removing the shared_embedding condition, but the original fallback logic (using embeddings_node when key + ".weight" doesn't exist) is preserved. Consider adding a comment to clarify this fallback behavior for future maintainers.

Copilot uses AI. Check for mistakes.

@github-actions github-actions bot added category: LLM LLM pipeline (stateful, static) no-match-files labels Jul 18, 2025
@sammysun0711
Copy link
Author

@Wovchena, updated GGUF CI tests and all test passed, local GPU test passed, can we merge this PR?

@Wovchena Wovchena added this pull request to the merge queue Jul 21, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jul 21, 2025
@sammysun0711 sammysun0711 added this pull request to the merge queue Jul 21, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jul 21, 2025
@Wovchena Wovchena added this pull request to the merge queue Jul 23, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jul 24, 2025
@Wovchena Wovchena added this pull request to the merge queue Jul 24, 2025
Merged via the queue into openvinotoolkit:master with commit ee9fbb9 Jul 24, 2025
82 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category: GGUF GGUF file reader category: LLM LLM pipeline (stateful, static) no-match-files
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants