Fix #3982: Fix DPO Trainer support for Gemma 3 vision models #4022
What
This PR addresses an issue with the DPO Trainer's handling of vision-language models, specifically for Gemma 3. The changes enhance model type detection to properly support image-text-to-text models.
Fixes #3982
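The detection change described above can be sketched as a membership test on the model's `model_type`. This is a minimal illustration, not the PR's actual code: the two dictionaries are hypothetical stand-ins for the mapping constants in transformers' `modeling_auto`, and their entries are illustrative only.

```python
# Hypothetical stand-ins for the mapping-name constants in transformers'
# modeling_auto module. Keys are config.model_type strings; the entries
# here are illustrative, not a copy of the real mappings.
VISION_2_SEQ_NAMES = {"llava": "LlavaForConditionalGeneration"}
IMAGE_TEXT_TO_TEXT_NAMES = {"gemma3": "Gemma3ForConditionalGeneration"}


def is_vision_model(model_type: str) -> bool:
    """Return True if model_type appears in either vision mapping."""
    return (
        model_type in VISION_2_SEQ_NAMES
        or model_type in IMAGE_TEXT_TO_TEXT_NAMES
    )


# A type found only in the image-text-to-text mapping is now detected.
gemma3_detected = is_vision_model("gemma3")
# A text-only type matches neither mapping.
gpt2_detected = is_vision_model("gpt2")
```

Checking only the first mapping would miss any model type that is registered solely as image-text-to-text, which is the failure mode this PR describes for Gemma 3.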
Changes made:
- Imported `MODEL_FOR_IMAGE_TEXT_TO_TEXT_MAPPING_NAMES` from transformers' `modeling_auto`
- Updated the `is_vision_model` check to include both vision-to-sequence and image-text-to-text model types
- Added `pixel_values` and `pixel_attention_mask` to the signature columns to properly process vision inputs
Testing
- Added `test_dpo_trainer_gemma3_vision_model_detection`, which verifies that Gemma3 models are correctly identified as vision models and processed through the appropriate pipeline
- Before the fix: `DPOTrainer` incorrectly routed Gemma3 models through the tokenizer path instead of the processor path
- After the fix: ran `DPOTrainer` with Gemma3 vision models and verified `is_vision_model=True`
The test suite ensures that Gemma3 and other image-text-to-text models are properly detected and routed through the vision processing pipeline, preventing the processor/tokenizer confusion that caused training failures.
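To illustrate why the signature-column change matters, here is a simplified sketch of column filtering. It assumes that columns absent from the signature list are dropped before batches reach the model. The helper `keep_signature_columns` and the text column names are hypothetical; only `pixel_values` and `pixel_attention_mask` come from this PR.

```python
# Hypothetical signature-column list. The first three names are assumed
# placeholders for text inputs; the last two are the columns this PR adds.
SIGNATURE_COLUMNS = [
    "prompt_input_ids",
    "chosen_input_ids",
    "rejected_input_ids",
    "pixel_values",
    "pixel_attention_mask",
]


def keep_signature_columns(example: dict, signature_columns=None) -> dict:
    """Drop every key that is not listed in signature_columns."""
    if signature_columns is None:
        signature_columns = SIGNATURE_COLUMNS
    return {k: v for k, v in example.items() if k in signature_columns}


example = {
    "prompt_input_ids": [1, 2, 3],
    "pixel_values": [[0.1, 0.2]],
    "unrelated_metadata": "dropped",
}

# With the vision columns listed, the image tensor survives filtering.
filtered = keep_signature_columns(example)

# Without them, the image input would be silently removed.
text_only = keep_signature_columns(example, SIGNATURE_COLUMNS[:3])
```

Under this assumption, omitting the vision columns from the list means the model never receives image inputs, even when the processor produced them.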
Review Request 🙏
I've included unit tests and verified the fix resolves the original issue. Would appreciate maintainer review when possible.
Thanks!