DPOTrainer uses tokenizer instead of processor for Gemma3 vision #3982

@supreme-gg-gg

Description

Reproduction

I am trying to fine-tune Gemma3 4B with DPOTrainer. My dataset is properly formatted with "prompt", "chosen", "rejected", and "images" columns. Since Gemma3 4B is a vision model and I passed in a processor instead of a tokenizer, the trainer should call process_row instead of tokenize_row, which would avoid the error below.

However, I saw in the source code of dpo_trainer.py that the tokenize-vs-process choice is made based on this check:

self.is_vision_model = model.config.model_type in MODEL_FOR_VISION_2_SEQ_MAPPING_NAMES.keys()

However, MODEL_FOR_VISION_2_SEQ_MAPPING_NAMES does not contain gemma3 (it has paligemma), so DPOTrainer treats the processor I provided as a tokenizer instead of calling the proper process_row function to prepare the dataset internally.

I'm not sure whether this is intentional or a bug. Perhaps the check could use MODEL_FOR_IMAGE_TEXT_TO_TEXT_MAPPING_NAMES instead, since it contains more recent VLMs such as gemma3?
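To make the proposal concrete, here is a minimal sketch of the suggested check. The two mapping dicts below are hypothetical stand-ins (the real ones live in transformers.models.auto.modeling_auto and are much larger); only the membership logic is the point.

```python
# Hypothetical stand-ins for the two transformers auto-model mappings;
# the real dicts are in transformers.models.auto.modeling_auto.
MODEL_FOR_VISION_2_SEQ_MAPPING_NAMES = {
    "paligemma": "PaliGemmaForConditionalGeneration",
}
MODEL_FOR_IMAGE_TEXT_TO_TEXT_MAPPING_NAMES = {
    "paligemma": "PaliGemmaForConditionalGeneration",
    "gemma3": "Gemma3ForConditionalGeneration",
}

def is_vision_model(model_type: str) -> bool:
    # Proposed: also consult the image-text-to-text mapping, so newer VLMs
    # such as gemma3 get routed to process_row instead of tokenize_row.
    return (
        model_type in MODEL_FOR_VISION_2_SEQ_MAPPING_NAMES
        or model_type in MODEL_FOR_IMAGE_TEXT_TO_TEXT_MAPPING_NAMES
    )

# The current check misses gemma3; the extended one catches it.
print("gemma3" in MODEL_FOR_VISION_2_SEQ_MAPPING_NAMES)  # False
print(is_vision_model("gemma3"))                         # True
```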

Apologies if I'm misusing DPOTrainer with Gemma3; I could not find any notebook example showing how to use it with a vision dataset and Gemma.

from unsloth import FastVisionModel
from transformers import AutoProcessor
from trl import DPOConfig, DPOTrainer

model, _ = FastVisionModel.from_pretrained(
    "unsloth/gemma-3-4b-it-unsloth-bnb-4bit",
    load_in_4bit=True,
    use_gradient_checkpointing="unsloth",
)
processor = AutoProcessor.from_pretrained(
    "unsloth/gemma-3-4b-it-unsloth-bnb-4bit", trust_remote_code=True
)
...
dpo_args = DPOConfig(...)
trainer = DPOTrainer(
    model=model,
    args=dpo_args,
    train_dataset=formatted_dataset,
    processing_class=processor,
)

outputs:

AttributeError                            Traceback (most recent call last)
[/tmp/ipython-input-2298813859.py](https://localhost:8080/#) in <cell line: 0>()
      1 # Unsloth's version of DPOTrainer accepts a `processing_class` argument
      2 # which we pass the main processor to.
----> 3 trainer = DPOTrainer(
      4     model=model,
      5     args=dpo_args,

13 frames
[/content/unsloth_compiled_cache/UnslothDPOTrainer.py](https://localhost:8080/#) in tokenize_row()
   1028             if tokenizer.eos_token_id is not None:
   1029                 prompt_input_ids = prompt_input_ids + [tokenizer.eos_token_id]
-> 1030         chosen_input_ids = chosen_input_ids + [tokenizer.eos_token_id]
   1031         rejected_input_ids = rejected_input_ids + [tokenizer.eos_token_id]
   1032 

AttributeError: 'Gemma3Processor' object has no attribute 'eos_token_id'
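The AttributeError follows from the processor wrapping a tokenizer rather than exposing its attributes: eos_token_id lives on the inner tokenizer object. A minimal stand-in illustrating the failing lookup (FakeProcessor/FakeTokenizer are hypothetical, not real transformers classes):

```python
# Hypothetical stand-ins showing why tokenize_row fails when handed a
# processor instead of a tokenizer.
class FakeTokenizer:
    eos_token_id = 1  # tokenizers expose this attribute directly

class FakeProcessor:
    def __init__(self):
        self.tokenizer = FakeTokenizer()  # processors wrap a tokenizer

proc = FakeProcessor()
try:
    proc.eos_token_id  # what tokenize_row effectively attempts
except AttributeError as e:
    print(e)  # 'FakeProcessor' object has no attribute 'eos_token_id'

# The id is reachable, but only via the inner tokenizer:
print(proc.tokenizer.eos_token_id)  # 1
```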

System Info

  • Platform: Linux-6.1.123+-x86_64-with-glibc2.35
  • Python version: 3.12.11
  • TRL version: 0.22.1
  • PyTorch version: 2.8.0+cu126
  • accelerator(s): NVIDIA L4
  • Transformers version: 4.55.4
  • Accelerate version: 1.10.1
  • Accelerate config: not found
  • Datasets version: 3.6.0
  • HF Hub version: 0.34.4
  • bitsandbytes version: 0.45.3
  • DeepSpeed version: not installed
  • Diffusers version: 0.35.1
  • Liger-Kernel version: not installed
  • LLM-Blender version: not installed
  • OpenAI version: 1.101.0
  • PEFT version: 0.17.1
  • vLLM version: not installed

Checklist

  • I have checked that my issue isn't already filed (see open issues)
  • I have included my system information
  • Any code provided is minimal, complete, and reproducible (more on MREs)
  • Any code provided is properly formatted in code blocks, (no screenshot, more on code blocks)
  • Any traceback provided is complete

Labels

🏋 DPO (Related to DPO), 🐛 bug (Something isn't working)
