[GOLD] VLM support for GOLDTrainer by Strongich · Pull Request #5969 · huggingface/trl

Strongich · 2026-06-07T14:24:41Z

Clean PR based on #5461, with the changes introduced in 2ac060e.

Adds VLM support to GOLDTrainer:

- Supports same-family VLM distillation with JSD loss when student and teacher share compatible tokenization/image-token semantics.
- Supports cross-family VLM distillation with ULD loss by processing images separately through the student and teacher processors.
- Preserves raw PIL images through the dataloader with an identity collator, then materializes VLM batches only for the current accumulation slice.
- Adds VLM on-policy generation paths for both local generation and vLLM-backed generation where the vLLM backend supports the selected model.
- Adds teacher-side VLM input construction for ULD, including a fix for a silent alignment bug: teacher completions are now rendered with the teacher chat template instead of approximating them from generated text plus a manually appended EOS.
- Adds examples/scripts/gold_vlm.py with documented same-family JSD and cross-family ULD examples.
- Adds tests covering VLM collation, label masking, raw image preservation, cross-architecture validation, teacher processor setup, vLLM behavior, lazy slice materialization, and ULD alignment regressions.
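
For reviewers who want a mental model of the identity collator plus per-slice materialization described above, here is a minimal sketch. It is not the code from this PR: `identity_collate`, `iter_materialized_slices`, and `toy_materialize` are hypothetical names, and plain `object()` instances stand in for PIL images so the snippet runs without any dependencies.

```python
from typing import Any, Callable, Iterator

Example = dict[str, Any]


def identity_collate(examples: list[Example]) -> list[Example]:
    """Return the examples untouched so raw images survive batching."""
    return examples


def iter_materialized_slices(
    raw_batch: list[Example],
    slice_size: int,
    materialize: Callable[[list[Example]], dict[str, Any]],
) -> Iterator[dict[str, Any]]:
    """Yield one materialized slice at a time instead of the whole batch."""
    for start in range(0, len(raw_batch), slice_size):
        yield materialize(raw_batch[start : start + slice_size])


def toy_materialize(examples: list[Example]) -> dict[str, Any]:
    # Hypothetical stand-in for a real processor call that would build
    # input_ids and pixel_values from the prompts and images.
    return {
        "prompts": [ex["prompt"] for ex in examples],
        "num_images": sum(len(ex["images"]) for ex in examples),
    }


images = [object() for _ in range(4)]  # stand-ins for PIL images
raw_batch = identity_collate(
    [{"prompt": f"question {i}", "images": [images[i]]} for i in range(4)]
)

# list() is only used here to inspect the result; a training loop would
# consume the generator one slice at a time.
slices = list(iter_materialized_slices(raw_batch, slice_size=2, materialize=toy_materialize))
```

The point of the generator is that only one slice's tensors need to exist at a time, which is the memory argument for deferring materialization to the accumulation step.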

Motivation

The GOLD algorithm has no theoretical constraints against VLM-to-VLM distillation -- the barriers were purely engineering (incompatible image token formats, different tokenizers, raw image handling through the dataloader).

Key changes

- GOLDTrainer detects VLM datasets and uses an identity collator to preserve raw PIL images through the dataloader
- For cross-architecture pairs, a _teacher_processor is stored and used in compute_loss to build teacher-compatible vision tensors from raw images
- Auto-resolves teacher_tokenizer_name_or_path
- Added examples/scripts/gold_vlm.py with two documented usage examples (same-family JSD + vLLM, cross-family ULD)
- Added tests for the VLM collator (label masking, completion preservation), cross-architecture detection (rejects JSD, stores the teacher processor for different architectures, skips it for the same architecture), VLM + vLLM init (copied from the LLM example), and rejection of an LLM teacher with a vision dataset
- VLM handling (identity collator, raw image storage, vLLM multimodal path) is borrowed, where possible, from SFTTrainer and GRPOTrainer
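
The validation rules listed above can be sketched roughly as follows. This is an illustration of the decision logic only, not the trainer's actual init code: `PairConfig`, `validate_pair`, and the `family_a`/`family_b` labels are hypothetical, and the real checks may be ordered or worded differently.

```python
from dataclasses import dataclass


@dataclass
class PairConfig:
    # All fields are hypothetical; they mirror the conditions named in the PR text.
    student_arch: str
    teacher_arch: str
    student_is_vlm: bool
    teacher_is_vlm: bool
    dataset_has_images: bool
    use_uld_loss: bool
    use_liger_kernel: bool = False


def validate_pair(cfg: PairConfig) -> bool:
    """Validate a student/teacher pair.

    Returns True when a separate teacher processor would be needed
    (cross-architecture VLM pair), False otherwise.
    """
    if cfg.dataset_has_images and not cfg.student_is_vlm:
        raise ValueError("vision dataset requires a VLM student")
    if cfg.dataset_has_images and not cfg.teacher_is_vlm:
        raise ValueError("vision dataset requires a VLM teacher")

    is_vlm_run = cfg.dataset_has_images and cfg.student_is_vlm
    if is_vlm_run and cfg.use_liger_kernel:
        raise ValueError("Liger kernel is not supported with VLMs")

    cross_arch = is_vlm_run and cfg.student_arch != cfg.teacher_arch
    if cross_arch and not cfg.use_uld_loss:
        raise ValueError("cross-architecture VLM pairs require use_uld_loss=True")
    return cross_arch


same_family = validate_pair(
    PairConfig("family_a", "family_a", True, True, True, use_uld_loss=False)
)
cross_family = validate_pair(
    PairConfig("family_a", "family_b", True, True, True, use_uld_loss=True)
)
```

Here `same_family` is False (no teacher processor needed, JSD allowed) and `cross_family` is True (teacher processor needed, ULD required).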

Note

I didn't add VLM usage examples to docs/source/gold_trainer.md. I'll add them if that's desirable, just let me know.
liger_kernel is not yet supported with VLMs. I plan to work on it in 2 stages:
- add VLM support to GKDTrainer
- add liger kernel VLM support to both GOLDTrainer and GKDTrainer (in a single PR)

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes?
Did you write any new necessary tests?

AI writing disclosure

We welcome the use of AI tools to help with contributions. For transparency and to help us improve our review process, please indicate the level of AI involvement in this PR.

No AI usage: the PR was written entirely by a human.
AI-assisted: some parts were suggested or improved by AI, but the PR was written and reviewed by a human.
AI-generated: the PR was mostly or fully generated by an AI tool.

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

@qgallouedec @kashif

Note

Medium Risk
Touches core distillation training, multimodal batching, and teacher/student forward paths; mitigated by extensive tests but behavior is complex and memory-sensitive.

Overview
GOLDTrainer now supports vision-language model (VLM) distillation, not just text LLMs.

For vision datasets it keeps raw PIL images in the dataloader via an identity collator, then collates per gradient-accumulation slice with a new DataCollatorForVisionLanguageChatML (prompt/completion split, pixel_values, untemplated text for ULD, byte offsets). Same-family VLM pairs can use JSD with shared multimodal forwards; cross-architecture pairs require use_uld_loss and a separate _teacher_processor that builds teacher inputs (including images) via _build_teacher_vlm_inputs. On-policy training adds VLM paths for vLLM (multimodal prompts) and local generate, with lazy slice materialization and eval fixes in prediction_step. Init validates VLM↔VLM pairing, rejects vision data on text-only students, and blocks Liger on VLMs.
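
As a rough illustration of the prompt/completion split mentioned in the overview, the sketch below shows the usual label-masking idea: prompt positions get an ignore index so only completion tokens contribute to the loss. `mask_prompt_labels` is a hypothetical helper operating on plain lists of token ids; it is not the new collator's code, which also handles pixel_values, padding, and ULD-specific fields.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss


def mask_prompt_labels(
    prompt_ids: list[int], completion_ids: list[int]
) -> tuple[list[int], list[int]]:
    """Concatenate prompt and completion, masking prompt positions in the labels."""
    input_ids = list(prompt_ids) + list(completion_ids)
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(completion_ids)
    return input_ids, labels


# Toy token ids; real ids would come from the processor's tokenizer.
input_ids, labels = mask_prompt_labels([11, 12, 13], [21, 22])
```

With this layout the completion tokens are preserved verbatim in `labels`, which is the property the "completion preservation" tests refer to.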

Adds examples/scripts/gold_vlm.py (GEOQA, JSD vs ULD examples) and a large test_gold_trainer.py VLM regression suite (collation, ULD alignment, vLLM duplication, smoke train steps).

Reviewed by Cursor Bugbot for commit 251cddc. Bugbot is set up for automated code reviews on this repo.

cursor

Cursor Bugbot has reviewed your changes using default effort and found 2 potential issues.

Reviewed by Cursor Bugbot for commit 5fd183a.

Strongich · 2026-06-07T14:40:46Z

New training runs from examples/scripts/gold_vlm.py:

1. Qwen 8B -> Qwen 2B, JSD loss, vLLM:

2. Qwen 8B -> Qwen 2B, ULD loss, vLLM:

3. Qwen 8B -> LFM 1.6B, ULD loss, no vLLM:

Results are consistent with #5461 (comment) and kashif#6 (comment)

kashif · 2026-06-07T16:28:21Z

Nice! Scripts for experimental trainers belong in the experimental trainer's folder for now.

Strongich · 2026-06-08T09:59:10Z

To avoid opening a separate discussion or issue, I think I can address this idea here (but if you'd prefer otherwise, I'll create one):

liger_kernel is not yet supported with VLMs. I plan to work on it in 2 stages:

- add VLM support to GKDTrainer
- add liger kernel VLM support to both GOLDTrainer and GKDTrainer (in a single PR)

Since GKD is essentially a special case of GOLD (use_uld_loss=False with same-family models), adding duplicate VLM support to GKDTrainer seems excessive, imho.

I think we could later combine everything under a single GOLDTrainer (basically dropping a separate GKDTrainer) to keep the whole distillation logic in one place and avoid having to change the JSD path in two separate places. This way, I wouldn't need to add separate VLM support to GKDTrainer and then update liger_kernel -- I'd only need to implement the second part.
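
For context on the shared JSD path, here is a small sketch of a generalized Jensen-Shannon divergence on explicit probability vectors. It is an illustration under stated assumptions, not TRL's implementation: the function names are hypothetical, the `beta` weighting convention shown is one common formulation and may not match the trainers' exact convention, and real code would work on batched log-probabilities rather than Python lists.

```python
import math


def kl_divergence(p: list[float], q: list[float]) -> float:
    """KL(p || q) for discrete distributions; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def generalized_jsd(student: list[float], teacher: list[float], beta: float = 0.5) -> float:
    """beta * KL(teacher || M) + (1 - beta) * KL(student || M), with
    M = beta * teacher + (1 - beta) * student. Assumes 0 < beta < 1."""
    if not 0.0 < beta < 1.0:
        raise ValueError("this sketch only covers 0 < beta < 1")
    mixture = [beta * t + (1.0 - beta) * s for s, t in zip(student, teacher)]
    return beta * kl_divergence(teacher, mixture) + (1.0 - beta) * kl_divergence(student, mixture)


student_probs = [0.7, 0.2, 0.1]
teacher_probs = [0.5, 0.3, 0.2]

jsd_value = generalized_jsd(student_probs, teacher_probs)
jsd_swapped = generalized_jsd(teacher_probs, student_probs)
jsd_identical = generalized_jsd(student_probs, student_probs)
```

At `beta=0.5` this reduces to the standard symmetric JSD, which is zero for identical distributions and bounded above by ln 2.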

cursor Bot reviewed Jun 7, 2026

Comment thread trl/experimental/gold/gold_trainer.py Outdated

Comment thread trl/experimental/gold/gold_trainer.py

[GOLD] VLM support for GOLDTrainer

a3a6a3f

Strongich force-pushed the gold-vlm-support branch from 5fd183a to a3a6a3f on June 7, 2026 14:27

[GOLD] handle string student model in processor and cross-arch resolution

8b3f709

Strongich mentioned this pull request Jun 7, 2026

[GOLD] GOLDTrainer VLM support #5461

Closed

8 tasks

[gold] add 2 e2e smoke tests

c336947

kashif self-assigned this Jun 7, 2026

Merge branch 'main' into gold-vlm-support

251cddc

Strongich wants to merge 4 commits into huggingface:main from Strongich:gold-vlm-support


2 participants
