`AsyncGRPOTrainer`: add `ProcessorMixin` handling by rycerzes · Pull Request #5895 · huggingface/trl

rycerzes · 2026-05-31T12:09:43Z

What does this PR do?

Adds ProcessorMixin support to AsyncGRPOTrainer, matching the existing behavior in GRPOTrainer.

Currently, AsyncGRPOTrainer only uses AutoTokenizer when no processing_class is provided, and only accepts PreTrainedTokenizerBase instances. This fails for models like Qwen3.5 that use a processor (e.g., Qwen3_5ForConditionalGeneration returns a ProcessorMixin from AutoProcessor).

This PR:

Uses AutoProcessor.from_pretrained() instead of AutoTokenizer.from_pretrained() when no processing_class is provided
Accepts both PreTrainedTokenizerBase and ProcessorMixin as processing_class
Extracts the tokenizer from ProcessorMixin instances (processing_class.tokenizer)
Passes the extracted tokenizer (not the processor) to super().__init__() and AsyncRolloutWorker
Normalizes pad token on the tokenizer (pad_token ← eos_token when not set)

Changes:

async_grpo_trainer.py: update imports (AutoProcessor, ProcessorMixin), update processing_class type hint and docstring, add processor/tokenizer extraction logic

Tests:

test_processor_mixin_handling: passes an AutoProcessor instance as processing_class and verifies the trainer extracts a PreTrainedTokenizerBase
test_auto_processor_when_none: omits processing_class entirely and verifies the trainer auto-loads via AutoProcessor with a valid pad token

Closes part of #5831

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes?
Did you write any new necessary tests?

AI writing disclosure

No AI usage: the PR was written entirely by a human.
AI-assisted: some parts were suggested or improved by AI, but the PR was written and reviewed by a human.
AI-generated: the PR was mostly or fully generated by an AI tool.

Who can review?

@qgallouedec

Note

Medium Risk
Touches core trainer initialization, model loading, and tokenization paths used by async GRPO training; changes are scoped and mirrored from GRPOTrainer with new tests.

Overview
AsyncGRPOTrainer now aligns with GRPOTrainer for multimodal and model-loading behavior. When no processing_class is passed, it loads AutoProcessor (left truncation/padding) instead of AutoTokenizer, accepts ProcessorMixin or PreTrainedTokenizerBase, normalizes pad_token on the inner tokenizer, and passes that tokenizer to the rollout worker and data collator while still handing the full processor to the base trainer.

Model weights are loaded via create_model_from_path with optional AsyncGRPOConfig.model_init_kwargs (e.g. dtype), with device_map=None forced for FSDP2.

Tests add model_init_kwargs coverage, extend the rollout stub for processor-only prompt data, and add a gated VLM training test that checks non-visual weights update while visual weights stay fixed on text-only data.

^{Reviewed by Cursor Bugbot for commit af4f768. Bugbot is set up for automated code reviews on this repo. Configure here.}

Add model_init_kwargs to AsyncGRPOConfig and use create_model_from_path in the trainer, matching GRPOTrainer's model loading behavior. This allows passing custom kwargs (e.g., attn_implementation, dtype) when loading the model from a string. Closes part of huggingface#5831

cursor

Cursor Bugbot has reviewed your changes using default effort and found 1 potential issue.

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

^{Reviewed by Cursor Bugbot for commit 90afff0. Configure here.}

qgallouedec · 2026-06-01T11:46:17Z

at this point I don't know if it's very useful to integrate processor, a the asynchronous grpo trainer doesn't support vlm yet.

Use AutoProcessor.from_pretrained when no processing_class is provided, and extract the tokenizer from ProcessorMixin instances. This supports VLM-style models (e.g., Qwen3.5) that use a processor without requiring vision inputs, matching the behavior already present in GRPOTrainer. Closes part of huggingface#5831

Align processor handling with GRPOTrainer by preserving the original processing_class on the trainer and storing the extracted tokenizer separately for tokenizer-only AsyncGRPO paths. Replace the shallow processor plumbing checks with a text-only Qwen3.5/VLM regression test that verifies training works and visual parameters remain unchanged

rycerzes · 2026-06-02T06:53:38Z

at this point I don't know if it's very useful to integrate processor, a the asynchronous grpo trainer doesn't support vlm yet.

Yes I agree, this isn’t for full VLM support. The scope is text-only training for processor-backed models like Qwen3.5

Updated to stack on #5893, tests passing :)

cursor Bot reviewed May 31, 2026

View reviewed changes

Comment thread trl/experimental/async_grpo/async_grpo_trainer.py Outdated

qgallouedec reviewed Jun 1, 2026

View reviewed changes

Comment thread tests/experimental/test_async_grpo_trainer.py Outdated

rycerzes added 2 commits June 2, 2026 12:06

rycerzes force-pushed the feat/async-grpo-processor-mixin branch from 90afff0 to af4f768 Compare June 2, 2026 06:54

rycerzes requested a review from qgallouedec June 2, 2026 06:55

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`AsyncGRPOTrainer`: add `ProcessorMixin` handling#5895

`AsyncGRPOTrainer`: add `ProcessorMixin` handling#5895
rycerzes wants to merge 3 commits into
huggingface:mainfrom
rycerzes:feat/async-grpo-processor-mixin

rycerzes commented May 31, 2026 •

edited by cursor Bot

Loading

Uh oh!

cursor Bot left a comment

Uh oh!

Uh oh!

qgallouedec commented Jun 1, 2026

Uh oh!

Uh oh!

rycerzes commented Jun 2, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

rycerzes commented May 31, 2026 • edited by cursor Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR do?

This PR:

Before submitting

AI writing disclosure

Who can review?

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

qgallouedec commented Jun 1, 2026

Uh oh!

Uh oh!

rycerzes commented Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

rycerzes commented May 31, 2026 •

edited by cursor Bot

Loading

rycerzes commented Jun 2, 2026 •

edited

Loading