Skip to content

AsyncGRPOTrainer: add ProcessorMixin handling#5895

Open
rycerzes wants to merge 3 commits into
huggingface:mainfrom
rycerzes:feat/async-grpo-processor-mixin
Open

AsyncGRPOTrainer: add ProcessorMixin handling#5895
rycerzes wants to merge 3 commits into
huggingface:mainfrom
rycerzes:feat/async-grpo-processor-mixin

Conversation

@rycerzes

@rycerzes rycerzes commented May 31, 2026

Copy link
Copy Markdown
Contributor

What does this PR do?

Adds ProcessorMixin support to AsyncGRPOTrainer, matching the existing behavior in GRPOTrainer.

Currently, AsyncGRPOTrainer only uses AutoTokenizer when no processing_class is provided, and only accepts PreTrainedTokenizerBase instances. This fails for models like Qwen3.5 that use a processor (e.g., Qwen3_5ForConditionalGeneration returns a ProcessorMixin from AutoProcessor).

This PR:

  • Uses AutoProcessor.from_pretrained() instead of AutoTokenizer.from_pretrained() when no processing_class is provided
  • Accepts both PreTrainedTokenizerBase and ProcessorMixin as processing_class
  • Extracts the tokenizer from ProcessorMixin instances (processing_class.tokenizer)
  • Passes the extracted tokenizer (not the processor) to super().__init__() and AsyncRolloutWorker
  • Normalizes pad token on the tokenizer (pad_token ← eos_token when not set)

Changes:

  • async_grpo_trainer.py: update imports (AutoProcessor, ProcessorMixin), update processing_class type hint and docstring, add processor/tokenizer extraction logic

Tests:

  • test_processor_mixin_handling: passes an AutoProcessor instance as processing_class and verifies the trainer extracts a PreTrainedTokenizerBase
  • test_auto_processor_when_none: omits processing_class entirely and verifies the trainer auto-loads via AutoProcessor with a valid pad token

Closes part of #5831

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

AI writing disclosure

  • No AI usage: the PR was written entirely by a human.
  • AI-assisted: some parts were suggested or improved by AI, but the PR was written and reviewed by a human.
  • AI-generated: the PR was mostly or fully generated by an AI tool.

Who can review?

@qgallouedec


Note

Medium Risk
Touches core trainer initialization, model loading, and tokenization paths used by async GRPO training; changes are scoped and mirrored from GRPOTrainer with new tests.

Overview
AsyncGRPOTrainer now aligns with GRPOTrainer for multimodal and model-loading behavior. When no processing_class is passed, it loads AutoProcessor (left truncation/padding) instead of AutoTokenizer, accepts ProcessorMixin or PreTrainedTokenizerBase, normalizes pad_token on the inner tokenizer, and passes that tokenizer to the rollout worker and data collator while still handing the full processor to the base trainer.

Model weights are loaded via create_model_from_path with optional AsyncGRPOConfig.model_init_kwargs (e.g. dtype), with device_map=None forced for FSDP2.

Tests add model_init_kwargs coverage, extend the rollout stub for processor-only prompt data, and add a gated VLM training test that checks non-visual weights update while visual weights stay fixed on text-only data.

Reviewed by Cursor Bugbot for commit af4f768. Bugbot is set up for automated code reviews on this repo. Configure here.

Add model_init_kwargs to AsyncGRPOConfig and use create_model_from_path
in the trainer, matching GRPOTrainer's model loading behavior. This
allows passing custom kwargs (e.g., attn_implementation, dtype) when
loading the model from a string.

Closes part of huggingface#5831

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes using default effort and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 90afff0. Configure here.

Comment thread trl/experimental/async_grpo/async_grpo_trainer.py Outdated
@qgallouedec

Copy link
Copy Markdown
Member

at this point I don't know if it's very useful to integrate processor, a the asynchronous grpo trainer doesn't support vlm yet.

Comment thread tests/experimental/test_async_grpo_trainer.py Outdated
rycerzes added 2 commits June 2, 2026 12:06
Use AutoProcessor.from_pretrained when no processing_class is provided,
and extract the tokenizer from ProcessorMixin instances. This supports
VLM-style models (e.g., Qwen3.5) that use a processor without requiring
vision inputs, matching the behavior already present in GRPOTrainer.

Closes part of huggingface#5831
Align processor handling with GRPOTrainer by preserving the original
processing_class on the trainer and storing the extracted tokenizer
separately for tokenizer-only AsyncGRPO paths. Replace the shallow
processor plumbing checks with a text-only Qwen3.5/VLM regression test
that verifies training works and visual parameters remain unchanged
@rycerzes

rycerzes commented Jun 2, 2026

Copy link
Copy Markdown
Contributor Author

at this point I don't know if it's very useful to integrate processor, a the asynchronous grpo trainer doesn't support vlm yet.

Yes I agree, this isn’t for full VLM support. The scope is text-only training for processor-backed models like Qwen3.5

Updated to stack on #5893, tests passing :)

@rycerzes rycerzes force-pushed the feat/async-grpo-processor-mixin branch from 90afff0 to af4f768 Compare June 2, 2026 06:54
@rycerzes rycerzes requested a review from qgallouedec June 2, 2026 06:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants