Skip to content

Multi turn rollout#193

Open
tastelikefeet wants to merge 56 commits into
modelscope:mainfrom
tastelikefeet:feat/multi-turn-rollout
Open

Multi turn rollout#193
tastelikefeet wants to merge 56 commits into
modelscope:mainfrom
tastelikefeet:feat/multi-turn-rollout

Conversation

@tastelikefeet
Copy link
Copy Markdown
Collaborator

@tastelikefeet tastelikefeet commented May 18, 2026

PR type

  • Bug Fix
  • New Feature
  • Document Updates
  • More Models or Datasets Support

PR information

Features

  1. Add twinkle_agent package to hold multi-turn rollouts and tool callings
  2. Add notifier to notify user when training is failed
  3. Add GRPO entropy loss/ref logps loss
  4. Add GRPO series metrics
  5. Add a patch to qwen3 series models to fix a bug that special tokens will cause jinja encode error
  6. Support sample with base model when enable_lora is True
  7. Support lora model id used in sampling
  8. Support Qwen models tool parsing and cleaning
  9. Support selective_log_softmax returns entropy
  10. Support api/vllm multi turn rollouts and trace saving
  11. Support tool manager
  12. [Experimental] Support passage chunker and condenser and tool

Bug Fix

  1. Fix dataset multi-process preprocessing
  2. Fix a bug that mm model templates cause cache failed

Experiment results

Paste your experiment result here(if needed).

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a comprehensive framework for multi-turn agentic rollouts with trajectory compression, including a new MultiTurnCondenseRollout class, chunking utilities, and a keyword-based condenser. It also adds support for per-token oracle bonuses, entropy bonuses, and improved logging diagnostics for GRPO training. Additionally, it patches the Qwen3 chat template to resolve parsing issues with orphan </think> tags and includes a dataset builder for condensed SFT training.

@tastelikefeet tastelikefeet changed the title [WIP]Multi turn rollout Multi turn rollout May 18, 2026
Comment thread src/twinkle/loss/grpo.py Outdated
Comment thread src/twinkle/utils/torch_utils.py
Comment thread src/twinkle_agentic/rollout/api_multi_turn.py Outdated
@tastelikefeet
Copy link
Copy Markdown
Collaborator Author

/gemini review

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request aligns the message and tool-call data formats with the OpenAI chat-completion schema and introduces the twinkle_agentic package to support multi-turn agentic rollouts with tool-use, chunking, and passage condensation. Infrastructure enhancements include a new exception notification system, support for entropy bonuses in GRPO loss, and robustness patches for Qwen3 chat templates. The review identifies a critical bug where zero temperature results in zeroed log-probability metrics and notes potential side effects from module-level multiprocessing configuration. Further feedback suggests refining token limit heuristics and improving template monotonicity assertions with better logging.

Comment thread src/twinkle/dataset/base.py
Comment on lines +149 to +152
scale = self.temperature
logps_f = logps.float()
if scale != 1.0:
logps_f = logps_f * scale
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

If self.temperature is set to 0.0 (which is common for greedy sampling), scale becomes 0.0. Multiplying logps_f by 0.0 will zero out all log-probabilities, causing metrics like policy_confidence to always be 1.0 and approx_kl to be 0.0, regardless of the actual model output. You should add a guard to ensure scale is only applied if it is greater than zero.

Suggested change
scale = self.temperature
logps_f = logps.float()
if scale != 1.0:
logps_f = logps_f * scale
scale = self.temperature
logps_f = logps.float()
if scale > 0.0 and scale != 1.0:
logps_f = logps_f * scale

from twinkle.data_format.sampling import SamplingParams

# CJK worst case ~2 tokens/char; budget is a soft char ceiling, not output truth.
max_new = max(256, budget * 2 + 128)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The heuristic for max_tokens calculation (budget * 2 + 128) might be too tight for certain languages or complex summary formats, especially when using Chain-of-Thought or detailed markdown. Consider making this multiplier or the buffer constant configurable, or use a more conservative estimate to avoid premature truncation of the summary.

Comment on lines +435 to +442
if not s_after.startswith(s_before):
raise RuntimeError('Canonical chat_template output for messages_after is not a '
'prefix-extension of messages_before; cannot compute bridge '
'delta. This indicates the template is non-monotonic in the '
'message list (e.g. reorders / rewrites earlier turns).\n'
f's_before tail: {s_before[-80:]!r}\n'
f's_after at same offset: '
f'{s_after[max(0, len(s_before) - 80):len(s_before) + 80]!r}')
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The bridge text computation relies on s_after.startswith(s_before). This assumes that the chat template is monotonic (i.e., adding a message only appends text and doesn't rewrite previous turns). While most modern templates (ChatML, Llama-3) are monotonic, some older or custom templates might inject system instructions or format headers differently based on the total number of messages. It would be safer to log the actual strings if this assertion fails to help users debug template issues.

…ti-turn-rollout

# Conflicts:
#	src/twinkle/template/base.py
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants