Multi turn rollout by tastelikefeet · Pull Request #193 · modelscope/twinkle

tastelikefeet · 2026-05-18T09:05:17Z

PR type

Bug Fix
New Feature
Document Updates
More Models or Datasets Support

PR information

Features

Add the twinkle_agentic package to hold multi-turn rollouts and tool calling
Add a notifier that alerts the user when training fails
Add GRPO entropy loss and ref logps loss
Add GRPO-series metrics
Add a patch for Qwen3-series models to fix a bug where special tokens cause a Jinja encoding error
Support sampling with the base model when enable_lora is True
Support using a LoRA model ID in sampling
Support tool parsing and cleaning for Qwen models
Support returning entropy from selective_log_softmax
Support API/vLLM multi-turn rollouts and trace saving
Support a tool manager
[Experimental] Support a passage chunker, a condenser, and a tool

Bug Fix

Fix multi-process dataset preprocessing
Fix a bug where multimodal (mm) model templates cause caching to fail

Experiment results

Paste your experiment results here (if needed).

gemini-code-assist

Code Review

This pull request introduces a comprehensive framework for multi-turn agentic rollouts with trajectory compression, including a new MultiTurnCondenseRollout class, chunking utilities, and a keyword-based condenser. It also adds support for per-token oracle bonuses, entropy bonuses, and improved logging diagnostics for GRPO training. Additionally, it patches the Qwen3 chat template to resolve parsing issues with orphan </think> tags and includes a dataset builder for condensed SFT training.

tastelikefeet · 2026-05-19T05:52:16Z

/gemini review

gemini-code-assist

Code Review

This pull request aligns the message and tool-call data formats with the OpenAI chat-completion schema and introduces the twinkle_agentic package to support multi-turn agentic rollouts with tool-use, chunking, and passage condensation. Infrastructure enhancements include a new exception notification system, support for entropy bonuses in GRPO loss, and robustness patches for Qwen3 chat templates. The review identifies a critical bug where zero temperature results in zeroed log-probability metrics and notes potential side effects from module-level multiprocessing configuration. Further feedback suggests refining token limit heuristics and improving template monotonicity assertions with better logging.

gemini-code-assist · 2026-05-19T05:55:14Z

+        scale = self.temperature
+        logps_f = logps.float()
+        if scale != 1.0:
+            logps_f = logps_f * scale


If self.temperature is set to 0.0 (which is common for greedy sampling), scale becomes 0.0. Multiplying logps_f by 0.0 will zero out all log-probabilities, causing metrics like policy_confidence to always be 1.0 and approx_kl to be 0.0, regardless of the actual model output. You should add a guard to ensure scale is only applied if it is greater than zero.

Suggested change

-        scale = self.temperature
-        logps_f = logps.float()
-        if scale != 1.0:
-            logps_f = logps_f * scale
+        scale = self.temperature
+        logps_f = logps.float()
+        if scale > 0.0 and scale != 1.0:
+            logps_f = logps_f * scale
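
To make the failure mode concrete, here is a minimal pure-Python sketch of the suggested guard. The helper names `scale_logps` and `mean_confidence` are hypothetical and plain lists stand in for the tensors used in the PR, so this only illustrates why a zero scale hides the model's real log-probabilities; it is not the code in src/twinkle/loss/grpo.py.

```python
import math


def scale_logps(logps, temperature):
    """Scale log-probabilities by temperature, skipping non-positive values.

    Hypothetical helper: mirrors the guarded branch in the suggestion,
    using a list of floats instead of a tensor.
    """
    if temperature > 0.0 and temperature != 1.0:
        return [lp * temperature for lp in logps]
    return list(logps)


def mean_confidence(logps):
    """Mean per-token probability exp(logp), a stand-in confidence metric."""
    return sum(math.exp(lp) for lp in logps) / len(logps)


logps = [-0.5, -2.0]

# Unguarded scaling with temperature 0.0 collapses every log-probability
# to zero, so the confidence metric saturates regardless of the model.
unguarded = [lp * 0.0 for lp in logps]
print(mean_confidence(unguarded))

# The guarded version leaves the log-probabilities untouched.
print(scale_logps(logps, 0.0))
```

With the guard in place, greedy sampling (temperature 0.0) reports metrics computed from the unscaled log-probabilities instead of constants.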

gemini-code-assist · 2026-05-19T05:55:14Z

+        from twinkle.data_format.sampling import SamplingParams
+
+        # CJK worst case ~2 tokens/char; budget is a soft char ceiling, not output truth.
+        max_new = max(256, budget * 2 + 128)


The heuristic for max_tokens calculation (budget * 2 + 128) might be too tight for certain languages or complex summary formats, especially when using Chain-of-Thought or detailed markdown. Consider making this multiplier or the buffer constant configurable, or use a more conservative estimate to avoid premature truncation of the summary.
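
One way to act on this suggestion is to lift the constants into parameters. The sketch below is an assumption, not the PR's implementation: the function name `summary_max_new_tokens` and its parameter names are hypothetical, and the defaults simply reproduce the existing `max(256, budget * 2 + 128)` heuristic.

```python
def summary_max_new_tokens(budget_chars,
                           tokens_per_char=2.0,
                           buffer_tokens=128,
                           floor_tokens=256):
    """Estimate a max-new-tokens limit from a soft character budget.

    Hypothetical helper: the defaults match the heuristic in the diff;
    callers can raise tokens_per_char or buffer_tokens for verbose
    output formats such as Chain-of-Thought or detailed markdown.
    """
    estimate = int(budget_chars * tokens_per_char) + buffer_tokens
    return max(floor_tokens, estimate)


# Default behaviour is identical to the hard-coded heuristic.
print(summary_max_new_tokens(100))

# A more conservative setting for long, structured summaries.
print(summary_max_new_tokens(100, tokens_per_char=3.0, buffer_tokens=256))
```

Keeping the current values as defaults would leave existing behaviour unchanged while letting callers loosen the limit when summaries are being truncated.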

gemini-code-assist · 2026-05-19T05:55:14Z

+        if not s_after.startswith(s_before):
+            raise RuntimeError('Canonical chat_template output for messages_after is not a '
+                               'prefix-extension of messages_before; cannot compute bridge '
+                               'delta. This indicates the template is non-monotonic in the '
+                               'message list (e.g. reorders / rewrites earlier turns).\n'
+                               f's_before tail: {s_before[-80:]!r}\n'
+                               f's_after at same offset: '
+                               f'{s_after[max(0, len(s_before) - 80):len(s_before) + 80]!r}')


The bridge text computation relies on s_after.startswith(s_before). This assumes that the chat template is monotonic (i.e., adding a message only appends text and doesn't rewrite previous turns). While most modern templates (ChatML, Llama-3) are monotonic, some older or custom templates might inject system instructions or format headers differently based on the total number of messages. It would be safer to log the actual strings if this assertion fails to help users debug template issues.
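
The prefix-extension check discussed here can be sketched in isolation. This is an illustrative assumption rather than the PR's code: `bridge_delta` and its `context` parameter are hypothetical names, and reporting the first diverging offset is an extra debugging aid that the diff above does not include.

```python
import logging

logger = logging.getLogger(__name__)


def bridge_delta(s_before, s_after, context=80):
    """Return the text appended to s_before to obtain s_after.

    Hypothetical helper: raises RuntimeError when s_after is not a
    prefix-extension of s_before, i.e. the template rewrote earlier turns.
    """
    if s_after.startswith(s_before):
        return s_after[len(s_before):]

    # Locate the first differing character to point at the rewritten turn.
    shortest = min(len(s_before), len(s_after))
    diverge = next(
        (i for i, (a, b) in enumerate(zip(s_before, s_after)) if a != b),
        shortest,
    )
    lo = max(0, diverge - context)
    message = (
        f'rendered template is not a prefix-extension; first difference at '
        f'offset {diverge}\n'
        f's_before: {s_before[lo:diverge + context]!r}\n'
        f's_after:  {s_after[lo:diverge + context]!r}'
    )
    logger.error(message)
    raise RuntimeError(message)


# Monotonic case: the new turn is simply appended.
print(bridge_delta('<user>hi</user>', '<user>hi</user><assistant>yo</assistant>'))
```

Logging both windows around the divergence point shows which earlier turn the template changed, which is usually enough to identify a system-prompt or header injection.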

tastelikefeet added 30 commits May 9, 2026 11:52

wip

6aade99

wip

99394a2

wip

27cd090

fix

9e31c07

fix

33b8b32

fix

bbed39d

fix

504cfa0

fix

2393272

fix

5b731ea

fix

7576ef7

fix

eb85331

fix

1c0a093

fix

af4a892

fix

04565b6

wip

95d47f4

fix

88ceb1d

fix

e14e582

fix

56182f3

fix

e4dee4a

fix

f728a8d

fix

1ee5235

fix

2bfda3d

fix

b6f6b8b

fix

73d828b

fix

7cb1845

fix

34e6b44

fix

ce46d94

fix

e0e836e

fix

5ab035b

fix

e265980

tastelikefeet added 6 commits May 17, 2026 23:23

fix

f8c7129

fix

519afd7

fix

aba84b2

fix

ea32a03

revert files

c357b83

revert files

a9dad48

gemini-code-assist Bot reviewed May 18, 2026

tastelikefeet added 16 commits May 18, 2026 19:58

fix

8dca215

fix

f299ae4

fix

9494a6c

fix

bfe3838

fix

75484e2

fix

573812b

fix

200bb57

fix

b0d0fe2

fix

1bd27e6

fix

52bcf0e

fix

2260906

fix

3c8f04e

fix

f30ffe8

lint

b8f0f0a

Merge commit '513a625b913790a3cfb1d3bf8b706dc44a1f89a4' into feat/mul…

46f38e0

…ti-turn-rollout # Conflicts: # src/twinkle/template/__init__.py # src/twinkle/template/base.py # src/twinkle/utils/transformers_utils.py

lint code

f98c9ea

tastelikefeet changed the title ~~[WIP]Multi turn rollout~~ Multi turn rollout May 18, 2026

hjh0119 approved these changes May 18, 2026

Comment thread src/twinkle/loss/grpo.py Outdated

Comment thread src/twinkle/utils/torch_utils.py

tpx818 reviewed May 18, 2026

Comment thread src/twinkle_agentic/rollout/api_multi_turn.py Outdated

fix

7b0df16

gemini-code-assist Bot reviewed May 19, 2026

Merge commit 'adc71eb2ef54720dbf538e4215ed9271250300d6' into feat/mul…

1d08834

…ti-turn-rollout # Conflicts: # src/twinkle/template/base.py

tastelikefeet wants to merge 56 commits into modelscope:main from tastelikefeet:feat/multi-turn-rollout

3 participants
