update gated delta rule state layout by grimoire · Pull Request #4636 · InternLM/lmdeploy

grimoire · 2026-05-28T10:13:49Z

The transposed state layout is more hardware-friendly than the default layout.

Copilot

Pull request overview

This PR adds support for a transposed recurrent-state memory layout in the CUDA fused gated-delta-rule kernel and updates Qwen3.5/Qwen3-Next integration to use that layout (intended to be more hardware-friendly than the previous default layout).

Changes:

- Add `transpose_state_layout` support to the TileLang fused recurrent gated-delta-rule kernel, including shape validation and a specialized direct load/store path for the `[V, K]` state layout.
- Update the Qwen3.5/Qwen3-Next recurrent state shapes and the GatedDelta call sites to use the transposed layout.
- Extend kernel tests to cover transposed state, transposed state + `state_indices`, and transposed circular-buffer behavior; adjust one circular-buffer tolerance.
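
To make the layout change concrete, here is a minimal NumPy sketch of one gated-delta-rule step written for both state layouts. It is a reference illustration only, not the TileLang kernel from this PR: the function names `step_kv` and `step_vk`, the single-head shapes, and the random inputs are all assumptions, and the recurrence is the commonly used gated delta rule form (decay, delta correction, rank-1 update, read-out).

```python
import numpy as np


def step_kv(state, q, k, v, g, beta, scale):
    """One recurrent step with the state stored as [K, V] (hypothetical helper)."""
    state = state * np.exp(g)               # gated decay of the previous state
    v_new = beta * (v - k @ state)          # delta-rule correction, shape [V]
    state = state + np.outer(k, v_new)      # rank-1 update, shape [K, V]
    out = (q * scale) @ state               # read-out, shape [V]
    return out, state


def step_vk(state_t, q, k, v, g, beta, scale):
    """The same step with the state stored transposed, as [V, K]."""
    state_t = state_t * np.exp(g)
    v_new = beta * (v - state_t @ k)        # contraction runs over the last axis
    state_t = state_t + np.outer(v_new, k)  # rank-1 update, shape [V, K]
    out = state_t @ (q * scale)
    return out, state_t


rng = np.random.default_rng(0)
K, V, T = 8, 4, 5                           # illustrative sizes, not model values
state = rng.standard_normal((K, V))
state_t = state.T.copy()                    # contiguous transposed copy, [V, K]

for _ in range(T):
    q, k = rng.standard_normal(K), rng.standard_normal(K)
    k /= np.linalg.norm(k)
    v = rng.standard_normal(V)
    g, beta = -rng.random(), rng.random()
    out, state = step_kv(state, q, k, v, g, beta, K ** -0.5)
    out_t, state_t = step_vk(state_t, q, k, v, g, beta, K ** -0.5)
    assert np.allclose(out, out_t)          # both layouts give the same output

assert np.allclose(state.T, state_t)        # final states are transposes of each other
```

The two variants are numerically equivalent; the only difference is which axis is contiguous in memory. In the `[V, K]` form, both the `state_t @ k` and `state_t @ q` contractions read along the last axis.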

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.


| File | Description |
| --- | --- |
| `tests/pytorch/kernel/test_gated_delta_rule.py` | Adds coverage for transposed-state layout (including `state_indices` and circular buffer) and adjusts tolerance. |
| `lmdeploy/pytorch/nn/gated_delta.py` | Always enables transposed-state layout when invoking gated-delta kernels. |
| `lmdeploy/pytorch/kernels/cuda/gated_delta_rule.py` | Implements transposed-state layout path and input shape validation in the fused recurrent kernel. |
| `lmdeploy/pytorch/configurations/qwen3_next.py` | Swaps recurrent state shape to `[V, K]` for Qwen3-Next. |
| `lmdeploy/pytorch/configurations/qwen3_5.py` | Swaps recurrent state shape to `[V, K]` for Qwen3.5 (including spec-decoding shape). |
| `lmdeploy/pytorch/backends/gated_delta_rule.py` | Extends backend API to accept `transpose_state_layout` (and `cache_seqlens` for fused recurrent). |
| `lmdeploy/pytorch/backends/cuda/gated_delta_rule.py` | Threads `transpose_state_layout` through CUDA backend calls. |
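
The shape swap and the shape validation described above could be sketched as follows. This is an illustration under stated assumptions, not the code in the listed files: `recurrent_state_shape` and `check_state_shape` are hypothetical helpers, the leading head dimension is assumed, and the sizes in the usage lines are arbitrary.

```python
from typing import Sequence, Tuple


def recurrent_state_shape(num_heads: int, head_k_dim: int, head_v_dim: int,
                          transpose_state_layout: bool = True) -> Tuple[int, int, int]:
    """Per-sequence state shape: [H, V, K] when transposed, else [H, K, V]."""
    if transpose_state_layout:
        return (num_heads, head_v_dim, head_k_dim)
    return (num_heads, head_k_dim, head_v_dim)


def check_state_shape(state_shape: Sequence[int], num_heads: int, head_k_dim: int,
                      head_v_dim: int, transpose_state_layout: bool) -> None:
    """Raise ValueError if the trailing three dims do not match the layout."""
    expected = recurrent_state_shape(num_heads, head_k_dim, head_v_dim,
                                     transpose_state_layout)
    actual = tuple(state_shape[-3:])
    if actual != expected:
        raise ValueError(f'state has trailing shape {actual}, expected {expected} '
                         f'for transpose_state_layout={transpose_state_layout}')


# Illustrative sizes only; a leading dimension of 10 stands for cache slots.
print(recurrent_state_shape(4, 128, 64))                     # transposed layout
check_state_shape((10, 4, 64, 128), 4, 128, 64, True)        # passes silently
```

Note that when `head_k_dim == head_v_dim`, a shape check cannot distinguish the two layouts, so the flag still has to be passed explicitly through every layer.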


        core_attn_out, last_state = self.chunk_func(
            q,
            k,
            v,
            g=g,
            beta=beta,
            scale=scale,
            initial_state=init_state,
            output_final_state=output_final_state,
            use_qk_l2norm_in_kernel=False,
            cu_seqlens=cu_seqlens,
+            transpose_state_layout=transpose_state_layout,
        )
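
One way to guard against the flag being dropped somewhere along a call chain like the one above is a small test double that records the keyword arguments it receives. The sketch below is hypothetical throughout: `RecordingKernel` and `GatedDeltaSketch` are illustrative stand-ins, not lmdeploy classes, and the default of `True` merely mirrors the summary's statement that the layer always enables the transposed layout.

```python
class RecordingKernel:
    """Hypothetical stand-in for a chunked kernel; it only records kwargs."""

    def __init__(self):
        self.calls = []

    def __call__(self, q, k, v, **kwargs):
        self.calls.append(kwargs)
        # Return (output, final_state); the state is passed through untouched.
        return None, kwargs.get('initial_state')


class GatedDeltaSketch:
    """Hypothetical wrapper that forwards the layout flag to its kernel."""

    def __init__(self, chunk_func, transpose_state_layout=True):
        self.chunk_func = chunk_func
        self.transpose_state_layout = transpose_state_layout

    def forward(self, q, k, v, g, beta, init_state):
        return self.chunk_func(
            q,
            k,
            v,
            g=g,
            beta=beta,
            initial_state=init_state,
            output_final_state=True,
            transpose_state_layout=self.transpose_state_layout,
        )


kernel = RecordingKernel()
layer = GatedDeltaSketch(kernel)
_, last_state = layer.forward('q', 'k', 'v', g='g', beta='beta', init_state='s0')

# The flag must reach the kernel; otherwise a [V, K] state would be read as [K, V].
assert kernel.calls[0]['transpose_state_layout'] is True
```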


update gated delta rule state layout

7f3a9fc

Copilot AI review requested due to automatic review settings May 28, 2026 10:13

Copilot started reviewing on behalf of grimoire May 28, 2026 10:14

Copilot AI reviewed May 28, 2026

lvhan028 requested review from CUHKSZzxy and RunningLeon May 29, 2026 04:45

lvhan028 added the improvement label May 29, 2026

Merge branch 'main' into change-gdr-state-layout

fe96d03

grimoire wants to merge 2 commits into InternLM:main from grimoire:change-gdr-state-layout


3 participants