
[Feat]: add NPU fused operators (RMSNorm, RoPE, SwiGLU, SDPA)#194

Merged
tastelikefeet merged 14 commits into modelscope:main from ys2025-AI:main on May 19, 2026

Conversation

@ys2025-AI
Collaborator

PR type

  • Bug Fix
  • New Feature
  • Document Updates
  • More Models or Datasets Support

PR information

Extends Twinkle's NPU support from a basic MoE GMM patch to a full fused-operator suite (RMSNorm, RoPE, SwiGLU, SDPA) for Ascend hardware.
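For orientation, here is a minimal unfused reference for two of the operators named above, written in plain Python. It follows the standard mathematical definitions of RMSNorm and SwiGLU only; the function names and the `eps` default are illustrative assumptions and do not come from this PR, whose fused kernels run these computations on Ascend hardware.

```python
import math
from typing import List


def rms_norm(x: List[float], weight: List[float], eps: float = 1e-6) -> List[float]:
    """Unfused reference: y_i = weight_i * x_i / sqrt(mean(x^2) + eps)."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [w * v * inv_rms for w, v in zip(weight, x)]


def silu(v: float) -> float:
    """SiLU activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def swiglu(gate: List[float], up: List[float]) -> List[float]:
    """Unfused reference: SiLU(gate) * up, element-wise."""
    return [silu(g) * u for g, u in zip(gate, up)]


if __name__ == "__main__":
    print(rms_norm([3.0, 4.0], [1.0, 1.0]))
    print(swiglu([0.0, 1.0], [5.0, 2.0]))
```

A fused kernel would typically compute each of these in a single device call instead of several element-wise passes, which is where the speedup is expected to come from.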

Experiment results

Atlas 900 A2 (8× NPU) | Qwen3-30B-A3B-Instruct-2507 | LoRA r=8, batch=16, 188 steps | Dataset GSM8K_ZH

| Metric | Baseline | This PR | Delta |
|---|---|---|---|
| Total | 544 s | 503 s | +7.5% |
| Training (step 10–180) | 465 s | 404 s | +13.1% |
| Loss / GradNorm | | | << 0.01 |
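The reported deltas are consistent with a relative reduction in wall-clock time, (baseline - new) / baseline. The short check below reproduces them from the timings above; the helper name is made up for illustration.

```python
def time_reduction_pct(baseline_s: float, new_s: float) -> float:
    """Relative wall-clock reduction versus the baseline, in percent."""
    return (baseline_s - new_s) / baseline_s * 100.0


total = time_reduction_pct(544, 503)
training = time_reduction_pct(465, 404)
print(f"total: {total:.1f}%, training: {training:.1f}%")
# prints "total: 7.5%, training: 13.1%"
```

Note that this is a time reduction, not a throughput ratio; expressed as baseline / new, the same timings would give somewhat larger percentages.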

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces comprehensive NPU hardware acceleration support for Ascend devices by implementing fused operators (RMSNorm, RoPE, SwiGLU, and SDPA) and monkey-patching logic for specific model families like Qwen. It also refactors the NPU patching mechanism to be applied automatically when an NPU device is detected. Review feedback focuses on improving error handling by logging tracebacks for broad exception catches and restoring type hints and assertions that were removed during the refactoring of the MoE grouped matrix multiplication functions.
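The pattern the review describes (apply patches only when the device is present, and log the full traceback when a broad `except` swallows a failure) can be sketched roughly as follows. Everything here is hypothetical: `apply_patches`, the `device_available` flag, and the stand-in target object are illustrations, not the code in `monkey_patch_npu.py`.

```python
import logging
import types
from typing import Callable, Dict

logger = logging.getLogger("npu_patch_sketch")  # hypothetical logger name


def apply_patches(target: object,
                  patches: Dict[str, Callable],
                  device_available: bool) -> Dict[str, bool]:
    """Replace attributes on `target`; return a per-patch success map."""
    results: Dict[str, bool] = {}
    if not device_available:
        return results  # nothing is patched when no accelerator is detected
    for name, replacement in patches.items():
        try:
            if not hasattr(target, name):
                raise AttributeError(f"{name!r} not found on patch target")
            setattr(target, name, replacement)
            results[name] = True
        except Exception:
            # logger.exception records the message plus the full traceback,
            # so a failed patch is visible instead of silently ignored.
            logger.exception("Failed to patch %s; keeping the original", name)
            results[name] = False
    return results


if __name__ == "__main__":
    # Stand-in for a model module with an eager implementation.
    module = types.SimpleNamespace(rms_norm=lambda x: "eager")
    outcome = apply_patches(
        module,
        {"rms_norm": lambda x: "fused", "missing_op": lambda x: x},
        device_available=True,
    )
    print(outcome, module.rms_norm(None))
```

Returning a success map rather than raising keeps one failed patch from blocking the rest, while the logged traceback still shows why it failed.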

Comment thread src/twinkle/kernel/__init__.py Outdated
Comment thread src/twinkle/kernel/__init__.py Outdated
Comment thread src/twinkle/kernel/monkey_patch_npu.py Outdated
Comment thread src/twinkle/kernel/monkey_patch_npu.py Outdated
ys2025-AI and others added 3 commits May 18, 2026 20:25
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Comment thread src/twinkle/kernel/__init__.py Outdated
Collaborator Author

@ys2025-AI ys2025-AI left a comment

Changed to reuse Torch.is_npu_available() directly in kernel/__init__.py.

Collaborator Author

@ys2025-AI ys2025-AI left a comment

The pre-commit checks now pass.

@tastelikefeet tastelikefeet merged commit d82ebb6 into modelscope:main May 19, 2026
0 of 3 checks passed
2 participants