YaRN by rrutmann · Pull Request #445 · Modalities/modalities

rrutmann · 2026-05-11T14:04:11Z

What does this PR do?

This PR adds YaRN support to rotary position embeddings in the GPT-2 attention path.

General Changes

Implemented YaRN parameterization in rotary embeddings in gpt2_model.py
Added/updated YaRN configuration in config_lorem_ipsum_long_fsdp2_yarn.yaml
Refactored and strengthened rotary tests in test_rotary_qkv_transform.py
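
For readers unfamiliar with YaRN, the following is a minimal sketch of how YaRN-style rotary frequencies are commonly computed. It is not the code from gpt2_model.py, which is not shown on this page. It follows the widely used formulation in which high-frequency dimensions keep their original frequency, low-frequency dimensions are divided by the scaling factor, and a linear ramp blends the two in between. The function names, the `factor` and `original_max_position` parameters, and the use of plain Python lists instead of `torch.Tensor` are assumptions made for illustration. The defaults `beta_fast=32.0` and `beta_slow=1.0` match the fallback values visible in the review hunks below.

```python
import math


def find_correction_dim(num_rotations: float, dim: int, base: float, max_pos: int) -> float:
    # Dimension index at which a frequency completes `num_rotations` over max_pos tokens.
    return (dim * math.log(max_pos / (num_rotations * 2 * math.pi))) / (2 * math.log(base))


def find_correction_range(beta_fast: float, beta_slow: float, dim: int, base: float, max_pos: int) -> tuple[int, int]:
    low = math.floor(find_correction_dim(beta_fast, dim, base, max_pos))
    high = math.ceil(find_correction_dim(beta_slow, dim, base, max_pos))
    return max(low, 0), min(high, dim - 1)


def linear_ramp(low: float, high: float, n: int) -> list[float]:
    if low == high:
        high += 0.001  # avoid division by zero
    return [min(max((i - low) / (high - low), 0.0), 1.0) for i in range(n)]


def yarn_inv_freq(
    dim: int,
    base: float,
    factor: float,
    original_max_position: int,
    beta_fast: float = 32.0,
    beta_slow: float = 1.0,
) -> tuple[list[float], float]:
    """Return (inverse frequencies, attention scaling factor) for one head dimension."""
    pos_freqs = [base ** (2 * i / dim) for i in range(dim // 2)]
    extrapolation = [1.0 / f for f in pos_freqs]           # unchanged RoPE frequencies
    interpolation = [1.0 / (factor * f) for f in pos_freqs]  # position-interpolated frequencies

    low, high = find_correction_range(beta_fast, beta_slow, dim, base, original_max_position)
    ramp = linear_ramp(low, high, dim // 2)  # 0 -> keep original, 1 -> fully interpolate

    inv_freq = [
        interpolation[i] * ramp[i] + extrapolation[i] * (1.0 - ramp[i])
        for i in range(dim // 2)
    ]
    attention_factor = 0.1 * math.log(factor) + 1.0 if factor > 1.0 else 1.0
    return inv_freq, attention_factor


inv_freq, attention_factor = yarn_inv_freq(dim=64, base=10000.0, factor=4.0, original_max_position=2048)
```

In this sketch the attention factor would be applied to the cos/sin tables (or equivalently to the attention logits); whether the PR does exactly that is not visible here.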

Breaking Changes

..

Checklist before submitting final PR

My PR is minimal and addresses one issue in isolation
I have merged the latest version of the target branch into this feature branch
I have reviewed my own code w.r.t. correct implementation, missing type hints, proper documentation, etc.
I have run a sample config for model training
I have checked that all tests run through (python tests/tests.py)
I have updated the internal changelog (CHANGELOG_DEV.md)

Co-authored-by: Copilot <copilot@github.com>

therealdavidos

looks good from the math perspective!

therealdavidos · 2026-05-18T15:32:25Z

Is this change related to the YaRN PR?

No, it isn't related to YaRN. I noticed this test was failing and fixed it. I can open a separate PR for the test fix if needed, but I'm not sure it's worth the extra overhead.

Nah, that's fine. Thanks for the clarification.

BlueCrescent · 2026-06-01T06:28:26Z


        self.reset_parameters()

+    def _compute_yarn_parameters(self, device: torch.device | None) -> tuple[torch.Tensor, float]:


Please place private methods below the public interface of the class.

I addressed this in e12db1a

BlueCrescent · 2026-06-01T08:37:26Z

            seq_length_dim: Annotated[int, Field(strict=True)]
            base_freq: Annotated[int, Field(strict=True, ge=10000)]
+            max_position_embeddings: Optional[Annotated[int, Field(strict=True, ge=1)]] = None
+            rope_scaling: Optional[dict[str, object]] = None


Does this play nicely with our config setup? Would it be possible to have something like "rope_scaling: YarnSettings | DefaultSettingsIfExists | SomeFutureRopeScalingSettings | None = None" with the Settings being BaseModels themselves?

Good point. I added Configs based on BaseModel in b91762a
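
A rough sketch of what such typed settings could look like with pydantic v2 follows. The actual classes added in b91762a are not shown on this page, so every class name and field below (`YarnRopeScalingConfig`, `LinearRopeScalingConfig`, `rope_type`, `factor`, `original_max_position_embeddings`) is hypothetical. The idea is that a discriminated union lets the config loader pick the right settings model and reject unknown keys instead of accepting an arbitrary `dict[str, object]`.

```python
from typing import Annotated, Literal, Optional, Union

from pydantic import BaseModel, ConfigDict, Field, ValidationError


class YarnRopeScalingConfig(BaseModel):
    # Hypothetical settings model; field names are assumptions.
    model_config = ConfigDict(extra="forbid")

    rope_type: Literal["yarn"]
    factor: Annotated[float, Field(ge=1.0)]
    original_max_position_embeddings: Annotated[int, Field(strict=True, ge=1)]
    beta_fast: Annotated[float, Field(gt=0.0)] = 32.0
    beta_slow: Annotated[float, Field(gt=0.0)] = 1.0


class LinearRopeScalingConfig(BaseModel):
    # Stand-in for "some future RoPE scaling settings".
    model_config = ConfigDict(extra="forbid")

    rope_type: Literal["linear"]
    factor: Annotated[float, Field(ge=1.0)]


RopeScalingConfig = Annotated[
    Union[YarnRopeScalingConfig, LinearRopeScalingConfig],
    Field(discriminator="rope_type"),
]


class RotaryConfigSketch(BaseModel):
    # Only the fields relevant to this thread; the real config has more.
    base_freq: Annotated[int, Field(strict=True, ge=10000)]
    max_position_embeddings: Optional[Annotated[int, Field(strict=True, ge=1)]] = None
    rope_scaling: Optional[RopeScalingConfig] = None


cfg = RotaryConfigSketch.model_validate(
    {
        "base_freq": 10000,
        "rope_scaling": {
            "rope_type": "yarn",
            "factor": 4.0,
            "original_max_position_embeddings": 2048,
        },
    }
)
```

With this shape, a misspelled key or a non-numeric `beta_fast` raises a `ValidationError` at config load time rather than being ignored later in the model code.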

BlueCrescent · 2026-06-01T08:46:07Z

+        beta_fast_raw = self.rope_scaling.get("beta_fast")
+        beta_slow_raw = self.rope_scaling.get("beta_slow")
+        beta_fast = float(beta_fast_raw) if isinstance(beta_fast_raw, (int, float)) else 32.0
+        beta_slow = float(beta_slow_raw) if isinstance(beta_slow_raw, (int, float)) else 1.0


I'm a bit worried that in case these parameters are strings or torch types for some reason they will get dropped silently here.

I addressed this in 82019f1
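
To illustrate the concern, here is a small sketch of a stricter way to read such a value from a plain dict: a missing key falls back to the default, while a present but non-numeric value raises instead of being replaced silently. The helper name `read_float_setting` is hypothetical, and this is not necessarily how 82019f1 implements the fix.

```python
from numbers import Real


def read_float_setting(settings: dict[str, object], key: str, default: float) -> float:
    """Return settings[key] as a float, or `default` if the key is absent or None.

    Raises TypeError for values that are present but not real numbers
    (strings, bools, tensors, ...), so misconfiguration is not hidden.
    """
    value = settings.get(key)
    if value is None:
        return default
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, Real):
        raise TypeError(f"rope_scaling[{key!r}] must be a number, got {type(value).__name__}")
    return float(value)


rope_scaling = {"beta_fast": 16}
beta_fast = read_float_setting(rope_scaling, "beta_fast", 32.0)
beta_slow = read_float_setting(rope_scaling, "beta_slow", 1.0)
```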

BlueCrescent · 2026-06-01T08:57:40Z

+            return 0.1 * mscale * math.log(scale) + 1.0
+
+        if attention_factor is None:
+            if isinstance(mscale, (int, float)) and isinstance(mscale_all_dim, (int, float)):


I'm a bit worried that in case these parameters are strings or torch types for some reason they will get dropped silently here.

I addressed this in 82019f1
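
For context, the hunk above belongs to the attention scaling logic. The sketch below shows one common way this is resolved in public YaRN implementations: an explicit `attention_factor` wins, otherwise the ratio of two mscale terms is used when both are given, otherwise the default `0.1 * ln(scale) + 1`. The `scale <= 1` guard, the function names, and the `None` checks are assumptions, since the full method is not visible in this PR view.

```python
import math
from typing import Optional


def get_mscale(scale: float, mscale: float = 1.0) -> float:
    # No extra scaling when the context is not extended.
    if scale <= 1.0:
        return 1.0
    return 0.1 * mscale * math.log(scale) + 1.0


def resolve_attention_factor(
    factor: float,
    attention_factor: Optional[float] = None,
    mscale: Optional[float] = None,
    mscale_all_dim: Optional[float] = None,
) -> float:
    if attention_factor is not None:
        return float(attention_factor)
    if mscale is not None and mscale_all_dim is not None:
        return get_mscale(factor, mscale) / get_mscale(factor, mscale_all_dim)
    return get_mscale(factor)


default_factor = resolve_attention_factor(4.0)  # 0.1 * ln(4) + 1
```

Checking for `None` explicitly, rather than relying on `isinstance` with a silent fallback, keeps a wrongly typed `mscale` from being treated as "not set".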

therealdavidos · 2026-06-02T12:16:42Z

Nah, that's fine. Thanks for the clarification.

rrutmann and others added 5 commits May 11, 2026 12:49

feat: Implement context extension with yarn

6524b24

Co-authored-by: Copilot <copilot@github.com>

test: Add test for yarn

f87eabb

Co-authored-by: Copilot <copilot@github.com>

docs: Add type annotations

779e7c1

Co-authored-by: Copilot <copilot@github.com>

docs: Add docstrings

2126b0b

Co-authored-by: Copilot <copilot@github.com>

fix: Write to unique filenames

309d147

Co-authored-by: Copilot <copilot@github.com>

rrutmann requested a review from le1nux May 12, 2026 12:28

rrutmann self-assigned this May 12, 2026

le1nux requested a review from BlueCrescent May 13, 2026 10:06

rrutmann requested review from therealdavidos and removed request for BlueCrescent May 20, 2026 08:32

chore: Apply black formatter

a06d6b4

therealdavidos reviewed Jun 1, 2026


BlueCrescent reviewed Jun 1, 2026


rrutmann added 3 commits June 2, 2026 09:01

fix: validate yarn rope scaling inputs

82019f1

refactor: use typed rope scaling configs for rotary transform

b91762a

chore: Place private methods below the public interface

e12db1a

therealdavidos approved these changes Jun 2, 2026


BlueCrescent approved these changes Jun 2, 2026


rrutmann merged commit 7337fe4 into main Jun 2, 2026
3 checks passed


