Conversation
Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Copilot <copilot@github.com>
therealdavidos
left a comment
There was a problem hiding this comment.
looks good from the math perspective!
There was a problem hiding this comment.
is this change related to the yarn PR?
There was a problem hiding this comment.
No, it isn't related to Yarn. I noticed this test was failing and fixed it. I can open a separate PR for the test fix if needed, but I'm not sure it's worth the extra overhead.
There was a problem hiding this comment.
nah thats fine. thanks for the clarification
|
|
||
| self.reset_parameters() | ||
|
|
||
| def _compute_yarn_parameters(self, device: torch.device | None) -> tuple[torch.Tensor, float]: |
There was a problem hiding this comment.
Pleace place private methods below the public interface of the class.
| seq_length_dim: Annotated[int, Field(strict=True)] | ||
| base_freq: Annotated[int, Field(strict=True, ge=10000)] | ||
| max_position_embeddings: Optional[Annotated[int, Field(strict=True, ge=1)]] = None | ||
| rope_scaling: Optional[dict[str, object]] = None |
There was a problem hiding this comment.
Does this play nicely with our config setup? Would it be possible to have something like "rope_scaling: YarnSettings | DefaultSettingsIfExists | SomeFutureRopeScalingSettings | None = None" with the Settings being BaseModels themselves?
There was a problem hiding this comment.
Good point. I added Configs based on BaseModel in b91762a
| beta_fast_raw = self.rope_scaling.get("beta_fast") | ||
| beta_slow_raw = self.rope_scaling.get("beta_slow") | ||
| beta_fast = float(beta_fast_raw) if isinstance(beta_fast_raw, (int, float)) else 32.0 | ||
| beta_slow = float(beta_slow_raw) if isinstance(beta_slow_raw, (int, float)) else 1.0 |
There was a problem hiding this comment.
I'm a bit worried that in case these parameters are strings or torch types for some reason they will get dropped silently here.
| return 0.1 * mscale * math.log(scale) + 1.0 | ||
|
|
||
| if attention_factor is None: | ||
| if isinstance(mscale, (int, float)) and isinstance(mscale_all_dim, (int, float)): |
There was a problem hiding this comment.
I'm a bit worried that in case these parameters are strings or torch types for some reason they will get dropped silently here.
There was a problem hiding this comment.
nah thats fine. thanks for the clarification
What does this PR do?
This PR adds YaRN support to rotary position embeddings in the GPT-2 attention path.
General Changes
Breaking Changes
Checklist before submitting final PR
python tests/tests.py)CHANGELOG_DEV.md)