Type-check rfd3 FabricTrainer (bring it under the mypy gate) #337

Open
lyskov-ai wants to merge 1 commit into RosettaCommons:production from lyskov-ai:0060-rfd3-fabric-trainer-mypy
@lyskov-ai

rfd3's FabricTrainer (models/rfd3/src/rfd3/trainer/fabric_trainer.py) was exempted from mypy type-checking; this brings it under the gate.

Most of the errors stemmed from the trainer's state dictionary having no type annotation: mypy inferred its values as dict | int | None from the defaults literal, so every state["model"].… access and step/epoch counter update failed to type-check. The fixes mirror the equivalent, already-type-checked FabricTrainer in the shared layer (src/foundry/trainers/fabric.py):

  • class-level state: dict[str, Any] (a deliberately loose, dynamically-keyed bag) and _current_train_return: Any;
  • a type: ignore on the precision argument — our public str | int API is wider than Fabric's accepted literal union;
  • a cast on the setup_dataloaders results, and widened train_loop / validation_loop parameters to plain DataLoader;
  • get_latest_checkpoint return type corrected to Path | None (with a cast at its single call site);
  • load_legacy_checkpoint return type corrected to None — it updates state in place and never returned a value.
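The first bullet can be illustrated with a minimal sketch. This is not the trainer's actual code: the class name `SketchTrainer`, the method `advance`, and the state keys are hypothetical and chosen only to show the annotation pattern. The inferred-type behaviour mentioned in the comments is as reported in the description above.

```python
from __future__ import annotations

from typing import Any


class SketchTrainer:
    """Hypothetical stand-in for the trainer; only the annotations matter."""

    # Class-level annotation: every value is Any, so attribute access on
    # state["model"] and arithmetic on the counters type-check.
    state: dict[str, Any]

    def __init__(self) -> None:
        # Annotated defaults. Per the description above, leaving this
        # unannotated let mypy infer the value type from the literal.
        default_state: dict[str, Any] = {
            "model": None,
            "train_cfg": {},
            "global_step": 0,
            "current_epoch": 0,
        }
        self.state = default_state

    def advance(self) -> int:
        # A counter update of the kind that failed before the annotation.
        self.state["global_step"] += 1
        return self.state["global_step"]
```

The trade-off is that `Any` switches off checking for everything read from `state`, which is why the description calls it a deliberately loose bag.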

Annotations, casts, and corrected return types only — no behaviour change.
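The two return-type corrections can be sketched as follows. The function bodies, parameters, and file naming here are assumptions made for illustration; only the return annotations and the use of `cast` at the call site come from the description. Whether the real call site checks for `None` before casting is not shown in the PR text.

```python
from __future__ import annotations

from pathlib import Path
from typing import Any, cast


def get_latest_checkpoint(ckpt_dir: Path) -> Path | None:
    """Return the last *.ckpt file by name, or None if the directory has none."""
    candidates = sorted(ckpt_dir.glob("*.ckpt"))
    return candidates[-1] if candidates else None


def load_legacy_checkpoint(state: dict[str, Any], ckpt: dict[str, Any]) -> None:
    """Update `state` in place; nothing is returned, hence `-> None`."""
    state["global_step"] = ckpt.get("global_step", 0)


def resume_path(ckpt_dir: Path) -> Path:
    # cast() is a no-op at runtime: it only tells the type checker to treat
    # the value as Path. It does not guard against a missing checkpoint.
    return cast(Path, get_latest_checkpoint(ckpt_dir))
```

Because `cast` performs no runtime check, it preserves behaviour exactly, which matches the "no behaviour change" claim, but it also hides a possible `None` from the checker.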

Clear rfd3.trainer.fabric_trainer (99 errors) from the mypy ignore-errors
ratchet. The errors were one dominant pattern: the dynamically-keyed `state`
bag was inferred as `dict[Any, Any] | int | None`, so every state access and
counter update errored.

Mirror the already-landed foundry trainers/fabric precedent: a class-level
`state: dict[str, Any]` + `_current_train_return: Any`, an annotated
`default_state`, a documented type-ignore on the wider str|int precision API,
a cast on setup_dataloaders + widening the loop params to DataLoader,
`get_latest_checkpoint -> Path | None` with a cast at the call site, and the
truthful `load_legacy_checkpoint -> None`.

Behaviour-preserving, mypy-only (no clean CPU-test target for this
cluster-coupled trainer glue). Ratchet 4 -> 3 modules remaining.

Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>
lyskov-ai requested a review from woodsh17 on June 30, 2026, 19:56