Type-check the mpnn feature-collator and train modules by lyskov-ai · Pull Request #342 · RosettaCommons/foundry

lyskov-ai · 2026-07-01T00:12:38Z

Continues bringing models/mpnn under the mypy gate. Clears collate.feature_collator and train, leaving 2 mpnn modules (utils.inference, utils.weights).

feature_collator.py: fix two implicit-Optional default_padding parameters (the body already maps None to the MPNN defaults) and annotate a list local. Annotation-only; covered by the existing test_feature_collator.py.
train.py: replace the ad-hoc local CkptConfig class with foundry's CheckpointConfig — the type FabricTrainer.fit actually expects. CkptConfig was a field-subset of it and, notably, lacked parameter_freezing_config, which fit() reads — so the checkpoint-resume branch would have raised AttributeError. The swap fixes that latent footgun. The remaining train.py errors are third-party stub gaps documented with type: ignore[arg-type]: atomworks StructuralDatasetWrapper types dataset_parser as Callable but takes a GenericDFParser parser object; torch WeightedRandomSampler accepts a Tensor despite its Sequence[float] stub.

Note: train.py is a cluster-only training script (hardcoded paths, real data) not run in CI, so the CheckpointConfig behaviour improvement is unverified here — flagged for mpnn owners.

Verification: ruff format/check clean; mypy Success (176 files); pytest 526 passed / 10 skipped.

Clear rfd3.trainer.fabric_trainer (99 errors) from the mypy ignore-errors ratchet. The errors were one dominant pattern: the dynamically-keyed `state` bag was inferred as `dict[Any, Any] | int | None`, so every state access and counter update errored. Mirror the already-landed foundry trainers/fabric precedent: a class-level `state: dict[str, Any]` + `_current_train_return: Any`, an annotated `default_state`, a documented type-ignore on the wider str|int precision API, a cast on setup_dataloaders + widening the loop params to DataLoader, `get_latest_checkpoint -> Path | None` with a cast at the call site, and the truthful `load_legacy_checkpoint -> None`. Behaviour-preserving, mypy-only (no clean CPU-test target for this cluster-coupled trainer glue). Ratchet 4 -> 3 modules remaining. Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>

Clear the 15 mypy errors in rfd3.inference.legacy_input_parsing and remove it from the ignore_errors ratchet (3 -> 2 remaining: engine, utils.vizualize). - 11 implicit-Optional parameter fixes in create_atom_array_from_design_specification_legacy (incl. unfix_specific -> str | list | None, honest to the body's isinstance branches). - The Optional fixes cascade union-attr/arg-type errors at exists()-guarded sites; since foundry.common.exists is not a TypeGuard it does not narrow, so the narrowing-required guards are converted to behaviour-identical 'x is not None' checks (exists(x) is by definition 'x is not None'). - unfix_residues: list[str] annotation; rewrite the 'unfix_residues, _ = ...' unpack to '...[0]' and the side-effecting list-comp to a for loop, dropping the '_' that collided with the **_ kwargs parameter. - Documented cast(str, contig) (contig is always provided or defaulted to the non-optional length, which mypy cannot follow through exists()). - Fix a real reachable NameError: import the missing spoof_helical_bundle_ss_conditioning_fn from rfd3.utils.inference (its branch is config-gated and would crash today). mypy-only slice; no behaviour change on production paths. All gates green (ruff format/check, mypy 152 files, pytest 486 passed / 10 skipped). Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>

Fix the pre-existing bugs that held the last two models/rfd3 modules on the ignore_errors ratchet, then remove the rfd3 override block entirely. All of models/rfd3 now type-checks with no module-level exemptions. engine.py: - run(): documented `# type: ignore[override]` + honest return type `dict[str, list[RFD3Output]] | None` for the keyword-only override of the positional base BaseInferenceEngine.run(). - _multiply_specifications: `exists(self.out_dir)` -> `self.out_dir is not None` so mypy narrows it for find_files_with_extension(). - normalize_inputs: early returns + a documented cast for the str-split branch (list is invariant). - process_input: delete the dead/redundant block that re-did normalize_inputs' work and called .split(',') on a value that is always a list; narrow the two loop guards to `input is not None`. io.py: widen find_files_with_extension's supported_file_types to `set | list` (matches its siblings; the caller passes the CIF_LIKE_EXTENSIONS set). vizualize.py (dev script, imported nowhere): drop the broken import+call of _add_design_annotations_from_cif_block_metadata (removed in the open-sourcing refactor, exists nowhere -> ImportError on the .cif path); keep plain-structure .cif/.bcif loading; document the atomworks C_DIS .as_array() ignore. Add unit tests for the pure normalize_inputs helper. Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>

Add models/mpnn/src/mpnn to mypy's files and seed a fresh per-module ignore_errors ratchet for the 10 modules that currently have type errors, so the gate stays green while they are cleared one slice per task. The other 14 mpnn modules type-check clean under the lenient baseline and are now gated. Config-only; no pytest wiring (mpnn's existing test suite is partly cluster-coupled / library-drift-broken on CPU). Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>

Add models/mpnn/tests to testpaths with a collect_ignore denylist in its conftest. Most of the existing mpnn suite loads structures via atomworks cached_parse (needs a DIGS PDB mirror absent in CI) and is held out, run locally on the cluster; the CPU-portable files are collected. Fix two stale numpy-bool identity assertions (np.True_ is True) in test_polymer_ligand_interface.py so it is green and collectable. Adds 28 mpnn CPU tests to the gate (520 passed, was 492). Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>

Type-check mpnn.samplers.samplers, mpnn.trainers.mpnn, and mpnn.transforms.feature_aggregation.user_settings, and drop them from the ignore_errors ratchet. Annotation/comment-only: narrow the inherited BatchSampler.sampler to Sampler for set_sampler_epoch + a var annotation; a documented type: ignore[arg-type] for the DictConfig **-unpack (per the RF3/RFD3 Loss(**loss) precedent) and type: ignore[override] for the refined validation_step 3rd param (per rf3); and a type: ignore[misc] for an atomworks AnnotationList2D row unpack its stub doesn't express. Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>

Type-check mpnn.transforms.feature_aggregation.token_encodings, mpnn.pipelines.mpnn, and mpnn.inference_engines.mpnn, and drop them from the ignore_errors ratchet. token_encodings wraps its TokenEncoding token_atoms values in np.asarray (matching the field's ndarray type, which __post_init__ already produces); pipelines.mpnn fixes an implicit-Optional model_type with a narrowing guard + list() for a tuple arg; the inference engine narrows _absolute_path_or_none's Optional result and the atom_arrays/input_dicts guard with documented asserts. Adds test_pipeline_validation.py (6 CPU tests) for the model_type validation. Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>

feature_collator: two implicit-Optional default_padding params + a list var annotation (behaviour-preserving). train: replace the ad-hoc local CkptConfig with foundry's CheckpointConfig (the type FabricTrainer.fit actually expects) — CkptConfig lacked parameter_freezing_config, which fit() reads, so the checkpoint-resume branch would have raised AttributeError; the swap fixes that latent footgun. Remaining train errors are third-party stub gaps (atomworks StructuralDatasetWrapper types dataset_parser as Callable but takes a parser object; torch WeightedRandomSampler accepts a Tensor despite its Sequence[float] stub) — documented type: ignore[arg-type]. train.py is a cluster-only training script (unverified in CI); the CheckpointConfig behaviour improvement is flagged for mpnn owners. Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>

lyskov and others added 8 commits June 30, 2026 19:29

lyskov-ai requested a review from woodsh17 July 1, 2026 00:12

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Type-check the mpnn feature-collator and train modules#342

Type-check the mpnn feature-collator and train modules#342
lyskov-ai wants to merge 8 commits into
RosettaCommons:productionfrom
lyskov-ai:0067-mpnn-mypy-collator-train

lyskov-ai commented Jul 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

lyskov-ai commented Jul 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants