Skip to content

Type-check rfd3na input_parsing#354

Open
lyskov-ai wants to merge 20 commits into
RosettaCommons:productionfrom
lyskov-ai:0079-rfd3na-mypy-input-parsing
Open

Type-check rfd3na input_parsing#354
lyskov-ai wants to merge 20 commits into
RosettaCommons:productionfrom
lyskov-ai:0079-rfd3na-mypy-input-parsing

Conversation

@lyskov-ai

Copy link
Copy Markdown
Contributor

Continues clearing the models/rfd3na mypy ratchet — input_parsing, leaving just fabric_trainer. Mirrors the fix already landed for the rfd3 module, and depends on the previous PR (which restored _check_has_backbone_connections_to_nonstandard_residues).

  • Adds four missing imports (all NameErrors): create_o_atoms/create_cb_atoms from rfd3na.utils.inference, fetch_mask_from_idx from foundry.utils.components, and _check_has_backbone_connections_to_nonstandard_residues from rfd3na.inference.legacy_input_parsing.
  • Fixes create_motif_residue's error message to reference the token's own chain_id/res_id (the old src_chain/src_resid were undefined).
  • accumulate_components: narrow components_to_accumulate to List[str] (the int was spurious — both data sources are list[str]) and unindexed_breaks to List[bool | None].
  • Rewrite a list-comprehension-for-side-effects as a set loop; data: dict | None; fix a componentscomponent typo.

rfd3na is cluster-coupled; the NameError fixes make reachable paths work but aren't exercised in CI.

Verification: ruff format/check clean; mypy Success (238 files); pytest 541 passed / 10 skipped.

lyskov and others added 20 commits June 30, 2026 19:29
Clear rfd3.trainer.fabric_trainer (99 errors) from the mypy ignore-errors
ratchet. The errors were one dominant pattern: the dynamically-keyed `state`
bag was inferred as `dict[Any, Any] | int | None`, so every state access and
counter update errored.

Mirror the already-landed foundry trainers/fabric precedent: a class-level
`state: dict[str, Any]` + `_current_train_return: Any`, an annotated
`default_state`, a documented type-ignore on the wider str|int precision API,
a cast on setup_dataloaders + widening the loop params to DataLoader,
`get_latest_checkpoint -> Path | None` with a cast at the call site, and the
truthful `load_legacy_checkpoint -> None`.

Behaviour-preserving, mypy-only (no clean CPU-test target for this
cluster-coupled trainer glue). Ratchet 4 -> 3 modules remaining.

Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>
Clear the 15 mypy errors in rfd3.inference.legacy_input_parsing and remove
it from the ignore_errors ratchet (3 -> 2 remaining: engine, utils.vizualize).

- 11 implicit-Optional parameter fixes in
  create_atom_array_from_design_specification_legacy (incl. unfix_specific ->
  str | list | None, honest to the body's isinstance branches).
- The Optional fixes cascade union-attr/arg-type errors at exists()-guarded
  sites; since foundry.common.exists is not a TypeGuard it does not narrow, so
  the narrowing-required guards are converted to behaviour-identical
  'x is not None' checks (exists(x) is by definition 'x is not None').
- unfix_residues: list[str] annotation; rewrite the 'unfix_residues, _ = ...'
  unpack to '...[0]' and the side-effecting list-comp to a for loop, dropping
  the '_' that collided with the **_ kwargs parameter.
- Documented cast(str, contig) (contig is always provided or defaulted to the
  non-optional length, which mypy cannot follow through exists()).
- Fix a real reachable NameError: import the missing
  spoof_helical_bundle_ss_conditioning_fn from rfd3.utils.inference (its branch
  is config-gated and would crash today).

mypy-only slice; no behaviour change on production paths. All gates green
(ruff format/check, mypy 152 files, pytest 486 passed / 10 skipped).

Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>
Fix the pre-existing bugs that held the last two models/rfd3 modules on the
ignore_errors ratchet, then remove the rfd3 override block entirely. All of
models/rfd3 now type-checks with no module-level exemptions.

engine.py:
- run(): documented `# type: ignore[override]` + honest return type
  `dict[str, list[RFD3Output]] | None` for the keyword-only override of the
  positional base BaseInferenceEngine.run().
- _multiply_specifications: `exists(self.out_dir)` -> `self.out_dir is not None`
  so mypy narrows it for find_files_with_extension().
- normalize_inputs: early returns + a documented cast for the str-split branch
  (list is invariant).
- process_input: delete the dead/redundant block that re-did normalize_inputs'
  work and called .split(',') on a value that is always a list; narrow the two
  loop guards to `input is not None`.

io.py: widen find_files_with_extension's supported_file_types to `set | list`
(matches its siblings; the caller passes the CIF_LIKE_EXTENSIONS set).

vizualize.py (dev script, imported nowhere): drop the broken import+call of
_add_design_annotations_from_cif_block_metadata (removed in the open-sourcing
refactor, exists nowhere -> ImportError on the .cif path); keep plain-structure
.cif/.bcif loading; document the atomworks C_DIS .as_array() ignore.

Add unit tests for the pure normalize_inputs helper.

Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>
Add models/mpnn/src/mpnn to mypy's files and seed a fresh per-module
ignore_errors ratchet for the 10 modules that currently have type errors,
so the gate stays green while they are cleared one slice per task. The
other 14 mpnn modules type-check clean under the lenient baseline and are
now gated. Config-only; no pytest wiring (mpnn's existing test suite is
partly cluster-coupled / library-drift-broken on CPU).

Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>
Add models/mpnn/tests to testpaths with a collect_ignore denylist in its
conftest. Most of the existing mpnn suite loads structures via atomworks
cached_parse (needs a DIGS PDB mirror absent in CI) and is held out, run
locally on the cluster; the CPU-portable files are collected. Fix two
stale numpy-bool identity assertions (np.True_ is True) in
test_polymer_ligand_interface.py so it is green and collectable. Adds 28
mpnn CPU tests to the gate (520 passed, was 492).

Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>
Type-check mpnn.samplers.samplers, mpnn.trainers.mpnn, and
mpnn.transforms.feature_aggregation.user_settings, and drop them from the
ignore_errors ratchet. Annotation/comment-only: narrow the inherited
BatchSampler.sampler to Sampler for set_sampler_epoch + a var annotation;
a documented type: ignore[arg-type] for the DictConfig **-unpack (per the
RF3/RFD3 Loss(**loss) precedent) and type: ignore[override] for the
refined validation_step 3rd param (per rf3); and a type: ignore[misc] for
an atomworks AnnotationList2D row unpack its stub doesn't express.

Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>
Type-check mpnn.transforms.feature_aggregation.token_encodings,
mpnn.pipelines.mpnn, and mpnn.inference_engines.mpnn, and drop them from
the ignore_errors ratchet. token_encodings wraps its TokenEncoding
token_atoms values in np.asarray (matching the field's ndarray type, which
__post_init__ already produces); pipelines.mpnn fixes an implicit-Optional
model_type with a narrowing guard + list() for a tuple arg; the inference
engine narrows _absolute_path_or_none's Optional result and the
atom_arrays/input_dicts guard with documented asserts. Adds
test_pipeline_validation.py (6 CPU tests) for the model_type validation.

Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>
feature_collator: two implicit-Optional default_padding params + a list
var annotation (behaviour-preserving). train: replace the ad-hoc local
CkptConfig with foundry's CheckpointConfig (the type FabricTrainer.fit
actually expects) — CkptConfig lacked parameter_freezing_config, which
fit() reads, so the checkpoint-resume branch would have raised
AttributeError; the swap fixes that latent footgun. Remaining train
errors are third-party stub gaps (atomworks StructuralDatasetWrapper types
dataset_parser as Callable but takes a parser object; torch
WeightedRandomSampler accepts a Tensor despite its Sequence[float] stub) —
documented type: ignore[arg-type].

train.py is a cluster-only training script (unverified in CI); the
CheckpointConfig behaviour improvement is flagged for mpnn owners.

Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>
load_legacy_weights reads model-specific int config dims (num_rbf,
num_backbone_atoms, ...) off model.graph_featurization_module, but
nn.Module.__getattr__ types submodule access as Tensor | Module, so those
ints are invisible to the checker. Add a small typed helper
_int_config_attr(module, name) -> int wrapping getattr and route the reads
through it (behaviour-identical; the concrete featurization class varies,
so casting to a single class wouldn't cover all reads). Also annotate the
attribute-walk local as torch.nn.Module | None. mypy-only: the loader
needs a real model + legacy checkpoint (cluster-only).

Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>
Clear the last exempted mpnn module (utils.inference, 15 errors) and
remove the ignore_errors ratchet block entirely, so all of models/mpnn
type-checks with no module-level exemptions. Fixes: narrow two guarded
Optionals with asserts (config path in cli_to_json, base_path in
write_fasta); type write_structure's file_type as
Literal['cif','bcif','cif.gz'] (the type to_cif_file expects, matching the
docstring; the sole caller uses the default); a categories dict
annotation; and two documented type: ignore[arg-type] for atomworks stub
gaps (**-unpacking a heterogeneous parser-args dict into parse(); passing
ndarrays to set_annotation_2d which is typed Sequence).

Adds test_inference_helpers.py (15 CPU tests) for the pure parsing helpers
(str2bool, none_or_type, parse_json_like, parse_list_like,
_absolute_path_or_none).

Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>
Add models/rfd3na/src/rfd3na to mypy's files and seed a fresh per-module
ignore_errors ratchet for the 35 modules that currently have type errors,
so the gate stays green while they are cleared one slice per task. The
other 27 rfd3na modules type-check clean under the lenient baseline and
are now gated. Config-only; no pytest wiring (rfd3na's existing tests are
cluster-coupled like rfd3's). rfd3na is a near-copy of rfd3, so several
modules carry the same defects rfd3 already fixed.

Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>
Type-check 8 rfd3na modules (one error each) and drop them from the
ignore_errors ratchet. Mostly annotation-only: a LongTensor cast on
.long(); three empty-dict var annotations; a numpy.typing NDArray for a
pydantic bool-mask field; an assert narrowing a cache attr; and a
type: ignore[list-item] for the Transform-class requires_previous_transforms
pattern (mirroring rfd3). Also corrects a stale return annotation on
hbonds_hbplus.calculate_hbonds (it was copied from the different
calculate_hbonds in hbonds.py; the real return is (AtomArray, list, int),
matching its callers).

Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>
Type-check 10 more rfd3na modules and drop them from the ignore_errors
ratchet (nucleic_ss_metrics had only informational notes). Most mirror the
already-landed rfd3 fixes: implicit-Optional params; a str|None base
attr; an isinstance re-narrow after an untyped call; PathLike|None +
transform assert; crop-center int|float decls + a cast; type: ignore for
the Transform-class requires_previous_transforms and the DictConfig/None
resize arg.

Two genuine correctness fixes (both matching rfd3): correct
permute_symmetric_atom_names_'s signature to np.ndarray (it does ndarray
indexing and callers pass ndarrays); and drop vizualize's broken import of
_add_design_annotations_from_cif_block_metadata (deleted in open-sourcing,
exists nowhere), which would ImportError on the .cif-load path. Also add a
ValueError guard for an unknown dna_crop protein_contact_type (rfd3 0058).
rfd3na is cluster-coupled, so these are unverified in CI.

Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>
…om the ratchet

Type-check four more rfd3na modules, mirroring the already-landed rfd3
fixes: utils/io (Path wraps, cast(Tensor) on weighted_rigid_align, a
list[Path] annotation, create_example_id_extractor's Callable return type,
and a coords-var trajectory rewrite); trainer_utils (rebind np.array to a
distinct atom_names_arr + a cast(dict) for the heterogeneous
association_schemes constant); dump_validation_structures (epoch: str|None
+ type: ignore[override]); run_inference (cast OmegaConf.to_container to
dict[str, Any]). All behaviour-preserving; mypy-only.

Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>
Type-check four more rfd3na modules: na_geom (implicit-Optional +
partner_sym_map assert), model/RFD3 (DictConfig **-unpack ignores +
coord_atom_lvl_to_be_noised Tensor|None with an inference-branch assert),
design_transforms (type: ignore[list-item] for the Transform-class
requires_previous_transforms), pipelines (list() wrap; fix the wrong
residue_cache_dir: bool to str|Path|None; a required-in-practice
association_scheme ignore; drop an orphaned import). Also correct 0072's
util_transforms.encode_residues_to from int|None to str -- pipelines
passes the GAP string token, matching rfd3's own correction.

All behaviour-preserving; mypy-only.

Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>
…e ratchet

engine gets the full rfd3 treatment: a keyword-only run override ignore,
a normalize_inputs rewrite with a documented cast, deletion of the
dead .split-on-a-list block in process_input, and out_dir/input
is-not-None narrows -- closing the pre-filed rfd3na-engine finding. This
cascades two correct param widenings (io.find_files_with_extension to
set|list; input_parsing.ensure_input_is_abspath to str|PathLike|None).

inference_sampler: partial_t is a tensor (body calls .mean()), so type it
Tensor|None and use a distinct float scalar downstream; assert the
CFG-only ref_initializer_outputs; give gamma_0 a .get default so a missing
value is a clean assert, not a None>float TypeError.

trainer/rfd3na: cast(int) for the recycle count, type: ignore for
Loss(**loss) and validation_step override, an assert the loss is set, and
a dict annotation. All behaviour-preserving; mypy-only.

Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>
Type-check three more rfd3na modules, fixing real defects surfaced by
honest typing: two NameErrors in utils.inference (logger -> ranked_logger;
an undefined scale -> ori_jitter in the jitter block) and np.array used as
a type annotation (-> np.ndarray). hbonds: implicit-Optional selection
masks, Tuple[str] -> Tuple[str, ...] defaults, and a .sum()-on-None guard
(None mask = no selection limit). na_geom_utils: cast the heterogeneous
ATOM_REGION_BY_RESI constant, correct make_coord_list's signature (it
returns coordinate floats and its caller passes list[int]/list[str|None]),
and drop redundant re-annotations.

The NameError fixes make previously-crashing cluster-only paths work;
rfd3na is cluster-coupled, so unverified in CI. mypy-only otherwise.

Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>
The association_schemes constant is a heterogeneous nested-dict structure
built up across the module with .copy()/|=/indexed-assignment/.items(), so
mypy inferred its values as object and every such operation failed.
Annotate it dict[str, Any] -- one change clears all 14 errors and keeps
the earlier cast(dict, association_schemes[scheme]) uses non-redundant.
Data-only module; annotation-only.

Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>
…om rfd3

Mirrors the rfd3 fix: 11 implicit-Optional params in
create_atom_array_from_design_specification_legacy with the cascaded
union-attr/arg-type errors narrowed via is-not-None + a cast(str, contig)
+ fixed_atoms or {}; a list-comp-for-side-effects rewritten as a loop;
unfix_residues annotation + [0] unpack; cast(list, backbone_atoms_*); and
the missing spoof_helical_bundle_ss_conditioning_fn import (was a
NameError).

Also ports _check_has_backbone_connections_to_nonstandard_residues from
rfd3's legacy_input_parsing (it was dropped in the rfd3na fork) so the
next slice can clear input_parsing, which imports it.

rfd3na is cluster-coupled; mypy-only otherwise.

Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>
Now unblocked by the 0078 port of _check_has_backbone_connections_to_
nonstandard_residues. Mirrors the rfd3 fix: add the four missing imports
(create_o_atoms/create_cb_atoms, fetch_mask_from_idx, and the ported
_check_has_backbone helper -- all NameErrors); fix create_motif_residue's
error message to use the token's own chain_id/res_id (was undefined
src_chain/src_resid); narrow accumulate_components' components to
List[str] (the int was spurious) and unindexed_breaks to List[bool|None];
rewrite a list-comp-for-side-effects as a set loop; data: dict|None; and a
components->component typo.

rfd3na is cluster-coupled; the NameError fixes make reachable paths work
but are unverified in CI. mypy-only otherwise.

Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>
@lyskov-ai lyskov-ai requested a review from woodsh17 July 1, 2026 03:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants