Type-check rfd3na io / trainer-utils / dump / run_inference#348
Open
lyskov-ai wants to merge 14 commits into
Open
Type-check rfd3na io / trainer-utils / dump / run_inference#348lyskov-ai wants to merge 14 commits into
lyskov-ai wants to merge 14 commits into
Conversation
Clear rfd3.trainer.fabric_trainer (99 errors) from the mypy ignore-errors ratchet. The errors were one dominant pattern: the dynamically-keyed `state` bag was inferred as `dict[Any, Any] | int | None`, so every state access and counter update errored. Mirror the already-landed foundry trainers/fabric precedent: a class-level `state: dict[str, Any]` + `_current_train_return: Any`, an annotated `default_state`, a documented type-ignore on the wider str|int precision API, a cast on setup_dataloaders + widening the loop params to DataLoader, `get_latest_checkpoint -> Path | None` with a cast at the call site, and the truthful `load_legacy_checkpoint -> None`. Behaviour-preserving, mypy-only (no clean CPU-test target for this cluster-coupled trainer glue). Ratchet 4 -> 3 modules remaining. Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>
Clear the 15 mypy errors in rfd3.inference.legacy_input_parsing and remove it from the ignore_errors ratchet (3 -> 2 remaining: engine, utils.vizualize). - 11 implicit-Optional parameter fixes in create_atom_array_from_design_specification_legacy (incl. unfix_specific -> str | list | None, honest to the body's isinstance branches). - The Optional fixes cascade union-attr/arg-type errors at exists()-guarded sites; since foundry.common.exists is not a TypeGuard it does not narrow, so the narrowing-required guards are converted to behaviour-identical 'x is not None' checks (exists(x) is by definition 'x is not None'). - unfix_residues: list[str] annotation; rewrite the 'unfix_residues, _ = ...' unpack to '...[0]' and the side-effecting list-comp to a for loop, dropping the '_' that collided with the **_ kwargs parameter. - Documented cast(str, contig) (contig is always provided or defaulted to the non-optional length, which mypy cannot follow through exists()). - Fix a real reachable NameError: import the missing spoof_helical_bundle_ss_conditioning_fn from rfd3.utils.inference (its branch is config-gated and would crash today). mypy-only slice; no behaviour change on production paths. All gates green (ruff format/check, mypy 152 files, pytest 486 passed / 10 skipped). Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>
Fix the pre-existing bugs that held the last two models/rfd3 modules on the
ignore_errors ratchet, then remove the rfd3 override block entirely. All of
models/rfd3 now type-checks with no module-level exemptions.
engine.py:
- run(): documented `# type: ignore[override]` + honest return type
`dict[str, list[RFD3Output]] | None` for the keyword-only override of the
positional base BaseInferenceEngine.run().
- _multiply_specifications: `exists(self.out_dir)` -> `self.out_dir is not None`
so mypy narrows it for find_files_with_extension().
- normalize_inputs: early returns + a documented cast for the str-split branch
(list is invariant).
- process_input: delete the dead/redundant block that re-did normalize_inputs'
work and called .split(',') on a value that is always a list; narrow the two
loop guards to `input is not None`.
io.py: widen find_files_with_extension's supported_file_types to `set | list`
(matches its siblings; the caller passes the CIF_LIKE_EXTENSIONS set).
vizualize.py (dev script, imported nowhere): drop the broken import+call of
_add_design_annotations_from_cif_block_metadata (removed in the open-sourcing
refactor, exists nowhere -> ImportError on the .cif path); keep plain-structure
.cif/.bcif loading; document the atomworks C_DIS .as_array() ignore.
Add unit tests for the pure normalize_inputs helper.
Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>
Add models/mpnn/src/mpnn to mypy's files and seed a fresh per-module ignore_errors ratchet for the 10 modules that currently have type errors, so the gate stays green while they are cleared one slice per task. The other 14 mpnn modules type-check clean under the lenient baseline and are now gated. Config-only; no pytest wiring (mpnn's existing test suite is partly cluster-coupled / library-drift-broken on CPU). Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>
Add models/mpnn/tests to testpaths with a collect_ignore denylist in its conftest. Most of the existing mpnn suite loads structures via atomworks cached_parse (needs a DIGS PDB mirror absent in CI) and is held out, run locally on the cluster; the CPU-portable files are collected. Fix two stale numpy-bool identity assertions (np.True_ is True) in test_polymer_ligand_interface.py so it is green and collectable. Adds 28 mpnn CPU tests to the gate (520 passed, was 492). Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>
Type-check mpnn.samplers.samplers, mpnn.trainers.mpnn, and mpnn.transforms.feature_aggregation.user_settings, and drop them from the ignore_errors ratchet. Annotation/comment-only: narrow the inherited BatchSampler.sampler to Sampler for set_sampler_epoch + a var annotation; a documented type: ignore[arg-type] for the DictConfig **-unpack (per the RF3/RFD3 Loss(**loss) precedent) and type: ignore[override] for the refined validation_step 3rd param (per rf3); and a type: ignore[misc] for an atomworks AnnotationList2D row unpack its stub doesn't express. Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>
Type-check mpnn.transforms.feature_aggregation.token_encodings, mpnn.pipelines.mpnn, and mpnn.inference_engines.mpnn, and drop them from the ignore_errors ratchet. token_encodings wraps its TokenEncoding token_atoms values in np.asarray (matching the field's ndarray type, which __post_init__ already produces); pipelines.mpnn fixes an implicit-Optional model_type with a narrowing guard + list() for a tuple arg; the inference engine narrows _absolute_path_or_none's Optional result and the atom_arrays/input_dicts guard with documented asserts. Adds test_pipeline_validation.py (6 CPU tests) for the model_type validation. Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>
feature_collator: two implicit-Optional default_padding params + a list var annotation (behaviour-preserving). train: replace the ad-hoc local CkptConfig with foundry's CheckpointConfig (the type FabricTrainer.fit actually expects) — CkptConfig lacked parameter_freezing_config, which fit() reads, so the checkpoint-resume branch would have raised AttributeError; the swap fixes that latent footgun. Remaining train errors are third-party stub gaps (atomworks StructuralDatasetWrapper types dataset_parser as Callable but takes a parser object; torch WeightedRandomSampler accepts a Tensor despite its Sequence[float] stub) — documented type: ignore[arg-type]. train.py is a cluster-only training script (unverified in CI); the CheckpointConfig behaviour improvement is flagged for mpnn owners. Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>
load_legacy_weights reads model-specific int config dims (num_rbf, num_backbone_atoms, ...) off model.graph_featurization_module, but nn.Module.__getattr__ types submodule access as Tensor | Module, so those ints are invisible to the checker. Add a small typed helper _int_config_attr(module, name) -> int wrapping getattr and route the reads through it (behaviour-identical; the concrete featurization class varies, so casting to a single class wouldn't cover all reads). Also annotate the attribute-walk local as torch.nn.Module | None. mypy-only: the loader needs a real model + legacy checkpoint (cluster-only). Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>
Clear the last exempted mpnn module (utils.inference, 15 errors) and remove the ignore_errors ratchet block entirely, so all of models/mpnn type-checks with no module-level exemptions. Fixes: narrow two guarded Optionals with asserts (config path in cli_to_json, base_path in write_fasta); type write_structure's file_type as Literal['cif','bcif','cif.gz'] (the type to_cif_file expects, matching the docstring; the sole caller uses the default); a categories dict annotation; and two documented type: ignore[arg-type] for atomworks stub gaps (**-unpacking a heterogeneous parser-args dict into parse(); passing ndarrays to set_annotation_2d which is typed Sequence). Adds test_inference_helpers.py (15 CPU tests) for the pure parsing helpers (str2bool, none_or_type, parse_json_like, parse_list_like, _absolute_path_or_none). Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>
Add models/rfd3na/src/rfd3na to mypy's files and seed a fresh per-module ignore_errors ratchet for the 35 modules that currently have type errors, so the gate stays green while they are cleared one slice per task. The other 27 rfd3na modules type-check clean under the lenient baseline and are now gated. Config-only; no pytest wiring (rfd3na's existing tests are cluster-coupled like rfd3's). rfd3na is a near-copy of rfd3, so several modules carry the same defects rfd3 already fixed. Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>
Type-check 8 rfd3na modules (one error each) and drop them from the ignore_errors ratchet. Mostly annotation-only: a LongTensor cast on .long(); three empty-dict var annotations; a numpy.typing NDArray for a pydantic bool-mask field; an assert narrowing a cache attr; and a type: ignore[list-item] for the Transform-class requires_previous_transforms pattern (mirroring rfd3). Also corrects a stale return annotation on hbonds_hbplus.calculate_hbonds (it was copied from the different calculate_hbonds in hbonds.py; the real return is (AtomArray, list, int), matching its callers). Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>
Type-check 10 more rfd3na modules and drop them from the ignore_errors ratchet (nucleic_ss_metrics had only informational notes). Most mirror the already-landed rfd3 fixes: implicit-Optional params; a str|None base attr; an isinstance re-narrow after an untyped call; PathLike|None + transform assert; crop-center int|float decls + a cast; type: ignore for the Transform-class requires_previous_transforms and the DictConfig/None resize arg. Two genuine correctness fixes (both matching rfd3): correct permute_symmetric_atom_names_'s signature to np.ndarray (it does ndarray indexing and callers pass ndarrays); and drop vizualize's broken import of _add_design_annotations_from_cif_block_metadata (deleted in open-sourcing, exists nowhere), which would ImportError on the .cif-load path. Also add a ValueError guard for an unknown dna_crop protein_contact_type (rfd3 0058). rfd3na is cluster-coupled, so these are unverified in CI. Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>
…om the ratchet Type-check four more rfd3na modules, mirroring the already-landed rfd3 fixes: utils/io (Path wraps, cast(Tensor) on weighted_rigid_align, a list[Path] annotation, create_example_id_extractor's Callable return type, and a coords-var trajectory rewrite); trainer_utils (rebind np.array to a distinct atom_names_arr + a cast(dict) for the heterogeneous association_schemes constant); dump_validation_structures (epoch: str|None + type: ignore[override]); run_inference (cast OmegaConf.to_container to dict[str, Any]). All behaviour-preserving; mypy-only. Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Continues clearing the
models/rfd3namypy ratchet — four modules, leaving 14. Each mirrors the fix already landed for the near-identical rfd3 module.utils/io.py:Path(...)wraps on theto_cif_filepath args;castthe alignment inputs toTensor; annotate alist[Path]; correctcreate_example_id_extractor's return type toCallable[[PathLike], str](it returns a closure); and use a separatecoordsvariable in the trajectory loop instead of rebinding aTensorloop var to an ndarray.trainer/trainer_utils.py: bindnp.array(atom_names)to a distinct name (rebinding the list loop var to an ndarray confused its type); andcastthe selectedassociation_schemesentry todict(the constant has heterogeneous values, so mypy infersobject).trainer/dump_validation_structures.py:epoch: str | Noneand atype: ignore[override]for the keyword-dispatched Fabric hook.run_inference.py:castOmegaConf.to_container(...)todict[str, Any](its annotated return is a broad union).All behaviour-preserving;
rfd3nais cluster-coupled so these run locally, not in CI.Verification:
ruffformat/check clean;mypySuccess (238 files);pytest541 passed / 10 skipped.