test(cardano): epoch-boundary resume reproductions + import-path evaluation (#1018)#1019
test(cardano): epoch-boundary resume reproductions + import-path evaluation (#1018)#1019scarmuega wants to merge 2 commits into
Conversation
…1018) Adds `tests/boundary_resume.rs`, a self-contained, deterministic harness that crosses real epoch boundaries with actual PoolState/AccountState/EpochState and interrupts the boundary two ways, classifying lead vs lag. Uses a shrunken-but-coherent genesis (epoch_length=100, byron k=1, f=0.05) so the randomness/stability windows land inside the epoch and RUPD/EWRAP/ESTART fire after a few hundred synthetic blocks, driven through ToyDomain. Tests: - baseline_clean_run_crosses_boundaries_aligned (passes): a clean run crosses >=2 boundaries with every entity aligned to EpochState.number. - import_crash_mid_estart_resumes_to_identical_state (passes): crash mid-ESTART (commit shards 0..16, no finalize) then resume yields byte-identical state. Refutes the import resume path as the lag source. - rollback_across_boundary_reapplies_to_identical_state (#[ignore], reproduces #1018): rollback across a boundary does NOT revert the boundary transition (EpochState.number stays advanced) because boundary transitions are not in the WAL; re-applying re-fires the boundary and double-advances every entity, silently. Un-ignore once boundary transitions are reversible on rollback. Runs in --release: the synthetic generator funds tx fees and registration deposits from un-pot-backed custom_utxos, which trips the unrelated pots.is_consistent debug_assert at a boundary (orthogonal to the snapshot- rotation logic under test). Debug mode skips with a message. Follow-up tracked separately. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
📝 WalkthroughWalkthroughA new Cargo integration test target ChangesEpoch-boundary resume/rollback integration test
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~28 minutes Possibly related issues
Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/boundary_resume.rs`:
- Around line 44-53: The require_release() function masks test failures as
passes in debug mode, hiding regression coverage. Remove the require_release()
function entirely and replace its runtime guard pattern with explicit
compile-time semantics. For all test functions that use require_release()
(including those at lines 236-240, 311-314, 374-376), add
#[cfg_attr(debug_assertions, ignore = "tests require --release")] attribute
above each test function declaration and remove the if !require_release() {
return; } guard statement from the test body.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 005bb5a1-ec80-4ff9-87b8-f19e0965c70d
📒 Files selected for processing (2)
Cargo.tomltests/boundary_resume.rs
| fn require_release() -> bool { | ||
| if cfg!(debug_assertions) { | ||
| eprintln!( | ||
| "SKIP: boundary_resume tests require --release \ | ||
| (synthetic pots debug_assert is unrelated to the snapshot-lag bug under test)" | ||
| ); | ||
| return false; | ||
| } | ||
| true | ||
| } |
There was a problem hiding this comment.
Release-only guard currently turns unexecuted tests into passing tests.
Returning early in debug mode marks these tests as pass, so default cargo test can silently miss this regression coverage. Prefer explicit ignore semantics (e.g., #[cfg_attr(debug_assertions, ignore = "...")]) and remove runtime early-return gating.
🔧 Proposed fix
-fn require_release() -> bool {
- if cfg!(debug_assertions) {
- eprintln!(
- "SKIP: boundary_resume tests require --release \
- (synthetic pots debug_assert is unrelated to the snapshot-lag bug under test)"
- );
- return false;
- }
- true
-}
-
#[test]
+#[cfg_attr(
+ debug_assertions,
+ ignore = "requires --release (synthetic pots debug_assert is unrelated to the snapshot-lag bug under test)"
+)]
fn baseline_clean_run_crosses_boundaries_aligned() {
- if !require_release() {
- return;
- }
let (blocks, domain) = synthetic_world(260);
@@
#[test]
+#[cfg_attr(
+ debug_assertions,
+ ignore = "requires --release (synthetic pots debug_assert is unrelated to the snapshot-lag bug under test)"
+)]
fn import_crash_mid_estart_resumes_to_identical_state() {
- if !require_release() {
- return;
- }
@@
#[test]
#[ignore = "reproduces `#1018`: rollback across an epoch boundary is not idempotent (un-ignore once fixed)"]
+#[cfg_attr(
+ debug_assertions,
+ ignore = "requires --release (synthetic pots debug_assert is unrelated to the snapshot-lag bug under test)"
+)]
fn rollback_across_boundary_reapplies_to_identical_state() {
- if !require_release() {
- return;
- }Also applies to: 236-240, 311-314, 374-376
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/boundary_resume.rs` around lines 44 - 53, The require_release()
function masks test failures as passes in debug mode, hiding regression
coverage. Remove the require_release() function entirely and replace its runtime
guard pattern with explicit compile-time semantics. For all test functions that
use require_release() (including those at lines 236-240, 311-314, 374-376), add
#[cfg_attr(debug_assertions, ignore = "tests require --release")] attribute
above each test function declaration and remove the if !require_release() {
return; } guard statement from the test body.
Extends the boundary-resume harness while evaluating import crash/resume safety across the full RUPD -> EWRAP -> ESTART boundary. Findings: - RUPD re-run is idempotent (PendingRewardState writes are overwrite-by-key, recompute is deterministic, finalize overwrites). - ESTART shard crash + resume is safe (shard-skip via estart_progress + atomic, guarded finalize) — Scenario A still passes, now also asserting pots. - EWRAP has a finalize-window gap: the last shard sets ewrap_progress.committed == total (EWrapProgress::apply), but EpochWrapUpV3 (which assembles the final EndStats and rotates rolling/pparams) commits separately and does NOT touch ewrap_progress. On restart, EwrapWorkUnit::initialize reads committed == total -> is_complete() -> skips BOTH shards and finalize. A crash in [last EWRAP shard committed, before EpochWrapUpV3 commits] therefore permanently skips EpochWrapUpV3 on resume, so ESTART reset consumes a non-finalized `end` (default epoch_incentives) and un-rotated rolling stats: reserves stay too high, treasury too low. Adds: - pots (reserves, treasury, utxos, rewards, fees) to the state fingerprint so monetary-accounting corruption is caught, not just per-entity epoch drift. - import_crash_ewrap_finalize_window_resumes_to_identical_state (#[ignore], reproduces the gap): crashes the 2nd EWRAP after all shards, before finalize, then resumes; pots diverge from a clean run by exactly the skipped epoch-1 incentives (~8.99e12 reserves/treasury). Un-ignore once the EWRAP finalize marker is fixed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
What
Adds
tests/boundary_resume.rs, a self-contained, deterministic harness that crosses real epoch boundaries with actualPoolState/AccountState/EpochState, interrupts the boundary in several ways, and compares the resumed ledger against an uninterrupted run (epoch, per-entity snapshot epochs, and pots) — the reproduction called for in #1018's acceptance criteria, plus a detailed evaluation of the import crash/resume path.It uses a shrunken-but-coherent genesis (
epoch_length=100, byronk=1,f=0.05) so the randomness/stability windows land inside the epoch and RUPD/EWRAP/ESTART fire after a few hundred synthetic blocks, driven throughToyDomain.Tests
baseline_clean_run_crosses_boundaries_alignedEpochState.number.import_crash_mid_estart_resumes_to_identical_stateestart_progress+ atomic, guarded finalize).rollback_across_boundary_reapplies_to_identical_state#[ignore]import_crash_ewrap_finalize_window_resumes_to_identical_state#[ignore]Both
#[ignore]d tests assert the post-fix behavior (byte-identical resume), so CI stays green and they flip to passing once fixed. They run under--ignored.Bug 1 — rollback across a boundary (sync path)
Cross the slot-100 boundary into epoch 1,
rollbackto slot 80 (epoch 0), re-apply to tip:EpochState.numberis still 1 and the pool snapshot is still @1 — the boundary transition was not reverted, because boundary transitions are not written to the WAL (only ROLL block deltas are; EWRAP/RUPD/ESTART use the default no-opcommit_wal, androllbackincore/sync.rsonly undoes WAL deltas).Same WAL/rollback asymmetry as the reported pool-snapshot lag; the manifestation here is a lead. Fix: make boundary transitions reversible on rollback (WAL-backed or re-derive).
Bug 2 — import crash in the EWRAP finalize window
Found while evaluating import crash/resume safety across the full
RUPD → EWRAP → ESTARTboundary.ewrap_progress.committed == total(EWrapProgress::apply).EpochWrapUpV3— which assembles the finalEndStatsand rotates rolling/pparams — commits separately and does not touchewrap_progress.EwrapWorkUnit::initializereadscommitted == total→is_complete()→ skips both shards and finalize.So a crash in [last EWRAP shard committed → before
EpochWrapUpV3commits] permanently skipsEpochWrapUpV3on resume. ESTART'sdefine_new_potsthen reads a non-finalizedend(defaultepoch_incentives) and un-rotatedrolling, so the epoch's monetary expansion is never applied:14,982,005,400,001,109, treasury17,994,600,000,27714,991,000,000,000,000, treasury9,000,000,000,000→ reserves ~8.99e12 too high, treasury ~8.99e12 too low — exactly the skipped epoch-1 incentive split, silently (no guard/error). The off-by-one: the finalize-skip marker is advanced by the last shard, one commit before finalize runs. Fix: gate EWRAP's
is_completeon a marker set byEpochWrapUpV3itself.Import-path evaluation summary
PendingRewardStatewrites are overwrite-by-key, the recompute is deterministic (RUPD fires before its trigger block rolls), finalize overwrites.EpochBoundaryIncomplete) finalize; nois_completeskip.Notes
--release(matching theepoch_potsconvention). The synthetic block generator funds tx fees and registration deposits from un-pot-backedcustom_utxos, which trips the unrelatedpots.is_consistentdebug_assert at a boundary — a monetary-accounting invariant orthogonal to the snapshot-rotation logic under test. Debug mode skips with a message. Follow-up to make the generator boundary-safe: Synthetic block generator conjures value (unfunded fees/deposits) — breaks pots invariant across epoch boundaries #1020.🤖 Generated with Claude Code