fix: move Release and delete out of BlockingCall callback to prevent SIGBUS on macOS by ludamad · Pull Request #21629 · AztecProtocol/aztec-packages

ludamad · 2026-03-16T19:04:12Z

Fixes SIGBUS crash on macOS in ThreadedAsyncOperation (#21138). Also targeting next via #21625.

Release() and delete op were inside the BlockingCall callback, which runs on the JS thread while BlockingCall is still blocked on the worker thread. Release() tears down TSFN internals (mutex/condvar) that BlockingCall needs to unwind, and delete destroys the member entirely. macOS unmaps freed pages aggressively → SIGBUS. Linux → silent use-after-free / segfault.

Fix: move both Release() and delete this to after BlockingCall returns on the worker thread.

Full post mortem with diagrams

dbanks12

This diff is large (needs rebase?), but the change to async_op.hpp lgtm

…SIGBUS on macOS

Release() and delete were called inside the BlockingCall callback, which runs on the JS thread while BlockingCall is still blocked on the worker thread. Release() drops the TSFN refcount to 0, tearing down internal state that BlockingCall needs to unwind. delete destroys the TSFN member entirely. Both cause use-after-free when BlockingCall returns on the worker thread.

macOS magazine malloc unmaps freed pages aggressively → SIGBUS.

Fix: move both Release() and delete to after BlockingCall returns on the worker thread, where they can execute safely.

dbanks12 · 2026-05-19T16:46:26Z

Not closing quite yet, but marking as draft as we merged #21625 and claude thinks 21629 is still "racy"

AztecBot · 2026-05-22T14:38:19Z

This issue was automatically closed because it was referenced in PR #23469 which has been merged to the default branch.

…AztecProtocol#23469)

## Summary

`aztec start --local-network` reliably SIGBUSes a few blocks into a run on macOS arm64 (since `v5.0.0-nightly.20260520`, i.e. after AztecProtocol#21625 shipped the `shared_ptr` use-after-free fix). This is a **different** fault from the one AztecProtocol#21625 fixed: a stack-guard violation (stack overflow) on a `nodejs_module.node` worker thread running AVM-simulation code, not a use-after-free.

This pins an explicit, generous stack size on the `ThreadedAsyncOperation` worker thread.

## Root cause

`ThreadedAsyncOperation::Queue()` (introduced in AztecProtocol#21138) runs the AVM simulation (`_fn`) directly on a bare `std::thread(...).detach()`. A `std::thread` uses the OS default stack for non-main threads, which is **512 KB on macOS** versus **8 MB on Linux**. The AVM-simulation call chain is deep enough to overflow 512 KB, so on macOS arm64 the worker writes into its stack-guard page and the process aborts with:

```
EXC_BAD_ACCESS / SIGBUS, KERN_PROTECTION_FAILURE
"Could not determine thread index for stack guard region"
#0 _platform_memmove
#1.. nodejs_module.node bb::nodejs (AVM simulation path)
```

Linux is unaffected because its 8 MB default is comfortably large. The previous `AsyncOperation` path never hit this either: it ran on the libuv threadpool, whose threads are sized from `RLIMIT_STACK` (8 MB soft on macOS), not the 512 KB raw-thread default.

## Fix

`std::thread` can't set a stack size, so launch the worker via `pthreads` with `pthread_attr_setstacksize` pinned to a generous `WORKER_STACK_SIZE` (32 MB, which is 4× the 8 MB that the libuv path proved sufficient, with headroom for deeper future call chains). Falls back to a default-stack `std::thread` only if pthreads is unavailable (`_WIN32`) or `pthread_create` fails.

The shared_ptr lifetime model from AztecProtocol#21625 is preserved exactly: both the worker lambda and the `BlockingCall` completion callback still capture `self`, so this does not reintroduce the use-after-free. Only the thread-launch mechanism changed.

## Testing

- The full bb build is too heavy to run in this session, so this is **not yet a local end-to-end repro/fix verification**; it relies on CI for compilation and on a macOS arm64 `aztec start --local-network` run to confirm the crash is gone.
- The pthread/`std::function` trampoline was compiled and run standalone under `-std=c++20 -Wall -Wextra -Werror`: the worker thread receives a 32 MB stack (`pthread_get_stacksize_np` reports `33554432`), and the work runs and completes.
- **Requested:** verify against tonight's nightly on macOS arm64 (M3), the reporter's exact repro.

## Notes for reviewers

- Targets `next` (not `merge-train/barretenberg`) to match AztecProtocol#21625's base and to make the nightly, since this is an urgent release-affecting crash. Happy to retarget if you'd prefer it go through the merge train.
- 32 MB is a deliberate over-provision; if you'd rather mirror the libuv path precisely we could instead size from `getrlimit(RLIMIT_STACK)`. The fixed constant is simpler and the virtual reservation only commits pages as touched.
- The longer-term fix is the NAPI→IPC migration (AztecProtocol#21331 / AztecProtocol#23196 / AztecProtocol#23238), which removes this in-process worker entirely. This is a targeted stop-gap for the shipping NAPI path.

Related: AztecProtocol#21138 (introduced the threaded model), AztecProtocol#21625 (use-after-free fix), AztecProtocol#21629 (open alternative).

---

*Created by [claudebox](https://claudebox.work/v2/sessions/4bd36dc505c20254) · group: `slackbot`*
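
The commit message above says that both the worker lambda and the completion callback capture `self`. A minimal model of that ownership pattern in standard C++ might look like the following. It is only a sketch: `MainQueue` is a hypothetical stand-in for the JS thread's callback queue (it is not the N-API thread-safe function), and `Op`, `g_result` and `g_destroyed` are illustrative names that do not come from #21625.

```cpp
#include <atomic>
#include <functional>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical stand-in for the JS thread's callback queue.
class MainQueue {
  public:
    void post(std::function<void()> cb) {
        std::lock_guard<std::mutex> lock(mutex_);
        callbacks_.push(std::move(cb));
    }
    // Runs one pending callback on the calling thread.
    // Returns false if nothing was queued.
    bool run_one() {
        std::function<void()> cb;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (callbacks_.empty()) return false;
            cb = std::move(callbacks_.front());
            callbacks_.pop();
        }
        cb();
        return true;
    }

  private:
    std::mutex mutex_;
    std::queue<std::function<void()>> callbacks_;
};

std::atomic<int> g_destroyed{0};  // counts Op destructor runs
std::atomic<int> g_result{0};     // value delivered by the completion callback

class Op : public std::enable_shared_from_this<Op> {
  public:
    explicit Op(MainQueue& queue) : queue_(queue) {}
    ~Op() { g_destroyed.fetch_add(1); }

    void start() {
        auto self = shared_from_this();
        // The worker lambda holds one reference...
        std::thread([self] {
            self->value_ = 6 * 7;  // stands in for the heavy work
            // ...and the completion callback holds its own reference, so the
            // object stays alive until both the worker and the callback finish.
            self->queue_.post([self] { g_result.store(self->value_); });
        }).detach();
    }

  private:
    MainQueue& queue_;
    int value_ = 0;
};
```

With this shape the caller can drop its own `std::shared_ptr<Op>` right after `start()`, and the destructor runs exactly once, when the last of the two captured references is released. In this model that last release can happen on either thread, which a real addon holding JS handles would need to account for.
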

ludamad added the ci-barretenberg label (Run all barretenberg/cpp checks.) Mar 16, 2026

ludamad force-pushed the backport-to-v4-staging branch from 9221e1a to be9b609 March 17, 2026 18:16

ludamad requested review from dbanks12, fcarreiro, nventuro and sirasistant as code owners March 17, 2026 18:16

dbanks12 approved these changes Mar 17, 2026

ludamad force-pushed the fix/threaded-async-op-sigbus-v4 branch from bbdb7f7 to 91fab20 March 17, 2026 18:57

ludamad requested review from a team, IlyasRidhuan, Maddiaa0, charlielye and jeanmon as code owners March 17, 2026 18:57

ludamad changed the base branch from backport-to-v4-staging to next March 17, 2026 18:57

ludamad removed the ci-barretenberg label (Run all barretenberg/cpp checks.) Mar 17, 2026

charlielye approved these changes Mar 17, 2026

dbanks12 marked this pull request as draft May 19, 2026 16:45

ludamad added the claudebox label (Owned by claudebox. it can push to this PR.) May 20, 2026

AztecBot mentioned this pull request May 21, 2026

fix: pin AVM NAPI worker thread stack size to prevent SIGBUS on macOS #23469

Merged

AztecBot closed this May 22, 2026

ludamad wants to merge 1 commit into next from fix/threaded-async-op-sigbus-v4

4 participants
