Skip to content

feat(ipc): concurrent service machinery — async reactor + codegen#24198

Open
charlielye wants to merge 1 commit into
nextfrom
cl/ipc-wsdb-concurrent-server
Open

feat(ipc): concurrent service machinery — async reactor + codegen#24198
charlielye wants to merge 1 commit into
nextfrom
cl/ipc-wsdb-concurrent-server

Conversation

@charlielye

@charlielye charlielye commented Jun 19, 2026

Copy link
Copy Markdown
Contributor

What

General-purpose machinery in ipc-runtime and ipc-codegen for building IPC services that handle requests concurrently, plus its first consumer (the wsdb world-state service).

Until now an ipc-runtime server processed requests through a single-threaded run() loop (accept → wait_for_data → receive → handler → send → release) — one in-flight request at a time. A service backed by this could only dispatch as fast as one thread, regardless of how concurrent its clients were. This adds an asynchronous, non-blocking reactor that any generated service can opt into, so a service can service its connections concurrently while ipc-runtime itself stays thread-pool-agnostic.

Nothing here is wsdb-specific except the per-fork scheduler called out under Consumers. The reactor, the async codegen, the wake mechanism, and the response reordering are service-agnostic: aztec-wsdb, aztec-vm-sim, and cdb all generate against the same async server interface. wsdb is simply the first service to exercise it under parallel load, so the benchmarks below are wsdb's.

The machinery (ipc-runtime + ipc-codegen)

  • run_reactor(AsyncHandler): the reactor owns all ring/socket I/O and never blocks on a handler. It copies each request out, release()s the slot immediately, assigns a per-connection sequence, and invokes an AsyncHandlervoid(int client_id, std::span<const uint8_t> request, Respond respond), where Respond = std::function<void(std::vector<uint8_t>)>. The handler may respond inline (on the reactor thread) or hand the work to a thread pool and respond later from a worker. ipc-runtime owns no pool and spawns no thread — concurrency is purely the handler's choice. Serial callers (bb) keep the untouched run().
  • Async server codegen, all four languages: the generated server dispatch hands each handler a responder rather than expecting a return value, so a handler can defer completion. C++ Responder<Resp> (ok()/error()); Rust Responder<R> over a Box<dyn FnOnce(Vec<u8>) + Send>; Zig Responder(RespType) (ok/err); TS handlers are plain async functions returning a Promise. The wire format is unchanged — the cross-language echo matrix and frozen golden fixtures still pass — so this is purely a server-handler shape change for any service that regenerates.
  • Sole-sender + per-connection reorder: only the reactor calls send(), so each response ring stays single-producer/lock-free. A small per-connection reorder stash (keyed by sequence) preserves FIFO on the wire even when handlers complete out of order — no request-id envelope, no wire/client change. The only lock is over the in-process stash, never a ring.
  • notify() wake (no timeout poll): lets a worker that finished off the reactor thread wake the reactor immediately. Sockets use a self-pipe registered in the epoll/kqueue set; MPSC-SHM bumps the doorbell seq then futex_wakes (mirroring publish, so the wake isn't lost against the consumer's armed futex), and the consumer evaluates the completion predicate inside the same seq-latched window. The SHM spin also breaks on a doorbell-seq change so low-concurrency completions aren't spun through.
  • Inline fast path: when nothing is in flight and no further request is already pending, the handler runs on the reactor thread instead of dispatching — the thread handoff is pure latency when there's no concurrency to exploit. inflight == 0 short-circuits the pending check, so a burst never pays for it and stays on the dispatch path. The check is has_pending_request(): sockets poll via wait_for_data(0); MPSC-SHM uses a side-effect-free MpscConsumer::has_data() so the peek doesn't disturb its round-robin / spin state.

Consumers

  • wsdb (aztec-wsdb) — the first consumer, and the only part of this PR that is service-specific. A per-fork WsdbScheduler sits on top of the reactor: each handler routes its work through schedule_read / schedule_write, and per fork, committed reads run concurrently (they read the committed snapshot, independent of in-flight writes), uncommitted reads wait behind an in-flight write on that fork, and writes are exclusive per fork. This moves read/write ordering out of the client (the old WorldStateOpsQueue, removed in refactor: cut TS world state and NAPI AVM over to WSDB IPC; delete NAPI WSDB #23036) and into the database, so multiple clients stay consistent. Wiring: a dispatch bb::ThreadPool distinct from WorldState's intra-op pool (mutating handlers wait() on the latter; sharing could deadlock), sized from the caller's threads budget (not hardware_concurrency(), which ignores cgroup limits and would exhaust the per-UID thread limit in containers), and max_shm_clients raised to 8 to cover the AVM pool.
  • avm / cdbaztec-vm-sim and cdb generate against the same async server interface. The avm server runs its handler inline on the reactor thread (one in-flight simulation per connection, so no dispatch pool), which is exactly the inline fast path above.

Results

These are the wsdb service under parallel reads — the consumer that drives this work. reads/s, same machine, single-worker jest (meaningful run-to-run variance on a shared host; medians shown).

Single connection, pipelined (the TS world-state client pattern)

parallel_read.bench.test.ts — N in-flight reads down one connection, vs the in-process next baseline:

concurrency in-process (next) reactor — UDS reactor — SHM
1 8719 7277 7395
2 17011 13567 15687
4 29567 18944 23203
8 38309 25106 29234
16 36879 28676 30390

With the old serial run() the IPC path was flat at ~15k reads/s regardless of concurrency. The reactor now scales; the inline fast path holds c=1–2 at ~in-process parity (no low-concurrency regression). One pipelined connection plateaus around ~30k because all N in-flight reads contend on a single request/response ring and the single reactor funnel.

N independent connections, sequential (the AVM simulator-pool topology)

multi_connection_read.bench.test.ts — one wsdb, N client connections, each a synchronous read loop (one in-flight). This is what the aztec-vm-sim pool actually does. Caveat: not a like-for-like in-process comparison — this bench calls AsyncApi directly, skipping the facade that the in-process numbers (and parallel_read.bench) go through. That facade costs ~40% (the 1-connection number here, ~10.6k SHM, vs parallel_read SHM c=1, ~7.4k), so these figures measure server capability with a lean client, not an in-process win:

connections reactor — UDS reactor — SHM
1 ~8.1k ~10.6k
2 ~13.8k ~18.3k
4 ~20.9k ~24.5k
7 ~30.4k ~40-47k

Two takeaways: (1) spreading load across N rings scales better than pipelining one connection (N=7 SHM raw ~46k vs one pipelined connection ~30k), so the server itself is not the bottleneck; (2) deflated by the ~1.4x facade factor, SHM lands at ~32k facade-equivalent — ~80-86% of in-process, never above — consistent with the like-for-like single-connection table. The IPC path does not beat in-process; the residual gap is the facade + cross-process transport. (7 connections is the production SHM ceiling: max_shm_clients=8 minus the world-state client's slot. UDS is uncapped.)

Testing

  • New gtests exercise the reactor over both transports: SocketTest.ReactorPipelinedConcurrencyAndOrder and ShmTest.MpscReactorPipelinedConcurrencyAndOrder — pipelined requests with reversed completion latencies must come back FIFO and complete concurrently (a lost wake shows as a stall). Full ipc-runtime suite green.
  • ipc-codegen cross-language echo matrix (UDS + SHM) and frozen golden fixtures green — the async handler change is wire-compatible.
  • native_world_state.test.ts (50 tests) green over the reactor, including Concurrent requests — mutating and non-mutating requests are correctly queued (validates wsdb's per-fork read/write ordering and that mutating handlers run on the dispatch pool without deadlocking on WorldState's intra-op pool).
  • Benches: parallel_read.bench.test.ts runs on both next (in-process) and this branch (identical file); multi_connection_read.bench.test.ts is IPC-only (skips in-process).

@charlielye charlielye changed the base branch from cl/ipc-3-avm-wsdb-cutover to next June 20, 2026 21:01
@charlielye charlielye force-pushed the cl/ipc-wsdb-concurrent-server branch from f407002 to 317a72f Compare June 20, 2026 21:02
@charlielye charlielye changed the title feat(ipc): concurrent wsdb server via reactor + executor feat(ipc): concurrent wsdb server — async handlers + per-fork ordering Jun 20, 2026
Asynchronous server-handler codegen across C++/Rust/Zig (TS already async):
the generated dispatch hands each handler a respond callback, so a handler may
run inline or defer to a thread pool and respond when ready. The wire protocol
is unchanged. ipc-runtime gains run_reactor (a non-blocking reactor that owns
all ring I/O, is the sole sender, and reorders responses per connection) plus
notify()/wait_for_data_or_ready for completion wakeups, and the run() serial
loop is unchanged.

The wsdb C++ server adopts this: each of the 40 handlers declares its own
ordering via schedule_read / schedule_write (reads concurrent, committed reads
unordered, writes exclusive per fork), and WsdbScheduler implements the
read-batch / write-barrier model with an inline fast path when idle and a
dispatch pool distinct from WorldState's intra-op pool.

This is the IPC-machinery layer that the wsdb cutover builds on; it is
independent of the world-state IPC consumer. Includes the single-connection
parallel-read benchmark (transport-agnostic).
@charlielye charlielye force-pushed the cl/ipc-wsdb-concurrent-server branch from 317a72f to 317eaf3 Compare June 21, 2026 09:20
@charlielye charlielye changed the title feat(ipc): concurrent wsdb server — async handlers + per-fork ordering feat(ipc): concurrent service machinery — async reactor + codegen Jun 21, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant