Aiur kernel: `u8_mul` gadget + multiplication/arithmetic cleanups by arthurpaulino · Pull Request #410 · argumentcomputer/ix

arthurpaulino · 2026-05-15T19:38:15Z

Optimization pass over the Aiur kernel. Adds a u8_mul Aiur gadget, routes
the kernel's primitive Nat multiplication through it, and cleans up two
small arithmetic redundancies.

Commits

`perf(kernel): boolean-inversion opt + dedup u64_add`

Check.lean::safe_refs_only: replace match u { 0 => 1, 1 => 0 } with
1 - is_unsafe_ci(ci) — pure field arithmetic, drops the match's
selectors and branch blocks.
ByteStream.u64_add now returns (U64, G), exposing the final
carry-out, and the kernel's duplicate u64_add_with_carry is removed.
klimbs_succ / klimbs_add_carry / klimbs_mul_single call u64_add
directly. u64_add had no prior callers, so nothing else is affected.

`feat(aiur): add u8_mul gadget`

New u8_mul(a, b) Aiur primitive: byte × byte → (low, high) with
low + 256·high = a·b. It slots into the existing Bytes2 preprocessed
256×256 table as two extra columns plus a lookup channel, so its circuit
cost class matches u8_add (one lookup, two auxiliaries).

The op is wired through the Rust prover (bytecode, Bytes2 gadget,
constraints, trace, execute, FFI) and every Lean Aiur stage (Source
through Bytecode), the compiler, the semantics, and Meta syntax. The
FFI constructor tag 18 is u8Mul; subsequent Op tags shift by one.

Verified via the aiur and aiur-cross suites: prove/verify passes for
45·131 = (7, 23) and 255·255 = (1, 254).

`perf(kernel): use u8_mul gadget in u64_mul, shrink divmod_256`

u64_mul's byte schoolbook now splits each byte product via the u8_mul
gadget instead of a field multiply, so each column accumulator is a sum
of bytes (< ~4096) rather than a sum of products (< ~520k).
divmod_256 therefore only carry-propagates small values.

Effect on Nat.mul 1000000 1000000:

circuit	before	after
`divmod_256`	w55 h86 = 30396	w55 h8 = 1320
`u64_mul`	w186 h1 = 186	w506 h1 = 506
total FFT	83,951,409	83,922,653

divmod_256 trace height drops 86 → 8. u64_mul widens (186 → 506) from
the u8_mul lookups — a fixed +320 — so the net is a win for any
non-trivial multiplication; only trivial single-byte products (e.g.
6·7) pay the width without an offsetting divmod_256 saving. Larger
operands win by more: the old divmod_256 height scaled with operand
size (~1000+ rows for 32-bit-sized factors) while the new one stays
bounded near 16.

Adds IxVMPrim.nat_mul_big (Nat.mul 1000000 1000000) as a multi-byte
mul check target.

Testing

lake test -- --ignored ixvm — passes (kernel arena, incl. natMul*,
natPow*).
lake test -- aiur / aiur-cross — passes, including the new u8_mul
prove/verify cases.
Rust (cargo build) and full Lean (lake build) build clean.

- Check.lean safe_refs_only: replace `match u { 0=>1, 1=>0 }` with `1 - is_unsafe_ci(ci)` — pure arithmetic, drops the match's selectors. - ByteStream.u64_add now returns (U64, G), exposing the final carry-out; removes the kernel's duplicate u64_add_with_carry. klimbs_succ / klimbs_add_carry / klimbs_mul_single call u64_add directly. u64_add had no prior callers, so no ripple to Blake3 / IxonSerialize.

New `u8_mul(a, b)` Aiur primitive: byte * byte -> (low, high), with low + 256*high = a*b. Slots into the existing Bytes2 preprocessed 256x256 table as two extra columns plus a lookup channel, so its cost class matches u8_add (one lookup, two auxiliaries). Wires the op through the Rust prover (bytecode, bytes2 gadget, constraints, trace, execute, FFI) and every Lean Aiur stage (Source through Bytecode), compiler, semantics, and Meta syntax. FFI ctor tag 18 is u8Mul; subsequent Op tags shift by one. Verified via the aiur and aiur-cross suites: prove/verify passes for 45*131 = (7, 23) and 255*255 = (1, 254).

u64_mul's byte schoolbook now splits each byte product via the u8_mul gadget instead of a field mul, so column accumulators are sums of bytes (< ~4096) rather than sums of products (< ~520k). divmod_256 therefore carry-propagates only small values: for Nat.mul 1000000 1000000 its trace height drops 86 -> 8 (FFT -29k). u64_mul itself widens (186 -> 506) from the u8_mul lookups -- a fixed +320 -- so the net is a win for any non-trivial multiplication. Adds IxVMPrim.nat_mul_big as a multi-byte mul check target.

arthurpaulino added 3 commits May 15, 2026 08:52

arthurpaulino requested a review from gabriel-barrett May 15, 2026 19:38

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Aiur kernel: `u8_mul` gadget + multiplication/arithmetic cleanups#410

Aiur kernel: `u8_mul` gadget + multiplication/arithmetic cleanups#410
arthurpaulino wants to merge 3 commits into
mainfrom
ap/kernel-opts

arthurpaulino commented May 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

arthurpaulino commented May 15, 2026

Commits

perf(kernel): boolean-inversion opt + dedup u64_add

feat(aiur): add u8_mul gadget

perf(kernel): use u8_mul gadget in u64_mul, shrink divmod_256

Testing

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

`perf(kernel): boolean-inversion opt + dedup u64_add`

`feat(aiur): add u8_mul gadget`

`perf(kernel): use u8_mul gadget in u64_mul, shrink divmod_256`