Skip to content

Aiur kernel: u8_mul gadget + multiplication/arithmetic cleanups#410

Open
arthurpaulino wants to merge 3 commits into
mainfrom
ap/kernel-opts
Open

Aiur kernel: u8_mul gadget + multiplication/arithmetic cleanups#410
arthurpaulino wants to merge 3 commits into
mainfrom
ap/kernel-opts

Conversation

@arthurpaulino
Copy link
Copy Markdown
Member

Optimization pass over the Aiur kernel. Adds a u8_mul Aiur gadget, routes
the kernel's primitive Nat multiplication through it, and cleans up two
small arithmetic redundancies.

Commits

perf(kernel): boolean-inversion opt + dedup u64_add

  • Check.lean::safe_refs_only: replace match u { 0 => 1, 1 => 0 } with
    1 - is_unsafe_ci(ci) — pure field arithmetic, drops the match's
    selectors and branch blocks.
  • ByteStream.u64_add now returns (U64, G), exposing the final
    carry-out, and the kernel's duplicate u64_add_with_carry is removed.
    klimbs_succ / klimbs_add_carry / klimbs_mul_single call u64_add
    directly. u64_add had no prior callers, so nothing else is affected.

feat(aiur): add u8_mul gadget

New u8_mul(a, b) Aiur primitive: byte × byte → (low, high) with
low + 256·high = a·b. It slots into the existing Bytes2 preprocessed
256×256 table as two extra columns plus a lookup channel, so its circuit
cost class matches u8_add (one lookup, two auxiliaries).

The op is wired through the Rust prover (bytecode, Bytes2 gadget,
constraints, trace, execute, FFI) and every Lean Aiur stage (Source
through Bytecode), the compiler, the semantics, and Meta syntax. The
FFI constructor tag 18 is u8Mul; subsequent Op tags shift by one.

Verified via the aiur and aiur-cross suites: prove/verify passes for
45·131 = (7, 23) and 255·255 = (1, 254).

perf(kernel): use u8_mul gadget in u64_mul, shrink divmod_256

u64_mul's byte schoolbook now splits each byte product via the u8_mul
gadget instead of a field multiply, so each column accumulator is a sum
of bytes (< ~4096) rather than a sum of products (< ~520k).
divmod_256 therefore only carry-propagates small values.

Effect on Nat.mul 1000000 1000000:

circuit before after
divmod_256 w55 h86 = 30396 w55 h8 = 1320
u64_mul w186 h1 = 186 w506 h1 = 506
total FFT 83,951,409 83,922,653

divmod_256 trace height drops 86 → 8. u64_mul widens (186 → 506) from
the u8_mul lookups — a fixed +320 — so the net is a win for any
non-trivial multiplication; only trivial single-byte products (e.g.
6·7) pay the width without an offsetting divmod_256 saving. Larger
operands win by more: the old divmod_256 height scaled with operand
size (~1000+ rows for 32-bit-sized factors) while the new one stays
bounded near 16.

Adds IxVMPrim.nat_mul_big (Nat.mul 1000000 1000000) as a multi-byte
mul check target.

Testing

  • lake test -- --ignored ixvm — passes (kernel arena, incl. natMul*,
    natPow*).
  • lake test -- aiur / aiur-cross — passes, including the new u8_mul
    prove/verify cases.
  • Rust (cargo build) and full Lean (lake build) build clean.

- Check.lean safe_refs_only: replace `match u { 0=>1, 1=>0 }` with
  `1 - is_unsafe_ci(ci)` — pure arithmetic, drops the match's selectors.
- ByteStream.u64_add now returns (U64, G), exposing the final carry-out;
  removes the kernel's duplicate u64_add_with_carry. klimbs_succ /
  klimbs_add_carry / klimbs_mul_single call u64_add directly. u64_add
  had no prior callers, so no ripple to Blake3 / IxonSerialize.
New `u8_mul(a, b)` Aiur primitive: byte * byte -> (low, high), with
low + 256*high = a*b. Slots into the existing Bytes2 preprocessed
256x256 table as two extra columns plus a lookup channel, so its cost
class matches u8_add (one lookup, two auxiliaries).

Wires the op through the Rust prover (bytecode, bytes2 gadget,
constraints, trace, execute, FFI) and every Lean Aiur stage (Source
through Bytecode), compiler, semantics, and Meta syntax. FFI ctor tag
18 is u8Mul; subsequent Op tags shift by one.

Verified via the aiur and aiur-cross suites: prove/verify passes for
45*131 = (7, 23) and 255*255 = (1, 254).
u64_mul's byte schoolbook now splits each byte product via the u8_mul
gadget instead of a field mul, so column accumulators are sums of
bytes (< ~4096) rather than sums of products (< ~520k). divmod_256
therefore carry-propagates only small values: for Nat.mul 1000000
1000000 its trace height drops 86 -> 8 (FFT -29k). u64_mul itself
widens (186 -> 506) from the u8_mul lookups -- a fixed +320 -- so the
net is a win for any non-trivial multiplication.

Adds IxVMPrim.nat_mul_big as a multi-byte mul check target.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant