Skip to content

Commit 4fb0110

Browse files
matzclaude
andcommitted
fix(codegen): empty {} arg widens to body-driven hash variant (#397)
Fixes #397. `def store(receiver); receiver["k"] = "v"; end; store({})` failed C-compile: `{}` defaulted to `str_int_hash`, the param widened from the caller to that, and the body's String store emitted `sp_StrIntHash_set(..., (const char *))` -- type mismatch into the int slot. Three coordinated changes: 1. `narrow_param_hash_types_from_body_writes` — new iterative-loop pass that scans method bodies for `lv_<param>[k] = v` writes, unifies the observed key/value types via compose_hash_type, and widens the param's hash variant when the observation is more specific than the current default. 2. `empty_hash_coerce` in `compile_call_args_with_defaults` — when a positional arg is an empty `{}` literal AND the receiving param is a hash variant, emit the matching `sp_<HashType>_new()` instead of falling through to `compile_hash_literal`'s default `sp_StrIntHash_new()`. 3. `detect_poly_params` carve-out — empty `{}` literal is compatible with any concrete hash variant, mirroring the existing empty `[]` carve-out. Without it, the body-widened `ct` and the call-site `at` (still `str_int_hash`) disagree and the param falls back to poly. The body-driven inference covers str_str / str_poly / sym_int / sym_str / sym_poly / int_str / poly_poly variants symmetric to the existing str_int default. compose_hash_type also flips the right `@needs_<hash>_hash` flag so the runtime helpers / typedefs get emitted for the widened variant. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 1d6d5d6 commit 4fb0110

3 files changed

Lines changed: 365 additions & 1 deletion

File tree

spinel_codegen.rb

Lines changed: 315 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -12242,6 +12242,286 @@ def infer_param_array_type_from_body
1224212242
# the common Object/Enumerable surface (`length`, `each`, `to_s`,
1224312243
# ...) — when these match a user class they don't actually
1224412244
# discriminate from a built-in.
12245+
# Issue #397. A method whose param starts as `str_int_hash` (the
12246+
# `{}` default) but whose body writes a non-int value to it
12247+
# (`receiver["k"] = "v"`) ended up emitting `sp_StrIntHash_set(...,
12248+
# const char *)` — a `mrb_int` slot getting a string, fails C-compile.
12249+
# Fix: scan the body for `lv_<param>[k] = v` writes, infer the right
12250+
# hash variant from the observed key/value types, and widen the
12251+
# param's hash type. Compile_call_args_with_defaults then coerces
12252+
# the caller's empty `{}` literal to match the widened param type.
12253+
def narrow_param_hash_types_from_body_writes
12254+
# Top-level methods.
12255+
mi = 0
12256+
while mi < @meth_names.length
12257+
bid_h = @meth_body_ids[mi]
12258+
if bid_h >= 0
12259+
pnames = @meth_param_names[mi].split(",")
12260+
ptypes = @meth_param_types[mi].split(",")
12261+
changed = 0
12262+
pk = 0
12263+
while pk < pnames.length
12264+
if pk < ptypes.length && is_hash_type(ptypes[pk]) == 1
12265+
new_t = infer_param_hash_from_writes(bid_h, pnames[pk], ptypes[pk])
12266+
if new_t != "" && new_t != ptypes[pk]
12267+
ptypes[pk] = new_t
12268+
changed = 1
12269+
end
12270+
end
12271+
pk = pk + 1
12272+
end
12273+
if changed == 1
12274+
@meth_param_types[mi] = ptypes.join(",")
12275+
end
12276+
end
12277+
mi = mi + 1
12278+
end
12279+
# Class instance methods.
12280+
ci = 0
12281+
while ci < @cls_names.length
12282+
all_params = @cls_meth_params[ci].split("|")
12283+
all_ptypes = @cls_meth_ptypes[ci].split("|")
12284+
bodies = @cls_meth_bodies[ci].split(";")
12285+
cls_changed = 0
12286+
mj = 0
12287+
while mj < all_params.length
12288+
bid_c = -1
12289+
if mj < bodies.length
12290+
bid_c = bodies[mj].to_i
12291+
end
12292+
if bid_c >= 0
12293+
cm_pnames = all_params[mj].split(",")
12294+
cm_ptypes = "".split(",")
12295+
if mj < all_ptypes.length
12296+
cm_ptypes = all_ptypes[mj].split(",")
12297+
end
12298+
m_changed = 0
12299+
pk2 = 0
12300+
while pk2 < cm_pnames.length
12301+
if pk2 < cm_ptypes.length && is_hash_type(cm_ptypes[pk2]) == 1
12302+
new_t = infer_param_hash_from_writes(bid_c, cm_pnames[pk2], cm_ptypes[pk2])
12303+
if new_t != "" && new_t != cm_ptypes[pk2]
12304+
cm_ptypes[pk2] = new_t
12305+
m_changed = 1
12306+
end
12307+
end
12308+
pk2 = pk2 + 1
12309+
end
12310+
if m_changed == 1
12311+
all_ptypes[mj] = cm_ptypes.join(",")
12312+
cls_changed = 1
12313+
end
12314+
end
12315+
mj = mj + 1
12316+
end
12317+
if cls_changed == 1
12318+
@cls_meth_ptypes[ci] = all_ptypes.join("|")
12319+
end
12320+
ci = ci + 1
12321+
end
12322+
end
12323+
12324+
# Walk the method body looking for `lv_<pname>[k] = v` writes
12325+
# (CallNode `[]=` with recv = LocalVariableReadNode named pname).
12326+
# Returns a more-specific hash type than `cur` based on the
12327+
# observed key + value types, or "" if there's no widening needed
12328+
# (or if the observed types are inconsistent).
12329+
def infer_param_hash_from_writes(nid, pname, cur)
12330+
if nid < 0
12331+
return ""
12332+
end
12333+
# Track all observed value types via a single string accumulator.
12334+
val_types = "".split(",")
12335+
key_types = "".split(",")
12336+
collect_param_hash_writes(nid, pname, val_types, key_types)
12337+
if val_types.length == 0
12338+
return ""
12339+
end
12340+
# Decide hash variant from the union of observed types. If any
12341+
# observed value type doesn't fit `cur`'s value slot, widen.
12342+
cur_kt = hash_key_part(cur)
12343+
cur_vt = hash_value_part(cur)
12344+
new_vt = unify_hash_value_types(val_types)
12345+
new_kt = unify_hash_key_types(key_types, cur_kt)
12346+
return compose_hash_type(new_kt, new_vt)
12347+
end
12348+
12349+
def collect_param_hash_writes(nid, pname, val_types, key_types)
12350+
if nid < 0
12351+
return
12352+
end
12353+
if @nd_type[nid] == "CallNode" && @nd_name[nid] == "[]="
12354+
r = @nd_receiver[nid]
12355+
if r >= 0 && @nd_type[r] == "LocalVariableReadNode" && @nd_name[r] == pname
12356+
args_id_h = @nd_arguments[nid]
12357+
if args_id_h >= 0
12358+
args = get_args(args_id_h)
12359+
if args.length >= 2
12360+
kt = infer_type(args[0])
12361+
vt = infer_type(args[args.length - 1])
12362+
if not_in(kt, key_types) == 1
12363+
key_types.push(kt)
12364+
end
12365+
if not_in(vt, val_types) == 1
12366+
val_types.push(vt)
12367+
end
12368+
end
12369+
end
12370+
end
12371+
end
12372+
cs = []
12373+
push_child_ids(nid, cs)
12374+
k = 0
12375+
while k < cs.length
12376+
collect_param_hash_writes(cs[k], pname, val_types, key_types)
12377+
k = k + 1
12378+
end
12379+
end
12380+
12381+
def hash_key_part(t)
12382+
if t == "str_int_hash" || t == "str_str_hash" || t == "str_poly_hash"
12383+
return "str"
12384+
end
12385+
if t == "sym_int_hash" || t == "sym_str_hash" || t == "sym_poly_hash"
12386+
return "sym"
12387+
end
12388+
if t == "int_str_hash"
12389+
return "int"
12390+
end
12391+
if t == "poly_poly_hash"
12392+
return "poly"
12393+
end
12394+
"str"
12395+
end
12396+
12397+
def hash_value_part(t)
12398+
if t == "str_int_hash" || t == "sym_int_hash"
12399+
return "int"
12400+
end
12401+
if t == "str_str_hash" || t == "sym_str_hash" || t == "int_str_hash"
12402+
return "str"
12403+
end
12404+
if t == "str_poly_hash" || t == "sym_poly_hash" || t == "poly_poly_hash"
12405+
return "poly"
12406+
end
12407+
"int"
12408+
end
12409+
12410+
def unify_hash_value_types(vts)
12411+
if vts.length == 0
12412+
return "int"
12413+
end
12414+
if vts.length == 1
12415+
v = vts[0]
12416+
return "int" if v == "int" || v == "bool" || v == "nil"
12417+
return "str" if v == "string"
12418+
return "poly"
12419+
end
12420+
# Multiple distinct types -> poly.
12421+
"poly"
12422+
end
12423+
12424+
def unify_hash_key_types(kts, cur_kt)
12425+
if kts.length == 0
12426+
return cur_kt
12427+
end
12428+
if kts.length == 1
12429+
k = kts[0]
12430+
return "str" if k == "string"
12431+
return "sym" if k == "symbol"
12432+
return cur_kt
12433+
end
12434+
"poly"
12435+
end
12436+
12437+
# Issue #397 helper: when positional arg `nid` is an empty `{}` and
12438+
# the receiving param's expected type `pt` is a hash variant, emit
12439+
# the matching `sp_<HashType>_new()`. Returns "" if the arg isn't an
12440+
# empty hash literal or the param isn't a hash type (caller falls
12441+
# back to the regular compile_expr path).
12442+
def empty_hash_coerce(nid, pt)
12443+
if nid < 0
12444+
return ""
12445+
end
12446+
if @nd_type[nid] != "HashNode"
12447+
return ""
12448+
end
12449+
elems = parse_id_list(@nd_elements[nid])
12450+
if elems.length != 0
12451+
return ""
12452+
end
12453+
if is_hash_type(pt) == 0
12454+
return ""
12455+
end
12456+
@needs_gc = 1
12457+
if pt == "str_int_hash"
12458+
@needs_str_int_hash = 1
12459+
return "sp_StrIntHash_new()"
12460+
end
12461+
if pt == "str_str_hash"
12462+
@needs_str_str_hash = 1
12463+
return "sp_StrStrHash_new()"
12464+
end
12465+
if pt == "str_poly_hash"
12466+
@needs_str_poly_hash = 1
12467+
return "sp_StrPolyHash_new()"
12468+
end
12469+
if pt == "sym_int_hash"
12470+
@needs_sym_int_hash = 1
12471+
return "sp_SymIntHash_new()"
12472+
end
12473+
if pt == "sym_str_hash"
12474+
@needs_sym_str_hash = 1
12475+
return "sp_SymStrHash_new()"
12476+
end
12477+
if pt == "sym_poly_hash"
12478+
@needs_sym_poly_hash = 1
12479+
return "sp_SymPolyHash_new()"
12480+
end
12481+
if pt == "int_str_hash"
12482+
@needs_int_str_hash = 1
12483+
return "sp_IntStrHash_new()"
12484+
end
12485+
if pt == "poly_poly_hash"
12486+
@needs_poly_poly_hash = 1
12487+
return "sp_PolyPolyHash_new()"
12488+
end
12489+
""
12490+
end
12491+
12492+
def compose_hash_type(kt, vt)
12493+
if kt == "str" && vt == "int"
12494+
@needs_str_int_hash = 1
12495+
return "str_int_hash"
12496+
end
12497+
if kt == "str" && vt == "str"
12498+
@needs_str_str_hash = 1
12499+
return "str_str_hash"
12500+
end
12501+
if kt == "str" && vt == "poly"
12502+
@needs_str_poly_hash = 1
12503+
return "str_poly_hash"
12504+
end
12505+
if kt == "sym" && vt == "int"
12506+
@needs_sym_int_hash = 1
12507+
return "sym_int_hash"
12508+
end
12509+
if kt == "sym" && vt == "str"
12510+
@needs_sym_str_hash = 1
12511+
return "sym_str_hash"
12512+
end
12513+
if kt == "sym" && vt == "poly"
12514+
@needs_sym_poly_hash = 1
12515+
return "sym_poly_hash"
12516+
end
12517+
if kt == "int" && vt == "str"
12518+
@needs_int_str_hash = 1
12519+
return "int_str_hash"
12520+
end
12521+
@needs_poly_poly_hash = 1
12522+
return "poly_poly_hash"
12523+
end
12524+
1224512525
def narrow_param_types_from_body_method_calls
1224612526
# Top-level methods.
1224712527
mi = 0
@@ -15213,6 +15493,19 @@ def detect_poly_in_node(nid)
1521315493
next
1521415494
end
1521515495
end
15496+
# Issue #397: empty `{}` literal is compatible with
15497+
# any concrete hash variant the body widened the
15498+
# param to. Without this, the call site's `at`
15499+
# (`str_int_hash`, the empty-hash default) and the
15500+
# body-widened `ct` (e.g. `str_str_hash`) disagree,
15501+
# the param falls back to poly, and the call site
15502+
# ends up boxing the hash through poly dispatch.
15503+
if is_empty_hash_literal(arg_ids[k]) == 1
15504+
if is_hash_type(ct) == 1
15505+
k = k + 1
15506+
next
15507+
end
15508+
end
1521615509
if ct != at
1521715510
if ct != "poly"
1521815511
# Only mark as poly if both types are meaningful
@@ -15853,6 +16146,7 @@ def compile
1585316146
infer_ivar_types_from_writers
1585416147
infer_param_array_type_from_body
1585516148
narrow_param_types_from_body_method_calls
16149+
narrow_param_hash_types_from_body_writes
1585616150
detect_poly_params
1585716151
cur_sig = inference_signature
1585816152
if cur_sig == prev_sig
@@ -30191,7 +30485,18 @@ def compile_call_args_splat(nid, mi, pnames, ptypes, defaults, kw_names, kw_vals
3019130485
if pt == "poly"
3019230486
result = result + box_expr_to_poly(positional_ids[k])
3019330487
else
30194-
result = result + compile_expr(positional_ids[k])
30488+
# Issue #397: empty `{}` literal as arg gets coerced to the
30489+
# param's expected hash variant (which may have been widened
30490+
# by narrow_param_hash_types_from_body_writes from the
30491+
# caller-side default `str_int_hash`). Without this, the
30492+
# caller would emit `sp_StrIntHash_new()` and the callee's
30493+
# signature `sp_StrStrHash *` would type-mismatch.
30494+
coerced = empty_hash_coerce(positional_ids[k], pt)
30495+
if coerced != ""
30496+
result = result + coerced
30497+
else
30498+
result = result + compile_expr(positional_ids[k])
30499+
end
3019530500
end
3019630501
k = k + 1
3019730502
next
@@ -30467,6 +30772,15 @@ def compile_call_args_with_defaults(nid, mi, omit_trailing = 0)
3046730772
next
3046830773
end
3046930774
end
30775+
# Issue #397: empty `{}` literal -> coerce to the param's
30776+
# widened hash variant (e.g. body promoted str_int_hash
30777+
# to str_str_hash via narrow_param_hash_types_from_body_writes).
30778+
coerced_h = empty_hash_coerce(positional_ids[k], ptypes[k])
30779+
if coerced_h != ""
30780+
result = result + coerced_h
30781+
k = k + 1
30782+
next
30783+
end
3047030784
# When the param expects `sp_<C> *` but the arg's static
3047130785
# type is `int` (typically a local catching a method
3047230786
# whose return widened to mrb_int because the union

test/empty_hash_arg_param_widen.rb

Lines changed: 44 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,44 @@
1+
# Issue #397. `def store(receiver); receiver["k"] = "v"; end; store({})`
2+
# previously emitted `sp_StrIntHash_set(receiver, "k", "v")` -- a const
3+
# char* into the int-valued slot of `sp_StrIntHash`, fails C-compile.
4+
# `{}` literal defaulted to str_int_hash, the param widened to that
5+
# from the caller, and the body's String store didn't fit.
6+
#
7+
# Fix: narrow_param_hash_types_from_body_writes scans the body for
8+
# `lv_<param>[k] = v` writes and widens the param's hash variant from
9+
# the observed value type. Then compile_call_args_with_defaults's new
10+
# empty_hash_coerce arm rewrites the caller's empty `{}` to emit the
11+
# matching `sp_<HashType>_new()`. detect_poly_params also gets an
12+
# empty-hash carve-out so it doesn't fold the body-widened concrete
13+
# type back to poly.
14+
15+
def store_str(receiver)
16+
receiver["k"] = "v"
17+
receiver["k2"] = "v2"
18+
receiver
19+
end
20+
21+
h = store_str({})
22+
puts h["k"] # v
23+
puts h["k2"] # v2
24+
25+
def store_sym_int(h)
26+
h[:a] = 1
27+
h[:b] = 2
28+
h
29+
end
30+
31+
s = store_sym_int({})
32+
puts s[:a].to_s # 1
33+
puts s[:b].to_s # 2
34+
35+
# Mixed values widens to str_poly_hash.
36+
def store_mixed(r)
37+
r["i"] = 7
38+
r["s"] = "x"
39+
r
40+
end
41+
42+
m = store_mixed({})
43+
puts m["i"].to_s # 7
44+
puts m["s"] # x

0 commit comments

Comments
 (0)