Skip to content

Fix recursive deepJoin and flattenDeepArray traversal hangs#1041

Draft
He-Pin wants to merge 2 commits into
databricks:masterfrom
He-Pin:fix/deepjoin-cycle
Draft

Fix recursive deepJoin and flattenDeepArray traversal hangs#1041
He-Pin wants to merge 2 commits into
databricks:masterfrom
He-Pin:fix/deepjoin-cycle

Conversation

@He-Pin

@He-Pin He-Pin commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

Motivation

std.deepJoin and std.flattenDeepArray can hang on recursive arrays because the old traversal recursed without tracking the active array path. They should terminate with a normal builtin error instead of looping indefinitely.

Modification

  • Add a shared iterative deep-array traversal helper with explicit stack frames.
  • Track the active array path by identity so recursive arrays are reported deterministically.
  • Keep fast handling for flat arrays and direct backing-array access for materialized arrays.
  • Add regression tests for recursive and shared-array traversal cases.

Result

Case go-jsonnet v0.22.0 jrsonnet 0.5.0-pre99 sjsonnet before sjsonnet after
std.deepJoin(local a = [a]; a) max stack frames exceeded stack overflow / recursion failure hangs / exceeds timeout [std.deepJoin] max stack frames exceeded
std.flattenDeepArray(local a = [a]; a) max stack frames exceeded stack overflow / recursion failure hangs / exceeds timeout [std.flattenDeepArray] max stack frames exceeded

Flat and acyclic nested arrays still produce the same values as before.

Risks

  • The traversal code is shared by two builtins, so regressions could affect both std.deepJoin and std.flattenDeepArray.
  • Recursive arrays now fail earlier with a deterministic error instead of consuming time until the process is interrupted.

@He-Pin He-Pin force-pushed the fix/deepjoin-cycle branch 2 times, most recently from 951e137 to 972f73f Compare June 25, 2026 05:33
@He-Pin He-Pin marked this pull request as draft June 25, 2026 18:09
@He-Pin He-Pin force-pushed the fix/deepjoin-cycle branch from 22cd3c5 to 1779389 Compare June 25, 2026 19:04
@He-Pin He-Pin marked this pull request as ready for review June 25, 2026 19:08
@He-Pin He-Pin force-pushed the fix/deepjoin-cycle branch 2 times, most recently from bca8902 to 705ad78 Compare June 27, 2026 02:41
He-Pin added a commit to He-Pin/sjsonnet that referenced this pull request Jun 28, 2026
Motivation:
PR databricks#1041 adds active-path tracking for deepJoin and flattenDeepArray to stop recursive array hangs. Flat root arrays do not need that guard, but the rebased implementation still allocated traversal state for them.

Modification:
Scan root arrays directly until the first nested array. Only allocate TraversalState and DeepArrayActiveSet after nested traversal is actually needed. Also stop catching VM-level fatal errors in this traversal path.

Result:
Keeps recursive-array regression tests passing while reducing flat flatten/deepJoin traversal overhead in debug-stats and hyperfine checks.
@He-Pin He-Pin force-pushed the fix/deepjoin-cycle branch from 705ad78 to 3a0faab Compare June 28, 2026 11:18
He-Pin added a commit to He-Pin/sjsonnet that referenced this pull request Jun 28, 2026
Motivation:
PR databricks#1041 added active-path tracking to make std.deepJoin terminate on recursive arrays. The nested acyclic workload std.makeArray(..., ["x"]) exposed a fixed cost from allocating traversal state and array frames for flat child arrays.

Modification:
Scan flat nested deepJoin arrays directly before entering traversal state. The root-level path now keeps scanning when child arrays contain only strings, and only allocates TraversalState after deeper array traversal is actually needed. Add a nested recursive-array regression test for that fast path.

Result:
Restores the nested deepJoin benchmark from a regression versus master to a small win while preserving recursive-array error behavior.
He-Pin added a commit to He-Pin/sjsonnet that referenced this pull request Jun 28, 2026
Motivation:

Recursive arrays in std.deepJoin and std.flattenDeepArray could hang, and the optimized nested deepJoin path left an unused private helper that failed Scala 2.12 compilation.

Modification:

Add cross-platform active-array tracking for deep array traversals, route flattenDeepArray/deepJoin through the shared traversal helper, preserve flat fast paths, and remove the stale unused nested deepJoin helper.

Result:

Recursive deep array traversal now fails fast, shared acyclic arrays still evaluate, and the PR CI matrix compiles/tests locally except Graal, which the user explicitly skipped.

References:

PR databricks#1041
@He-Pin He-Pin force-pushed the fix/deepjoin-cycle branch from d876e9f to 494b969 Compare June 28, 2026 16:05
@He-Pin He-Pin marked this pull request as draft June 28, 2026 16:20
@He-Pin He-Pin marked this pull request as ready for review June 28, 2026 17:13
@He-Pin He-Pin marked this pull request as draft June 28, 2026 17:14
@He-Pin He-Pin force-pushed the fix/deepjoin-cycle branch from 494b969 to 4fe29c5 Compare June 28, 2026 17:15
@He-Pin He-Pin force-pushed the fix/deepjoin-cycle branch from 4fe29c5 to 4032178 Compare June 28, 2026 21:38
Motivation:

The deep-array traversal PR accidentally removed the inline-object lookup size guard, allowing large imported objects with super lookups to fall back to repeated linear scans.

Modification:

Restore Obj.InlineScanMax and the guarded inline scan so large inline objects use the lazy lookup map. Add nested flattenDeepArray recursion coverage.

Result:

Keeps recursive deep array detection covered while avoiding the unrelated large-object lookup performance regression.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant