fix: speed up splitLimitR long delimiter search by He-Pin · Pull Request #1063 · databricks/sjsonnet

He-Pin · 2026-06-28T10:16:40Z

Motivation

std.splitLimitR can spend too much time searching for long multi-character delimiters with repeated prefixes. On no-match inputs, such as a long run of "a" characters searched with the delimiter "aaaaaaaaab", the search re-checks the same characters at every candidate position.
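To make the cost concrete, here is a minimal Python sketch of a naive right-to-left search that counts character comparisons. It is an illustration only, not sjsonnet's Scala code; the function name `naive_rfind_comparisons` and the input length of 1000 are assumptions chosen for the example.

```python
def naive_rfind_comparisons(text, delim):
    """Naive right-to-left search.

    Returns (start index of the last occurrence or -1, comparisons made).
    Hypothetical helper for illustration; not taken from sjsonnet.
    """
    n, m = len(text), len(delim)
    comparisons = 0
    # Try every candidate start position, from the rightmost to the leftmost.
    for start in range(n - m, -1, -1):
        j = 0
        while j < m:
            comparisons += 1
            if text[start + j] != delim[j]:
                break
            j += 1
        if j == m:
            return start, comparisons
    return -1, comparisons


if __name__ == "__main__":
    text = "a" * 1000          # assumed input size, for illustration
    delim = "aaaaaaaaab"       # delimiter from the PR description
    index, comparisons = naive_rfind_comparisons(text, delim)
    # Every candidate matches the whole run of "a" and fails only on the
    # final "b", so the total is (N - M + 1) * M comparisons.
    print(index, comparisons, (len(text) - len(delim) + 1) * len(delim))
```

Each of the N - M + 1 candidate positions re-reads almost the entire delimiter before failing, which is the O(N*M) behavior the PR targets.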

Modification

Keep the existing direct search path for common short delimiter cases.
Add a bounded direct-probe phase for multi-character delimiters.
Fall back to reverse KMP for worst-case long-delimiter searches.
Add edge-case coverage and a regression bench fixture for the long-delimiter no-match case.
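The steps above can be sketched in Python as follows. This is a hedged illustration of the general technique (bounded direct probing, then a reverse KMP scan), not the PR's Scala implementation. The names `build_failure`, `rfind_kmp`, `rfind_bounded`, and `split_limit_r`, the probe budget of 64 comparisons, and the error raised for an empty delimiter are all assumptions made for this example.

```python
def build_failure(pattern):
    """Standard KMP failure table: longest proper border of each prefix."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def rfind_kmp(text, delim, end):
    """Largest start with text[start:start+m] == delim and start+m <= end, else -1.

    Scans text right to left while matching the reversed delimiter, so each
    text character is read once: O(N + M) in the worst case.
    """
    m = len(delim)
    if m == 0 or m > end:
        return -1
    rev = delim[::-1]
    fail = build_failure(rev)
    k = 0
    for i in range(end - 1, -1, -1):
        while k > 0 and text[i] != rev[k]:
            k = fail[k - 1]
        if text[i] == rev[k]:
            k += 1
        if k == m:
            return i
    return -1


def rfind_bounded(text, delim, end, budget=64):
    """Probe directly until the (assumed) comparison budget runs out, then use KMP."""
    m = len(delim)
    if m == 0 or m > end:
        return -1  # the delimiter cannot fit: skip all setup
    start = end - m
    while start >= 0:
        j = 0
        while j < m and text[start + j] == delim[j]:
            j += 1
        if j == m:
            return start
        budget -= j + 1
        if budget <= 0:
            # Remaining candidates start at or before start - 1.
            return rfind_kmp(text, delim, start - 1 + m)
        start -= 1
    return -1


def split_limit_r(text, delim, maxsplits):
    """Split from the right at most maxsplits times; -1 means no limit."""
    if not delim:
        raise ValueError("empty delimiter")  # assumption for this sketch
    parts = []
    end = len(text)
    while maxsplits != 0:  # maxsplits == 0 returns immediately
        idx = rfind_bounded(text, delim, end)
        if idx < 0:
            break
        parts.append(text[idx + len(delim):end])
        end = idx
        maxsplits -= 1
    parts.append(text[:end])
    parts.reverse()
    return parts
```

In this sketch, `split_limit_r("/_foo/_bar", "/_", 1)` returns `["/_foo", "bar"]`. A long run of "a" searched with "aaaaaaaaab" exhausts the probe budget after a few candidates and finishes in a single reverse KMP pass. The budget keeps the common short-match case free of failure-table setup, which is the trade-off the list above describes.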

Result

| Case | go-jsonnet v0.22.0 | jrsonnet 0.5.0-pre99 | sjsonnet before | sjsonnet after |
| --- | --- | --- | --- | --- |
| `std.length(std.splitLimitR(importstr "long-a.txt", "aaaaaaaaab", 1))` | `1` | `1` | `1`, slower repeated probing | `1`, reverse-KMP fallback |
| Long no-match delimiter search | correct result | correct result | worst case approaches `O(N*M)` per lookup | worst case `O(N+M)` per lookup |

The PR keeps the observable split result unchanged and improves the pathological search path.

Risks

The new fallback adds algorithmic complexity to std.splitLimitR.
Unusual delimiter patterns should remain covered by the direct-probe path and regression fixture.

Motivation: std.splitLimitR scanned backward with repeated prefix comparisons, so long delimiters with shared prefixes could degrade to O(N*M) per lookup and pose a release-regression risk versus 0.6.3.
Modification: Add a bounded direct-probe fast path plus a reverse-KMP fallback for multi-character right splits, keep the zero-split fast path O(1), and add regression coverage and bench data.
Result: The new split_limit_r_long_delim JMH case improves from 12.636 ms/op on master and 5.087 ms/op on 0.6.3 to 1.522 ms/op. The guardrail reverse-delimiter case stays at parity with master.

Motivation: std.splitLimitR should avoid unnecessary reverse-KMP setup when a delimiter cannot fit, and the optimized path needs direct coverage for the KMP fallback.
Modification: Return the original string before bounded/KMP setup when the delimiter is longer than the input. Add splitLimitR tests for the long-delimiter case, the unbounded maxsplits path, and a case that forces the KMP fallback after naive probes.
Result: Preserves splitLimitR semantics while removing avoidable work and pinning the optimized fallback paths with tests.

He-Pin force-pushed the hepin/perf-split-limit-r-kmp branch from 96c032e to 695f808 June 28, 2026 21:48

He-Pin marked this pull request as draft June 28, 2026 22:20

He-Pin marked this pull request as ready for review June 28, 2026 22:24

He-Pin marked this pull request as draft June 28, 2026 22:45

He-Pin wants to merge 2 commits into databricks:master from He-Pin:hepin/perf-split-limit-r-kmp
