fix: handle ujson surrogate parse exceptions in JSON imports#1046
Closed
He-Pin wants to merge 1 commit into
Closed
fix: handle ujson surrogate parse exceptions in JSON imports#1046He-Pin wants to merge 1 commit into
He-Pin wants to merge 1 commit into
Conversation
22fe76f to
7ca9536
Compare
He-Pin
added a commit
to He-Pin/sjsonnet
that referenced
this pull request
Jun 25, 2026
Motivation: JSON imports and std.parseJson relied on ujson behavior for Unicode surrogate escapes. ujson can throw plain Exception for some malformed escapes and silently drop a lone high surrogate, so catching an exception message only fixed one symptom. Modification: Add a JsonUnicodeValidator that scans JSON string contents for invalid \uXXXX surrogate sequences before invoking ujson. Reuse it for .json import fast path and std.parseJson while preserving import fallback to normal Jsonnet parsing. Add regression coverage for lone high/low surrogates, reversed pairs, valid pairs, and escaped backslash literals. Result: Malformed surrogate escapes no longer surface as Internal Error or corrupt imported strings. Valid pairs and literal \u text continue to parse correctly across JVM and JS targeted tests. References: databricks#1046
7ca9536 to
85788f3
Compare
He-Pin
added a commit
to He-Pin/sjsonnet
that referenced
this pull request
Jun 25, 2026
Motivation: Some malformed Unicode surrogate escapes in imported .json files make ujson throw a plain java.lang.Exception, which previously escaped as an Internal Error. Avoid adding a pre-parse scan on the normal import path. Modification: Extend the .json import fast-path fallback to recognize ujson's known surrogate parse exception messages and fall back to normal Jsonnet parsing. Keep validation delegated to ujson and do not change std.parseJson. Result: Malformed surrogate escapes that ujson reports no longer leak Internal Error stack traces. Valid surrogate pairs still parse on the JSON fast path. Cases ujson silently accepts or corrupts remain a parser limitation rather than being fixed with an extra scan. References: databricks#1046
85788f3 to
15a8173
Compare
He-Pin
commented
Jun 25, 2026
1433a90 to
e8016b8
Compare
e8016b8 to
b557485
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Motivation
Importing malformed Unicode surrogate escapes from
.jsonfiles can make ujson throw a plainjava.lang.Exception. That previously escaped as an sjsonnet internal error instead of falling back to normal Jsonnet parsing and reporting a parse error.Modification
java.lang.Exceptionsurrogate parse failures in the.jsonimport fast path.Result
\uDC00as a lone low surrogateTruncated unicode surrogate pair escape sequence in string literalsyntax error: invalid string escapeInternal Errorcaused byUn-paired low surrogatesjsonnet.ParseErrorwithoutInternal Error\uDE00\uD83Das a reversed surrogate pairsyntax error: invalid string escapeInternal Errorcaused byUn-paired low surrogatesjsonnet.ParseErrorwithoutInternal ErrorValid surrogate pairs still use the JSON import fast path.
Risks