Improve the runtime performance of the bsonjs.dumps API #106

Open
maozguttman wants to merge 1 commit into mongodb-labs:main from maozguttman:improve_dumps_api_runtime_performance

maozguttman commented Jun 30, 2026

Overview

Improve the runtime performance of the bsonjs.dumps API.

Below are the results from benchmark.py, executed before and after my changes. bsonjs was compiled with GCC 14.2.0 on SLES 15 and run on Python 3.13.2.

Before my changes:

Timing: bsonjs.dumps(b)
10000 loops, best of 3: 0.05099046230316162
Timing: json_util.dumps(bson.decode(b))
10000 loops, best of 3: 0.38455578312277794
bsonjs is 7.54x faster than json_util

Timing: bsonjs.loads(j)
10000 loops, best of 3: 0.05889510316774249
Timing: bson.encode(json_util.loads(j))
10000 loops, best of 3: 0.41942713782191277
bsonjs is 7.12x faster than json_util

After my changes:

Timing: bsonjs.dumps(b)
10000 loops, best of 3: 0.024287620093673468
Timing: json_util.dumps(bson.decode(b))
10000 loops, best of 3: 0.39230786077678204
bsonjs is 16.15x faster than json_util

Timing: bsonjs.loads(j)
10000 loops, best of 3: 0.057051923125982285
Timing: bson.encode(json_util.loads(j))
10000 loops, best of 3: 0.41939733177423477
bsonjs is 7.35x faster than json_util

My internal tool processes BSON files, some of which are extremely large and are probably not representative of the BSON data that most users work with.
As a result, the performance improvements described below may be most relevant to workloads with similar characteristics.
For example, I have a BSON file with a size of 791,038,613 bytes, containing:

  • 7,876,079 dictionaries
  • 18,837 lists
  • 31,568,552 primitives

It is important to note that these BSON files rarely contain escaped or Unicode characters.

I used Copilot to identify potential areas for runtime optimization. Some of the suggested improvements provided measurable gains in this use case.
I would appreciate it if you could review my changes and, if appropriate, provide feedback, commit them, and include them in a future release.

The overall runtime of my test case was reduced from 19.50 seconds to 4.91 seconds (a 74.8% reduction).
The table below shows the runtime reduction contributed by each change:

| Index | Change | Runtime improvement (seconds) |
|-------|--------|-------------------------------|
| 1 | bson_utf8_escape_for_json optimization | 9.02 |
| 2 | Nested string copy elimination in BSON visitor | 3.86 |
| 3 | bson_string_append_printf optimization | 0.47 |
| 4 | bson_utf8_validate optimization | 0.54 |
| 5 | Eliminate Redundant strlen Calls | 0.67 |

Testing

I ran the Python test suite and added a unit test for the "bson_utf8_escape_for_json optimization".
In addition, to validate the "Nested string copy elimination in BSON visitor" changes, I locally enabled the "BSON max len limited" configuration (this change was used only for testing and is not included in the commit). I tested all possible values using sample data containing BSON objects and obtained identical results both with and without my changes.
I also tested the complete set of changes on several BSON files and obtained identical results with and without the optimizations.

Code changes

1. bson_utf8_escape_for_json optimization

Problem:
The bson_utf8_escape_for_json function is called for every string value and key. It always creates a new bson_string_t and processes escape and Unicode characters even when the string does not contain any.
This leads to unnecessary memory allocations and copying.

Fix:
Update the function to:

  • Return the original string as-is if it contains no escape or Unicode characters (the caller must not free it)
  • Allocate and return a modified string only when needed (the caller must free it)

Files changed:

  • bsonjs/bson/bson-string.c: bson_string_new_n, bson_string_new
  • bsonjs/bson/bson-string.h: bson_string_new_n
  • bsonjs/bson/bson-utf8.c
  • bsonjs/bson/bson.c: _bson_as_json_visit_utf8, _bson_as_json_visit_regex, _bson_as_json_visit_dbpointer, _bson_as_json_visit_before, _bson_as_json_visit_code, _bson_as_json_visit_symbol, _bson_as_json_visit_codewscope
  • test/test_bsonjs.py: test_dumps_escaped_and_unicode_characters
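
The fast path described in change 1 can be illustrated with a minimal sketch. This is not the PR's code: `escape_for_json` and its `must_free` out-parameter are hypothetical names, and the sketch escapes only `"`, `\` and ASCII control characters (bytes >= 0x80 are copied unchanged), whereas the real function also handles Unicode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical sketch. Returns `s` itself when nothing needs escaping
 * (*must_free = false); otherwise returns a malloc'd escaped copy
 * (*must_free = true). Returns NULL if allocation fails.
 */
static const char *escape_for_json(const char *s, size_t len, bool *must_free)
{
    size_t i, extra = 0;
    char *out, *p;

    assert(s != NULL && must_free != NULL);

    /* Pass 1: count how many extra bytes the escaped form needs. */
    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)s[i];
        if (c == '"' || c == '\\')
            extra += 1;            /* \" or \\ adds one byte */
        else if (c < 0x20)
            extra += 5;            /* \u00XX adds five bytes */
    }

    *must_free = false;
    if (extra == 0)
        return s;                  /* fast path: no allocation, no copy */

    out = malloc(len + extra + 1);
    if (out == NULL)
        return NULL;

    /* Pass 2: build the escaped copy. */
    p = out;
    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)s[i];
        if (c == '"' || c == '\\') {
            *p++ = '\\';
            *p++ = (char)c;
        } else if (c < 0x20) {
            p += sprintf(p, "\\u%04x", (unsigned)c);
        } else {
            *p++ = (char)c;
        }
    }
    *p = '\0';
    *must_free = true;
    return out;
}
```

A caller would free the result only when `must_free` is set, which is the ownership rule the two bullets above describe.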

2. Nested string copy elimination in BSON visitor

Problem:
For each nested document or array:

  • A separate bson_string_t is allocated
  • The entire child JSON is built in that buffer
  • The result is copied into the parent buffer
  • The child buffer is then freed

This causes excessive allocations and copying.

Fix:
Use a single shared bson_string_t buffer across all nesting levels.

Files changed:

  • bsonjs/bson/bson.c: _state_str_len, bson_json_state_t, _bson_as_json_visit_after, _bson_as_json_visit_codewscope, _bson_as_json_visit_document, _bson_as_json_visit_array, _bson_as_json_visit_all
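
As a rough illustration of the shared-buffer idea in change 2, the sketch below renders a toy nested structure into one growable buffer that is passed down the recursion, so no child buffer is built and copied back. The types `outbuf_t` and `node_t` and the functions `outbuf_append` and `render` are hypothetical stand-ins, not the PR's `bson_json_state_t` or visitor functions.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable output buffer shared by every nesting level. */
typedef struct { char *data; size_t len, cap; } outbuf_t;

/* Toy tree: a node with n_children == 0 is an integer leaf, otherwise an array. */
typedef struct node {
    int value;
    const struct node *children;
    size_t n_children;
} node_t;

static void outbuf_append(outbuf_t *b, const char *s, size_t n)
{
    if (b->len + n + 1 > b->cap) {
        size_t cap = b->cap ? b->cap : 64;
        char *p;
        while (cap < b->len + n + 1)
            cap *= 2;
        p = realloc(b->data, cap);
        if (p == NULL)
            abort();
        b->data = p;
        b->cap = cap;
    }
    memcpy(b->data + b->len, s, n);
    b->len += n;
    b->data[b->len] = '\0';
}

/* Children write straight into the parent's buffer; nothing is copied back. */
static void render(const node_t *n, outbuf_t *out)
{
    size_t i;

    assert(n != NULL && out != NULL);
    if (n->n_children == 0) {
        char tmp[32];
        int w = snprintf(tmp, sizeof tmp, "%d", n->value);
        outbuf_append(out, tmp, (size_t)w);
        return;
    }
    outbuf_append(out, "[", 1);
    for (i = 0; i < n->n_children; i++) {
        if (i > 0)
            outbuf_append(out, ", ", 2);
        render(&n->children[i], out);
    }
    outbuf_append(out, "]", 1);
}
```

With a per-level buffer, each byte of a deeply nested value is copied once per enclosing level; with the shared buffer it is written once.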

3. bson_string_append_printf optimization

Problem:
Each call to bson_string_append_printf:

  • Allocates a temporary buffer
  • Uses vsnprintf into it
  • Appends to the main string
  • Frees the temporary buffer

This results in frequent heap allocations.

Fix:

  • Use a stack-allocated buffer for vsnprintf
  • Append directly when the buffer is sufficient
  • Fall back to heap allocation only when necessary

Files changed:

  • bsonjs/bson/bson-string.c: bson_string_append_printf, bson_strdupv_printf, bson_strdup_printf
  • bsonjs/bson/bson-string.h: bson_strdupv_printf
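
The stack-buffer approach in change 3 might look roughly like the sketch below. `strbuf_t`, `strbuf_append`, and `strbuf_append_printf` are hypothetical names, and the 128-byte stack buffer size is an assumption chosen for illustration, not a value taken from the PR.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable string, standing in for bson_string_t. */
typedef struct { char *data; size_t len, cap; } strbuf_t;

static void strbuf_append(strbuf_t *b, const char *s, size_t n)
{
    if (b->len + n + 1 > b->cap) {
        size_t cap = b->cap ? b->cap : 64;
        char *p;
        while (cap < b->len + n + 1)
            cap *= 2;
        p = realloc(b->data, cap);
        if (p == NULL)
            abort();
        b->data = p;
        b->cap = cap;
    }
    memcpy(b->data + b->len, s, n);
    b->len += n;
    b->data[b->len] = '\0';
}

static void strbuf_append_printf(strbuf_t *b, const char *fmt, ...)
{
    char stack_buf[128];           /* assumed size; most short values fit */
    va_list args, args_copy;
    int n;

    assert(b != NULL && fmt != NULL);
    va_start(args, fmt);
    va_copy(args_copy, args);      /* kept for a possible second pass */
    n = vsnprintf(stack_buf, sizeof stack_buf, fmt, args);
    va_end(args);

    if (n >= 0 && (size_t)n < sizeof stack_buf) {
        /* Common case: formatted text fits, so no heap allocation. */
        strbuf_append(b, stack_buf, (size_t)n);
    } else if (n >= 0) {
        /* Fallback: vsnprintf reported the full length, format again on the heap. */
        char *heap = malloc((size_t)n + 1);
        if (heap == NULL)
            abort();
        vsnprintf(heap, (size_t)n + 1, fmt, args_copy);
        strbuf_append(b, heap, (size_t)n);
        free(heap);
    }
    va_end(args_copy);
}
```

The return value of `vsnprintf` also gives the formatted length, so the append does not need a separate `strlen` call.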

4. bson_utf8_validate optimization

Problem:

  • An inefficient loop is used to check whether a null character exists before the end of a string.

Fix:

  • Replace the loop with memchr for more efficient detection.

Files changed:

  • bsonjs/bson/bson-utf8.c: bson_utf8_validate
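
For change 4, the check for an embedded null character can be expressed with `memchr` as in this small sketch. The helper name `has_embedded_nul` is hypothetical; in the PR the check lives inside `bson_utf8_validate`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if any of the first `len` bytes of `s` is '\0'. */
static bool has_embedded_nul(const char *s, size_t len)
{
    assert(s != NULL);
    /* memchr scans at most `len` bytes; C libraries commonly optimize it. */
    return memchr(s, '\0', len) != NULL;
}
```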

5. Eliminate Redundant strlen Calls

Problem:

  • Many redundant calls to strlen are made even though the string length is already known.

Fix:

  • Reuse previously computed string lengths instead of recalculating them with strlen.

Files changed:

  • bsonjs/bson/bson-iso8601.c: _bson_iso8601_date_format
  • bsonjs/bson/bson-iter.c: bson_iter_visit_all
  • bsonjs/bson/bson-string.c: bson_string_append, bson_string_append_c, bson_string_append_unichar, bson_string_append_printf, bson_strdupv_printf, bson_strdup_printf
  • bsonjs/bson/bson-string.h: STR_AND_LEN, bson_string_append, bson_string_append_printf
  • bsonjs/bson/bson-utf8.c: bson_utf8_escape_for_json
  • bsonjs/bson/bson-utf8.h: bson_utf8_escape_for_json
  • bsonjs/bson/bson.c: _bson_as_json_visit_utf8, _bson_as_json_visit_decimal128, _bson_as_json_visit_double, _bson_as_json_visit_undefined, _bson_as_json_visit_null, _bson_as_json_visit_oid, _bson_as_json_visit_binary, _bson_as_json_visit_bool, _bson_as_json_visit_date_time, _bson_as_json_visit_regex, _bson_as_json_visit_timestamp, _bson_as_json_visit_dbpointer, _bson_as_json_visit_minkey, _bson_as_json_visit_maxkey, _bson_as_json_visit_before, _bson_as_json_visit_code, _bson_as_json_visit_symbol, _bson_as_json_visit_codewscope, _bson_as_json_visit_document, _bson_as_json_visit_array, _bson_as_json_visit_all, _bson_iter_validate_before
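
One way to avoid the redundant `strlen` calls in change 5 is to pass lengths explicitly and to let the compiler compute the length of string literals. The sketch below assumes a fixed-capacity destination for brevity. `append_n` and `LIT_WITH_LEN` are hypothetical names; the PR's `STR_AND_LEN` macro is not shown in this description, so its definition may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical macro: expands a string literal into two arguments,
 * the literal and its length, computed at compile time via sizeof.
 */
#define LIT_WITH_LEN(lit) ("" lit), (sizeof(lit) - 1)

/*
 * Appends `n` bytes of `src` to a NUL-terminated buffer of capacity `cap`
 * whose current length is `dst_len`. Returns the new length. No strlen call
 * is needed because both lengths are supplied by the caller.
 */
static size_t append_n(char *dst, size_t dst_len, size_t cap,
                       const char *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    assert(dst_len + n < cap);
    memcpy(dst + dst_len, src, n);
    dst[dst_len + n] = '\0';
    return dst_len + n;
}
```

A call such as `len = append_n(out, len, sizeof out, LIT_WITH_LEN("{ \"k\" : "));` then costs one `memcpy` and no scan of either string.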

@maozguttman maozguttman requested a review from a team as a code owner June 30, 2026 11:31
@maozguttman maozguttman requested review from sleepyStick and removed request for a team June 30, 2026 11:31