Skip to content

BUG: Make H5SUPPORT_MUTEX_LOCK a real process-wide HDF5 lock#1643

Open
joeykleingers wants to merge 2 commits into
BlueQuartzSoftware:developfrom
joeykleingers:fix/hdf5-api-mutex
Open

BUG: Make H5SUPPORT_MUTEX_LOCK a real process-wide HDF5 lock#1643
joeykleingers wants to merge 2 commits into
BlueQuartzSoftware:developfrom
joeykleingers:fix/hdf5-api-mutex

Conversation

@joeykleingers

@joeykleingers joeykleingers commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

Summary

H5SUPPORT_MUTEX_LOCK() declared a fresh function-local std::mutex on every
expansion and immediately locked it, so it provided no mutual exclusion across
calls or threads (and expanded to nothing when H5Support_USE_MUTEX was
undefined). The four HDF5 helpers that use it (IsGroup, GetObjectPath,
GetDatasetType, StringForHDFType) were therefore effectively unsynchronized
against the non-thread-safe HDF5 C library.

  • Add a single process-wide ApiLock() (a leaky Meyers singleton) and point the
    macro at it.
  • Define H5Support_USE_MUTEX for the simplnx target
    (target_compile_definitions(simplnx PUBLIC ...)) so the macro is actually
    engaged rather than compiled out. PUBLIC also propagates the define to
    simplnx_test, so the regression guard runs live instead of taking the
    SUCCEED() no-op path.
  • Add a regression test that hammers the shared lock from many threads and
    asserts no lost updates (guards against reintroducing the per-call-local-mutex
    bug).

Scope / caveats

  • Coarse, process-wide lock. This serializes the four macro-guarded HDF5 leaf
    calls on one mutex. It trades some HDF5-call concurrency for correctness against
    the non-thread-safe vcpkg HDF5 build.
  • Limited coverage. The lock currently guards only those four helpers. The
    many other direct H5* calls throughout FileIO, DatasetIO, and
    Dream3dIO are not yet serialized by it, so this does not by itself close
    the broader HDF5 thread-safety story — it gives the rest of the code a single
    lock to adopt as it is hardened.

Test Plan

  • simplnx_test HDF5 suite passes with the lock enabled (H5 Utilities,
    DatasetIO, HDF5ImplicitCopy, the ApiLock regression guard, Read Legacy
    DREAM3D-NX Data, Contains HDF5 IO Support)

@joeykleingers joeykleingers added the bug Something isn't working label Jun 26, 2026

@imikejackson imikejackson left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inline notes below. The singleton design (out-of-line SIMPLNX_EXPORT ApiLock(), leaky Meyers init) is the right mechanism and well-justified. My main concern is that, as it lands on develop, runtime behavior is unchanged because the macro is gated behind a define that is never set in this build — details inline.

Comment thread src/simplnx/Utilities/Parsing/HDF5/H5Support.hpp
Comment thread src/simplnx/Utilities/Parsing/HDF5/H5Support.hpp Outdated
Comment thread test/H5Test.cpp
@imikejackson

imikejackson commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

Review checklist

Tracking items from the inline review above. The singleton design itself is sound — these are the gaps before merge.

Blocking

  • Resolve the inert-macro question. H5Support_USE_MUTEX is never defined anywhere in the simplnx tree, so H5SUPPORT_MUTEX_LOCK() still expands to nothing and runtime behavior is unchanged. Either:
    • define H5Support_USE_MUTEX for the simplnx target (e.g. target_compile_definitions), with a note on the perf impact of a process-wide HDF5 lock; or
    • document explicitly in the PR description that the macro is intentionally left disabled (and why), so it isn't read as a behavioral fix.
  • Fix the probeSingleDeflateEligibility doc reference in H5Support.hpp — no such function exists; remove it or replace with the correct name.

Non-blocking

  • Ensure the new H5Test.cpp regression guard runs in at least one CI build configuration (i.e. with H5Support_USE_MUTEX defined); otherwise it permanently takes the SUCCEED() no-op path and guards nothing.
  • Add a one-line note (PR description or comment) that this lock covers only the four macro-guarded helpers, not the many other direct H5* calls across FileIO/DatasetIO/Dream3dIO, so the broader HDF5 thread-safety story isn't implied to be closed.

🤖 Generated with Claude Code

H5SUPPORT_MUTEX_LOCK() declared a fresh function-local std::mutex on every
expansion and immediately locked it, so it provided no mutual exclusion across
calls or threads (and expanded to nothing when H5Support_USE_MUTEX was
undefined). The four HDF5 helpers that use it (IsGroup, GetObjectPath,
GetDatasetType, StringForHDFType) were therefore unsynchronized against the
non-thread-safe HDF5 C library.

Add a single process-wide ApiLock() and point the macro at it. ApiLock is a
plain std::mutex (an audit found no path re-acquires it while held, so it fails
fast on accidental future re-entry); its contract — lock only leaf H5 calls,
never hold across a helper that re-locks — is documented at the declaration. It
is intentionally leaked so a teardown HDF5 call during static destruction cannot
lock a destroyed mutex. The test drives the macro from many threads and asserts
no lost updates.

Signed-off-by: Joey Kleingers <joey.kleingers@bluequartz.net>
* Define H5Support_USE_MUTEX PUBLIC on the simplnx target so the four
  macro-guarded HDF5 helpers actually serialize on the shared ApiLock()
  instead of the macro compiling to nothing. PUBLIC also propagates the
  define to simplnx_test, so the ApiLock regression guard runs live
  rather than taking the SUCCEED() no-op path.
* Remove the nonexistent probeSingleDeflateEligibility reference from the
  ApiLock() doc comment and add a SCOPE note that only those four helpers
  are guarded (other direct H5* calls are not yet serialized).

Signed-off-by: Joey Kleingers <joey.kleingers@bluequartz.net>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants