Fix DecodeStatusCache status file writes #424

Open
fallintoplace wants to merge 1 commit into tensorflow:master from fallintoplace:fix-decode-status-cache

@fallintoplace

What changed

DecodeStatusCache now rewrites decoded_datasets.txt from in-memory state instead of opening the file in w+ and reading from a truncated handle. Status updates go through a temp file plus tf.io.gfile.rename(..., overwrite=True), and decoded dataset names are deduped while preserving first-seen order.
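The write path described above can be sketched roughly as follows. This is an illustration, not the code in this PR: the helper names `dedupe_preserving_order` and `write_status_file` are hypothetical, the file layout (checkpoint key on the first line, one dataset name per line after it) is an assumption, and `os.replace` from the standard library stands in for `tf.io.gfile.rename(..., overwrite=True)` so the sketch runs without TensorFlow.

```python
import os
import tempfile


def dedupe_preserving_order(names):
    """Drops repeated names, keeping the first occurrence of each."""
    # dict keys keep insertion order (Python 3.7+).
    return list(dict.fromkeys(names))


def write_status_file(path, ckpt_key, dataset_names):
    """Rewrites the status file from in-memory state via temp file + rename.

    Assumed layout (hypothetical): checkpoint key on line 1, then one
    decoded dataset name per line.
    """
    lines = [ckpt_key] + dedupe_preserving_order(dataset_names)
    dir_name = os.path.dirname(path) or "."
    # Create the temp file in the same directory so the rename stays on
    # one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=dir_name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            f.write("\n".join(lines) + "\n")
        # Stand-in for tf.io.gfile.rename(tmp_path, path, overwrite=True).
        os.replace(tmp_path, path)
    except BaseException:
        # Do not leave a stray temp file behind if the write fails.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Because the existing file is never opened for writing, a reader sees either the old contents or the new contents, and a failed write leaves the previous status file in place.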

Cache hits are now read-only. TryLoadCache() returns the cached summaries without touching the status file, so looking up an existing dataset cannot corrupt the checkpoint key or append duplicates.
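A minimal sketch of the read-only lookup, under stated assumptions: `StatusCacheSketch`, `update`, `try_load`, and the `status_writes` counter are hypothetical names used only for illustration and do not mirror the real `DecodeStatusCache` interface. The counter stands in for rewriting the status file.

```python
class StatusCacheSketch:
    """Hypothetical stand-in for a decode status cache."""

    def __init__(self):
        self._summaries = {}     # dataset name -> cached summaries
        self.status_writes = 0   # counts simulated status file rewrites

    def update(self, dataset_name, summaries):
        """Records new summaries; the only path that touches the status file."""
        self._summaries[dataset_name] = summaries
        self.status_writes += 1  # stands in for rewriting the status file

    def try_load(self, dataset_name):
        """Returns cached summaries, or None on a miss, without any write."""
        return self._summaries.get(dataset_name)
```

With this shape, repeated lookups of an existing dataset leave the write count unchanged, which is the property the change relies on to avoid duplicate entries.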

Tests

  • python3 -m py_compile lingvo/core/program_utils.py lingvo/core/program_utils_test.py
  • git diff --check -- lingvo/core/program_utils.py lingvo/core/program_utils_test.py lingvo/core/BUILD
  • Local DecodeStatusCache round-trip harness with a minimal lingvo.compat shim
  • Not run to completion: USE_BAZEL_VERSION=5.3.0 npx -y @bazel/bazelisk test --experimental_repo_remote_exec //lingvo/core:program_utils_test does not reach the analysis phase in this checkout because @rules_cc//cc:cc_library.bzl is not declared or resolved by the workspace.
