Fix resource.enc race when functions share a bundle #6750
Open
vlechemin wants to merge 1 commit into anomalyco:dev from
When multiple `aws.Function` resources share a pre-built bundle
directory (via `bundle:`), Pulumi runs their `Runtime.Build` RPCs
concurrently. Each RPC encrypts the function's links and writes them
to `{bundle}/resource.enc` — so the writes race. The last writer wins,
but partial / interleaved writes can also leave a truncated or mixed
ciphertext that then fails AES-GCM authentication on the Lambda's
first cold start (surfaces as a `Decipheriv` error during SDK init).
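The authentication failure described above can be reproduced in isolation. The following is a minimal sketch, not SST's actual code: `seal` and `open` are hypothetical helpers, the key and nonce handling is simplified, and the ciphertext-followed-by-tag layout is the one Go's `cipher.AEAD` produces, which is assumed here to match what the SDK reads. It shows that a sealed payload with a few stray bytes appended no longer authenticates.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"fmt"
)

// seal encrypts plaintext with AES-256-GCM. The result is the ciphertext
// followed by the 16-byte auth tag. Hypothetical helper for illustration.
func seal(key, nonce, plaintext []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	return gcm.Seal(nil, nonce, plaintext, nil), nil
}

// open decrypts and authenticates a payload produced by seal.
func open(key, nonce, sealed []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	return gcm.Open(nil, nonce, sealed, nil)
}

func main() {
	key := make([]byte, 32)   // AES-256 key
	nonce := make([]byte, 12) // standard GCM nonce size
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}
	if _, err := rand.Read(nonce); err != nil {
		panic(err)
	}

	sealed, err := seal(key, nonce, []byte(`{"links":"example"}`))
	if err != nil {
		panic(err)
	}

	// An intact file authenticates.
	_, err = open(key, nonce, sealed)
	fmt.Println("intact payload error:", err)

	// Simulate leftover bytes from an overlapping write landing after the tag.
	corrupted := append(append([]byte{}, sealed...), []byte("abcdef")...)
	_, err = open(key, nonce, corrupted)
	fmt.Println("corrupted payload error:", err)
}
```

Because GCM treats the final 16 bytes as the tag, any extra trailing bytes shift the tag boundary and authentication fails, even though the original ciphertext is fully present in the file.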
Namespace the file by `FunctionID` when a bundle is shared
(`resource-{FunctionID}.enc`) and point the Lambda at its own file
via `SST_KEY_FILE`. The default (non-bundle) path keeps writing
`resource.enc` into each function's own artifact directory, so
existing deployments are unaffected.
Covered by new tests in `pkg/runtime/runtime_test.go`.
Problem
When multiple `aws.Function` resources share a pre-built bundle directory via the `bundle:` option, Pulumi runs their `Runtime.Build` RPCs concurrently. Each RPC encrypts the function's links and writes them to `{bundle}/resource.enc`, so the writes race.
In the happy case the last writer wins, but partial or interleaved writes can leave a truncated or mixed ciphertext on disk: we consistently see trailing byte corruption after the 16-byte AES-GCM auth tag. The Lambda then fails to authenticate the ciphertext on its first cold start, and the SDK throws a `Decipheriv` error at init.
Root cause
The write happens in `Collection.Build` in `pkg/runtime/runtime.go`. With `input.Bundle != ""`, `result.Out == input.Bundle` and the filename is constant, so N concurrent builds all race on the same path.
In the non-bundle path, each function gets its own `.sst/artifacts/{FunctionID}-src/` directory, so there's no collision.
Fix
Namespace the file by `FunctionID` when a shared bundle is in play (`resource-{FunctionID}.enc`) and point each Lambda at its own file via `SST_KEY_FILE`:
- `pkg/runtime/runtime.go`: write `resource-{FunctionID}.enc` when `input.Bundle != ""`, keep `resource.enc` otherwise
- `platform/src/components/aws/function.ts`: set `SST_KEY_FILE` to the per-function filename when `args.bundle` is set
The default (non-bundle) path is unchanged, so existing deployments keep the old filename.
Minor downside: each function's uploaded zip contains every `resource-{FunctionID}.enc` in the shared bundle dir (Lambda only reads its own via `SST_KEY_FILE`). This is a few extra KB per function and avoids more invasive changes to the bundle-packaging code path. Happy to explore trimming if reviewers prefer.
Test coverage
Adds `TestCollectionBuildEncryptedResourceFileWithBundle` to `pkg/runtime/runtime_test.go`. It builds with `Bundle != ""` and checks that `resource-{FunctionID}.enc` is written while the legacy `resource.enc` is not.
Real-world repro
This is hitting us in production (grasp-gg). We share a single pre-built bundle across ~40 Lambdas (Turbo-cached handlers, one-shot esbuild, `bundle:` + relative `handler:`). After every deploy, a subset of Lambdas cold-start with `Decipheriv` errors until they're re-invoked or re-deployed. Byte-level inspection of the deployed zip consistently shows 6 garbage bytes after the expected ciphertext+tag, the telltale sign of an interrupted partial write landing on top of a complete one.
Happy to iterate on naming / scope / packaging if maintainers want a different shape.