fix(core): use Promise.allSettled in TracingSDK flush/shutdown#3562
fix(core): use Promise.allSettled in TracingSDK flush/shutdown#3562ericallam wants to merge 3 commits into
Conversation
Switch TracingSDK.flush() and shutdown() from Promise.all to Promise.allSettled to prevent one provider's rejection from abandoning the other providers' in-flight exports. This fixes an issue where user-emitted trace data (logger.info calls, child spans) could be silently dropped on shutdown when any provider fails to flush, particularly affecting self-hosted deployments with processKeepAliveEnabled: false. Fixes #3556 Co-authored-by: Eric Allam <ericallam@users.noreply.github.com>
🦋 Changeset detectedLatest commit: d7eae64 The changes in this PR will be included in the next version bump. This PR includes changesets to release 29 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
| public async flush() { | ||
| await Promise.all([ | ||
| const results = await Promise.allSettled([ | ||
| this._traceProvider.forceFlush(), | ||
| this._logProvider.forceFlush(), | ||
| this._meterProvider.forceFlush(), | ||
| ]); | ||
| const providerNames = ["trace", "log", "meter"] as const; | ||
| results.forEach((result, index) => { | ||
| if (result.status === "rejected") { | ||
| console.error(`Failed to flush ${providerNames[index]} provider:`, result.reason); | ||
| } | ||
| }); | ||
| } |
There was a problem hiding this comment.
🚩 flush()/shutdown() no longer propagate errors to callers
With Promise.allSettled, flush() and shutdown() now always resolve (never reject), even when individual providers fail. This changes the public API contract. The callers in managed-run-worker.ts:725-738 and dev-run-worker.ts:724-737 (flushTracingSDK) wrap _tracingSDK?.flush() in Promise.race inside an outer Promise.allSettled (managed-run-worker.ts:673, dev-run-worker.ts:687), which tracks success/failure via successfulFlushes/failedFlushes. After this PR, flushTracingSDK will always resolve successfully, so "tracingSDK" will always appear in successfulFlushes even if underlying providers failed. The failure tracking at the flushAll level becomes less precise for tracing SDK failures specifically — though the individual provider errors are still logged to console by the new code. This is arguably the right tradeoff (partial success is better than total failure), but reviewers should be aware the caller-level failure detection for tracing is now effectively bypassed.
Was this helpful? React with 👍 or 👎 to provide feedback.
|
ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
📜 Recent review details⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (24)
🧰 Additional context used📓 Path-based instructions (10)**/*.{ts,tsx}📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Files:
{packages/core,apps/webapp}/**/*.{ts,tsx}📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Files:
**/*.{ts,tsx,js,jsx}📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Files:
**/*.ts📄 CodeRabbit inference engine (.cursor/rules/otel-metrics.mdc)
Files:
packages/core/**/*.{ts,tsx,js,jsx}📄 CodeRabbit inference engine (packages/core/CLAUDE.md)
Files:
packages/**/*.{ts,tsx,js}📄 CodeRabbit inference engine (CLAUDE.md)
Files:
{packages,integrations}/**/*.{ts,tsx,js}📄 CodeRabbit inference engine (CLAUDE.md)
Files:
{package.json,**/*.{ts,tsx,js}}📄 CodeRabbit inference engine (CLAUDE.md)
Files:
**/*.{ts,tsx,js}📄 CodeRabbit inference engine (CLAUDE.md)
Files:
**/*.{ts,tsx,js,jsx,json,md,css,scss}📄 CodeRabbit inference engine (AGENTS.md)
Files:
🧠 Learnings (2)📚 Learning: 2026-03-22T13:26:12.060ZApplied to files:
📚 Learning: 2026-03-22T19:24:14.403ZApplied to files:
🔇 Additional comments (1)
WalkthroughTracingSDK.flush() and TracingSDK.shutdown() now call providers' forceFlush()/shutdown() using Promise.allSettled rather than Promise.all. Rejected provider results are collected (non-Error reasons normalized to Error) and, if any fail, an AggregateError containing all provider errors is thrown. A Changeset entry was added documenting the patch release for Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@claude can you address this review: With Promise.allSettled, flush() and shutdown() now always resolve (never reject), even when individual providers fail. This changes the public API contract. The callers in managed-run-worker.ts:725-738 and dev-run-worker.ts:724-737 (flushTracingSDK) wrap _tracingSDK?.flush() in Promise.race inside an outer Promise.allSettled (managed-run-worker.ts:673, dev-run-worker.ts:687), which tracks success/failure via successfulFlushes/failedFlushes. After this PR, flushTracingSDK will always resolve successfully, so "tracingSDK" will always appear in successfulFlushes even if underlying providers failed. The failure tracking at the flushAll level becomes less precise for tracing SDK failures specifically — though the individual provider errors are still logged to console by the new code. This is arguably the right tradeoff (partial success is better than total failure), but reviewers should be aware the caller-level failure detection for tracing is now effectively bypassed. |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
packages/core/src/v3/otel/tracingSDK.ts (1)
377-377: 💤 Low valueConsider extracting the duplicated
providerNamesarray.The
providerNamesarray is duplicated in bothflush()andshutdown()methods. Consider extracting it as a private static constant to follow the DRY principle.♻️ Proposed refactor
export class TracingSDK { private readonly _logProvider: LoggerProvider; private readonly _spanExporter: SpanExporter; private readonly _traceProvider: NodeTracerProvider; private readonly _meterProvider: MeterProvider; + private static readonly PROVIDER_NAMES = ["trace", "log", "meter"] as const; public readonly getLogger: LoggerProvider["getLogger"]; public readonly getTracer: TracerProvider["getTracer"]; ... public async flush() { const results = await Promise.allSettled([ this._traceProvider.forceFlush(), this._logProvider.forceFlush(), this._meterProvider.forceFlush(), ]); - const providerNames = ["trace", "log", "meter"] as const; results.forEach((result, index) => { if (result.status === "rejected") { - console.error(`Failed to flush ${providerNames[index]} provider:`, result.reason); + console.error(`Failed to flush ${TracingSDK.PROVIDER_NAMES[index]} provider:`, result.reason); } }); } public async shutdown() { const results = await Promise.allSettled([ this._traceProvider.shutdown(), this._logProvider.shutdown(), this._meterProvider.shutdown(), ]); - const providerNames = ["trace", "log", "meter"] as const; results.forEach((result, index) => { if (result.status === "rejected") { - console.error(`Failed to shutdown ${providerNames[index]} provider:`, result.reason); + console.error(`Failed to shutdown ${TracingSDK.PROVIDER_NAMES[index]} provider:`, result.reason); } }); }Also applies to: 391-391
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/core/src/v3/otel/tracingSDK.ts` at line 377, Extract the duplicated const array into a single private static constant and reference it from both methods: add a private static readonly PROVIDER_NAMES = ["trace","log","meter"] as const (inside the class in tracingSDK.ts) and replace the local providerNames declarations in the flush() and shutdown() methods with this PROVIDER_NAMES symbol so both methods share the same definition.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@packages/core/src/v3/otel/tracingSDK.ts`:
- Line 377: Extract the duplicated const array into a single private static
constant and reference it from both methods: add a private static readonly
PROVIDER_NAMES = ["trace","log","meter"] as const (inside the class in
tracingSDK.ts) and replace the local providerNames declarations in the flush()
and shutdown() methods with this PROVIDER_NAMES symbol so both methods share the
same definition.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 5002825e-1096-418b-8acb-1b49c3603c36
📒 Files selected for processing (2)
.changeset/fix-tracing-sdk-flush.mdpackages/core/src/v3/otel/tracingSDK.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (16)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (7, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (7, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (8, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (5, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (2, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (6, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (1, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (4, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (4, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (1, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (3, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (6, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (5, 8)
- GitHub Check: units / packages / 🧪 Unit Tests: Packages (1, 1)
- GitHub Check: typecheck / typecheck
- GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
🧰 Additional context used
📓 Path-based instructions (10)
**/*.{ts,tsx}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.{ts,tsx}: Use types over interfaces for TypeScript
Avoid using enums; prefer string unions or const objects instead
Files:
packages/core/src/v3/otel/tracingSDK.ts
{packages/core,apps/webapp}/**/*.{ts,tsx}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Use zod for validation in packages/core and apps/webapp
Files:
packages/core/src/v3/otel/tracingSDK.ts
**/*.{ts,tsx,js,jsx}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Use function declarations instead of default exports
Files:
packages/core/src/v3/otel/tracingSDK.ts
**/*.ts
📄 CodeRabbit inference engine (.cursor/rules/otel-metrics.mdc)
**/*.ts: When creating or editing OTEL metrics (counters, histograms, gauges), ensure metric attributes have low cardinality by using only enums, booleans, bounded error codes, or bounded shard IDs
Do not use high-cardinality attributes in OTEL metrics such as UUIDs/IDs (envId, userId, runId, projectId, organizationId), unbounded integers (itemCount, batchSize, retryCount), timestamps (createdAt, startTime), or free-form strings (errorMessage, taskName, queueName)
When exporting OTEL metrics via OTLP to Prometheus, be aware that the exporter automatically adds unit suffixes to metric names (e.g., 'my_duration_ms' becomes 'my_duration_ms_milliseconds', 'my_counter' becomes 'my_counter_total'). Account for these transformations when writing Grafana dashboards or Prometheus queries
Files:
packages/core/src/v3/otel/tracingSDK.ts
packages/core/**/*.{ts,tsx,js,jsx}
📄 CodeRabbit inference engine (packages/core/CLAUDE.md)
Never import the root package (
@trigger.dev/core). Always use subpath imports such as@trigger.dev/core/v3,@trigger.dev/core/v3/utils,@trigger.dev/core/logger, or@trigger.dev/core/schemas
Files:
packages/core/src/v3/otel/tracingSDK.ts
packages/**/*.{ts,tsx,js}
📄 CodeRabbit inference engine (CLAUDE.md)
Use
pnpm run buildto verify changes in public packages (packages/*)
Files:
packages/core/src/v3/otel/tracingSDK.ts
{packages,integrations}/**/*.{ts,tsx,js}
📄 CodeRabbit inference engine (CLAUDE.md)
When modifying any public package (
packages/*orintegrations/*), add a changeset usingpnpm run changeset:add
Files:
packages/core/src/v3/otel/tracingSDK.ts
{package.json,**/*.{ts,tsx,js}}
📄 CodeRabbit inference engine (CLAUDE.md)
Pin Zod to version 3.25.76 exactly across the entire monorepo - never use a different version or version range
Files:
packages/core/src/v3/otel/tracingSDK.ts
**/*.{ts,tsx,js}
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.{ts,tsx,js}: Import from@trigger.dev/coreusing subpaths only, never the root export
Always import tasks from@trigger.dev/sdk, never from@trigger.dev/sdk/v3or deprecatedclient.defineJob
Add crumbs to code using//@Crumbscomments or `// `#region` `@crumbsblocks for debug tracing during development
Files:
packages/core/src/v3/otel/tracingSDK.ts
**/*.{ts,tsx,js,jsx,json,md,css,scss}
📄 CodeRabbit inference engine (AGENTS.md)
Code formatting is enforced using Prettier. Run
pnpm run formatbefore committing
Files:
packages/core/src/v3/otel/tracingSDK.ts
🧠 Learnings (2)
📚 Learning: 2026-03-22T13:26:12.060Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3244
File: apps/webapp/app/components/code/TextEditor.tsx:81-86
Timestamp: 2026-03-22T13:26:12.060Z
Learning: In the triggerdotdev/trigger.dev codebase, do not flag `navigator.clipboard.writeText(...)` calls for `missing-await`/`unhandled-promise` issues. These clipboard writes are intentionally invoked without `await` and without `catch` handlers across the project; keep that behavior consistent when reviewing TypeScript/TSX files (e.g., usages like in `apps/webapp/app/components/code/TextEditor.tsx`).
Applied to files:
packages/core/src/v3/otel/tracingSDK.ts
📚 Learning: 2026-03-22T19:24:14.403Z
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 3187
File: apps/webapp/app/v3/services/alerts/deliverErrorGroupAlert.server.ts:200-204
Timestamp: 2026-03-22T19:24:14.403Z
Learning: In the triggerdotdev/trigger.dev codebase, webhook URLs are not expected to contain embedded credentials/secrets (e.g., fields like `ProjectAlertWebhookProperties` should only hold credential-free webhook endpoints). During code review, if you see logging or inclusion of raw webhook URLs in error messages, do not automatically treat it as a credential-leak/secrets-in-logs issue by default—first verify the URL does not contain embedded credentials (for example, no username/password in the URL, no obvious secret/token query params or fragments). If the URL is credential-free per this project’s conventions, allow the logging.
Applied to files:
packages/core/src/v3/otel/tracingSDK.ts
🪛 LanguageTool
.changeset/fix-tracing-sdk-flush.md
[style] ~5-~5: Consider using a different verb for a more formal wording.
Context: ...ther providers' in-flight exports. This fixes an issue where user-emitted trace data ...
(FIX_RESOLVE)
🔇 Additional comments (2)
packages/core/src/v3/otel/tracingSDK.ts (1)
371-397: LGTM!.changeset/fix-tracing-sdk-flush.md (1)
1-5: LGTM!
|
Claude finished @ericallam's task in 1m 49s —— View job Addressing Review Feedback
Done. Updated
View commit | | |
…caller failure detection Keep Promise.allSettled for resilience (all providers get a chance to flush), but throw AggregateError if any failed so callers can still detect failures. Co-authored-by: Eric Allam <ericallam@users.noreply.github.com>
|
@claude improved this pr based on this review: Code review — PR #3562 (fix(core): use Promise.allSettled in TracingSDK flush/shutdown) Verdict ACCEPT-WITH-NITS — small, well-targeted fix. Rebase before merge. Important findings None. Nits
Strengths
|
|
Claude finished @ericallam's task in 1m 55s —— View job Addressing Review Feedback
Done. Pushed d7eae64 with the following changes:
Note on rebase: The branch has significant conflicts with main (many files across workflows, webapp components, run-engine, etc.). This should be rebased manually before merging to avoid reverting PR #3552 (SIGSEGV retry). |
…messages - Remove per-provider console.error calls (rely on AggregateError for error propagation) - Include failed provider names in AggregateError message for better debugging Co-authored-by: Eric Allam <ericallam@users.noreply.github.com>
Summary
TracingSDK.flush()andshutdown()fromPromise.alltoPromise.allSettledFixes #3556
Test plan
processKeepAliveEnabled: falselogger.info()calls and verify they appear in trace