Skip to content

chore(cli): harden deploy against stale binaries and transient cloud fetch failures #163

@khaliqgant

Description

@khaliqgant

Context

Khaliq hit a blocking deploy failure while deploying pr-reviewer from /Users/khaliqgant/Projects/AgentWorkforce/agents/review/persona.ts:

workforce deploy -> /Users/khaliqgant/Projects/AgentWorkforce/agents/review/persona.ts
persona pr-reviewer: 2 integration(s), 0 schedule(s)
workspace: 50587328-441d-4acb-b8f3-dbe1b3c5de99
! cloud integration catalog fetch failed (fetch failed); falling back to provider-name-only matching for this deploy
! failed to check connection status for github: fetch failed
x integrations.github: failed while checking connection status

A same-workspace redeploy by another agent succeeded minutes later, which rules out cloud API health, workspace identity, auth, and stage mismatch as the primary cause. The original fetch failed was likely a transient local DNS/TLS/connection failure, but the CLI surfaced it as a hard blocker without retrying.

During triage we also found local version skew: Khaliq's globally resolved agentworkforce came from another checkout and treated persona.ts as JSON, while the local workforce checkout CLI has newer TypeScript persona handling. Noninteractive deploy also requires --mode, which is easy to miss when moving from interactive shell behavior to automation.

Proposed work

  • Add retry/backoff for transient Node fetch failed errors around cloud integration catalog fetches and connection-status checks.
  • Add an integration test or mocked cloud test that exercises agentworkforce deploy against catalog + connection-status endpoints and verifies transient fetch retry behavior.
  • Add CLI version/source detection or a clear warning when a globally resolved agentworkforce binary is stale relative to the local workspace/persona expectations.
  • Improve noninteractive UX around --mode: print a direct actionable command suggestion instead of only --mode is required.

Acceptance criteria

  • A one-time transient fetch failure on catalog/status endpoints does not fail deploy if retry succeeds.
  • CI covers the deploy cloud-integration preflight path so this does not regress silently.
  • Users get a clear stale-binary/update hint when their global CLI cannot load modern persona.ts files.
  • Noninteractive deploy failures include a copyable cloud-mode retry example.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions