Skip to content

fix: wire four dropped injection seams (F1–F4 from the latent-bug sweep)#263

Merged
theagenticguy merged 6 commits into
mainfrom
fix/wire-dropped-seams-f1-f4
Jun 29, 2026
Merged

fix: wire four dropped injection seams (F1–F4 from the latent-bug sweep)#263
theagenticguy merged 6 commits into
mainfrom
fix/wire-dropped-seams-f1-f4

Conversation

@theagenticguy

Copy link
Copy Markdown
Owner

Summary

Fixes the four latent bugs from the sweep — all the same class: an injection seam that defaults empty/no-op when unset, where the tests always inject a real value (so the suite stays green) but the production caller forgot to wire it, leaving the feature silently hollow with exit 0. Each fix ships with a regression test that exercises the production path (the blind spot the class hides in).

The four fixes (one commit each)

F4 — fix(ingestion): --allow-build-scripts is no longer a dead flag. The flag was parsed and passed to runAnalyze, then dropped: analyze.ts never spread it into the pipeline options, PipelineOptions had no such field, and scip-index.ts read only the CODEHUB_ALLOW_BUILD_SCRIPTS env var. Wired through all three layers; the COBOL proleap deep-parse + rust/java/kotlin build-script indexers are now reachable via the documented flag. Regression: the orchestrator "threads every PipelineOptions field" test now covers it.

F2 — fix(mcp): MCP scan no longer silently lints zero OpenAPI contracts. The MCP tool called createDefaultWrappers(specs) with no context, so Spectral's contractFiles defaulted empty. Lifted the file-discovery into a shared buildScannerFileContext in @opencodehub/scanners that both the CLI and MCP now call (killing the duplication + the drift). The CLI's two private discoverers are deleted.

F3 — fix(analysis): verdict ownership policies can pass on graph-owned paths. ownersByPath was hardcoded empty, so an ownership_required rule with empty require_approval_from (relying on graph owners) unconditionally blocked with "no owners". Added collectOwnersByPath (one batched OWNED_BY query, per-path, keyed by contributor email) and threaded it in — only when a policy is loaded, so the starter repo pays no graph walk.

F1 — fix(analysis): scan.sarif now carries graph signals. enrichWithProperties had zero production callers. Added buildScanEnrichment (maps each result to its File node's bus factor + fix-follow-feat density) and ran it in the scan pipeline after fingerprinting. Keyed by primaryLocationLineHash (run-structure-independent) and normalizing absolute result uris to the repo-relative File-node key — without that normalization the enrichment matched nothing, which would have been the same latent-bug class.

Scope note on F1

Only the file-granular signals that are a direct, cheap read off the File node are emitted (busFactor, temporalFixDensity). ownershipDrift/cochangeScore (community/temporal-table) and the symbol-level signals (blastRadius/community/centrality, which need per-finding runImpact/PageRank) are deliberately omitted rather than approximated — a follow-up can add them behind a per-finding budget. Every ResultEnrichment field is optional, so this is honest, not lossy.

Verification

  • Full workspace test suite: 0 failures. New regression tests: orchestrator field round-trip (F4), file-context.test.ts (F2), owners.test.ts (F3), scan-enrich.test.ts incl. the absolute-uri normalization (F1).
  • End-to-end on a real repo: --allow-build-scripts populates ast-chunks/commit/pins; MCP scan wires contract files; verdict ownership rule resolves graph owners; scan.sarif finding carries properties.opencodehub = {busFactor: 1} and is byte-identical across two scans.
  • Typecheck, repo-wide Biome, banned-strings all clean locally.

🤖 Generated with Claude Code

…ndex phase

`codehub analyze --allow-build-scripts proleap` parsed the flag and passed it
into runAnalyze, but analyze.ts never spread allowBuildScripts into the pipeline
options, the PipelineOptions type had no such field, and scip-index.ts read only
the CODEHUB_ALLOW_BUILD_SCRIPTS env var. So the documented invocation that "wakes
the JVM COBOL deep-parse bridge" silently ran the regex hot path only — exit 0,
no warning — and the rust/java/kotlin build-script indexers were unreachable
except via the undocumented env var.

Wire it through all three layers:
- add `allowBuildScripts?: readonly "proleap"[]` to PipelineOptions; stripPhaseKeys
  already forwards every PipelineOptions field to ctx.options.
- spread opts.allowBuildScripts into analyze.ts pipelineOptions.
- scip-index.ts reads ctx.options.allowBuildScripts (env var kept as fallback),
  appends the cobol-proleap indexer to candidates when proleap is opted-in on a
  COBOL project, and passes allowedBuildScripts to runIndexer (which still gates).

Regression guard: the orchestrator "threads every PipelineOptions field" test now
covers allowBuildScripts, so a future dropped field surfaces immediately. ingestion
638 + cli 346 tests green.
The MCP `scan` tool called createDefaultWrappers(specs) with no context arg, so
Spectral's contractFiles defaulted empty and it short-circuited to "no API
contract files to lint" — an agent scanning an OpenAPI repo over MCP got 0
findings even though specs existed on disk and the profile selected Spectral.
The CLI path wired it; the MCP path skipped exactly that step.

Lift the file-discovery (findOpenApiFiles, findDockerfiles) into a shared
`buildScannerFileContext` in @opencodehub/scanners (pure node:fs walks, no new
deps) and call it from BOTH surfaces, so they can never drift again:
- MCP scan.ts builds a DefaultWrapperContext (shared file discovery + checkov
  frameworks from the profile) and passes it to createDefaultWrappers.
- CLI scan.ts now composes on the shared discoverer; its two private
  findDockerfiles/findOpenApiFiles copies are deleted.

Regression guard: file-context.test.ts covers contract/Dockerfile discovery
(incl. node_modules/.git/.codehub exclusion) and that buildScannerFileContext
populates spectral/hadolint only for selected specs. scanners 94 + mcp 209 +
cli 346 tests green.
…can pass

The verdict CLI hardcoded `ownersByPath: new Map()`, so an `ownership_required`
policy rule with an empty `require_approval_from` (relying on graph-derived
owners) hit the evaluator's `acceptable.size === 0` branch and unconditionally
blocked with "path has no owners" — it could never pass in production, even
though OWNED_BY edges existed and the verdict already derives recommended
reviewers from them.

Add `collectOwnersByPath(graph, files)` to @opencodehub/analysis: one batched
OWNED_BY edge query over the changed-file node ids, grouped per path, keyed by
contributor email (falling back to email hash). The verdict CLI calls it over
verdict.changedFiles — only when a policy is actually loaded, so the starter
(no-policy) repo pays no graph walk — and threads the map into buildPolicyContext.
Defensive: a store missing the edge/node readers (minimal test fakes) yields an
empty map rather than throwing, preserving prior behavior.

Team/handle reconciliation (mapping an email to a GitHub team) stays a separate
design item; operators bridge it via `require_approval_from` as before.

Regression guard: owners.test.ts covers the per-path mapping, the no-owner-edge
omission, the emailHash fallback, and per-path isolation. analysis 159 + cli 346
tests green.
`enrichWithProperties` (stamps graph-derived signals under
`properties.opencodehub.*` on each scan result) had zero production callers —
only its own tests. So `scan.sarif` shipped with none of the documented graph
signals, even though the data sits in the indexed graph.

Add `buildScanEnrichment(graph, sarif, repoPath)` in @opencodehub/analysis: it
maps each result's primary-location file to its File node and reads the
file-granular signals already materialized there — bus factor and
fix-follow-feat density (mapped to temporalFixDensity). The scan CLI runs it
after `enrichWithFingerprints` (the enrichment is keyed by
`primaryLocationLineHash`) and before baseline/suppressions, opening the
scanned repo's graph read-only best-effort — a missing/empty graph leaves the
SARIF untouched, like the existing ingest step.

Two correctness points the e2e surfaced:
- result uris are normalized to the repo-relative form the File node id uses;
  scanners emit absolute uris, so without stripping the repoPath prefix the
  lookup silently matched nothing — itself the same latent-bug class. Covered
  by a dedicated test.
- keyed by fingerprint, not result index: scan merges each scanner into its
  own SARIF run and the enricher indexes per-run, so a global index misaligns.
- run-level stamp carries no clock/run id, so a re-scan stays byte-identical.

Scope: ownershipDrift/cochangeScore (community/temporal-table) and the
symbol-level signals (blastRadius/community/centrality, which need per-finding
runImpact/PageRank) are deliberately omitted rather than approximated — a
follow-up can add them behind a per-finding budget. All ResultEnrichment fields
are optional, so this is honest, not lossy.

Verified end-to-end: a scanned repo's finding now carries
result.properties.opencodehub = {busFactor: 1}, and two scans are
byte-identical. analysis 164 + cli 346 tests green; full workspace 0 failures.
CI runs `biome ci .`, which enforces the organize-imports/exports assist that
`biome check .` only auto-fixes locally. The new buildScanEnrichment /
buildScannerFileContext exports landed out of sorted order; reorder them. No
behavior change — export statements only.
Windows CI caught a real cross-platform defect: `findOpenApiFiles` /
`findDockerfiles` returned `path.relative` output with the OS separator, so on
Windows they yielded `api\swagger.json`. The graph keys File nodes by
`/`-joined repo-relative paths and the scanner wrappers expect the same, so the
backslash form would never match — Spectral/hadolint wiring (F2) and the scan
enrichment lookup (F1) would silently miss on Windows, the very class these
fixes close.

Normalize walked paths to `/` in file-context's walker, and harden
scan-enrich's `toRepoRelative` to fold backslashes before the prefix-strip so
the uri and repoPath compare consistently on Windows.

The existing file-context test (asserts `api/swagger.json`) is the regression
guard; it failed on windows-latest and passes after the fix. scanners 99 +
analysis 164 tests green; biome ci clean.
@theagenticguy theagenticguy merged commit dde590e into main Jun 29, 2026
38 checks passed
@theagenticguy theagenticguy deleted the fix/wire-dropped-seams-f1-f4 branch June 29, 2026 17:22
@github-actions github-actions Bot mentioned this pull request Jun 29, 2026
theagenticguy added a commit that referenced this pull request Jun 29, 2026
🤖 Automated release via release-please
---


<details><summary>root: 0.10.3</summary>

##
[0.10.3](root-v0.10.2...root-v0.10.3)
(2026-06-29)


### Features

* **frameworks:** wire stage-3 config-AST evidence into detection
([#264](#264))
([18e08b2](18e08b2))
* **pack:** context-bom read-receipt (9th BOM item) + real production
provenance
([#261](#261))
([b936af2](b936af2))


### Bug Fixes

* wire four dropped injection seams (F1–F4 from the latent-bug sweep)
([#263](#263))
([dde590e](dde590e))
</details>

<details><summary>cli: 0.10.3</summary>

##
[0.10.3](cli-v0.10.2...cli-v0.10.3)
(2026-06-29)


### Features

* **pack:** context-bom read-receipt (9th BOM item) + real production
provenance
([#261](#261))
([b936af2](b936af2))


### Bug Fixes

* wire four dropped injection seams (F1–F4 from the latent-bug sweep)
([#263](#263))
([dde590e](dde590e))
</details>

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Laith Al-Saadoon <9553966+theagenticguy@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant