diff --git a/CHANGELOG.md b/CHANGELOG.md
index 278ce96f..54dff942 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -8,6 +8,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ## [Unreleased]
 
 ### Added
+- **`diff_diff.agent_workflow(df, unit=..., time=..., treatment=..., outcome=...)` — stateless orchestrator for LLM-agent discoverability** (`diff_diff/agent_workflow.py`). Returns a dict and, with the default `verbose=True`, prints a copy-pasteable 5-step workflow with the caller's column names templated in: `profile_panel` → `get_llm_guide("autonomous")` → `<Estimator>(...).fit(df, ...)` → `practitioner_next_steps(result)` → `BusinessReport(result).full_report()`. The function calls nothing internally and does not inspect `df`; it is a guided tour, not a router. Surfaces the canonical workflow primitives (`profile_panel`, `get_llm_guide`, `practitioner_next_steps`, `BusinessReport`) that cold-start agent dry-passes at [igerber/causal-llm-eval](https://github.com/igerber/causal-llm-eval) showed agents practically never reach for on their own. Output structure: `{"profile_call", "guide_call", "fit_candidates", "validation_calls", "reporting_call", "script"}`; `fit_candidates` is a flat list of the estimator class names referenced in the workflow patterns (diagnostics such as `PreTrendsPower` and `HonestDiD` are named only in the script's comments), each of which must remain importable from `diff_diff` (locked by `tests/test_agent_workflow.py::test_fit_candidates_all_importable`). Closes [issue #460](https://github.com/igerber/diff-diff/issues/460).
+- **Top-level `__doc__` rewritten to lead with the agent workflow** (`diff_diff/__init__.py`). `help(diff_diff)` now opens with the `agent_workflow(df, ...)` recommendation as the first non-blank paragraph; `get_llm_guide("full")` and `get_llm_guide("practitioner")` pointers preserved for the existing `tests/test_guides.py::test_module_docstring_mentions_helper` guard.
+- **`dir(diff_diff)` now surfaces agent-facing entrypoints first** via a module-level `__dir__()` override paired with a small `_OrderedName(str)` subclass that subverts CPython's unconditional alphabetic sort (PyList_Sort respects `__lt__` on the elements). Agent-facing names (`agent_workflow`, `profile_panel`, `get_llm_guide`, `practitioner_next_steps`, `BusinessReport`, `DiagnosticReport`) appear at the head of the list; the remainder stays alphabetic via the `str.__lt__` fallback. The override does not touch `__all__` (this PR only adds the `agent_workflow` export to it), and `from diff_diff import *` semantics are unaffected (driven by `__all__`, not `dir()`). Elements are `isinstance(x, str)` and compatible with `inspect.getmembers`, dict-key lookup, f-strings, and standard `str` methods; tooling that re-sorts by name, via `sorted(dir(diff_diff))` or `inspect.getmembers`, will see priority order (use `sorted(dir(diff_diff), key=str)` to recover plain alphabetic order if needed). Internal: the `_AGENT_FACING_ORDER` tuple is the hook for the `tests/test_agent_discoverability.py` contract test, which is deferred to PR B (see TODO.md). Addresses [issue #460](https://github.com/igerber/diff-diff/issues/460) item 3.
 - **`MultiPeriodDiD(cluster=..., vcov_type="hc2_bm")` now supported** (`diff_diff/estimators.py:1657`). Pre-PR the combination raised `NotImplementedError` because the cluster-aware CR2 Bell-McCaffrey Satterthwaite DOF for the post-period-average ATT (`avg_att = (1/n_post) Σ_{t ≥ t_treat} β_t`) was not implemented — only the per-coefficient case existed in `_compute_cr2_bm`. New `_compute_cr2_bm_contrast_dof` helper in `diff_diff/linalg.py` generalizes the per-coefficient loop to arbitrary `(k, m)` contrast matrices using the identical Pustejovsky-Tipton 2018 Section 4 algebra; `_compute_cr2_bm` is refactored to call it with `contrasts=eye(k)` so the existing per-coefficient parity to clubSandwich's `coef_test$df_Satt` is preserved (refactor regression at atol=1e-10). `MultiPeriodDiD.fit()` extends its existing avg_att DOF block to branch on `effective_cluster_ids`: one-way `_compute_bm_dof_from_contrasts` when None, cluster-aware `_compute_cr2_bm_contrast_dof` otherwise. Cluster IDs are per-observation length `n` and are NOT subscripted by the rank-deficient column-drop mask. R parity verified at atol=1e-10 against clubSandwich's `Wald_test(constraints=matrix(c, 1), test="HTZ")$df_denom` on the new `mpd_clustered_avg_att_dof` fixture in `benchmarks/data/clubsandwich_cr2_golden.json` (Wald_test's HTZ on a 1-row constraint matrix yields the Satterthwaite t-test DOF). Per-coefficient `period_effects[t].p_value` / `conf_int` and `avg_att` `avg_p_value` / `avg_conf_int` now reflect the correct Satterthwaite DOF rather than the n-k fallback under cluster+hc2_bm. Weighted CR2-BM (`survey_design=` paths) remains a separate gate. New tests: `tests/test_linalg_hc2_bm.py::TestCR2BMContrastDOF` (4 tests: refactor regression, R-parity, shape validation, cluster-count validation); existing `test_multi_period_cluster_plus_hc2_bm_rejected` flipped to behavioral `test_multi_period_cluster_plus_hc2_bm_produces_finite_inference`.
 - **`MultiPeriodDiD(absorb=..., vcov_type in {"hc2", "hc2_bm"})` now supported** (`diff_diff/estimators.py:1476`). Mirrors the DiD-absorb auto-route shipped earlier in this release: when `absorb=` is paired with `vcov_type in {"hc2","hc2_bm"}`, `MultiPeriodDiD.fit()` promotes the absorb columns to `fixed_effects=` internally so the existing full-dummy-design code path computes the algebraically correct vcov on the event-study design (`treated + period_X dummies + treated:period_X interactions + factor(unit)`). Verified at ~1e-10 vs `lm() + sandwich::vcovHC(type="HC2")` and `lm() + clubSandwich::vcovCR(cluster=1:n, type="CR2")` on a 5-cohort × 5-period event-study fixture (new `tests/test_estimators_vcov_type.py::TestMPDAbsorbedFERParity` against `benchmarks/data/clubsandwich_cr2_golden.json` scenario `mpd_absorbed_fe_did`). HC1/CR1 paths on `absorb=` are unchanged (no leverage term). `TwoWayFixedEffects(vcov_type in {"hc2","hc2_bm"})` rejection remains as a follow-up (different fit-path structure — no `fixed_effects=` equivalent inside TWFE). **Behavioral note (full `MultiPeriodDiDResults` surface change under auto-route):** under the auto-route, the entire returned `MultiPeriodDiDResults` reflects the full-dummy fit rather than the within-transformed fit — `result.coefficients`, `result.vcov`, `result.residuals`, `result.fitted_values`, `result.r_squared` all include the FE-dummy entries / un-demeaned values. `result.period_effects[t].effect` / `.se` / `.p_value` / `.conf_int` and `result.avg_att` / `.avg_se` are invariant to this routing (FWL guarantee). MPD requires a time-invariant ever-treated indicator that lies in the span of the intercept and the post-auto-route unit FE dummies (the exact alias depends on the omitted FE reference category under `pd.get_dummies(drop_first=True)`, not just on "the sum of treated-cohort unit dummies"), so `solve_ols` drops one column from that collinear set under R-style rank-deficiency handling. Which specific column is dropped is pivot-order and dummy-coding dependent (in the shipped parity fixture it is a never-treated unit dummy, not the `treated` main effect itself). The per-period interaction coefficients (`treated:period_X`) and `avg_att` are identified and invariant to that choice; parity tests target those rather than the `treated` main effect. **Survey-design scope (replicate weights):** when `survey_design=` uses replicate weights, the auto-route short-circuits the absorb-refit branch at `estimators.py:1693` and routes through the standard `compute_replicate_vcov` path on the fixed full-dummy design — correct because the design does not depend on replicate weights so no per-replicate refit is needed. **Redundant time-FE skip:** when the routed (or directly-supplied) `fixed_effects` list contains the `time` column, MPD silently skips emitting `<time>_<X>` dummies for that entry because the design already absorbs the time dimension via the non-reference period dummies; without the skip, the two blocks would collide on dummy names and the `coefficients` dict would silently collapse duplicates under `var_names`-keyed construction, breaking the coefficients-vs-vcov alignment that downstream consumers rely on. This applies to both the new `absorb=` auto-route and the pre-existing `fixed_effects=[<time_col>]` invocation.
 - **`DifferenceInDifferences(absorb=..., vcov_type in {"hc2", "hc2_bm"})` now supported** (`diff_diff/estimators.py:382`). Previously raised `NotImplementedError` because the HC2 leverage correction and CR2 Bell-McCaffrey DOF depend on the FULL FE hat matrix, while within-transformation (FWL) preserves coefficients and residuals but not the hat. Lift via internal auto-route: when `absorb=` is paired with `vcov_type in {"hc2","hc2_bm"}`, the fit promotes the absorb columns to `fixed_effects=` internally so the existing full-dummy-design code path computes the algebraically correct vcov. Empirically matches `lm() + sandwich::vcovHC(type="HC2")` and `lm() + clubSandwich::vcovCR(cluster=..., type="CR2")` at ~1e-10 (verified via new `tests/test_estimators_vcov_type.py::TestDiDAbsorbedFERParity` against `benchmarks/data/clubsandwich_cr2_golden.json` scenario `absorbed_fe_did`, with the R generator using the singleton-cluster CR2 trick for one-way HC2-BM Satterthwaite DOF). HC1/CR1 paths unchanged. `MultiPeriodDiD(absorb=...)` and `TwoWayFixedEffects` rejections remain as follow-ups (different fit-path structure). **Behavioral note (full `DiDResults` surface change under auto-route):** under the auto-route, the entire returned `DiDResults` reflects the full-dummy fit rather than the within-transformed fit. Specifically, `result.coefficients` and `result.vcov` include the FE-dummy entries (matching the `fixed_effects=` path), `result.residuals` and `result.fitted_values` are on the un-demeaned outcome scale, and `result.r_squared` is computed on the un-demeaned outcome (so it absorbs the FE variance and will typically be higher than the within-R²). `result.att` is invariant to this routing (FWL guarantee). Downstream consumers reading `result.att` are unaffected; consumers reading the broader result surface should expect the full-dummy values. **Survey-design scope:** the auto-route changes the FE handling (and removes the prior absorbed-FE rejection), but `survey_design=` continues to drive its own variance path (Taylor-series linearization or replicate-weight variance, per the existing survey contract) rather than the analytical HC2/HC2-BM sandwich. The auto-route is therefore methodologically meaningful for non-survey fits and for the FE-handling side of survey fits; analytical small-sample inference under `vcov_type in {"hc2","hc2_bm"}` is bypassed when a survey design is supplied.
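As a minimal sketch of the post-period-average contrast described in the `MultiPeriodDiD(cluster=..., vcov_type="hc2_bm")` entry, the snippet below builds the `(k, 1)` contrast column for `avg_att = (1/n_post) Σ_{t ≥ t_treat} β_t` and applies the usual `c' V c` variance formula. The coefficient names, estimates, and diagonal vcov are hypothetical illustration values, not diff-diff output, and `avg_att_contrast` is a made-up helper name. It does not reproduce the CR2 Bell-McCaffrey Satterthwaite DOF; it only shows what one contrast column represents, and that `np.eye(k)` reduces to the per-coefficient case.

```python
import numpy as np


def avg_att_contrast(coef_names, post_names):
    """Hypothetical helper: (k, 1) contrast with weight 1/n_post on each post-period coefficient."""
    c = np.zeros((len(coef_names), 1))
    for name in post_names:
        c[coef_names.index(name), 0] = 1.0 / len(post_names)
    return c


# Hypothetical coefficient layout, estimates, and vcov (illustration only).
coef_names = ["const", "treated", "treated:period_2", "treated:period_3"]
beta = np.array([1.0, 0.5, 0.2, 0.4])
vcov = np.diag([0.04, 0.01, 0.09, 0.16])

c = avg_att_contrast(coef_names, ["treated:period_2", "treated:period_3"])
avg_att = float(c[:, 0] @ beta)                    # (0.2 + 0.4) / 2
avg_se = float(np.sqrt(c[:, 0] @ vcov @ c[:, 0]))  # sqrt((0.09 + 0.16) / 4)

# With contrasts = np.eye(k), column j simply selects beta_j: the
# per-coefficient special case.
per_coef = np.eye(len(coef_names)).T @ beta
print(avg_att, avg_se, per_coef)
```

A `(k, m)` contrast matrix just stacks `m` such columns side by side.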
diff --git a/TODO.md b/TODO.md
index 9aa28973..05b756c9 100644
--- a/TODO.md
+++ b/TODO.md
@@ -162,6 +162,7 @@ Deferred items from PR reviews that were not addressed before merge.
 | Add CI validation for `docs/doc-deps.yaml` integrity (stale paths, unmapped source files) | `docs/doc-deps.yaml` | #269 | Low |
 | SyntheticDiD: rename internal `placebo_effects` variable to `variance_effects` (or `resampled_effects`). Misleading name across the placebo/bootstrap/jackknife dispatch paths — holds three different contents depending on variance method. Low-risk refactor; user-facing field rename should preserve `placebo_effects` as a deprecated alias for one release. | `synthetic_did.py`, `results.py` | follow-up | Medium |
 | AI review CI: pin workflow contract via test (uses `openai/codex-action@v1`, passes `prompt-file`, reads `steps.run_codex.outputs.final-message`, preserves diff-exclude paths and comment markers). Currently only the wrapper-tag and closing-tag-escape strings are asserted. | `tests/test_openai_review.py`, `.github/workflows/ai_pr_review.yml` | #416 | Low |
+| `__dir__()` discoverability contract test (head order, membership, `_OrderedName` invariants, `inspect.getmembers` parity) — deferred from PR #464 to the planned PR B addressing #461. The full snapshot/contract surface lands together in `tests/test_agent_discoverability.py`. | `diff_diff/__init__.py::__dir__`, `tests/test_agent_discoverability.py` (new in PR B) | #464 | Low |
 | `TestWorkflowDoesNotExecutePRHeadCode` (CodeQL #14 dismissal guard) does not model: `bash <script>` / `sh <script>` / `./<script>` / `source <script>` / `. <script>` direct shell-script execution; multi-line `python3 -c` bodies (line-by-line shlex can't reassemble across newlines — the workflow's 5 sanitizer bodies are exempt by invisibility); shell-variable-expansion indirection (`SCRIPT="$X"; python3 "$SCRIPT"`); `eval`; `find -exec`; `xargs -I {}`. Each represents a path by which PR-head bytes COULD execute without the test failing. The guard catches accidental regressions of common forms (16 tests covering pip/npm/cargo/maturin/etc. installs, python file exec, bash -c indirection with compound flags, env-var prefixes, line continuations, subshells/brace groups, single-line python -c, write-overwrites of allowlisted /tmp paths). Closing the residuals would require multi-line shell parsing with command-substitution awareness + script-execution allowlists — significant work for diminishing return given the dismissal's primary defense is the documented threat model on the alert and in `.github/workflows/ai_pr_review.yml` comment block. | `tests/test_openai_review.py`, `.github/workflows/ai_pr_review.yml` | #436 | Low |
 | Render `docs/methodology/REPORTING.md` and `docs/methodology/REGISTRY.md` as in-site Sphinx pages so cross-references can use `:doc:` instead of off-site GitHub `blob/main` URLs. Current state (#410 fix-audit-r2) restores navigable links via `blob/main`, but stable-docs readers can land on a different revision than the package version they are reading. Two viable paths: (a) add `myst-parser` to `docs/conf.py` extensions + docs extras and link with `:doc:`, or (b) convert both files to `.rst`. | `docs/conf.py`, `docs/api/business_report.rst`, `docs/api/diagnostic_report.rst`, `docs/tutorials/18_geo_experiments.ipynb`, `docs/tutorials/19_dcdh_marketing_pulse.ipynb` | follow-up | Low |
 
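The deferred `__dir__()` contract test listed above could check invariants along these lines. This is a minimal sketch that does not import diff_diff: it builds a throwaway module with `types.ModuleType`, attaches a PEP 562 module-level `__dir__` that wraps every name in a copy of the `_OrderedName` idea, and asserts head order, membership parity with the module namespace, `inspect.getmembers` parity, and that `key=str` recovers plain alphabetic order. The module name `demo_mod`, the `OrderedName` class, and the three-name `HEAD` are hypothetical stand-ins.

```python
import inspect
import types

HEAD = ("agent_workflow", "profile_panel", "get_llm_guide")  # hypothetical priority head


class OrderedName(str):
    """str subclass that sorts HEAD names first, everything else alphabetically."""

    _ORDER = {n: i for i, n in enumerate(HEAD)}

    def __lt__(self, other):
        sp = self._ORDER.get(str(self), 1 << 30)
        op = self._ORDER.get(str(other), 1 << 30)
        if sp != op:
            return sp < op
        return str.__lt__(self, other)


mod = types.ModuleType("demo_mod")  # throwaway stand-in for diff_diff
for name in ("zeta", "alpha", "get_llm_guide", "profile_panel", "agent_workflow"):
    setattr(mod, name, object())

# PEP 562: a __dir__ stored in the module namespace is used by dir(mod);
# builtin dir() still sorts the result, which is where __lt__ kicks in.
mod.__dir__ = lambda: [OrderedName(n) for n in vars(mod)]

names = dir(mod)
assert names[: len(HEAD)] == list(HEAD)            # head order
assert set(names) == set(vars(mod))                 # membership parity
assert all(isinstance(n, str) for n in names)       # still usable as str
assert {n for n, _ in inspect.getmembers(mod)} == set(vars(mod))
assert sorted(names, key=str) == sorted(vars(mod))  # plain alphabetic recovery
print(names[:5])
```

Against the real package, `vars(mod)` would become `vars(diff_diff)` and `HEAD` would be `diff_diff._AGENT_FACING_ORDER`.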
diff --git a/diff_diff/__init__.py b/diff_diff/__init__.py
index a0f2a38a..213c23a9 100644
--- a/diff_diff/__init__.py
+++ b/diff_diff/__init__.py
@@ -1,23 +1,23 @@
-"""
-diff-diff: A library for Difference-in-Differences analysis.
-
-This library provides sklearn-like estimators for causal inference
-using the difference-in-differences methodology.
+"""diff-diff: Difference-in-Differences causal inference with sklearn-like API.
+Recommended starting call for LLM agents:
+``diff_diff.agent_workflow(df, unit=..., time=..., treatment=..., outcome=...)``
+prints a copy-pasteable workflow with your column names wired in.
 
-For AI agents:
+The orchestrator names the full sequence:
 
-    1. Describe your data:    ``diff_diff.profile_panel(df, unit=..., time=...,
-                              treatment=..., outcome=...)``
-    2. Consult the reference: ``diff_diff.get_llm_guide("autonomous")``
+    1. Describe the panel:    diff_diff.profile_panel(df, ...)
+    2. Choose an estimator:   diff_diff.get_llm_guide("autonomous")
                               (estimator-support matrix + reasoning)
-    3. Follow the workflow:   ``diff_diff.get_llm_guide("practitioner")``
-                              (Baker et al. (2025) 8-step recipe)
-    4. Report results:        ``diff_diff.BusinessReport(results)``
-                              (structured agent-legible output)
+    3. Fit:                   <Estimator>(...).fit(df, ...)
+    4. Validate:              diff_diff.practitioner_next_steps(result)
+    5. Report:                diff_diff.BusinessReport(result)
+
+For a comprehensive API reference call ``diff_diff.get_llm_guide("full")``.
+For the Baker et al. (2025) 8-step practitioner recipe call
+``diff_diff.get_llm_guide("practitioner")``.
 
-For a comprehensive API reference call ``diff_diff.get_llm_guide("full")``;
-``practitioner_next_steps(results)`` returns context-aware guidance after
-any estimator's ``fit()``.
+This library provides sklearn-like estimators for causal inference using
+the difference-in-differences methodology.
 """
 
 # Import backend detection from dedicated module (avoids circular imports)
@@ -256,6 +256,7 @@
     DiagnosticReportResults,
 )
 from diff_diff._guides_api import get_llm_guide
+from diff_diff.agent_workflow import agent_workflow
 from diff_diff.profile import (
     Alert,
     OutcomeShape,
@@ -503,6 +504,7 @@
     "list_datasets",
     "clear_cache",
     # Practitioner guidance
+    "agent_workflow",
     "practitioner_next_steps",
     "BusinessReport",
     "BusinessContext",
@@ -519,3 +521,69 @@
     # LLM guide accessor
     "get_llm_guide",
 ]
+
+# Agent-facing entrypoints surface first in dir(diff_diff). LLM agents
+# follow a `dir -> help -> docstring -> use` discovery loop; surfacing
+# these names first is meant to improve discoverability over the default
+# alphabetic ordering. Internal: the planned PR B contract test in
+# tests/test_agent_discoverability.py reads this tuple.
+_AGENT_FACING_ORDER = (
+    "agent_workflow",
+    "profile_panel",
+    "get_llm_guide",
+    "practitioner_next_steps",
+    "BusinessReport",
+    "DiagnosticReport",
+)
+
+
+class _OrderedName(str):
+    """str subclass that sorts by _AGENT_FACING_ORDER priority.
+
+    Python's built-in dir() always sorts the result of __dir__()
+    alphabetically (CPython Objects/object.c::_dir_object unconditionally
+    calls PyList_Sort), so returning a list in our preferred order is
+    not enough. But PyList_Sort uses __lt__ for comparisons, so a str
+    subclass with a custom __lt__ can subvert the alphabetic default
+    while remaining a fully usable str for every other operation.
+
+    ALL names returned by __dir__() must be _OrderedName, not just the
+    priority head: for ``plain_str < ordered_name``, Python tries the
+    right operand's reflected __gt__ first (because _OrderedName is a
+    subclass of str), and the inherited str.__gt__ compares purely
+    alphabetically, which breaks the ordering. With every element
+    wrapped, all comparisons go through this __lt__: the priority head
+    sorts to the front, and the tail (default priority 1 << 30) falls
+    through to alphabetic via str.__lt__.
+    """
+
+    _ORDER = {n: i for i, n in enumerate(_AGENT_FACING_ORDER)}
+
+    def __lt__(self, other):
+        sp = self._ORDER.get(str(self), 1 << 30)
+        op = self._ORDER.get(str(other), 1 << 30)
+        if sp != op:
+            return sp < op
+        return str.__lt__(self, other)
+
+
+def __dir__():
+    """Surface agent-facing entrypoints first; remainder alphabetic.
+
+    Returns the full module namespace (matching default `dir(module)`
+    membership — keeps `__doc__`, `__name__`, etc. accessible via
+    `inspect.getmembers`) with priority names re-ordered to the head
+    via `_OrderedName`'s custom `__lt__`.
+
+    `__all__` order does not affect `dir(module)`. CPython sorts the
+    result of `__dir__()` alphabetically, so we return `_OrderedName`
+    instances (str subclass with custom `__lt__`) for every name; the
+    custom comparison routes head names to the top and falls back to
+    alphabetic for everyone else. See `_OrderedName` docstring for
+    why ALL names must be wrapped (mixing plain `str` with the
+    subclass triggers Python's reflected-method comparison protocol
+    and breaks the ordering).
+
+    `from diff_diff import *` semantics are unaffected (driven by
+    `__all__`, not by `dir()`).
+    """
+    return [_OrderedName(n) for n in globals()]
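The comparison mechanics the `_OrderedName` docstring relies on can be checked in isolation. This standalone sketch copies the `__lt__` logic into a hypothetical `OrderedName` class (with `zeta` and `alpha_tool` playing the priority head) rather than importing diff_diff. It shows the intended ordering and the mixed-operand pitfall that forces every name to be wrapped.

```python
class OrderedName(str):
    """Copy of the _OrderedName idea: priority names first, then alphabetic."""

    _ORDER = {"zeta": 0, "alpha_tool": 1}  # hypothetical priority head

    def __lt__(self, other):
        sp = self._ORDER.get(str(self), 1 << 30)
        op = self._ORDER.get(str(other), 1 << 30)
        if sp != op:
            return sp < op
        return str.__lt__(self, other)


names = ["beta", "zeta", "alpha", "alpha_tool", "gamma"]
ordered = sorted(OrderedName(n) for n in names)
print(ordered)  # priority head first, remainder alphabetic

# Both operands wrapped: the custom __lt__ runs, so "zeta" sorts first.
wrapped = OrderedName("zeta") < OrderedName("alpha")
# Plain str on the left: Python first tries the subclass's reflected
# __gt__, which is inherited from str, so the comparison is alphabetic.
mixed = "zeta" < OrderedName("alpha")
print(wrapped, mixed)
```

`wrapped` and `mixed` disagree, which is exactly why `__dir__()` wraps the full namespace rather than only the priority head.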
diff --git a/diff_diff/agent_workflow.py b/diff_diff/agent_workflow.py
new file mode 100644
index 00000000..98f5886e
--- /dev/null
+++ b/diff_diff/agent_workflow.py
@@ -0,0 +1,270 @@
+"""Stateless orchestrator: print the recommended diff-diff workflow with
+the caller's column names wired in.
+
+This module exists to give LLM agents a single, recognizable entrypoint
+that names the rest of the agent-facing workflow (`profile_panel`,
+`get_llm_guide`, `practitioner_next_steps`, `BusinessReport`). The
+function does not fit, inspect, or recommend — it templates a copy-
+pasteable script.
+"""
+
+from __future__ import annotations
+
+from typing import Any, Dict, List, Optional, Tuple
+
+# Pattern → df-callable estimator class names. Their flat union is the
+# `fit_candidates` field of the returned dict; each name must stay
+# resolvable via `hasattr(diff_diff, name)` (locked by
+# tests/test_agent_workflow.py; tests/test_agent_discoverability.py in
+# PR B). Patterns intentionally exclude pre-fit and post-fit diagnostics
+# (PreTrendsPower takes pre-treatment coefficients, HonestDiD takes a
+# fitted results object); the script mentions those separately, in a
+# diagnostics note right after the Step 2 routing patterns.
+_WORKFLOW_PATTERNS: Tuple[Tuple[str, Tuple[str, ...]], ...] = (
+    (
+        "Staggered adoption + binary treatment + has_never_treated control",
+        ("CallawaySantAnna", "SunAbraham", "ImputationDiD"),
+    ),
+    (
+        "Continuous treatment dose (non-binary numeric intensity)",
+        ("ContinuousDiD",),
+    ),
+    (
+        "Heterogeneous adoption intensity across treated units",
+        ("HeterogeneousAdoptionDiD",),
+    ),
+    (
+        "Simple 2x2 DiD (binary treatment, two periods, no staggering)",
+        ("DifferenceInDifferences",),
+    ),
+)
+
+
+def _safe_kwarg(name: str, value: Optional[str]) -> Optional[str]:
+    """Render ``name=<python-literal>`` using repr() for source-safety.
+
+    Column labels containing quotes, backslashes, or other special
+    characters must not break the emitted "copy-pasteable" script.
+    Python's built-in ``repr()`` produces a valid string literal for
+    any str input (including embedded quotes / backslashes /
+    newlines), so ``f"{name}={value!r}"`` is injection-safe by
+    construction. ``None`` returns ``None`` so the caller can drop
+    the kwarg.
+    """
+    if value is None:
+        return None
+    return f"{name}={value!r}"
+
+
+def _join_kwargs(**kwargs: Optional[str]) -> str:
+    """Render each non-None kwarg as ``name=<repr>`` and comma-join them."""
+    parts = [_safe_kwarg(k, v) for k, v in kwargs.items()]
+    return ", ".join(p for p in parts if p is not None)
+
+
+def agent_workflow(
+    df: Any,
+    *,
+    unit: str,
+    time: str,
+    treatment: str,
+    outcome: str,
+    first_treat: Optional[str] = None,
+    df_name: str = "df",
+    verbose: bool = True,
+) -> Dict[str, Any]:
+    """Print the recommended diff-diff workflow with your column names wired in.
+
+    Stateless orchestrator. Calls nothing internally. Returns a dict;
+    optionally prints a copy-pasteable script (``verbose=True``, the
+    default). ``df`` is not inspected — column names are templated
+    verbatim into the output.
+
+    Parameters
+    ----------
+    df : pandas.DataFrame
+        Long-format panel data. Not inspected; included so the agent
+        can pass the same handle along to the next call.
+    unit : str
+        Column identifying the cross-sectional unit.
+    time : str
+        Column identifying the time period.
+    treatment : str
+        Column holding the treatment indicator or dose.
+    outcome : str
+        Column holding the outcome variable.
+    first_treat : str, optional
+        Column with each unit's first-treatment period (or NaN for
+        never-treated controls). When supplied, the templated Step 3
+        switches from a ``DifferenceInDifferences.fit(treatment=...)``
+        example to a ``CallawaySantAnna().fit(first_treat=...)``
+        example, matching the actual fit signatures (passing
+        ``treatment=`` to CallawaySantAnna's ``.fit()`` would raise
+        TypeError).
+    df_name : str, default ``"df"``
+        Identifier under which the caller's dataframe is bound in
+        their namespace. Templated verbatim into the emitted script
+        as the first positional argument of every call
+        (``profile_panel({df_name}, ...)``,
+        ``<Estimator>().fit({df_name}, ...)``) so the script is
+        directly executable when the caller's local variable matches.
+        If the caller has ``panel = pd.read_parquet(...)``, passing
+        ``df_name="panel"`` produces a script that references
+        ``panel`` instead of ``df``. Must be a valid Python identifier
+        (not enforced; non-identifier values produce a script that
+        won't parse).
+    verbose : bool, default True
+        If True, print the script to stdout. The dict is always
+        returned regardless.
+
+    Returns
+    -------
+    dict
+        Keys:
+
+        - ``"profile_call"`` (str): call signature for
+          :func:`diff_diff.profile_panel`.
+        - ``"guide_call"`` (str): call signature for
+          :func:`diff_diff.get_llm_guide`.
+        - ``"fit_candidates"`` (list of str): flat union of the estimator
+          class names referenced in the workflow patterns (diagnostics
+          such as ``PreTrendsPower`` / ``HonestDiD`` are not included).
+          Every name resolves on the top-level ``diff_diff`` namespace.
+        - ``"validation_calls"`` (list of str): call signatures for the
+          post-fit validation step.
+        - ``"reporting_call"`` (str): call signature for
+          :class:`diff_diff.BusinessReport`.
+        - ``"script"`` (str): printable multi-line workflow.
+
+    Examples
+    --------
+    >>> import pandas as pd
+    >>> import diff_diff
+    >>> df = pd.DataFrame({
+    ...     "firm_id": [1, 1, 2, 2],
+    ...     "year": [0, 1, 0, 1],
+    ...     "treated": [0, 0, 1, 1],
+    ...     "logwage": [0.1, 0.2, 0.1, 0.9],
+    ... })
+    >>> out = diff_diff.agent_workflow(df, unit="firm_id", time="year",
+    ...                                treatment="treated", outcome="logwage",
+    ...                                verbose=False)
+    >>> "profile_panel" in out["script"]
+    True
+    """
+    del df  # intentionally unused: orchestrator templates from column names only
+
+    profile_call = (
+        f"diff_diff.profile_panel({df_name}, "
+        f"{_join_kwargs(unit=unit, time=time, treatment=treatment, outcome=outcome)})"
+    )
+    guide_call = 'diff_diff.get_llm_guide("autonomous")'
+
+    # Step 3 example: branch on first_treat presence.
+    # - With first_treat: a staggered structure is strongly implied, BUT
+    #   `first_treat` does not by itself identify which estimator to use:
+    #   CallawaySantAnna (binary staggered), ContinuousDiD (continuous-
+    #   dose with first_treat), and HeterogeneousAdoptionDiD event-study
+    #   (heterogeneous intensity with first_treat_col) all accept it.
+    #   Show CallawaySantAnna as the binary-staggered canonical example
+    #   and list the alternatives for continuous / heterogeneous designs
+    #   so an agent isn't steered to the wrong estimator.
+    # - Without first_treat: the orchestrator does not inspect df, so it
+    #   CANNOT infer whether the panel is 2x2 binary vs continuous-dose
+    #   vs heterogeneous-adoption. Show a DifferenceInDifferences call
+    #   as the "simple 2x2" example and label it explicitly conditional
+    #   on that shape.
+    if first_treat is not None:
+        fit_example_kwargs = _join_kwargs(
+            outcome=outcome, unit=unit, time=time, first_treat=first_treat
+        )
+        fit_example_call = f"diff_diff.CallawaySantAnna().fit({df_name}, {fit_example_kwargs})"
+        step3_label_lines = [
+            "Step 3 - Fit. Your data has `first_treat` -> staggered structure.",
+            "`first_treat` alone does NOT identify a single estimator; pick by",
+            "treatment shape:",
+            "  - Binary staggered  : CallawaySantAnna (shown) / SunAbraham / ImputationDiD",
+            "  - Continuous dose   : ContinuousDiD (also takes first_treat=)",
+            "  - Heterogeneous adoption intensity:",
+            "                        HeterogeneousAdoptionDiD (event study,",
+            "                        takes first_treat_col=, NOT first_treat=)",
+        ]
+    else:
+        fit_example_kwargs = _join_kwargs(
+            outcome=outcome, unit=unit, time=time, treatment=treatment
+        )
+        fit_example_call = (
+            f"diff_diff.DifferenceInDifferences().fit({df_name}, {fit_example_kwargs})"
+        )
+        step3_label_lines = [
+            "Step 3 - Fit. Pick a candidate from Step 2's patterns based on your",
+            "treatment/time shape. The example below shows the simple 2x2 case",
+            "(binary treatment + binary time); substitute ContinuousDiD /",
+            "HeterogeneousAdoptionDiD / etc. when your design is not 2x2",
+            "(DifferenceInDifferences.fit() validates and rejects non-binary",
+            "treatment or time).",
+        ]
+    step3_comment_block = "\n".join(f"# {line}" for line in step3_label_lines)
+
+    validation_calls = [
+        "diff_diff.practitioner_next_steps(result)",
+    ]
+    reporting_call = "diff_diff.BusinessReport(result).full_report()"
+
+    fit_candidates: List[str] = []
+    pattern_lines: List[str] = []
+    for label, names in _WORKFLOW_PATTERNS:
+        pattern_lines.append(f"#   - {label}")
+        pattern_lines.append(f"#       candidates: {', '.join(names)}")
+        for n in names:
+            if n not in fit_candidates:
+                fit_candidates.append(n)
+    pattern_block = "\n".join(pattern_lines)
+
+    diagnostics_block = (
+        "# Parallel-trends sensitivity / power (take a fitted result or\n"
+        "# pre-trend coefficients, NOT df+columns):\n"
+        "#     diff_diff.PreTrendsPower / diff_diff.HonestDiD"
+    )
+
+    # Templated output is a valid Python script: every prose line is a
+    # `#` comment and every code line starts at column 0. It runs as-is
+    # in a namespace where `diff_diff` is imported and the dataframe is
+    # bound to `df_name`. Step 5 wraps full_report() in print() so
+    # end-to-end execution actually produces the stakeholder narrative.
+    script = f"""# diff_diff workflow for your data
+# =================================
+#
+# Step 1 - Describe the panel:
+profile = {profile_call}
+print(profile)
+
+# Step 2 - Choose an estimator. Consult the routing matrix:
+print({guide_call})
+
+# Routing patterns (df-callable estimators):
+{pattern_block}
+#
+{diagnostics_block}
+
+{step3_comment_block}
+result = {fit_example_call}
+
+# Step 4 - Validate:
+{validation_calls[0]}
+
+# Step 5 - Report:
+print({reporting_call})
+
+# Full reference: diff_diff.get_llm_guide("full")
+# Practitioner recipe: diff_diff.get_llm_guide("practitioner")
+"""
+
+    if verbose:
+        print(script)
+
+    return {
+        "profile_call": profile_call,
+        "guide_call": guide_call,
+        "fit_candidates": fit_candidates,
+        "validation_calls": validation_calls,
+        "reporting_call": reporting_call,
+        "script": script,
+    }
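The `repr()`-based templating in `_safe_kwarg` can be sanity-checked without diff_diff. This sketch copies the helper under the hypothetical name `safe_kwarg`, renders an awkward made-up column label (embedded single and double quotes plus a newline), and uses the standard `ast` module to confirm that the emitted call parses and round-trips the exact label, while naive single-quote interpolation does not parse at all.

```python
import ast


def safe_kwarg(name, value):
    """Copy of the _safe_kwarg idea: render name=<repr(value)>, or None."""
    if value is None:
        return None
    return f"{name}={value!r}"


label = 'log "wage" it\'s\n2024'  # hypothetical hostile column label
rendered = safe_kwarg("outcome", label)

# repr() emits a valid literal, so the templated call parses and round-trips.
call = ast.parse(f"fit(df, {rendered})", mode="eval").body
roundtrip = ast.literal_eval(call.keywords[0].value)

# Naive quoting breaks as soon as the label contains a quote character.
naive = f"fit(df, outcome='{label}')"
try:
    ast.parse(naive, mode="eval")
    naive_ok = True
except SyntaxError:
    naive_ok = False
print(rendered, roundtrip == label, naive_ok)
```

Parsing with `ast` rather than executing the script keeps the check side-effect free.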
diff --git a/diff_diff/guides/llms-autonomous.txt b/diff_diff/guides/llms-autonomous.txt
index a320ac71..197dd827 100644
--- a/diff_diff/guides/llms-autonomous.txt
+++ b/diff_diff/guides/llms-autonomous.txt
@@ -10,6 +10,12 @@ It is a reference, not a decision tree. Multiple estimators usually fit a
 given panel; choosing between them involves trade-offs the cited literature
 discusses and that this guide does not pretend to resolve.
 
+**Recommended starting call:**
+- `agent_workflow(df, unit=..., time=..., treatment=..., outcome=...)` -
+  prints the canonical 5-step workflow with your column names wired in:
+  `profile_panel` → this guide → `<Estimator>(...).fit` → `practitioner_next_steps`
+  → `BusinessReport`. Stateless; calls nothing internally.
+
 **Pair this guide with:**
 - `get_llm_guide("practitioner")` - the Baker et al. (2025) 8-step validation
   workflow in workflow-prose form.
diff --git a/diff_diff/guides/llms.txt b/diff_diff/guides/llms.txt
index a310d621..777e6eeb 100644
--- a/diff_diff/guides/llms.txt
+++ b/diff_diff/guides/llms.txt
@@ -10,6 +10,10 @@ diff-diff offers 17 estimators covering basic 2x2 DiD, modern staggered adoption
 - Source: https://github.com/igerber/diff-diff
 - Docs: https://diff-diff.readthedocs.io/en/stable/
 
+## Agent Quickstart
+
+LLM agents: call `diff_diff.agent_workflow(df, unit=..., time=..., treatment=..., outcome=...)` first. It prints the recommended 5-step workflow (`profile_panel` → `get_llm_guide` → `<Estimator>().fit` → `practitioner_next_steps` → `BusinessReport`) with your column names wired in.
+
 ## Practitioner Workflow (based on Baker et al. 2025)
 
 IMPORTANT: For rigorous DiD analysis, follow these 8 steps. Skipping
diff --git a/tests/test_agent_workflow.py b/tests/test_agent_workflow.py
new file mode 100644
index 00000000..a0c4218f
--- /dev/null
+++ b/tests/test_agent_workflow.py
@@ -0,0 +1,442 @@
+"""Tests for the stateless agent_workflow() orchestrator.
+
+Mirrors the content-stability pattern from tests/test_guides.py: assert
+fingerprint strings appear in the output rather than pinning exact
+formatting. The orchestrator's stability contract is that it names the
+five canonical workflow primitives in a copy-pasteable script.
+"""
+
+import pandas as pd
+import pytest
+
+import diff_diff
+
+
+@pytest.fixture
+def df():
+    return pd.DataFrame(
+        {
+            "firm_id": [1, 1, 2, 2],
+            "year": [0, 1, 0, 1],
+            "treated": [0, 0, 1, 1],
+            "logwage": [0.1, 0.2, 0.1, 0.9],
+        }
+    )
+
+
+def test_returns_dict_with_canonical_keys(df):
+    out = diff_diff.agent_workflow(
+        df,
+        unit="firm_id",
+        time="year",
+        treatment="treated",
+        outcome="logwage",
+        verbose=False,
+    )
+    expected = {
+        "profile_call",
+        "guide_call",
+        "fit_candidates",
+        "validation_calls",
+        "reporting_call",
+        "script",
+    }
+    assert expected <= set(out.keys())
+
+
+def test_script_names_canonical_workflow(df):
+    out = diff_diff.agent_workflow(
+        df,
+        unit="firm_id",
+        time="year",
+        treatment="treated",
+        outcome="logwage",
+        verbose=False,
+    )
+    script = out["script"]
+    for name in (
+        "profile_panel",
+        "get_llm_guide",
+        "practitioner_next_steps",
+        "BusinessReport",
+    ):
+        assert name in script, f"{name!r} missing from script"
+
+
+def test_templates_column_names(df):
+    out = diff_diff.agent_workflow(
+        df,
+        unit="firm_id",
+        time="year",
+        treatment="treated",
+        outcome="logwage",
+        first_treat="cohort",
+        verbose=False,
+    )
+    script = out["script"]
+    for col in ("firm_id", "year", "treated", "logwage", "cohort"):
+        assert col in script, f"column {col!r} missing from templated script"
+
+
+def test_first_treat_omitted_when_none(df):
+    out = diff_diff.agent_workflow(
+        df,
+        unit="firm_id",
+        time="year",
+        treatment="treated",
+        outcome="logwage",
+        verbose=False,
+    )
+    # Without first_treat=, the templated Step 3 should NOT mention it.
+    assert "first_treat=" not in out["script"]
+
+
+def test_first_treat_appears_when_provided(df):
+    out = diff_diff.agent_workflow(
+        df,
+        unit="firm_id",
+        time="year",
+        treatment="treated",
+        outcome="logwage",
+        first_treat="cohort_year",
+        verbose=False,
+    )
+    # repr() produces single-quoted string literals for simple labels.
+    assert "first_treat='cohort_year'" in out["script"]
+
+
+def test_first_treat_switches_step3_estimator(df):
+    """Step 3 must showcase a fit signature compatible with the data shape.
+
+    - first_treat=None  -> DifferenceInDifferences (takes `treatment=`,
+      does NOT take `first_treat=`)
+    - first_treat=<col> -> CallawaySantAnna (takes `first_treat=`,
+      does NOT take `treatment=`)
+    """
+    no_ft = diff_diff.agent_workflow(
+        df,
+        unit="firm_id",
+        time="year",
+        treatment="treated",
+        outcome="logwage",
+        verbose=False,
+    )
+    assert "DifferenceInDifferences" in no_ft["script"]
+    assert "diff_diff.CallawaySantAnna().fit" not in no_ft["script"]
+
+    with_ft = diff_diff.agent_workflow(
+        df,
+        unit="firm_id",
+        time="year",
+        treatment="treated",
+        outcome="logwage",
+        first_treat="cohort",
+        verbose=False,
+    )
+    assert "diff_diff.CallawaySantAnna().fit" in with_ft["script"]
+    # Staggered fit must not pass `treatment=` (would TypeError).
+    step3_lines = [
+        line for line in with_ft["script"].split("\n") if "CallawaySantAnna().fit" in line
+    ]
+    assert step3_lines, "Step 3 line missing"
+    assert "treatment=" not in step3_lines[0]
+
+
+def test_no_first_treat_step3_does_not_overclaim_match(df):
+    """Without `first_treat`, the orchestrator cannot infer panel shape.
+
+    The Step 3 label must NOT claim the emitted DiD example is "matched to
+    your data shape" — for continuous-dose or heterogeneous-adoption
+    designs without first_treat, DifferenceInDifferences would reject at
+    fit time. The label must instead frame the example as conditional on
+    the simple 2x2 case and tell the agent to substitute the matching
+    candidate from Step 2 for other shapes.
+    """
+    out = diff_diff.agent_workflow(
+        df,
+        unit="firm_id",
+        time="year",
+        treatment="treated",
+        outcome="logwage",
+        verbose=False,
+    )
+    script = out["script"]
+    # Negative: should not claim universal match.
+    assert "matched to your data shape" not in script
+    # Positive: must qualify with the 2x2 conditionality + substitution hint.
+    assert "2x2" in script
+    assert "substitute" in script.lower()
+    # Other workflow patterns must remain enumerated so the agent can substitute.
+    for name in ("ContinuousDiD", "HeterogeneousAdoptionDiD"):
+        assert name in script, f"{name} routing pattern missing from Step 2 hints"
+
+
+def test_first_treat_step3_names_non_binary_alternatives(df):
+    """first_treat alone doesn't pick the estimator.
+
+    ContinuousDiD.fit and HeterogeneousAdoptionDiD (event-study) BOTH take
+    `first_treat` (via `first_treat=` and `first_treat_col=`, respectively). The Step 3
+    label must name those alternatives so an agent on a continuous-dose or
+    heterogeneous-intensity panel isn't silently steered to CS21.
+    """
+    out = diff_diff.agent_workflow(
+        df,
+        unit="firm_id",
+        time="year",
+        treatment="treated",
+        outcome="logwage",
+        first_treat="cohort",
+        verbose=False,
+    )
+    script = out["script"]
+    # The CS21 example remains the canonical binary-staggered demo.
+    assert "diff_diff.CallawaySantAnna().fit" in script
+    # The Step 3 commentary must name the non-binary alternatives so the
+    # agent knows when to switch.
+    assert "ContinuousDiD" in script
+    assert "HeterogeneousAdoptionDiD" in script
+    # The HAD distinction (first_treat_col vs first_treat) must be named
+    # so the agent doesn't try to pass first_treat= to HAD's event study.
+    assert "first_treat_col" in script
+
+
+def test_emitted_script_parses_as_python_module(df):
+    """The "script" output must parse as a complete Python module.
+
+    The prior contract was "copy-pasteable", but the script had bare prose
+    lines (`Step 1 - ...`) that raised SyntaxError on execution. This
+    locks in full-script parseability so the contract stays honest.
+    """
+    import ast
+
+    for ft in (None, "cohort"):
+        out = diff_diff.agent_workflow(
+            df,
+            unit="firm_id",
+            time="year",
+            treatment="treated",
+            outcome="logwage",
+            first_treat=ft,
+            verbose=False,
+        )
+        # ast.parse with default mode='exec' verifies the WHOLE script
+        # parses as a Python module, not just individual call expressions.
+        ast.parse(out["script"])
+
+
+def test_emitted_script_prints_report(df):
+    """Step 5 must wrap BusinessReport in print() so end-to-end script
+    execution actually produces the stakeholder narrative. full_report()
+    returns a str; the previous template discarded it.
+    """
+    out = diff_diff.agent_workflow(
+        df,
+        unit="firm_id",
+        time="year",
+        treatment="treated",
+        outcome="logwage",
+        verbose=False,
+    )
+    script = out["script"]
+    assert "print(diff_diff.BusinessReport" in script
+    assert ".full_report())" in script
+
+
+def test_df_name_templates_into_script(df):
+    """Caller can rename the dataframe symbol in the emitted script.
+
+    Default (df_name="df"): script references `df`.
+    Custom (df_name="panel"): every emitted call uses `panel` and no
+    bare `df` identifier appears in the runnable code paths.
+    """
+    import ast
+
+    out_default = diff_diff.agent_workflow(
+        df,
+        unit="firm_id",
+        time="year",
+        treatment="treated",
+        outcome="logwage",
+        verbose=False,
+    )
+    out_panel = diff_diff.agent_workflow(
+        df,
+        unit="firm_id",
+        time="year",
+        treatment="treated",
+        outcome="logwage",
+        df_name="panel",
+        verbose=False,
+    )
+    # Default behavior preserved.
+    assert "profile_panel(df," in out_default["script"]
+    # Custom name flows through profile_call AND fit_example_call.
+    assert "profile_panel(panel," in out_panel["script"]
+    assert ".fit(panel," in out_panel["script"]
+    # Static reference scan: parse the panel-script and confirm no `df`
+    # Name node exists — catches template drift where a `df` reference
+    # slips in outside the templated points.
+    tree = ast.parse(out_panel["script"])
+    names = {n.id for n in ast.walk(tree) if isinstance(n, ast.Name)}
+    assert "df" not in names, (
+        f"emitted script with df_name='panel' still references `df`: "
+        f"identifier names found = {sorted(names)}"
+    )
+    assert "panel" in names
+
+
+def test_df_name_panel_script_executes_in_panel_namespace(df):
+    """The emitted script must resolve all names in a namespace where
+    `panel` exists and `df` does not. We stub out `diff_diff` with a
+    MagicMock so calls don't actually fit; the test is purely about
+    symbol resolution, not numerical correctness — if the script still
+    referenced `df` anywhere in runnable code, exec() would NameError.
+    """
+    import unittest.mock
+
+    out = diff_diff.agent_workflow(
+        df,
+        unit="firm_id",
+        time="year",
+        treatment="treated",
+        outcome="logwage",
+        df_name="panel",
+        verbose=False,
+    )
+    ns = {
+        "diff_diff": unittest.mock.MagicMock(),
+        "panel": "sentinel_df_object",
+        # Deliberately no `df` key — script must not reference it.
+    }
+    exec(compile(out["script"], "<test_df_name>", "exec"), ns)
+
+
+def test_does_not_inspect_df():
+    # Pure orchestrator: a structurally empty DataFrame must still produce
+    # the templated script (no df inspection happens).
+    out = diff_diff.agent_workflow(
+        pd.DataFrame(),
+        unit="a",
+        time="b",
+        treatment="c",
+        outcome="d",
+        verbose=False,
+    )
+    assert "profile_panel" in out["script"]
+    # repr() produces single-quoted literals.
+    assert "unit='a'" in out["script"]
+
+
+def test_emitted_calls_are_valid_python():
+    """The advertised "copy-pasteable" script must actually parse as Python.
+
+    Walks each line starting with `profile =` or `result =` and asserts the
+    RHS parses with ast.parse(..., mode='eval'). Guards against future
+    template drift that would silently emit invalid syntax.
+    """
+    import ast
+
+    base = pd.DataFrame({"u": [1], "t": [0], "tr": [0], "y": [0.0]})
+    for ft in (None, "cohort_col"):
+        out = diff_diff.agent_workflow(
+            base,
+            unit="u",
+            time="t",
+            treatment="tr",
+            outcome="y",
+            first_treat=ft,
+            verbose=False,
+        )
+        rhs_lines = []
+        for line in out["script"].split("\n"):
+            s = line.strip()
+            if s.startswith("profile =") or s.startswith("result ="):
+                rhs_lines.append(s[s.index("=") + 1 :].strip())
+        assert rhs_lines, f"no parseable call lines emitted (first_treat={ft})"
+        for rhs in rhs_lines:
+            ast.parse(rhs, mode="eval")
+
+
+@pytest.mark.parametrize(
+    "label",
+    [
+        'firm"id',  # embedded double quote
+        "year'col",  # embedded single quote
+        "name\\with\\slash",  # backslashes
+        "x\\nname",  # backslash-n (not a real newline)
+        'unit"); evil()  #',  # injection attempt
+        "with space",  # whitespace
+    ],
+)
+def test_adversarial_column_labels_produce_valid_python(label):
+    """Any str column label must produce a script that parses as Python.
+
+    Uses repr() under the hood, so any input str becomes a valid Python
+    string literal in the templated output. Locks the P0 contract that
+    column names can never inject statements into the "copy-pasteable"
+    script.
+    """
+    import ast
+
+    df_local = pd.DataFrame({label: [1]})  # contents irrelevant; df is never inspected
+    out = diff_diff.agent_workflow(
+        df_local,
+        unit=label,
+        time="t",
+        treatment="tr",
+        outcome="y",
+        verbose=False,
+    )
+    ast.parse(out["script"])  # the whole script must parse as a module
+    for line in out["script"].split("\n"):
+        s = line.strip()
+        if s.startswith(("profile =", "result =")):
+            ast.parse(s[s.index("=") + 1 :].strip(), mode="eval")
+
+
+def test_fit_candidates_all_importable(df):
+    """Every estimator name in fit_candidates must remain importable.
+
+    Catches the drift case where an estimator is renamed but the
+    orchestrator's candidates list still references the old name.
+    """
+    out = diff_diff.agent_workflow(
+        df,
+        unit="firm_id",
+        time="year",
+        treatment="treated",
+        outcome="logwage",
+        verbose=False,
+    )
+    for name in out["fit_candidates"]:
+        assert hasattr(diff_diff, name), (
+            f"agent_workflow advertises {name!r} but it's not on the "
+            f"public surface — rename detected without orchestrator update."
+        )
+
+
+def test_verbose_true_prints_script(df, capsys):
+    out = diff_diff.agent_workflow(
+        df,
+        unit="firm_id",
+        time="year",
+        treatment="treated",
+        outcome="logwage",
+        verbose=True,
+    )
+    captured = capsys.readouterr()
+    assert "profile_panel" in captured.out
+    assert out["script"] in captured.out
+
+
+def test_verbose_false_silent(df, capsys):
+    diff_diff.agent_workflow(
+        df,
+        unit="firm_id",
+        time="year",
+        treatment="treated",
+        outcome="logwage",
+        verbose=False,
+    )
+    captured = capsys.readouterr()
+    assert captured.out == ""
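The two `df_name` tests pair two checks:

- A static check: no `ast.Name` node is spelled `df`.
- A dynamic check: `exec` runs in a namespace that lacks `df`, with `diff_diff` replaced by `unittest.mock.MagicMock`.

The minimal sketch below packages both into one helper. `check_script_free_of`, `referenced_names`, and `SAMPLE` are hypothetical, and the sample script only imitates the shape of the emitted calls.

```python
import ast
import unittest.mock


def referenced_names(script: str) -> set:
    """Every bare identifier (ast.Name node) that appears in the script."""
    tree = ast.parse(script)
    return {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}


def check_script_free_of(script: str, forbidden: str, provided: str) -> None:
    """Raise AssertionError or NameError if `script` references `forbidden`.

    Static pass: scan the AST for a Name node called `forbidden`.
    Dynamic pass: exec the script with `diff_diff` stubbed out and only
    `provided` bound, so a stray reference raises NameError.
    """
    names = referenced_names(script)
    if forbidden in names:
        raise AssertionError(f"script references {forbidden!r}: {sorted(names)}")
    # MagicMock accepts any attribute access and call, so only name
    # resolution is exercised, not numerical correctness.
    ns = {"diff_diff": unittest.mock.MagicMock(), provided: object()}
    exec(compile(script, "<script>", "exec"), ns)


SAMPLE = """\
profile = diff_diff.profile_panel(panel, unit='firm_id', time='year')
result = diff_diff.DifferenceInDifferences().fit(panel, outcome='logwage')
report = diff_diff.BusinessReport(result).full_report()
"""

check_script_free_of(SAMPLE, forbidden="df", provided="panel")
```

The two passes cover different ground. The static scan also covers branches that never execute, while the `exec` pass confirms the remaining names actually resolve at run time.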