docs: add responsible AI agent use & token-budget guide #17

Open
oto-macenauer-absa wants to merge 6 commits into master from docs/responsible-agent-use

@oto-macenauer-absa

Contributor

What

Adds docs/responsible-agent-use.md — a guide to using AI agents (primarily GitHub Copilot) without wasting the token / credit budget — and deduplicates overlapping content in the existing docs.

Why

Copilot moved to usage-based, per-token billing (June 1 2026). Without guidance, a user can burn a month of credits in a few agent prompts. There was no doc explaining the cost model or how to control it.

Contents of the new page

  • How the cost works — per-token billing with the corrected model: (input × input_rate) + (output × output_rate) + (cached × cached_rate).
  • Token types — input vs output vs cached and why their rates differ (output ~2–6× input; cached input ~90% off reads; Anthropic cache-write premium), plus how Copilot meters all three into AI Credits.
  • Where the budget goes — context size, model choice, agent mode, MCP servers, code review.
  • Context — maintaining vs clearing, reconciled with prompt-cache behaviour.
  • MCP / plugin / skill cost discipline.
  • Per-token-type cuts + a must-do checklist.
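
To make the cost model in the first bullet concrete, here is a minimal sketch of the three-term formula. The `Rates` class, the `request_cost` function, and every number below are illustrative assumptions, not real Copilot or provider prices; the only thing taken from the guide is the shape of the formula and the rough ratios (output several times input, cached reads about 90% off).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Hypothetical prices in USD per one million tokens (not real rates)."""
    input_rate: float
    output_rate: float
    cached_rate: float


def request_cost(input_tokens: int, output_tokens: int,
                 cached_tokens: int, rates: Rates) -> float:
    """(input x input_rate) + (output x output_rate) + (cached x cached_rate)."""
    total = (
        input_tokens * rates.input_rate      # uncached input
        + output_tokens * rates.output_rate  # generated tokens
        + cached_tokens * rates.cached_rate  # input served from the prompt cache
    )
    return total / 1_000_000  # rates are quoted per million tokens


# Assumed ratios: output at 5x input, cached reads at 90% off input.
rates = Rates(input_rate=2.0, output_rate=10.0, cached_rate=0.2)
cost = request_cost(input_tokens=20_000, output_tokens=2_000,
                    cached_tokens=100_000, rates=rates)
print(f"{cost:.4f}")
```

With these assumed rates, the 100,000 cached tokens cost the same as the 2,000 output tokens, which is why the guide treats the three token types separately.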

Dedup

  • Split the "skill not activating" / activation-signal explanation: concept lives in getting-started.md, the fix in troubleshooting.md, each cross-references the other instead of restating.
  • responsible-agent-use.md references the loading model rather than re-explaining it.
  • Added the new page to the docs/ index.

Notes

  • Claims are web-verified across multiple sources (GitHub billing docs/blog, Anthropic & OpenAI prompt-caching docs, an independent pricing comparison) — see the page's Sources section.
  • The website auto-merges docs/ at build time (guidelines/ is generated, gitignored), so no website code changes are needed.
  • Based on feature/website to keep this diff scoped to the docs work.

Closes #16

Add docs/responsible-agent-use.md: a guide to using AI agents
(primarily GitHub Copilot) without wasting the token/credit budget.

Covers Copilot usage-based billing, the input/output/cached token
types and their different rates, where budget goes (context, model
choice, agent mode, MCP, code review), context maintaining vs
clearing reconciled with prompt-cache behaviour, MCP/plugin/skill
cost discipline, per-token-type cuts, and a must-do checklist.

Also deduplicate the skill-activation explanation across
getting-started and troubleshooting via cross-references, and add
the new page to the docs index.

Closes #16
@oto-macenauer-absa oto-macenauer-absa added the documentation Improvements or additions to documentation label Jun 18, 2026
Reframe the recommendations for agent-mode work (the default) rather
than manual chat: scope what the agent reads instead of pasting files,
favour targeted edits over "ask for a diff", bound the loop with a stop
condition, plan before editing, and run targeted tests so verification
output doesn't bloat context each turn. Update the token-type notes,
levers table, checklist, and TL;DR to match.
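
One way to read "bound the loop with a stop condition" is as a hard cap on turns and tokens around the agent loop. The sketch below is an assumption about how that could look, not something taken from the guide: `run_bounded`, `fake_step`, and the budget numbers are all hypothetical, and a real agent may express the stop condition in the prompt instead (for example "stop when the targeted tests pass").

```python
from typing import Callable, Tuple

# Hypothetical step function: runs one agent turn, returns (tokens_used, done).
Step = Callable[[int], Tuple[int, bool]]


def run_bounded(step: Step, max_turns: int, token_budget: int) -> Tuple[str, int, int]:
    """Run turns until the task is done, the budget is spent, or the turn cap is hit."""
    spent = 0
    for turn in range(1, max_turns + 1):
        used, done = step(turn)
        spent += used
        if done:
            return ("done", turn, spent)
        if spent >= token_budget:
            return ("budget_exhausted", turn, spent)
    return ("max_turns", max_turns, spent)


def fake_step(turn: int) -> Tuple[int, bool]:
    # Stand-in for a real agent turn: context grows, so each turn costs more.
    return (10_000 * turn, turn == 4)


print(run_bounded(fake_step, max_turns=10, token_budget=50_000))
```

In this toy run the budget is exhausted on turn 3, before the task would have finished on turn 4, which is the kind of early stop that prevents a runaway loop.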

@miroslavpojer miroslavpojer left a comment

Contributor

I like it. One concept is missing that can keep the context small while still getting a lot of agent work done: sub-agents.

Address PR review feedback: cover sub-agents as a way to protect the
main context. A sub-agent runs in its own context window and returns
only a summary, so heavy read/search/analysis tokens are paid once and
discarded instead of accumulating in the main thread. Includes the
honest caveat that multi-agent use costs more tokens overall (Anthropic
reports ~4x single-agent, ~15x multi-agent vs chat), so delegation pays
off only when it replaces context that would otherwise pile up or when
workers run a cheaper model. Adds checklist item, TL;DR clause, sources.
@oto-macenauer-absa

Contributor Author

Good call — added a Sub-agents section (commit 7ebce76).

It covers exactly your point: a sub-agent runs in its own context window, does the heavy reading/searching, and returns only a summary to the main thread, so the intermediate noise never pollutes the parent context.

I also web-researched it and added the honest caveat: sub-agents aren't free — each opens its own window, and Anthropic reports a single agent uses ~4x a chat's tokens and multi-agent ~15x. So the section frames delegation as a net win only when it replaces context that would otherwise pile up in the main thread, or when workers run a cheaper model (reported 5-10x cuts with a premium orchestrator + lightweight workers). Sources added to the doc.
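
A rough way to see when delegation is a net win is to compare the tokens re-sent in the main thread against the one-off cost of a sub-agent. The sketch below is a simplified model under stated assumptions: it ignores prompt caching and per-model rates, it assumes the whole context is re-sent as input on every turn, and the function names and numbers are hypothetical rather than taken from the doc.

```python
def inline_tokens(read_tokens: int, remaining_turns: int) -> int:
    """Heavy output stays in the main context and is re-sent every remaining turn."""
    return read_tokens * remaining_turns


def delegated_tokens(read_tokens: int, summary_tokens: int,
                     remaining_turns: int, overhead_tokens: int) -> int:
    """Sub-agent pays for the read once (plus its own overhead); only the summary is re-sent."""
    return read_tokens + overhead_tokens + summary_tokens * remaining_turns


# Assumed sizes: 40k tokens of search/read output, a 1k summary, 15k sub-agent overhead.
for turns in (1, 10):
    inline = inline_tokens(40_000, turns)
    delegated = delegated_tokens(40_000, 1_000, turns, 15_000)
    print(turns, inline, delegated, "delegate" if delegated < inline else "inline")
```

Under these assumptions, delegation loses when only one turn remains (56,000 vs 40,000 tokens) and wins clearly with ten turns remaining (65,000 vs 400,000), matching the caveat that sub-agents pay off only when they replace context that would otherwise pile up.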

Thanks for the video reference.

Apply fixes from a full read-through:
- intro now lists agent mode and sub-agents (the two largest sections)
- add an availability caveat to the sub-agents section: native spawning
  is a Claude Code/SDK feature; in Copilot it surfaces via custom agents,
  and the cited token figures come from the Claude ecosystem
- standardise terminology on "session" (was a mix of conversation /
  chat / thread for the same thing)
@oto-macenauer-absa oto-macenauer-absa changed the base branch from feature/website to master June 19, 2026 09:01
Labels

documentation Improvements or additions to documentation

2 participants