fix(agentex): redact secret values from all logs via logging filter by danielmillerp · Pull Request #294 · scaleapi/scale-agentex

danielmillerp · 2026-06-09T19:26:51Z

Problem

A request principal context can carry a raw api_key (a long-lived bearer credential). #293 removed the principal context from the authorization-service logs and hardened the request-body scrubber — but that's call-site-by-call-site. The leak originally happened because a perfectly ordinary logger.info(..., principal_context) interpolated a secret-bearing object. Nothing structurally prevents the next such line.

Fix

Add a SensitiveDataFilter to every logger created by make_logger (agentex/src/utils/logging.py). It rewrites the fully-rendered log message before any handler emits it, masking the value that follows any sensitive key name to [REDACTED]:

Key names reuse REQUEST_KEY_REGEXP_BLACKLIST (the request-body scrubber's list) — single source of truth: api_key / token / secret / cookie / authorization / acting-user (incl. camelCase apiKey).
Handles dict-repr ('api_key': 'x'), kwargs/model repr (api_key='x'), and JSON ("api_key": "x"), preserving quote style.
Attached to the logger (not the handler) so redaction runs once, up front, for every downstream handler the record propagates to.

This is defense-in-depth layered on top of #293: call sites should still avoid logging secrets, but if one slips through in the future, the value never reaches log aggregation.

Scope / limitations

Scrubs message text only. Structured extra={...} fields are out of scope and should continue to use non-sensitive identifiers (e.g. api_key_id, not the key).
Regex is best-effort (may over-redact a benign token: foo); it's a safety net, not a substitute for not logging secrets. The org-wide "all logs everywhere" backstop (Datadog Sensitive Data Scanner) is a separate infra follow-up.

Tests

tests/unit/utils/test_logging_redaction.py:

redacts the exact principal-dict shape that leaked; preserves user_id / account_id
redacts kwargs / JSON / unquoted / camelCase / token / secret / authorization / cookie renderings
leaves non-sensitive text untouched
end-to-end: a make_logger logger masks the secret in captured output while keeping safe identifiers

uv run ruff check/format clean; pytest for the new file + #293's logging/request_utils tests = 15 passed.

🤖 Generated with Claude Code

Greptile Summary

Adds a SensitiveDataFilter (attached at the logger level in make_logger) that rewrites rendered log messages to mask values following any key in REQUEST_KEY_REGEXP_BLACKLIST before any handler emits the record. The implementation is defense-in-depth on top of #293's call-site hardening.

logging.py: New redact_sensitive_text helper and SensitiveDataFilter class share the existing REQUEST_KEY_REGEXP_BLACKLIST as the single source of truth; the compiled regex handles dict-repr, kwargs, JSON, and Bearer-prefixed authorization headers. The filter is a module-level singleton added idempotently to each make_logger logger.
test_logging_redaction.py: 15 test cases cover the exact leaked principal shape, 10 common rendering variants (including camelCase, unquoted, and scheme-prefixed), a non-sensitive text preservation check, an over-redaction boundary test, and an end-to-end make_logger/caplog integration test.

Confidence Score: 5/5

Safe to merge. The filter logic is correct, redaction is idempotent, and the regex handles all common credential-bearing log shapes including Bearer-prefixed headers.

The filter attaches cleanly to the logger before any handler runs, modifies record.msg and record.args in place so both plain and JSON formatters see the redacted text, and the idempotency check prevents double-attaching on repeated make_logger calls for the same name. No functional defect was found in the changed files.

No files require special attention.

Important Files Changed

Filename	Overview
agentex/src/utils/logging.py	Adds SensitiveDataFilter and redact_sensitive_text; logically correct, idempotent singleton attachment; minor: redact_sensitive_text called outside the try/except guard.
agentex/tests/unit/utils/test_logging_redaction.py	New unit tests covering principal dict leak shape, 10 rendering variants, non-sensitive preservation, over-redaction boundary, and end-to-end caplog test; password and acting-user keys not explicitly parametrized.

Prompt To Fix All With AI

Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
agentex/src/utils/logging.py:76-86
The `try/except` around `getMessage()` states "Never let redaction break logging itself", but `redact_sensitive_text` is called outside that guard. If `re.sub` somehow raises (e.g., hitting a recursion-depth limit on a pathological log string), the exception propagates through the filter and crashes the calling log statement — the exact failure mode the comment promises to prevent.

```suggestion
    def filter(self, record: logging.LogRecord) -> bool:
        try:
            message = record.getMessage()
            redacted = redact_sensitive_text(message)
        except Exception:
            # Never let redaction break logging itself.
            return True
        if redacted != message:
            record.msg = redacted
            record.args = ()
        return True
```

### Issue 2 of 2
agentex/tests/unit/utils/test_logging_redaction.py:34-50
`password` and `acting-user` are both in `REQUEST_KEY_REGEXP_BLACKLIST` and therefore in the compiled regex, but neither appears in the parametrize list. A future edit that accidentally drops one of those patterns from the blacklist would leave these redaction paths silently broken with no test to catch it.

_{Reviews (3): Last reviewed commit: "fix(agentex): redact secret values from ..." | Re-trigger Greptile}

Defense in depth on top of removing the principal context from authz logs (#293): add a SensitiveDataFilter to every logger created by make_logger so that if any future code path interpolates an object carrying a credential (e.g. a principal context dict with an api_key) into a log message, the value is masked before the record is emitted. The filter rewrites the fully-rendered message, masking the value that follows any sensitive key name (api_key / token / secret / cookie / authorization / acting-user) in dict-repr, kwargs, and JSON renderings. Key names reuse REQUEST_KEY_REGEXP_BLACKLIST so the message scrubber and the request-body scrubber stay in sync. Attaching the filter to the logger (not the handler) means redaction runs once, up front, for every downstream handler the record propagates to. Scope: scrubs message text; structured extra={} fields are out of scope and should continue to use non-sensitive identifiers (e.g. api_key_id). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

danielmillerp requested a review from a team as a code owner June 9, 2026 19:26

greptile-apps Bot reviewed Jun 9, 2026

View reviewed changes

Comment thread agentex/tests/unit/services/test_authorization_service.py Outdated

danielmillerp force-pushed the dm/redact-principal-context-logs branch from 3706ff6 to 510a5e6 Compare June 9, 2026 19:48

danielmillerp changed the title ~~fix(agentex): redact principal context in authorization logs~~ fix(agentex): redact secret values from all logs via logging filter Jun 9, 2026

greptile-apps Bot reviewed Jun 9, 2026

View reviewed changes

Comment thread agentex/src/utils/logging.py

danielmillerp force-pushed the dm/redact-principal-context-logs branch from 510a5e6 to 05d2558 Compare June 9, 2026 19:59

danielmillerp merged commit db35810 into main Jun 9, 2026
30 checks passed

danielmillerp deleted the dm/redact-principal-context-logs branch June 9, 2026 20:08

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(agentex): redact secret values from all logs via logging filter#294

fix(agentex): redact secret values from all logs via logging filter#294
danielmillerp merged 1 commit into
mainfrom
dm/redact-principal-context-logs

danielmillerp commented Jun 9, 2026 •

edited by greptile-apps Bot

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

danielmillerp commented Jun 9, 2026 • edited by greptile-apps Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

Fix

Scope / limitations

Tests

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

danielmillerp commented Jun 9, 2026 •

edited by greptile-apps Bot

Loading