Python: Adding AgentFileStore and FileAccessProvider to support file access operations.#6099
Python: Adding AgentFileStore and FileAccessProvider to support file access operations.#6099alliscode wants to merge 5 commits into
Conversation
…rations for agents.
There was a problem hiding this comment.
Pull request overview
This PR introduces a new Python “file access harness” for Agent Framework, enabling agents to read/write/list/search files within a controlled storage root via a new FileAccessProvider backed by AgentFileStore implementations, along with tests and a usage sample.
Changes:
- Added file-access harness core (
AgentFileStore, in-memory and filesystem stores, search DTOs) and exported the new API surface. - Added a
FileAccessProviderthat registersfile_access_*tools and default instructions per run. - Added comprehensive unit tests and a new
file_access_data_processingsample + documentation updates.
Show a summary per file
| File | Description |
|---|---|
| python/uv.lock | Lockfile update for github-copilot-sdk specifier ordering. |
| python/scripts/task_runner.py | Windows stdout/stderr UTF-8 reconfiguration before importing rich. |
| python/samples/02-agents/context_providers/README.md | Documented the new file-access sample and prerequisites. |
| python/samples/02-agents/context_providers/file_access_data_processing/README.md | New sample README describing scripted file-access workflow. |
| python/samples/02-agents/context_providers/file_access_data_processing/data_processing.py | New runnable sample wiring FileAccessProvider + FileSystemAgentFileStore. |
| python/samples/02-agents/context_providers/file_access_data_processing/working/sales.csv | Sample input dataset. |
| python/samples/02-agents/context_providers/file_access_data_processing/working/region_totals.md | Checked-in sample report artifact. |
| python/packages/core/AGENTS.md | Added file-access harness documentation section. |
| python/packages/core/agent_framework/_harness/_file_access.py | New harness implementation (stores, search DTOs, provider tools, path protections). |
| python/packages/core/agent_framework/init.py | Exported new file-access harness classes/constants in the public API. |
| python/packages/core/tests/core/test_harness_file_access.py | New unit tests covering normalization, stores, search, and provider tool flows. |
Copilot's findings
- Files reviewed: 10/11 changed files
- Comments generated: 5
There was a problem hiding this comment.
Automated Code Review
Reviewers: 3 | Confidence: 82%
✓ Security Reliability
The file-access harness has solid path-traversal and symlink protections. The main security concern is that
search_filescompiles LM-provided regex patterns viare.compile()without any timeout or complexity guard. ForInMemoryAgentFileStore, matching runs directly on the asyncio event loop, so a catastrophic backtracking pattern (e.g.,(a+)+$) would freeze the entire application. The framework does catchre.errorfrom invalid regex gracefully (verified at_tools.py:1539), but CPU-bound backtracking is not an exception—it's an unbounded hang. The code comment claims this 'matches the existing approach in_memory.py', but_memory.py:913uses O(n) substring matching, not regex, so the risk profiles differ significantly.
✓ Test Coverage
The test suite is comprehensive for happy paths, security boundaries (path traversal, symlinks), serialization, and provider integration. Two notable coverage gaps exist: (1) no test verifies behavior when an invalid regex pattern is passed to
search_files(which raisesre.errorfromre.compile), and (2) no test coversFileSystemAgentFileStore._search_files_syncencountering a non-UTF-8 file (which crashes withUnicodeDecodeErrorand aborts the entire search). The agent loop's generic exception handler mitigates runtime risk for (1), but (2) is a latent bug since a single binary file in the directory would prevent searching any other files.
✗ Design Approach
I found one design issue in the new sample: it checks in the report file that the sample is supposed to create, so the scripted flow no longer exercises the advertised create/save behavior from a clean checkout.
Flagged Issues
- The sample checks in the output artifact
working/region_totals.mdthat the scripted conversation (data_processing.py:68-72) asks the agent to create. Becausefile_access_save_filedefaults to no-overwrite, a fresh checkout will hit the 'already exists' path instead of demonstrating report creation, and the initial 'What files do you have access to?' turn no longer matches the README's statedsales.csv-only starting state. Remove the checked-in output or change the sample to write a different filename.
Automated review by alliscode's agents
- Probe symlinks on the unresolved candidate path so in-root symlinks cannot silently pass and out-of-root symlinks surface the correct error message. - Validate matching_lines elements in FileSearchResult.from_dict and raise a clean ValueError for non-mapping entries. - Cap search regex pattern length (256 chars) via a new _compile_search_regex helper to mitigate ReDoS, and surface the cap in the file_access_search_files tool description. - Skip non-UTF-8 files during filesystem search instead of aborting the entire directory walk. - Replace the module-scope trailing string in the data-processing sample with comments to avoid Ruff B018. - Remove the checked-in working/region_totals.md sample artifact so the save flow works from a clean checkout. - Expand the Windows stdout reconfiguration comment in task_runner.py for clarity. - Add tests for invalid/oversize regex, non-UTF-8 file search, and in-root symlink rejection. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Python Test Coverage Report •
Python Unit Test Overview
|
||||||||||||||||||||||||||||||
Use cast(list[object], ...) instead of cast(list[Any], ...) so the cast represents a real type change (lists are invariant) and is no longer flagged by mypy as redundant, while still satisfying pyright's reportUnknownVariableType. Matches the existing pattern in _memory.py. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- _normalize_relative_path now strips surrounding whitespace up front so leading/trailing spaces never leak into file segments, and rejects trailing path separators for file paths so 'foo/' is no longer silently coerced to 'foo'. - FileSystemAgentFileStore._resolve_safe_directory_path normalizes with is_directory=True and maps an empty normalized result to the root. This matches InMemoryAgentFileStore so whitespace-only directory inputs resolve to the root instead of raising. - Added tests for whitespace stripping, trailing-separator rejection, and whitespace-only directory listing on the filesystem store. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add wall-clock timeout (10s) around regex scans so a pathological pattern (e.g. `(a+)+`) below the length cap cannot stall the event loop. - Offload the InMemoryAgentFileStore regex scan to a worker thread, matching the filesystem store. - Fail closed when `Path.is_symlink` raises during the safe-path probe so a permission error cannot silently bypass the symlink/reparse-point rejection. - Add `overwrite: bool = True` to `AgentFileStore.write_file`; the in-memory store performs the check under the existing lock and the filesystem store uses `open(mode='x')` so concurrent callers cannot race past `overwrite=False`. - `file_access_save_file` now relies on the atomic store call instead of a separate `file_exists` round-trip. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This pull request introduces a new file access harness to the agent framework, making it possible for agents to read, write, list, and search files within a controlled folder. The core includes new classes and providers for file access, comprehensive tests, and a new sample demonstrating how to use these capabilities in a data processing scenario. The changes are grouped below by theme.
File Access Harness and Core Additions:
AgentFileStore,InMemoryAgentFileStore, andFileSystemAgentFileStoreclasses to provide async file storage interfaces, with secure path normalization and root containment. [1] [2] [3] [4] [5] [6]FileSearchResultandFileSearchMatchDTOs for structured file search results, and exposed them in the public API. [1] [2] [3]FileAccessProvider, a context provider that exposes file access tools (file_access_save_file,file_access_read_file, etc.) to agents, with shared store support. [1] [2] [3]Testing:
FileAccessProvidertool flows.Samples and Documentation:
file_access_data_processing/showing how to useFileAccessProviderandFileSystemAgentFileStorefor agent-driven data file processing, with an accompanying README and sample data. [1] [2] [3]These changes significantly enhance the agent framework's ability to work with files in a secure, extensible way, and provide clear guidance and tests for users and developers.
Contribution Checklist