
Tolerate unreadable directories and dedupe globs in linear time #6663

Open
Chupik wants to merge 1 commit into realm:main from Chupik:fix/glob-expandglobstar-tolerance


Chupik commented May 25, 2026


Introduction

SwiftLint occasionally fails to enumerate the files to lint in our very large repository (50,000+ files). We finally reproduced the problem and used an LLM to investigate and fix it. As a side benefit, the LLM also identified a way to significantly improve linting performance; see the Performance section below. The explanation below was drafted with LLM assistance.

Problem

Glob.expandGlobstar uses subpathsOfDirectory(atPath:) to discover nested directories before per-directory globbing. That API is all-or-nothing: it throws on the first item it cannot access, and the catch block discards every directory collected so far. After the throw, only the search root remains in directories, so the rest of expandGlobstar globs nothing more than the root pattern — every nested file is silently ignored.

Then, in Glob.resolveGlob, matched paths are deduped with Array.unique, whose Equatable overload is O(n²) (a linear Array.contains scan per element). On a 50k-file project that is on the order of 50,000² = 2.5 billion string comparisons in the worst case; in practice it effectively hangs.
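
To make the cost difference concrete, here is a minimal sketch of the two dedup strategies. Both are hypothetical free functions written for illustration (the names `uniqueByEquality` and `uniqueByHashing` are assumptions, not SwiftLint's source); both keep the first occurrence of each element and preserve order.

```swift
/// Quadratic: each element triggers a linear `contains` scan over the
/// results collected so far.
func uniqueByEquality<T: Equatable>(_ items: [T]) -> [T] {
    var result: [T] = []
    for item in items where !result.contains(item) {
        result.append(item)
    }
    return result
}

/// Linear on average: `Set.insert` reports whether the element was new,
/// so each element costs one hash lookup instead of a scan.
func uniqueByHashing<T: Hashable>(_ items: [T]) -> [T] {
    var seen = Set<T>()
    return items.filter { seen.insert($0).inserted }
}
```

With n distinct paths the first version performs roughly n(n-1)/2 comparisons in total, while the second performs about n set insertions.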

So a large project with a ** glob and any unreadable subdirectory (permission denied, dangling symlink, file removed mid-scan in CI) is hit by both bugs: most files are dropped, and the few that remain take very long to dedupe.

Fix

  1. Replace subpathsOfDirectory(atPath:) with a lazy FileManager.enumerator(at:includingPropertiesForKeys:options:errorHandler:). The errorHandler returns true, so a single unreadable entry skips only itself rather than aborting the whole scan. Removed the now-unused Glob.isDirectory(path:) helper since .isDirectoryKey replaces its job (and is pre-cached by the enumerator).
  2. Add a Hashable overload of Array.unique (linear time via Set.insert). Swift's overload resolution selects it automatically wherever the element type is Hashable, so every existing call site gets the speed-up with no other code changes — including the call inside Glob.resolveGlob that motivated this.
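
The enumeration approach from item 1 can be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: `nestedDirectories(under:)` is a hypothetical helper name, and the real change lives inside Glob.expandGlobstar.

```swift
import Foundation

/// Hypothetical helper: returns `root` plus every directory below it,
/// skipping entries that cannot be read instead of aborting the scan.
func nestedDirectories(under root: URL) -> [URL] {
    let enumerator = FileManager.default.enumerator(
        at: root,
        includingPropertiesForKeys: [.isDirectoryKey], // prefetched by the enumerator
        options: [],
        errorHandler: { _, _ in true } // true = continue past the failing item
    )
    var directories = [root]
    while let url = enumerator?.nextObject() as? URL {
        let values = try? url.resourceValues(forKeys: [.isDirectoryKey])
        if values?.isDirectory == true {
            directories.append(url)
        }
    }
    return directories
}
```

Because the enumerator is lazy, a failure on one entry is reported to the error handler while the entries already visited stay in `directories`.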

Tests

testGlobstarToleratesUnreadableSubdirectory in Tests/FileSystemAccessTests/GlobTests: creates a temp tree, chmods one subdirectory 000, runs Glob.resolveGlob("**/*.swift"), asserts that siblings of the unreadable subtree still resolve while the subtree itself stays inaccessible. Skipped under root since permission-bit tests can't exercise the tolerance fix as root.
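
The shape of such a check can be sketched without XCTest as below. Everything here is a stand-in written for illustration: `swiftFiles(under:)` is a hypothetical substitute for Glob.resolveGlob, and the fixture names (`readable`, `locked`, `ok.swift`, `hidden.swift`) are assumptions rather than the names used in the actual test.

```swift
import Foundation

/// Hypothetical stand-in for the glob under test: lists the names of
/// `.swift` files under `root`, tolerating unreadable entries.
func swiftFiles(under root: URL) -> [String] {
    let enumerator = FileManager.default.enumerator(
        at: root, includingPropertiesForKeys: nil, options: [],
        errorHandler: { _, _ in true })
    var names: [String] = []
    while let url = enumerator?.nextObject() as? URL {
        if url.pathExtension == "swift" { names.append(url.lastPathComponent) }
    }
    return names.sorted()
}

/// Builds a temp tree with one readable and one permission-less directory.
/// Returns nil when running as root, where permission bits are not enforced.
func runUnreadableDirectoryCheck() throws -> [String]? {
    guard getuid() != 0 else { return nil }
    let fm = FileManager.default
    let root = fm.temporaryDirectory.appendingPathComponent(UUID().uuidString)
    let readable = root.appendingPathComponent("readable")
    let locked = root.appendingPathComponent("locked")
    let noAccess: Int = 0o000
    let restored: Int = 0o755
    defer {
        // Restore permissions first so the tree can be deleted.
        try? fm.setAttributes([.posixPermissions: NSNumber(value: restored)],
                              ofItemAtPath: locked.path)
        try? fm.removeItem(at: root)
    }
    try fm.createDirectory(at: readable, withIntermediateDirectories: true)
    try fm.createDirectory(at: locked, withIntermediateDirectories: true)
    try "".write(to: readable.appendingPathComponent("ok.swift"),
                 atomically: true, encoding: .utf8)
    try "".write(to: locked.appendingPathComponent("hidden.swift"),
                 atomically: true, encoding: .utf8)
    try fm.setAttributes([.posixPermissions: NSNumber(value: noAccess)],
                         ofItemAtPath: locked.path)
    return swiftFiles(under: root)
}
```

When not running as root, the result should include the sibling file and omit the file inside the locked directory.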

Performance

Measured on an internal iOS monorepo (58 133 .swift files; 43 643 of them lintable after excluded: filtering) using swiftlint lint --no-cache --quiet --only-rule hardcoded_localizable_string --write-baseline …. Runs alternated between the two binaries to even out OS page-cache effects.

| Run | This branch | realm/SwiftLint 0.59.1 | Speedup |
| --- | --- | --- | --- |
| 1 | 52.93 s | 83.74 s | 1.58× |
| 2 | 51.36 s | 82.33 s | 1.60× |
| 3 | 45.37 s | 79.58 s | 1.75× |
| mean | 49.89 s | 81.88 s | 1.64× |
| min | 45.37 s | 79.58 s | 1.75× |
| max | 52.93 s | 83.74 s | 1.58× |

Baseline output is byte-identical between the two binaries, so the speedup is on the same workload (not a case of the new build doing less work):

| Metric | New | Old |
| --- | --- | --- |
| Baseline JSON size | 3 413 444 B | 3 413 444 B |
| Violations reported | 6 072 | 6 072 |
| Distinct files with violations | 1 750 | 1 750 |

Notes for review

  • Symlink behavior (intentional, worth flagging): the URL enumerator does not descend into symlinks to directories, whereas subpathsOfDirectory did. For glob expansion this is generally desirable — no symlink-cycle hangs, no double-linting — and matches the direction this codebase has already taken in FileManager.collectFiles. Happy to change if you'd rather preserve the old following behavior.
  • Array.unique public surface: this adds a new computed property on Array where Element: Hashable alongside the existing Array where Element: Equatable. Overload resolution picks the Hashable one transparently, so no source breakage. If you'd prefer to scope the fix to a private Set-based dedup in Glob.swift and leave Array.unique untouched, that's a one-line alternative — let me know.
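
For reviewers weighing the Array.unique option, the overload selection described above can be illustrated with a small sketch. The property name `dedupeStrategy`, the type `OnlyEquatable`, and the function `strategy(for:)` are hypothetical and exist only to make the chosen overload observable; they are not part of this PR.

```swift
extension Array where Element: Equatable {
    /// Stand-in for the existing Equatable-constrained overload.
    var dedupeStrategy: String { "quadratic scan" }
}

extension Array where Element: Hashable {
    /// Stand-in for the new, more constrained Hashable overload.
    var dedupeStrategy: String { "set-based" }
}

/// A type that is Equatable but deliberately not Hashable.
struct OnlyEquatable: Equatable {
    let id: Int
}

/// Overloads are resolved statically: inside a function that only knows
/// `T: Equatable`, the Equatable version is chosen even for Hashable types.
func strategy<T: Equatable>(for items: [T]) -> String {
    items.dedupeStrategy
}

let direct = ["a.swift"].dedupeStrategy             // "set-based"
let fallback = [OnlyEquatable(id: 1)].dedupeStrategy // "quadratic scan"
let viaGeneric = strategy(for: ["a.swift"])          // "quadratic scan"
```

The last line is the case worth checking during review: a call site that is generic over only Equatable would keep binding to the quadratic version, whereas concrete call sites such as an array of String pick up the Hashable overload.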

SwiftLintBot commented May 25, 2026

19 Messages
📖 Building this branch resulted in a binary size of 27461.41 KiB vs 27458.24 KiB when built on main (0% larger).
📖 Linting Aerial with this PR took 0.73 s vs 0.76 s on main (3% faster).
📖 Linting Alamofire with this PR took 1.04 s vs 1.05 s on main (0% faster).
📖 Linting Brave with this PR took 7.12 s vs 7.08 s on main (0% slower).
📖 Linting DuckDuckGo with this PR took 28.81 s vs 28.76 s on main (0% slower).
📖 Linting Firefox with this PR took 12.11 s vs 12.07 s on main (0% slower).
📖 Linting Kickstarter with this PR took 8.23 s vs 8.22 s on main (0% slower).
📖 Linting Moya with this PR took 0.42 s vs 0.42 s on main (0% slower).
📖 Linting NetNewsWire with this PR took 2.77 s vs 2.72 s on main (1% slower).
📖 Linting Nimble with this PR took 0.66 s vs 0.62 s on main (6% slower).
📖 Linting PocketCasts with this PR took 7.99 s vs 7.94 s on main (0% slower).
📖 Linting Quick with this PR took 0.4 s vs 0.42 s on main (4% faster).
📖 Linting Realm with this PR took 2.83 s vs 2.89 s on main (2% faster).
📖 Linting Sourcery with this PR took 1.84 s vs 1.79 s on main (2% slower).
📖 Linting Swift with this PR took 4.64 s vs 4.6 s on main (0% slower).
📖 Linting SwiftLintPerformanceTests with this PR took 0.28 s vs 0.32 s on main (12% faster).
📖 Linting VLC with this PR took 1.18 s vs 1.15 s on main (2% slower).
📖 Linting Wire with this PR took 18.45 s vs 18.37 s on main (0% slower).
📖 Linting WordPress with this PR took 12.4 s vs 12.36 s on main (0% slower).

Generated by 🚫 Danger

`Glob.expandGlobstar` used `subpathsOfDirectory(atPath:)`, which aborts
the entire directory traversal on the first unreadable entry
(permission denied, dangling symlink, file removed mid-scan). The
directories collected so far are discarded and only the search root
gets globbed; on large trees (50k+ files) this silently ignores almost
every file. Replace it with a lazy `URL` enumerator carrying a per-item
error handler so an unreadable item skips only itself instead of
aborting the whole scan. The now-unused `Glob.isDirectory(path:)`
helper is removed.

After the enumeration, `Glob.resolveGlob` deduplicates results with
`Array.unique`, whose `Equatable` implementation is quadratic. On a
50k-file project that dedup effectively hangs. Add a `Hashable`
overload of `Array.unique` that runs in linear time; Swift's overload
resolution selects it automatically wherever the element type is
`Hashable` (every existing call site), so no other code changes.

Add `testGlobstarToleratesUnreadableSubdirectory` covering the
tolerance path; it is skipped under root because chmod-based
inaccessibility cannot be exercised when the test runs as root.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Chupik force-pushed the fix/glob-expandglobstar-tolerance branch from 31e9646 to cc7ef45 on May 25, 2026 14:26