Tolerate unreadable directories and dedupe globs in linear time#6663
Open
Chupik wants to merge 1 commit into
Open
Tolerate unreadable directories and dedupe globs in linear time#6663Chupik wants to merge 1 commit into
Chupik wants to merge 1 commit into
Conversation
Generated by 🚫 Danger |
`Glob.expandGlobstar` used `subpathsOfDirectory(atPath:)`, which aborts the entire directory traversal on the first unreadable entry (permission denied, dangling symlink, file removed mid-scan). The remaining directories collected so far are discarded and only the search root gets globbed — on large trees (50k+ files) this silently ignores almost every file. Replace it with a lazy `URL` enumerator carrying a per-item error handler so an unreadable item skips only itself instead of aborting the whole scan. The unused `Glob.isDirectory(path:)` helper is removed. After the enumeration, `Glob.resolveGlob` deduplicates results with `Array.unique`, whose `Equatable` implementation is quadratic. On a 50k-file project that dedup effectively hangs. Add a `Hashable` overload of `Array.unique` that runs in linear time; Swift's overload resolution selects it automatically wherever the element type is `Hashable` (every existing call site), so no other code changes. Add `testGlobstarToleratesUnreadableSubdirectory` covering the tolerance path; it is skipped under root because chmod-based inaccessibility cannot be exercised when the test runs as root. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
31e9646 to
cc7ef45
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Introduction
We noticed that SwiftLint sometimes has problems with our very large repository (50,000+ files). The problem is that SwiftLint occasionally fails to enumerate the files to lint. We finally reproduced the problem and used an LLM to investigate and fix it. As a side benefit, the LLM also identified a way to significantly improve linting performance - see the Performance section below. The explanation below was drafted with LLM assistance.
Problem
Glob.expandGlobstarusessubpathsOfDirectory(atPath:)to discover nested directories before per-directory globbing. That API is all-or-nothing: it throws on the first item it cannot access, and the catch block discards every directory collected so far. After the throw, only the search root remains indirectories, so the rest ofexpandGlobstarglobs nothing more than the root pattern — every nested file is silently ignored.Then in
Glob.resolveGlob, matched paths are deduped withArray.unique, whoseEquatableoverload is O(n²) (Array.containslinear scan per element). On a 50k-file project that's ~2.5 billion string comparisons; in practice it effectively hangs.So a large project with a
**glob and any unreadable subdirectory (permission denied, dangling symlink, file removed mid-scan in CI) is hit by both bugs: most files are dropped, and the few that remain take very long to dedupe.Fix
subpathsOfDirectory(atPath:)with a lazyFileManager.enumerator(at:includingPropertiesForKeys:options:errorHandler:). TheerrorHandlerreturnstrue, so a single unreadable entry skips only itself rather than aborting the whole scan. Removed the now-unusedGlob.isDirectory(path:)helper since.isDirectoryKeyreplaces its job (and is pre-cached by the enumerator).Hashableoverload ofArray.unique(linear time viaSet.insert). Swift's overload resolution selects it automatically wherever the element type isHashable, so every existing call site gets the speed-up with no other code changes — including the call insideGlob.resolveGlobthat motivated this.Tests
testGlobstarToleratesUnreadableSubdirectoryinTests/FileSystemAccessTests/GlobTests: creates a temp tree, chmods one subdirectory000, runsGlob.resolveGlob("**/*.swift"), asserts that siblings of the unreadable subtree still resolve while the subtree itself stays inaccessible. Skipped under root since permission-bit tests can't exercise the tolerance fix as root.Performance
Measured on an internal iOS monorepo (58 133
.swiftfiles; 43 643 of them lintable afterexcluded:filtering) usingswiftlint lint --no-cache --quiet --only-rule hardcoded_localizable_string --write-baseline …. Runs alternated between the two binaries to even out OS page-cache effects.realm/SwiftLint0.59.1Baseline output is byte-identical between the two binaries, so the speedup is on the same workload (not a case of the new build doing less work):
Notes for review
subpathsOfDirectorydid. For glob expansion this is generally desirable — no symlink-cycle hangs, no double-linting — and matches the direction this codebase has already taken inFileManager.collectFiles. Happy to change if you'd rather preserve the old following behavior.Array.uniquepublic surface: this adds a new computed property onArray where Element: Hashablealongside the existingArray where Element: Equatable. Overload resolution picks the Hashable one transparently, so no source breakage. If you'd prefer to scope the fix to a privateSet-based dedup inGlob.swiftand leaveArray.uniqueuntouched, that's a one-line alternative — let me know.