fix(harvest-contributors): only credit PRs from the subject line, not body refs #37
harvest-contributors.sh grepped `#N` from the FULL multi-line commit
message and looked each number up as a PR in the current repo, crediting
its author. This false-credits anyone whose PR number happens to be
cross-referenced in a commit body:
- Dependency-bump commits (Renovate/Dependabot) embed upstream changelogs
full of foreign `#123` issue/PR refs AND the `&#8203;` HTML entity;
the latter is grepped as PR `#8203`.
- Body cross-references ("follow-up to #114") get looked up against THIS
repo's numbering and credit whoever owns that number.
Real example: harvesting netresearch/t3x-rte_ckeditor_image
v13.10.0...v13.10.1 credited `@eliasfernandez` (author of an unrelated
PR #114 referenced in a Renovate changelog) and tried PR numbers like
#8203/#5041 that don't exist.
Fix: extract the PR number only from the SUBJECT (first line) of each
commit and only from where GitHub stamps it — a squash-merge `(#N)`
suffix or a `Merge pull request #N` subject. Verified on the range above:
Code line is now `@CybotTM, @marekskopal` (correct), with no spurious
authors and no failed lookups.
Reporter detection is unchanged — it still relies on each detected PR's
`closingIssuesReferences` link (a PR that names an issue only in its
title, without a "Closes #N" link, was never auto-detected, before or
after this change).
Signed-off-by: Sebastian Mendel <github@sebastianmendel.de>
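The difference between the two approaches described in this commit message can be reproduced without any network access. The following is a minimal sketch, not the script's actual code: the sample message and the helper names `naive_refs` and `subject_refs` are hypothetical, chosen only to illustrate grepping the full message versus inspecting the subject line.

```shell
# Hypothetical commit message: a squash-merge subject, then a body with a
# changelog excerpt containing foreign refs and the &#8203; entity.
msg='chore(deps): update dependency foo to v2 (#120)

Upstream changelog: fixes #114&#8203;, follow-up to #5041'

# Old behaviour (sketch): collect every "#N" anywhere in the message.
naive_refs() {
  printf '%s\n' "$1" | grep -oE '#[0-9]+' | tr -d '#' | sort -un | paste -sd' ' -
}

# New behaviour (sketch): inspect the first line only, and accept only a
# trailing "(#N)" or a leading "Merge pull request #N".
subject_refs() {
  subject=$(printf '%s\n' "$1" | head -n 1)
  {
    printf '%s\n' "$subject" | grep -oE '\(#[0-9]+\)$'
    printf '%s\n' "$subject" | grep -oE '^Merge pull request #[0-9]+'
  } | grep -oE '[0-9]+' | paste -sd' ' -
}

echo "naive:   $(naive_refs "$msg")"
echo "subject: $(subject_refs "$msg")"
```

On this sample the naive variant reports `114 120 5041 8203`, including the phantom `8203` from the entity, while the subject-only variant reports just `120`.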
Code Review
This pull request updates the harvest-contributors.sh script to extract PR numbers exclusively from the subject line of commits, targeting squash-merge suffixes or merge pull request subjects to prevent false positives from commit bodies. The review feedback recommends making the regular expression robust against trailing whitespace or carriage returns by allowing optional trailing whitespace before the end of the line.
…ression

Add a no-network self-test for the PR-number extraction and bot filter,
and a CI job to run it.

- Refactor harvest-contributors.sh: hoist the pure helpers
(extract_pr_numbers, is_bot) above a `main()` guarded by the
BASH_SOURCE/$0 check, so tests can source the file and exercise the
helpers without making gh API calls. No behavior change when executed.
- scripts/tests/harvest-contributors.test.sh asserts that only squash
"(#N)" suffixes and "Merge pull request #N" subjects are extracted, and
that body cross-refs, dependency-bump changelog excerpts, and the
"&#8203;" HTML entity yield nothing: the exact cases that previously
false-credited @eliasfernandez (#114) and produced phantom PR #8203.
- .github/workflows/script-tests.yml runs the self-test on push/PR
(harden-runner + SHA-pinned actions, minimal permissions).

Signed-off-by: Sebastian Mendel <github@sebastianmendel.de>
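The BASH_SOURCE/$0 guard mentioned in this commit is a common Bash idiom for making a script both executable and sourceable. A minimal sketch follows, assuming a simplified layout: the body of `is_bot` and the placeholder `main` are hypothetical and stand in for the real helpers and gh API calls, which are not shown on this page.

```shell
#!/usr/bin/env bash

# Pure helper (hypothetical body): treats logins ending in "[bot]" as bots.
is_bot() {
  case "$1" in
    *"[bot]") return 0 ;;
    *) return 1 ;;
  esac
}

# Everything with side effects (in the real script, the gh API calls)
# would live in main(); this placeholder only prints its arguments.
main() {
  echo "harvesting contributors for: $*"
}

# Run main only when the file is executed directly. When the file is
# sourced, BASH_SOURCE[0] is this file's path while $0 is the caller,
# so main is skipped and only the functions are defined.
if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
  main "$@"
fi
```

With this layout a test file can `source` the script and call `is_bot 'renovate[bot]'` directly, while running the script as a command still goes through `main`.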
The `\(#[0-9]+\)$` anchor required the PR-number suffix at the exact end
of the subject line, so a commit subject with trailing whitespace (space
or tab) was silently skipped and its author dropped from the credit.
Allow optional trailing whitespace: `\(#[0-9]+\)[[:space:]]*$`. Adds
self-test cases for trailing space and tab.

Addresses gemini-code-assist review feedback on this PR.

Signed-off-by: Sebastian Mendel <github@sebastianmendel.de>
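The effect of the relaxed anchor can be checked with `grep -E` alone. This is a small sketch using the two patterns quoted in the commit message; the `check` helper and the sample subjects are hypothetical and are not the self-test shipped in this PR.

```shell
# The two patterns from the commit message above.
strict='\(#[0-9]+\)$'
lenient='\(#[0-9]+\)[[:space:]]*$'

# Prints "match" or "skip" for a subject line ($2) under a pattern ($1).
check() {
  if printf '%s\n' "$2" | grep -qE "$1"; then echo match; else echo skip; fi
}

tab=$(printf '\t')
check "$strict"  "fix: example (#42)"        # clean subject
check "$strict"  "fix: example (#42) "       # trailing space
check "$lenient" "fix: example (#42) "       # trailing space
check "$lenient" "fix: example (#42)${tab}"  # trailing tab
check "$lenient" "fix: example (#42) more"   # ref is not a suffix
```

The five calls print `match`, `skip`, `match`, `match`, `skip`: the relaxed pattern tolerates trailing whitespace but still rejects a `(#N)` that is followed by other text.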



Problem
`scripts/harvest-contributors.sh` grepped `#N` from the full multi-line commit message and looked each number up as a PR in the current repo, crediting its author. That false-credits anyone whose PR number is merely cross-referenced in a commit body: foreign `#123` issue/PR refs, and the `&#8203;` HTML entity (zero-width space), which is grepped as PR `#8203`.

Real example
Harvesting `netresearch/t3x-rte_ckeditor_image` `v13.10.0...v13.10.1`: `#114` is an unrelated old PR referenced inside a Renovate changelog; `#8203` is the `&#8203;` entity; `#5041`/`#8203`/… aren't PRs at all.

Fix
Extract the PR number only from the subject (first line) of each commit, and only where GitHub stamps it: a squash-merge `(#N)` suffix or a `Merge pull request #N` subject.

Scope / non-regression
- Reporter detection is unchanged: it still relies on each detected PR's `closingIssuesReferences` link. A PR that names an issue only in its title (no "Closes #N" link) was never auto-detected, before or after.
- Subjects like `fix: … (#846) (#847)` correctly resolve to the trailing PR `#847`; the in-title issue `#846` is no longer mis-looked-up.

Found via a `/retro` on the t3x-rte_ckeditor_image v13.10.1 release.

https://claude.ai/code/session_015Myeo4imGJGskBMto9BVAm