Skip to content

fix(harvest-contributors): only credit PRs from the subject line, not body refs#37

Merged
CybotTM merged 3 commits into
mainfrom
fix/harvest-contributors-pr-extraction
Jun 19, 2026
Merged

fix(harvest-contributors): only credit PRs from the subject line, not body refs#37
CybotTM merged 3 commits into
mainfrom
fix/harvest-contributors-pr-extraction

Conversation

@CybotTM

@CybotTM CybotTM commented Jun 19, 2026

Copy link
Copy Markdown
Member

Problem

scripts/harvest-contributors.sh grepped #N from the full multi-line commit message and looked each number up as a PR in the current repo, crediting its author. That false-credits anyone whose PR number is merely cross-referenced in a commit body:

  • Dependency-bump commits (Renovate/Dependabot) embed upstream changelogs full of foreign #123 issue/PR refs — and the ​ HTML entity (zero-width space), which is grepped as PR #8203.
  • Body cross-references ("follow-up to #114") get looked up against this repo's numbering and credit whoever owns that number.

Real example

Harvesting netresearch/t3x-rte_ckeditor_image v13.10.0...v13.10.1:

# before (buggy): #N from full messages
114 842 843 844 845 849 850 851 852 853 854 855 856 5041 5097 5099 5166 5168 5423 5430 8203
→ Code: @CybotTM, @eliasfernandez, @marekskopal   ← @eliasfernandez is WRONG (owns unrelated #114)

#114 is an unrelated old PR referenced inside a Renovate changelog; #8203 is the ​ entity; #5041/#8203/… aren't PRs at all.

Fix

Extract the PR number only from the subject (first line) of each commit, and only where GitHub stamps it — a squash-merge (#N) suffix or a Merge pull request #N subject.

# after (fixed): subject-line (#N) only
842 843 844 845 849 850 851 852 853 854 855 856
→ Code: @CybotTM, @marekskopal   ← correct

Scope / non-regression

  • Reporter detection unchanged — still driven by each detected PR's closingIssuesReferences link. A PR that names an issue only in its title (no "Closes #N" link) was never auto-detected, before or after.
  • Subjects with two refs like fix: … (#846) (#847) correctly resolve to the trailing PR #847; the in-title issue #846 is no longer mis-looked-up.

Found via a /retro on the t3x-rte_ckeditor_image v13.10.1 release.

https://claude.ai/code/session_015Myeo4imGJGskBMto9BVAm

… body refs

harvest-contributors.sh grepped `#N` from the FULL multi-line commit
message and looked each number up as a PR in the current repo, crediting
its author. This false-credits anyone whose PR number happens to be
cross-referenced in a commit body:

- Dependency-bump commits (Renovate/Dependabot) embed upstream changelogs
  full of foreign `#123` issue/PR refs AND the `​` HTML entity —
  the latter is grepped as PR `#8203`.
- Body cross-references ("follow-up to #114") get looked up against THIS
  repo's numbering and credit whoever owns that number.

Real example: harvesting netresearch/t3x-rte_ckeditor_image
v13.10.0...v13.10.1 credited `@eliasfernandez` (author of an unrelated
PR #114 referenced in a Renovate changelog) and tried PR numbers like
#8203/#5041 that don't exist.

Fix: extract the PR number only from the SUBJECT (first line) of each
commit and only from where GitHub stamps it — a squash-merge `(#N)`
suffix or a `Merge pull request #N` subject. Verified on the range above:
Code line is now `@CybotTM, @marekskopal` (correct), with no spurious
authors and no failed lookups.

Reporter detection is unchanged — it still relies on each detected PR's
`closingIssuesReferences` link (a PR that names an issue only in its
title, without a "Closes #N" link, was never auto-detected, before or
after this change).

Signed-off-by: Sebastian Mendel <github@sebastianmendel.de>
Copilot AI review requested due to automatic review settings June 19, 2026 11:18
@github-actions

github-actions Bot commented Jun 19, 2026

Copy link
Copy Markdown

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

OpenSSF Scorecard

PackageVersionScoreDetails
actions/actions/checkout df4cb1c069e1874edd31b4311f1884172cec0e10 🟢 5.9
Details
CheckScoreReason
Code-Review🟢 10all changesets reviewed
Binary-Artifacts🟢 10no binaries found in the repo
Maintained⚠️ 23 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 2
Dangerous-Workflow🟢 10no dangerous workflow patterns detected
CII-Best-Practices⚠️ 0no effort to earn an OpenSSF best practices badge detected
Token-Permissions⚠️ 0detected GitHub workflow tokens with excessive permissions
Fuzzing⚠️ 0project is not fuzzed
Pinned-Dependencies🟢 3dependency not pinned by hash detected -- score normalized to 3
License🟢 10license file detected
Packaging⚠️ -1packaging workflow not detected
Signed-Releases⚠️ -1no releases found
Security-Policy🟢 9security policy file detected
SAST🟢 8SAST tool detected but not run on all commits
Branch-Protection🟢 5branch protection is not maximal on development and all release branches
actions/step-security/harden-runner 9af89fc71515a100421586dfdb3dc9c984fbf411 🟢 8
Details
CheckScoreReason
Binary-Artifacts🟢 10no binaries found in the repo
Branch-Protection🟢 8branch protection is not maximal on development and all release branches
CI-Tests🟢 1016 out of 16 merged PRs checked by a CI test -- score normalized to 10
CII-Best-Practices⚠️ 0no effort to earn an OpenSSF best practices badge detected
Code-Review🟢 10all changesets reviewed
Contributors🟢 6project has 2 contributing companies or organizations -- score normalized to 6
Dangerous-Workflow🟢 10no dangerous workflow patterns detected
Dependency-Update-Tool🟢 10update tool detected
Fuzzing⚠️ 0project is not fuzzed
License🟢 10license file detected
Maintained🟢 1015 commit(s) and 5 issue activity found in the last 90 days -- score normalized to 10
Packaging⚠️ -1packaging workflow not detected
Pinned-Dependencies🟢 6dependency not pinned by hash detected -- score normalized to 6
SAST🟢 10SAST tool is run on all commits
Security-Policy🟢 10security policy file detected
Signed-Releases⚠️ -1no releases found
Token-Permissions🟢 10GitHub workflow tokens follow principle of least privilege
Vulnerabilities⚠️ 19 existing vulnerabilities detected

Scanned Files

  • .github/workflows/script-tests.yml

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot was unable to review this pull request because the user who requested the review has reached their quota limit.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request updates the harvest-contributors.sh script to extract PR numbers exclusively from the subject line of commits, targeting squash-merge suffixes or merge pull request subjects to prevent false positives from commit bodies. The review feedback recommends making the regular expression robust against trailing whitespace or carriage returns by allowing optional trailing whitespace before the end of the line.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread skills/github-release/scripts/harvest-contributors.sh Outdated
…ression

Add a no-network self-test for the PR-number extraction and bot filter,
and a CI job to run it.

- Refactor harvest-contributors.sh: hoist the pure helpers
  (extract_pr_numbers, is_bot) above a `main()` guarded by the
  BASH_SOURCE/$0 check, so tests can source the file and exercise the
  helpers without making gh API calls. No behavior change when executed.
- scripts/tests/harvest-contributors.test.sh asserts that only squash
  "(#N)" suffixes and "Merge pull request #N" subjects are extracted, and
  that body cross-refs, dependency-bump changelog excerpts, and the
  "&#8203;" HTML entity yield nothing — the exact cases that previously
  false-credited @eliasfernandez (#114) and produced phantom PR #8203.
- .github/workflows/script-tests.yml runs the self-test on push/PR
  (harden-runner + SHA-pinned actions, minimal permissions).

Signed-off-by: Sebastian Mendel <github@sebastianmendel.de>
@github-actions github-actions Bot added the ci label Jun 19, 2026
The `\(#[0-9]+\)$` anchor required the PR-number suffix at the exact end
of the subject line, so a commit subject with trailing whitespace (space
or tab) was silently skipped and its author dropped from the credit.
Allow optional trailing whitespace: `\(#[0-9]+\)[[:space:]]*$`. Adds
self-test cases for trailing space and tab.

Addresses gemini-code-assist review feedback on this PR.

Signed-off-by: Sebastian Mendel <github@sebastianmendel.de>
@CybotTM CybotTM merged commit a0a965d into main Jun 19, 2026
21 checks passed
@CybotTM CybotTM deleted the fix/harvest-contributors-pr-extraction branch June 19, 2026 13:05
@sonarqubecloud

Copy link
Copy Markdown

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants