Calibrate percentile sliders on positives only; drop slider poly-line by DimaMolod · Pull Request #15 · KosinskiLab/AlphaJudge

DimaMolod · 2026-06-10T07:23:37Z

Summary

Three corrections to the AlphaJudge interface metascore / percentile slider report, rebased onto main after the 1.1.0 release (which shipped the validation-report + meta-score feature). The diff is just these changes on top of current main — no paper/benchmark artifacts.

1. Percentile sliders calibrated on POSITIVES ONLY

BENCHMARK_QUANTILES (meta_score.py) was computed over the entire benchmark — 7,756 rows, half non-interacting database-negatives — which inflates the percentile of a mediocre prediction. Recomputed over the 3,878 positive (interacting) AF2/AF3 rows only, so a prediction is ranked against the distribution of real interfaces.

| metric | raw value | old percentile (all rows) | new percentile (positives) |
| --- | --- | --- | --- |
| `interface_ipSAE` | 0.50 | 74.5th | 48.9th |
| `interface_LIS` | 0.30 | 72.2nd | 46.3rd |
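
The effect in the table can be illustrated with a minimal sketch. It assumes the percentile is obtained by linear interpolation between frozen decile knots, which the PR does not spell out. The helper name `percentile_from_quantiles` and the synthetic beta-distributed scores are hypothetical stand-ins, not AlphaJudge code or benchmark data.

```python
import numpy as np

def percentile_from_quantiles(value, quantiles):
    """Map a raw metric value to a 0-100 percentile by linear interpolation
    between frozen quantile knots (assumed here to be the 0th..100th deciles)."""
    knots = np.asarray(quantiles, dtype=float)
    levels = np.linspace(0.0, 100.0, len(knots))
    return float(np.interp(value, knots, levels))

# Synthetic stand-ins for a benchmark (NOT the AlphaJudge data):
# positives score high on average, database-negatives score low.
rng = np.random.default_rng(0)
positives = rng.beta(4, 3, size=2000)
negatives = rng.beta(1.5, 6, size=2000)
all_rows = np.concatenate([positives, negatives])

decile_levels = np.linspace(0, 100, 11)
q_pos = np.percentile(positives, decile_levels)  # positives-only calibration
q_all = np.percentile(all_rows, decile_levels)   # legacy all-rows calibration

pct_pos = percentile_from_quantiles(0.5, q_pos)
pct_all = percentile_from_quantiles(0.5, q_all)
print(f"value 0.5: all-rows percentile {pct_all:.1f}, positives-only {pct_pos:.1f}")
```

Because the low-scoring negatives sit below most positives, padding the reference population with them pushes the same raw value to a higher percentile, which is the inflation the recalibration removes.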

Per-feature AUROC unchanged (monotonic transform); production interface_meta_score AUROC on the balanced benchmark is 0.878 (AP 0.910).
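
The "monotonic transform" remark can be checked with a small sketch: AUROC depends only on the ordering of scores, so any strictly increasing rescaling leaves it unchanged. The `auroc` helper and the deterministic toy scores below are hypothetical. The toy data is built so the positives span the full score range; a mapping that clips values outside its knots could introduce ties, and this sketch avoids that case by construction.

```python
import numpy as np

def auroc(scores, labels):
    """Pairwise (Mann-Whitney) AUROC: the fraction of positive/negative pairs
    ranked correctly, counting ties as half."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())

# Deterministic toy scores (NOT benchmark data): positives skew high,
# negatives skew low, and the positives cover the whole [0, 1] range.
positives = np.sqrt(np.linspace(0.0, 1.0, 200))
negatives = np.linspace(0.01, 0.99, 200) ** 2
scores = np.concatenate([positives, negatives])
labels = np.concatenate([np.ones(200, dtype=bool), np.zeros(200, dtype=bool)])

# Positives-only decile knots, then a piecewise-linear percentile mapping.
levels = np.linspace(0.0, 100.0, 11)
knots = np.percentile(positives, levels)
percentiles = np.interp(scores, knots, levels)

auc_raw = auroc(scores, labels)
auc_pct = auroc(percentiles, labels)
print(f"AUROC raw={auc_raw:.4f} percentile-scaled={auc_pct:.4f}")
```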

2. Removed the black poly-line connecting the sliders

Each metric still shows its black percentile marker; the connecting line is gone.

3. Meta marker consistency (addresses earlier Codex P2)

_row_meta_score now recomputes from the current calibration first, falling back to a precomputed interface_meta_score column only when the raw feature columns are absent — so the Meta marker can't silently use a stale all-rows score while the sliders use the new one.
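
The recompute-first, fall-back-second order can be sketched as follows. This is an illustration, not the code in report.py: the feature list, the `compute_meta` callable, and the helper names are hypothetical, and the sketch treats a row as lacking raw features if any listed feature is missing or unparseable.

```python
import math

# Hypothetical feature list; the real meta score may use other columns.
FEATURES = ("interface_ipSAE", "interface_LIS")

def safe_float(value):
    """Return value as a finite float, or None if missing or unparseable."""
    try:
        out = float(value)
    except (TypeError, ValueError):
        return None
    return out if math.isfinite(out) else None

def row_meta_score(row, compute_meta):
    """Prefer a score recomputed from raw features with the current calibration;
    use the stored interface_meta_score column only as a fallback."""
    feats = {name: safe_float(row.get(name)) for name in FEATURES}
    if all(v is not None for v in feats.values()):
        return compute_meta(feats)
    return safe_float(row.get("interface_meta_score"))

def toy_meta(feats):
    """Stand-in for the real meta score: a plain mean of the features."""
    return sum(feats.values()) / len(feats)

# A stale stored score is ignored when the raw features are available...
fresh = row_meta_score(
    {"interface_ipSAE": "0.5", "interface_LIS": "0.3", "interface_meta_score": "0.9"},
    toy_meta,
)
# ...and used only when they are not.
fallback = row_meta_score({"interface_meta_score": "0.9"}, toy_meta)
print(fresh, fallback)
```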

Reproducibility

scripts/freeze_metascore_quantiles.py regenerates the deciles from any benchmark CSV (--label-filter positive|negative|all; "all" reproduces the prior all-rows scale bit-for-bit). Uses csv + numpy only, with no pandas (which is not an AlphaJudge dependency; this addresses the other earlier Codex P2).
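
A minimal sketch of the same idea, using only csv and numpy, is shown below. It is not the actual script: the `freeze_deciles` function, the `label` column name, and its "1"/"0" encoding are assumptions made for illustration, and the label filter is a function argument here rather than a command-line flag.

```python
import csv
import io

import numpy as np

def freeze_deciles(handle, metrics, label_filter="positive", label_column="label"):
    """Read a benchmark CSV and return the 0th..100th deciles per metric.
    Assumes a hypothetical label column holding "1" (positive) or "0" (negative)."""
    wanted = {"positive": {"1"}, "negative": {"0"}, "all": {"0", "1"}}[label_filter]
    values = {m: [] for m in metrics}
    for row in csv.DictReader(handle):
        if (row.get(label_column) or "").strip() not in wanted:
            continue
        for m in metrics:
            try:
                v = float(row[m])
            except (KeyError, TypeError, ValueError):
                continue  # skip missing or non-numeric cells
            if np.isfinite(v):
                values[m].append(v)
    levels = np.linspace(0, 100, 11)
    return {m: np.percentile(v, levels).tolist() for m, v in values.items() if v}

# Tiny in-memory CSV standing in for a benchmark file.
demo_csv = "label,interface_ipSAE\n1,0.2\n1,0.4\n1,0.6\n1,0.8\n0,0.0\n0,0.1\n"
pos_q = freeze_deciles(io.StringIO(demo_csv), ["interface_ipSAE"], "positive")
all_q = freeze_deciles(io.StringIO(demo_csv), ["interface_ipSAE"], "all")
print(pos_q["interface_ipSAE"])
print(all_q["interface_ipSAE"])
```

Keeping the filter as a single parameter makes it straightforward to regenerate either the positives-only or the legacy all-rows scale from the same input.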

Report wording updated throughout to state the scale is built from interacting (positive) pairs.

Test plan

pytest test/test_meta_score.py test/test_report.py → 9/9 pass.
py_compile clean; the freeze script runs in a stock environment and its output is bit-identical to the quantiles frozen in meta_score.py.
Rendered the slider panel: poly-line absent, markers present, legend reads "Percentile vs interacting (positive) benchmark pairs".

🤖 Generated with Claude Code

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 002228d355

chatgpt-codex-connector · 2026-06-10T07:27:47Z

+    direct = _safe_float(row.get("interface_meta_score"))
+    if direct is not None:
+        return direct


Recompute the meta score with the new calibration

When the input CSV already contains an interface_meta_score column, this returns that value before recalculating from the new positives-only quantiles. That means aggregate/per-run reports generated from older or externally merged CSVs can still rank interfaces and draw the Meta score marker using the legacy all-rows calibration while the feature sliders use the new calibration, contradicting the report text that all percentiles are against interacting positive pairs. Prefer recomputing here (or validating the calibration version) so stale scores do not silently survive the recalibration.

Fixed in dbb818b. _row_meta_score now recomputes the meta score from the current positives-only calibration first, so the Meta marker is consistent with the recalibrated feature sliders. A precomputed interface_meta_score column (which may carry the legacy all-rows calibration on older/merged CSVs) is only used as a fallback when the raw feature columns are absent.

chatgpt-codex-connector · 2026-06-10T07:27:47Z

+import numpy as np
+import pandas as pd


Declare pandas for the quantile freezer

This new reproducibility script imports pandas, but the project dependencies in pyproject.toml do not declare pandas. In a normal AlphaJudge install that only has the packaged dependencies, running the advertised python scripts/freeze_metascore_quantiles.py ... path will fail immediately with ModuleNotFoundError before it can regenerate the frozen deciles; either add pandas as a dependency/extra or avoid requiring it here.

Fixed in dbb818b. The script was rewritten to use csv + numpy only (no pandas), matching the existing analyze_interface_meta_score.py convention, so the advertised command runs in a stock AlphaJudge install. Output is bit-identical to the pandas version (verified < 1e-12).

Rebased onto main 1.1.0 (which shipped the validation-report/meta-score feature). Applies three corrections on top of the current report:

- meta_score.py: BENCHMARK_QUANTILES recomputed on POSITIVE (interacting) benchmark pairs only (3,878 AF2/AF3 positive rows) instead of all 7,756 rows (half non-interacting database-negatives). A prediction is now ranked against the distribution of real interfaces, not a decoy-padded population: e.g. interface_ipSAE=0.5 maps to the 49th percentile (was 75th); interface_LIS=0.30 to 46th (was 72nd). Per-feature AUROC is unchanged (monotonic transform); production interface_meta_score AUROC on the balanced benchmark is 0.878.
- report.py: removed the black poly-line connecting per-group slider markers (each metric still shows its black percentile marker). _row_meta_score now recomputes the meta score from the current calibration first, falling back to a precomputed interface_meta_score column only when raw features are absent, so the Meta marker stays consistent with the recalibrated sliders. Wording updated to state the scale is built from interacting (positive) pairs.
- scripts/freeze_metascore_quantiles.py: reproduces the deciles from any benchmark CSV (--label-filter positive|negative|all; "all" reproduces the prior all-rows scale bit-for-bit). Uses csv+numpy only (no pandas, which is not an AlphaJudge dependency).

Tests: test_meta_score.py + test_report.py pass (9/9).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

chatgpt-codex-connector Bot reviewed Jun 10, 2026

DimaMolod force-pushed the percentile-positives-only branch from dbb818b to db8697f Compare June 10, 2026 07:44

DimaMolod changed the title ~~Percentile sliders: calibrate on positives only; drop slider poly-line~~ Calibrate percentile sliders on positives only; drop slider poly-line Jun 10, 2026

DimaMolod merged commit 45dc059 into main Jun 10, 2026
8 checks passed

DimaMolod deleted the percentile-positives-only branch June 10, 2026 07:52
