Security (4/5): xref parsing hardening — CVE-2026-22691, CVE-2026-27628, CVE-2026-41168 #4

Open
icanhasmath wants to merge 3 commits into 1.28.6.x from 1.28.6-sec-reader-xref


@icanhasmath

Collaborator

Part 4 of 5 of the PyPDF2 1.28.6 security backport, targeting 1.28.6.x. Scoped to PyPDF2/_reader.py cross-reference parsing.

| CVE | Severity | Fix |
| --- | --- | --- |
| CVE-2026-22691 | Low | Replace ReDoS-prone xref-rebuild regex with linear scan |
| CVE-2026-27628 | Low | Detect circular xref /Prev chains |
| CVE-2026-41168 | Mod | Clamp object-stream /N and xref /Index and /Size |

Backported from upstream pypdf 6.6.0 / 6.7.2 / 6.10.1; Py2.7-safe. New tests: Tests/test_security_xref.py (5, incl. a real cyclic-/Prev PDF). Validated under Python 2.7.18 — test_basic_features.py + new tests pass, no regressions.

🤖 Generated with Claude Code

icanhasmath and others added 3 commits June 18, 2026 11:34
When startxref is broken (non-strict mode), _rebuild_xref_table scanned
the whole file with a regex
(b"[\r\n \t][ \t]*(\d+)[ \t]+(\d+)[ \t]+obj") that backtracks
catastrophically on input containing long whitespace runs, exhausting
CPU (CWE-1333 / CWE-400).

Replace it with a manual byte scanner (_find_pdf_objects) that finds
"<id> <gen> obj" markers in linear time using 1-byte slices (Py2/Py3
safe). Drop the now-unused `import re`. Mirrors upstream pypdf 6.6.0
(PR py-pdf#3594).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

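For reviewers, the linear-scan idea in the commit above can be sketched as follows. This is an illustration only, not the `_find_pdf_objects` code from this PR: the names `find_object_markers`, `_run_start`, and `_match_before` are hypothetical, and the sketch assumes the same marker shape the old regex accepted (a whitespace byte, an id, spaces or tabs, a generation, spaces or tabs, then `obj`). It locates each `obj` with `bytes.find` and walks backward over the digits and spaces just before it. Those backward walks never overlap, so the total work stays linear in the file size.

```python
SPACES = b" \t"          # separators allowed between id, generation and "obj"
LEADING = b"\r\n \t"     # byte that must precede the object id
DIGITS = b"0123456789"


def _run_start(data, end, charset):
    """Return the start index of the run of `charset` bytes that ends at `end`."""
    start = end
    # 1-byte slices behave the same on Py2 str and Py3 bytes.
    while start > 0 and data[start - 1:start] in charset:
        start -= 1
    return start


def _match_before(data, obj_pos):
    """Parse '<id> <gen> ' immediately before `obj_pos`, or return None."""
    gen_end = _run_start(data, obj_pos, SPACES)
    if gen_end == obj_pos:
        return None  # no separator before "obj" (e.g. "endobj")
    gen_start = _run_start(data, gen_end, DIGITS)
    if gen_start == gen_end:
        return None  # no generation number
    id_end = _run_start(data, gen_start, SPACES)
    if id_end == gen_start:
        return None  # no separator between id and generation
    id_start = _run_start(data, id_end, DIGITS)
    if id_start == id_end:
        return None  # no object id
    if id_start == 0 or data[id_start - 1:id_start] not in LEADING:
        return None  # the id must follow a whitespace byte
    return int(data[id_start:id_end]), int(data[gen_start:gen_end]), id_start


def find_object_markers(data):
    """Return a list of (object_id, generation, offset_of_id) tuples."""
    markers = []
    pos = data.find(b"obj")
    while pos != -1:
        marker = _match_before(data, pos)
        if marker is not None:
            markers.append(marker)
        pos = data.find(b"obj", pos + 3)
    return markers
```

Calling `find_object_markers(b"%PDF-1.4\n1 0 obj\n<< >>\nendobj\n")` returns one marker for object 1, generation 0. A long run of spaces elsewhere in the file is skipped by `bytes.find` and is never rescanned.
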
A malformed PDF whose cross-reference tables link /Prev in a cycle (xref
A -> /Prev -> B -> /Prev -> A) made the xref-reading loop in read()
follow /Prev forever, re-parsing the same tables (CWE-835).

Track visited startxref offsets in a set and break (with a warning) if an
offset repeats. Mirrors upstream pypdf 6.7.2 (PR py-pdf#3655).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

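The visited-set guard described in the commit above can be illustrated with a small standalone sketch. It is not the `read()` loop from `PyPDF2/_reader.py`. The function `walk_prev_chain` and the `prev_of` mapping (offset of an xref section to the offset named by its /Prev entry) are hypothetical stand-ins for parsing real tables.

```python
import warnings


def walk_prev_chain(start_offset, prev_of):
    """Follow /Prev links from `start_offset`, stopping if an offset repeats.

    `prev_of` is a hypothetical dict: xref offset -> /Prev offset. A missing
    key means the section has no /Prev entry and the chain ends there.
    """
    visited = set()
    order = []
    offset = start_offset
    while offset is not None:
        if offset in visited:
            # Same idea as the fix: warn and break instead of looping forever.
            warnings.warn(
                "Circular xref chain detected at offset %d; stopping." % offset
            )
            break
        visited.add(offset)
        order.append(offset)
        offset = prev_of.get(offset)
    return order
```

With the cyclic input `{100: 50, 50: 100}` and a start offset of 100, the walk visits 100 and 50 once each and then stops with a warning. An acyclic chain is followed to its end unchanged.
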
A crafted /ObjStm with an enormous /N, or an xref stream with an
oversized /Index (or /Size) subsection count, could force excessive read
iterations (CWE-834).

- _get_object_from_stream: clamp /N to (len(stream) // 3 + 1), the most
  "objnum offset" pairs the data can hold (raises in strict mode, warns
  and limits otherwise). Defensive: 1.28.6 also self-limits via the EOF
  parse error, but this matches upstream and bounds the loop directly.
- _read_pdf15_xref_stream: clamp each /Index subsection count so the total
  cannot exceed (len(stream) // min_entry_bytes + 1) -- this is the
  substantive guard for the xref-stream iteration DoS.

Mirrors upstream pypdf 6.10.1 (PR py-pdf#3733). Coverage note: the /N clamp's
no-regression behaviour is unit-tested; a full binary xref-stream trigger
for the /Index path is not constructed in tests (verified by inspection).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
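The two clamps in the commit above can be sketched as below, using the bounds stated in the message: `len(stream) // 3 + 1` for /N and `len(stream) // min_entry_bytes + 1` for the /Index total. The function names are hypothetical. Raising `ValueError` in strict mode is an assumption made to keep the sketch self-contained, and the actual exception type in the PR may differ. The message does not describe strict-mode handling for the /Index path, so the second helper only limits.

```python
import warnings


def clamp_object_count(declared_n, stream_length, strict=False):
    """Bound an object stream's /N by what the stream data could hold."""
    limit = stream_length // 3 + 1
    if declared_n <= limit:
        return declared_n
    message = "/N of %d exceeds the %d pairs that %d bytes can hold" % (
        declared_n, limit, stream_length)
    if strict:
        raise ValueError(message)  # assumption: the real code raises its own error type
    warnings.warn(message)
    return limit


def clamp_index_counts(subsections, stream_length, min_entry_bytes):
    """Clamp (first_object, count) pairs so the total count stays within the limit."""
    remaining = stream_length // min_entry_bytes + 1
    clamped = []
    for first, count in subsections:
        allowed = max(0, min(count, remaining))
        clamped.append((first, allowed))
        remaining -= allowed
    return clamped
```

For a 30-byte stream, `clamp_object_count(10**9, 30)` warns and returns 11, while `clamp_index_counts([(0, 5), (10, 10**9)], 30, 3)` keeps the first subsection and cuts the second to the 6 entries still allowed.
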