Security (4/5): xref parsing hardening — CVE-2026-22691, CVE-2026-27628, CVE-2026-41168 by icanhasmath · Pull Request #4 · ActiveState/pypdf

icanhasmath · 2026-06-18T18:40:20Z

Part 4 of 5 of the PyPDF2 1.28.6 security backport, targeting 1.28.6.x. Scoped to PyPDF2/_reader.py cross-reference parsing.

| CVE | Severity | Fix |
| --- | --- | --- |
| CVE-2026-22691 | Low | Replace ReDoS-prone xref-rebuild regex with linear scan |
| CVE-2026-27628 | Low | Detect circular xref `/Prev` chains |
| CVE-2026-41168 | Moderate | Clamp object-stream `/N` and xref `/Index`/`/Size` |

Backported from upstream pypdf 6.6.0 / 6.7.2 / 6.10.1; Py2.7-safe. New tests: Tests/test_security_xref.py (5 tests, including a real PDF with a cyclic /Prev chain). Validated under Python 2.7.18: test_basic_features.py and the new tests pass, with no regressions.

🤖 Generated with Claude Code

When startxref is broken (non-strict mode), _rebuild_xref_table scanned the whole file with a regex (b"[\r\n \t][ \t]*(\d+)[ \t]+(\d+)[ \t]+obj") that backtracks catastrophically on input containing long whitespace runs, exhausting CPU (CWE-1333 / CWE-400). Replace it with a manual byte scanner (_find_pdf_objects) that finds "<id> <gen> obj" markers in linear time using 1-byte slices (Py2/Py3 safe). Drop the now-unused `import re`. Mirrors upstream pypdf 6.6.0 (PR py-pdf#3594). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
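To illustrate the idea behind the linear scan, here is a minimal sketch. It is not the code from this PR or from upstream: `find_pdf_objects` and `_skip_back` are hypothetical names, and the acceptance rules are an assumption modelled on the regex quoted above (whitespace, digits, spaces/tabs, digits, spaces/tabs, `obj`). It locates each `obj` keyword with `bytes.find` and walks backwards over the preceding bytes using 1-byte slices, so no byte is revisited across candidates.

```python
DIGITS = b"0123456789"
BLANKS = b" \t"
LEADING = b"\r\n \t"


def _skip_back(data, end, chars):
    """Return the smallest start such that data[start:end] only holds bytes in chars."""
    start = end
    # 1-byte slices behave the same on Python 2 str and Python 3 bytes.
    while start > 0 and data[start - 1:start] in chars:
        start -= 1
    return start


def find_pdf_objects(data):
    """Return (obj_id, generation, offset) for each "<id> <gen> obj" marker.

    Hypothetical sketch: walks backwards from every "obj" keyword. A walk
    stops at the first byte that is neither blank nor digit, so it can never
    cross an earlier "obj" and the total work stays linear in len(data).
    """
    found = []
    pos = data.find(b"obj")
    while pos != -1:
        gen_end = _skip_back(data, pos, BLANKS)        # blanks before "obj"
        gen_start = _skip_back(data, gen_end, DIGITS)  # generation digits
        id_end = _skip_back(data, gen_start, BLANKS)   # blanks between numbers
        id_start = _skip_back(data, id_end, DIGITS)    # object id digits
        well_formed = (
            gen_end < pos
            and gen_start < gen_end
            and id_end < gen_start
            and id_start < id_end
            and id_start > 0
            and data[id_start - 1:id_start] in LEADING
        )
        if well_formed:
            found.append(
                (int(data[id_start:id_end]), int(data[gen_start:gen_end]), id_start)
            )
        pos = data.find(b"obj", pos + 3)
    return found
```

In this sketch `endobj` is rejected because the byte before `obj` is `d`, and an input made only of whitespace returns an empty list right away, since there is no `obj` keyword to walk back from.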

A malformed PDF whose cross-reference tables link /Prev in a cycle (xref A -> /Prev -> B -> /Prev -> A) made the xref-reading loop in read() follow /Prev forever, re-parsing the same tables (CWE-835). Track visited startxref offsets in a set and break (with a warning) if an offset repeats. Mirrors upstream pypdf 6.7.2 (PR py-pdf#3655). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
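The visited-set guard can be sketched in isolation as follows. This is an illustration under stated assumptions, not the code in `read()`: `walk_xref_chain` is a hypothetical helper, and the `prev_of` dictionary stands in for parsing each xref table and reading its `/Prev` entry.

```python
import warnings


def walk_xref_chain(start_offset, prev_of):
    """Follow /Prev links from start_offset, stopping if an offset repeats.

    prev_of is a hypothetical stand-in for real parsing: it maps an xref
    offset to the offset named by that table's /Prev entry (absent = end).
    """
    visited = set()
    order = []
    offset = start_offset
    while offset is not None:
        if offset in visited:
            # Same table reached twice: the chain is circular, so stop here.
            warnings.warn("Circular xref chain detected at offset %d" % offset)
            break
        visited.add(offset)
        order.append(offset)
        offset = prev_of.get(offset)
    return order
```

With a two-table cycle such as `{100: 50, 50: 100}`, the walk visits offsets 100 and 50 once each, warns, and returns instead of looping.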

A crafted /ObjStm with an enormous /N, or an xref stream with an oversized /Index (or /Size) subsection count, could force excessive read iterations (CWE-834).

- _get_object_from_stream: clamp /N to (len(stream) // 3 + 1), the most "objnum offset" pairs the data can hold (raises in strict mode, warns and limits otherwise). Defensive: 1.28.6 also self-limits via the EOF parse error, but this matches upstream and bounds the loop directly.
- _read_pdf15_xref_stream: clamp each /Index subsection count so the total cannot exceed (len(stream) // min_entry_bytes + 1) -- this is the substantive guard for the xref-stream iteration DoS.

Mirrors upstream pypdf 6.10.1 (PR py-pdf#3733).

Coverage note: the /N clamp's no-regression behaviour is unit-tested; a full binary xref-stream trigger for the /Index path is not constructed in tests (verified by inspection).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
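Both clamps follow the same pattern: a declared count is capped at the number of entries the available bytes could possibly hold. The sketch below shows that pattern only. `clamp_count` is a hypothetical helper, and `ValueError` is an assumed stand-in for whatever error the reader raises in strict mode.

```python
import warnings


def clamp_count(declared, data_len, min_entry_bytes, strict=False):
    """Cap a declared entry count at what data_len bytes could hold.

    Hypothetical helper: the limit is data_len // min_entry_bytes + 1,
    matching the bound described in the commit message.
    """
    limit = data_len // min_entry_bytes + 1
    if declared <= limit:
        return declared
    message = "Declared count %d exceeds limit %d" % (declared, limit)
    if strict:
        # Assumption: ValueError stands in for the library's own error type.
        raise ValueError(message)
    warnings.warn(message)
    return limit
```

For the /N case, the shortest "objnum offset" pair is 3 bytes (digit, space, digit), so a 30-byte stream yields a limit of 30 // 3 + 1 = 11 no matter how large the declared /N is.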

icanhasmath and others added 3 commits June 18, 2026 11:34

icanhasmath wants to merge 3 commits into 1.28.6.x from 1.28.6-sec-reader-xref
