pdftract/crates
jedarden 98964e06fe fix(pdftract-2j4zl): fix header/footer duplicate counting bug
The detect_headers_and_footers function incremented classified_count
every time a block was classified, even if the block had already been
classified in a previous sliding-window iteration. With 10 pages and
identical headers, blocks on pages 1-9 were reclassified multiple times
(31 classifications instead of 10).

Fixed by checking whether a block is already classified as "header" or
"footer" before incrementing the counter.
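
A minimal sketch of the guard described above, not the actual pdftract-core
code. The Block struct, its fields, the classify_repeated_headers name, and
the fixed-size window are all assumptions made for illustration; only the
idea of skipping the counter increment for already-classified blocks comes
from this commit.

```rust
/// Hypothetical block type; the real pdftract-core struct is not shown here.
#[derive(Debug, Clone)]
struct Block {
    text: String,
    /// Some("header") or Some("footer") once the block has been classified.
    kind: Option<&'static str>,
}

/// Slides a window of `window` pages over `pages` (one candidate block per
/// page in this sketch) and labels blocks as "header" when every block in
/// the window carries the same text. Returns the number of distinct blocks
/// that were classified.
fn classify_repeated_headers(pages: &mut [Block], window: usize) -> usize {
    let mut classified_count = 0;
    if window == 0 || pages.len() < window {
        return classified_count;
    }
    for start in 0..=(pages.len() - window) {
        let end = start + window;
        let first = pages[start].text.clone();
        let repeated = pages[start..end].iter().all(|b| b.text == first);
        if !repeated {
            continue;
        }
        for block in &mut pages[start..end] {
            // The guard: overlapping windows revisit the same block, so it
            // is counted only the first time it receives a label.
            let already = matches!(block.kind, Some("header") | Some("footer"));
            if !already {
                block.kind = Some("header");
                classified_count += 1;
            }
        }
    }
    classified_count
}
```

In this sketch, ten pages with identical header text and a window of three
yield a count of 10. Without the guard, the same loop would increment once
per block per window, 8 windows x 3 blocks = 24 times. The exact inflated
figure depends on the window logic, so the 31 reported above is specific to
the real implementation.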

All 25 header_footer tests now pass.

Refs: pdftract-2j4zl

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 00:04:13 -04:00
pdftract-cer-diff docs(pdftract-aawrz): add LICENSE-MIT and LICENSE-APACHE files 2026-05-23 10:36:28 -04:00
pdftract-cli feat(pdftract-2825c): add comparison mode support to inspector frontend 2026-05-27 22:52:15 -04:00
pdftract-core fix(pdftract-2j4zl): fix header/footer duplicate counting bug 2026-05-28 00:04:13 -04:00
pdftract-libpdftract feat(pdftract-3s2i): implement Phase 5.5.2 validation filter 2026-05-24 04:57:17 -04:00
pdftract-py feat(pdftract-1tswa): implement GIL release with py.allow_threads on extraction entry points 2026-05-26 21:23:00 -04:00