pdftract

History

jedarden 597f536b19 feat(pdftract-xzfkt): implement caption block classifier Add Phase 4 caption classification for detecting figure captions. Implements classify_caption() which identifies blocks as captions when: - Small font size (median < page body median) - Follows Figure block within 2 line heights - Same column as Figure Module: crates/pdftract-core/src/layout/caption.rs Acceptance criteria: - Block immediately below Figure, small font, same column → kind: Caption - Block 5 lines below Figure → NOT Caption (gap too large) - Block with body-size font below Figure → NOT Caption (font not smaller) - Block in different column from Figure → NOT Caption Tests: 9/9 passed covering all acceptance criteria plus edge cases. Closes: pdftract-xzfkt Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>		2026-05-24 01:56:34 -04:00
..
pdftract-cer-diff	docs(pdftract-aawrz): add LICENSE-MIT and LICENSE-APACHE files	2026-05-23 10:36:28 -04:00
pdftract-cli	feat(pdftract-3zhf): add unified TableDetector::detect entry point	2026-05-24 00:51:59 -04:00
pdftract-core	feat(pdftract-xzfkt): implement caption block classifier	2026-05-24 01:56:34 -04:00
pdftract-libpdftract	feat(pdftract-juc): implement Standard 14 font metrics registry	2026-05-23 14:04:02 -04:00
pdftract-py	docs(pdftract-aawrz): add LICENSE-MIT and LICENSE-APACHE files	2026-05-23 10:36:28 -04:00