pdftract/crates
jedarden d3c4ecd268 feat(pdftract-8n270): implement code block detection
Implement Phase 4.4 code block classification for detecting indented
monospace code blocks.

Features:
- is_monospace_font_name: Check font name for monospace indicators
  (mono, courier, code, fixed, console - case-insensitive)
- is_fixed_pitch_flag: Check FontDescriptor /Flags bit 0 (FixedPitch,
  mask 0x1; bit position 1 in the PDF spec's 1-based numbering)
- classify_code: Classify a block as code if all of its spans are
  monospace AND it is indented ≥ 2em from the column baseline
- classify_page_code_blocks: Post-processing pass to upgrade paragraph
  blocks to code kind
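
The two font-level checks listed above could look roughly like the sketch below. The signatures (`&str` and `u32` inputs) and the constant names `MONO_HINTS` and `FIXED_PITCH` are assumptions made for illustration, not taken from pdftract-core; only the indicator words and the bit position come from this commit message.

```rust
/// Substrings treated as monospace indicators, per the feature list.
/// (Constant name is hypothetical.)
const MONO_HINTS: [&str; 5] = ["mono", "courier", "code", "fixed", "console"];

/// Returns true if the font name contains any indicator, ignoring ASCII case.
/// Assumed signature; the real function may take a different type.
pub fn is_monospace_font_name(name: &str) -> bool {
    let lower = name.to_ascii_lowercase();
    MONO_HINTS.iter().any(|hint| lower.contains(hint))
}

/// FixedPitch is the lowest bit of the FontDescriptor /Flags integer
/// (zero-based bit 0, mask 0x1). Constant name is hypothetical.
const FIXED_PITCH: u32 = 1 << 0;

/// Returns true if the FixedPitch bit is set in the given flags value.
/// Assumed signature; the real function may take a different type.
pub fn is_fixed_pitch_flag(flags: u32) -> bool {
    (flags & FIXED_PITCH) != 0
}
```

A plain substring match is permissive: any name that happens to contain "code" or "fixed" counts as monospace, so a caller may want to combine it with the FixedPitch flag rather than rely on the name alone.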

Acceptance criteria:
- All-Courier, indented 24pt, font_size 12pt (2em=24): Code ✓
- All-monospace, not indented: NOT Code ✓
- Mixed serif+monospace: NOT Code ✓
- One serif span at end: NOT Code ✓
- FixedPitch flag set, no "Mono" in name: STILL Code ✓
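
As a worked illustration of the criteria above, here is a minimal sketch of the block-level rule. The `Span` struct, its fields, and the `classify_code` signature are hypothetical; the monospace decision is assumed to be resolved per span beforehand (by name or by FixedPitch flag), and taking the em size from the first span is also an assumption.

```rust
/// Hypothetical span model: monospace status is pre-resolved, either from
/// the font name or from the FixedPitch flag.
#[derive(Debug, Clone, Copy)]
pub struct Span {
    pub monospace: bool,
    pub font_size: f32, // in points; 1em is taken to equal this value
}

/// Returns true if every span is monospace AND the block's left edge sits
/// at least 2em to the right of the column's left edge.
/// Assumed signature: coordinates are x positions in points.
pub fn classify_code(spans: &[Span], block_left: f32, column_left: f32) -> bool {
    // An empty block cannot be code.
    let Some(first) = spans.first() else {
        return false;
    };
    // A single non-monospace span anywhere disqualifies the block.
    if !spans.iter().all(|s| s.monospace) {
        return false;
    }
    // Assumption: the em size comes from the first span's font size.
    let two_em = 2.0 * first.font_size;
    (block_left - column_left) >= two_em
}
```

With a 12pt font, 2em is 24pt, so a block starting at x = 96 in a column starting at x = 72 meets the threshold exactly, while the same block starting at x = 72 does not.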

Closes: pdftract-8n270

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-24 10:04:22 -04:00
pdftract-cer-diff docs(pdftract-aawrz): add LICENSE-MIT and LICENSE-APACHE files 2026-05-23 10:36:28 -04:00
pdftract-cli feat(pdftract-1s2uj): add xref test fixture corpus and integration test runner 2026-05-24 08:20:04 -04:00
pdftract-core feat(pdftract-8n270): implement code block detection 2026-05-24 10:04:22 -04:00
pdftract-libpdftract feat(pdftract-3s2i): implement Phase 5.5.2 validation filter 2026-05-24 04:57:17 -04:00
pdftract-py feat(pdftract-2nu0s): implement Python SDK contract conformance 2026-05-24 08:55:11 -04:00