Commit graph

1 commit

Author SHA1 Message Date
jedarden
d3c4ecd268 feat(pdftract-8n270): implement code block detection
Implement Phase 4.4 code block classification for detecting indented
monospace code blocks.

Features:
- is_monospace_font_name: Check font name for monospace indicators
  (mono, courier, code, fixed, console - case-insensitive)
- is_fixed_pitch_flag: Check FontDescriptor bit 0 (FixedPitch)
- classify_code: Classify block as code if all spans monospace AND
  indented ≥ 2em from column baseline
- classify_page_code_blocks: Post-processing pass to upgrade paragraph
  blocks to code kind

Acceptance criteria:
- All-Courier, indented 24pt, font_size 12pt (2em=24): Code ✓
- All-monospace, not indented: NOT Code ✓
- Mixed serif+monospace: NOT Code ✓
- One serif span at end: NOT Code ✓
- FixedPitch flag set, no "Mono" in name: STILL Code ✓

Closes: pdftract-8n270

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-24 10:04:22 -04:00