Commit graph

1 commit

Author SHA1 Message Date
jedarden
508ca5d0bb feat(pdftract-fy89c): implement line-to-block heuristic detector with 5 ordered triggers
Implement Phase 4.4 block formation with 5 ordered heuristics for grouping
lines into semantic blocks (paragraphs, headings, etc.):

1. Vertical gap > 1.5 * line_height → new block
2. Indent change > 0.03 * column_width → new block
3. Font size change > 1pt → new block
4. Rendering mode change → new block
5. Column boundary → MANDATORY block break

Changes:
- Extended Line<S> with median_font_size, rendering_mode, column fields
- Added LineMetadata trait for abstracting line representations
- Added Block<S> and BlockInput<L> structs for block representation
- Implemented group_lines_into_blocks() with column-aware sorting

All acceptance criteria tests pass (21/21).

Closes: pdftract-fy89c
2026-05-24 06:14:43 -04:00