pdftract/crates
jedarden 508ca5d0bb feat(pdftract-fy89c): implement line-to-block heuristic detector with 5 ordered triggers
Implement Phase 4.4 block formation with 5 ordered heuristics for grouping
lines into semantic blocks (paragraphs, headings, etc.):

1. Vertical gap > 1.5 * line_height → new block
2. Indent change > 0.03 * column_width → new block
3. Font size change > 1pt → new block
4. Rendering mode change → new block
5. Column boundary → MANDATORY block break

Changes:
- Extended Line<S> with median_font_size, rendering_mode, column fields
- Added LineMetadata trait for abstracting line representations
- Added Block<S> and BlockInput<L> structs for block representation
- Implemented group_lines_into_blocks() with column-aware sorting

All acceptance criteria tests pass (21/21).

Closes: pdftract-fy89c
2026-05-24 06:14:43 -04:00
..
pdftract-cer-diff docs(pdftract-aawrz): add LICENSE-MIT and LICENSE-APACHE files 2026-05-23 10:36:28 -04:00
pdftract-cli feat(pdftract-4xu46): implement grep subcommand structure with clap parsing 2026-05-24 05:49:15 -04:00
pdftract-core feat(pdftract-fy89c): implement line-to-block heuristic detector with 5 ordered triggers 2026-05-24 06:14:43 -04:00
pdftract-libpdftract feat(pdftract-3s2i): implement Phase 5.5.2 validation filter 2026-05-24 04:57:17 -04:00
pdftract-py docs(pdftract-aawrz): add LICENSE-MIT and LICENSE-APACHE files 2026-05-23 10:36:28 -04:00