pdftract/crates/pdftract-core/src
jedarden 508ca5d0bb feat(pdftract-fy89c): implement line-to-block heuristic detector with 5 ordered triggers
Implement Phase 4.4 block formation with 5 ordered heuristics for grouping
lines into semantic blocks (paragraphs, headings, etc.):

1. Vertical gap > 1.5 * line_height → new block
2. Indent change > 0.03 * column_width → new block
3. Font size change > 1pt → new block
4. Rendering mode change → new block
5. Column boundary → MANDATORY block break
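
The five triggers above can be sketched as a single predicate over two
consecutive lines. This is a minimal illustration, not the code in this commit:
`LineInfo`, `BreakReason`, and `break_reason` are hypothetical names, and the
definitions of "vertical gap" (previous line's bottom to current line's top,
with y growing downward) and "line_height" (the previous line's height) are
assumptions. Triggers are checked in the listed order and the first match is
reported.

```rust
/// Hypothetical, simplified line record (the crate's real `Line<S>` differs).
#[derive(Debug, Clone, PartialEq)]
pub struct LineInfo {
    pub top: f32,           // y of the line's top edge; assumed to grow downward
    pub height: f32,        // line height in points
    pub left: f32,          // x of the line's left edge (indent)
    pub font_size: f32,     // e.g. a median font size for the line
    pub rendering_mode: u8, // PDF text rendering mode (Tr operand, 0..=7)
    pub column: usize,      // index of the detected column
}

/// Which trigger caused a block break (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BreakReason {
    VerticalGap,
    IndentChange,
    FontSizeChange,
    RenderingModeChange,
    ColumnBoundary,
}

/// Returns the first trigger (in the listed order) that separates `cur`
/// from `prev`, or `None` if both lines belong to the same block.
pub fn break_reason(prev: &LineInfo, cur: &LineInfo, column_width: f32) -> Option<BreakReason> {
    // 1. Vertical gap > 1.5 * line_height (gap definition is an assumption).
    let gap = cur.top - (prev.top + prev.height);
    if gap > 1.5 * prev.height {
        return Some(BreakReason::VerticalGap);
    }
    // 2. Indent change > 0.03 * column_width.
    if (cur.left - prev.left).abs() > 0.03 * column_width {
        return Some(BreakReason::IndentChange);
    }
    // 3. Font size change > 1pt.
    if (cur.font_size - prev.font_size).abs() > 1.0 {
        return Some(BreakReason::FontSizeChange);
    }
    // 4. Rendering mode change.
    if cur.rendering_mode != prev.rendering_mode {
        return Some(BreakReason::RenderingModeChange);
    }
    // 5. Column boundary: always a break, whatever the other metrics say.
    if cur.column != prev.column {
        return Some(BreakReason::ColumnBoundary);
    }
    None
}
```

In this sketch every trigger produces a break, so the ordering only affects
which reason is reported; a real detector might treat triggers 1-4 as
tunable and only trigger 5 as unconditional.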

Changes:
- Extended Line<S> with median_font_size, rendering_mode, column fields
- Added LineMetadata trait for abstracting line representations
- Added Block<S> and BlockInput<L> structs for block representation
- Implemented group_lines_into_blocks() with column-aware sorting
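
As a rough illustration of what "column-aware sorting" could mean, the sketch
below orders lines by column first and by vertical position second, then
folds them into blocks. It is an assumption-laden stand-in, not the actual
group_lines_into_blocks(): `SketchLine` and `group_sorted` are hypothetical,
y is assumed to grow downward, and only the vertical-gap and column-boundary
triggers are applied to keep the example short.

```rust
/// Hypothetical minimal line type used only for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub struct SketchLine {
    pub column: usize, // index of the detected column
    pub top: f32,      // y of the line's top edge; assumed to grow downward
    pub height: f32,   // line height in points
    pub text: String,
}

/// Sorts lines by (column, top) and groups consecutive lines into blocks.
/// A new block starts on a column change or when the gap to the previous
/// line exceeds 1.5 * the previous line's height.
pub fn group_sorted(mut lines: Vec<SketchLine>) -> Vec<Vec<SketchLine>> {
    // Column-aware ordering: finish one column before moving to the next.
    lines.sort_by(|a, b| a.column.cmp(&b.column).then(a.top.total_cmp(&b.top)));

    let mut blocks: Vec<Vec<SketchLine>> = Vec::new();
    for line in lines {
        // Compare against the last line of the last block, if any.
        let start_new = match blocks.last().and_then(|b| b.last()) {
            None => true,
            Some(prev) => {
                prev.column != line.column
                    || line.top - (prev.top + prev.height) > 1.5 * prev.height
            }
        };
        if start_new {
            blocks.push(vec![line]);
        } else if let Some(block) = blocks.last_mut() {
            block.push(line);
        }
    }
    blocks
}
```

Sorting by column before position means a column boundary always appears
between adjacent lines in the sorted order, which makes the mandatory break
a simple neighbor comparison.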

All acceptance criteria tests pass (21/21).

Closes: pdftract-fy89c
2026-05-24 06:14:43 -04:00
attachment feat(pdftract-3s2i): implement Phase 5.5.2 validation filter 2026-05-24 04:57:17 -04:00
cache feat(pdftract-3s2i): implement Phase 5.5.2 validation filter 2026-05-24 04:57:17 -04:00
fingerprint feat(pdftract-3s2i): implement Phase 5.5.2 validation filter 2026-05-24 04:57:17 -04:00
font feat(pdftract-h2s0z): implement adaptive word boundary detector 2026-05-24 06:06:56 -04:00
forms feat(pdftract-5w6i): implement AcroForm field walker with recursive walk and dot-joined names 2026-05-24 05:31:51 -04:00
layout feat(pdftract-fy89c): implement line-to-block heuristic detector with 5 ordered triggers 2026-05-24 06:14:43 -04:00
parser feat(pdftract-lhq9t): implement ASCIIHexDecode filter improvements 2026-05-24 05:03:35 -04:00
profiles feat(pdftract-kdp6): implement profile loader secret key hardening 2026-05-24 04:41:04 -04:00
receipts feat(pdftract-3s2i): implement Phase 5.5.2 validation filter 2026-05-24 04:57:17 -04:00
render feat(pdftract-3s2i): implement Phase 5.5.2 validation filter 2026-05-24 04:57:17 -04:00
schema feat(pdftract-29gu): implement Phase 5.5.3 region-level confidence policy 2026-05-24 05:15:46 -04:00
signature feat(pdftract-3s2i): implement Phase 5.5.2 validation filter 2026-05-24 04:57:17 -04:00
table feat(pdftract-3s2i): implement Phase 5.5.2 validation filter 2026-05-24 04:57:17 -04:00
classify.rs feat(pdftract-3s2i): implement Phase 5.5.2 validation filter 2026-05-24 04:57:17 -04:00
content_stream.rs feat(pdftract-3s2i): implement Phase 5.5.2 validation filter 2026-05-24 04:57:17 -04:00
diagnostics.rs feat(pdftract-kdp6): implement profile loader secret key hardening 2026-05-24 04:41:04 -04:00
document.rs feat(pdftract-3s2i): implement Phase 5.5.2 validation filter 2026-05-24 04:57:17 -04:00
dpi.rs feat(pdftract-3s2i): implement Phase 5.5.2 validation filter 2026-05-24 04:57:17 -04:00
extract.rs feat(pdftract-j6yd): implement signatures array output + validation_status enum + schema integration 2026-05-24 04:05:34 -04:00
graphics_state.rs feat(pdftract-3s2i): implement Phase 5.5.2 validation filter 2026-05-24 04:57:17 -04:00
hybrid.rs feat(pdftract-29gu): implement Phase 5.5.3 region-level confidence policy 2026-05-24 05:15:46 -04:00
lib.rs feat(pdftract-h2s0z): implement adaptive word boundary detector 2026-05-24 06:06:56 -04:00
markdown.rs feat(pdftract-3s2i): implement Phase 5.5.2 validation filter 2026-05-24 04:57:17 -04:00
ocr.rs feat(pdftract-29gu): implement Phase 5.5.3 region-level confidence policy 2026-05-24 05:15:46 -04:00
options.rs feat(pdftract-3s2i): implement Phase 5.5.2 validation filter 2026-05-24 04:57:17 -04:00
preprocess.rs feat(pdftract-3s2i): implement Phase 5.5.2 validation filter 2026-05-24 04:57:17 -04:00
render.rs feat(pdftract-p7yll): implement cm operator diagnostics 2026-05-24 04:13:16 -04:00
semaphore.rs feat(pdftract-3s2i): implement Phase 5.5.2 validation filter 2026-05-24 04:57:17 -04:00
url_validation.rs feat(pdftract-3s2i): implement Phase 5.5.2 validation filter 2026-05-24 04:57:17 -04:00
word_boundary.rs feat(pdftract-h2s0z): implement adaptive word boundary detector 2026-05-24 06:06:56 -04:00