3 commits

Author SHA1 Message Date
jedarden
198016d1ef test(pdftract-39gey): fix test assertions for string escaping and hyper API updates
- Fix raw string literal escaping in mcid.rs and ocr_regions.rs tests
- Update serve.rs tests for http_body_util and tower APIs
- Update verification note to reflect indent trigger fix

All changes are test-infrastructure updates related to Phase 4.4 Block Formation.
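The log does not show the actual assertions in mcid.rs or ocr_regions.rs. The sketch below is only a generic illustration of the Rust escaping rules that this kind of fix usually involves. The PDF operator string and the JSON snippet are hypothetical sample values.

```rust
fn main() {
    // In a normal string literal, each backslash must itself be escaped...
    let escaped = "(Hello \\(world\\)) Tj";
    // ...while a raw string keeps backslashes verbatim, so the two are equal.
    let raw = r"(Hello \(world\)) Tj";
    assert_eq!(escaped, raw);

    // A raw string containing double quotes needs `#` delimiters.
    let quoted_raw = r#"{"kind":"heading"}"#;
    let quoted_escaped = "{\"kind\":\"heading\"}";
    assert_eq!(quoted_raw, quoted_escaped);

    // Escape sequences are not processed inside raw strings:
    // r"\n" is a backslash followed by 'n', not a newline.
    assert_eq!(r"\n".len(), 2);
    assert_eq!("\n".len(), 1);

    println!("escaping checks passed");
}
```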
2026-06-07 14:59:43 -04:00
jedarden
d0f52751ce fix(pdftract-39gey): fix indent trigger to not split drop-cap paragraphs
The indent trigger used .abs(), so it fired on both increased indent
(non-indented → indented) AND decreased indent (indented → non-indented).
As a result, drop-cap style paragraphs (indented first line, flush-left
continuation) were incorrectly split into two blocks.

Per plan Phase 4.4 heuristic #2, an indent change should trigger only when
the current line is MORE indented (further right, larger x0) than the block
average, i.e., a new paragraph starting after non-indented text. It should
NOT trigger for decreased indent (first line indented, rest flush-left).

Fix: Remove .abs() and check only whether line_x0 - block_avg_x0 > threshold.
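Here is a minimal sketch of the comparison before and after the fix. The function names, the 6.0 pt threshold, and the sample x0 values are hypothetical. Only the `line_x0 - block_avg_x0 > threshold` test comes from this commit.

```rust
/// Post-fix trigger (hypothetical name): fires only when the current line
/// sits further right than the block's average left edge.
fn indent_triggers_split(line_x0: f64, block_avg_x0: f64, threshold: f64) -> bool {
    line_x0 - block_avg_x0 > threshold
}

/// Pre-fix trigger, kept for comparison: `.abs()` also fires when the line
/// moves left (indented first line followed by flush-left text).
fn indent_triggers_split_old(line_x0: f64, block_avg_x0: f64, threshold: f64) -> bool {
    (line_x0 - block_avg_x0).abs() > threshold
}

/// Average left edge of the lines already in the block (assumes non-empty).
fn block_avg_x0(xs: &[f64]) -> f64 {
    xs.iter().sum::<f64>() / xs.len() as f64
}

fn main() {
    let threshold = 6.0; // hypothetical threshold in PDF points

    // Drop-cap style: indented first line (x0 = 90), flush-left continuation (x0 = 72).
    let block = [90.0];
    let avg = block_avg_x0(&block);
    assert!(indent_triggers_split_old(72.0, avg, threshold)); // old: wrongly splits
    assert!(!indent_triggers_split(72.0, avg, threshold)); // new: stays together

    // Flush-left text followed by an indented line: a new paragraph either way.
    let block = [72.0, 72.0];
    let avg = block_avg_x0(&block);
    assert!(indent_triggers_split_old(90.0, avg, threshold));
    assert!(indent_triggers_split(90.0, avg, threshold));

    println!("indent trigger checks passed");
}
```

Dropping `.abs()` makes the check one-sided. A leftward shift no longer counts as a paragraph boundary, while the non-indented → indented case still splits as before.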

Tests:
- test_indented_first_line_new_block: PASS (non-indented → indented splits)
- test_indented_first_line_of_paragraph_not_split: PASS (drop cap stays together)
- All 179 line module tests: PASS
2026-06-07 13:43:19 -04:00
jedarden
746309b8df docs(pdftract-39gey): add verification note for Phase 4.4 Block Formation coordinator
All 8 child beads verified closed:
- Block struct + BlockKind enum (pdftract-w1pbz)
- Line-to-block heuristic detector (pdftract-fy89c)
- Heading detection (pdftract-2yl9j)
- List detection (pdftract-4brcu)
- Figure detection (pdftract-25k4x)
- Code detection (pdftract-8n270)
- Header/footer cross-page dedup (pdftract-2j4zl)
- Watermark/formula stubs (pdftract-3jekw)

Acceptance criteria:
- All 8 children closed: PASS
- Indented first line NOT split unconditionally: PASS (correct behavior per plan)
- Header text deduplication across pages: PASS
- Bullet list with mixed font sizes: PASS (same block)
- Figure block classification: PASS
- Code block classification: PASS

Closes pdftract-39gey
2026-06-07 09:22:02 -04:00