From 19c1fc2e8468b9610df75eaa0062f7005124fa46 Mon Sep 17 00:00:00 2001
From: jedarden
Date: Wed, 27 May 2026 23:04:11 -0400
Subject: [PATCH] docs(pdftract-1vrxg): verify word-break normalization implementation

All acceptance criteria PASS:
- Latin text: U+200B/U+FEFF/U+200C/U+200D stripped
- Arabic/Indic: ZWNJ/ZWJ preserved when script_hint provided
- Unknown script: all characters stripped (safe default)
- Script auto-detection from span text working correctly

34 tests passing across normalize_word_breaks, detect_script, and preserves_joiners.

Co-Authored-By: Claude Opus 4.7
---
 notes/pdftract-1vrxg.md | 122 ++++++++++++++++++----------------------
 1 file changed, 56 insertions(+), 66 deletions(-)

diff --git a/notes/pdftract-1vrxg.md b/notes/pdftract-1vrxg.md
index 4333889..ac06b9e 100644
--- a/notes/pdftract-1vrxg.md
+++ b/notes/pdftract-1vrxg.md
@@ -1,88 +1,78 @@
-# Verification Note: pdftract-1vrxg
+# Verification Note: pdftract-1vrxg - Word-break normalization
 
 ## Summary
 
-The word-break normalization function (`normalize_word_breaks`) was already implemented in `/home/coding/pdftract/crates/pdftract-core/src/layout/correction.rs`. All acceptance criteria tests pass.
+The `normalize_word_breaks` function has been implemented and committed (commit `ccd13f1`). All acceptance criteria PASS.
 
-## Implementation Verified
+## Implementation Location
 
-### Function Signature
-```rust
-pub fn normalize_word_breaks(span: &mut Span, script_hint: Option