Implemented the TJ operator for PDF content stream processing:
- process_tj_array(): Parses TJ arrays (alternating strings and numeric kerning)
- apply_tj_kerning(): Applies kerning adjustments to the text matrix and detects word boundaries
- GraphicsState::translate_text(): New method for horizontal text matrix translation

Key features:
- Kerning formula: -n/1000 * font_size * horiz_scaling/100
- Word boundary trigger: n > 200 (equivalent to n/1000 * font_size > 0.2 * font_size)
- Positive kerning injects synthetic word boundaries; negative kerning does not

Acceptance criteria (all PASS):
- [(Hello)250(World)] TJ → W has is_word_boundary=true
- [(kern)-10(ing)] TJ → i has is_word_boundary=false
- [(a)500(b)500(c)] TJ → both b and c carry is_word_boundary
- [] TJ → no glyphs (no-op)

13 new tests added; all TJ operator tests pass.

Closes: pdftract-1kdzu
||
|---|---|---|
| pdftract-cer-diff | ||
| pdftract-cli | ||
| pdftract-core | ||
| pdftract-libpdftract | ||
| pdftract-py | ||
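
The kerning formula and word-boundary rule from the commit message can be illustrated with a minimal Rust sketch. This is not the code in pdftract-core: the names `TjElement`, `Glyph`, `kerning_displacement`, and `process_tj_sketch` are hypothetical, glyph advance widths are ignored, and only the summed kerning displacement is tracked. It follows the rule as stated in the message, where a kerning value n > 200 marks the next glyph as a word boundary.

```rust
/// One element of a TJ array: a string to show or a numeric adjustment.
/// (Hypothetical type for illustration; not taken from pdftract-core.)
#[derive(Debug, Clone, PartialEq)]
enum TjElement {
    Text(String),
    Kern(f64),
}

/// A shown glyph, reduced to the one flag the commit message discusses.
#[derive(Debug, Clone, PartialEq)]
struct Glyph {
    ch: char,
    is_word_boundary: bool,
}

/// Threshold from the commit message: n > 200 triggers a word boundary.
const WORD_BOUNDARY_THRESHOLD: f64 = 200.0;

/// Kerning formula from the commit message:
/// -n/1000 * font_size * horiz_scaling/100
fn kerning_displacement(n: f64, font_size: f64, horiz_scaling: f64) -> f64 {
    -n / 1000.0 * font_size * horiz_scaling / 100.0
}

/// Walks a TJ array and returns the glyphs plus the summed kerning
/// displacement. Glyph advances are deliberately left out (assumption:
/// only the kerning contribution matters for this sketch).
fn process_tj_sketch(
    elements: &[TjElement],
    font_size: f64,
    horiz_scaling: f64,
) -> (Vec<Glyph>, f64) {
    let mut glyphs = Vec::new();
    let mut total_shift = 0.0;
    let mut pending_boundary = false;

    for element in elements {
        match element {
            TjElement::Kern(n) => {
                total_shift += kerning_displacement(*n, font_size, horiz_scaling);
                // Negative values can never exceed the positive threshold,
                // so only positive kerning sets the flag.
                if *n > WORD_BOUNDARY_THRESHOLD {
                    pending_boundary = true;
                }
            }
            TjElement::Text(s) => {
                for ch in s.chars() {
                    // Only the first glyph after a large kern is flagged.
                    glyphs.push(Glyph { ch, is_word_boundary: pending_boundary });
                    pending_boundary = false;
                }
            }
        }
    }
    (glyphs, total_shift)
}
```

Under these assumptions, feeding the sketch the array for `[(Hello)250(World)] TJ` at font size 12 and 100% horizontal scaling flags `W` as a word boundary, while `[(kern)-10(ing)] TJ` leaves `i` unflagged and an empty array yields no glyphs.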