Author SHA1 Message Date
jedarden
ce2a77a879 feat(pdftract-1kdzu): implement TJ operator with kerning and word boundary detection
Implemented the TJ operator for PDF content stream processing:

- process_tj_array(): Parses TJ arrays (alternating strings and numeric kerning)
- apply_tj_kerning(): Applies kerning adjustments to text matrix and detects word boundaries
- GraphicsState::translate_text(): New method for horizontal text matrix translation

Key features:
- Kerning formula: -n/1000 * font_size * horiz_scaling/100
- Word boundary trigger: n > 200 (equivalent to n/1000 * font_size > 0.2 * font_size)
- Positive kerning injects synthetic word boundaries; negative kerning does not
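
The kerning formula and the boundary threshold above can be sketched as two small functions. This is a minimal illustration, not the code from this commit: the names tj_displacement, is_word_boundary and WORD_BOUNDARY_THRESHOLD are hypothetical, and the sign convention for the boundary test simply follows the rule as stated in this message (n > 200).

```rust
/// Threshold on the raw TJ adjustment `n`, in thousandths of a text space
/// unit. The value 200 comes from the commit message; the constant name is
/// an assumption made for this sketch.
const WORD_BOUNDARY_THRESHOLD: f64 = 200.0;

/// Horizontal shift applied to the text matrix for a TJ adjustment `n`:
/// -n/1000 * font_size * horiz_scaling/100.
/// `horiz_scaling` is given in percent (100.0 means no scaling).
fn tj_displacement(n: f64, font_size: f64, horiz_scaling: f64) -> f64 {
    -n / 1000.0 * font_size * (horiz_scaling / 100.0)
}

/// Word boundary rule as stated in the commit message: only adjustments
/// strictly greater than 200 mark the next glyph as a word start.
/// Negative values never do.
fn is_word_boundary(n: f64) -> bool {
    n > WORD_BOUNDARY_THRESHOLD
}
```

For example, with a font size of 12 and 100% horizontal scaling, n = 250 gives a shift of -3 text space units and is flagged as a word boundary, while n = -10 gives a small positive shift and is not flagged.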

Acceptance criteria (all PASS):
- [(Hello)250(World)] TJ → W has is_word_boundary=true
- [(kern)-10(ing)] TJ → i has is_word_boundary=false
- [(a)500(b)500(c)] TJ → both b and c have is_word_boundary=true
- [] TJ → no glyphs (no-op)
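
The acceptance criteria above can be reproduced with a simplified model of a TJ array. The sketch below is an assumption-laden illustration rather than the actual process_tj_array(): the TjElement and Glyph types and the flag_word_boundaries function are hypothetical, strings are treated as plain Rust chars instead of font-encoded bytes, and text matrix updates are left out entirely.

```rust
/// One element of a TJ array: either a string to show or a numeric adjustment.
/// Hypothetical type for this sketch.
#[derive(Debug, Clone, PartialEq)]
enum TjElement {
    Text(String),
    Adjust(f64),
}

/// A shown glyph with the word boundary flag named in the acceptance criteria.
#[derive(Debug, Clone, PartialEq)]
struct Glyph {
    ch: char,
    is_word_boundary: bool,
}

/// Walks a TJ array and marks the first glyph after any adjustment n > 200
/// as a word boundary. Smaller or negative adjustments set no flag.
fn flag_word_boundaries(elements: &[TjElement]) -> Vec<Glyph> {
    let mut glyphs = Vec::new();
    // Set when an adjustment above the threshold has been seen and the
    // next glyph has not been emitted yet.
    let mut pending_boundary = false;

    for element in elements {
        match element {
            TjElement::Adjust(n) => {
                if *n > 200.0 {
                    pending_boundary = true;
                }
            }
            TjElement::Text(s) => {
                for ch in s.chars() {
                    glyphs.push(Glyph { ch, is_word_boundary: pending_boundary });
                    // Only the first glyph after the adjustment carries the flag.
                    pending_boundary = false;
                }
            }
        }
    }
    glyphs
}
```

Under these assumptions, an empty array yields no glyphs, and in [(a)500(b)500(c)] the glyphs b and c are flagged while a is not.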

13 new tests added; all TJ operator tests pass.

Closes: pdftract-1kdzu
2026-05-26 16:44:05 -04:00