pdftract/docs
jedarden 4e72c66763 Add research: Indic scripts, adversarial parser security
Add two research documents. The first covers Indic script extraction:
abugida structure, ToUnicode CMap failures for shaped glyphs, the
ActualText fast-path, GSUB lookup reversal, pre-base matra reordering,
virama placement, and a Tesseract fallback with script-specific models.
The second covers adversarial input handling: decompression bombs,
circular references, malformed stream lengths, path traversal in
attachments, content stream loop detection, O(n log n) algorithm
requirements, and output sanitization.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 16:18:03 -04:00
notes     Add SDK architecture notes covering top 10 languages                    2026-05-16 14:51:25 -04:00
plan      Add research: span merging, Unicode normalization, implementation plan  2026-05-16 16:15:14 -04:00
research  Add research: Indic scripts, adversarial parser security                2026-05-16 16:18:03 -04:00