Commit graph

1 commit

Author SHA1 Message Date
jedarden
e3b72efc83 Add research: Southeast Asian scripts, OpenType MATH formula extraction
Two new research documents covering Southeast Asian script extraction
(Thai/Khmer/Myanmar/Lao/Tibetan/Ethiopic — cluster structure, no-space
word boundary policy for Thai/Lao, Zawgyi vs Unicode detection for
Myanmar, USE shaping, Tesseract fallback) and OpenType MATH table
exploitation for formula extraction (MathConstants for fraction/
subscript/radical layout, TeX OML/OMS/OMX encoding tables, MathML
output generation, GlyphAssembly reconstruction, alternative text
and MathJax XMP source recovery).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 16:21:48 -04:00