pdftract/docs
jedarden 516ca154aa Add research: page labels, government forms, book publishing, filter decoding
Four new extraction research documents covering page label/PageLabels
number tree and outline/bookmark tree extraction, government form PDF
patterns (IRS, USCIS, court filings, classification markings), book and
publishing PDF structure (running heads, footnotes, index extraction),
and PDF stream filter pipeline (FlateDecode/LZW predictors, JBIG2 global
segments, CCITTFax, JPX, error boundaries).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 15:55:08 -04:00
notes     Add SDK architecture notes covering top 10 languages  2026-05-16 14:51:25 -04:00
plan      Initial repo scaffold with README and docs structure  2026-05-16 14:26:16 -04:00
research  Add research: page labels, government forms, book publishing, filter decoding  2026-05-16 15:55:08 -04:00