pdftract

jedarden 516ca154aa	2026-05-16 15:55:08 -04:00
Add research: page labels, government forms, book publishing, filter decoding

Four new extraction research documents covering:
- page label/PageLabels number tree and outline/bookmark tree extraction
- government form PDF patterns (IRS, USCIS, court filings, classification markings)
- book and publishing PDF structure (running heads, footnotes, index extraction)
- PDF stream filter pipeline (FlateDecode/LZW predictors, JBIG2 global segments, CCITTFax, JPX, error boundaries)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

notes	Add SDK architecture notes covering top 10 languages	2026-05-16 14:51:25 -04:00
plan	Initial repo scaffold with README and docs structure	2026-05-16 14:26:16 -04:00
research	Add research: page labels, government forms, book publishing, filter decoding	2026-05-16 15:55:08 -04:00