.gitkeep
Initial repo scaffold with README and docs structure
2026-05-16 14:26:16 -04:00
benchmark-and-test-methodology.md
Add 12 research documents covering full PDF extraction surface
2026-05-16 15:05:42 -04:00
chunking-for-llm-consumption.md
Add six research documents covering output-side extraction topics
2026-05-16 14:56:25 -04:00
cjk-and-asian-script-encoding.md
Add three research documents: CJK encoding, pipeline synthesis, linearization
2026-05-16 15:26:36 -04:00
cmap-format-and-cid-encoding.md
Add three research documents on parser correctness fundamentals
2026-05-16 15:16:41 -04:00
complex-layout-reading-order.md
Add four research documents focused on readable text production
2026-05-16 15:13:10 -04:00
confidence-scoring-and-aggregation.md
Add research: rendering modes, legal/financial patterns, confidence scoring, engineering docs
2026-05-16 15:35:48 -04:00
content-stream-concatenation.md
Add three research documents on parser correctness fundamentals
2026-05-16 15:16:41 -04:00
document-classification-and-zone-labeling.md
Add six research documents covering output-side extraction topics
2026-05-16 14:56:25 -04:00
embedded-files-and-portfolios.md
Add 12 research documents covering full PDF extraction surface
2026-05-16 15:05:42 -04:00
engineering-document-extraction.md
Add research: rendering modes, legal/financial patterns, confidence scoring, engineering docs
2026-05-16 15:35:48 -04:00
extraction-pipeline-overview.md
Add three research documents: CJK encoding, pipeline synthesis, linearization
2026-05-16 15:26:36 -04:00
font-subsetting-and-extraction.md
Add research: font subsetting, LaTeX patterns, redaction detection
2026-05-16 15:30:52 -04:00
form-fields-and-annotations.md
Add 12 research documents covering full PDF extraction surface
2026-05-16 15:05:42 -04:00
glyph-recognition-and-unicode-recovery.md
Add research docs and SDK invocation notes
2026-05-16 14:33:34 -04:00
graphics-state-tracking.md
Add three research documents on parser correctness fundamentals
2026-05-16 15:16:41 -04:00
historical-and-degraded-document-extraction.md
Add four research documents focused on readable text production
2026-05-16 15:13:10 -04:00
image-and-figure-extraction.md
Add 12 research documents covering full PDF extraction surface
2026-05-16 15:05:42 -04:00
invisible-and-hidden-text.md
Add 12 research documents covering full PDF extraction surface
2026-05-16 15:05:42 -04:00
language-detection-and-script-handling.md
Add six research documents covering output-side extraction topics
2026-05-16 14:56:25 -04:00
latex-and-scientific-pdf-patterns.md
Add research: font subsetting, LaTeX patterns, redaction detection
2026-05-16 15:30:52 -04:00
legal-and-financial-pdf-patterns.md
Add research: rendering modes, legal/financial patterns, confidence scoring, engineering docs
2026-05-16 15:35:48 -04:00
linearized-pdf-and-streaming.md
Add three research documents: CJK encoding, pipeline synthesis, linearization
2026-05-16 15:26:36 -04:00
malformed-pdf-repair-and-recovery.md
Add 12 research documents covering full PDF extraction surface
2026-05-16 15:05:42 -04:00
mathematical-expression-handling.md
Add six research documents covering output-side extraction topics
2026-05-16 14:56:25 -04:00
optional-content-groups.md
Add 12 research documents covering full PDF extraction surface
2026-05-16 15:05:42 -04:00
page-geometry-and-document-structure.md
Add 12 research documents covering full PDF extraction surface
2026-05-16 15:05:42 -04:00
pdf-encryption-and-security.md
Add 12 research documents covering full PDF extraction surface
2026-05-16 15:05:42 -04:00
pdf-fonts-and-encoding.md
Add research docs and SDK invocation notes
2026-05-16 14:33:34 -04:00
pdf-specification.md
Add research docs and SDK invocation notes
2026-05-16 14:33:34 -04:00
pdfa-compliance-and-extraction.md
Add three research documents on routing and text reconstruction
2026-05-16 15:22:08 -04:00
performance-and-streaming-architecture.md
Add 12 research documents covering full PDF extraction surface
2026-05-16 15:05:42 -04:00
post-extraction-normalization.md
Add six research documents covering output-side extraction topics
2026-05-16 14:56:25 -04:00
post-ocr-text-correction.md
Add four research documents on text quality and document-type handling
2026-05-16 15:07:30 -04:00
presentation-and-spreadsheet-pdfs.md
Add four research documents on text quality and document-type handling
2026-05-16 15:07:30 -04:00
raster-ocr-pipeline.md
Add 12 research documents covering full PDF extraction surface
2026-05-16 15:05:42 -04:00
redaction-detection-and-recovery.md
Add research: font subsetting, LaTeX patterns, redaction detection
2026-05-16 15:30:52 -04:00
scanned-vs-vector-page-classification.md
Add three research documents on routing and text reconstruction
2026-05-16 15:22:08 -04:00
semantic-text-reconstruction.md
Add four research documents on text quality and document-type handling
2026-05-16 15:07:30 -04:00
stroke-and-outlined-text.md
Add research: rendering modes, legal/financial patterns, confidence scoring, engineering docs
2026-05-16 15:35:48 -04:00
table-structure-reconstruction.md
Add six research documents covering output-side extraction topics
2026-05-16 14:56:25 -04:00
tagged-pdf-structure-and-reading-order.md
Add research docs and SDK invocation notes
2026-05-16 14:33:34 -04:00
text-readability-validation.md
Add 12 research documents covering full PDF extraction surface
2026-05-16 15:05:42 -04:00
type3-font-extraction.md
Add four research documents focused on readable text production
2026-05-16 15:13:10 -04:00
watermark-and-background-separation.md
Add four research documents focused on readable text production
2026-05-16 15:13:10 -04:00
word-boundary-reconstruction.md
Add three research documents on routing and text reconstruction
2026-05-16 15:22:08 -04:00
xmp-and-document-metadata.md
Add 12 research documents covering full PDF extraction surface
2026-05-16 15:05:42 -04:00