| .gitkeep | Initial repo scaffold with README and docs structure | 2026-05-16 14:26:16 -04:00 |
| accessibility-and-tagged-pdf-deep-dive.md | Add research: portfolios, incremental updates, tagged PDF, JavaScript/forms | 2026-05-16 15:45:59 -04:00 |
| adversarial-inputs-and-parser-security.md | Add research: Indic scripts, adversarial parser security | 2026-05-16 16:18:03 -04:00 |
| article-threads-and-reading-order.md | Add research: article threads, resource dictionaries, catalog, hyperlinks | 2026-05-16 16:04:00 -04:00 |
| benchmark-and-test-methodology.md | Add 12 research documents covering full PDF extraction surface | 2026-05-16 15:05:42 -04:00 |
| book-and-publishing-pdf-patterns.md | Add research: page labels, government forms, book publishing, filter decoding | 2026-05-16 15:55:08 -04:00 |
| chunking-for-llm-consumption.md | Add six research documents covering output-side extraction topics | 2026-05-16 14:56:25 -04:00 |
| cjk-and-asian-script-encoding.md | Add three research documents: CJK encoding, pipeline synthesis, linearization | 2026-05-16 15:26:36 -04:00 |
| cmap-format-and-cid-encoding.md | Add three research documents on parser correctness fundamentals | 2026-05-16 15:16:41 -04:00 |
| color-management-and-icc-profiles.md | Add research: color management, text metrics, PDF/X, content stream operators | 2026-05-16 15:59:02 -04:00 |
| complex-layout-reading-order.md | Add four research documents focused on readable text production | 2026-05-16 15:13:10 -04:00 |
| confidence-scoring-and-aggregation.md | Add research: rendering modes, legal/financial patterns, confidence scoring, engineering docs | 2026-05-16 15:35:48 -04:00 |
| content-stream-concatenation.md | Add three research documents on parser correctness fundamentals | 2026-05-16 15:16:41 -04:00 |
| content-stream-operators.md | Add research: color management, text metrics, PDF/X, content stream operators | 2026-05-16 15:59:02 -04:00 |
| digital-signatures-and-certification.md | Add research: color visibility, medical/scientific, multilingual, digital signatures | 2026-05-16 15:41:43 -04:00 |
| document-catalog-and-structure.md | Add research: article threads, resource dictionaries, catalog, hyperlinks | 2026-05-16 16:04:00 -04:00 |
| document-classification-and-zone-labeling.md | Add six research documents covering output-side extraction topics | 2026-05-16 14:56:25 -04:00 |
| embedded-files-and-portfolios.md | Add 12 research documents covering full PDF extraction surface | 2026-05-16 15:05:42 -04:00 |
| engineering-document-extraction.md | Add research: rendering modes, legal/financial patterns, confidence scoring, engineering docs | 2026-05-16 15:35:48 -04:00 |
| error-handling-and-robustness.md | Add research: error handling, PDF/A guarantees, output schema, generator quirks | 2026-05-16 16:07:13 -04:00 |
| extraction-output-schema.md | docs(pdftract-645y): finalize extraction-output-schema.md v1.0 with all Phase 6.1 fields | 2026-05-24 00:59:23 -04:00 |
| extraction-pipeline-overview.md | Add three research documents: CJK encoding, pipeline synthesis, linearization | 2026-05-16 15:26:36 -04:00 |
| font-descriptor-and-metrics.md | Add research: xref parsing, object model, font descriptors, PDF/UA-2 | 2026-05-16 16:01:34 -04:00 |
| font-subsetting-and-extraction.md | Add research: font subsetting, LaTeX patterns, redaction detection | 2026-05-16 15:30:52 -04:00 |
| form-fields-and-annotations.md | Add 12 research documents covering full PDF extraction surface | 2026-05-16 15:05:42 -04:00 |
| glyph-recognition-and-unicode-recovery.md | docs(pdftract-26r8): finalize glyph recognition research note v1.0 | 2026-05-24 02:10:06 -04:00 |
| government-form-pdf-patterns.md | Add research: page labels, government forms, book publishing, filter decoding | 2026-05-16 15:55:08 -04:00 |
| graphics-state-tracking.md | Add three research documents on parser correctness fundamentals | 2026-05-16 15:16:41 -04:00 |
| historical-and-degraded-document-extraction.md | Add four research documents focused on readable text production | 2026-05-16 15:13:10 -04:00 |
| hyperlinks-and-named-destinations.md | Add research: article threads, resource dictionaries, catalog, hyperlinks | 2026-05-16 16:04:00 -04:00 |
| image-and-figure-extraction.md | Add 12 research documents covering full PDF extraction surface | 2026-05-16 15:05:42 -04:00 |
| image-compression-and-filter-decoding.md | Add research: page labels, government forms, book publishing, filter decoding | 2026-05-16 15:55:08 -04:00 |
| incremental-updates-and-versioning.md | Add research: portfolios, incremental updates, tagged PDF, JavaScript/forms | 2026-05-16 15:45:59 -04:00 |
| indic-script-extraction.md | Add research: Indic scripts, adversarial parser security | 2026-05-16 16:18:03 -04:00 |
| invisible-and-hidden-text.md | Add 12 research documents covering full PDF extraction surface | 2026-05-16 15:05:42 -04:00 |
| javascript-and-interactive-pdf-extraction.md | Add research: portfolios, incremental updates, tagged PDF, JavaScript/forms | 2026-05-16 15:45:59 -04:00 |
| language-detection-and-script-handling.md | Add six research documents covering output-side extraction topics | 2026-05-16 14:56:25 -04:00 |
| latex-and-scientific-pdf-patterns.md | Add research: font subsetting, LaTeX patterns, redaction detection | 2026-05-16 15:30:52 -04:00 |
| legal-and-financial-pdf-patterns.md | Add research: rendering modes, legal/financial patterns, confidence scoring, engineering docs | 2026-05-16 15:35:48 -04:00 |
| linearized-pdf-and-streaming.md | Add three research documents: CJK encoding, pipeline synthesis, linearization | 2026-05-16 15:26:36 -04:00 |
| malformed-pdf-repair-and-recovery.md | Add 12 research documents covering full PDF extraction surface | 2026-05-16 15:05:42 -04:00 |
| mathematical-expression-handling.md | Add six research documents covering output-side extraction topics | 2026-05-16 14:56:25 -04:00 |
| medical-and-scientific-pdf-patterns.md | Add research: color visibility, medical/scientific, multilingual, digital signatures | 2026-05-16 15:41:43 -04:00 |
| multilingual-document-extraction.md | Add research: color visibility, medical/scientific, multilingual, digital signatures | 2026-05-16 15:41:43 -04:00 |
| opentype-math-and-formula-extraction.md | Add research: Southeast Asian scripts, OpenType MATH formula extraction | 2026-05-16 16:21:48 -04:00 |
| optional-content-groups.md | Add 12 research documents covering full PDF extraction surface | 2026-05-16 15:05:42 -04:00 |
| page-geometry-and-document-structure.md | Add 12 research documents covering full PDF extraction surface | 2026-05-16 15:05:42 -04:00 |
| page-labels-and-outline-extraction.md | Add research: page labels, government forms, book publishing, filter decoding | 2026-05-16 15:55:08 -04:00 |
| parallel-extraction-architecture.md | Add parallel extraction research and comprehensive research index | 2026-05-16 16:30:35 -04:00 |
| pdf-encryption-and-security.md | Add 12 research documents covering full PDF extraction surface | 2026-05-16 15:05:42 -04:00 |
| pdf-fonts-and-encoding.md | Add research docs and SDK invocation notes | 2026-05-16 14:33:34 -04:00 |
| pdf-generator-quirks.md | Add research: error handling, PDF/A guarantees, output schema, generator quirks | 2026-05-16 16:07:13 -04:00 |
| pdf-object-model-and-data-types.md | Add research: xref parsing, object model, font descriptors, PDF/UA-2 | 2026-05-16 16:01:34 -04:00 |
| pdf-portfolio-and-attachments.md | Add research: portfolios, incremental updates, tagged PDF, JavaScript/forms | 2026-05-16 15:45:59 -04:00 |
| pdf-specification.md | Add research docs and SDK invocation notes | 2026-05-16 14:33:34 -04:00 |
| pdfa-archival-extraction-guarantees.md | Add research: error handling, PDF/A guarantees, output schema, generator quirks | 2026-05-16 16:07:13 -04:00 |
| pdfa-compliance-and-extraction.md | Add three research documents on routing and text reconstruction | 2026-05-16 15:22:08 -04:00 |
| pdfua2-and-accessibility-standards.md | Add research: xref parsing, object model, font descriptors, PDF/UA-2 | 2026-05-16 16:01:34 -04:00 |
| pdfvt-variable-transactional-printing.md | Add research: Ruby/furigana typography, PDF/VT variable printing | 2026-05-16 16:24:21 -04:00 |
| pdfx-prepress-extraction.md | Add research: color management, text metrics, PDF/X, content stream operators | 2026-05-16 15:59:02 -04:00 |
| performance-and-streaming-architecture.md | Add 12 research documents covering full PDF extraction surface | 2026-05-16 15:05:42 -04:00 |
| post-extraction-normalization.md | Add six research documents covering output-side extraction topics | 2026-05-16 14:56:25 -04:00 |
| post-ocr-text-correction.md | Add four research documents on text quality and document-type handling | 2026-05-16 15:07:30 -04:00 |
| presentation-and-spreadsheet-pdfs.md | Add four research documents on text quality and document-type handling | 2026-05-16 15:07:30 -04:00 |
| raster-ocr-pipeline.md | Add 12 research documents covering full PDF extraction surface | 2026-05-16 15:05:42 -04:00 |
| redaction-detection-and-recovery.md | Add research: font subsetting, LaTeX patterns, redaction detection | 2026-05-16 15:30:52 -04:00 |
| resource-dictionary-and-inheritance.md | Add research: article threads, resource dictionaries, catalog, hyperlinks | 2026-05-16 16:04:00 -04:00 |
| ruby-text-and-east-asian-typography.md | Add research: Ruby/furigana typography, PDF/VT variable printing | 2026-05-16 16:24:21 -04:00 |
| scanned-vs-vector-page-classification.md | Add three research documents on routing and text reconstruction | 2026-05-16 15:22:08 -04:00 |
| semantic-text-reconstruction.md | Add four research documents on text quality and document-type handling | 2026-05-16 15:07:30 -04:00 |
| shading-pattern-and-text-visibility.md | Add research: color visibility, medical/scientific, multilingual, digital signatures | 2026-05-16 15:41:43 -04:00 |
| southeast-asian-script-extraction.md | Add research: Southeast Asian scripts, OpenType MATH formula extraction | 2026-05-16 16:21:48 -04:00 |
| span-merging-and-text-run-assembly.md | Add research: span merging, Unicode normalization, implementation plan | 2026-05-16 16:15:14 -04:00 |
| stroke-and-outlined-text.md | Add research: rendering modes, legal/financial patterns, confidence scoring, engineering docs | 2026-05-16 15:35:48 -04:00 |
| table-structure-reconstruction.md | docs(pdftract-10cf): finalize table structure reconstruction research note v1.0 | 2026-05-24 09:58:03 -04:00 |
| tagged-pdf-structure-and-reading-order.md | Add research docs and SDK invocation notes | 2026-05-16 14:33:34 -04:00 |
| text-positioning-and-font-metrics.md | Add research: color management, text metrics, PDF/X, content stream operators | 2026-05-16 15:59:02 -04:00 |
| text-readability-validation.md | Add 12 research documents covering full PDF extraction surface | 2026-05-16 15:05:42 -04:00 |
| type3-font-extraction.md | Add four research documents focused on readable text production | 2026-05-16 15:13:10 -04:00 |
| unicode-normalization-and-text-cleanup.md | Add research: span merging, Unicode normalization, implementation plan | 2026-05-16 16:15:14 -04:00 |
| watermark-and-background-separation.md | docs(pdftract-372e): finalize watermark and background separation research note v1.0 | 2026-05-24 10:33:37 -04:00 |
| word-boundary-reconstruction.md | docs(pdftract-5vhp): bring word-boundary-reconstruction.md to v1.0 final-pass | 2026-05-24 03:55:43 -04:00 |
| xmp-and-document-metadata.md | Add 12 research documents covering full PDF extraction surface | 2026-05-16 15:05:42 -04:00 |
| xref-table-parsing-and-object-lookup.md | Add research: xref parsing, object model, font descriptors, PDF/UA-2 | 2026-05-16 16:01:34 -04:00 |