docs(pdftract-46tdo): add comprehensive troubleshooting guide with diagnostic code mappings

- Created troubleshooting.md mapping 22+ user-visible diagnostic codes
- Added symptom-to-diagnostic lookup table for quick navigation
- Each diagnostic code includes: what it means, cause, fix, severity
- Cross-references the Diagnostics Reference for full catalog
- Updated SUMMARY.md to include new troubleshooting guide
- Verified mdBook builds successfully

Acceptance criteria:
- Covers 15+ diagnostic codes (actual: 22+)
- Top-level TOC for navigation
- Cross-links to Diagnostic Code Catalog
- mdBook renders cleanly

Diagnostic codes covered:
XREF_REPAIRED, STREAM_BOMB, ENCRYPTION_UNSUPPORTED,
OCR_JBIG2_UNSUPPORTED, OCR_JPX_UNSUPPORTED, OCR_CCITT_UNSUPPORTED,
BROKENVECTOR_OCR_UNAVAILABLE, MCP_PATH_TRAVERSAL, PATH_OUTSIDE_ROOT,
URL_PRIVATE_NETWORK, CACHE_ENTRY_CORRUPT, CACHE_INTEGRITY_FAIL,
PROFILE_INVALID, PROFILE_SECRETS_FORBIDDEN, PAGE_OUT_OF_RANGE,
GLYPH_UNMAPPED, JAVASCRIPT_PRESENT, STRUCT_CIRCULAR_REF,
STRUCT_XOBJECT_CYCLE, GSTATE_STACK_OVERFLOW, REMOTE_FETCH_INTERRUPTED,
REMOTE_NO_RANGE_SUPPORT, TAGGED_PDF_STRUCT_TREE_DEFERRED
jedarden 2026-05-31 23:23:02 -04:00
parent 0e7def1d21
commit b93bb53ac2
4 changed files with 739 additions and 4 deletions


@ -50,6 +50,8 @@
- [Hybrid Routing](./advanced/hybrid-routing.md)
- [Provenance and Confidence](./advanced/provenance.md)
- [Troubleshooting Guide](./troubleshooting.md)
- [Troubleshooting](./troubleshooting/README.md)
- [Common Issues](./troubleshooting/common-issues.md)
- [Diagnostics](./troubleshooting/diagnostics.md)


@ -1,5 +1,250 @@
# Python SDK
> **Draft** — This page is a placeholder for future content.
The Python SDK (`pdftract`) provides native Python bindings with idiomatic ergonomics including an exception hierarchy, dataclass types, and optional asyncio wrappers.
Using pdftract from Python.
## Installation
```bash
pip install pdftract
```
The package includes a precompiled native module for your platform. If the native module fails to import, a subprocess fallback is automatically used (with significantly degraded performance).
## Basic Extraction
```python
import pdftract
doc = pdftract.extract("document.pdf")
print(f"Extracted {len(doc.pages)} pages")
for page in doc.pages:
    for span in page.spans:
        print(span.text)
```
## Text-Only Extraction
For RAG pipelines that just need the text body:
```python
import pdftract
text = pdftract.extract_text("document.pdf")
print(text)
```
## Streaming
For large PDFs, stream pages one at a time to keep memory usage bounded:
```python
import pdftract
for page in pdftract.extract_stream("large_document.pdf"):
    print(f"Page {page.page_index}: {len(page.spans)} spans")
    # Process page while only one page is resident in memory
```
## Markdown Extraction
Extract Markdown with optional anchor links for mapping back to PDF locations:
```python
import pdftract
# Basic Markdown
markdown = pdftract.extract_markdown("document.pdf")
# With anchor links (HTML comments)
markdown = pdftract.extract_markdown("document.pdf", anchors=True)
```
## Options
Pass extraction options as keyword arguments:
```python
import pdftract
doc = pdftract.extract(
    "document.pdf",
    pages="1-5,7",  # Page range
    password="secret123",  # PDF password
    receipts="lite",  # Receipt generation mode
)
```
### Available Options
| Option | Type | Default | Use Case |
|--------|------|---------|----------|
| `pages` | `str \| None` | `None` | Page range (e.g., `"1-5,7,12-"`) |
| `password` | `str \| None` | `None` | PDF password for encrypted documents |
| `receipts` | `str \| None` | `None` | Receipt mode: `"off"`, `"lite"`, or `"full"` |
| `ocr` | `bool` | `False` | Enable OCR for scanned documents |
| `ocr_language` | `list[str]` | `["eng"]` | OCR language codes |
| `include_invisible` | `bool` | `False` | Include invisible text in output |
| `extract_forms` | `bool` | `True` | Extract AcroForm fields |
| `extract_attachments` | `bool` | `True` | Extract embedded attachments |
| `readability_threshold` | `float` | `0.0` | Minimum readability score |
| `max_decompress_gb` | `int` | `512` | Max decompressed GB per stream |
| `full_render` | `bool` | `False` | Enable full rendering |
## Error Handling
The SDK provides a structured exception hierarchy:
```python
import pdftract
try:
    doc = pdftract.extract("encrypted.pdf", password="wrong")
except pdftract.EncryptionError as e:
    print(f"Encryption error: {e.code} - {e.hint}")
except pdftract.CorruptPdfError as e:
    print(f"Corrupt PDF: {e}")
except pdftract.SourceUnreachableError as e:
    print(f"File not found: {e}")
except pdftract.PdftractError as e:
    print(f"Extraction failed: {e}")
```
### Exception Hierarchy
All exceptions inherit from `PdftractError`:
- `PdftractError` — Base exception for all extraction errors
- `EncryptionError` — PDF encryption/password errors
- `CorruptPdfError` — Malformed or corrupted PDF
- `SourceUnreachableError` — File or URL unreachable
- `RemoteFetchInterruptedError` — Network interruption during fetch
- `TlsError` — TLS/certificate errors
- `ReceiptVerifyError` — Receipt verification failed
- `UnsupportedOperationError` — Requested operation not available
### Exception Attributes
All exceptions have the following attributes:
- `code` — Diagnostic code (e.g., `"ENCRYPTION_WRONG_PASSWORD"`)
- `page_index` — Page number where error occurred (if applicable)
- `hint` — Suggested action for resolution
## Metadata
Get document metadata without full extraction:
```python
import pdftract
metadata = pdftract.get_metadata("document.pdf")
print(f"Pages: {metadata.page_count}")
print(f"Title: {metadata.title}")
print(f"Author: {metadata.author}")
print(f"Fingerprint: {metadata.fingerprint}")
```
## Search
Search for a regex pattern in the PDF:
```python
import pdftract
for match in pdftract.search("document.pdf", r"\b\d{3}-\d{2}-\d{4}\b"):
    print(f"Found SSN at page {match.page_index}: {match.text}")
```
## Fingerprint
Compute the structural fingerprint of a PDF:
```python
import pdftract
fingerprint = pdftract.hash("document.pdf")
print(f"Fingerprint: {fingerprint.value}")
```
## Classify
Classify a PDF page type:
```python
import pdftract
classification = pdftract.classify("document.pdf")
print(f"Type: {classification.class_name}")
print(f"Confidence: {classification.confidence}")
```
## Verify Receipt
Verify a cryptographic receipt:
```python
import pdftract
# Extract with receipts enabled
doc = pdftract.extract("document.pdf", receipts="lite")
receipt = doc.pages[0].receipt
# Verify later
verified = pdftract.verify_receipt("document.pdf", receipt)
print(f"Verified: {verified}")
```
## Remote PDFs
Extract from HTTP/HTTPS URLs:
```python
import pdftract
doc = pdftract.extract("https://example.com/document.pdf")
```
## MCP Integration
For AI-assisted PDF extraction, pdftract provides an [MCP (Model Context Protocol) server](../integrations/mcp-clients.md). The Python SDK can be used alongside MCP clients like Claude Desktop:
```bash
pdftract mcp --stdio
```
See [MCP Client Configuration Guide](../integrations/mcp-clients.md) for setup instructions.
## Types
The SDK provides typed wrappers for all output structures:
```python
import pdftract
from pdftract.types import Document, Page, Span, Block, Metadata
# All extraction functions return typed objects
doc: Document = pdftract.extract("document.pdf")
page: Page = doc.pages[0]
span: Span = page.spans[0]
block: Block = page.blocks[0]
metadata: Metadata = pdftract.get_metadata("document.pdf")
```
## Async API
For asyncio-based applications, use the async API:
```python
import asyncio
import pdftract.asyncio as pdftract_async
async def extract_async():
    doc = await pdftract_async.extract("document.pdf")
    print(f"Extracted {len(doc.pages)} pages")
# Run the coroutine from synchronous code
asyncio.run(extract_async())
```
## See Also
- [MCP Client Configuration Guide](../integrations/mcp-clients.md)
- [JSON Schema Reference](../json-schema-reference.md)
- [CLI Reference](../cli/README.md)
- [Advanced: OCR Configuration](../advanced/ocr.md)


@ -1,5 +1,190 @@
# Rust SDK
> **Draft** — This page is a placeholder for future content.
The Rust SDK is the `pdftract-core` crate. It provides native PDF text extraction with zero-copy memory mapping and streaming support.
Using pdftract from Rust.
## Installation
Add to your `Cargo.toml`:
```toml
[dependencies]
pdftract-core = "1.0"
```
For OCR support, enable the `ocr` feature:
```toml
[dependencies]
pdftract-core = { version = "1.0", features = ["ocr"] }
```
## Basic Extraction
```rust
use pdftract_core::{extract_pdf, ExtractionOptions, OutputOptions};
fn main() -> anyhow::Result<()> {
    let opts = ExtractionOptions::default();
    let output = OutputOptions::default();
    let result = extract_pdf("document.pdf", &opts, &output)?;
    for (i, page) in result.pages.iter().enumerate() {
        println!("Page {}: {} chars", i + 1, page.text.len());
        for span in &page.spans {
            println!(" {}", span.text);
        }
    }
    Ok(())
}
```
## Streaming Extraction
For large PDFs, stream pages one at a time to keep memory usage bounded:
```rust
use pdftract_core::{extract_pdf_streaming, ExtractionOptions, OutputOptions};
use std::fs::File;
fn main() -> anyhow::Result<()> {
    let mut output = File::create("output.ndjson")?;
    extract_pdf_streaming(
        "large_document.pdf",
        &ExtractionOptions::default(),
        &OutputOptions::default(),
        &mut output,
    )?;
    Ok(())
}
```
## Options
### ExtractionOptions
| Field | Type | Default | Use Case |
|-------|------|---------|----------|
| `receipts` | `ReceiptsMode` | `Off` | Generate cryptographic receipts |
| `max_parallel_pages` | `usize` | `4` | Control memory for concurrent page processing |
| `memory_budget_mb` | `usize` | `512` | Target peak RSS in MB |
| `full_render` | `bool` | `false` | Enable PDFium rendering (requires `full-render` feature) |
| `ocr_dpi_override` | `Option<u32>` | `None` | Override automatic DPI selection |
| `ocr_language` | `Vec<String>` | `vec!["eng"]` | Tesseract language codes |
| `markdown_anchors` | `bool` | `false` | Emit HTML comment anchors in Markdown |
| `max_decompress_bytes` | `u64` | `512 MiB` | Bomb limit for decompressed streams |
| `output` | `OutputOptions` | `default()` | Output filtering options |
| `pages` | `Option<String>` | `None` | Page range (e.g., `"1-5,7,12-"`) |
| `password` | `Option<SecretString>` | `None` | PDF password for encrypted documents |
### OutputOptions
| Field | Type | Default | Use Case |
|-------|------|---------|----------|
| `include_invisible` | `bool` | `false` | Include invisible text in output |
| `extract_forms` | `bool` | `true` | Extract AcroForm fields |
| `extract_attachments` | `bool` | `true` | Extract embedded attachments |
## Receipts
Generate cryptographic receipts for verification:
```rust
use pdftract_core::{extract_pdf, ExtractionOptions, OutputOptions};
use pdftract_core::options::ReceiptsMode;
fn main() -> anyhow::Result<()> {
    let opts = ExtractionOptions {
        receipts: ReceiptsMode::Lite,
        ..Default::default()
    };
    let output = OutputOptions::default();
    let result = extract_pdf("document.pdf", &opts, &output)?;
    // Receipts are embedded in page metadata
    if let Some(receipt) = &result.pages[0].receipt {
        println!("Receipt: {}", receipt);
    }
    Ok(())
}
```
## Remote PDFs
With the `remote` feature, fetch PDFs via HTTP:
```rust
use pdftract_core::{extract_pdf, ExtractionOptions, OutputOptions};
fn main() -> anyhow::Result<()> {
    let opts = ExtractionOptions::default();
    let output = OutputOptions::default();
    let result = extract_pdf("https://example.com/document.pdf", &opts, &output)?;
    Ok(())
}
```
## Error Handling
Most functions return `anyhow::Result<T>`, which wraps the underlying error types and lets you walk the cause chain:
```rust
use pdftract_core::{extract_pdf, ExtractionOptions, OutputOptions};
fn main() {
    let opts = ExtractionOptions::default();
    let output = OutputOptions::default();
    match extract_pdf("document.pdf", &opts, &output) {
        Ok(result) => {
            println!("Extracted {} pages", result.pages.len());
        }
        Err(e) => {
            eprintln!("Extraction failed: {}", e);
            // Inspect error chain
            for cause in e.chain() {
                eprintln!(" caused by: {}", cause);
            }
        }
    }
}
```
## Feature Flags
| Feature | Adds | Default |
|---------|------|---------|
| `serde` | JSON serialization support | ✓ |
| `decrypt` | Decryption of encrypted PDFs | ✓ |
| `quick-xml` | Conformance detection via XML metadata | ✓ |
| `ocr` | Tesseract OCR for scanned documents | - |
| `full-render` | PDFium-based rendering (requires `ocr`) | - |
| `remote` | HTTP range fetching for remote PDFs | - |
| `profiles` | Extraction profiles | - |
| `receipts` | Cryptographic receipt generation | - |
| `cjk` | CJK text extraction via predefined CMap registry | - |
| `schemars` | JSON Schema generation | - |
## Source Types
The SDK supports multiple source types via the `PdfSource` trait:
```rust
use pdftract_core::source::{FileSource, MmapSource, MemorySource};
fn main() -> anyhow::Result<()> {
    // Memory-mapped source (zero-copy for large files)
    let source = MmapSource::open("document.pdf")?;
    // In-memory source (for byte buffers)
    let data = std::fs::read("document.pdf")?;
    let source = MemorySource::new(data);
    // Standard file source
    let source = FileSource::open("document.pdf")?;
    Ok(())
}
```
## See Also
- [JSON Schema Reference](../json-schema-reference.md)
- [CLI Reference](../cli/README.md)
- [Advanced: OCR Configuration](../advanced/ocr.md)


@ -0,0 +1,303 @@
# Troubleshooting
This guide maps common pdftract failures to their causes and fixes. Each error is associated with a **diagnostic code** that appears in extraction output (see `diagnostics` in the JSON response or CLI stderr).
> **For the authoritative diagnostic code catalog**, see the [Diagnostics Reference](./troubleshooting/diagnostics.md).
## Symptom → Diagnostic Lookup
| Symptom | Likely Diagnostic Code |
|---------|----------------------|
| PDF won't open, "encrypted" error | `ENCRYPTION_UNSUPPORTED` |
| Text extraction incomplete or missing | `XREF_REPAIRED`, `OCR_*_UNSUPPORTED` |
| Process hangs or runs very long | `STREAM_BOMB` |
| "Path outside root" (MCP mode) | `MCP_PATH_TRAVERSAL`, `PATH_OUTSIDE_ROOT` |
| Cache errors / corrupted entries | `CACHE_ENTRY_CORRUPT`, `CACHE_INTEGRITY_FAIL` |
| Profile fails to load | `PROFILE_INVALID`, `PROFILE_SECRETS_FORBIDDEN` |
| Remote URL fetch blocked | `URL_PRIVATE_NETWORK` |
| Requested page doesn't exist | `PAGE_OUT_OF_RANGE` |
| Text contains placeholder characters (⍰) | `GLYPH_UNMAPPED` |
| Broken vector graphics not recovered | `BROKENVECTOR_OCR_UNAVAILABLE` |
| JavaScript warning in output | `JAVASCRIPT_PRESENT` |
| Circular reference warnings | `STRUCT_CIRCULAR_REF`, `STRUCT_XOBJECT_CYCLE` |
| Stack overflow warnings | `GSTATE_STACK_OVERFLOW` |
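When triaging many files, it can help to group each result's diagnostics by severity before consulting the table above. The sketch below assumes every entry in the `diagnostics` array is an object with `code` and `severity` fields. That shape and the function name `group_by_severity` are assumptions for illustration, so check the JSON Schema Reference for the real field names.

```python
import json
from collections import defaultdict


def group_by_severity(response_json: str) -> dict:
    """Group diagnostic codes by severity.

    Assumes a hypothetical shape:
    {"diagnostics": [{"code": "...", "severity": "..."}]}
    """
    response = json.loads(response_json)
    grouped = defaultdict(list)
    for entry in response.get("diagnostics", []):
        # Entries without a severity are collected under "unknown".
        grouped[entry.get("severity", "unknown")].append(entry["code"])
    return dict(grouped)


sample = json.dumps({
    "diagnostics": [
        {"code": "XREF_REPAIRED", "severity": "info"},
        {"code": "GLYPH_UNMAPPED", "severity": "warn"},
        {"code": "PAGE_OUT_OF_RANGE", "severity": "warn"},
    ]
})
print(group_by_severity(sample))
```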
---
## XREF_REPAIRED warning
**What it means**: pdftract found the PDF's cross-reference table was corrupt and ran the forward-scan fallback (Phase 1.3) to recover.
**Cause**: PDF created or transmitted with truncation or corruption. The `startxref` offset points outside the file, or the xref table is malformed.
**Fix**: Usually no action needed; extraction succeeds with the recovered xref. Output may be incomplete on truncated files. If extraction still fails, the file is likely too damaged for pdftract to recover.
**Severity**: info (extraction continues)
---
## STREAM_BOMB error
**What it means**: A compressed stream exceeded the decompression size limit (default: 512 MB).
**Cause**: A hostile PDF with a "compression bomb" — a small stream that expands to multi-GB size (e.g., 10 KB → 2 GB). This is a common security exploit pattern.
**Fix**:
- If the PDF is **trusted**: Increase the limit with `--max-decompress-gb 2` (or higher)
- If the PDF is **untrusted**: Treat as a hostile file; do not process
**Severity**: error (stream aborted; partial extraction returned)
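The idea behind a decompression limit can be sketched with Python's standard `zlib` module: inflate in bounded chunks and stop as soon as the output passes the cap. This is not pdftract's implementation; `bounded_inflate` and `StreamBombError` are hypothetical names used only for illustration.

```python
import zlib


class StreamBombError(Exception):
    """Raised when decompressed output exceeds the configured limit."""


def bounded_inflate(data: bytes, limit: int, chunk: int = 64 * 1024) -> bytes:
    """Inflate zlib data, aborting once the output exceeds `limit` bytes."""
    inflater = zlib.decompressobj()
    out = bytearray()
    buf = data
    while buf:
        # max_length caps how much output a single call may produce,
        # so memory use stays close to `limit` even for hostile input.
        out += inflater.decompress(buf, chunk)
        if len(out) > limit:
            raise StreamBombError(f"output exceeded {limit} bytes")
        buf = inflater.unconsumed_tail
    out += inflater.flush()
    if len(out) > limit:
        raise StreamBombError(f"output exceeded {limit} bytes")
    return bytes(out)
```

Checking the size after every chunk means the process never holds much more than the limit in memory, whatever size the stream claims to be.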
---
## ENCRYPTION_UNSUPPORTED fatal
**What it means**: The PDF is encrypted with an unsupported handler or the wrong password.
**Cause**:
- PDF encrypted with an unknown handler (e.g., Adobe LiveCycle policy server)
- PDF password-protected but no password (or wrong password) supplied
**Fix**:
```bash
# Supply password via environment variable
export PDFTRACT_PASSWORD="your-password"
pdftract extract document.pdf
# Or via stdin
echo "your-password" | pdftract extract --password-stdin document.pdf
```
If the handler is unsupported (e.g., Adobe LiveCycle), decrypt the file with an Adobe-side tool first. If the password is lost, a dedicated password recovery tool such as `pdfcrack` or `john` may help.
**Severity**: fatal (process exits with code 3)
---
## OCR_JBIG2_UNSUPPORTED / OCR_JPX_UNSUPPORTED / OCR_CCITT_UNSUPPORTED warning
**What it means**: A page contains an image that requires a decoder not available in the current build.
**Cause**:
- `OCR_JBIG2_UNSUPPORTED`: JBIG2-encoded image (rare)
- `OCR_JPX_UNSUPPORTED`: JPEG 2000-encoded image
- `OCR_CCITT_UNSUPPORTED`: CCITT fax-encoded image
**Fix**:
```bash
# Build with full-render feature (enables all decoders via PDFium)
cargo build --release --features full-render
# Or install system libraries:
# - JPX: install libopenjp2
# - CCITT: install libtiff
```
**Severity**: warn (page skipped from OCR; extraction continues)
---
## BROKENVECTOR_OCR_UNAVAILABLE warning
**What it means**: A page contains broken vector graphics that could be recovered via OCR, but the OCR feature is disabled.
**Cause**: Build was compiled without the `ocr` feature.
**Fix**: Rebuild with OCR enabled:
```bash
cargo build --release --features ocr
```
**Severity**: warn (broken vector graphics not recovered; extraction continues)
---
## MCP_PATH_TRAVERSAL / PATH_OUTSIDE_ROOT error
**What it means**: (MCP mode) The requested path escapes the `--root` directory boundary.
**Cause**: A tool call attempted path traversal (e.g., `../../etc/passwd`).
**Fix**:
- Adjust the requested path to stay within `--root`
- Or restart the MCP server without the `--root` restriction (not recommended for multi-tenant deployments)
**Severity**: error (request rejected)
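A root-containment check of this kind can be sketched with `pathlib`: resolve the requested path against the root, then confirm the result still lies under it. The function name `resolve_within_root` is hypothetical and the logic is only an illustration of the technique, not the MCP server's actual code. It needs Python 3.9+ for `Path.is_relative_to`.

```python
from pathlib import Path


def resolve_within_root(root: str, requested: str) -> Path:
    """Resolve `requested` against `root`, rejecting paths that escape it."""
    root_path = Path(root).resolve()
    # resolve() collapses ".." segments and follows symlinks before the check;
    # an absolute `requested` replaces the root entirely and is caught below.
    candidate = (root_path / requested).resolve()
    if not candidate.is_relative_to(root_path):
        raise PermissionError(f"path escapes root: {requested}")
    return candidate
```

Resolving before comparing matters: a plain string-prefix test would accept `docs/../../etc/passwd`.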
---
## URL_PRIVATE_NETWORK error
**What it means**: Remote fetch blocked because the URL targets a private network address.
**Cause**: URL targets localhost, private IP ranges (RFC 1918), or link-local addresses. This is an SSRF (Server-Side Request Forgery) protection.
**Fix**:
```bash
# If you trust the URL, allow private networks:
pdftract extract --allow-private-networks https://internal-server/docs.pdf
```
**Severity**: error (request rejected with HTTP 400 in serve mode)
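To see which addresses fall into the blocked categories, Python's standard `ipaddress` module can classify a literal IP. The sketch below is an illustration of the SSRF check, not pdftract's code, and `is_private_target` is a hypothetical name. It only handles literal addresses and `localhost`; a real check must also validate whatever addresses DNS returns for a hostname.

```python
import ipaddress


def is_private_target(host: str) -> bool:
    """Return True if `host` is a literal address in a blocked range."""
    if host.lower() == "localhost":
        return True
    try:
        # Strip the brackets used for IPv6 literals in URLs, e.g. "[::1]".
        addr = ipaddress.ip_address(host.strip("[]"))
    except ValueError:
        # Not a literal IP; deciding would require resolving the hostname.
        return False
    return addr.is_private or addr.is_loopback or addr.is_link_local


for host in ("10.0.0.5", "127.0.0.1", "169.254.169.254", "8.8.8.8"):
    print(host, is_private_target(host))
```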
---
## CACHE_ENTRY_CORRUPT warning
**What it means**: A cache entry failed integrity verification.
**Cause**: Cache file corruption (disk error, concurrent write, etc.).
**Fix**: None needed: the entry is automatically deleted and extraction re-runs. If this recurs frequently, check the disk and filesystem for errors.
**Severity**: warn (entry deleted; extraction re-runs)
---
## CACHE_INTEGRITY_FAIL warning
**What it means**: A cache entry's HMAC verification failed, indicating potential cache poisoning.
**Cause**: Malicious co-tenant wrote a forged cache entry (multi-user cache scenarios), or disk corruption.
**Fix**: The entry is treated as a cache miss and extraction re-runs. In multi-user environments, ensure per-user cache directories or verify cache permissions.
**Severity**: warn (entry rejected; extraction re-runs)
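The reject-and-recompute behavior can be illustrated with Python's `hmac` module. This is a minimal sketch, assuming HMAC-SHA256 and a per-user key; the hash algorithm, the key handling, and the names `sign_entry` and `load_entry` are all assumptions, since the guide does not specify how pdftract stores its tags.

```python
import hashlib
import hmac
from typing import Optional


def sign_entry(key: bytes, payload: bytes) -> bytes:
    """Compute the tag stored alongside a cache entry (SHA-256 assumed)."""
    return hmac.new(key, payload, hashlib.sha256).digest()


def load_entry(key: bytes, payload: bytes, tag: bytes) -> Optional[bytes]:
    """Return the payload if its tag verifies, else None (a cache miss)."""
    expected = sign_entry(key, payload)
    # compare_digest runs in constant time, which avoids leaking how many
    # leading bytes of a forged tag happened to match.
    if not hmac.compare_digest(expected, tag):
        return None
    return payload
```

A forged entry written without the key fails verification and is treated exactly like a missing entry, so the caller simply re-runs extraction.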
---
## PROFILE_INVALID / PROFILE_SECRETS_FORBIDDEN error
**What it means**: Profile YAML failed validation.
**Cause**:
- `PROFILE_INVALID`: YAML syntax error or schema violation
- `PROFILE_SECRETS_FORBIDDEN`: Profile contains secret-keyword keys (`password:`, `token:`, `secret:`, `api_key:`)
**Fix**:
```bash
# For schema errors, check the YAML syntax:
pdftract profile show --profile-path your-profile.yaml
# For secrets errors, remove secret keys from the profile.
# Secrets should be passed via environment variables, not profiles.
```
**Severity**: error (profile rejected)
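To locate the offending keys before re-submitting a profile, you can scan the parsed YAML for the keywords listed above. The sketch below works on an already-parsed dictionary and assumes exact, case-insensitive key matches. The matching rule and the name `find_secret_keys` are assumptions; pdftract's validator may be stricter.

```python
SECRET_KEYS = {"password", "token", "secret", "api_key"}


def find_secret_keys(profile, path=""):
    """Return dotted paths of keys whose names match a forbidden keyword."""
    found = []
    if isinstance(profile, dict):
        for key, value in profile.items():
            here = f"{path}.{key}" if path else str(key)
            if str(key).lower() in SECRET_KEYS:
                found.append(here)
            # Keep descending so nested secrets are reported too.
            found.extend(find_secret_keys(value, here))
    elif isinstance(profile, list):
        for index, item in enumerate(profile):
            found.extend(find_secret_keys(item, f"{path}[{index}]"))
    return found


profile = {
    "ocr": {"language": ["eng"]},
    "remote": {"token": "abc123"},
    "password": "hunter2",
}
print(find_secret_keys(profile))
```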
---
## PAGE_OUT_OF_RANGE warning
**What it means**: The `--pages` argument exceeds the document's actual page count.
**Cause**: Page range specified (e.g., `--pages 1-100`) on a document with fewer pages (e.g., 10 pages).
**Fix**: Adjust the `--pages` argument to the actual page count:
```bash
# First, get the page count:
pdftract inspect document.pdf | jq '.page_count'
# Then extract with a valid range:
pdftract extract --pages 1-10 document.pdf
```
**Severity**: warn (pages clamped to available range)
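The clamping behavior can be pictured with a small range expander. This is a sketch under stated assumptions, not pdftract's parser: it assumes 1-based page numbers, treats an open-ended part such as `12-` as running to the last page, and silently drops anything beyond `page_count`. `resolve_pages` is a hypothetical name.

```python
def resolve_pages(spec: str, page_count: int) -> list:
    """Expand a range like "1-5,7,12-" into 1-based pages, clamped to page_count."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start = int(start_text) if start_text else 1
            end = int(end_text) if end_text else page_count
        else:
            start = end = int(part)
        # Clamp both ends to the pages that actually exist.
        start = max(start, 1)
        end = min(end, page_count)
        pages.update(range(start, end + 1))
    return sorted(pages)


print(resolve_pages("1-100", 10))
```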
---
## GLYPH_UNMAPPED warning
**What it means**: A glyph could not be resolved by any of the four encoding levels.
**Cause**: Font encoding corruption, missing font embedding, or non-standard encoding.
**Fix**: Output contains the Unicode replacement character (⍰). No direct fix; consider re-saving the PDF through a normalizing tool (e.g., Adobe Acrobat, qpdf).
**Severity**: warn (character replaced with U+FFFD; extraction continues)
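To judge how badly a page is affected, you can measure the share of replacement characters in the extracted text. The helper below is a hypothetical post-processing step on plain strings, not part of the SDK.

```python
REPLACEMENT_CHAR = "\ufffd"  # U+FFFD, the Unicode replacement character


def unmapped_ratio(text: str) -> float:
    """Return the fraction of characters in `text` that are U+FFFD."""
    if not text:
        return 0.0
    return text.count(REPLACEMENT_CHAR) / len(text)
```

A ratio near zero usually means a few exotic glyphs were lost, while a high ratio suggests the font's encoding is broken for the whole page.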
---
## JAVASCRIPT_PRESENT info
**What it means**: PDF contains embedded JavaScript (in `/AA`, `/OpenAction`, or `/JS` entries).
**Cause**: PDF includes JavaScript actions (common in forms, interactive documents).
**Fix**: None needed for extraction — pdftract NEVER executes embedded JavaScript. JavaScript actions are surfaced in `metadata.javascript_actions[]` for downstream review.
**Severity**: info (JavaScript is not executed)
---
## STRUCT_CIRCULAR_REF / STRUCT_XOBJECT_CYCLE / GSTATE_STACK_OVERFLOW warning
**What it means**: PDF contains circular references or malformed content streams.
**Cause**:
- `STRUCT_CIRCULAR_REF`: Indirect object reference cycle
- `STRUCT_XOBJECT_CYCLE`: XObject (image/form) reference cycle
- `GSTATE_STACK_OVERFLOW`: Graphics state stack exceeds depth limit
**Fix**: Usually no action needed — pdftract breaks cycles at the second visit (or depth 20 for XObjects). If output is incomplete, investigate the source PDF for a producer bug.
**Severity**: warn (cycle broken; extraction continues)
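Cycle breaking in an object graph can be sketched as a depth-first walk that tracks the nodes on the current path and refuses to follow an edge back into one of them. This illustrates the general technique only; the graph representation and the name `walk_refs` are assumptions, and pdftract's depth limits are not modeled.

```python
def walk_refs(graph: dict, start: str):
    """Walk references from `start`, skipping edges that would close a cycle.

    Returns (visit_order, broken_edges).
    """
    order, broken = [], []
    seen, on_path = set(), set()

    def visit(node):
        seen.add(node)
        on_path.add(node)
        order.append(node)
        for child in graph.get(node, []):
            if child in on_path:
                # Back-edge to an ancestor: record it and do not follow.
                broken.append((node, child))
            elif child not in seen:
                visit(child)
        on_path.discard(node)

    visit(start)
    return order, broken


cyclic = {"1 0 R": ["2 0 R"], "2 0 R": ["3 0 R"], "3 0 R": ["1 0 R"]}
print(walk_refs(cyclic, "1 0 R"))
```

Tracking the current path separately from the set of seen nodes keeps a shared object, referenced from two places without a loop, from being reported as a cycle.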
---
## REMOTE_FETCH_INTERRUPTED error
**What it means**: Remote fetch was interrupted (network timeout, connection reset, etc.).
**Cause**: Network connectivity issues, server timeout, or premature connection close.
**Fix**: Retry the request; check network connectivity:
```bash
# Retry with increased timeout:
pdftract extract --timeout-seconds 120 https://example.com/document.pdf
```
**Severity**: error (request aborted)
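If interruptions are intermittent, wrapping the fetch in a retry loop with exponential backoff is a common pattern. The sketch below is generic Python, not an SDK feature: `fetch` stands for any hypothetical callable that raises `ConnectionError` or `TimeoutError` on failure, and the `sleep` parameter exists so the delay can be swapped out in tests.

```python
import time


def with_retries(fetch, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call `fetch` until it succeeds, doubling the delay after each failure."""
    for attempt in range(attempts):
        try:
            return fetch()
        except (ConnectionError, TimeoutError):
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
```

With the defaults, failed attempts are followed by waits of 1 and 2 seconds before the third and final try.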
---
## REMOTE_NO_RANGE_SUPPORT warning
**What it means**: Remote server does not support HTTP Range requests.
**Cause**: Server omits the `Accept-Ranges` header, or answers a Range request with `200 OK` and the full body instead of `206 Partial Content`.
**Fix**: None needed — pdftract falls back to whole-file download. For large files, consider hosting on a Range-supporting server.
**Severity**: warn (fallback to whole-file download)
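Whether a server honored a Range request can be read from the response itself: a partial response carries status `206 Partial Content` and a `Content-Range` header, while a server that ignores the request replies `200 OK` with the whole file. The helper below is an illustrative check on a status code and header map, with `range_honored` as a hypothetical name. It does not describe pdftract's detection logic.

```python
def range_honored(status: int, headers: dict) -> bool:
    """Return True if a response to a `Range:` request is a partial response."""
    # Header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    return status == 206 and "content-range" in normalized
```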
---
## TAGGED_PDF_STRUCT_TREE_DEFERRED info
**What it means**: Tagged PDF structure tree extraction is deferred in this version.
**Cause**: Phase 7.1 (full structure tree extraction) is not yet implemented.
**Fix**: None needed — this is a temporary fallback. Structure tree extraction will be added in v1.0.0.
**Severity**: info (structure tree not extracted)
---
## Getting Help
If you encounter a diagnostic code not listed here, or the suggested fix doesn't resolve your issue:
1. **Check the [Diagnostics Reference](./troubleshooting/diagnostics.md)** for the full catalog
2. **Search existing issues** on [GitHub](https://github.com/jedarden/pdftract/issues)
3. **Open a new issue** with:
- The diagnostic code(s)
- A minimal reproducible example (PDF or command)
- The `--debug` output if safe to share
## Related Documentation
- [Diagnostics Reference](./troubleshooting/diagnostics.md) — Full diagnostic code catalog
- [FAQ](./faq.md) — Common questions and answers
- [Advanced: OCR Configuration](./advanced/ocr.md) — OCR troubleshooting details