jedarden 39ca6a3552 feat(pdftract-2b7ff): implement image_coverage_fraction signal evaluator
Add image_coverage_fraction signal evaluator that approximates the union
image coverage fraction by summing individual image XObject areas.

- Computes total image coverage as sum of image_xobject_areas
- Divides by page area (width * height) to get coverage fraction
- Clamps to [0.0, 1.0] to handle overlapping images (defensive)
- Returns Some(Vote::scanned(0.85)) if fraction > 0.85

Implementation uses sum for simplicity (overestimates coverage when
images overlap), which is acceptable for the 0.85 threshold as it's
a conservative signal. Can be revisited with Klee's algorithm for
greater accuracy if needed.
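
The steps above can be sketched as follows. This is a minimal illustration, not the committed code: the shapes of `PageContext`, `Vote`, and `PageKind`, the `width` and `height` field names, and the free-function form of the evaluator are all assumptions. Only `image_xobject_areas`, `Vote::scanned(0.85)`, and the 0.85 threshold come from the commit message.

```rust
// Hypothetical stand-ins for the real pdftract types (names and fields assumed).
#[derive(Debug, Clone, Copy, PartialEq)]
enum PageKind {
    Scanned,
}

#[derive(Debug, Clone, Copy, PartialEq)]
struct Vote {
    confidence: f64,
    kind: PageKind,
}

impl Vote {
    fn scanned(confidence: f64) -> Self {
        Vote { confidence, kind: PageKind::Scanned }
    }
}

// Assumed subset of PageContext: page size plus the per-image areas,
// all expressed in the same units.
struct PageContext {
    width: f64,
    height: f64,
    image_xobject_areas: Vec<f64>,
}

const COVERAGE_THRESHOLD: f64 = 0.85;

/// Sum of image areas divided by page area, clamped to [0.0, 1.0].
/// Summing overestimates the true union when images overlap.
fn image_coverage_fraction(ctx: &PageContext) -> f64 {
    let page_area = ctx.width * ctx.height;
    if page_area <= 0.0 {
        // Degenerate page: report no coverage rather than divide by zero.
        return 0.0;
    }
    let total: f64 = ctx.image_xobject_areas.iter().sum();
    (total / page_area).clamp(0.0, 1.0)
}

/// Votes "scanned" only when the coverage fraction exceeds the threshold.
fn evaluate(ctx: &PageContext) -> Option<Vote> {
    if image_coverage_fraction(ctx) > COVERAGE_THRESHOLD {
        Some(Vote::scanned(0.85))
    } else {
        None
    }
}
```

Because the sum is clamped, two overlapping full-page images report a fraction of 1.0 rather than 2.0. An exact union area would only change the result for pages whose summed coverage crosses the threshold because of overlap.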

Acceptance criteria PASS:
✓ Page with one image covering 90% area → Some(Vote { 0.85, Scanned })
✓ Page with multiple small images totaling 50% → None (below threshold)
✓ Page with no images → None
✓ Coverage clamped to 1.0 on overlapping images

Also includes pre-existing infrastructure:
- tr3_op_count field in PageContext
- image_xobject_areas field in PageContext
- all_tr3_with_full_page_image function
- CharDensityRatioSignal evaluator

These were necessary dependencies for the new evaluator to function.

Refs: Plan section Phase 5.1.2, coordinator pdftract-22p
2026-05-31 23:42:38 -04:00

SDK Quickstarts

Getting-started guides for using pdftract from Rust, Python, JavaScript/TypeScript, and Go. Each SDK implements the same nine-method contract: extract, extract_text, extract_markdown, extract_stream, search, get_metadata, hash, classify, and verify_receipt.
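
To make the shared contract concrete, here is one way it could be written down as a Rust trait. Only the nine method names come from this page. The trait name `PdftractSdk`, the `SdkError` and `SdkResult` types, and every parameter and return type are hypothetical placeholders chosen for illustration, so consult the per-language guides for the real signatures.

```rust
/// Hypothetical error type; the real SDKs define their own.
#[derive(Debug)]
pub struct SdkError(pub String);

/// Hypothetical result alias used by the sketch below.
pub type SdkResult<T> = std::result::Result<T, SdkError>;

/// Illustrative shape of the nine-method contract.
/// Method names are from the README; all signatures are assumptions.
pub trait PdftractSdk {
    fn extract(&self, source: &str) -> SdkResult<String>;
    fn extract_text(&self, source: &str) -> SdkResult<String>;
    fn extract_markdown(&self, source: &str) -> SdkResult<String>;
    fn extract_stream(&self, source: &str) -> SdkResult<Vec<String>>;
    fn search(&self, source: &str, query: &str) -> SdkResult<Vec<String>>;
    fn get_metadata(&self, source: &str) -> SdkResult<String>;
    fn hash(&self, source: &str) -> SdkResult<String>;
    fn classify(&self, source: &str) -> SdkResult<String>;
    fn verify_receipt(&self, receipt: &str) -> SdkResult<bool>;
}

/// The method names every SDK is expected to expose, in README order.
pub const CONTRACT_METHODS: [&str; 9] = [
    "extract",
    "extract_text",
    "extract_markdown",
    "extract_stream",
    "search",
    "get_metadata",
    "hash",
    "classify",
    "verify_receipt",
];
```

A list like `CONTRACT_METHODS` can serve as a conformance checklist when comparing one SDK against another.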

Available SDKs

  • Rust — The pdftract-core crate with native zero-copy PDF processing
  • Python — Native Python bindings with PyO3, plus subprocess fallback
  • JavaScript/TypeScript — npm package with Node.js and browser support
  • Go — Go module with native bindings

Choosing an SDK

  • Rust — Best for performance-critical applications and CLI tools
  • Python — Best for data science, ML pipelines, and scripting
  • JavaScript — Best for web applications and serverless functions
  • Go — Best for microservices and cloud-native applications

All SDKs support:

  • Remote PDFs via HTTP/HTTPS URLs
  • Password-protected (encrypted) PDFs
  • OCR for scanned documents (behind a feature flag)
  • Streaming extraction for large documents
  • Cryptographic receipt verification

See Also