jedarden 39ca6a3552	2026-05-31 23:42:38 -04:00

feat(pdftract-2b7ff): implement image_coverage_fraction signal evaluator

Add image_coverage_fraction signal evaluator that computes the union image coverage fraction from individual image XObject areas.

- Computes total image coverage as sum of image_xobject_areas
- Divides by page area (width * height) to get coverage fraction
- Clamps to [0.0, 1.0] to handle overlapping images (defensive)
- Returns Some(Vote::scanned(0.85)) if fraction > 0.85

Implementation uses sum for simplicity (overestimates coverage when images overlap), which is acceptable for the 0.85 threshold as it's a conservative signal. Can be revisited with Klee's algorithm for greater accuracy if needed.

Acceptance criteria PASS:

✓ Page with one image covering 90% area → Some(Vote { 0.85, Scanned })
✓ Page with multiple small images totaling 50% → None (below threshold)
✓ Page with no images → None
✓ Coverage clamped to 1.0 on overlapping images

Also includes pre-existing infrastructure:

- tr3_op_count field in PageContext
- image_xobject_areas field in PageContext
- all_tr3_with_full_page_image function
- CharDensityRatioSignal evaluator

These were necessary dependencies for the new evaluator to function.

Refs: Plan section Phase 5.1.2, coordinator pdftract-22p
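
The evaluator described in the commit message above can be sketched roughly as follows. This is an illustrative reconstruction, not the code from the commit: the exact shapes of `Vote`, `PageKind`, and `PageContext` are assumptions (only the field name `image_xobject_areas`, the `Vote::scanned(0.85)` constructor, and the 0.85 threshold come from the message), and the guard for a zero-area page is an added assumption as well.

```rust
/// Direction of a vote. Only `Scanned` is needed here (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum PageKind {
    Scanned,
}

/// A weighted vote emitted by a signal evaluator (assumed shape).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Vote {
    pub weight: f64,
    pub kind: PageKind,
}

impl Vote {
    pub fn scanned(weight: f64) -> Self {
        Vote { weight, kind: PageKind::Scanned }
    }
}

/// Assumed subset of the page context: page size plus the area of each
/// image XObject, all in the same squared units.
pub struct PageContext {
    pub width: f64,
    pub height: f64,
    pub image_xobject_areas: Vec<f64>,
}

/// Threshold named in the commit message.
const SCANNED_THRESHOLD: f64 = 0.85;

/// Sums the image areas, divides by the page area, clamps to [0.0, 1.0],
/// and votes "scanned" only when the fraction exceeds the threshold.
pub fn image_coverage_fraction(ctx: &PageContext) -> Option<Vote> {
    let page_area = ctx.width * ctx.height;
    // Assumption: a degenerate (zero, negative, or NaN) page area yields no vote.
    if !(page_area > 0.0) {
        return None;
    }
    // A plain sum overestimates coverage when images overlap, so the
    // result is clamped rather than computed as a true geometric union.
    let total: f64 = ctx.image_xobject_areas.iter().sum();
    let fraction = (total / page_area).clamp(0.0, 1.0);
    if fraction > SCANNED_THRESHOLD {
        Some(Vote::scanned(0.85))
    } else {
        None
    }
}
```

With a 100 × 100 page, a single image of area 9000 gives a fraction of 0.9 and produces a scanned vote, while two images of area 2000 and 3000 give 0.5 and produce no vote, matching the acceptance criteria listed in the message.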
go.md	docs(pdftract-1g87): create mdBook scaffolding for user documentation	2026-05-18 00:38:51 -04:00
javascript.md	docs(pdftract-1g87): create mdBook scaffolding for user documentation	2026-05-18 00:38:51 -04:00
python.md	docs(pdftract-145s8): fix broken MCP cross-references in Python SDK docs	2026-05-31 23:34:41 -04:00
README.md	feat(pdftract-2b7ff): implement image_coverage_fraction signal evaluator	2026-05-31 23:42:38 -04:00
rust.md	feat(pdftract-2b7ff): implement image_coverage_fraction signal evaluator	2026-05-31 23:42:38 -04:00

README.md

SDK Quickstarts

Getting-started guides for using pdftract from Rust, Python, JavaScript/TypeScript, and Go. Each SDK implements the same nine-method contract: extract, extract_text, extract_markdown, extract_stream, search, get_metadata, hash, classify, and verify_receipt.

Available SDKs

Rust — The pdftract-core crate with native zero-copy PDF processing
Python — Native Python bindings with PyO3, plus subprocess fallback
JavaScript/TypeScript — npm package with Node.js and browser support
Go — Go module with native bindings

Choosing an SDK

Rust — Best for performance-critical applications and CLI tools
Python — Best for data science, ML pipelines, and scripting
JavaScript — Best for web applications and serverless functions
Go — Best for microservices and cloud-native applications

All SDKs support:

Remote PDFs via HTTP/HTTPS URLs
Encrypted PDFs with password
OCR for scanned documents (with feature flag)
Streaming extraction for large documents
Cryptographic receipt verification
