docs(pdftract-1e5ud): add SDK conformance test documentation

Add documentation for the SDK conformance test suite in CONTRIBUTING.md
and crates/pdftract-core/README.md, including:
- How to run the conformance tests
- All 9 SDK contract methods covered
- Feature-gated test behavior
- How to add new test cases

Signed-off-by: jedarden <github@jedarden.com>
jedarden 2026-05-31 23:47:20 -04:00
parent c263189361
commit 46632a3c6c
2 changed files with 67 additions and 0 deletions

@@ -102,6 +102,44 @@ cargo test --workspace --features default -- --nocapture
cargo test --workspace --features default test_name
```
### SDK Conformance Tests
pdftract includes a shared SDK conformance suite that validates the public API contract across all SDK implementations (Python, Node, Go, Java, .NET, and Rust). The Rust SDK conformance tests run directly against `pdftract-core` to ensure the library's public API satisfies the documented SDK contract.
```bash
# Run the conformance suite
cargo test -p pdftract-core --test conformance
# Run with specific features
cargo test -p pdftract-core --test conformance --features ocr,profiles,remote,receipts
```
The conformance suite is defined in `tests/sdk-conformance/cases.json` and covers all 9 SDK contract methods:
- `extract` — Full extraction with structured output
- `extract_text` — Plain text extraction
- `extract_markdown` — Markdown-formatted extraction
- `extract_stream` — Streaming NDJSON extraction
- `search` — Pattern search (literal and regex)
- `get_metadata` — PDF metadata extraction
- `hash` — Content fingerprinting (SHA-256)
- `classify` — Document classification
- `verify_receipt` — Receipt verification
Each test case includes:
- **fixture** — Input PDF path or URL
- **method** — Which SDK method to invoke
- **options** — Method-specific options (OCR, password, etc.)
- **expected** — Expected results for the invoked method
- **tolerances** — Per-field tolerances applied when comparing numeric values against **expected**
- **feature** — Required feature flag (for conditional compilation)
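The fields above could be modeled in Rust roughly as follows. This is a minimal sketch, not the test rig's actual code: the `Method` enum, the `ConformanceCase` struct, and the `tolerance_for` helper are hypothetical names, the flat numeric `expected` map is a simplification of whatever shape `cases.json` really uses, and treating a missing tolerance as an exact match is an assumption.

```rust
use std::collections::BTreeMap;

/// The nine contract methods, using the names listed in this document.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Method {
    Extract,
    ExtractText,
    ExtractMarkdown,
    ExtractStream,
    Search,
    GetMetadata,
    Hash,
    Classify,
    VerifyReceipt,
}

impl Method {
    /// Map a case's `method` string to a variant; unknown names give `None`.
    pub fn parse(name: &str) -> Option<Method> {
        Some(match name {
            "extract" => Method::Extract,
            "extract_text" => Method::ExtractText,
            "extract_markdown" => Method::ExtractMarkdown,
            "extract_stream" => Method::ExtractStream,
            "search" => Method::Search,
            "get_metadata" => Method::GetMetadata,
            "hash" => Method::Hash,
            "classify" => Method::Classify,
            "verify_receipt" => Method::VerifyReceipt,
            _ => return None,
        })
    }
}

/// Hypothetical in-memory form of one conformance case.
/// `expected` is simplified to numeric fields only for illustration.
#[derive(Debug, Clone)]
pub struct ConformanceCase {
    pub fixture: String,
    pub method: Method,
    pub options: BTreeMap<String, String>,
    pub expected: BTreeMap<String, f64>,
    pub tolerances: BTreeMap<String, f64>,
    pub feature: Option<String>,
}

impl ConformanceCase {
    /// Tolerance for a field. Assumed default: fields without an entry
    /// must match exactly (tolerance 0.0).
    pub fn tolerance_for(&self, field: &str) -> f64 {
        self.tolerances.get(field).copied().unwrap_or(0.0)
    }
}
```

Parsing the method name into an enum up front means a typo in a case's `method` field surfaces as an explicit `None` instead of a silently skipped case.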
Feature-gated tests skip automatically when the corresponding feature is not enabled at compile time:
- `ocr` — OCR-based extraction
- `decrypt` — Password-protected PDFs
- `profiles` — Document classification
- `receipts` — Receipt verification
- `remote` — URL-based remote fetch
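The skip rule can be sketched as a small decision function. This is an illustration under assumptions, not the rig's actual code: `Decision` and `decide` are hypothetical names, and the enabled-feature list is passed in as a parameter so the sketch stays self-contained. In the real crate that list would more likely be derived from `cfg!(feature = "...")` checks.

```rust
/// Feature flags named in the list above.
pub const KNOWN_FEATURES: [&str; 5] = ["ocr", "decrypt", "profiles", "receipts", "remote"];

/// Outcome of the gating check for one case.
#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    Run,
    /// Carries a human-readable reason for the skip.
    Skip(String),
}

/// Decide whether a case should run, given the feature it requires (if any)
/// and the features that were enabled for this build.
pub fn decide(required: Option<&str>, enabled: &[&str]) -> Decision {
    match required {
        // Cases without a `feature` field always run.
        None => Decision::Run,
        Some(f) if enabled.contains(&f) => Decision::Run,
        Some(f) => Decision::Skip(format!("feature `{}` not enabled", f)),
    }
}
```

Returning a reason with the skip makes it easy to print which cases were not exercised in a given build, so a skipped case is not mistaken for a passing one.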
## Minimum Supported Rust Version (MSRV)
The **Minimum Supported Rust Version (MSRV)** for pdftract is **1.78**. This is the oldest Rust version that can successfully build the project. The MSRV is declared in `Cargo.toml` via the `rust-version` field and enforced in CI.

@@ -26,6 +26,35 @@ The tradeoff—occasional merge conflicts when PRs update overlapping dependenci
- `extract`: Text extraction with provenance (bounding boxes, confidence scores)
- `ocr`: Tesseract integration for raster pages
## Testing
### SDK Conformance Tests
The `conformance` integration test validates that `pdftract-core`'s public API satisfies the SDK contract shared across all language implementations. The test rig runs shared conformance cases from `tests/sdk-conformance/cases.json` and verifies correct behavior for all 9 SDK contract methods.
```bash
# Run the conformance suite
cargo test --test conformance
# Run with specific features
cargo test --test conformance --features ocr,profiles,remote,receipts
```
The conformance suite covers:
- `extract` — Full extraction with structured Document output
- `extract_text` — Plain text extraction
- `extract_markdown` — Markdown-formatted extraction with tables and headings
- `extract_stream` — Streaming NDJSON extraction for large documents
- `search` — Pattern search with regex and case-insensitive options
- `get_metadata` — PDF metadata (page count, title, author, creator)
- `hash` — Content fingerprinting (SHA-256) with fast hash variant
- `classify` — Document classification with category and confidence
- `verify_receipt` — Receipt verification against signed metadata
Each test case compares actual results against expected values, applying numeric tolerances to bounding boxes and confidence scores. Feature-gated tests (OCR, decryption, classification, receipts, remote) skip automatically when the corresponding feature is not enabled at compile time.
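A tolerance comparison for bounding boxes might look like the sketch below. The `BBox` struct, its field names, and the use of a single absolute tolerance for all four coordinates are assumptions made for illustration. The actual provenance types in `pdftract-core` and the comparison rule used by the rig may differ.

```rust
/// Hypothetical bounding box in page coordinates.
#[derive(Debug, Clone, Copy)]
pub struct BBox {
    pub x0: f64,
    pub y0: f64,
    pub x1: f64,
    pub y1: f64,
}

/// Absolute-difference check. A NaN on either side never matches,
/// because any comparison involving NaN is false.
pub fn within(actual: f64, expected: f64, tol: f64) -> bool {
    (actual - expected).abs() <= tol
}

/// A box matches when every coordinate is within the same tolerance.
pub fn bbox_matches(actual: &BBox, expected: &BBox, tol: f64) -> bool {
    within(actual.x0, expected.x0, tol)
        && within(actual.y0, expected.y0, tol)
        && within(actual.x1, expected.x1, tol)
        && within(actual.y1, expected.y1, tol)
}
```

An absolute tolerance suits coordinates and confidence scores, whose ranges are bounded. A relative tolerance would be a reasonable alternative for values that vary over several orders of magnitude.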
See `CONTRIBUTING.md` for more details on the conformance suite and adding new test cases.
## Usage
```rust