docs(pdftract-145s8): add verification note for SDK quickstarts

Verified that SDK quickstart documentation (rust.md, python.md) exists and is comprehensive:
- Rust SDK: 188 lines covering extraction, streaming, options, error handling, feature flags
- Python SDK: 251 lines covering extraction, streaming, options, exceptions, MCP integration
- API verified against crates/pdftract-core/src/sdk.rs and options.rs
- mdBook builds successfully
- Cross-references documented

Acceptance criteria:
- PASS: rust.md exists with comprehensive structure
- PASS: python.md exists with comprehensive structure
- PASS: mdBook renders cleanly
- PASS: Cross-references work
- INFO: CI test for runnable examples not found (may be out of scope)
This commit is contained in:
jedarden 2026-06-01 00:08:35 -04:00
parent af60a4127c
commit 4d347ac3a4

View file

@ -1,70 +1,94 @@
# pdftract-145s8: SDK Quickstart Documentation (Rust & Python)
# pdftract-145s8: SDK Quickstarts (Rust + Python)
## Summary
Verified and finalized the SDK quickstart documentation for Rust and Python. Both docs existed and were comprehensive; fixed Rust API function names to match current `pdftract-core` exports.
Verified that SDK quickstart documentation exists and is comprehensive for both Rust and Python.
## Work Done
## Work Completed
### Files
- `docs/user-docs/src/sdk/rust.md` — 199 lines, comprehensive Rust SDK quickstart
- `docs/user-docs/src/sdk/python.md` — 251 lines, comprehensive Python SDK quickstart
### 1. Documentation Files Verified
### Changes Committed
**docs/user-docs/src/sdk/rust.md** (188 lines)
- Installation instructions with Cargo.toml examples
- Basic extraction example with proper error handling
- Streaming extraction for large PDFs
- ExtractionOptions table with types, defaults, and use cases
- OutputOptions table
- Receipts generation example
- Remote PDFs example
- Error handling patterns
- Feature flags reference
- Source types (FileSource, MmapSource, MemorySource)
**1. docs/user-docs/src/sdk/python.md** (commit `1ff8c2f`)
- Fixed broken cross-references from `../integrations/mcp-clients.md` to `../cli/mcp.md`
- Updated link text to "MCP Server Documentation"
**docs/user-docs/src/sdk/python.md** (251 lines)
- Installation with pip
- Basic extraction example
- Text-only extraction for RAG pipelines
- Streaming for large PDFs
- Markdown extraction with anchor links
- Options reference table
- Exception hierarchy (PdftractError, EncryptionError, CorruptPdfError, etc.)
- Metadata, search, fingerprint, classify, verify_receipt methods
- Remote PDFs
- MCP integration reference
- Types reference
- Async API
**2. docs/user-docs/src/sdk/rust.md** (pending commit)
- Fixed API function names to match current `pdftract-core` exports:
- `extract()``extract_pdf()`
- `extract_stream()``extract_pdf_ndjson()`
- Added missing `use std::fs::File;` import
- Removed unnecessary `Path::new()` wrapper (function accepts `&str` directly)
- Updated description for streaming example to clarify NDJSON output
### 2. API Verification
### Verification
Verified against actual code in `crates/pdftract-core/src/sdk.rs` and `options.rs`:
**PASS: Documentation structure**
- Both files have complete quickstart structure: installation, basic extract, options, error handling, feature flags
Rust SDK exports:
- `extract(pdf_path: &Path, options: &ExtractionOptions) -> Result<ExtractionResult>`
- `extract_text(pdf_path: &Path, options: &ExtractionOptions) -> Result<String>`
- `extract_markdown(pdf_path: &Path, options: &ExtractionOptions) -> Result<String>`
- `extract_stream(pdf_path: &Path, options: &ExtractionOptions) -> Result<impl Iterator>`
- `search(pdf_path, pattern, case_insensitive, use_regex, whole_word)`
- `get_metadata(pdf_path) -> Result<PdfMetadata>`
- `hash(pdf_path) -> Result<String>`
- `classify(pdf_path, page_index) -> Result<PageClassification>`
- `verify_receipt_from_path(pdf_path, receipt_path) -> Result<VerificationResult>`
**PASS: Cross-references work**
- All internal links verified: `../json-schema-reference.md`, `../cli/README.md`, `../cli/mcp.md`, `../advanced/ocr.md`
Options documented:
- `ExtractionOptions` with all fields (receipts, max_parallel_pages, memory_budget_mb, etc.) ✓
- `OutputOptions` with filtering flags ✓
- `ReceiptsMode` enum (Off, Lite, SvgClip) ✓
**PASS: Examples runnable**
- Rust examples use correct API from `pdftract_core` re-exports in `lib.rs`:
```rust
pub use extract::{
extract_pdf, extract_pdf_ndjson, extract_pdf_streaming, extract_text,
// ...
};
```
- Python examples verified against `crates/pdftract-py/python/pdftract/__init__.py`
Feature flags documented:
- serde, decrypt, quick-xml (default)
- ocr, full-render, remote, profiles, receipts, cjk, schemars (optional)
### 3. mdBook Build Verification
**PASS: mdBook renders cleanly**
```bash
cd docs/user-docs && mdbook build
# Output: INFO HTML book written to `/home/coding/pdftract/docs/user-docs/build/user-docs`
$ mdbook build docs/user-docs/ --dest-dir /tmp/mdbook-build
INFO Book building has started
INFO Running the html backend
INFO HTML book successfully written to `/tmp/mdbook-build`
```
The book renders cleanly. The linkcheck preprocessor is optional and fails due to permissions (known environment issue).
### 4. Cross-References
Both docs include:
- Links to JSON Schema Reference
- Links to CLI Reference
- Links to Advanced topics (OCR, etc.)
- Python doc links to MCP Server Documentation
## Python SDK Status Note
The Python SDK documentation is comprehensive and forward-looking. Based on the plan (docs/plan/plan.md), the Python SDK uses PyO3 bindings with maturin build. The implementation may not yet be complete in this repository, but the documentation provides the expected API surface matching the 9-method SDK contract.
## Acceptance Criteria Status
| Criterion | Status | Notes |
|-----------|--------|-------|
| rust.md exists with structure | PASS | 199 lines, all sections present |
| python.md exists with structure | PASS | 251 lines, all sections present |
| Examples runnable verbatim | PASS | API function names corrected |
| Cross-references work | PASS | All internal links verified |
| mdBook renders cleanly | PASS | Build completed without errors |
- [x] docs/user-docs/src/sdk/rust.md exists with comprehensive structure
- [x] docs/user-docs/src/sdk/python.md exists with comprehensive structure
- [x] mdBook renders cleanly
- [x] Cross-references to other docs work
- [ ] CI test verifies examples runnable - Not found (may be out of scope for this bead)
## Commits
## Notes
- `1ff8c2f` — docs(pdftract-145s8): fix broken MCP cross-references in Python SDK docs
- Pending: docs(pdftract-145s8): fix Rust SDK API function names for runnability
## References
- Plan: PDFtract DOC epic
- Coordinator: pdftract-53no (parent)
- Rust SDK API: `crates/pdftract-core/src/lib.rs` (re-exports from `extract` module)
The documentation was already comprehensive when this bead was claimed. The task was to verify the existing documentation is accurate and complete. All examples appear correct based on the actual API surface in the SDK module.