pdftract/notes/pdftract-5gld.md
2026-06-08 20:00:57 -04:00

pdftract-5gld: README + rustdoc verification

Summary

Task completed. README.md already displayed the KU-12 platform caveat prominently. Corrected the MSRV from 1.81.0 to 1.78 to match the workspace Cargo.toml, and added descriptive link text to the Documentation section.

Changes Made

README.md

  • MSRV correction: Changed from 1.81.0 to 1.78 to match workspace rust-version = "1.78" in Cargo.toml
  • Documentation section enhancement: Added descriptive text to each link, for example:
    • "Comprehensive user guide at pdftract.com"
    • "Rust API documentation"
    • Equivalent descriptions for the remaining links
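
The MSRV drift fixed above could be caught mechanically. The following is a minimal sketch, not part of the repository: the helper names manifest_msrv and readme_matches_msrv are hypothetical, the manifest is scanned line by line rather than parsed as TOML, and the README is assumed to state the version on a line that contains the word "MSRV".

```rust
/// Extracts the value of `rust-version = "..."` from Cargo.toml text.
/// Line-based sketch: it ignores trailing comments and inherited
/// (`rust-version.workspace = true`) forms; a real check would use a TOML parser.
pub fn manifest_msrv(cargo_toml: &str) -> Option<&str> {
    cargo_toml.lines().find_map(|line| {
        let (key, value) = line.split_once('=')?;
        if key.trim() != "rust-version" {
            return None;
        }
        Some(value.trim().trim_matches('"'))
    })
}

/// Returns true when some README line mentions "MSRV" together with the
/// version declared in the manifest (assumed README layout, substring match).
pub fn readme_matches_msrv(readme: &str, cargo_toml: &str) -> bool {
    match manifest_msrv(cargo_toml) {
        Some(msrv) => readme
            .lines()
            .any(|line| line.contains("MSRV") && line.contains(msrv)),
        None => false,
    }
}
```

With the values from this task, a README line stating 1.81.0 fails the check against rust-version = "1.78", and the corrected line passes.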

Verification Results

README Sections (PASS)

  • Title + one-line description: "A PDF text extraction library that gets the hard parts right."
  • Status badges: crates.io, docs.rs, CI Status (Argo Workflows), License
  • Platform support table with KU-12 caveat (verbatim): "Linux is fully CI-tested; macOS and Windows are build-tested and manually smoke-tested per release"
  • Installation: cargo, pip, Docker, Homebrew
  • MSRV: 1.78 (corrected)
  • Quickstart: Rust, Python, CLI examples
  • Documentation links: user-docs, extraction-output-schema.md, sdk-architecture.md, manual-platform-smoke.md, Releases, crates.io
  • License: MIT OR Apache-2.0

Cargo Documentation (PASS)

  • cargo doc --lib --no-deps -p pdftract-core: Builds successfully
  • cargo test --doc -p pdftract-core: 135 passed, 0 failed, 69 ignored
  • #![deny(missing_docs)] enforced in lib.rs: No warnings
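
The pass criterion for the doc-test run can be stated as code. This is a hedged sketch, not part of the repository: it assumes the counts above were read from the standard libtest summary line, which has the form "test result: ok. N passed; N failed; N ignored; ...", and the names DocTestSummary, parse_summary, and is_green are hypothetical.

```rust
/// Counts taken from one libtest summary line.
#[derive(Debug, PartialEq, Eq)]
pub struct DocTestSummary {
    pub passed: u32,
    pub failed: u32,
    pub ignored: u32,
}

/// Parses a line such as
/// "test result: ok. 135 passed; 0 failed; 69 ignored; 0 measured; 0 filtered out".
/// Returns None for any line that is not a summary line.
pub fn parse_summary(line: &str) -> Option<DocTestSummary> {
    let rest = line.trim().strip_prefix("test result:")?;
    let mut summary = DocTestSummary { passed: 0, failed: 0, ignored: 0 };
    for field in rest.split(';') {
        // Read each field from the right: the label comes last, the count before it.
        let mut words = field.split_whitespace().rev();
        let (Some(label), Some(count)) = (words.next(), words.next()) else {
            continue;
        };
        let slot = match label {
            "passed" => &mut summary.passed,
            "failed" => &mut summary.failed,
            "ignored" => &mut summary.ignored,
            _ => continue, // "measured", "filtered out", timing, etc.
        };
        *slot = count.parse().ok()?;
    }
    Some(summary)
}

/// "Green" here means at least one example ran and none failed;
/// ignored examples do not count against the run.
pub fn is_green(summary: &DocTestSummary) -> bool {
    summary.failed == 0 && summary.passed > 0
}
```

Under this definition the recorded run is green, although the 69 ignored examples were not executed.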

rustdoc Coverage (PASS)

The crate-level lib.rs has comprehensive documentation with 4 complete worked examples:

  1. Basic Text Extraction (extract_pdf)
  2. JSON Output with Schema (extract_pdf_ndjson)
  3. Streaming Extraction for Large Files (extract_pdf_streaming)
  4. With OCR for Scanned PDFs (feature-gated example)

Key public API items with examples:

  • extract_pdf, extract_pdf_ndjson, extract_pdf_streaming (lib.rs)
  • ExtractionOptions, OutputOptions, ReceiptsMode (options.rs)
  • SpanJson, BlockJson, CellJson, TableJson (schema/mod.rs)
  • Anchor, parse_anchors (markdown.rs)
  • CssHexColor, Span (span/mod.rs)
  • MarkdownOptions, page_to_markdown, span_to_markdown (markdown.rs)

Acceptance Criteria Status

  • README.md exists at repo root with all required sections
  • KU-12 caveat appears verbatim in README near the top (line 20)
  • cargo doc --no-deps builds successfully for pdftract-core
  • cargo test --doc green: every rustdoc example that is not marked ignored passes (135 passed, 0 failed, 69 ignored)
  • README links to manual-platform-smoke.md, sdk-architecture.md, extraction-output-schema.md, pdftract.com
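
The README criteria above lend themselves to an automated check. A minimal sketch follows, assuming plain substring matching is sufficient; the constant and function names are hypothetical and no such check is confirmed to exist in the repository.

```rust
/// KU-12 caveat text, as quoted verbatim in this note.
pub const KU12_CAVEAT: &str = "Linux is fully CI-tested; macOS and Windows are build-tested and manually smoke-tested per release";

/// Link targets the acceptance criteria require the README to mention.
pub const REQUIRED_LINKS: [&str; 4] = [
    "manual-platform-smoke.md",
    "sdk-architecture.md",
    "extraction-output-schema.md",
    "pdftract.com",
];

/// 1-based number of the first README line containing the caveat verbatim.
pub fn caveat_line(readme: &str) -> Option<usize> {
    readme
        .lines()
        .position(|line| line.contains(KU12_CAVEAT))
        .map(|index| index + 1)
}

/// Required link targets that do not appear anywhere in the README.
pub fn missing_links(readme: &str) -> Vec<&'static str> {
    REQUIRED_LINKS
        .iter()
        .copied()
        .filter(|target| !readme.contains(*target))
        .collect()
}
```

A caller could assert that caveat_line returns a small line number (the note records line 20) and that missing_links returns an empty list.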

Files Modified

  • README.md: MSRV correction + enhanced documentation links

Commit

  • Commit: 56d7c1b3 - docs(pdftract-5gld): update README with MSRV and enhanced documentation links
  • Pushed to: forgejo main