Installation
pdftract is distributed as a native binary, a Python package, and a Docker image. Choose the installation method that matches your workflow.
Install via Cargo
cargo install pdftract
This installs the pdftract binary in ~/.cargo/bin/. Make sure ~/.cargo/bin is in your PATH.
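To confirm the PATH requirement above, a small script can compare the Cargo bin directory against each PATH entry. This is a minimal sketch; the helper name `dir_on_path` is hypothetical and not part of pdftract.

```python
import os
from pathlib import Path


def dir_on_path(directory, path_value=None):
    """Return True if `directory` is one of the entries in a PATH-style string.

    Hypothetical helper for illustration; it is not shipped with pdftract.
    """
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    target = Path(directory).expanduser()
    for entry in path_value.split(os.pathsep):
        # Skip empty entries (e.g. from a trailing separator) and compare
        # normalized paths so a trailing slash does not matter.
        if entry and Path(entry).expanduser() == target:
            return True
    return False


if __name__ == "__main__":
    cargo_bin = Path.home() / ".cargo" / "bin"
    if dir_on_path(cargo_bin):
        print(f"{cargo_bin} is on PATH")
    else:
        print(f"{cargo_bin} is missing from PATH; add it in your shell profile")
```

The comparison is purely textual after `~` expansion, so symlinked directories are not resolved.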
Pre-built Binaries
Pre-built binaries are available from GitHub Releases. Download the archive for your platform, extract it, and place the binary in a directory on your PATH.
Cargo Binstall
For faster installation without compiling from source:
cargo binstall pdftract
This downloads a pre-built binary from the GitHub Release instead of compiling locally.
Install via pip
pdftract is distributed on PyPI as a native Python extension with PyO3 bindings.
pip install pdftract
The Python package includes the same extraction engine as the CLI, accessible via a Python API. See Python SDK for usage.
Platform Wheels
Wheels are available for:
- Linux x86_64 (manylinux2014, musllinux)
- macOS x86_64 and arm64
- Windows x86_64
If no wheel is available for your platform, pip falls back to building from source, which requires a Rust toolchain.
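To predict whether pip is likely to find a wheel or fall back to a source build, you can compare the current interpreter's platform against the wheel list above. The sketch below is an approximation under stated assumptions: the table is copied from this page, the alias map reflects values `platform.machine()` commonly reports, and `has_prebuilt_wheel` is a hypothetical helper. pip's real decision is based on wheel compatibility tags, which this does not inspect.

```python
import platform

# Wheel targets copied from the list on this page, keyed by platform.system().
PREBUILT_WHEELS = {
    "Linux": {"x86_64"},
    "Darwin": {"x86_64", "arm64"},
    "Windows": {"x86_64"},
}

# platform.machine() reports the same hardware under different names
# depending on the OS (e.g. "AMD64" on Windows, "aarch64" on Linux).
ARCH_ALIASES = {"amd64": "x86_64", "aarch64": "arm64"}


def has_prebuilt_wheel(system=None, machine=None):
    """Return True if the page lists a wheel for this OS and architecture."""
    system = system or platform.system()
    machine = (machine or platform.machine()).lower()
    arch = ARCH_ALIASES.get(machine, machine)
    return arch in PREBUILT_WHEELS.get(system, set())


if __name__ == "__main__":
    if has_prebuilt_wheel():
        print("A pre-built wheel is listed for this platform.")
    else:
        print("No wheel listed; pip will need a Rust toolchain to build from source.")
```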
Install via Homebrew
Note: The Homebrew formula is deferred to v1.1+. Until then, use cargo install pdftract or the Docker image.
See the Non-Goals section in the project plan for the rationale.
Install via Docker
Docker images are available on GitHub Container Registry:
docker pull ghcr.io/jedarden/pdftract:latest
docker run --rm -v "$(pwd)":/work ghcr.io/jedarden/pdftract:latest extract /work/document.pdf
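The run command mounts a host directory at /work inside the container and passes the container-side path to extract. When scripting this, building the argument list explicitly avoids shell quoting problems with directories that contain spaces. The following is a minimal sketch: the image name, the /work mount point, and the extract subcommand come from the command above, while `docker_extract_command` is a hypothetical helper.

```python
import os
import shlex
from pathlib import Path

IMAGE = "ghcr.io/jedarden/pdftract"


def docker_extract_command(pdf_path, tag="latest"):
    """Build the docker argv that extracts one PDF via a /work bind mount."""
    pdf = Path(os.path.abspath(pdf_path))
    return [
        "docker", "run", "--rm",
        # Mount the PDF's directory so the container sees it under /work.
        "-v", f"{pdf.parent}:/work",
        f"{IMAGE}:{tag}",
        "extract", f"/work/{pdf.name}",
    ]


if __name__ == "__main__":
    # Print a copy-pasteable command; pass the list to subprocess.run to execute it.
    print(shlex.join(docker_extract_command("document.pdf")))
```

Passing the list to `subprocess.run` without `shell=True` keeps each argument intact regardless of spaces in the path.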
Image Variants
| Tag | Description |
|---|---|
| latest | Default features (vector extraction, basic OCR) |
| ocr | Includes Tesseract for full OCR support |
| full | All features including PDFium for rasterization |
Multi-arch manifests support amd64 and arm64 platforms.
Platform Support
Supported Platforms
| Platform | CI Status | Notes |
|---|---|---|
| Linux x86_64 (glibc) | Fully CI-tested | Primary development platform |
| Linux x86_64 (musl) | Fully CI-tested | Alpine-compatible |
| Linux arm64 (glibc) | Fully CI-tested | ARM64 servers (e.g., Graviton) |
| Linux arm64 (musl) | Fully CI-tested | Alpine ARM64 |
| macOS x86_64 | Build-tested | See caveat below |
| macOS arm64 | Build-tested | See caveat below |
| Windows x86_64 | Build-tested | See caveat below |
Cross-Platform Test Limitation (KU-12)
Linux is fully CI-tested; macOS and Windows are build-tested and manually smoke-tested per release.
Per project architecture decision ADR-009, the CI pipeline runs on Linux-only infrastructure (iad-ci). macOS and Windows binaries are built via cross-compilation but are never executed in automated CI. This is acknowledged as Known Unknown KU-12 with the following mitigation:
- A manual smoke-test runbook is executed by the release lead before each milestone against at least one physical macOS machine and one Windows VM
- User bug reports for platform-specific issues are acknowledged within 48 hours and addressed in the next patch release
- No claim of "tested on macOS/Windows" appears in CI status badges
If you encounter a platform-specific issue on macOS or Windows, please file a bug report. The project is committed to fixing platform bugs promptly.
Minimum Rust Version
If building from source, pdftract requires Rust 1.78 or later. The MSRV is pinned in Cargo.toml and tested on every PR.
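Before starting a source build, you can check the local toolchain against the 1.78 minimum by parsing the output of `rustc --version`. This is a sketch that assumes the usual `rustc MAJOR.MINOR.PATCH (...)` output format; `meets_msrv` is a hypothetical helper.

```python
import re
import shutil
import subprocess

MSRV = (1, 78)  # Minimum supported Rust version stated on this page.


def meets_msrv(version_output, minimum=MSRV):
    """Return True if `rustc --version` output reports at least `minimum`."""
    match = re.search(r"rustc (\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError(f"unrecognized rustc version output: {version_output!r}")
    found = (int(match.group(1)), int(match.group(2)))
    # Tuples compare element by element, so (1, 80) >= (1, 78) holds.
    return found >= minimum


if __name__ == "__main__":
    rustc = shutil.which("rustc")
    if rustc is None:
        print("rustc not found; install a Rust toolchain first")
    else:
        output = subprocess.run(
            [rustc, "--version"], capture_output=True, text=True, check=False
        ).stdout
        print("toolchain OK" if meets_msrv(output) else "Rust 1.78 or later is required")
```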
Verifying Installation
Run the following command to verify your installation:
pdftract --version
You should see output like:
pdftract 0.1.0
For the Python package:
python -c "import pdftract; print(pdftract.__version__)"
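In a provisioning or CI script, the same check can be automated by locating the binary and parsing its version line. The sketch below assumes the `pdftract <version>` output format shown above; `parse_version` and `installed_version` are hypothetical helpers, not part of the pdftract API.

```python
import shutil
import subprocess


def parse_version(output, name="pdftract"):
    """Extract the version from a line such as 'pdftract 0.1.0', else None."""
    parts = output.strip().split()
    if len(parts) >= 2 and parts[0] == name:
        return parts[1]
    return None


def installed_version(binary="pdftract"):
    """Return the version reported by the CLI, or None if it is not on PATH."""
    path = shutil.which(binary)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    return parse_version(result.stdout)


if __name__ == "__main__":
    version = installed_version()
    print(f"pdftract {version}" if version else "pdftract not found or version unreadable")
```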
Environment Health Check
After installation, verify your environment is properly configured for pdftract:
pdftract doctor
This validates that all OS-level dependencies (Tesseract, leptonica, libtiff, etc.) are installed and correctly configured. See the Operations Runbook for detailed troubleshooting of each check.
Next Steps
Once installed, proceed to the Quickstart for a five-minute walkthrough of pdftract's core features.