pdftract/docs/user-docs/src/installation.md
jedarden a34f9c18d0 docs(pdftract-1g87): create mdBook scaffolding for user documentation
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-18 00:38:51 -04:00

Installation

pdftract is distributed as a native binary, a Python package, and a Docker image. Choose the installation method that matches your workflow.

Install via Cargo

```sh
cargo install pdftract
```

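This compiles pdftract from source, which requires a Rust toolchain (see Minimum Rust Version). It then installs the binary in ~/.cargo/bin/, or in $CARGO_HOME/bin/ if CARGO_HOME is set. Make sure that directory is in your PATH.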
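You can check the PATH requirement from a script. The sketch below works out the directory that cargo install writes to ($CARGO_HOME/bin, which defaults to ~/.cargo/bin) and tests whether it appears in PATH. The helper names are hypothetical. Install-root overrides such as --root or CARGO_INSTALL_ROOT are deliberately ignored.

```python
import os
from pathlib import Path


def cargo_bin_dir(env: dict) -> Path:
    """Directory where `cargo install` places binaries by default.

    Cargo uses $CARGO_HOME/bin, with CARGO_HOME defaulting to ~/.cargo.
    Overrides such as --root or CARGO_INSTALL_ROOT are not considered here.
    """
    cargo_home = env.get("CARGO_HOME")
    base = Path(cargo_home) if cargo_home else Path.home() / ".cargo"
    return base / "bin"


def cargo_bin_on_path(env: dict) -> bool:
    """True if the cargo bin directory appears as an entry in PATH."""
    target = cargo_bin_dir(env).resolve()
    entries = env.get("PATH", "").split(os.pathsep)
    # Skip empty entries; expand a literal "~" in case PATH contains one.
    return any(entry and Path(entry).expanduser().resolve() == target for entry in entries)


if __name__ == "__main__":
    environment = dict(os.environ)
    bin_dir = cargo_bin_dir(environment)
    if cargo_bin_on_path(environment):
        print(f"{bin_dir} is on PATH")
    else:
        print(f"Add {bin_dir} to PATH, for example in your shell profile")
```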

Pre-built Binaries

Pre-built binaries are available from GitHub Releases. Download the archive for your platform, extract it, and place the pdftract binary in a directory on your PATH.
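Release archives for Rust command-line tools are often named after the Rust target triple. This page does not document pdftract's actual asset names, so assume that convention only as a guess. The sketch below maps the current machine to the triple for each platform in the Supported Platforms table. The Windows entry assumes an MSVC build, and target_triple is a hypothetical helper.

```python
import platform
from typing import Optional

# Assumed mapping: (OS, CPU, libc) -> Rust target triple for the platforms
# listed under Platform Support. Windows assumes the MSVC ABI.
TRIPLES = {
    ("Linux", "x86_64", "glibc"): "x86_64-unknown-linux-gnu",
    ("Linux", "x86_64", "musl"): "x86_64-unknown-linux-musl",
    ("Linux", "aarch64", "glibc"): "aarch64-unknown-linux-gnu",
    ("Linux", "aarch64", "musl"): "aarch64-unknown-linux-musl",
    ("Darwin", "x86_64", ""): "x86_64-apple-darwin",
    ("Darwin", "aarch64", ""): "aarch64-apple-darwin",
    ("Windows", "x86_64", ""): "x86_64-pc-windows-msvc",
}

# platform.machine() spellings differ by OS ("AMD64" on Windows, "arm64" on macOS).
ARCH_ALIASES = {"amd64": "x86_64", "x86_64": "x86_64", "arm64": "aarch64", "aarch64": "aarch64"}


def target_triple(system: str, machine: str, libc: str = "") -> Optional[str]:
    """Return the assumed target triple, or None for an unsupported platform."""
    arch = ARCH_ALIASES.get(machine.lower())
    if arch is None:
        return None
    if system == "Linux":
        # platform.libc_ver() reports "glibc" on glibc systems and usually an
        # empty string on musl (e.g. Alpine), so treat anything else as musl.
        libc = "glibc" if libc == "glibc" else "musl"
    else:
        libc = ""
    return TRIPLES.get((system, arch, libc))


if __name__ == "__main__":
    libc_name, _ = platform.libc_ver()
    print(target_triple(platform.system(), platform.machine(), libc_name))
```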

Cargo Binstall

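For faster installation without compiling from source, use cargo-binstall. It is a separate Cargo subcommand, so install it first (for example with cargo install cargo-binstall), then run:

```sh
cargo binstall pdftract
```

This downloads a pre-built binary from the GitHub Release instead of compiling locally. If no pre-built binary can be found for your platform, cargo-binstall falls back to building from source with cargo install.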

Install via pip

pdftract is distributed on PyPI as a native Python extension with PyO3 bindings.

```sh
pip install pdftract
```

The Python package includes the same extraction engine as the CLI, accessible via a Python API. See Python SDK for usage.

Platform Wheels

Wheels are available for:

  • Linux x86_64 (manylinux2014, musllinux)
  • macOS x86_64 and arm64
  • Windows x86_64

If no wheel is available for your platform, pip falls back to building from the source distribution. That build requires a Rust toolchain (Rust 1.78 or later; see Minimum Rust Version).
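The sketch below gives a rough idea of whether pip will find a wheel or fall back to a source build. It checks the current OS and CPU against the wheel list above; has_prebuilt_wheel is a hypothetical helper. It ignores the Python version and the glibc/musl minimums encoded in wheel tags, so treat the result as a hint only. `pip debug --verbose` prints the exact tags your interpreter accepts.

```python
import platform

# Wheel platforms from the list above, as (OS, CPU). "Linux" covers both the
# manylinux2014 (glibc) and musllinux wheels.
WHEEL_PLATFORMS = {
    ("Linux", "x86_64"),
    ("Darwin", "x86_64"),
    ("Darwin", "arm64"),
    ("Windows", "x86_64"),
}


def has_prebuilt_wheel(system: str, machine: str) -> bool:
    """True if (OS, CPU) matches one of the listed wheel platforms."""
    m = machine.lower()
    # Normalize platform.machine() spellings: Windows reports "AMD64",
    # and Linux on ARM reports "aarch64" where macOS reports "arm64".
    m = {"amd64": "x86_64", "aarch64": "arm64"}.get(m, m)
    return (system, m) in WHEEL_PLATFORMS


if __name__ == "__main__":
    if has_prebuilt_wheel(platform.system(), platform.machine()):
        print("A pre-built wheel is likely available")
    else:
        print("pip will likely build from source; a Rust toolchain is required")
```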

Install via Homebrew

Note: A Homebrew formula is deferred to v1.1 or later. In the meantime, use cargo install pdftract, a pre-built binary, or the Docker image.

See the Non-Goals section in the project plan for the rationale.

Install via Docker

Docker images are available on GitHub Container Registry:

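```sh
docker pull ghcr.io/jedarden/pdftract:latest
docker run --rm -v "$(pwd)":/work ghcr.io/jedarden/pdftract:latest extract /work/document.pdf
```

The -v flag mounts the current directory at /work inside the container, so document.pdf must be in the directory you run the command from. Quoting "$(pwd)" keeps the command working when the path contains spaces.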
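If you drive the container from a script, you can build the argument list in Python. Passing a list to subprocess avoids shell quoting entirely. The sketch below uses a hypothetical helper, docker_extract_args. The image name, the extract subcommand and the /work mount point are taken from the example above. It only prints the resulting command, so it does not need Docker installed.

```python
import os
import shlex
from pathlib import Path

IMAGE = "ghcr.io/jedarden/pdftract"


def docker_extract_args(pdf: str, workdir: str, tag: str = "latest") -> list:
    """Build the `docker run` argument list for extracting one PDF.

    `pdf` is a path relative to `workdir`, which is mounted at /work.
    """
    host_dir = str(Path(workdir).resolve())
    container_pdf = "/work/" + Path(pdf).as_posix()
    return [
        "docker", "run", "--rm",
        "-v", f"{host_dir}:/work",
        f"{IMAGE}:{tag}",
        "extract", container_pdf,
    ]


if __name__ == "__main__":
    args = docker_extract_args("document.pdf", os.getcwd())
    # To actually run it: subprocess.run(args, check=True)
    print(shlex.join(args))
```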

Image Variants

| Tag | Description |
|---|---|
| latest | Default features (vector extraction, basic OCR) |
| ocr | Includes Tesseract for full OCR support |
| full | All features, including PDFium for rasterization |

The images are published as multi-arch manifests for amd64 and arm64, so docker pull selects the image for your architecture automatically.
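The variant table can be read as a simple decision rule. The sketch below uses a hypothetical helper, choose_image_tag. Based only on the "All features" description, it assumes that full also bundles Tesseract, so check the image documentation before relying on that. To pull an architecture other than your host's, docker pull also accepts --platform, for example --platform linux/arm64.

```python
def choose_image_tag(need_full_ocr: bool = False, need_rasterization: bool = False) -> str:
    """Pick the least feature-rich image variant that covers the stated needs."""
    if need_rasterization:
        return "full"    # only variant listed with PDFium (assumed to include Tesseract too)
    if need_full_ocr:
        return "ocr"     # adds Tesseract for full OCR support
    return "latest"      # default features: vector extraction, basic OCR


if __name__ == "__main__":
    print(f"ghcr.io/jedarden/pdftract:{choose_image_tag(need_full_ocr=True)}")
```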

Platform Support

Supported Platforms

| Platform | CI Status | Notes |
|---|---|---|
| Linux x86_64 (glibc) | Fully CI-tested | Primary development platform |
| Linux x86_64 (musl) | Fully CI-tested | Alpine-compatible |
| Linux arm64 (glibc) | Fully CI-tested | ARM64 servers (e.g., Graviton) |
| Linux arm64 (musl) | Fully CI-tested | Alpine ARM64 |
| macOS x86_64 | Build-tested | See caveat below |
| macOS arm64 | Build-tested | See caveat below |
| Windows x86_64 | Build-tested | See caveat below |

Cross-Platform Test Limitation (KU-12)

Linux is fully CI-tested; macOS and Windows are build-tested and manually smoke-tested per release.

Per project architecture decision ADR-009, the CI pipeline runs on Linux-only infrastructure (iad-ci). macOS and Windows binaries are built via cross-compilation but are never executed in automated CI. This is acknowledged as Known Unknown KU-12 with the following mitigation:

  • A manual smoke-test runbook is executed by the release lead before each milestone against at least one physical macOS machine and one Windows VM
  • User bug reports for platform-specific issues are acknowledged within 48 hours and addressed in the next patch release
  • No claim of "tested on macOS/Windows" appears in CI status badges

If you encounter a platform-specific issue on macOS or Windows, please file a bug report. The project is committed to fixing platform bugs promptly.
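A platform-specific report is easier to triage if it includes the OS version, the CPU architecture, and the output of pdftract --version. The sketch below gathers the host details using only the Python standard library. The chosen fields are a suggestion, not a project requirement, and bug_report_environment is a hypothetical helper.

```python
import platform


def bug_report_environment() -> dict:
    """Collect host details that help triage a platform-specific bug."""
    return {
        "os": platform.system(),          # e.g. "Darwin", "Windows", "Linux"
        "os_release": platform.release(),
        "machine": platform.machine(),    # CPU architecture as reported by the OS
        "platform": platform.platform(),  # human-readable summary string
        "python": platform.python_version(),
    }


if __name__ == "__main__":
    for key, value in bug_report_environment().items():
        print(f"{key}: {value}")
```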

Minimum Rust Version

Building pdftract from source (with cargo install, or when pip falls back to a source build) requires Rust 1.78 or later. The minimum supported Rust version (MSRV) is pinned in Cargo.toml and tested on every PR.
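Before a source build, you can confirm that your toolchain meets the MSRV. The sketch below parses the first line of rustc --version output and compares the major and minor version against 1.78; the helper names are hypothetical. If the check fails, rustup update stable upgrades a rustup-managed toolchain.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple

MSRV = (1, 78)


def parse_rustc_version(output: str) -> Optional[Tuple[int, ...]]:
    """Extract (major, minor, patch) from output like "rustc 1.79.0 (<hash> <date>)"."""
    match = re.match(r"rustc (\d+)\.(\d+)\.(\d+)", output.strip())
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def meets_msrv(output: str, msrv: Tuple[int, int] = MSRV) -> bool:
    """True if the reported rustc version is at least the MSRV."""
    version = parse_rustc_version(output)
    return version is not None and version[:2] >= msrv


if __name__ == "__main__":
    rustc = shutil.which("rustc")
    if rustc is None:
        print("rustc not found; install Rust, for example via rustup")
    else:
        out = subprocess.run([rustc, "--version"], capture_output=True, text=True).stdout
        print("OK" if meets_msrv(out) else f"rustc too old: {out.strip()}")
```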

Verifying Installation

Run the following command to verify your installation:

```sh
pdftract --version
```

You should see output like the following, showing the version of the release you installed:

```text
pdftract 0.1.0
```

For the Python package:

```sh
python -c "import pdftract; print(pdftract.__version__)"
```
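You can also check both installs from a script, such as a CI job. The sketch below runs pdftract --version if the binary is on PATH. It reads the Python package version with importlib.metadata, which works whether or not the module exposes __version__. It assumes the CLI output has the form "pdftract X.Y.Z" shown above, and the helper names are hypothetical.

```python
import re
import shutil
import subprocess
from importlib import metadata
from typing import Optional


def parse_cli_version(output: str) -> Optional[str]:
    """Extract the version from output such as "pdftract 0.1.0"."""
    match = re.fullmatch(r"pdftract (\S+)", output.strip())
    return match.group(1) if match else None


def installed_versions() -> dict:
    """Report CLI and Python package versions; None marks a missing install."""
    cli = None
    exe = shutil.which("pdftract")
    if exe is not None:
        out = subprocess.run([exe, "--version"], capture_output=True, text=True).stdout
        cli = parse_cli_version(out)
    try:
        py = metadata.version("pdftract")  # distribution name used by pip install
    except metadata.PackageNotFoundError:
        py = None
    return {"cli": cli, "python": py}


if __name__ == "__main__":
    print(installed_versions())
```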

Next Steps

Once installed, proceed to the Quickstart for a five-minute walkthrough of pdftract's core features.