pdftract-swift

Swift SDK for pdftract - PDF extraction and analysis for server-side Swift.

Platform Support

Supported: macOS 13+, Linux (server-side use only)

Unsupported: iOS (Apple does not allow spawning subprocesses in App Store apps)

Note for iOS users: Use pdftract serve over HTTP from your iOS client. Run the server with the Swift SDK on a macOS/Linux backend and make HTTP requests from your iOS app.

Installation

Add to your Package.swift:

dependencies: [
    .package(url: "https://github.com/jedarden/pdftract-swift", from: "1.0.0")
]

Usage

Basic extract

import Pdftract

let client = Pdftract()
let doc = try await client.extract(.path("document.pdf"))
print("Pages: \(doc.pages.count)")
print("Title: \(doc.metadata.title ?? "Untitled")")

Extract from URL

let doc = try await client.extract(.url(URL(string: "https://example.com/doc.pdf")!))

Extract with OCR

let options = ExtractOptions(
    ocrLanguage: "eng",
    ocrThreshold: 0.7
)
let doc = try await client.extract(.path("scanned.pdf"), options: options)

Extract text

let text = try await client.extractText(.path("document.pdf"))
print(text)

Extract Markdown

let md = try await client.extractMarkdown(.path("document.pdf"))

Stream extraction (for large PDFs)

for await page in client.extractStream(.path("large.pdf")) {
    print("Page \(page.pageIndex + 1): \(page.blocks.count) blocks")
}

Search

for await match in client.search(.path("document.pdf"), "invoice") {
    print("Found on page \(match.page): \(match.text)")
    print("  Context: ...\(match.context.before)[\(match.text)]\(match.context.after)...")
}

Get metadata

let metadata = try await client.getMetadata(.path("document.pdf"))
print("Pages: \(metadata.pageCount)")
print("Author: \(metadata.author ?? "Unknown")")

Hash fingerprint

let fingerprint = try await client.hash(.path("document.pdf"))
print("SHA-256: \(fingerprint.hash)")
print("BLAKE3: \(fingerprint.fastHash)")

Classify document

let classification = try await client.classify(.path("document.pdf"))
print("Category: \(classification.category)")
print("Confidence: \(classification.confidence)")

Verify receipt

let receipt = Receipt(data: "...")
let valid = try await client.verifyReceipt("/path/to/receipt.pdf", receipt: receipt)
print("Valid: \(valid)")

Binary version compatibility

This SDK requires pdftract 1.0.0. Download from: https://github.com/jedarden/pdftract/releases/tag/v1.0.0

The SDK will search for pdftract on your PATH. To specify a custom binary path:

let client = Pdftract(binaryPath: "/custom/path/to/pdftract")
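
To check which binary a PATH search would pick up before constructing the client, a lookup along the following lines may help. This is a minimal sketch using only Foundation; `findExecutable(named:inSearchPath:)` is a hypothetical helper written for illustration and is not part of the SDK, whose own lookup logic may differ.

```swift
import Foundation

/// Returns the first regular, executable file called `name` found in the
/// directories of `searchPath` (a colon-separated string such as $PATH).
/// Hypothetical helper for illustration; not part of the pdftract SDK.
func findExecutable(named name: String, inSearchPath searchPath: String) -> String? {
    let fileManager = FileManager.default
    for directory in searchPath.split(separator: ":") {
        let candidate = URL(fileURLWithPath: String(directory))
            .appendingPathComponent(name)
            .path
        var isDirectory: ObjCBool = false
        // Skip directories: they can carry the execute bit as well.
        if fileManager.fileExists(atPath: candidate, isDirectory: &isDirectory),
           !isDirectory.boolValue,
           fileManager.isExecutableFile(atPath: candidate) {
            return candidate
        }
    }
    return nil
}

let searchPath = ProcessInfo.processInfo.environment["PATH"] ?? ""
if let path = findExecutable(named: "pdftract", inSearchPath: searchPath) {
    print("pdftract found at \(path)")
} else {
    print("pdftract is not on PATH; consider passing binaryPath explicitly")
}
```

If the lookup returns nil, passing the resolved location through `binaryPath` as shown above avoids depending on the environment of the server process.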

Error handling

All methods are async throws and can throw the following errors:

| Error | Exit Code | Description |
|---|---|---|
| `CorruptPdfError` | 2 | The PDF file is corrupt or invalid |
| `EncryptionError` | 3 | The PDF is encrypted and password is missing/wrong |
| `SourceUnreachableError` | 4 | The source (file or URL) is unreadable |
| `RemoteFetchInterruptedError` | 5 | Network interrupted during remote fetch |
| `TlsError` | 6 | TLS certificate validation failed |
| `ReceiptVerifyError` | 10 | Receipt verification failed |
| `PdftractError` | other | Internal error |

Example:

do {
    let doc = try await client.extract(.path("document.pdf"))
} catch let error as PdftractError {
    print("Error (code \(error.exitCode)): \(error.localizedDescription)")
}
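
For logging or metrics it can be handy to reduce the exit codes in the table above to a small category value. The sketch below mirrors that table in a standalone enum. `ExitCodeCategory` is a hypothetical type invented for this example, not one of the SDK's error types, and treating exit code 0 as success is an assumption based on the usual process convention.

```swift
/// Hypothetical mirror of the exit-code table; the SDK's real error types may differ.
enum ExitCodeCategory: Equatable {
    case corruptPdf              // exit code 2
    case encryption              // exit code 3
    case sourceUnreachable       // exit code 4
    case remoteFetchInterrupted  // exit code 5
    case tls                     // exit code 6
    case receiptVerify           // exit code 10
    case other(Int32)            // any other non-zero code

    /// Returns nil for 0, which is assumed to mean success.
    init?(exitCode: Int32) {
        switch exitCode {
        case 0: return nil
        case 2: self = .corruptPdf
        case 3: self = .encryption
        case 4: self = .sourceUnreachable
        case 5: self = .remoteFetchInterrupted
        case 6: self = .tls
        case 10: self = .receiptVerify
        default: self = .other(exitCode)
        }
    }
}

if let category = ExitCodeCategory(exitCode: 5) {
    print("Category: \(category)")
}
```

In real code the value passed to `init?(exitCode:)` would come from `error.exitCode` in the catch block shown earlier.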

Options

ExtractOptions

let options = ExtractOptions(
    ocrLanguage: "eng",           // ISO 639-3 language code
    ocrThreshold: 0.7,            // OCR confidence threshold (0-1)
    preserveLayout: false,        // Preserve original reading order
    extractImages: false,         // Extract embedded images
    imageFormat: "png",           // Format for images: png, jpg, webp
    minImageSize: 64              // Minimum image dimension
)

SearchOptions

let options = SearchOptions(
    caseInsensitive: true,        // Ignore case
    regex: false,                 // Treat pattern as regex
    wholeWord: false,             // Match whole words only
    maxResults: 100               // Maximum matches
)

BaseOptions / HashOptions

let options = BaseOptions(
    timeout: 60                   // Maximum seconds
)

Troubleshooting

Binary not found

Ensure pdftract is on your PATH. The SDK searches PATH for the executable.

# Verify pdftract is available
pdftract --version

Version mismatch

The SDK refuses to invoke a pdftract binary whose version does not match the one it requires. Install the matching version from the releases page.
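
To diagnose a mismatch by hand, the output of `pdftract --version` can be compared against the required version. The following is a rough sketch only: it assumes the output contains a `major.minor.patch` token (optionally prefixed with `v`), which this README does not confirm, and `parseVersion(from:)` and `isCompatible(versionOutput:)` are hypothetical helpers rather than SDK API. The SDK's own check may be stricter or looser.

```swift
/// Extracts the first "major.minor.patch" triple from `text`.
/// Assumes a token like "1.0.0" or "v1.0.0" appears somewhere in the output.
func parseVersion(from text: String) -> [Int]? {
    for token in text.split(whereSeparator: { $0.isWhitespace }) {
        let trimmed = token.hasPrefix("v") ? token.dropFirst() : token
        let parts = trimmed.split(separator: ".", omittingEmptySubsequences: false)
        let numbers = parts.compactMap { Int($0) }
        // Accept only tokens made of exactly three integer components.
        if parts.count == 3 && numbers.count == 3 {
            return numbers
        }
    }
    return nil
}

/// The version this README states the SDK requires.
let requiredVersion = [1, 0, 0]

/// True when the version found in `versionOutput` equals `requiredVersion`.
func isCompatible(versionOutput: String) -> Bool {
    return parseVersion(from: versionOutput) == requiredVersion
}

// Example with a made-up output string.
print(isCompatible(versionOutput: "pdftract 1.0.0"))
```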

Network failure

For remote URLs, check your network connection and TLS certificate chain.

Conformance

This SDK passes 100% of the pdftract conformance suite. The conformance report for this release is linked in the GitHub Release.

License

MIT License - see LICENSE file for details.