pdftract-swift
Swift SDK for pdftract - PDF extraction and analysis for server-side Swift.
Platform Support
Supported: macOS 13+, Linux (server-side use only)
Unsupported: iOS (Apple does not allow spawning subprocesses in App Store apps)
Note for iOS users: Use pdftract serve over HTTP from your iOS client. Run the server with the Swift SDK on a macOS/Linux backend and make HTTP requests from your iOS app.
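As a rough sketch of the iOS side, the request to such a backend could be built as shown here. The `/extract` route, the JSON body shape, and the `makeExtractRequest` helper are all hypothetical placeholders; this README does not document the HTTP API of pdftract serve, so substitute the real route and payload from the server's own documentation.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLRequest lives here on Linux
#endif

// Hypothetical: the "extract" route and the {"url": ...} body are placeholders,
// not the documented `pdftract serve` API.
func makeExtractRequest(backend: URL, pdfURL: URL) throws -> URLRequest {
    var request = URLRequest(url: backend.appendingPathComponent("extract"))
    request.httpMethod = "POST"
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    // Encode the PDF location as a small JSON object.
    request.httpBody = try JSONSerialization.data(
        withJSONObject: ["url": pdfURL.absoluteString]
    )
    return request
}
```

The resulting request can then be sent with `URLSession.shared.data(for:)` from the app, which keeps all subprocess work on the backend.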
Installation
Add to your Package.swift:
dependencies: [
    .package(url: "https://github.com/jedarden/pdftract-swift", from: "1.0.0")
]
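For context, a complete manifest using this dependency might look like the sketch below. The package and target name `MyServer`, the tools version, and the product name `Pdftract` are assumptions (the product name is inferred from the `import Pdftract` line in the usage examples); adjust them to match your project and the package's actual manifest.

```swift
// swift-tools-version:5.9
import PackageDescription

let package = Package(
    name: "MyServer",  // hypothetical package name
    platforms: [.macOS(.v13)],  // matches the macOS 13+ requirement
    dependencies: [
        .package(url: "https://github.com/jedarden/pdftract-swift", from: "1.0.0")
    ],
    targets: [
        .executableTarget(
            name: "MyServer",  // hypothetical target name
            dependencies: [
                // Assumes the library product is named "Pdftract".
                .product(name: "Pdftract", package: "pdftract-swift")
            ]
        )
    ]
)
```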
Usage
Basic extract
import Pdftract
let client = Pdftract()
let doc = try await client.extract(.path("document.pdf"))
print("Pages: \(doc.pages.count)")
print("Title: \(doc.metadata.title ?? "Untitled")")
Extract from URL
let doc = try await client.extract(.url(URL(string: "https://example.com/doc.pdf")!))
Extract with OCR
let options = ExtractOptions(
    ocrLanguage: "eng",
    ocrThreshold: 0.7
)
let doc = try await client.extract(.path("scanned.pdf"), options: options)
Extract text
let text = try await client.extractText(.path("document.pdf"))
print(text)
Extract Markdown
let md = try await client.extractMarkdown(.path("document.pdf"))
Stream extraction (for large PDFs)
for await page in client.extractStream(.path("large.pdf")) {
    print("Page \(page.pageIndex + 1): \(page.blocks.count) blocks")
}
Search
for await match in client.search(.path("document.pdf"), "invoice") {
    print("Found on page \(match.page): \(match.text)")
    print("  Context: ...\(match.context.before)[\(match.text)]\(match.context.after)...")
}
Get metadata
let metadata = try await client.getMetadata(.path("document.pdf"))
print("Pages: \(metadata.pageCount)")
print("Author: \(metadata.author ?? "Unknown")")
Hash fingerprint
let fingerprint = try await client.hash(.path("document.pdf"))
print("SHA-256: \(fingerprint.hash)")
print("BLAKE3: \(fingerprint.fastHash)")
Classify document
let classification = try await client.classify(.path("document.pdf"))
print("Category: \(classification.category)")
print("Confidence: \(classification.confidence)")
Verify receipt
let receipt = Receipt(data: "...")
let valid = try await client.verifyReceipt("/path/to/receipt.pdf", receipt: receipt)
print("Valid: \(valid)")
Binary version compatibility
This SDK requires pdftract 1.0.0. Download from: https://github.com/jedarden/pdftract/releases/tag/v1.0.0
The SDK will search for pdftract on your PATH. To specify a custom binary path:
let client = Pdftract(binaryPath: "/custom/path/to/pdftract")
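If you want to resolve the binary yourself before constructing the client, a PATH lookup can be sketched as follows. This is an illustration of the general technique, not the SDK's internal search logic; `findExecutable(named:inPath:)` is a hypothetical helper written for this example.

```swift
import Foundation

/// Returns the first regular executable file called `name` in the directories
/// of `searchPath` (colon-separated, like the PATH variable), or nil if none matches.
func findExecutable(named name: String, inPath searchPath: String) -> String? {
    let fileManager = FileManager.default
    for directory in searchPath.split(separator: ":") {
        let candidate = URL(fileURLWithPath: String(directory))
            .appendingPathComponent(name).path
        var isDirectory: ObjCBool = false
        // Skip directories that happen to share the name; require the execute bit.
        if fileManager.fileExists(atPath: candidate, isDirectory: &isDirectory),
           !isDirectory.boolValue,
           fileManager.isExecutableFile(atPath: candidate) {
            return candidate
        }
    }
    return nil
}

let searchPath = ProcessInfo.processInfo.environment["PATH"] ?? ""
if let binary = findExecutable(named: "pdftract", inPath: searchPath) {
    print("Found pdftract at \(binary)")
} else {
    print("pdftract is not on PATH")
}
```

The path returned by the helper can be passed to `Pdftract(binaryPath:)` as shown above.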
Error handling
All methods are async throws and can throw the following errors:
| Error | Exit Code | Description |
|---|---|---|
| CorruptPdfError | 2 | The PDF file is corrupt or invalid |
| EncryptionError | 3 | The PDF is encrypted and password is missing/wrong |
| SourceUnreachableError | 4 | The source (file or URL) is unreadable |
| RemoteFetchInterruptedError | 5 | Network interrupted during remote fetch |
| TlsError | 6 | TLS certificate validation failed |
| ReceiptVerifyError | 10 | Receipt verification failed |
| PdftractError | other | Internal error |
Example:
do {
    let doc = try await client.extract(.path("document.pdf"))
} catch let error as PdftractError {
    print("Error (code \(error.exitCode)): \(error.localizedDescription)")
}
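For logging or retry decisions, the exit codes in the error table can be mapped back to error names with a small lookup. This is an illustrative sketch only: `errorName(forExitCode:)` and `isWorthRetrying(exitCode:)` are hypothetical helpers, the `Int32` type is an assumption, and the SDK's own error types may be structured differently.

```swift
/// Maps a non-zero pdftract exit code to the error name listed in the error table.
/// Hypothetical helper; not part of the SDK.
func errorName(forExitCode code: Int32) -> String {
    switch code {
    case 2: return "CorruptPdfError"
    case 3: return "EncryptionError"
    case 4: return "SourceUnreachableError"
    case 5: return "RemoteFetchInterruptedError"
    case 6: return "TlsError"
    case 10: return "ReceiptVerifyError"
    default: return "PdftractError"  // the table's "other" row
    }
}

/// Assumption: only an interrupted remote fetch (code 5) is likely to succeed
/// on a plain retry; the other codes describe problems with the input itself.
func isWorthRetrying(exitCode code: Int32) -> Bool {
    return code == 5
}
```

A caller could check `isWorthRetrying(exitCode: error.exitCode)` inside the catch block before attempting the extraction again.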
Options
ExtractOptions
let options = ExtractOptions(
    ocrLanguage: "eng", // ISO 639-3 language code
    ocrThreshold: 0.7, // OCR confidence threshold (0-1)
    preserveLayout: false, // Preserve original reading order
    extractImages: false, // Extract embedded images
    imageFormat: "png", // Format for images: png, jpg, webp
    minImageSize: 64 // Minimum image dimension
)
SearchOptions
let options = SearchOptions(
    caseInsensitive: true, // Ignore case
    regex: false, // Treat pattern as regex
    wholeWord: false, // Match whole words only
    maxResults: 100 // Maximum matches
)
BaseOptions / HashOptions
let options = BaseOptions(
    timeout: 60 // Maximum seconds
)
Troubleshooting
Binary not found
Ensure pdftract is on your PATH, which is where the SDK looks for the executable by default, or pass binaryPath when creating the client.
# Verify pdftract is available
pdftract --version
Version mismatch
The SDK refuses to invoke a pdftract binary whose version does not match the one it requires (1.0.0 for this release). Install the matching version from the releases page.
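To check an installed binary yourself, you can compare the version it reports against the required one. The sketch below assumes the output of `pdftract --version` contains a dotted version such as `1.0.0` somewhere in it; that output format is an assumption, and `parseVersion(from:)` is a hypothetical helper rather than part of the SDK.

```swift
/// Extracts the first three-part numeric version (e.g. "1.0.0" or "v1.0.0")
/// found in `output`, or returns nil if there is none.
func parseVersion(from output: String) -> [Int]? {
    for token in output.split(whereSeparator: { $0.isWhitespace }) {
        // Tolerate an optional leading "v", as used in release tags.
        let bare = token.hasPrefix("v") ? token.dropFirst() : token
        let parts = bare.split(separator: ".")
        let numbers = parts.compactMap { Int($0) }
        if parts.count == 3 && numbers.count == 3 {
            return numbers
        }
    }
    return nil
}

let required = [1, 0, 0]  // the version this SDK release requires
let reported = parseVersion(from: "pdftract 1.0.0")  // assumed output format
print(reported == required ? "Version matches" : "Version mismatch")
```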
Network failure
For remote URLs, check your network connection and TLS certificate chain.
Conformance
This SDK passes 100% of the pdftract conformance suite. The conformance report for this release is linked in the GitHub Release.
License
MIT License - see LICENSE file for details.