# pdftract-subprocess
Python SDK for pdftract - PDF extraction and conformance testing (subprocess fallback).
This package provides a subprocess-based fallback for when the native PyO3 module fails to load. It is slower than the native module but provides full functionality.
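The fallback idea can be sketched as a small import probe. This is only an illustration: `first_importable` is a hypothetical helper, and the native module name `pdftract` used in the usage note is an assumption, not something this README confirms.

```python
import importlib


def first_importable(candidates):
    """Return (name, module) for the first candidate that imports.

    Hypothetical helper, not part of pdftract-subprocess.
    Returns (None, None) when no candidate can be imported.
    """
    for name in candidates:
        try:
            return name, importlib.import_module(name)
        except ImportError:
            # This candidate is missing or failed to load; try the next one.
            continue
    return None, None
```

For example, `first_importable(["pdftract", "pdftract_subprocess"])` would prefer the native module (assumed here to be importable as `pdftract`) and fall back to this package otherwise.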
## Installation
```bash
pip install pdftract-subprocess=={{ version }}
```
## Usage
### Basic extract
```python
from pdftract_subprocess import Client, PathSource
client = Client()
doc = client.extract(PathSource('document.pdf'))
print(f"Pages: {len(doc['pages'])}")
```
### Extract with OCR
```python
from pdftract_subprocess import Client, PathSource
client = Client()
doc = client.extract(PathSource('scanned.pdf'), options={
    'ocr_language': 'eng',
    'ocr_threshold': 0.7,
})
```
### Search
```python
from pdftract_subprocess import Client, PathSource
client = Client()
for match in client.search(PathSource('document.pdf'), 'invoice'):
    print(f"Found on page {match['page']}: {match['text']}")
```
### Stream extraction
```python
from pdftract_subprocess import Client, PathSource
client = Client()
for page in client.extract_stream(PathSource('large.pdf')):
    print(f"Page {page['page']}: {len(page.get('blocks', []))} blocks")
```
## Binary version compatibility
This SDK requires pdftract {{ version }}. Download from:
https://github.com/jedarden/pdftract/releases/tag/v{{ version }}
## Troubleshooting
### Binary not found
Ensure `pdftract` is on your PATH. The SDK probes PATH for the executable.
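To see what a PATH probe finds from the same Python environment, a minimal standard-library sketch may help. `find_binary` is a hypothetical helper, not part of the SDK, and the SDK's own lookup may differ in detail.

```python
import os
import shutil


def find_binary(name="pdftract", path=None):
    """Return the absolute path of `name` on PATH, or None if it is missing.

    Hypothetical helper, not part of pdftract-subprocess.
    """
    found = shutil.which(name, path=path)
    return os.path.abspath(found) if found else None


if __name__ == "__main__":
    location = find_binary()
    if location is None:
        # List the directories that were searched to make the gap visible.
        print("pdftract not found; PATH entries are:")
        for entry in os.environ.get("PATH", "").split(os.pathsep):
            print(f"  {entry}")
    else:
        print(f"pdftract found at {location}")
```

If the binary is installed but not found, compare the printed PATH entries with the directory where the release was unpacked.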
### Version mismatch
The SDK refuses to invoke a `pdftract` binary whose version does not match the version it requires. Install the binary release that matches the SDK version.
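When diagnosing a mismatch, it can help to compare the installed SDK version with the version string your binary reports. The sketch below assumes an exact-match rule after dropping a leading `v` (as used in the release tags); the SDK's actual comparison is not documented here, and both helper names are hypothetical.

```python
from importlib import metadata


def installed_sdk_version(dist="pdftract-subprocess"):
    """Return the installed version of the distribution, or None if absent."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


def versions_match(sdk_version, binary_version):
    """Compare two version strings, ignoring whitespace and a leading 'v'.

    Assumption: an exact match is required. Hypothetical helper.
    """
    def normalize(value):
        value = value.strip()
        return value[1:] if value.startswith("v") else value

    return normalize(sdk_version) == normalize(binary_version)
```

Pass the version string printed by your binary as `binary_version`; how the binary reports its version is not covered in this README.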
### Network failure
For remote URLs, check your network connection and TLS certificate chain.
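A TLS handshake can be tested independently of the SDK with the standard library. This is a rough diagnostic sketch; `check_tls` is a hypothetical helper, and it only checks that the host is reachable and that its certificate chain verifies against the system trust store.

```python
import socket
import ssl
from urllib.parse import urlsplit


def check_tls(url, timeout=5.0):
    """Try a TLS handshake with the host in `url`; return (ok, detail).

    Hypothetical helper, not part of pdftract-subprocess.
    """
    parts = urlsplit(url)
    if parts.scheme != "https" or not parts.hostname:
        return False, "not an https URL"
    # The default context verifies the certificate chain and the hostname.
    context = ssl.create_default_context()
    try:
        address = (parts.hostname, parts.port or 443)
        with socket.create_connection(address, timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=parts.hostname) as tls:
                return True, tls.version()
    except ssl.SSLCertVerificationError as exc:
        return False, f"certificate verification failed: {exc.verify_message}"
    except OSError as exc:
        # Covers DNS failures, refused connections, timeouts, other TLS errors.
        return False, f"connection failed: {exc}"
```

A certificate verification failure points at the TLS chain (for example a missing intermediate or a corporate proxy), while a connection failure points at DNS, routing, or firewall issues.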