pdftract/templates/sdk-skeleton/python-subprocess
jedarden e3c7b2eec0 fix(pdftract-l993m): fix Tera template syntax in Methods templates
Fix incorrect Tera template syntax in per-language Methods templates:
- Change `elsif` to `elif` (correct Tera conditional syntax)
- Fix inline ternary-like syntax to use proper `{% if %}...{% else %}...{% endif %}`
- Fix truncated package name in Java template (codegen → codegen)

Affected templates:
- PHP: Methods.php.tera
- Python: methods.py.tera
- Ruby: methods.rb.tera
- Swift: Methods.swift.tera
- Java: Methods.java.tera

All 8 subprocess SDK templates now render correctly with the codegen
command. Verified via `pdftract sdk codegen --lang <lang> --out /tmp/sdk-<lang>`.

Co-Authored-By: Claude Code <noreply@anthropic.com>
Bead-Id: pdftract-l993m
2026-05-18 02:29:21 -04:00
pdftract_subprocess fix(pdftract-l993m): fix Tera template syntax in Methods templates 2026-05-18 02:29:21 -04:00
tests/codegen feat(pdftract-l993m): complete per-language Tera template scaffolding 2026-05-18 02:01:46 -04:00
GENERATED.tera feat(pdftract-l993m): complete per-language Tera template scaffolding 2026-05-18 02:01:46 -04:00
pyproject.toml.tera feat(pdftract-l993m): complete per-language Tera template scaffolding 2026-05-18 02:01:46 -04:00
README.md.tera feat(pdftract-l993m): complete per-language Tera template scaffolding 2026-05-18 02:01:46 -04:00

# pdftract-subprocess

Python SDK for pdftract - PDF extraction and conformance testing (subprocess fallback).

This package provides a subprocess-based fallback for when the native PyO3 module fails to load. It is slower than the native module but provides full functionality.

## Installation

```bash
pip install pdftract-subprocess=={{ version }}
```

## Usage

### Basic extract

```python
from pdftract_subprocess import Client, PathSource

client = Client()
doc = client.extract(PathSource('document.pdf'))
print(f"Pages: {len(doc['pages'])}")
```

### Extract with OCR

```python
from pdftract_subprocess import Client, PathSource

client = Client()
doc = client.extract(PathSource('scanned.pdf'), options={
    'ocr_language': 'eng',
    'ocr_threshold': 0.7
})
```

### Search

```python
from pdftract_subprocess import Client, PathSource

client = Client()
for match in client.search(PathSource('document.pdf'), 'invoice'):
    print(f"Found on page {match['page']}: {match['text']}")
```

### Stream extraction

```python
from pdftract_subprocess import Client, PathSource

client = Client()
for page in client.extract_stream(PathSource('large.pdf')):
    print(f"Page {page['page']}: {len(page.get('blocks', []))} blocks")
```

## Binary version compatibility

This SDK requires pdftract {{ version }}. Download from:
https://github.com/jedarden/pdftract/releases/tag/v{{ version }}

## Troubleshooting

### Binary not found
Ensure `pdftract` is on your PATH. The SDK probes PATH for the executable.
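
To see whether the executable is visible from Python, PATH can be checked directly. The sketch below uses only the standard library's `shutil.which`; the helper name `find_pdftract` is hypothetical and not part of the SDK, and the SDK's own probing may differ in detail.

```python
import shutil
from typing import Optional


def find_pdftract(name: str = "pdftract", path: Optional[str] = None) -> Optional[str]:
    """Return the path of the executable, or None if it is not on PATH.

    Hypothetical diagnostic helper; not part of pdftract_subprocess.
    """
    # shutil.which searches `path`, or the PATH environment variable when None.
    return shutil.which(name, path=path)


if __name__ == "__main__":
    location = find_pdftract()
    if location is None:
        print("pdftract not found; add its install directory to PATH")
    else:
        print(f"pdftract found at {location}")
```

If this prints "not found" in the same environment where the SDK fails, the fix is on the PATH side rather than in the SDK.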

### Version mismatch
The SDK refuses to invoke a `pdftract` binary whose version differs from the one it requires. Install the matching release (see "Binary version compatibility" above).
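
To check a binary by hand, its reported version can be compared against the required one. This is a minimal sketch, assuming the binary reports its version as text containing a `MAJOR.MINOR.PATCH` token; how that text is obtained (for example a version flag) is not stated in this README, and the helper names `parse_version` and `versions_match` are hypothetical. The SDK's actual comparison rule may be stricter or looser than exact equality.

```python
import re
from typing import Optional


def parse_version(text: str) -> Optional[str]:
    """Return the first MAJOR.MINOR.PATCH token in `text` (with any
    pre-release or build suffix), or None if there is none."""
    match = re.search(r"\d+\.\d+\.\d+(?:[-+][0-9A-Za-z.\-]+)?", text)
    return match.group(0) if match else None


def versions_match(expected: str, reported: str) -> bool:
    """True when the version found in `reported` equals `expected` exactly.

    Assumption: exact string equality is the rule being checked.
    """
    return parse_version(reported) == expected
```

For example, `versions_match("1.2.3", "pdftract 1.2.4")` is false, which is the situation this section describes.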

### Network failure
For remote URLs, check your network connection and TLS certificate chain.
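
A TLS handshake against the remote host can separate certificate problems from plain connectivity problems. The sketch below uses only the standard library (`socket`, `ssl`, `urllib.parse`); the helper names `host_and_port` and `check_tls` are hypothetical, and the check uses Python's default trust store, which may differ from the one the `pdftract` binary consults.

```python
import socket
import ssl
from typing import Tuple
from urllib.parse import urlsplit


def host_and_port(url: str) -> Tuple[str, int]:
    """Extract the host and port from a URL, defaulting to port 443."""
    parts = urlsplit(url)
    if not parts.hostname:
        raise ValueError(f"no host in URL: {url!r}")
    return parts.hostname, parts.port or 443


def check_tls(url: str, timeout: float = 5.0) -> Tuple[bool, str]:
    """Attempt a verified TLS handshake and report (ok, message)."""
    host, port = host_and_port(url)
    context = ssl.create_default_context()  # verifies the chain and hostname
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                return True, f"OK ({tls.version()})"
    except ssl.SSLCertVerificationError as exc:
        # The certificate chain or hostname did not validate.
        return False, f"certificate verification failed: {exc.verify_message}"
    except OSError as exc:
        # DNS failure, refused connection, timeout, or another TLS error.
        return False, f"connection failed: {exc}"
```

Calling `check_tls` with the URL that failed returns a message indicating whether the problem is the certificate chain or the connection itself.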