pdftract

jedarden 11257e7706	2026-05-18 02:01:46 -04:00

feat(pdftract-l993m): complete per-language Tera template scaffolding

Complete the Tera template scaffolding for all 8 subprocess-based SDKs under templates/sdk-skeleton/<lang>/: node, go, java, dotnet, ruby, php, swift, python-subprocess.

Each template directory contains:
- Package metadata template (package.json, go.mod, pom.xml, etc.)
- Method stubs template (methods.ts, client.go, Methods.java, etc.)
- Error stubs template (errors.ts, errors.go, Errors.java, etc.)
- Conformance runner template (conformance.test.ts, etc.)
- README template with {{ version }} variable substitution
- GENERATED.tera marker file

New files for python-subprocess:
- pdftract_subprocess/codegen/errors.py.tera
- tests/codegen/conformance_test.py.tera
- README.md.tera
- GENERATED.tera

All 8 language template directories are now complete and ready for consumption by the `pdftract sdk codegen` subcommand.

Co-Authored-By: Claude Code <noreply@anthropic.com>
lib	feat(pdftract-l993m): complete per-language Tera template scaffolding	2026-05-18 02:01:46 -04:00
test/codegen	feat(pdftract-l993m): complete per-language Tera template scaffolding	2026-05-18 02:01:46 -04:00
GENERATED.tera	feat(pdftract-l993m): complete per-language Tera template scaffolding	2026-05-18 02:01:46 -04:00
pdftract.gemspec.tera	feat(pdftract-l993m): complete per-language Tera template scaffolding	2026-05-18 02:01:46 -04:00
README.md.tera	feat(pdftract-l993m): complete per-language Tera template scaffolding	2026-05-18 02:01:46 -04:00

README.md.tera

# pdftract-ruby

Ruby SDK for pdftract - PDF extraction and conformance testing.

## Installation

```bash
gem install pdftract -v {{ version }}
```

Or in your Gemfile:

```ruby
gem 'pdftract', '~> {{ version }}'
```

## Usage

### Basic extract

```ruby
require 'pdftract'

client = Pdftract.client
doc = client.extract(Pdftract::PathSource.new('document.pdf'))
puts "Pages: #{doc.pages.length}"
```

### Extract with OCR

```ruby
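require 'ostruct' # OpenStruct is not loaded by default

# `client` is the Pdftract.client instance from the basic example above.
options = OpenStruct.new(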
  ocr_language: 'eng',
  ocr_threshold: 0.7
)

doc = client.extract(Pdftract::PathSource.new('scanned.pdf'), options)
```

### Search

```ruby
client.search(Pdftract::PathSource.new('document.pdf'), 'invoice').each do |match|
  puts "Found on page #{match.page}: #{match.text}"
end
```

### Stream extraction

```ruby
client.extract_stream(Pdftract::PathSource.new('large.pdf')).each do |page|
  puts "Page #{page.page}: #{page.blocks&.length || 0} blocks"
end
```

## Binary version compatibility

This SDK requires version {{ version }} of the `pdftract` binary. Download it from:
https://github.com/jedarden/pdftract/releases/tag/v{{ version }}

## Troubleshooting

### Binary not found
Ensure `pdftract` is on your PATH. The SDK probes PATH for the executable.
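
To check what a PATH probe would find, a minimal sketch follows. The helper `find_on_path` is hypothetical and not part of the SDK; it only illustrates the kind of lookup described above, and the SDK's actual probing logic may differ.

```ruby
# Hypothetical helper (not an SDK API): return the full path of the first
# executable named `name` found in `path_env`, or nil if there is none.
def find_on_path(name, path_env = ENV.fetch('PATH', ''))
  # Try the bare name first, then any Windows-style extensions from PATHEXT.
  exts = [''] + ENV.fetch('PATHEXT', '').split(';')
  path_env.split(File::PATH_SEPARATOR).each do |dir|
    next if dir.empty?

    exts.each do |ext|
      candidate = File.join(dir, "#{name}#{ext}")
      return candidate if File.file?(candidate) && File.executable?(candidate)
    end
  end
  nil
end

path = find_on_path('pdftract')
puts(path ? "pdftract found at #{path}" : 'pdftract is not on PATH')
```

If this reports that the binary is missing, add the directory containing `pdftract` to PATH before starting your Ruby process.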

### Version mismatch
The SDK refuses to invoke a `pdftract` binary whose version does not match the one it requires ({{ version }}). Install the matching release.
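
As an illustration of such a check, here is a small sketch. `version_matches?` is a hypothetical name, and the exact-equality rule and the tolerance for a leading `v` are assumptions; the SDK's real comparison is not documented here.

```ruby
require 'rubygems' # provides Gem::Version (loaded by default in modern Ruby)

# Hypothetical check (not an SDK API): true when the version reported by the
# binary equals the version the SDK expects. A leading "v" is ignored.
def version_matches?(expected, reported)
  normalize = ->(v) { Gem::Version.new(v.to_s.strip.sub(/\Av/, '')) }
  normalize.call(expected) == normalize.call(reported)
end

# Example values only; substitute the version your SDK and binary report.
expected = '1.2.3'
reported = 'v1.2.4'
unless version_matches?(expected, reported)
  warn "pdftract version mismatch: expected #{expected}, got #{reported}"
end
```

How the reported version is obtained from the binary is left out on purpose, since the relevant command-line flag is not stated in this README.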

### Network failure
For remote URLs, check your network connection and TLS certificate chain.