# pdftract-ruby

Ruby SDK for pdftract: PDF extraction and conformance testing.
## Installation

```sh
gem install pdftract
```

Or add it to your Gemfile:

```ruby
gem 'pdftract', '~> 1.0.0'
```
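The `~> 1.0.0` constraint uses RubyGems' pessimistic operator. It accepts `1.0.0` and later patch releases but stops before `1.1.0`. The short sketch below checks a few versions against that requirement with RubyGems' own `Gem::Requirement`. The helper name `allowed_by_gemfile_constraint?` is illustrative and is not part of the SDK.

```ruby
require 'rubygems'

# The Gemfile line above, expressed as a RubyGems requirement.
# "~> 1.0.0" is equivalent to ">= 1.0.0" combined with "< 1.1".
GEMFILE_REQUIREMENT = Gem::Requirement.new('~> 1.0.0')

# Illustrative helper: does this pdftract gem version satisfy the constraint?
def allowed_by_gemfile_constraint?(version)
  GEMFILE_REQUIREMENT.satisfied_by?(Gem::Version.new(version))
end

%w[1.0.0 1.0.9 1.1.0 2.0.0].each do |v|
  puts "#{v}: #{allowed_by_gemfile_constraint?(v)}"
end
# 1.0.0: true
# 1.0.9: true
# 1.1.0: false
# 2.0.0: false
```

To accept minor releases as well, use `'~> 1.0'`, which means `>= 1.0` combined with `< 2.0`.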
## Usage

### Basic extract

```ruby
require 'pdftract'

client = Pdftract.client
doc = client.extract('document.pdf')
puts "Pages: #{doc.pages.length}"
```
### Extract with OCR

```ruby
doc = client.extract('scanned.pdf', { ocr_language: 'eng', ocr_threshold: 0.7 })
```
### Extract text

```ruby
text = client.extract_text('document.pdf')
puts text
```
### Extract Markdown

```ruby
markdown = client.extract_markdown('document.pdf')
puts markdown
```
### Stream extraction

```ruby
client.extract_stream('large.pdf').each do |page|
  puts "Page #{page.page}: #{page.blocks&.length || 0} blocks"
end
```
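The loop above relies only on `each`, so you can aggregate over the stream page by page instead of first building a full document. The sketch below assumes `extract_stream` yields page objects whose `blocks` may be `nil`, as the safe-navigation call above suggests. `FakePage` is a hypothetical stand-in so the snippet runs without the gem.

```ruby
# Sum block counts across a page stream, counting a page whose
# `blocks` is nil as zero (the same fallback used above).
def total_blocks(pages)
  total = 0
  pages.each { |page| total += page.blocks&.length || 0 }
  total
end

# Hypothetical stand-in for the SDK's page objects.
FakePage = Struct.new(:page, :blocks)

pages = [FakePage.new(1, %i[a b]), FakePage.new(2, nil), FakePage.new(3, [:c])]
puts total_blocks(pages) # prints 3
```

With the SDK, the call would presumably be `total_blocks(client.extract_stream('large.pdf'))`.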
### Search

```ruby
client.search('document.pdf', 'invoice').each do |match|
  puts "Found on page #{match.page}: #{match.text}"
end
```
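If you need a per-page summary rather than individual hits, you can group the matches by their `page` attribute. This sketch assumes `search` returns an `Enumerable`, such as an `Array`, of objects exposing the `page` and `text` readers used above. `FakeMatch` and `hits_per_page` are hypothetical names chosen for illustration.

```ruby
# Hypothetical stand-in for the SDK's match objects.
FakeMatch = Struct.new(:page, :text)

# Map each page number to its number of hits, in ascending page order.
def hits_per_page(matches)
  matches.group_by(&:page).transform_values(&:length).sort.to_h
end

matches = [
  FakeMatch.new(1, 'invoice'),
  FakeMatch.new(3, 'Invoice #42'),
  FakeMatch.new(1, 'invoice total')
]

hits_per_page(matches).each { |page, count| puts "Page #{page}: #{count} hit(s)" }
# Page 1: 2 hit(s)
# Page 3: 1 hit(s)
```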
### Get metadata

```ruby
metadata = client.get_metadata('document.pdf')
puts "Title: #{metadata.title}"
puts "Pages: #{metadata.page_count}"
```
### Hash

```ruby
fingerprint = client.hash('document.pdf')
puts "SHA-256: #{fingerprint.hash}"
puts "Fast hash: #{fingerprint.fast_hash}"
```
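To cross-check a fingerprint locally, you can compute a SHA-256 digest with Ruby's standard `digest` library. The comparison is only meaningful if `fingerprint.hash` is the hex-encoded SHA-256 of the raw file bytes. This README does not confirm that, so treat it as an assumption to verify. The fast hash uses an algorithm this README does not name, and it is not reproduced here. `local_sha256` is a hypothetical helper.

```ruby
require 'digest'

# Hex-encoded SHA-256 of a file's raw bytes.
# Assumption: this matches fingerprint.hash only if pdftract hashes the
# unmodified file contents and reports lowercase hex.
def local_sha256(path)
  Digest::SHA256.file(path).hexdigest
end

# Usage with the SDK (not run here):
#   local_sha256('document.pdf') == client.hash('document.pdf').hash
```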
### Classify

```ruby
classification = client.classify('document.pdf')
puts "Category: #{classification.category}"
puts "Confidence: #{classification.confidence}"
```
### Verify receipt

```ruby
valid = client.verify_receipt('document.pdf', 'receipt-data')
puts "Valid: #{valid}"
```
## Binary version compatibility

This SDK requires pdftract 1.0.0 or later. Download releases from https://github.com/jedarden/pdftract/releases.
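You can check the documented minimum version with RubyGems' `Gem::Version` comparison before invoking the SDK. This is a minimal sketch. The helper name is hypothetical, and so is the assumption that the binary reports its version as text containing an `X.Y.Z` number, because this README does not document a version command. The SDK's own compatibility check may be stricter than a plain minimum.

```ruby
require 'rubygems'

MINIMUM_PDFTRACT_VERSION = Gem::Version.new('1.0.0')

# Hypothetical helper: extract the first "X.Y.Z" from a version string
# (the output format is an assumption) and compare it with the minimum.
def compatible_pdftract_version?(version_output)
  number = version_output.to_s[/\d+\.\d+\.\d+/]
  return false unless number

  Gem::Version.new(number) >= MINIMUM_PDFTRACT_VERSION
end

['pdftract 1.2.3', '0.9.8', 'unknown'].each do |s|
  puts "#{s}: #{compatible_pdftract_version?(s)}"
end
# pdftract 1.2.3: true
# 0.9.8: false
# unknown: false
```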
## Troubleshooting

### Binary not found

Make sure the `pdftract` executable is on your `PATH`. The SDK locates the binary by searching the directories listed in `PATH`.
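If you are unsure whether `pdftract` is visible to your Ruby process, you can repeat the `PATH` search yourself. The sketch below performs a shell-style lookup. It is not the SDK's actual probing code, and `find_on_path` is a hypothetical name. On Windows it also tries the extensions listed in `PATHEXT`.

```ruby
# Hypothetical helper: return the first executable file named `name`
# found in the PATH directories, or nil if there is none.
def find_on_path(name = 'pdftract', path = ENV.fetch('PATH', ''))
  # Windows resolves commands through PATHEXT (e.g. ".EXE"); POSIX uses the bare name.
  exts = Gem.win_platform? ? ENV.fetch('PATHEXT', '.EXE').split(';') : ['']

  path.split(File::PATH_SEPARATOR).each do |dir|
    next if dir.empty? # skip empty PATH entries rather than probing "/"

    exts.each do |ext|
      candidate = File.join(dir, "#{name}#{ext}")
      return candidate if File.file?(candidate) && File.executable?(candidate)
    end
  end
  nil
end

puts find_on_path || 'pdftract not found on PATH'
```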
### Version mismatch

The SDK refuses to invoke a `pdftract` binary whose version does not match the one it expects. Install a release that meets the requirement under Binary version compatibility.
### Network failure

If you pass a remote URL instead of a local file path, check your network connection and the TLS certificate chain.