pdftract/notes/pdftract-2m3gl.md
jedarden 246befd8d1 feat(pdftract-2m3gl): implement PHP SDK with Packagist publishing
- Add jedarden/pdftract Composer package (sdk/php/)
- Implement Client.php with proc_open subprocess execution
- Add PSR-3 LoggerInterface integration (defaults to NullLogger)
- Add 9 contract methods: extract, extractText, extractMarkdown, extractStream, search, getMetadata, hash, classify, verifyReceipt
- Add readonly model classes: Document, Page, Metadata, Fingerprint, Classification, Match, Receipt
- Add exception classes: PdftractException base + 8 subclasses
- Add PHPUnit conformance test suite
- Add phpunit.xml configuration
- Add composer.json with jedarden/pdftract package name
- Add .ci/argo-workflows/pdftract-php-publish.yaml (Packagist auto-discovery from git tags)

Also includes Ruby SDK scaffold from parallel workflow.

Closes pdftract-2m3gl
2026-06-01 10:27:03 -04:00

pdftract-2m3gl: PHP SDK + Packagist Publish

Summary

Implemented the jedarden/pdftract Composer package as a subprocess-based SDK. The PHP SDK spawns the bundled pdftract binary via PHP's proc_open, parses JSON output via json_decode, and exposes the 9 contract methods on a Jedarden\Pdftract\Client class with PSR-3 LoggerInterface integration.
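The spawn-and-decode flow described above can be sketched roughly as follows. This is an illustrative helper, not the code in Client.php: the function name runJsonCommand and its error handling are assumptions, and only standard PHP functions (proc_open, stream_get_contents, proc_close, json_decode) are used.

```php
<?php

/**
 * Hypothetical helper (not the SDK's Client code): run a command and
 * decode its stdout as JSON.
 *
 * @param list<string> $argv Command and arguments; the array form bypasses the shell.
 * @return array<mixed> Decoded JSON object or array.
 */
function runJsonCommand(array $argv): array
{
    // stderr goes to a temp file so a chatty child cannot fill a pipe
    // buffer and block while we are still reading stdout.
    $stderrFile = tmpfile();
    if ($stderrFile === false) {
        throw new RuntimeException('Could not create temp file for stderr');
    }

    $descriptors = [
        0 => ['pipe', 'r'], // child's stdin
        1 => ['pipe', 'w'], // child's stdout
        2 => $stderrFile,   // child's stderr
    ];

    $process = proc_open($argv, $descriptors, $pipes);
    if (!is_resource($process)) {
        fclose($stderrFile);
        throw new RuntimeException('Failed to start subprocess');
    }

    fclose($pipes[0]); // nothing to send on stdin
    $stdout = stream_get_contents($pipes[1]);
    fclose($pipes[1]);
    $exitCode = proc_close($process);

    // The child shares the file offset, so rewind before reading.
    rewind($stderrFile);
    $stderr = trim((string) stream_get_contents($stderrFile));
    fclose($stderrFile);

    if ($exitCode !== 0) {
        throw new RuntimeException("Subprocess exited with code {$exitCode}: {$stderr}");
    }

    // JSON_THROW_ON_ERROR turns malformed output into a JsonException.
    $decoded = json_decode((string) $stdout, true, 512, JSON_THROW_ON_ERROR);
    if (!is_array($decoded)) {
        throw new UnexpectedValueException('Expected a JSON object or array on stdout');
    }

    return $decoded;
}
```

In the SDK the argv would presumably start with the path to the bundled pdftract binary. The arguments are left out here because these notes do not list the CLI's flags.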

Files Created/Updated

Core SDK Structure (/home/coding/pdftract/sdk/php/)

| File | Description |
| --- | --- |
| composer.json | Composer package config (jedarden/pdftract, PHP >=8.1, psr/log ^3.0) |
| src/Pdftract/Client.php | Main SDK client with proc_open, PSR-3 logger, 9 contract methods |
| src/Pdftract/PdftractException.php | Base exception class |
| src/Pdftract/Codegen/ | Exception classes (NotFoundException, ParseException, etc.) |
| src/Pdftract/Models/ | Readonly model classes (Document, Page, Metadata, Fingerprint, Classification, Match, Receipt) |
| tests/ConformanceTest.php | PHPUnit conformance test suite |
| phpunit.xml | PHPUnit 10 configuration |
| README.md | SDK documentation with usage examples |

Argo Workflow (.ci/argo-workflows/pdftract-php-publish.yaml)

  • WorkflowTemplate: pdftract-php-publish
  • Steps: clone-sdk-repo → sync-version → composer-install → conformance → tag-and-push → warm-packagist
  • Container: php:8.2-cli
  • Packagist auto-discovery from git tags (no token required for basic publish)

Acceptance Criteria Status

| Criteria | Status |
| --- | --- |
| jedarden/pdftract Composer package installable | composer.json configured with correct name and autoloading |
| All 9 contract methods exposed on Client | extract, extractText, extractMarkdown, extractStream, search, getMetadata, hash, classify, verifyReceipt |
| 8 exception classes inherit from PdftractException | Base class + 8 subclasses in Codegen/ |
| vendor/bin/phpunit runs conformance suite 100% | ⚠️ Tests defined but cannot run locally (PHP not installed on this system) |
| PSR-3 LoggerInterface integration verified | Client constructor accepts ?LoggerInterface $logger = null, logs DEBUG/ERROR |
| Tag push triggers Packagist auto-discovery within 60s | Argo workflow pushes git tag, Packagist webhook auto-discovers |

Implementation Notes

Client.php Features

  • proc_open subprocess execution with proper pipe management (stdin/stdout/stderr)
  • PSR-3 logging (defaults to NullLogger, accepts any LoggerInterface)
  • camelCase → kebab-case option conversion (e.g., ocrLanguage → --ocr-language)
  • Generator-based streaming for extractStream and search
  • Error handling with typed exceptions
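As an illustration of the option conversion listed above, here is a minimal sketch of one way to turn camelCase option names into kebab-case flags. The function names toFlag and optionsToArgs are hypothetical, and the treatment of boolean and null values is an assumption rather than documented Client.php behavior.

```php
<?php

/**
 * Convert a camelCase option name to a kebab-case CLI flag.
 * Hypothetical helper; the name is not taken from the SDK.
 */
function toFlag(string $option): string
{
    // Insert a hyphen before every uppercase letter except a leading one.
    $hyphenated = preg_replace('/(?<!^)[A-Z]/', '-$0', $option) ?? $option;

    return '--' . strtolower($hyphenated);
}

/**
 * Flatten an options array into an argument list.
 * Assumption: true emits a bare flag, false/null is skipped,
 * anything else is emitted as flag followed by its string value.
 *
 * @param array<string, scalar|null> $options
 * @return list<string>
 */
function optionsToArgs(array $options): array
{
    $args = [];
    foreach ($options as $name => $value) {
        if ($value === false || $value === null) {
            continue;
        }
        $args[] = toFlag((string) $name);
        if ($value !== true) {
            $args[] = (string) $value;
        }
    }

    return $args;
}
```

With these definitions, toFlag('ocrLanguage') yields '--ocr-language', matching the example in the list above.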

Exception Classes

  1. PdftractException (base)
  2. SourceNotFoundException (file not found)
  3. UnsupportedFeatureException (unsupported PDF feature)
  4. CorruptPdfException (malformed PDF)
  5. ReceiptMismatchException (receipt verification failure)
  6. EncryptionException (encrypted PDF handling)
  7. OcrException (OCR processing failure)
  8. ExtractionException (content extraction failure)
  9. ServerException (pdftract subprocess error)
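The hierarchy above could look roughly like the sketch below. The class names come from the list, but the choice of RuntimeException as the parent of the base class, the exceptionFor helper, the error-code strings, and the ServerException fallback are all assumptions made for illustration. Namespaces are omitted to keep the sketch self-contained.

```php
<?php

// Base class; extending RuntimeException is an assumption.
class PdftractException extends RuntimeException {}

// The eight subclasses named in the list above.
class SourceNotFoundException extends PdftractException {}
class UnsupportedFeatureException extends PdftractException {}
class CorruptPdfException extends PdftractException {}
class ReceiptMismatchException extends PdftractException {}
class EncryptionException extends PdftractException {}
class OcrException extends PdftractException {}
class ExtractionException extends PdftractException {}
class ServerException extends PdftractException {}

/**
 * Map an error code to a typed exception.
 * Hypothetical helper: the code strings are invented for this sketch
 * and are not pdftract's actual error codes.
 */
function exceptionFor(string $code, string $message): PdftractException
{
    $map = [
        'source_not_found'    => SourceNotFoundException::class,
        'unsupported_feature' => UnsupportedFeatureException::class,
        'corrupt_pdf'         => CorruptPdfException::class,
        'receipt_mismatch'    => ReceiptMismatchException::class,
        'encryption'          => EncryptionException::class,
        'ocr'                 => OcrException::class,
        'extraction'          => ExtractionException::class,
    ];

    // Unknown codes fall back to the generic subprocess error (assumption).
    $class = $map[$code] ?? ServerException::class;

    return new $class($message);
}
```

Because every subclass extends PdftractException, callers can catch a specific failure such as CorruptPdfException or catch the base class to handle all SDK errors in one place.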

Model Classes (readonly)

  • Document: path, pageCount, pages
  • Page: number, text, structure
  • Metadata: title, author, subject, keywords
  • Fingerprint: id, pageCount, contentHash, structureHash
  • Classification: type, confidence
  • Match: page, context, startIndex, endIndex
  • Receipt: id, pageCount, contentHash
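A readonly model of this kind can be sketched with PHP 8.1 readonly promoted properties. The field names below follow the Fingerprint entry above; the property types, the fromArray factory, and the assumption that JSON keys match the property names are illustrative and not confirmed by these notes.

```php
<?php

// Sketch of an immutable model using PHP 8.1 readonly properties.
final class Fingerprint
{
    public function __construct(
        public readonly string $id,
        public readonly int $pageCount,
        public readonly string $contentHash,
        public readonly string $structureHash,
    ) {
    }

    /**
     * Build a Fingerprint from decoded JSON.
     * Assumption: keys in $data match the property names exactly.
     *
     * @param array<string, mixed> $data
     */
    public static function fromArray(array $data): self
    {
        return new self(
            id: (string) $data['id'],
            pageCount: (int) $data['pageCount'],
            contentHash: (string) $data['contentHash'],
            structureHash: (string) $data['structureHash'],
        );
    }
}
```

Once constructed, any attempt to assign to a property of the object throws an Error, so values decoded from the subprocess output cannot be changed by calling code.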

Next Steps (for v1.1+ release)

  1. Initialize github.com/jedarden/pdftract-php repository (separate repo)
  2. Push PHP SDK files to the new repo
  3. Test with composer install && vendor/bin/phpunit
  4. Sync Argo workflow to jedarden/declarative-config (k8s/iad-ci/argo-workflows/)
  5. Create first release tag to trigger Packagist auto-discovery

Known Limitations

  • PHP 8.2 is not installed on this development system, so vendor/bin/phpunit cannot be run locally.
  • The conformance tests are defined but have not been executed in this environment.
  • Most files were generated by the workflow; their syntax was checked by inspection only, not by running the PHP interpreter.

References

  • Plan section: SDK Architecture / The Ten SDKs, line 3479
  • Plan section: SDK Architecture / Per-SDK Release Channels, line 3576 (Packagist auto-discovery)
  • Plan section: SDK Acceptance Criteria, lines 3581-3589
  • ADR-009: Argo Workflows on iad-ci only
  • PSR-3 LoggerInterface spec