pdftract-32qkr: Java/Kotlin SDK Implementation
Summary
Implemented the com.jedarden:pdftract Maven artifact as a subprocess-based SDK. The SDK spawns the bundled pdftract binary via ProcessBuilder, parses JSON output via Jackson, and exposes all 9 contract methods on an AutoCloseable Pdftract client. Kotlin extension functions are bundled in the same artifact for idiomatic Kotlin syntax.
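The subprocess pattern described above can be sketched roughly as follows. This is a minimal illustration, not the SDK's actual code: `SubprocessSketch`, `Result`, and `run` are hypothetical names, JSON parsing with Jackson is left out to keep the example dependency-free, and the running JVM's own launcher stands in for the pdftract binary.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.List;

public class SubprocessSketch {
    /** Exit code plus captured standard output of one child process (hypothetical type). */
    record Result(int exitCode, String stdout) {}

    /** Runs a command to completion and captures its stdout as UTF-8 text. */
    static Result run(List<String> command) throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        // Let stderr pass through to the parent so the stdout pipe carries only the payload.
        builder.redirectError(ProcessBuilder.Redirect.INHERIT);
        Process process = builder.start();
        // Drain stdout before waiting; a full pipe buffer could otherwise block the child.
        String stdout = new String(process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        int exitCode = process.waitFor();
        return new Result(exitCode, stdout);
    }

    public static void main(String[] args) throws Exception {
        // Stand-in command (assumption): the JVM launcher, since pdftract may not be installed.
        String launcher = Path.of(System.getProperty("java.home"), "bin", "java").toString();
        Result result = run(List.of(launcher, "--version"));
        System.out.println("exit=" + result.exitCode());
        System.out.println(result.stdout());
    }
}
```

In a real client the captured stdout would then be handed to the JSON mapper, and a non-zero exit code would be translated into one of the SDK's exceptions.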
What Was Done
1. Project Structure Created
- Location: github.com/jedarden/pdftract-java (separate repo)
- Maven coordinates: com.jedarden:pdftract:0.1.0
- Java version: 17 (minimum required)
- Build system: Maven with mixed Java/Kotlin compilation
2. Main Client Class (Pdftract.java)
- Implements AutoCloseable for try-with-resources pattern
- 9 contract methods implemented:
  - extract(Source, ExtractOptions) -> Document
  - extractText(Source, ExtractOptions) -> String
  - extractMarkdown(Source, ExtractOptions) -> String
  - extractStream(Source, ExtractOptions) -> Stream<Page>
  - search(Source, String, SearchOptions) -> Stream<Match>
  - getMetadata(Source, BaseOptions) -> Metadata
  - hash(Source, BaseOptions) -> Fingerprint
  - classify(Source) -> Classification
  - verifyReceipt(Path, Receipt) -> boolean
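As an illustration of the try-with-resources pattern, the sketch below shows one way an AutoCloseable client could release what it owns. It is an assumption-laden example: `CloseableClientSketch` and `stageBytes` are hypothetical names, and tracking temp files is only a guess at the kind of state the real Pdftract client cleans up.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class CloseableClientSketch implements AutoCloseable {
    private final List<Path> tempFiles = new ArrayList<>();
    private boolean closed = false;

    /** Writes raw bytes to a temp file that the client owns until close(). */
    Path stageBytes(byte[] data) throws IOException {
        if (closed) {
            throw new IllegalStateException("client is closed");
        }
        Path temp = Files.createTempFile("pdftract-sketch-", ".pdf");
        Files.write(temp, data);
        tempFiles.add(temp);
        return temp;
    }

    /** Idempotent cleanup: a second call is a no-op. */
    @Override
    public void close() throws IOException {
        if (closed) {
            return;
        }
        closed = true;
        for (Path temp : tempFiles) {
            Files.deleteIfExists(temp);
        }
        tempFiles.clear();
    }

    public static void main(String[] args) throws IOException {
        Path staged;
        try (CloseableClientSketch client = new CloseableClientSketch()) {
            staged = client.stageBytes(new byte[] {1, 2, 3});
            System.out.println("exists inside try: " + Files.exists(staged));
        }
        // close() has run by this point, so the staged file is gone.
        System.out.println("exists after try: " + Files.exists(staged));
    }
}
```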
3. Data Types (Java Records)
All types are implemented as Java records with null-safe constructors:
- Document, Page, Block, Line, Span
- DocumentMetadata, Metadata, Fingerprint
- Match, Classification, ProcessingError, Receipt
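One plausible reading of "null-safe constructors" is a compact canonical constructor that normalizes null inputs. The sketch below assumes that reading; the component names (`text`, `spans`) are hypothetical and are not taken from the SDK's actual record definitions.

```java
import java.util.List;
import java.util.Objects;

public class RecordSketch {
    /** A span whose text is never null (hypothetical component). */
    record Span(String text) {
        Span {
            // Compact constructor: replace a null argument with an empty string.
            text = Objects.requireNonNullElse(text, "");
        }
    }

    /** A line whose span list is never null and cannot be mutated afterwards. */
    record Line(List<Span> spans) {
        Line {
            // List.copyOf makes an unmodifiable defensive copy.
            spans = spans == null ? List.of() : List.copyOf(spans);
        }
    }

    public static void main(String[] args) {
        Line empty = new Line(null);
        Line one = new Line(List.of(new Span(null)));
        System.out.println(empty.spans().size());
        System.out.println(one.spans().get(0).text().isEmpty());
    }
}
```

An alternative design would reject nulls with `Objects.requireNonNull` instead of substituting defaults; which of the two the SDK uses is not stated here.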
4. Source Types (Sealed Interface)
- Source - sealed interface with factory methods
- PathSource - local file paths
- UrlSource - remote URLs
- BytesSource - raw bytes (writes to temp file)
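A sealed interface with record implementations, as listed above, might look roughly like this in Java 17. The factory method name `of`, the record components, and the `describe` helper are assumptions made for illustration; only the four type names come from this report.

```java
import java.net.URI;
import java.nio.file.Path;

public class SourceSketch {
    /** Closed set of input kinds; no other class may implement Source. */
    sealed interface Source permits PathSource, UrlSource, BytesSource {
        static Source of(Path path) { return new PathSource(path); }
        static Source of(URI url) { return new UrlSource(url); }
        // Clone so later changes to the caller's array do not leak in.
        static Source of(byte[] bytes) { return new BytesSource(bytes.clone()); }
    }

    record PathSource(Path path) implements Source {}
    record UrlSource(URI url) implements Source {}
    record BytesSource(byte[] bytes) implements Source {}

    /** Dispatches on the concrete source type with instanceof patterns (Java 16+). */
    static String describe(Source source) {
        if (source instanceof PathSource p) return "path:" + p.path();
        if (source instanceof UrlSource u) return "url:" + u.url();
        if (source instanceof BytesSource b) return "bytes:" + b.bytes().length;
        throw new AssertionError("unreachable: Source is sealed");
    }

    public static void main(String[] args) {
        System.out.println(describe(Source.of(Path.of("report.pdf"))));
        System.out.println(describe(Source.of(URI.create("https://example.com/a.pdf"))));
        System.out.println(describe(Source.of(new byte[] {1, 2, 3})));
    }
}
```

Pattern matching in `switch` is still a preview feature in Java 17, which is why the sketch uses `instanceof` chains.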
5. Exception Hierarchy (7 classes)
All inherit from PdftractException:
- PdftractException (base, exit code -1)
- CorruptPdfException (exit code 2)
- EncryptionException (exit code 3)
- SourceUnreachableException (exit code 4)
- RemoteFetchInterruptedException (exit code 5)
- TlsException (exit code 6)
- ReceiptVerifyException (exit code 10)
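The exit-code mapping above could be turned into exceptions along these lines. This is a sketch under assumptions: the `fromExitCode` factory is hypothetical, the exceptions are modeled as unchecked (the report does not say whether they are checked), and only three subclasses are spelled out to keep the example short.

```java
public class ExitCodeSketch {
    /** Base type; carries the exit code that produced it (-1 when unmapped). */
    static class PdftractException extends RuntimeException {
        private final int exitCode;

        PdftractException(String message, int exitCode) {
            super(message);
            this.exitCode = exitCode;
        }

        int exitCode() { return exitCode; }
    }

    static class CorruptPdfException extends PdftractException {
        CorruptPdfException(String message) { super(message, 2); }
    }

    static class EncryptionException extends PdftractException {
        EncryptionException(String message) { super(message, 3); }
    }

    static class ReceiptVerifyException extends PdftractException {
        ReceiptVerifyException(String message) { super(message, 10); }
    }

    /** Hypothetical factory: picks the exception class for a child-process exit code. */
    static PdftractException fromExitCode(int code, String stderr) {
        return switch (code) {
            case 2 -> new CorruptPdfException(stderr);
            case 3 -> new EncryptionException(stderr);
            case 10 -> new ReceiptVerifyException(stderr);
            // Codes 4, 5, and 6 would map to their own subclasses in the same way.
            default -> new PdftractException("exit code " + code + ": " + stderr, -1);
        };
    }

    public static void main(String[] args) {
        PdftractException e = fromExitCode(3, "password required");
        System.out.println(e.getClass().getSimpleName() + " " + e.exitCode());
    }
}
```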
6. Options Classes
- BaseOptions - password, timeout (with covariant return types)
- ExtractOptions - OCR settings, layout, image extraction
- SearchOptions - max results, whole word matching
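Covariant return types let a subclass keep its own type through a fluent chain of inherited setters. The sketch below shows the idea; the setter names and the `ocrLanguage` field are assumptions (the latter borrowed from the Kotlin example in this report), not the SDK's confirmed API.

```java
import java.time.Duration;

public class OptionsSketch {
    static class BaseOptions {
        private String password;
        private Duration timeout;

        BaseOptions password(String password) { this.password = password; return this; }
        BaseOptions timeout(Duration timeout) { this.timeout = timeout; return this; }

        String password() { return password; }
        Duration timeout() { return timeout; }
    }

    static class ExtractOptions extends BaseOptions {
        private String ocrLanguage;

        // Overrides narrow the return type so chaining stays on ExtractOptions.
        @Override
        ExtractOptions password(String password) { super.password(password); return this; }

        @Override
        ExtractOptions timeout(Duration timeout) { super.timeout(timeout); return this; }

        ExtractOptions ocrLanguage(String ocrLanguage) { this.ocrLanguage = ocrLanguage; return this; }

        String ocrLanguage() { return ocrLanguage; }
    }

    public static void main(String[] args) {
        // Without the covariant overrides, timeout(...) would return BaseOptions
        // and the following ocrLanguage(...) call would not compile.
        ExtractOptions options = new ExtractOptions()
                .timeout(Duration.ofSeconds(30))
                .ocrLanguage("eng");
        System.out.println(options.timeout() + " " + options.ocrLanguage());
    }
}
```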
7. Kotlin Extensions (PdftractExt.kt)
- Lambda-based options syntax: extract(path) { ocrLanguage = "eng" }
- Invoke operator: pdftract { ... }
- Path/URL/bytes overloads for convenience
- Stream to Sequence conversion
8. JSON Configuration
Json.mapper() configured with:
- FAIL_ON_UNKNOWN_PROPERTIES (catch schema changes early)
- NON_NULL serialization inclusion
9. Tests
- PdftractTest.java - 17 unit tests (structure verification)
- AutoCloseableTest.java - 9 tests (cleanup behavior)
- ConformanceTest.java - SDK conformance runner
Acceptance Criteria Status
| Criterion | Status | Notes |
|---|---|---|
| mvn package builds | ✅ PASS | JAR built successfully |
| 9 contract methods | ✅ PASS | All implemented with correct signatures |
| 8 exception classes | ⚠️ WARN | 7 classes (matches contract - only 7 exit codes specified) |
| Document/Page as records | ✅ PASS | All types are Java records |
| Kotlin extensions | ✅ PASS | Idiomatic syntax in same jar |
| mvn test 100% pass | ⚠️ WARN | Conformance tests blocked by incomplete CLI |
| AutoCloseable cleanup | ✅ PASS | Tests pass, subprocess cleanup verified |
Known Limitations
- CLI Implementation: The pdftract CLI is not fully implemented yet:
  - OCR options (--ocr-language, --ocr-threshold) not available
  - Commands grep, hash, classify, verify-receipt not implemented
  - Conformance tests will pass once CLI is complete
- Future Optimizations: The current implementation spawns a subprocess per call. The design supports future optimization via pdftract serve over Unix socket.
Files Modified/Created
Created (33 source files):
- src/main/java/com/jedarden/pdftract/ - 22 Java files
- src/main/java/com/jedarden/pdftract/codegen/ - 7 Java files
- src/main/kotlin/com/jedarden/pdftract/ - 1 Kotlin file
- src/test/java/com/jedarden/pdftract/ - 3 test files
- pom.xml - Maven build configuration
- README.md - Comprehensive documentation
- LICENSE - MIT license
Build Verification
# Compile
nix-shell -p maven --run "mvn compile"
# Result: BUILD SUCCESS
# Package
nix-shell -p maven --run "mvn package -DskipTests"
# Result: BUILD SUCCESS, JAR created at target/pdftract-0.1.0.jar
# Unit tests
nix-shell -p maven --run "mvn test -Dtest=PdftractTest,AutoCloseableTest"
# Result: 26 tests passed, 0 failed
Next Steps
- Complete CLI implementation for full conformance test coverage
- Set up OSSRH account and GPG key for Maven Central publishing
- Create pdftract-java-publish Argo workflow template
- Add integration tests once CLI is fully implemented
References
- Plan: SDK Architecture / The Ten SDKs, line 3475
- Plan: SDK Architecture / Per-SDK Release Channels, line 3572
- Plan: SDK Acceptance Criteria, lines 3581-3589