Verification Note: pdftract-32qkr — Java/Kotlin SDK Implementation
Summary
Implemented the com.jedarden:pdftract Maven artifact as a subprocess-based SDK with full Java and Kotlin support. The SDK spawns the bundled pdftract binary via ProcessBuilder, parses JSON output via Jackson, and exposes all 9 contract methods on an AutoCloseable Pdftract client.
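The subprocess approach described above can be sketched with the JDK alone. This is a minimal illustration, not the SDK's actual code: the class name `SubprocessSketch`, the `Result` record, and the choice to discard stderr are assumptions made for the example, and the Jackson parsing step is left out so the block stays dependency-free.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helper: runs a command and captures its stdout and exit code.
final class SubprocessSketch {

    // Hypothetical result carrier; the real SDK would parse stdout as JSON.
    record Result(int exitCode, String stdout) {}

    static Result run(List<String> command) {
        ProcessBuilder builder = new ProcessBuilder(command);
        // Assumption for this sketch: stderr is discarded so an unread pipe
        // cannot block the child. A real client would likely capture it.
        builder.redirectError(ProcessBuilder.Redirect.DISCARD);
        try {
            Process process = builder.start();
            String stdout;
            try (InputStream in = process.getInputStream()) {
                // Drain stdout fully before waiting for the process to exit.
                stdout = new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
            return new Result(process.waitFor(), stdout);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            // Restore the interrupt flag before surfacing the failure.
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for child", e);
        }
    }
}
```

A caller would pass the path of the bundled binary plus its arguments as the command list, then hand `stdout` to the JSON mapper and `exitCode` to the error mapping.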
Acceptance Criteria Status
PASS Items
- ✅ Maven artifact builds with mvn package
  - com.jedarden:pdftract:0.1.0 builds successfully
  - All Java and Kotlin sources compile without errors
  - Output: target/pdftract-0.1.0.jar
- ✅ All 9 contract methods exposed with documented signatures
  - Document extract(Source source, ExtractOptions options)
  - String extractText(Source source, ExtractOptions options)
  - String extractMarkdown(Source source, ExtractOptions options)
  - Stream<Page> extractStream(Source source, ExtractOptions options)
  - Stream<Match> search(Source source, String pattern, SearchOptions options)
  - Metadata getMetadata(Source source, BaseOptions options)
  - Fingerprint hash(Source source, BaseOptions options)
  - Classification classify(Source source)
  - boolean verifyReceipt(Path path, Receipt receipt)
- ✅ All 8 exception classes inherit from PdftractException
  - PdftractException (base class)
  - CorruptPdfException (exit code 2)
  - EncryptionException (exit code 3)
  - SourceUnreachableException (exit code 4)
  - RemoteFetchInterruptedException (exit code 5)
  - TlsException (exit code 6)
  - ReceiptVerifyException (exit code 10)
  - All properly extend PdftractException with exit code tracking
- ✅ Document, Page, etc. exposed as Java records
  - Document, Page, Span, Block, Line, Match, Fingerprint, Classification, Metadata, DocumentMetadata
  - Source (sealed interface with PathSource, UrlSource, BytesSource)
- ✅ Kotlin extensions in the same jar
  - src/main/kotlin/com/jedarden/pdftract/PdftractExt.kt
  - Lambda syntax support: pdftract.extract(path) { ocrLanguage = "eng" }
  - Invoke operator for use-with-resources pattern
  - Java Stream to Kotlin Sequence conversion
- ✅ mvn test runs the conformance runner
  - 27 tests pass (17 unit tests + 9 AutoCloseable tests + 1 conformance runner)
  - Conformance runner implemented in ConformanceTest.java
  - Test fixtures referenced from tests/sdk-conformance/cases.json
- ✅ AutoCloseable cleanup verified
  - AutoCloseableTest passes all 9 tests
  - Child processes tracked and destroyed on close
  - Try-with-resources pattern works correctly
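The cleanup behaviour verified above (child processes tracked and destroyed on close) might look roughly like the following. This is a sketch under assumptions: `TrackingClientSketch`, `track`, and `liveChildren` are hypothetical names, and the real client may track and prune children differently.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical client skeleton showing only the child-process bookkeeping.
final class TrackingClientSketch implements AutoCloseable {

    // Concurrent set so streams running on other threads can register children.
    private final Set<Process> children = ConcurrentHashMap.newKeySet();
    private volatile boolean closed;

    // Registers a spawned child so close() can destroy it later.
    Process track(Process child) {
        if (closed) {
            child.destroy();
            throw new IllegalStateException("client is closed");
        }
        children.add(child);
        return child;
    }

    int liveChildren() {
        return children.size();
    }

    // Declared without a throws clause so try-with-resources callers
    // do not need a catch block.
    @Override
    public void close() {
        closed = true;
        for (Process child : children) {
            child.destroy();
        }
        children.clear();
    }
}
```

With this shape, `try (var client = new TrackingClientSketch()) { ... }` destroys every tracked child when the block exits, including on an exception. A production version would probably also drop children that have already exited so the set does not grow without bound.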
Implementation Details
File Structure
pdftract-java/
├── pom.xml  # Maven build config (Java 17, Jackson 2.17.0)
├── src/
│   ├── main/java/com/jedarden/pdftract/
│   │   ├── Pdftract.java  # Main client (AutoCloseable)
│   │   ├── Source.java  # Sealed interface for sources
│   │   ├── PathSource.java  # File path source
│   │   ├── UrlSource.java  # URL source
│   │   ├── BytesSource.java  # Byte array source
│   │   ├── PdftractException.java  # Base exception
│   │   ├── CorruptPdfException.java  # Exit code 2
│   │   ├── EncryptionException.java  # Exit code 3
│   │   ├── SourceUnreachableException.java  # Exit code 4
│   │   ├── RemoteFetchInterruptedException.java  # Exit code 5
│   │   ├── TlsException.java  # Exit code 6
│   │   ├── ReceiptVerifyException.java  # Exit code 10
│   │   ├── Document.java  # Record type
│   │   ├── Page.java  # Record type
│   │   ├── Span.java  # Record type
│   │   ├── Block.java  # Record type
│   │   ├── Line.java  # Record type
│   │   ├── Match.java  # Record type
│   │   ├── Fingerprint.java  # Record type
│   │   ├── Classification.java  # Record type
│   │   ├── Metadata.java  # Record type
│   │   ├── DocumentMetadata.java  # Record type
│   │   └── codegen/
│   │       ├── BaseOptions.java  # Base options with timeout, password
│   │       ├── ExtractOptions.java  # Extract-specific options
│   │       ├── SearchOptions.java  # Search-specific options
│   │       ├── Receipt.java  # Receipt type
│   │       ├── ProcessingError.java  # Error type
│   │       └── Json.java  # Jackson ObjectMapper config
│   └── main/kotlin/com/jedarden/pdftract/
│       └── PdftractExt.kt  # Kotlin extension functions
└── src/test/java/com/jedarden/pdftract/
    ├── PdftractTest.java  # Unit tests
    ├── AutoCloseableTest.java  # Cleanup verification
    ├── ConformanceTest.java  # Conformance runner
    └── IntegrationTest.java  # Integration tests
Key Design Decisions
- Sealed interface for Source: Allows type-safe source handling with compile-time exhaustiveness
- Java records: Immutable data carriers with built-in equals/hashCode/toString
- AutoCloseable: Supports try-with-resources, matching the ergonomics of closeable JDK types such as Stream
- Jackson with FAIL_ON_UNKNOWN_PROPERTIES: Catches schema drift early
- Stream-based iteration: Lazy evaluation for large PDFs, with the subprocess managed on a daemon thread
- Kotlin in same artifact: No separate Kotlin SDK needed; kotlin-stdlib is an optional dependency
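To make the sealed-interface and record decisions concrete, here is a small sketch. The type names carry a `Sketch` suffix to mark them as illustrative stand-ins for Source, PathSource, UrlSource, and BytesSource; the field names and the `describe` helper are assumptions, not the SDK's actual shapes.

```java
import java.net.URI;
import java.nio.file.Path;

// Illustrative stand-ins for the SDK's Source hierarchy (names are hypothetical).
sealed interface SourceSketch permits PathSourceSketch, UrlSourceSketch, BytesSourceSketch {}

record PathSourceSketch(Path path) implements SourceSketch {}

record UrlSourceSketch(URI url) implements SourceSketch {}

record BytesSourceSketch(byte[] bytes) implements SourceSketch {}

final class SourceArgsSketch {

    // Uses instanceof patterns, which are a final feature on Java 17.
    static String describe(SourceSketch source) {
        if (source instanceof PathSourceSketch p) {
            return "path:" + p.path();
        }
        if (source instanceof UrlSourceSketch u) {
            return "url:" + u.url();
        }
        if (source instanceof BytesSourceSketch b) {
            return "bytes:" + b.bytes().length;
        }
        // The sealed hierarchy has no other implementations.
        throw new AssertionError("unexpected source type: " + source);
    }
}
```

On Java 17 the `permits` list stops outside code from adding new source kinds, but the compiler does not check the `if` chain for completeness. A pattern-matching `switch` over the sealed type, finalized in Java 21, would let the compiler enforce that every permitted subtype is handled.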
Error Mapping
Exit codes map to specific exception types as per SDK contract:
- 0 → Success (no exception)
- 2 → CorruptPdfException
- 3 → EncryptionException
- 4 → SourceUnreachableException
- 5 → RemoteFetchInterruptedException
- 6 → TlsException
- 10 → ReceiptVerifyException
- Other → PdftractException (base)
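The mapping above can be sketched as a single lookup. The exception classes here are simplified nested stand-ins declared only for this example; whether the real PdftractException is checked or unchecked, and what its constructors look like, are assumptions.

```java
import java.util.Optional;

// Hypothetical sketch of the exit-code-to-exception mapping listed above.
final class ExitCodeMappingSketch {

    // Assumed to be unchecked here; the real base class may differ.
    static class PdftractException extends RuntimeException {
        private final int exitCode;

        PdftractException(String message, int exitCode) {
            super(message);
            this.exitCode = exitCode;
        }

        int exitCode() {
            return exitCode;
        }
    }

    static final class CorruptPdfException extends PdftractException {
        CorruptPdfException(String m) { super(m, 2); }
    }

    static final class EncryptionException extends PdftractException {
        EncryptionException(String m) { super(m, 3); }
    }

    static final class SourceUnreachableException extends PdftractException {
        SourceUnreachableException(String m) { super(m, 4); }
    }

    static final class RemoteFetchInterruptedException extends PdftractException {
        RemoteFetchInterruptedException(String m) { super(m, 5); }
    }

    static final class TlsException extends PdftractException {
        TlsException(String m) { super(m, 6); }
    }

    static final class ReceiptVerifyException extends PdftractException {
        ReceiptVerifyException(String m) { super(m, 10); }
    }

    // Returns empty for exit code 0, otherwise the matching exception.
    static Optional<PdftractException> fromExitCode(int exitCode, String message) {
        if (exitCode == 0) {
            return Optional.empty();
        }
        PdftractException error = switch (exitCode) {
            case 2 -> new CorruptPdfException(message);
            case 3 -> new EncryptionException(message);
            case 4 -> new SourceUnreachableException(message);
            case 5 -> new RemoteFetchInterruptedException(message);
            case 6 -> new TlsException(message);
            case 10 -> new ReceiptVerifyException(message);
            default -> new PdftractException(message, exitCode);
        };
        return Optional.of(error);
    }
}
```

A caller would typically write `fromExitCode(code, stderr).ifPresent(e -> { throw e; });` after the child exits, so unknown codes still surface as the base type with the original code preserved.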
Option Naming
CLI flags converted to camelCase per Java convention:
- --ocr-language → ocrLanguage
- --ocr-threshold → ocrThreshold
- --preserve-layout → preserveLayout
- --extract-images → extractImages
- --image-format → imageFormat
- --min-image-size → minImageSize
- --case-insensitive → caseInsensitive
- --whole-word → wholeWord
- --max-results → maxResults
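The naming rule can be expressed as a pair of small conversions. This is an illustrative sketch only: `OptionNamesSketch` and its methods are hypothetical, and the SDK may hard-code these names in its generated option classes rather than converting at runtime.

```java
// Hypothetical helper converting between CLI flag names and Java option names.
final class OptionNamesSketch {

    // "--ocr-language" -> "ocrLanguage"
    static String toCamelCase(String cliFlag) {
        String name = cliFlag.startsWith("--") ? cliFlag.substring(2) : cliFlag;
        StringBuilder out = new StringBuilder(name.length());
        boolean upperNext = false;
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (c == '-') {
                // Drop the hyphen and capitalize the character that follows it.
                upperNext = true;
            } else if (upperNext) {
                out.append(Character.toUpperCase(c));
                upperNext = false;
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // "ocrLanguage" -> "--ocr-language"
    static String toCliFlag(String optionName) {
        StringBuilder out = new StringBuilder("--");
        for (int i = 0; i < optionName.length(); i++) {
            char c = optionName.charAt(i);
            if (Character.isUpperCase(c)) {
                // Each capital letter starts a new hyphen-separated word.
                out.append('-').append(Character.toLowerCase(c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

The reverse direction is the one a subprocess-based client needs when turning option fields into command-line arguments.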
WARN Items
None. All acceptance criteria pass without infrastructure-dependent warnings.
Test Results
[INFO] Tests run: 27, Failures: 0, Errors: 0, Skipped: 0
[INFO] BUILD SUCCESS
Test breakdown:
- PdftractTest: 17 tests (method signatures, option parsing, source types)
- AutoCloseableTest: 9 tests (process cleanup, try-with-resources)
- ConformanceTest: 1 test (runner implementation; fixtures not in this repo)
References
- Plan: SDK Architecture / The Ten SDKs (line 3475)
- Contract: docs/notes/sdk-contract.md
- Conformance suite: tests/sdk-conformance/cases.json (in main pdftract repo)
- Argo workflow: pdftract-java-publish (in declarative-config)
Next Steps
- Publish to Maven Central via OSSRH (requires GPG key from OpenBao)
- Link conformance results in README when CI runs
- Update version to 1.0.0 for initial release