- Add Pdftract.swift.tera for main public API with type aliases
- Update Methods.swift.tera with async throws functions and AsyncThrowingStream for streaming
- Update Errors.swift.tera with 8 error types implementing LocalizedError
- Update Types.swift.tera with Source enum, Options structs, and all Codable types
- Update ConformanceTests.swift.tera with XCTest-based conformance suite
- Update README.md.tera with full documentation (install, usage, error handling)
- Update Package.swift.tera with macOS(.v13) and Linux platform support

Closes pdftract-5lvpu
Verification Note: pdftract-2rc4
Summary
Verified and maintained the JSON Schema generation and migration tooling for pdftract v1.0.
Acceptance Criteria Status
PASS Criteria
- Schema exists and validates as JSON Schema 2020-12
  - File: docs/schema/v1.0/pdftract.schema.json (73,034 bytes)
  - Generated from Rust types using schemars derive
  - Contains all required fields: page_index, page_number, page_label, width, height, rotation, page_type
- page_type enum includes broken_vector
    $ grep -A 10 '"broken_vector"' docs/schema/v1.0/pdftract.schema.json
  Confirmed enum values: text, scanned, mixed, broken_vector, blank, figure_only
- attachments data field carries contentEncoding: base64
    $ grep -B 5 -A 5 'contentEncoding.*base64' docs/schema/v1.0/pdftract.schema.json
  Confirmed contentEncoding: base64 on AttachmentJson.data field
- xtask validate-schema regenerates and diffs cleanly
    $ cargo run --manifest-path=xtask/Cargo.toml --bin xtask validate-schema
    ✓ Schema is up-to-date: /home/coding/pdftract/docs/schema/v1.0/pdftract.schema.json
- Migration tool runs end-to-end
    $ echo '{"schema_version": "1.0", "test": "value"}' | ./target/release/migrate-schema --from 1.0 --to 1.0
    {"schema_version":"1.0","test":"value"}
WARN Criteria
None - all infrastructure components are in place and functional.
Files Modified
- xtask/src/main.rs: Added the missing SpanJson.confidence_source enum constraint to the add_enum_constraints function
Infrastructure Components
- Schema Generator: xtask/src/bin/gen_schema.rs
  - Generates JSON Schema from Rust types
  - Uses schemars crate with JSON Schema 2020-12 dialect
  - Adds explicit enum constraints for stability
  - Sorts keys recursively for deterministic output
- Schema Validator: xtask/src/main.rs::validate_schema()
  - Regenerates schema in memory
  - Compares byte-for-byte with checked-in version
  - Fails build on drift (CI gate)
- Migration Library: crates/pdftract-schema-migrate/src/lib.rs
  - MigrationRegistry with version-pair migrations
  - Identity migration for v1.0 -> v1.0
  - Validates migration direction (no downgrades, no major version changes)
- Migration CLI: crates/pdftract-schema-migrate/src/bin/migrate-schema.rs
  - CLI tool for running migrations
  - Supports stdin/stdout and file I/O
  - Auto-detects pretty-printing for terminals
- Validation Tests: tests/schema/validate_fixtures.rs
  - Validates fixture outputs against schema
  - Generates expected.json on first run
  - Tests individual fixtures and full suite
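
The generator's recursive key sort and the validator's byte-for-byte comparison fit together: sorting makes regeneration deterministic, which is what makes an exact comparison a usable CI gate. The sketch below illustrates that idea only and is not the code in xtask. The `Json` enum, `sort_keys`, `render`, and `check_drift` are hypothetical names, and the toy value type stands in for whatever JSON representation the real tooling uses.

```rust
/// Toy JSON value used only for this sketch (hypothetical; the real
/// tooling presumably works on its own JSON value type).
#[derive(Debug, Clone, PartialEq)]
pub enum Json {
    Null,
    Str(String),
    Array(Vec<Json>),
    /// Members stay in insertion order until `sort_keys` runs.
    Object(Vec<(String, Json)>),
}

/// Sort object keys at every nesting level so output is deterministic.
pub fn sort_keys(value: &mut Json) {
    match value {
        Json::Array(items) => items.iter_mut().for_each(sort_keys),
        Json::Object(members) => {
            members.sort_by(|a, b| a.0.cmp(&b.0));
            for (_, child) in members.iter_mut() {
                sort_keys(child);
            }
        }
        Json::Null | Json::Str(_) => {}
    }
}

/// Minimal compact serializer. It does no string escaping, which is
/// enough for the sketch but not for real schema output.
pub fn render(value: &Json) -> String {
    match value {
        Json::Null => "null".to_string(),
        Json::Str(s) => format!("\"{}\"", s),
        Json::Array(items) => {
            let parts: Vec<String> = items.iter().map(render).collect();
            format!("[{}]", parts.join(","))
        }
        Json::Object(members) => {
            let parts: Vec<String> = members
                .iter()
                .map(|(k, v)| format!("\"{}\":{}", k, render(v)))
                .collect();
            format!("{{{}}}", parts.join(","))
        }
    }
}

/// Compare regenerated output with the checked-in file byte for byte.
/// On drift, return the offset of the first differing byte.
pub fn check_drift(generated: &str, checked_in: &str) -> Result<(), usize> {
    if generated.as_bytes() == checked_in.as_bytes() {
        return Ok(());
    }
    let offset = generated
        .bytes()
        .zip(checked_in.bytes())
        .position(|(a, b)| a != b)
        .unwrap_or_else(|| generated.len().min(checked_in.len()));
    Err(offset)
}
```

Under these assumptions, two values that differ only in member order render identically after `sort_keys`, so `check_drift` reports a difference only when the schema content itself changed.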
Commands
- Generate schema:
    cargo run --manifest-path=xtask/Cargo.toml --bin gen_schema
- Validate schema:
    cargo run --manifest-path=xtask/Cargo.toml --bin xtask validate-schema
- Run migration:
    ./target/release/migrate-schema --from 1.0 --to 1.0 input.json -o output.json
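
The `--from`/`--to` pair passed to the migration command is subject to the direction rule listed under Migration Library: no downgrades and no major version changes, with 1.0 -> 1.0 accepted as the identity case. A minimal sketch of such a check follows. `Version`, `parse_version`, `DirectionError`, and `check_direction` are hypothetical names chosen for illustration, not the actual API of pdftract-schema-migrate, and the `major.minor` version format is an assumption based on the `1.0` strings shown above.

```rust
/// A `major.minor` schema version (format assumed from strings like "1.0").
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Version {
    pub major: u32,
    pub minor: u32,
}

/// Parse "1.0" into a Version; returns None for anything else.
pub fn parse_version(s: &str) -> Option<Version> {
    let (major, minor) = s.split_once('.')?;
    Some(Version {
        major: major.parse().ok()?,
        minor: minor.parse().ok()?,
    })
}

/// Why a requested migration direction was rejected.
#[derive(Debug, PartialEq, Eq)]
pub enum DirectionError {
    /// The target minor version is older than the source.
    Downgrade,
    /// Source and target differ in major version.
    MajorChange,
}

/// Accept same-major migrations that move forward or stay put.
/// A major mismatch is reported first, so 2.0 -> 1.0 yields MajorChange.
pub fn check_direction(from: Version, to: Version) -> Result<(), DirectionError> {
    if from.major != to.major {
        return Err(DirectionError::MajorChange);
    }
    if to.minor < from.minor {
        return Err(DirectionError::Downgrade);
    }
    Ok(())
}
```

In this sketch the identity case needs no special handling: equal versions pass both checks, which matches the 1.0 -> 1.0 run recorded in the acceptance criteria.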
Related Plan Sections
- Line 97 (schema as source of truth)
- Line 823 (INV-11 schema validation gate)
- Line 986 (Anti-Pattern: serde_json::Value)
- Line 1836 (broken_vector enum requirement)
- Lines 2002-2030 (Phase 6.1 schema deliverable)
- Line 2640 (attachments base64 encoding)
- Lines 3230 and 3250 (INV-11 gates in checklists)
Verification Date
2026-06-01