Verification Note: pdftract-2rc4

Summary

Verified the JSON Schema generation and migration tooling for pdftract v1.0 and added one missing enum constraint (see Files Modified).

Acceptance Criteria Status

PASS Criteria

  1. Schema exists and validates as JSON Schema 2020-12

    • File: docs/schema/v1.0/pdftract.schema.json (73,034 bytes)
    • Generated from Rust types using schemars derive
    • Contains all required fields: page_index, page_number, page_label, width, height, rotation, page_type
  2. page_type enum includes broken_vector

    $ grep -A 10 '"broken_vector"' docs/schema/v1.0/pdftract.schema.json
    

    Confirmed enum values: text, scanned, mixed, broken_vector, blank, figure_only

  3. attachments data field carries contentEncoding: base64

    $ grep -B 5 -A 5 'contentEncoding.*base64' docs/schema/v1.0/pdftract.schema.json
    

    Confirmed contentEncoding: base64 on AttachmentJson.data field

  4. xtask validate-schema regenerates and diffs cleanly

    $ cargo run --manifest-path=xtask/Cargo.toml --bin xtask validate-schema
    ✓ Schema is up-to-date: /home/coding/pdftract/docs/schema/v1.0/pdftract.schema.json
    
  5. Migration tool runs end-to-end

    $ echo '{"schema_version": "1.0", "test": "value"}' | ./target/release/migrate-schema --from 1.0 --to 1.0
    {"schema_version":"1.0","test":"value"}
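Criterion 3 means consumers should expect AttachmentJson.data to hold the attachment bytes as a base64 string rather than as raw bytes or a number array. A minimal, dependency-free sketch of that encoding follows, assuming the standard RFC 4648 alphabet with `=` padding (the schema annotation alone does not pin down the alphabet). The function name `encode_base64` is hypothetical; pdftract's own serializer may well use a crate instead.

```rust
// Standard base64 alphabet (RFC 4648, section 4); assumed, not confirmed by the schema.
const ALPHABET: &[u8; 64] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Hypothetical helper: encodes raw attachment bytes as a padded base64 string,
/// the form a field annotated with `contentEncoding: base64` is expected to hold.
fn encode_base64(data: &[u8]) -> String {
    let mut out = String::with_capacity((data.len() + 2) / 3 * 4);
    for chunk in data.chunks(3) {
        // Pack up to three bytes into one 24-bit group, zero-filling missing bytes.
        let b0 = chunk[0] as u32;
        let b1 = chunk.get(1).copied().unwrap_or(0) as u32;
        let b2 = chunk.get(2).copied().unwrap_or(0) as u32;
        let group = (b0 << 16) | (b1 << 8) | b2;

        // A chunk of n bytes yields n + 1 real characters; the rest become '='.
        for i in 0..4 {
            if i <= chunk.len() {
                let index = (group >> (18 - 6 * i)) & 0x3F;
                out.push(ALPHABET[index as usize] as char);
            } else {
                out.push('=');
            }
        }
    }
    out
}
```

For example, the four bytes `%PDF` encode to `JVBERg==`, so a schema-valid attachment payload is always a plain ASCII string.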
    

WARN Criteria

None. All infrastructure components are in place and functional.

Files Modified

  • xtask/src/main.rs: added the missing SpanJson.confidence_source enum constraint to the add_enum_constraints function

Infrastructure Components

  1. Schema Generator: xtask/src/bin/gen_schema.rs

    • Generates JSON Schema from Rust types
    • Uses schemars crate with JSON Schema 2020-12 dialect
    • Adds explicit enum constraints for stability
    • Sorts keys recursively for deterministic output
  2. Schema Validator: xtask/src/main.rs::validate_schema()

    • Regenerates schema in memory
    • Compares byte-for-byte with checked-in version
    • Fails build on drift (CI gate)
  3. Migration Library: crates/pdftract-schema-migrate/src/lib.rs

    • MigrationRegistry with version-pair migrations
    • Identity migration for v1.0 -> v1.0
    • Validates migration direction (no downgrades, no major version changes)
  4. Migration CLI: crates/pdftract-schema-migrate/src/bin/migrate-schema.rs

    • CLI tool for running migrations
    • Supports stdin/stdout and file I/O
    • Auto-detects pretty-printing for terminals
  5. Validation Tests: tests/schema/validate_fixtures.rs

    • Validates fixture outputs against schema
    • Generates expected.json on first run
    • Tests individual fixtures and full suite
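Items 1 and 2 above depend on each other: a byte-for-byte drift check is only meaningful if generation is deterministic, which is why the generator sorts keys recursively. The sketch below shows both steps on a tiny hand-rolled JSON type. Everything in it (`Json`, `sort_keys`, `canonicalize`, `check_drift`) is hypothetical and stands in for the schemars output the real xtask works with; string escaping is deliberately simplified.

```rust
/// Minimal JSON value for this sketch only; objects keep insertion order.
#[derive(Debug, Clone)]
enum Json {
    Null,
    Bool(bool),
    Num(i64),
    Str(String),
    Array(Vec<Json>),
    Object(Vec<(String, Json)>),
}

impl Json {
    /// Recursively sorts object keys so output never depends on insertion order.
    fn sort_keys(&mut self) {
        match self {
            Json::Array(items) => items.iter_mut().for_each(Json::sort_keys),
            Json::Object(fields) => {
                fields.sort_by(|a, b| a.0.cmp(&b.0));
                for (_, value) in fields.iter_mut() {
                    value.sort_keys();
                }
            }
            _ => {}
        }
    }

    /// Compact serialization (no whitespace).
    fn write_to(&self, out: &mut String) {
        match self {
            Json::Null => out.push_str("null"),
            Json::Bool(b) => out.push_str(if *b { "true" } else { "false" }),
            Json::Num(n) => out.push_str(&n.to_string()),
            Json::Str(s) => write_str(s, out),
            Json::Array(items) => {
                out.push('[');
                for (i, item) in items.iter().enumerate() {
                    if i > 0 {
                        out.push(',');
                    }
                    item.write_to(out);
                }
                out.push(']');
            }
            Json::Object(fields) => {
                out.push('{');
                for (i, (key, value)) in fields.iter().enumerate() {
                    if i > 0 {
                        out.push(',');
                    }
                    write_str(key, out);
                    out.push(':');
                    value.write_to(out);
                }
                out.push('}');
            }
        }
    }
}

/// Simplified string writer: escapes only `"` and `\`.
fn write_str(s: &str, out: &mut String) {
    out.push('"');
    for c in s.chars() {
        if c == '"' || c == '\\' {
            out.push('\\');
        }
        out.push(c);
    }
    out.push('"');
}

/// Sort, then serialize: equal content always yields identical bytes.
fn canonicalize(mut value: Json) -> String {
    value.sort_keys();
    let mut out = String::new();
    value.write_to(&mut out);
    out
}

/// Byte-for-byte comparison of the regenerated schema with the checked-in copy.
fn check_drift(regenerated: &str, checked_in: &str) -> Result<(), String> {
    if regenerated.as_bytes() == checked_in.as_bytes() {
        return Ok(());
    }
    // First differing byte, or the shorter length when one input is a prefix of the other.
    let offset = regenerated
        .bytes()
        .zip(checked_in.bytes())
        .position(|(a, b)| a != b)
        .unwrap_or(regenerated.len().min(checked_in.len()));
    Err(format!("schema drift at byte {offset}"))
}
```

In an xtask, the checked-in side would come from `std::fs::read_to_string` on the schema path, and an `Err` would be turned into a non-zero exit code so CI fails on drift.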

Commands

  • Generate schema: cargo run --manifest-path=xtask/Cargo.toml --bin gen_schema
  • Validate schema: cargo run --manifest-path=xtask/Cargo.toml --bin xtask validate-schema
  • Run migration: ./target/release/migrate-schema --from 1.0 --to 1.0 input.json -o output.json

References

  • Line 97 (schema as source of truth)
  • Line 823 (INV-11 schema validation gate)
  • Line 986 (Anti-Pattern: serde_json::Value)
  • Line 1836 (broken_vector enum requirement)
  • Lines 2002-2030 (Phase 6.1 schema deliverable)
  • Line 2640 (attachments base64 encoding)
  • Lines 3230 and 3250 (INV-11 gates in checklists)
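The rules for the migration library (Infrastructure Components, item 3) can be sketched as a registry keyed by (from, to) version pairs, with direction checks applied before any lookup. This is a minimal sketch under stated assumptions: only the name `MigrationRegistry` comes from this note, while `Version`, `Migration`, `identity`, and `migrate` are hypothetical, and documents are passed as JSON text to stay dependency-free. The real CLI evidently parses and re-serializes JSON (criterion 5 prints compact output), which this text pass-through does not do.

```rust
use std::collections::HashMap;

/// Schema version as MAJOR.MINOR; derived Ord compares major first, then minor.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
struct Version {
    major: u32,
    minor: u32,
}

impl Version {
    /// Parses strings such as "1.0".
    fn parse(s: &str) -> Result<Self, String> {
        let (major, minor) = s
            .split_once('.')
            .ok_or_else(|| format!("invalid version: {s}"))?;
        let num = |part: &str| part.parse::<u32>().map_err(|_| format!("invalid version: {s}"));
        Ok(Version { major: num(major)?, minor: num(minor)? })
    }
}

/// Hypothetical migration signature: JSON text in, JSON text out.
type Migration = fn(&str) -> Result<String, String>;

fn identity(doc: &str) -> Result<String, String> {
    Ok(doc.to_string())
}

struct MigrationRegistry {
    migrations: HashMap<(Version, Version), Migration>,
}

impl MigrationRegistry {
    fn new() -> Self {
        let v1_0 = Version { major: 1, minor: 0 };
        let mut migrations: HashMap<(Version, Version), Migration> = HashMap::new();
        // The only pair this note describes: the v1.0 -> v1.0 identity migration.
        migrations.insert((v1_0, v1_0), identity);
        MigrationRegistry { migrations }
    }

    fn migrate(&self, from: &str, to: &str, doc: &str) -> Result<String, String> {
        let (from_v, to_v) = (Version::parse(from)?, Version::parse(to)?);
        // Direction rules: no downgrades, no major version changes.
        if to_v < from_v {
            return Err(format!("downgrade from {from} to {to} is not supported"));
        }
        if to_v.major != from_v.major {
            return Err(format!("major version change from {from} to {to} is not supported"));
        }
        let migration = self
            .migrations
            .get(&(from_v, to_v))
            .copied()
            .ok_or_else(|| format!("no migration registered for {from} -> {to}"))?;
        migration(doc)
    }
}
```

Checking direction before the lookup means a request like 1.1 to 1.0 is reported as a downgrade rather than as a merely missing migration, which gives the CLI a clearer error.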

Verification Date

2026-06-01