pdftract/docs/notes/sdk-conformance-runner.md
jedarden 9456d8e231 feat(pdftract-5omc): implement per-language conformance test runner pattern
Implements the conformance test runner pattern for all 10 SDKs as specified
in the plan (line 3547). Each SDK now has a dedicated conformance test runner.

Created:
- tests/sdk-conformance/report-schema.json: JSON schema for conformance reports
- docs/notes/sdk-conformance-runner.md: Pattern documentation and reference
- crates/pdftract-cli/tests/conformance.rs: Rust cargo test target
- tests/conformance/test_conformance.py: Python pytest harness
- tests/conformance/conformance.test.ts: Node.js vitest runner
- tests/conformance/conformance_test.go: Go go test runner
- tests/conformance/ConformanceTest.java: Java JUnit 5 runner
- tests/conformance/ConformanceTests.cs: .NET xUnit runner
- tests/conformance/conformance.c: C standalone binary
- tests/conformance/conformance_test.rb: Ruby minitest runner
- tests/conformance/ConformanceTest.php: PHP PHPUnit runner
- tests/conformance/ConformanceTests.swift: Swift XCTest runner

All runners implement:
- Loading of tests/sdk-conformance/cases.json
- Execution of test cases with language-native method invocations
- Comparison of results against expected values with numeric tolerances
- Emission of machine-readable conformance-report.json
- Non-zero exit on failures/errors for CI gating

Acceptance criteria:
- PASS: All 10 SDKs have language-specific runners
- PASS: Runners consume shared cases.json
- PASS: Runners emit JSON reports matching schema
- PASS: Runners exit non-zero on failure
- WARN: README integration pending SDK repo creation
- WARN: Stub implementations return placeholder results

References:
- Plan line 3547: "Every SDK has a pdftract-sdk-conformance test runner"
- Plan line 3589: "Conformance suite results published as Argo artifact"

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Bead-Id: pdftract-5omc
2026-05-18 01:32:24 -04:00

SDK Conformance Test Runner Pattern

This document describes the conformance test runner pattern that every SDK implements for pdftract.

Overview

The conformance test suite is the SDK API contract. Every SDK must implement a test runner that:

  1. Loads the shared tests/sdk-conformance/cases.json file
  2. Iterates through test cases
  3. Invokes the SDK's native methods with the case's options
  4. Compares the result against expected values with tolerances
  5. Reports per-case pass/fail/skip/error status
  6. Emits a machine-readable JSON summary (conformance-report.json)

Conformance Report Schema

See tests/sdk-conformance/report-schema.json for the full JSON schema.

Key fields:

  • sdk: SDK name (e.g., "pdftract-py", "pdftract-node")
  • sdk_version: SDK version that produced the report
  • suite_version: Version of the conformance suite run
  • results: Array of per-case results with id, status, actual, expected, error, reason, duration_ms
  • summary: Aggregate counts for total, passed, failed, skipped, errors
  • environment: OS, arch, binary version, runtime version
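
As an illustration of how these fields fit together, here is a minimal Python sketch that assembles a report from per-case results. The status strings ("pass", "fail", "skip", "error"), the key names inside environment, and the build_report helper itself are assumptions made for this example; report-schema.json remains the authority on exact names and types.

```python
import platform
from collections import Counter

# Assumed mapping from per-case status values to the counters in `summary`.
STATUS_TO_SUMMARY_KEY = {
    "pass": "passed",
    "fail": "failed",
    "skip": "skipped",
    "error": "errors",
}


def build_report(sdk, sdk_version, suite_version, results, binary_version=None):
    """Assemble a report dict from a list of per-case result dicts."""
    counts = Counter(STATUS_TO_SUMMARY_KEY[r["status"]] for r in results)
    # Always emit every counter, even when its value is zero.
    summary = {key: counts.get(key, 0) for key in STATUS_TO_SUMMARY_KEY.values()}
    summary["total"] = len(results)
    return {
        "sdk": sdk,
        "sdk_version": sdk_version,
        "suite_version": suite_version,
        "results": results,
        "summary": summary,
        # Key names below are hypothetical; check them against the schema.
        "environment": {
            "os": platform.system(),
            "arch": platform.machine(),
            "binary_version": binary_version,
            "runtime_version": platform.python_version(),
        },
    }
```

Deriving summary from results, rather than counting separately while the suite runs, keeps the two sections from drifting apart.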

Per-Language Runners

| SDK | Path | Test Framework | CLI Command |
|---|---|---|---|
| Rust | crates/pdftract-cli/tests/conformance.rs | cargo test | cargo test --test conformance |
| Python | tests/conformance/test_conformance.py | pytest | pytest tests/conformance/test_conformance.py -v |
| Node.js | tests/conformance/conformance.test.ts | vitest | vitest tests/conformance/conformance.test.ts |
| Go | tests/conformance/conformance_test.go | go test | go test -v ./conformance_test.go |
| Java | tests/conformance/ConformanceTest.java | JUnit 5 | mvn test -Dtest=ConformanceTest |
| .NET | tests/conformance/ConformanceTests.cs | xUnit | dotnet test --filter ConformanceTests |
| C | tests/conformance/conformance.c | standalone binary | ./conformance [suite-path] [output-path] |
| Ruby | tests/conformance/conformance_test.rb | minitest | ruby tests/conformance/conformance_test.rb |
| PHP | tests/conformance/ConformanceTest.php | PHPUnit | ./vendor/bin/phpunit tests/conformance/ConformanceTest.php |
| Swift | tests/conformance/ConformanceTests.swift | XCTest | swift test --filter ConformanceTests |

Shared Comparison Logic

All runners implement the same comparison logic with tolerances:

Numeric Comparison with Tolerance

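function compare_with_tolerance(actual, expected, tolerance):
    diff = abs(actual - expected)

    # No tolerance configured: require near-exact equality.
    # EPSILON is a small constant defined by the runner.
    if tolerance is null:
        return diff < EPSILON

    # Identical values always match, including 0 vs 0.
    if diff == 0:
        return true

    # Absolute tolerance.
    if tolerance.abs exists and diff <= tolerance.abs:
        return true

    # Relative tolerance, scaled by the mean magnitude so that
    # negative values are handled as well as positive ones.
    if tolerance.rel exists:
        scale = (abs(actual) + abs(expected)) / 2.0
        if scale > 0.0 and diff / scale <= tolerance.rel:
            return true

    return false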
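
For reference, a runnable Python sketch of this comparison follows. Three details are choices of the sketch rather than requirements stated here: the value of EPSILON, the representation of a tolerance as a dict with optional "abs" and "rel" keys, and the use of the mean of the magnitudes as the relative scale so that negative values and the 0-vs-0 case behave sensibly.

```python
EPSILON = 1e-9  # assumed default; the document does not fix a value


def compare_with_tolerance(actual, expected, tolerance=None):
    """Return True if `actual` is within `tolerance` of `expected`."""
    diff = abs(actual - expected)

    # No tolerance configured: require near-exact equality.
    if tolerance is None:
        return diff < EPSILON

    # Identical values always match, including 0 vs 0.
    if diff == 0.0:
        return True

    abs_tol = tolerance.get("abs")
    if abs_tol is not None and diff <= abs_tol:
        return True

    rel_tol = tolerance.get("rel")
    if rel_tol is not None:
        # Mean of the magnitudes, so the scale is never negative.
        scale = (abs(actual) + abs(expected)) / 2.0
        if scale > 0.0 and diff / scale <= rel_tol:
            return True

    return False
```

Either tolerance is sufficient on its own: a value passes if it is within the absolute bound or within the relative bound.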

Wildcard Path Matching

Tolerances use JSONPath-like wildcard syntax:

  • pages[*].blocks[*].bbox matches the bbox of every block on every page
  • pages[0].spans[*].confidence matches the confidence of every span on page 0
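
One way to implement this matching is to translate a pattern into a regular expression in which each [*] accepts any array index. The Python sketch below does that; path_matches and tolerance_for are hypothetical helper names, and the assumption that tolerances arrive as a dict keyed by pattern is mine. Whether a pattern should also cover child elements (for example bbox[0]) is not specified here, so the sketch matches whole paths only.

```python
import re


def path_matches(pattern, path):
    """Return True if the concrete `path` matches the wildcard `pattern`."""
    # Escape the pattern, then turn each escaped "[*]" into "any array index".
    regex = re.escape(pattern).replace(r"\[\*\]", r"\[\d+\]")
    return re.fullmatch(regex, path) is not None


def tolerance_for(path, tolerances):
    """Return the tolerance of the first pattern matching `path`, else None."""
    # Assumes `tolerances` maps wildcard patterns to tolerance objects.
    for pattern, tolerance in tolerances.items():
        if path_matches(pattern, path):
            return tolerance
    return None
```

Because the first matching pattern wins, more specific patterns should be listed before more general ones if both can apply to the same path.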

Expected Value Constraints

The expected object supports special constraint fields:

| Field | Type | Description |
|---|---|---|
| min | number | Minimum numeric value, or minimum length when the actual value is an array |
| max | number | Maximum numeric value, or maximum length when the actual value is an array |
| value | number | Exact value (with tolerance) |
| min_length | number | Minimum string/array length |
| contains | array | Substrings that the actual string must all contain |
Test Case Execution Flow

  1. Load test case from suite
  2. Check min_schema_version - skip if SDK schema is too old
  3. Resolve fixture path (handle remote URLs)
  4. Execute SDK method with options
  5. Compare result against expected with tolerances
  6. Record result with timing
  7. After the last case, emit the final report
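
The per-case part of this flow could look like the following Python sketch. The execute and compare callables stand in for the SDK call and the comparison logic, the case keys "id" and "expected" are assumed, and min_schema_version is assumed to be a dotted string of integers. Fixture resolution (step 3) is left to execute.

```python
import time


def version_tuple(version):
    """Turn "1.10" into (1, 10) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def run_suite(cases, execute, compare, sdk_schema_version):
    """Run every case and return the list of per-case result dicts."""
    results = []
    for case in cases:
        result = {"id": case["id"]}

        # Skip cases that need a newer schema than this SDK provides.
        required = case.get("min_schema_version")
        if required and version_tuple(required) > version_tuple(sdk_schema_version):
            result.update(status="skip", reason=f"requires schema >= {required}")
            results.append(result)
            continue

        started = time.perf_counter()
        try:
            actual = execute(case)  # hypothetical hook for the SDK call
            passed = compare(actual, case["expected"])
            result.update(
                status="pass" if passed else "fail",
                actual=actual,
                expected=case["expected"],
            )
        except Exception as exc:
            # An exception is an error, not a failed comparison.
            result.update(status="error", error=str(exc))
        result["duration_ms"] = (time.perf_counter() - started) * 1000.0
        results.append(result)
    return results
```

Keeping "fail" (wrong result) separate from "error" (the call raised) makes it easier to tell a contract violation from a broken environment.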

Exit Codes

  • 0: Every test case passed or was skipped
  • 1: At least one test case failed or errored
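
A small Python sketch of how a runner might derive its exit code from the report summary is shown below. The helper names are hypothetical; the summary keys follow the fields listed under Conformance Report Schema.

```python
import json


def exit_code(summary):
    """Return 0 when no case failed or errored, otherwise 1."""
    return 1 if summary["failed"] > 0 or summary["errors"] > 0 else 0


def write_report(report, output_path="conformance-report.json"):
    """Write the report to disk and return the process exit code."""
    with open(output_path, "w", encoding="utf-8") as handle:
        json.dump(report, handle, indent=2)
    return exit_code(report["summary"])


# Typical use at the end of a runner's main function:
#     sys.exit(write_report(report))
```

Writing the report before exiting means CI still gets the artifact when the run fails.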

CI Integration

The per-SDK Argo publish workflow MUST run the conformance runner BEFORE publishing. A failed runner aborts the publish step.

Example Argo DAG tasks:

- name: conformance
  template: conformance-runner
  arguments:
    parameters:
    - name: sdk
      value: pdftract-py

- name: publish
  template: publish-to-pypi
  dependencies:
  - conformance
  when: "{{tasks.conformance.exitCode}} == 0"

README Integration

Each SDK's README should have a "Conformance" section that links to the latest published report:

## Conformance

This SDK passes the official pdftract conformance suite. Latest report: [conformance-pdftract-py-0.1.0.json](https://argoproj.example/artifacts/conformance-pdftract-py-0.1.0.json)

Stub Implementation Notes

The current runners stub out executeMethod(), which returns placeholder values. Replace the stubs with actual SDK calls once the following hold:

  1. The SDK's native methods are implemented
  2. The binary interface is stable
  3. The JSON output schema is finalized

Until then, the runners serve as:

  • A reference implementation pattern
  • A starting point for SDK development
  • Documentation of expected behavior

Adding New Test Cases

To add a new test case to the suite:

  1. Add the case to tests/sdk-conformance/cases.json
  2. Bump the suite version so that reports record which set of cases was run
  3. Update all SDK runners to handle the new case (if needed)
  4. Verify all SDKs pass the updated suite before publishing

References

  • Plan section: SDK Architecture / The Conformance Suite, line 3547
  • Plan section: SDK Acceptance Criteria, line 3589
  • Shared suite: tests/sdk-conformance/cases.json
  • Report schema: tests/sdk-conformance/report-schema.json