pdftract/benches/competitors/run-pypdf.sh
jedarden 857f928732 feat(pdftract-5omc): implement SDK conformance test runner pattern
Implement the conformance test runner pattern that every SDK will
implement to validate against the shared test suite.

- Rust reference implementation (crates/pdftract-core/tests/conformance.rs)
  * Full test suite loader and executor
  * Comparison engine with min/max, string constraints, tolerances
  * Skip logic for unsupported features and schema versions
  * Report generation in JSON format

- CLI compare subcommand (crates/pdftract-cli/src/main.rs)
  * pdftract compare - Compare actual vs expected with tolerances
  * Cross-language comparison tool to avoid reimplementations

- Documentation (docs/conformance/sdk-contract.md)
  * Complete pattern specification with pseudocode
  * Per-language runner locations
  * CI integration requirements

- Python reference stub (tests/python-conformance/test_conformance.py)
  * Full pytest-based implementation following the pattern

Closes: pdftract-5omc
2026-05-18 01:22:23 -04:00


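#!/bin/bash
# Wrapper for pypdf text extraction
# Usage: run-pypdf.sh <pdf-file>
set -euo pipefail

# Fail with a usage message instead of an "unbound variable" error from set -u.
if [ "$#" -ne 1 ]; then
    echo "Usage: run-pypdf.sh <pdf-file>" >&2
    exit 2
fi
PDF_FILE="$1"

if [ ! -f "$PDF_FILE" ]; then
    echo "ERROR: File not found: $PDF_FILE" >&2
    exit 1
fi

# Run pypdf text extraction.
# The path is passed as an argument (sys.argv[1]) rather than interpolated
# into the Python source, so file names containing quotes or backslashes
# cannot break or alter the script. Extracted text is discarded.
python3 -c '
import sys
from pypdf import PdfReader

try:
    reader = PdfReader(sys.argv[1])
    text = ""
    for page in reader.pages:
        text += page.extract_text() + "\n"
    sys.stdout.write(text)
except Exception as e:
    sys.stderr.write(f"ERROR: {e}\n")
    sys.exit(1)
' "$PDF_FILE" > /dev/null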
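
Because the wrapper discards the extracted text, its observable results are the exit status and the run time. As a rough illustration of how a benchmark harness might time it, here is a minimal sketch. It is not the project's actual harness: the `time_cmd` helper and the `sample.pdf` path are hypothetical, and `date +%s.%N` assumes GNU date.

```shell
# Hypothetical helper: run a command, discard its output, and print the
# elapsed wall-clock time in seconds with millisecond precision.
# Assumes GNU date, since %N (nanoseconds) is not portable.
time_cmd() {
    local start end
    start=$(date +%s.%N)
    # Propagate failure without printing a timing for a failed run.
    "$@" > /dev/null 2>&1 || return 1
    end=$(date +%s.%N)
    awk -v s="$start" -v e="$end" 'BEGIN { printf "%.3f\n", e - s }'
}

# Example invocation (hypothetical input file):
#   time_cmd ./run-pypdf.sh sample.pdf
```

Returning non-zero on failure lets a caller tell a failed extraction apart from a fast one.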