Implement the conformance test runner pattern that every SDK will implement to validate against the shared test suite.

- Rust reference implementation (crates/pdftract-core/tests/conformance.rs)
  * Full test suite loader and executor
  * Comparison engine with min/max, string constraints, tolerances
  * Skip logic for unsupported features and schema versions
  * Report generation in JSON format
- CLI compare subcommand (crates/pdftract-cli/src/main.rs)
  * pdftract compare - Compare actual vs expected with tolerances
  * Cross-language comparison tool to avoid reimplementations
- Documentation (docs/conformance/sdk-contract.md)
  * Complete pattern specification with pseudocode
  * Per-language runner locations
  * CI integration requirements
- Python reference stub (tests/python-conformance/test_conformance.py)
  * Full pytest-based implementation following the pattern

Closes: pdftract-5omc
27 lines
617 B
Bash
Executable file
#!/bin/bash
# Wrapper for pdfminer.six text extraction
# Usage: run-pdfminer.sh <pdf-file>
set -euo pipefail

PDF_FILE="${1:?Usage: run-pdfminer.sh <pdf-file>}"

if [ ! -f "$PDF_FILE" ]; then
    echo "ERROR: File not found: $PDF_FILE" >&2
    exit 1
fi

# Run pdfminer.six high-level text extraction.
# The path is passed as sys.argv[1] so quotes in file names cannot break the Python source.
# stdout is redirected to /dev/null, so the extracted text itself is discarded.
python3 -c "
import sys
from pdfminer.high_level import extract_text

try:
    text = extract_text(sys.argv[1])
    # Write to stdout to ensure we process the full extraction
    sys.stdout.write(text)
except Exception as e:
    sys.stderr.write(f'ERROR: {e}\n')
    sys.exit(1)
" "$PDF_FILE" > /dev/null
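Because the wrapper above discards the extracted text, its exit status is the only signal a caller gets. The following is a minimal sketch of how a caller might tally successes and failures across a directory of PDFs. The function name `tally_runs` and the `./corpus` path are hypothetical and not part of the repository.

```shell
#!/bin/bash
# Hypothetical helper: run a wrapper command on every *.pdf in a directory
# and report how many runs exited zero versus non-zero.
# Usage: tally_runs <wrapper-command> <pdf-dir>
tally_runs() {
    local wrapper="$1" dir="$2"
    local ok=0 fail=0 pdf
    for pdf in "$dir"/*.pdf; do
        # Skip the unexpanded glob pattern when the directory has no PDFs.
        [ -e "$pdf" ] || continue
        # Testing the command inside `if` keeps `set -e` callers from aborting.
        if "$wrapper" "$pdf" 2>/dev/null; then
            ok=$((ok + 1))
        else
            fail=$((fail + 1))
        fi
    done
    echo "ok=$ok fail=$fail"
}

# Example invocation (paths are placeholders):
# tally_runs ./run-pdfminer.sh ./corpus
```

The counters use `ok=$((ok + 1))` rather than `((ok++))` because the latter returns a non-zero status when the old value is 0, which would abort a caller running under `set -e`.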