Implement the conformance test runner pattern that every SDK follows to validate against the shared test suite.

- Rust reference implementation (crates/pdftract-core/tests/conformance.rs)
  * Full test suite loader and executor
  * Comparison engine with min/max bounds, string constraints, and tolerances
  * Skip logic for unsupported features and schema versions
  * Report generation in JSON format
- CLI compare subcommand (crates/pdftract-cli/src/main.rs)
  * pdftract compare: compare actual vs. expected output with tolerances
  * Cross-language comparison tool, so each SDK does not have to reimplement the comparison logic
- Documentation (docs/conformance/sdk-contract.md)
  * Complete pattern specification with pseudocode
  * Per-language runner locations
  * CI integration requirements
- Python reference stub (tests/python-conformance/test_conformance.py)
  * Full pytest-based implementation following the pattern

Closes: pdftract-5omc
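The comparison engine, skip logic, and JSON report listed above can be sketched in Python, the language of the reference stub. This is a minimal sketch under stated assumptions, not the Rust code in conformance.rs: the constraint keys (`min`, `max`, `equals`/`tolerance`, `contains`, `matches`), the case fields (`id`, `input`, `expected`, `requires`, `schema_version`), and the `extract` callback are all hypothetical stand-ins for whatever the shared suite's schema actually defines.

```python
import json
import re

# HYPOTHETICAL constraint vocabulary; the real pdftract suite schema may differ.
#   {"min": a, "max": b}               inclusive numeric range
#   {"equals": x, "tolerance": t}      |actual - x| <= t for numbers, == otherwise
#   {"contains": s} / {"matches": re}  substring / regex (re.search) on strings
# Any other expected value must match exactly; nested objects are compared key by key.
CONSTRAINT_KEYS = {"min", "max", "equals", "tolerance", "contains", "matches"}


def _is_num(x):
    return isinstance(x, (int, float)) and not isinstance(x, bool)


def check_constraints(actual, spec, path):
    """Apply one constraint object to a single actual value."""
    if ("min" in spec or "max" in spec) and not _is_num(actual):
        return [f"{path}: expected a number, got {actual!r}"]
    errors = []
    if "min" in spec and actual < spec["min"]:
        errors.append(f"{path}: {actual!r} is below min {spec['min']!r}")
    if "max" in spec and actual > spec["max"]:
        errors.append(f"{path}: {actual!r} is above max {spec['max']!r}")
    if "equals" in spec:
        want, tol = spec["equals"], spec.get("tolerance", 0)
        ok = abs(actual - want) <= tol if _is_num(want) and _is_num(actual) else actual == want
        if not ok:
            errors.append(f"{path}: expected {want!r} (tolerance {tol}), got {actual!r}")
    if "contains" in spec or "matches" in spec:
        if not isinstance(actual, str):
            return errors + [f"{path}: expected a string, got {actual!r}"]
        if "contains" in spec and spec["contains"] not in actual:
            errors.append(f"{path}: {actual!r} does not contain {spec['contains']!r}")
        if "matches" in spec and re.search(spec["matches"], actual) is None:
            errors.append(f"{path}: {actual!r} does not match /{spec['matches']}/")
    return errors


def compare(actual, expected, path="$"):
    """Walk `expected` and return failure messages for `actual` (empty list = pass)."""
    if isinstance(expected, dict) and expected and expected.keys() <= CONSTRAINT_KEYS:
        return check_constraints(actual, expected, path)
    if isinstance(expected, dict):
        if not isinstance(actual, dict):
            return [f"{path}: expected an object, got {type(actual).__name__}"]
        errors = []
        for key, sub in expected.items():
            if key not in actual:
                errors.append(f"{path}.{key}: missing")
            else:
                errors.extend(compare(actual[key], sub, f"{path}.{key}"))
        return errors
    return [] if actual == expected else [f"{path}: expected {expected!r}, got {actual!r}"]


def should_skip(case, supported_features, max_schema_version):
    """Return a skip reason, or None if this runner can execute the case."""
    missing = set(case.get("requires", [])) - set(supported_features)
    if missing:
        return f"unsupported features: {', '.join(sorted(missing))}"
    version = case.get("schema_version", 1)
    if version > max_schema_version:
        return f"schema version {version} is newer than supported {max_schema_version}"
    return None


def run_suite(cases, extract, supported_features, max_schema_version):
    """Run every case and build a JSON-serializable report."""
    results = []
    for case in cases:
        reason = should_skip(case, supported_features, max_schema_version)
        if reason is not None:
            results.append({"id": case["id"], "status": "skip", "reasons": [reason]})
            continue
        errors = compare(extract(case["input"]), case["expected"])
        results.append({"id": case["id"], "status": "fail" if errors else "pass", "reasons": errors})
    summary = {s: sum(r["status"] == s for r in results) for s in ("pass", "fail", "skip")}
    return {"summary": summary, "results": results}


if __name__ == "__main__":
    # Toy demo: a fake extractor stands in for a real SDK call.
    demo_cases = [
        {"id": "pages", "input": "a.pdf", "expected": {"pages": {"min": 1, "max": 3}}},
        {"id": "ocr", "input": "b.pdf", "requires": ["ocr"], "expected": {}},
    ]
    report = run_suite(demo_cases, lambda path: {"pages": 2}, ["text"], 1)
    print(json.dumps(report, indent=2))
```

Returning a list of messages instead of a boolean lets the JSON report explain every mismatch in one run, and checking skip conditions before extraction keeps unsupported cases out of the failure count.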
28 lines
603 B
Bash
Executable file
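#!/bin/bash
# Wrapper for pdfplumber text extraction
# Usage: run-pdfplumber.sh <pdf-file>
set -euo pipefail

# Print usage instead of failing with "unbound variable" under set -u
if [ "$#" -ne 1 ]; then
    echo "Usage: $(basename "$0") <pdf-file>" >&2
    exit 2
fi

PDF_FILE="$1"

if [ ! -f "$PDF_FILE" ]; then
    echo "ERROR: File not found: $PDF_FILE" >&2
    exit 1
fi

# Run pdfplumber text extraction.
# The path is passed to Python as sys.argv[1] instead of being spliced into
# the source, so quotes or other special characters in the file name are safe.
# Note: stdout is discarded here; drop '> /dev/null' to capture the text.
python3 - "$PDF_FILE" <<'PY' > /dev/null
import sys

try:
    import pdfplumber

    with pdfplumber.open(sys.argv[1]) as pdf:
        text = ''
        for page in pdf.pages:
            # extract_text() may return None for pages without a text layer
            page_text = page.extract_text() or ''
            text += page_text + '\n'
        sys.stdout.write(text)
except Exception as e:
    sys.stderr.write(f'ERROR: {e}\n')
    sys.exit(1)
PY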