pdftract/tests
jedarden d1dc2280f1 feat(pdftract-27n3): implement border padding, pipeline orchestration, and fixtures
Implement step 5 (white-border padding: 10 px on all sides), wire all
preprocessing steps into the final preprocess(input, ImageSource) ->
GrayImage entry point, and curate fixtures for the three image-source
paths (PhysicalScan / DigitalOrigin / Jbig2).

Changes:
- Add add_border_padding() function: creates (width+20) x (height+20)
  image with 10px white border on all sides
- Add preprocess() pipeline orchestrator: applies deskew, contrast
  normalization, binarization, denoising, and padding in correct order
- Skip contrast, binarization, and denoising for JBIG2 images
- Generate test fixtures for skewed_2deg, uneven_lighting, clean_digital,
  and jbig2_scan scenarios
- Add integration tests for all critical test scenarios
- Add A4-page benchmarks targeting < 500ms for physical/digital, < 200ms
  for JBIG2

Refs:
- Plan section: Phase 5.3 step 5 (line 1878) + critical tests (lines 1882-1885)
- Bead: pdftract-27n3
- Note: notes/pdftract-27n3.md

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-23 21:55:11 -04:00
..
c-client feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction 2026-05-23 12:30:26 -04:00
conformance feat(pdftract-5omc): implement per-language conformance test runner pattern 2026-05-18 01:32:24 -04:00
fixtures feat(pdftract-27n3): implement border padding, pipeline orchestration, and fixtures 2026-05-23 21:55:11 -04:00
proptest fix(pdftract-2a6rk): fix xref.rs u64 literal overflow in proptest 2026-05-22 17:26:27 -04:00
proptest-regressions docs(pdftract-49f8): establish Cargo.lock policy and documentation 2026-05-20 18:13:14 -04:00
python-conformance feat(pdftract-5omc): implement SDK conformance test runner pattern 2026-05-18 01:22:23 -04:00
sdk-conformance feat(pdftract-mcp): add MCP server implementation changes 2026-05-23 03:09:56 -04:00
conformance.c feat(pdftract-1eaxm): implement libpdftract C FFI library 2026-05-23 08:55:12 -04:00
conformance_fixed feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction 2026-05-23 12:30:26 -04:00
conformance_fixed.c feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction 2026-05-23 12:30:26 -04:00
conformance_run feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction 2026-05-23 12:30:26 -04:00
conformance_test feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction 2026-05-23 12:30:26 -04:00
conformance_test_simple feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction 2026-05-23 12:30:26 -04:00
conformance_test_simple.c feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction 2026-05-23 12:30:26 -04:00
debug_parse.rs feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction 2026-05-23 12:30:26 -04:00
debug_stream feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction 2026-05-23 12:30:26 -04:00
debug_stream.c feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction 2026-05-23 12:30:26 -04:00
proptest-panic-verification.rs feat(pdftract-33v): add property tests and nightly fuzz job 2026-05-20 19:18:03 -04:00
test_api_basic feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction 2026-05-23 12:30:26 -04:00
test_api_basic.c feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction 2026-05-23 12:30:26 -04:00
test_api_null feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction 2026-05-23 12:30:26 -04:00
test_api_null.c feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction 2026-05-23 12:30:26 -04:00
test_api_real feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction 2026-05-23 12:30:26 -04:00
test_api_real.c feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction 2026-05-23 12:30:26 -04:00
test_api_valid feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction 2026-05-23 12:30:26 -04:00
test_api_valid.c feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction 2026-05-23 12:30:26 -04:00
test_classifier_corpus.rs test(classifier): add 200-document labeled corpus for Phase 5.6 2026-05-17 07:16:02 -04:00
test_debug feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction 2026-05-23 12:30:26 -04:00
test_debug.c feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction 2026-05-23 12:30:26 -04:00
test_parse_fixture.rs feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction 2026-05-23 12:30:26 -04:00
test_simple.c feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction 2026-05-23 12:30:26 -04:00
test_simple_run feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction 2026-05-23 12:30:26 -04:00
test_stream feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction 2026-05-23 12:30:26 -04:00
test_stream.c feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction 2026-05-23 12:30:26 -04:00
test_valid.c feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction 2026-05-23 12:30:26 -04:00
test_valid_run feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction 2026-05-23 12:30:26 -04:00