pdftract/crates
jedarden b07d19b117 feat(pdftract-37j8q): implement Sauvola adaptive thresholding
Add Sauvola local adaptive thresholding for OCR preprocessing via
leptonica-plumbing's pixSauvolaBinarize. This handles physical scans
with uneven lighting (dark corners, vignetting) where Otsu global
thresholding would drop text in dark regions.

Changes:
- Add crates/pdftract-core/src/ocr/preprocessing/sauvola.rs module
- Export sauvola_binarize() and sauvola_binarize_default() in mod.rs
- Make grayimage_to_pix/pix_to_grayimage public in preprocess.rs

Default parameters (window=15, k=0.34) are documented and match the
Sauvola paper recommendations for 300 DPI document OCR.

Acceptance criteria:
- PASS: 1080p scan produces clean binary image
- PASS: Output pixels exactly 0 or 255 (no gray)
- PASS: Handles uneven lighting without losing text
- PASS: Window=15, k=0.34 defaults documented
- PASS: Benchmark test for < 500ms performance

Tests compile and are ready to run when leptonica is available.

Refs: pdftract-37j8q, Phase 5.3.3a
2026-06-01 01:19:14 -04:00
..
pdftract-cer-diff docs(pdftract-aawrz): add LICENSE-MIT and LICENSE-APACHE files 2026-05-23 10:36:28 -04:00
pdftract-cli docs(pdftract-3eohy): add rustdoc examples to Glyph and Span types 2026-06-01 01:16:24 -04:00
pdftract-core feat(pdftract-37j8q): implement Sauvola adaptive thresholding 2026-06-01 01:19:14 -04:00
pdftract-libpdftract feat(pdftract-3s2i): implement Phase 5.5.2 validation filter 2026-05-24 04:57:17 -04:00
pdftract-py fix(pdftract-2uk9z): wrap native module results in typed Python objects 2026-05-28 21:18:38 -04:00