pdftract/crates/pdftract-core/src
jedarden 61b94b49d2 feat(pdftract-6dki1): implement histogram stretch contrast normalization
Implement Phase 5.3.2a: histogram-based contrast normalization for OCR
preprocessing. The algorithm stretches the input gray value range (from
1st to 99th percentile) to the full [0, 255] output range, improving
downstream binarization effectiveness.
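The stretch described above is a linear map from the percentile range onto the full 8-bit range. Below is a minimal sketch of that per-pixel map. It assumes the bounds `lo` and `hi` are already known, that values outside them are clamped, and that integer division truncates. `stretch_value` is a hypothetical name, not pdftract's API.

```rust
/// Hypothetical helper (not the crate's API): linearly maps a gray value
/// from the percentile range [lo, hi] onto [0, 255], clamping values that
/// fall outside the range. Requires lo < hi.
pub fn stretch_value(v: u8, lo: u8, hi: u8) -> u8 {
    debug_assert!(lo < hi, "a uniform range cannot be stretched");
    // Widen to i32 first: (v - lo) can be negative and (v - lo) * 255
    // exceeds u8::MAX, so u8 arithmetic would underflow or overflow.
    let (v, lo, hi) = (i32::from(v), i32::from(lo), i32::from(hi));
    // Integer division truncates toward zero; the rounding mode is an assumption.
    let scaled = (v - lo) * 255 / (hi - lo);
    scaled.clamp(0, 255) as u8
}
```

For example, with `lo = 50` and `hi = 200`, the input 125 maps to 127 and the input 250 clamps to 255.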

Key implementation details:
- 256-bin histogram computation for percentile calculation
- 1st/99th percentile robustness against hot pixels and artifacts
- In-place mutation for performance (no second image buffer is allocated)
- Error returns for uniform images (UniformImage) and invalid dimensions (InvalidDimensions)
- Overflow-safe arithmetic using i32 intermediate values
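Taken together, the details above suggest a shape like the following minimal sketch. It assumes an 8-bit grayscale, row-major buffer. Several pieces are illustrative assumptions rather than pdftract's actual code: the names `stretch_contrast`, `StretchError`, `low_percentile`, and `high_percentile`, the exact percentile cutoff rule, the lookup table, and reporting a buffer-length mismatch as `InvalidDimensions`. Only the `UniformImage` and `InvalidDimensions` cases come from the commit message.

```rust
/// Error cases named in the commit; the enum name is hypothetical.
#[derive(Debug, PartialEq, Eq)]
pub enum StretchError {
    /// Zero width or height. Assumption: a buffer whose length is not
    /// width * height is reported the same way.
    InvalidDimensions,
    /// The 1st and 99th percentiles coincide, so there is no range to stretch.
    UniformImage,
}

/// Smallest gray value whose cumulative count, counting up from 0, exceeds `skip`.
fn low_percentile(hist: &[usize; 256], skip: usize) -> usize {
    let mut cum = 0;
    for (v, &count) in hist.iter().enumerate() {
        cum += count;
        if cum > skip {
            return v;
        }
    }
    255
}

/// Largest gray value whose cumulative count, counting down from 255, exceeds `skip`.
fn high_percentile(hist: &[usize; 256], skip: usize) -> usize {
    let mut cum = 0;
    for (v, &count) in hist.iter().enumerate().rev() {
        cum += count;
        if cum > skip {
            return v;
        }
    }
    0
}

/// Hypothetical sketch: stretches the 1st..99th percentile range of an 8-bit
/// grayscale buffer (row-major, one byte per pixel) to [0, 255] in place.
pub fn stretch_contrast(
    pixels: &mut [u8],
    width: usize,
    height: usize,
) -> Result<(), StretchError> {
    if width == 0 || height == 0 || width.checked_mul(height) != Some(pixels.len()) {
        return Err(StretchError::InvalidDimensions);
    }

    // 256-bin histogram: one counter per possible u8 gray value.
    let mut hist = [0usize; 256];
    for &p in pixels.iter() {
        hist[usize::from(p)] += 1;
    }

    // Ignore the darkest and brightest 1% of pixels so that a few hot pixels
    // at 0 or 255 do not pin the range (the exact cutoff rule is an assumption).
    let skip = pixels.len() / 100;
    let lo = low_percentile(&hist, skip);
    let hi = high_percentile(&hist, skip);
    if lo >= hi {
        return Err(StretchError::UniformImage);
    }

    // Build a 256-entry lookup table once; i32 intermediates keep
    // (v - lo) * 255 from overflowing or underflowing.
    let (lo, range) = (lo as i32, (hi - lo) as i32);
    let mut lut = [0u8; 256];
    for (v, out) in lut.iter_mut().enumerate() {
        *out = ((v as i32 - lo) * 255 / range).clamp(0, 255) as u8;
    }

    // Mutate in place: no second image buffer is allocated.
    for p in pixels.iter_mut() {
        *p = lut[usize::from(*p)];
    }
    Ok(())
}
```

Precomputing a 256-entry table reduces the per-pixel work to a single indexed load. This is a common choice for 8-bit point operations. This sketch has not been tested against the stated 50 ms budget for 8 MP images.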

Acceptance criteria:
- Image with [50, 200] range → stretched to [0, 255]
- Hot pixel robustness: isolated 0 or 255 pixels do not widen the stretch range
- Uniform image → early return with UniformImage error
- Invalid dimensions (zero width/height) → InvalidDimensions error
- Performance: < 50 ms for 8 MP images

Closes: pdftract-6dki1
2026-05-24 10:30:20 -04:00
attachment feat(pdftract-3s2i): implement Phase 5.5.2 validation filter 2026-05-24 04:57:17 -04:00
cache feat(pdftract-3s2i): implement Phase 5.5.2 validation filter 2026-05-24 04:57:17 -04:00
fingerprint feat(pdftract-3s2i): implement Phase 5.5.2 validation filter 2026-05-24 04:57:17 -04:00
font feat(pdftract-2iur): implement nearest-neighbor scanner with Hamming distance and frequency tie-break 2026-05-24 06:57:27 -04:00
forms feat(pdftract-2qum): implement FormFieldValue enum and XFA-wins combiner 2026-05-24 10:11:47 -04:00
layout feat(pdftract-8n270): implement code block detection 2026-05-24 10:04:22 -04:00
ocr/preprocessing feat(pdftract-6dki1): implement histogram stretch contrast normalization 2026-05-24 10:30:20 -04:00
parser feat(pdftract-1bv81): implement ASCII85Decode filter per PDF spec 7.4.3 2026-05-24 09:10:03 -04:00
profiles feat(pdftract-2iyk): implement classifier engine 2026-05-24 10:23:58 -04:00
receipts feat(pdftract-3s2i): implement Phase 5.5.2 validation filter 2026-05-24 04:57:17 -04:00
render feat(pdftract-3s2i): implement Phase 5.5.2 validation filter 2026-05-24 04:57:17 -04:00
schema feat(pdftract-29gu): implement Phase 5.5.3 region-level confidence policy 2026-05-24 05:15:46 -04:00
signature feat(pdftract-3s2i): implement Phase 5.5.2 validation filter 2026-05-24 04:57:17 -04:00
table feat(pdftract-3s2i): implement Phase 5.5.2 validation filter 2026-05-24 04:57:17 -04:00
classify.rs feat(pdftract-3s2i): implement Phase 5.5.2 validation filter 2026-05-24 04:57:17 -04:00
content_stream.rs feat(pdftract-3s2i): implement Phase 5.5.2 validation filter 2026-05-24 04:57:17 -04:00
diagnostics.rs feat(pdftract-1bv81): implement ASCII85Decode filter per PDF spec 7.4.3 2026-05-24 09:10:03 -04:00
document.rs feat(pdftract-3s2i): implement Phase 5.5.2 validation filter 2026-05-24 04:57:17 -04:00
dpi.rs feat(pdftract-3s2i): implement Phase 5.5.2 validation filter 2026-05-24 04:57:17 -04:00
extract.rs feat(pdftract-bnba5): implement PyO3 extract_stream entry point with StreamIterator 2026-05-24 07:35:03 -04:00
graphics_state.rs feat(pdftract-3s2i): implement Phase 5.5.2 validation filter 2026-05-24 04:57:17 -04:00
hybrid.rs feat(pdftract-29gu): implement Phase 5.5.3 region-level confidence policy 2026-05-24 05:15:46 -04:00
lib.rs feat(pdftract-6dki1): implement histogram stretch contrast normalization 2026-05-24 10:30:20 -04:00
markdown.rs feat(pdftract-3s2i): implement Phase 5.5.2 validation filter 2026-05-24 04:57:17 -04:00
ocr.rs feat(pdftract-6dki1): implement histogram stretch contrast normalization 2026-05-24 10:30:20 -04:00
options.rs feat(pdftract-3s2i): implement Phase 5.5.2 validation filter 2026-05-24 04:57:17 -04:00
preprocess.rs feat(pdftract-3s2i): implement Phase 5.5.2 validation filter 2026-05-24 04:57:17 -04:00
render.rs feat(pdftract-axcri): record inline images as ImageXObject entries 2026-05-24 07:41:50 -04:00
semaphore.rs feat(pdftract-3s2i): implement Phase 5.5.2 validation filter 2026-05-24 04:57:17 -04:00
span_flags.rs feat(pdftract-cbrbg): implement span flag detector for Phase 4.1 2026-05-24 07:28:25 -04:00
url_validation.rs feat(pdftract-3s2i): implement Phase 5.5.2 validation filter 2026-05-24 04:57:17 -04:00
word_boundary.rs feat(pdftract-h2s0z): implement adaptive word boundary detector 2026-05-24 06:06:56 -04:00