pdftract/crates
jedarden 85acaa9b56 feat(pdftract-4a3je): implement multipart parsing with PDF magic-byte validation
- Add field-typing helpers (parse_bool, parse_float, parse_int, parse_comma_list)
- Add validate_pdf_magic_bytes() to check for %PDF- header
- Update ExtractParams to support: ocr_language, ocr_dpi, markdown_anchors
- Update receive_pdf() to use type-aware parsing and validate PDF bytes
- Update build_options() to map form fields to ExtractionOptions
- Add comprehensive unit tests for form helpers and build_options

Per plan section 2127-2137, implements optional form field parsing with:
- Forward-compatibility for unknown fields (warning logs, ignored)
- Clear 400 errors with hints on parse failure
- Typed coercion (bool from "true"/"1"; comma-list to Vec<String>)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 20:19:10 -04:00
..
pdftract-cer-diff docs(pdftract-aawrz): add LICENSE-MIT and LICENSE-APACHE files 2026-05-23 10:36:28 -04:00
pdftract-cli feat(pdftract-4a3je): implement multipart parsing with PDF magic-byte validation 2026-05-27 20:19:10 -04:00
pdftract-core feat(pdftract-1q19p): implement OCG /OC tag tracking with is_hidden flag 2026-05-26 22:25:27 -04:00
pdftract-libpdftract feat(pdftract-3s2i): implement Phase 5.5.2 validation filter 2026-05-24 04:57:17 -04:00
pdftract-py feat(pdftract-1tswa): implement GIL release with py.allow_threads on extraction entry points 2026-05-26 21:23:00 -04:00