pdftract/crates/pdftract-core/src
jedarden 8b63217dbf feat(pdftract-260a3): implement legal_filing profile with fixtures and tests
Implements the legal_filing document profile for court filings (motions,
briefs, orders, docket entries) with:

- Profile YAML at profiles/builtin/legal_filing/profile.yaml
  - Fields: case_number, court, parties, filing_date, docket_entries
  - Match predicates for court name, case numbers, party markers
  - Extraction: xy_cut reading order, include_headers_footers=true
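
A minimal sketch of how the match predicates listed above (court name, case numbers, party markers) might be combined. Everything here is an assumption made for illustration: the function names, the marker strings, the federal-style case number shape, and the two-of-three threshold are hypothetical and are not taken from profile.yaml or the pdftract codebase.

```rust
/// Hypothetical check: does `token` look like a federal-style case number
/// such as "1:23-cv-04567"? Assumed shape: <digits>:<2 digits>-<letters>-<digits>.
fn looks_like_case_number(token: &str) -> bool {
    let (office, rest) = match token.split_once(':') {
        Some(pair) => pair,
        None => return false,
    };
    let parts: Vec<&str> = rest.split('-').collect();
    if parts.len() != 3 {
        return false;
    }
    let all_digits = |s: &str| !s.is_empty() && s.chars().all(|c| c.is_ascii_digit());
    all_digits(office)
        && parts[0].len() == 2
        && all_digits(parts[0])
        && !parts[1].is_empty()
        && parts[1].chars().all(|c| c.is_ascii_lowercase())
        && all_digits(parts[2])
}

/// Hypothetical court-name predicate: case-insensitive substring match
/// against a small, assumed list of markers.
fn has_court_name(text: &str) -> bool {
    let upper = text.to_uppercase();
    ["DISTRICT COURT", "SUPERIOR COURT", "COURT OF APPEALS"]
        .iter()
        .any(|m| upper.contains(*m))
}

/// Hypothetical party-marker predicate (assumed marker list).
fn has_party_marker(text: &str) -> bool {
    let lower = text.to_lowercase();
    ["plaintiff", "defendant", "appellant", "appellee", " v. "]
        .iter()
        .any(|m| lower.contains(*m))
}

/// Treat the text as a legal filing when at least two of the three
/// predicates fire. The threshold is an assumption of this sketch.
fn matches_legal_filing(text: &str) -> bool {
    let has_case_number = text
        .split_whitespace()
        // Strip surrounding punctuation such as trailing commas or periods.
        .map(|t| t.trim_matches(|c: char| !c.is_ascii_alphanumeric()))
        .any(|t| looks_like_case_number(t));
    let hits = [has_court_name(text), has_party_marker(text), has_case_number]
        .iter()
        .filter(|&&hit| hit)
        .count();
    hits >= 2
}
```

Requiring two of three signals is one way to keep a single stray phrase such as "District Court" in an unrelated document from triggering the profile. The real profile may weight or combine its predicates differently.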

- 5 synthetic PDF fixtures at tests/fixtures/profiles/legal_filing/
  - federal_complaint: Federal district court complaint
  - state_motion: State superior court motion to dismiss
  - appellate_brief: Federal appellate brief
  - court_order: Federal district court order
  - docket_sheet: Docket sheet with entries

- 5 expected output JSON files with profile_fields

- Regression tests at crates/pdftract-cli/tests/test_legal_filing.rs
  - 14/14 tests pass
  - Verifies profile schema, fixture structure, match predicates

Acceptance criteria (from bead pdftract-260a3):
- profiles/builtin/legal_filing.yaml validates
- 5+ public-domain fixtures with expected outputs
- tests/test_legal_filing.rs passes
- Per-field accuracy thresholds defined (integration tests pending Phase 7.10)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 21:44:49 -04:00
annotation feat(pdftract-4hle): implement 7.6.4 links + annotations JSON output 2026-05-25 07:44:12 -04:00
attachment feat(pdftract-3j2u): implement 50 MB size limit + base64 encoding for attachments 2026-05-25 11:42:28 -04:00
cache feat(pdftract-2okbq): implement TH-10 cache poisoning protection 2026-05-26 21:09:54 -04:00
encryption feat(pdftract-43sg2): implement single-pass per-file parse pipeline for grep 2026-05-26 20:15:39 -04:00
fingerprint feat(pdftract-3s2i): implement Phase 5.5.2 validation filter 2026-05-24 04:57:17 -04:00
font feat(pdftract-2iur): implement nearest-neighbor scanner with Hamming distance and frequency tie-break 2026-05-24 06:57:27 -04:00
forms feat(pdftract-5qca): implement form_fields JSON output + schema integration 2026-05-24 14:36:03 -04:00
glyph feat(pdftract-1q19p): implement OCG /OC tag tracking with is_hidden flag 2026-05-26 22:25:27 -04:00
layout feat(pdftract-4md5z): implement XY-cut recursive reading order algorithm 2026-05-26 18:37:31 -04:00
ocr/preprocessing feat(pdftract-3h9xo): implement threads JSON output + schema integration 2026-05-25 13:40:15 -04:00
output fix(pdftract-31bum): implement smarter backpressure for OutOfOrderBuffer 2026-05-26 17:15:06 -04:00
parser feat(pdftract-260a3): implement legal_filing profile with fixtures and tests 2026-05-27 21:44:49 -04:00
profiles feat(pdftract-64p5): implement classify CLI subcommand and --auto flag 2026-05-24 15:16:56 -04:00
receipts feat(pdftract-4yspv): implement OCR receipt fallback 2026-05-25 19:53:42 -04:00
render feat(pdftract-3s2i): implement Phase 5.5.2 validation filter 2026-05-24 04:57:17 -04:00
schema test(pdftract-4c8qu): add page_label tests and fix JSON schema 2026-05-25 14:43:31 -04:00
signature feat(pdftract-3s2i): implement Phase 5.5.2 validation filter 2026-05-24 04:57:17 -04:00
table feat(pdftract-3s2i): implement Phase 5.5.2 validation filter 2026-05-24 04:57:17 -04:00
threads feat(pdftract-4li3d): implement security constraints for serve mode 2026-05-26 18:47:51 -04:00
atomic_file_writer.rs feat(pdftract-68wfa): implement AtomicFileWriter for atomic file writes 2026-05-24 13:02:37 -04:00
audit.rs fix: resolve compilation errors across codebase 2026-05-25 08:38:04 -04:00
classify.rs feat(pdftract-4li3d): implement security constraints for serve mode 2026-05-26 18:47:51 -04:00
confidence.rs fix: resolve compilation errors across codebase 2026-05-25 08:38:04 -04:00
content_stream.rs feat(pdftract-1q19p): implement OCG /OC tag tracking with is_hidden flag 2026-05-26 22:25:27 -04:00
diagnostics.rs feat(pdftract-2okbq): implement TH-10 cache poisoning protection 2026-05-26 21:09:54 -04:00
document.rs feat(pdftract-4li3d): implement security constraints for serve mode 2026-05-26 18:47:51 -04:00
dpi.rs feat(pdftract-3s2i): implement Phase 5.5.2 validation filter 2026-05-24 04:57:17 -04:00
extract.rs feat(pdftract-4li3d): implement security constraints for serve mode 2026-05-26 18:47:51 -04:00
graphics_state.rs feat(pdftract-1kdzu): implement TJ operator with kerning and word boundary detection 2026-05-26 16:44:05 -04:00
hybrid.rs feat(pdftract-5qj50): implement mojibake detection and repair via encoding_rs 2026-05-24 17:01:53 -04:00
javascript.rs feat(pdftract-4li3d): implement security constraints for serve mode 2026-05-26 18:47:51 -04:00
lib.rs feat(pdftract-4j0ub): implement Glyph struct and emit_glyph function 2026-05-26 17:55:12 -04:00
markdown.rs feat(pdftract-4li3d): implement security constraints for serve mode 2026-05-26 18:47:51 -04:00
ocr.rs feat(pdftract-6dki1): implement histogram stretch contrast normalization 2026-05-24 10:30:20 -04:00
options.rs feat(pdftract-3s2i): implement Phase 5.5.2 validation filter 2026-05-24 04:57:17 -04:00
page_class.rs fix(pdftract-tuky): fix color clamping test and verify Phase 3.1 coordinator 2026-05-26 16:36:01 -04:00
preprocess.rs feat(pdftract-3s2i): implement Phase 5.5.2 validation filter 2026-05-24 04:57:17 -04:00
render.rs feat(pdftract-axcri): record inline images as ImageXObject entries 2026-05-24 07:41:50 -04:00
semaphore.rs feat(pdftract-3s2i): implement Phase 5.5.2 validation filter 2026-05-24 04:57:17 -04:00
span_flags.rs feat(pdftract-cbrbg): implement span flag detector for Phase 4.1 2026-05-24 07:28:25 -04:00
text.rs feat(pdftract-529te): implement per-page block serializer 2026-05-25 12:21:07 -04:00
url_validation.rs feat(pdftract-3s2i): implement Phase 5.5.2 validation filter 2026-05-24 04:57:17 -04:00
word_boundary.rs feat(pdftract-h2s0z): implement adaptive word boundary detector 2026-05-24 06:06:56 -04:00