pdftract/profiles/builtin/classification/invoice.yaml
jedarden 71705ed77b feat(profiles): implement built-in classification profiles (5.6.4)
Add 9 built-in classification profile definitions as YAML files bundled
via include_str! for the document type classifier (Phase 5.6).

- Create profiles/builtin/classification/{invoice,receipt,contract,scientific_paper,slide_deck,form,bank_statement,legal_filing,book_chapter}.yaml
- Implement load_builtins() in profiles module with profiles feature gate
- Each profile uses MatchPredicate schema with text patterns, structural signals, page counts
- Add comprehensive unit tests for profile loading and feature gate

Closes: pdftract-5sdd

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-24 15:04:43 -04:00


name: Standard Invoice
type: invoice
threshold: 0.6
predicates:
  - kind: text_contains
    pattern: invoice
    weight: 0.3
    case_sensitive: false
    min_hits: 1
  - kind: text_contains
    pattern: total
    weight: 0.2
    case_sensitive: false
    min_hits: 1
  - kind: text_contains
    pattern: subtotal
    weight: 0.15
    case_sensitive: false
    min_hits: 1
  - kind: structural_has_table
    weight: 0.15
    min_count: 1
  - kind: page_count_in_range
    min: 1
    max: 5
    weight: 0.1
  - kind: text_contains
    pattern: due date
    weight: 0.05
    case_sensitive: false
    min_hits: 1
  - kind: text_contains
    pattern: payment terms
    weight: 0.05
    case_sensitive: false
    min_hits: 1
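
The profile above does not say how predicate weights combine, so the following is only a sketch of one plausible reading: each satisfied predicate contributes its `weight`, and the document counts as an invoice when the sum reaches `threshold`. The `Predicate`, `Document`, `score`, and `is_invoice` names are hypothetical and are not taken from pdftract's actual `MatchPredicate` schema. Substring counting is also an assumption.

```rust
// Hypothetical mirror of the three predicate kinds used in invoice.yaml.
// The real MatchPredicate type in pdftract may be shaped differently.
enum Predicate {
    TextContains { pattern: &'static str, case_sensitive: bool, min_hits: usize, weight: f64 },
    StructuralHasTable { min_count: usize, weight: f64 },
    PageCountInRange { min: usize, max: usize, weight: f64 },
}

// Hypothetical summary of an extracted document.
struct Document<'a> {
    text: &'a str,
    table_count: usize,
    page_count: usize,
}

// Count non-overlapping substring occurrences (assumed matching rule).
fn count_hits(text: &str, pattern: &str, case_sensitive: bool) -> usize {
    if pattern.is_empty() {
        return 0;
    }
    if case_sensitive {
        let hits = text.matches(pattern).count();
        hits
    } else {
        let haystack = text.to_lowercase();
        let needle = pattern.to_lowercase();
        let hits = haystack.matches(needle.as_str()).count();
        hits
    }
}

// Sum the weights of all satisfied predicates (assumed scoring rule).
fn score(doc: &Document, predicates: &[Predicate]) -> f64 {
    predicates
        .iter()
        .map(|p| match p {
            Predicate::TextContains { pattern, case_sensitive, min_hits, weight } => {
                if count_hits(doc.text, pattern, *case_sensitive) >= *min_hits { *weight } else { 0.0 }
            }
            Predicate::StructuralHasTable { min_count, weight } => {
                if doc.table_count >= *min_count { *weight } else { 0.0 }
            }
            Predicate::PageCountInRange { min, max, weight } => {
                if doc.page_count >= *min && doc.page_count <= *max { *weight } else { 0.0 }
            }
        })
        .sum()
}

// The predicates from invoice.yaml, transcribed by hand for this sketch.
fn invoice_predicates() -> Vec<Predicate> {
    let text = |pattern, weight| Predicate::TextContains {
        pattern, case_sensitive: false, min_hits: 1, weight,
    };
    vec![
        text("invoice", 0.3),
        text("total", 0.2),
        text("subtotal", 0.15),
        Predicate::StructuralHasTable { min_count: 1, weight: 0.15 },
        Predicate::PageCountInRange { min: 1, max: 5, weight: 0.1 },
        text("due date", 0.05),
        text("payment terms", 0.05),
    ]
}

// Compare the score against the profile's threshold (0.6 in invoice.yaml).
fn is_invoice(doc: &Document) -> bool {
    const THRESHOLD: f64 = 0.6;
    score(doc, &invoice_predicates()) >= THRESHOLD
}
```

Under these assumptions the seven weights sum to 1.0, so a score can be read as the fraction of invoice evidence present. If matching is substring-based as sketched, the `total` pattern also fires on any text containing `subtotal`, so those two predicates are not independent signals.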