Add 9 built-in classification profile definitions as YAML files bundled
via include_str! for the document type classifier (Phase 5.6).

- Create profiles/builtin/classification/{invoice,receipt,contract,scientific_paper,slide_deck,form,bank_statement,legal_filing,book_chapter}.yaml
- Implement load_builtins() in profiles module with profiles feature gate
- Each profile uses MatchPredicate schema with text patterns, structural signals, page counts
- Add comprehensive unit tests for profile loading and feature gate

Closes: pdftract-5sdd
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
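
The commit message describes load_builtins() as bundling the YAML files with include_str! behind the profiles feature. A minimal sketch of that shape follows. It is an illustration under stated assumptions, not the code from this commit: BuiltinProfile, profile_type, the empty fallback when the feature is off, and the include_str! path are all hypothetical, and an inline string stands in for the bundled file so the sketch compiles on its own.

```rust
/// One bundled profile: the expected `type` value plus the raw YAML text.
/// (Hypothetical struct; the real module may store parsed profiles instead.)
pub struct BuiltinProfile {
    pub stem: &'static str,
    pub yaml: &'static str,
}

// Stand-in for something like
// include_str!("builtin/classification/book_chapter.yaml").
// The path relative to the including file is an assumption; an inline
// literal keeps this sketch compilable without the file on disk.
pub const BOOK_CHAPTER_YAML: &str =
    "name: Book Chapter\ntype: book_chapter\nthreshold: 0.6\n";

/// With the `profiles` feature enabled, return the bundled definitions.
#[cfg(feature = "profiles")]
pub fn load_builtins() -> Vec<BuiltinProfile> {
    vec![BuiltinProfile {
        stem: "book_chapter",
        yaml: BOOK_CHAPTER_YAML,
    }]
}

/// Without the feature, return nothing (assumed behaviour; the real crate
/// might not compile the function at all).
#[cfg(not(feature = "profiles"))]
pub fn load_builtins() -> Vec<BuiltinProfile> {
    Vec::new()
}

/// Pull the top-level `type:` value out of a profile without a YAML parser.
pub fn profile_type(yaml: &str) -> Option<&str> {
    yaml.lines()
        .find_map(|line| line.strip_prefix("type:"))
        .map(str::trim)
}
```

A loading test in this style could check that every entry returned by load_builtins() has a `type:` value matching its file stem, which holds whether or not the feature is enabled.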
32 lines
533 B
YAML
name: Book Chapter
type: book_chapter
threshold: 0.6
predicates:
  - kind: page_count_in_range
    min: 20
    max: 200
    weight: 0.3

  - kind: heading_depth_at_least
    depth: 1
    weight: 0.2

  - kind: font_diversity_in_range
    min: 1
    max: 3
    weight: 0.15

  - kind: text_contains
    pattern: chapter
    weight: 0.15
    case_sensitive: false
    min_hits: 1

  - kind: text_contains
    pattern: chapter
    weight: 0.1
    case_sensitive: false
    min_hits: 1

  - kind: has_footer_page_numbers
    weight: 0.1
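
The profile pairs a threshold with per-predicate weights that sum to 1.0, but neither the file nor the commit message says how the classifier combines them. One plausible reading, sketched below purely as an assumption, is that the weights of the matched predicates are summed and the profile applies when that sum reaches the threshold. Evaluated, score, and classifies are hypothetical names, not part of the MatchPredicate schema.

```rust
/// Outcome of evaluating one predicate against a document.
/// (Hypothetical type; it carries only the profile weight and the result.)
pub struct Evaluated {
    pub weight: f64,
    pub matched: bool,
}

/// Sum the weights of the predicates that matched.
pub fn score(preds: &[Evaluated]) -> f64 {
    preds.iter().filter(|p| p.matched).map(|p| p.weight).sum()
}

/// Assumed rule: the profile applies when the score reaches the threshold.
pub fn classifies(preds: &[Evaluated], threshold: f64) -> bool {
    score(preds) >= threshold
}

/// Pair the six book_chapter weights, in file order, with match results.
pub fn book_chapter_results(matched: [bool; 6]) -> Vec<Evaluated> {
    let weights = [0.3, 0.2, 0.15, 0.15, 0.1, 0.1];
    weights
        .iter()
        .zip(matched.iter())
        .map(|(&weight, &matched)| Evaluated { weight, matched })
        .collect()
}
```

Under this reading, a document that matches the page-count, heading-depth, font-diversity, and footer predicates scores about 0.75 and clears the 0.6 threshold, while one that matches only the page count and footer scores about 0.4 and does not.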