This commit creates user-facing documentation for each built-in profile:

- Profile YAML files defining match criteria, priority, and extracted fields
- Per-profile READMEs with match criteria summary, extracted fields table, known limitations, sample input pointers, and configuration tips
- xtask skeleton generator for automated README generation

Profiles documented:

- invoice: Commercial invoices with line items, vendor/customer, totals
- receipt: POS receipts with items, payment method
- contract: Legal contracts with parties, effective date, term, signatures
- scientific_paper: Academic papers with title, authors, abstract, DOI, references
- slide_deck: Presentation slides with title, presenter, date, slide titles
- form: Fillable forms (degenerate case: uses Phase 7.4 form_fields)
- bank_statement: Bank statements with account info, period, balances, transactions
- legal_filing: Court filings with case number, court, parties, filing date, docket
- book_chapter: Book chapters with title, chapter number, author, section headings

Closes: pdftract-4iier
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
18 lines
550 B
YAML
description: Fillable form with fields; uses line_dominant reading order and form_fields from Phase 7.4
priority: 30
match:
  any:
    - text_patterns:
        - "(?i)form\\s*[0-9A-Z-]+"
        - "(?i)application\\s+form"
        - "(?i)questionnaire"
        - "(?i)please\\s+fill\\s+out"
        - "(?i)required\\s+fields?"
    - structural:
        - has_form_field_layout: true
        - has_blank_lines_with_colons: true
  page_count_hint: 1-10
profile_fields: {}
reading_order: line_dominant
zone_filtering: none
form_fields_integration: true
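
The `text_patterns` entries above can be tried against sample strings with a few lines of Python. This is only a sketch for experimenting with the regexes, not pdftract's matcher: the helper names `matching_patterns` and `text_matches_any` are hypothetical, the `structural` checks and `page_count_hint` are left out, and treating `any` as "one match is enough" is an assumption based on the key name. The patterns are shown after YAML unescaping, where `\\s` in the double-quoted string becomes `\s`.

```python
import re

# Patterns copied from the profile's text_patterns list, after YAML
# unescaping ("\\s" in the double-quoted YAML string becomes \s).
TEXT_PATTERNS = [
    r"(?i)form\s*[0-9A-Z-]+",
    r"(?i)application\s+form",
    r"(?i)questionnaire",
    r"(?i)please\s+fill\s+out",
    r"(?i)required\s+fields?",
]

COMPILED = [re.compile(p) for p in TEXT_PATTERNS]


def matching_patterns(text: str) -> list[str]:
    """Return the source patterns that match anywhere in `text` (hypothetical helper)."""
    return [rx.pattern for rx in COMPILED if rx.search(text)]


def text_matches_any(text: str) -> bool:
    """Assumed semantics of `any`: a single matching pattern is enough."""
    return bool(matching_patterns(text))


if __name__ == "__main__":
    samples = (
        "Form W-9",
        "Please fill out all required fields.",
        "Quarterly information summary",
        "Meeting notes",
    )
    for sample in samples:
        print(f"{sample!r}: {matching_patterns(sample)}")
```

In Python's `re`, the `(?i)` flag also applies to the character class `[0-9A-Z-]`, so the first pattern matches inside ordinary words such as "information" ("form" followed by "ation"). Running candidate text through a check like this may help when tuning the patterns or the profile's priority.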