jedarden 6a142369b9 docs(pdftract-4iier): complete per-profile README documentation
Complete per-profile README documentation for all 9 built-in profiles.
Each README follows the consistent 6-section structure with match criteria,
extracted fields, known limitations, sample input pointers, and configuration tips.

Fix: receipt README date field type (string → date to match YAML).

Files updated:
- profiles/builtin/invoice/README.md
- profiles/builtin/receipt/README.md
- profiles/builtin/contract/README.md
- profiles/builtin/scientific_paper/README.md
- profiles/builtin/slide_deck/README.md
- profiles/builtin/form/README.md
- profiles/builtin/bank_statement/README.md
- profiles/builtin/legal_filing/README.md
- profiles/builtin/book_chapter/README.md
- notes/pdftract-4iier.md

Acceptance criteria:
- All 9 README files exist at correct paths
- All follow consistent 6-section structure
- All Extracted Fields tables match YAML profile_fields
- All Known Limitations sections are non-empty and profile-specific
- All Sample Input pointers reference existing fixtures
- xtask doc-profile skeleton generator is implemented

Co-Authored-By: Claude Code (glm-4.7) <noreply@anthropic.com>
2026-05-18 00:32:06 -04:00


# pdftract-4iier: Per-profile README Documentation
## Summary
Completed per-profile README documentation for all 9 built-in profiles. Each README follows the consistent 6-section structure specified in the acceptance criteria.
## Files Updated
All 9 README files exist at `profiles/builtin/<type>/README.md`:
1. `profiles/builtin/invoice/README.md` - Invoice profile documentation
2. `profiles/builtin/receipt/README.md` - Receipt profile documentation (fixed date type: string → date)
3. `profiles/builtin/contract/README.md` - Contract profile documentation
4. `profiles/builtin/scientific_paper/README.md` - Scientific paper profile documentation
5. `profiles/builtin/slide_deck/README.md` - Slide deck profile documentation
6. `profiles/builtin/form/README.md` - Form profile documentation (degenerate case: no field extractors)
7. `profiles/builtin/bank_statement/README.md` - Bank statement profile documentation
8. `profiles/builtin/legal_filing/README.md` - Legal filing profile documentation
9. `profiles/builtin/book_chapter/README.md` - Book chapter profile documentation
## xtask Implementation
`xtask/src/main.rs` already contains the `doc-profile` and `doc-profiles` commands, which generate README skeletons from profile YAML files. Both commands were implemented and working before this change.
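To make the generator's job concrete, here is a minimal Python sketch of rendering a skeleton with the six sections named in the acceptance criteria. It is not the xtask implementation, which is written in Rust and not reproduced in this note. The function name `render_skeleton`, the heading levels, the `TODO` placeholders, and the two-column table layout are all assumptions, and the field list is passed in directly rather than parsed from YAML.

```python
# Hypothetical sketch: section names come from the acceptance criteria,
# everything else (heading levels, placeholders, table columns) is assumed.
SECTIONS = [
    "Match Criteria Summary",
    "Extracted Fields",
    "Known Limitations",
    "Sample Input",
    "Configuration Tips",
]


def render_skeleton(profile_name, fields):
    """Render a README skeleton for one profile.

    `fields` is a list of (name, type) pairs standing in for the
    `profile_fields` entries of the profile YAML.
    """
    # Section 1 of 6: title plus a description placeholder.
    lines = [f"# {profile_name} profile", "", "TODO: describe the profile.", ""]
    for section in SECTIONS:
        lines += [f"## {section}", ""]
        if section != "Extracted Fields":
            lines.append("TODO")
        elif not fields:
            # Degenerate case, e.g. a profile with no field extractors.
            lines.append("This profile defines no field extractors.")
        else:
            lines += ["| Field | Type |", "|-------|------|"]
            lines += [f"| `{name}` | {field_type} |" for name, field_type in fields]
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_skeleton("receipt", [("date", "date")]))
```

Generating the Extracted Fields table from the same data the profile uses is what keeps the README and the YAML from drifting apart; the hand-written sections stay as placeholders for the author to fill in.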
## Bug Fix
Fixed the receipt README: the `date` field's type was listed as `string` and is now `date`, matching the YAML definition (`receipt/profile.yaml` has `type: date`).
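A mismatch like this one can be caught mechanically. The sketch below, in Python, compares the types in a README's field table against the types declared in the profile. It assumes a two-column `| Field | Type |` table, which may not match the real README layout, and it takes the YAML side as a plain dict instead of parsing `profile.yaml`. The names `parse_field_table` and `type_mismatches` are hypothetical.

```python
def parse_field_table(readme_text):
    """Return {field_name: type} from the first two columns of table rows."""
    fields = {}
    for line in readme_text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue
        cells = [cell.strip().strip("`") for cell in line.strip("|").split("|")]
        if len(cells) < 2:
            continue
        name, field_type = cells[0], cells[1]
        # Skip the header row and the |---|---| separator row.
        if name.lower() == "field" or set(name) <= set("-: "):
            continue
        fields[name] = field_type
    return fields


def type_mismatches(readme_fields, yaml_fields):
    """List (field, readme_type, yaml_type) for every disagreement."""
    names = sorted(set(readme_fields) | set(yaml_fields))
    return [
        (name, readme_fields.get(name), yaml_fields.get(name))
        for name in names
        if readme_fields.get(name) != yaml_fields.get(name)
    ]


if __name__ == "__main__":
    # The receipt README before the fix, reduced to the one affected row.
    stale_readme = "| Field | Type |\n|-------|------|\n| `date` | string |\n"
    print(type_mismatches(parse_field_table(stale_readme), {"date": "date"}))
```

A field present on only one side shows up with `None` for the other side's type, so missing and extra rows are reported along with wrong types.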
## Acceptance Criteria Status
- ✅ All 9 README files exist at the documented paths
- ✅ Each follows the consistent 6-section structure (Title/Description, Match Criteria Summary, Extracted Fields, Known Limitations, Sample Input, Configuration Tips)
- ✅ Extracted Fields tables match the corresponding profile YAML's `profile_fields`
- ✅ Known Limitations is non-empty and profile-specific for all profiles
- ✅ Sample Input pointers link to actual fixtures in `tests/fixtures/classifier/`
- ✅ xtask `doc-profile` skeleton generator is implemented (pre-existing)
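The structural criteria above lend themselves to an automated check. The following Python sketch reports missing sections and an empty Known Limitations section for a single README. It assumes the title is a `# ` heading and the remaining five sections are `## ` headings with exactly the names listed above; the real READMEs may use different levels or wording, and `structure_problems` is a hypothetical name. It cannot judge whether a limitation is profile-specific, which still needs a human read.

```python
REQUIRED_SECTIONS = [
    "Match Criteria Summary",
    "Extracted Fields",
    "Known Limitations",
    "Sample Input",
    "Configuration Tips",
]


def section_bodies(readme_text):
    """Map each `## ` heading to the stripped text beneath it."""
    bodies = {}
    current = None
    for line in readme_text.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            bodies[current] = []
        elif current is not None:
            bodies[current].append(line)
    return {heading: "\n".join(body).strip() for heading, body in bodies.items()}


def structure_problems(readme_text):
    """Return a list of human-readable problems; empty means the check passed."""
    problems = []
    if not any(line.startswith("# ") for line in readme_text.splitlines()):
        problems.append("missing title")
    bodies = section_bodies(readme_text)
    for section in REQUIRED_SECTIONS:
        if section not in bodies:
            problems.append(f"missing section: {section}")
    if "Known Limitations" in bodies and not bodies["Known Limitations"]:
        problems.append("empty section: Known Limitations")
    return problems
```

One caveat of this simple line scan: a line starting with `## ` inside a fenced code block would be mistaken for a heading.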
## Fixture Path Verification
All Sample Input sections reference actual fixture files:
- invoice: `tests/fixtures/classifier/invoice/` (50+ files)
- receipt: `tests/fixtures/classifier/misc/` (samples 01-08.pdf)
- contract: `tests/fixtures/classifier/contract/` (50+ files)
- scientific_paper: `tests/fixtures/classifier/scientific_paper/` (50+ files)
- slide_deck: `tests/fixtures/classifier/misc/` (samples 24-30.pdf)
- form: `tests/fixtures/classifier/misc/` (samples 09-16.pdf)
- bank_statement: `tests/fixtures/classifier/misc/` (samples 17-23.pdf)
- legal_filing: `tests/fixtures/classifier/misc/` (samples 31-37.pdf)
- book_chapter: `tests/fixtures/classifier/misc/` (samples 38-43.pdf)
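The directory mapping above can be re-checked with a short script. This Python sketch encodes the nine profile-to-directory pairs from the list and reports any profile whose directory is absent or contains no `.pdf` files. The function name `missing_fixture_dirs` is hypothetical, and the check deliberately stops at directory level because the exact sample file names are not spelled out in this note.

```python
from pathlib import Path

# Profile -> fixture directory, copied from the list above.
FIXTURE_DIRS = {
    "invoice": "tests/fixtures/classifier/invoice",
    "receipt": "tests/fixtures/classifier/misc",
    "contract": "tests/fixtures/classifier/contract",
    "scientific_paper": "tests/fixtures/classifier/scientific_paper",
    "slide_deck": "tests/fixtures/classifier/misc",
    "form": "tests/fixtures/classifier/misc",
    "bank_statement": "tests/fixtures/classifier/misc",
    "legal_filing": "tests/fixtures/classifier/misc",
    "book_chapter": "tests/fixtures/classifier/misc",
}


def missing_fixture_dirs(repo_root, fixture_dirs=FIXTURE_DIRS):
    """Return profiles whose fixture directory is absent or has no PDFs."""
    root = Path(repo_root)
    missing = []
    for profile, relative_dir in fixture_dirs.items():
        directory = root / relative_dir
        if not directory.is_dir() or not any(directory.glob("*.pdf")):
            missing.append(profile)
    return missing


if __name__ == "__main__":
    # Run from the repository root; an empty list means every directory resolved.
    print(missing_fixture_dirs("."))
```

Because six profiles share `misc/`, this check cannot tell whether each profile's own sample range is present; verifying that would need the actual file naming scheme.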
## Testing
Verified xtask compiles and runs:
```bash
cd xtask && cargo build # Success
./target/debug/xtask # Shows doc-profile and doc-profiles commands
```
## PASS Items
All acceptance criteria PASS:
- All 9 README files exist at correct paths
- All follow consistent 6-section structure
- All Extracted Fields tables match YAML profile_fields
- All Known Limitations sections are non-empty and profile-specific
- All Sample Input pointers reference existing fixtures
- xtask doc-profile skeleton generator is implemented
## WARN Items
None. All criteria met without warnings.