docs(pdftract-3a310): add coordinator verification note
Document status: coordinator cannot close because pdftract-1lp2 (Profile Authoring epic) is open.

Missing for epic completion:
- Fixtures: bank_statement (0/5), contract (0/5), form (0/5), receipt (2/5)
- expected-output.json: 0/9
- Regression tests: 0/9
parent 80dbf0f703
commit 897f6edb31
1 changed files with 99 additions and 0 deletions
99
notes/pdftract-3a310.md
Normal file
@@ -0,0 +1,99 @@
# Phase 7.10 Coordinator Verification Note

**Bead ID:** pdftract-3a310
**Date:** 2026-05-31
**Commit:** 80dbf0f (feat(profiles): add profile infrastructure and initial fixtures)

## Status: CANNOT CLOSE - Blocking epic incomplete

The coordinator `pdftract-3a310` cannot be closed because the epic it depends on, `pdftract-1lp2` (Profile Authoring), is still **open**.

## Dependency Chain

```
pdftract-3a310 (Phase 7.10 coordinator)
├── pdftract-3zhf (Phase 7.2 coordinator) - CLOSED ✓
├── pdftract-2mw6 (Phase 7.4 coordinator) - CLOSED ✓
└── pdftract-1lp2 (Profile Authoring epic) - OPEN ✗
```
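
The closing rule implied by the chain above (a coordinator can close only once every dependency is closed) can be sketched as follows. This is a minimal illustration, not the tracker's real data model: the `Status` and `Dependency` types and the `blockers` and `phase_7_10_deps` functions are hypothetical names introduced here.

```rust
/// Status of a bead. Hypothetical type, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Status {
    Open,
    Closed,
}

/// One dependency edge of a coordinator. Hypothetical type.
struct Dependency {
    id: &'static str,
    status: Status,
}

/// The three dependencies of pdftract-3a310 as recorded in this note.
fn phase_7_10_deps() -> Vec<Dependency> {
    vec![
        Dependency { id: "pdftract-3zhf", status: Status::Closed },
        Dependency { id: "pdftract-2mw6", status: Status::Closed },
        Dependency { id: "pdftract-1lp2", status: Status::Open },
    ]
}

/// Returns the ids of dependencies that are still open.
/// The coordinator is closable only when this list is empty.
fn blockers(deps: &[Dependency]) -> Vec<&'static str> {
    deps.iter()
        .filter(|d| d.status == Status::Open)
        .map(|d| d.id)
        .collect()
}

fn main() {
    let open = blockers(&phase_7_10_deps());
    if open.is_empty() {
        println!("pdftract-3a310 can be closed");
    } else {
        println!("pdftract-3a310 blocked by: {}", open.join(", "));
    }
}
```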

## What Was Completed (This Session)

### Profile Infrastructure Code
Committed in 80dbf0f:
- `crates/pdftract-core/src/profiles/apply_profile.rs` - Profile application logic
- `crates/pdftract-core/src/profiles/extraction.rs` - Extraction override handling
- `crates/pdftract-core/src/profiles/extraction_loader.rs` - Extraction option deserialization
- `crates/pdftract-core/src/profiles/field_extractor.rs` - Field DSL evaluator
- `crates/pdftract-core/src/profiles/match_eval.rs` - Match DSL evaluator
- `crates/pdftract-cli/src/profiles_cmd.rs` - `profiles` subcommand implementation
- Updated `crates/pdftract-core/src/profiles/mod.rs` - Module exports

### Built-in Profile YAMLs (9/9 complete)
All 9 profiles exist at `profiles/builtin/<name>/profile.yaml`:
- invoice, receipt, contract, scientific_paper, slide_deck
- form, bank_statement, legal_filing, book_chapter

### Profile READMEs (9/9 complete)
All 9 profiles have a `README.md` at `profiles/builtin/<name>/README.md`.

### Classifier Corpus (exists)
`tests/fixtures/classifier/` contains:
- contract, invoice, misc, scientific_paper directories
- MANIFEST.tsv
- README.md

### Fixtures Added (partial)
- invoice: 50 PDF fixtures ✓
- receipt: 2 PDF fixtures (needs 3 more)

## What Remains for `pdftract-1lp2` (Profile Authoring Epic)

### Missing Fixtures (per acceptance criteria: >= 5 per profile)
- bank_statement: 0/5 fixtures
- contract: 0/5 fixtures
- form: 0/5 fixtures
- receipt: 2/5 fixtures (needs 3 more)
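
The shortfall behind the counts above is simple arithmetic against the minimum of 5 fixtures per profile. A small sketch, using hypothetical helper names (`shortfall`, `current_counts`) and only the four profiles this note lists as short:

```rust
/// Minimum fixtures per profile, from the acceptance criteria (>= 5 per profile).
const MIN_FIXTURES: u32 = 5;

/// Fixtures still missing for a profile that currently has `have`.
/// Saturating subtraction keeps well-stocked profiles (e.g. invoice) at 0.
fn shortfall(have: u32) -> u32 {
    MIN_FIXTURES.saturating_sub(have)
}

/// Current fixture counts for the profiles listed as short in this note.
fn current_counts() -> Vec<(&'static str, u32)> {
    vec![
        ("bank_statement", 0),
        ("contract", 0),
        ("form", 0),
        ("receipt", 2),
    ]
}

fn main() {
    let mut total = 0;
    for (name, have) in current_counts() {
        let missing = shortfall(have);
        total += missing;
        println!("{}: {}/{} (needs {} more)", name, have, MIN_FIXTURES, missing);
    }
    // 5 + 5 + 5 + 3 = 18 fixtures still to add.
    println!("total missing: {}", total);
}
```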

### Missing Expected Output Files (0/9)
- `tests/fixtures/profiles/<name>/expected-output.json` does not exist for any profile
- These files contain the canonical `metadata.profile_fields` expected values for each fixture

### Missing Regression Tests (0/9)
- `tests/profiles/test_<name>.rs` does not exist for any profile
- Should run each fixture through `extract --profile <name>` and assert against `expected-output.json`
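
The comparison step of such a regression test might look like the sketch below. It is standard-library only and makes several assumptions: a real test would parse `expected-output.json` and call the extractor, whereas here both sides are hand-built maps; the `ProfileFields` alias, the helper functions, and every field name except `total` are hypothetical. `None` stands in for a JSON `null` (field not found).

```rust
use std::collections::BTreeMap;

/// Field name -> extracted value; `None` models JSON `null` (field not found).
type ProfileFields = BTreeMap<String, Option<String>>;

/// Builds a field map from literal pairs (test helper, hypothetical).
fn fields(pairs: &[(&str, Option<&str>)]) -> ProfileFields {
    pairs
        .iter()
        .map(|&(k, v)| (k.to_string(), v.map(str::to_string)))
        .collect()
}

/// Names of expected fields whose actual value is absent or different.
fn mismatched_fields(expected: &ProfileFields, actual: &ProfileFields) -> Vec<String> {
    expected
        .iter()
        .filter(|(name, want)| actual.get(*name) != Some(*want))
        .map(|(name, _)| name.clone())
        .collect()
}

/// Fraction of expected fields that were extracted correctly.
fn field_accuracy(expected: &ProfileFields, actual: &ProfileFields) -> f64 {
    if expected.is_empty() {
        return 1.0;
    }
    let wrong = mismatched_fields(expected, actual).len();
    (expected.len() - wrong) as f64 / expected.len() as f64
}

fn main() {
    let expected = fields(&[
        ("invoice_number", Some("INV-001")),
        ("total", Some("1250.00")),
        ("due_date", None),
    ]);
    let actual = fields(&[
        ("invoice_number", Some("INV-001")),
        ("total", Some("1205.00")), // deliberately wrong value
        ("due_date", None),
    ]);
    println!("mismatched: {:?}", mismatched_fields(&expected, &actual));
    println!("accuracy: {:.2}", field_accuracy(&expected, &actual));
}
```

Returning the mismatched names rather than a bare boolean makes a failing fixture easier to diagnose, and the same function feeds an accuracy figure that can be checked against the >= 90% criterion.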

## Acceptance Criteria Status

For `pdftract-3a310` coordinator:

| Criterion | Status |
|-----------|--------|
| All Phase 7.10 child task beads closed | ❌ BLOCKED - `pdftract-1lp2` is open |
| Acrobat sample invoice classified > 0.8 confidence | ⚠️ NOT TESTED - needs classifier corpus run |
| Invoice field extraction >= 90% accuracy | ⚠️ NOT TESTED - needs expected-output.json + regression test |
| Custom profile with priority 100 overrides built-ins | ⚠️ NOT TESTED |
| Malformed regex profile rejected by validate | ⚠️ NOT TESTED |
| profile_fields.total: null when not found | ⚠️ NOT TESTED |
| Hot-reload picks up new YAML on next request | ⚠️ NOT TESTED |
| User profile shadowing shown in list | ⚠️ NOT TESTED |
| Built-in invoice profile >= 90% field accuracy | ⚠️ NOT TESTED |
| Field extraction adds < 5% to per-document time | ⚠️ NOT TESTED |
| 9 built-in profiles ship with >= 5 fixtures each | ❌ FAIL - bank_statement, contract, form have 0; receipt has 2 |
| Built-in profile YAML compiled via include_str! | ⚠️ NOT VERIFIED |
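
As an illustration of the priority criterion in the table, the sketch below picks the highest-priority profile among the candidates that matched a document. The `Candidate` struct, the `select_profile` function, the built-in priority of 50, and the profile name `acme_invoice` are assumptions made for this example; the note does not state how pdftract represents or ranks profiles internally.

```rust
/// A profile that matched the document. Hypothetical type.
#[derive(Debug, Clone, PartialEq)]
struct Candidate {
    name: String,
    priority: u32,
    builtin: bool,
}

/// Picks the matching profile with the highest priority.
/// On equal priority `max_by_key` returns the last such candidate;
/// the tie-breaking rule of the real implementation is not stated in this note.
fn select_profile(candidates: &[Candidate]) -> Option<&Candidate> {
    candidates.iter().max_by_key(|c| c.priority)
}

fn main() {
    let candidates = vec![
        // Assumed built-in priority; the real default is not given here.
        Candidate { name: "invoice".into(), priority: 50, builtin: true },
        // Custom profile with priority 100, as in the acceptance criterion.
        Candidate { name: "acme_invoice".into(), priority: 100, builtin: false },
    ];
    match select_profile(&candidates) {
        Some(c) => println!("selected: {} (builtin: {})", c.name, c.builtin),
        None => println!("no profile matched"),
    }
}
```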

## Next Steps

To close `pdftract-3a310`, first close `pdftract-1lp2` (Profile Authoring epic):

1. Add missing fixtures (18 total: bank_statement 5, contract 5, form 5, receipt 3)
2. Generate `expected-output.json` for each profile's fixtures
3. Write regression tests at `tests/profiles/test_<name>.rs`
4. Run classifier corpus validation to verify >= 90% accuracy
5. Verify all acceptance criteria

## References

- Plan section: Phase 7.10 Document Profiles (lines 2890-3070)
- `pdftract-1lp2` (Profile Authoring epic) - must be closed first
- PROVENANCE.md at `tests/fixtures/profiles/PROVENANCE.md` (50KB, validates fixture sources)