Form Profile Fixtures
This directory contains test fixtures for the form document profile.
Fixture Types
- irs_1040.pdf (2 pages) - IRS Form 1040 U.S. Individual Income Tax Return with standard tax form fields, signature section, and form-based layout
- w2.pdf (1-2 pages) - W-2 Wage and Tax Statement with employee/employer info, wage fields, and tax boxes
- i9.pdf (1-3 pages) - Form I-9 Employment Eligibility Verification with employee attestation section and employer review
- expense_report.pdf (1-2 pages) - Simple expense report with itemized expenses, total calculation, and approval signature
- intake_form.pdf (2-5 pages) - Multi-page new client intake form with personal information, service selection, and consent sections
Expected Output Format
Each fixture should have a corresponding *-expected.json file with the following structure:
{
  "metadata": {
    "document_type": "form",
    "document_type_confidence": 0.XX,
    "document_type_reasons": [...],
    "profile_name": "form",
    "profile_version": "1.0.0",
    "profile_fields": {}
  }
}
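A quick way to catch malformed expected files is to check them against the structure above. The following is a minimal sketch, not the project's actual test harness: the helper name check_expected_metadata is hypothetical, and treating document_type_confidence as a number between 0 and 1 and document_type_reasons as a list are assumptions inferred from the 0.XX and [...] placeholders.

```python
import json

# Keys shown in the expected-output structure above.
REQUIRED_KEYS = {
    "document_type",
    "document_type_confidence",
    "document_type_reasons",
    "profile_name",
    "profile_version",
    "profile_fields",
}


def check_expected_metadata(text):
    """Return a list of problems found in an expected-JSON string.

    Hypothetical helper; an empty list means no problems were found.
    """
    doc = json.loads(text)
    metadata = doc.get("metadata") if isinstance(doc, dict) else None
    if not isinstance(metadata, dict):
        return ["missing or non-object 'metadata'"]

    problems = []
    missing = sorted(REQUIRED_KEYS - metadata.keys())
    if missing:
        problems.append("missing keys: " + ", ".join(missing))

    # Assumption: confidence is a plain number in the range [0, 1].
    if "document_type_confidence" in metadata:
        value = metadata["document_type_confidence"]
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or not 0 <= value <= 1:
            problems.append("document_type_confidence is not a number in [0, 1]")

    # Assumption: reasons is a list (its element type is not specified here).
    if "document_type_reasons" in metadata:
        if not isinstance(metadata["document_type_reasons"], list):
            problems.append("document_type_reasons is not a list")

    if "profile_fields" in metadata:
        if not isinstance(metadata["profile_fields"], dict):
            problems.append("profile_fields is not an object")

    return problems
```

Returning a list of problems rather than raising on the first one makes it easier to report everything wrong with a fixture in a single pass.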
Important Notes
The form profile is degenerate - it has NO field extractors (profile_fields: {}). The form profile:
- Uses reading_order: line_dominant for text extraction
- Surfaces form_fields from Phase 7.4 (AcroForm field extraction) separately in the extraction output
- Does NOT extract any profile-specific fields
The expected JSON files reflect this degenerate behavior - profile_fields is always an empty object {}.
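Because every expected file shares this invariant, it can be asserted mechanically. The following is a minimal sketch, assuming the expected files sit in a single directory and follow the *-expected.json naming described above; the function name nonempty_profile_fields is hypothetical.

```python
import json
from pathlib import Path


def nonempty_profile_fields(fixture_dir):
    """Return names of *-expected.json files whose profile_fields is not {}.

    Hypothetical helper. Assumes each file holds a JSON object at the top level.
    """
    offenders = []
    for path in sorted(Path(fixture_dir).glob("*-expected.json")):
        with path.open(encoding="utf-8") as handle:
            doc = json.load(handle)
        metadata = doc.get("metadata", {})
        # A missing profile_fields key is reported too, since None != {}.
        if metadata.get("profile_fields") != {}:
            offenders.append(path.name)
    return offenders
```

An empty return value means every expected file in the directory reflects the degenerate behavior.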
Provenance
All fixtures should be sourced from publicly available form templates or created synthetically, with their provenance clearly documented. Do not include real forms containing PII or confidential information.
TODO
- Create irs_1040.pdf and irs_1040-expected.json
- Create w2.pdf and w2-expected.json
- Create i9.pdf and i9-expected.json
- Create expense_report.pdf and expense_report-expected.json
- Create intake_form.pdf and intake_form-expected.json
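Progress on the list above can be tracked with a small script. This is a sketch under stated assumptions: the fixture names are copied from the TODO list, each PDF and its expected file are assumed to live in the same directory, and missing_fixture_files is a hypothetical name.

```python
from pathlib import Path

# Fixture base names taken from the TODO list above.
FIXTURE_STEMS = ["irs_1040", "w2", "i9", "expense_report", "intake_form"]


def missing_fixture_files(fixture_dir):
    """Return the fixture file names that do not yet exist in fixture_dir."""
    root = Path(fixture_dir)
    missing = []
    for stem in FIXTURE_STEMS:
        # Each fixture needs both the PDF and its expected-output JSON.
        for name in (f"{stem}.pdf", f"{stem}-expected.json"):
            if not (root / name).is_file():
                missing.append(name)
    return missing
```

Calling missing_fixture_files(".") from this directory lists whatever is still outstanding; an empty list means all five fixture pairs are present.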