Phase 7.4.5 implementation: Wire combined Vec<(String, FormFieldValue)> from
combiner into document-level /form_fields JSON output with tagged union schema.
- Add FormFieldJson, FormFieldTypeJson, FormFieldValueJson, ChoiceValueJson to schema
- Add form_fields: Vec<FormFieldJson> to ExtractionResult (always emitted, empty when none)
- Implement acro_field_to_value() converter for Phase 7.4.2 type-specific extraction
- Wire form field extraction in extract_pdf(): walk AcroForm, extract XFA, combine with XFA-wins
- Add convert_form_field_to_json() helper for FormFieldValue → FormFieldJson conversion
- Update docs/schema/v1.0/pdftract.schema.json with form_fields $defs and required field
- Add form_fields_to_markdown() to markdown module for Form Fields footer table
Schema shape: /form_fields is array of {name, type, value, default?, page_index?, rect?,
required, read_only, multiline?, max_length?, options?, multi_select?, selected?,
state_name?, pushbutton?, radio?}. Type field is tagged enum: "text", "button", "choice",
"signature". Value field varies by type (string|boolean|string|array|uint|null).
Closes: pdftract-5qca
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Add button field value extraction distinguishing pushbutton, checkbox,
and radio button types via /Ff flags. Extracts selected state and
appearance state name (/Yes, /Off, custom).
- New module: forms/value_button.rs with ButtonKind enum and ButtonValue
- Updated FormFieldValue::Button variant with kind and state_name fields
- 15 unit tests covering all button types and edge cases
- Fixed CCITTFaxDecoder test syntax blocking test execution
Closes: pdftract-66pgk
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Implement Phase 7.4.4: AcroForm + XFA field combiner with XFA-wins
precedence. This enables pdftract to handle hybrid PDF forms that
contain both AcroForm and XFA representations.
- Add FormFieldValue enum with Text, Button, Choice, Signature variants
- Add ChoiceValue enum for single/multiple choice selections
- Implement combine() function that merges AcroForm and XFA fields
with XFA values taking precedence on collision
- Implement XFA boolean string conversion ("true"/"false"/"1"/"0")
to Button selected state
- Preserve AcroForm type hints when XFA provides the value
- Emit diagnostics for field name collisions
- Sort output alphabetically by field name
Closes: pdftract-2qum
Created forms/xfa.rs module with extract_xfa_fields() that:
- Handles single-stream and array-stream XFA layouts
- Uses quick-xml for XML parsing with namespace support
- Extracts field values from XFA data model (xfa:datasets/xfa:data)
- Supports FlateDecode-compressed streams via Phase 1 decoder
- Returns Vec<XfaField> with dot-separated field names
Acceptance criteria:
- Critical test: XFA-only form field values extracted
- Unit tests: single stream, array stream, malformed XML, empty fields
- Public API: extract_xfa_fields(resolver, acroform_dict, source, opts)
- quick-xml feature flags: enabled via existing 'ocr' feature
All tests pass. Closes: pdftract-28e9
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>