Implement Phase 7.4.4: AcroForm + XFA field combiner with XFA-wins
precedence. This enables pdftract to handle hybrid PDF forms that
contain both AcroForm and XFA representations.
- Add FormFieldValue enum with Text, Button, Choice, Signature variants
- Add ChoiceValue enum for single/multiple choice selections
- Implement combine() function that merges AcroForm and XFA fields
  with XFA values taking precedence on collision
- Implement XFA boolean string conversion ("true"/"false"/"1"/"0")
  to Button selected state
- Preserve AcroForm type hints when XFA provides the value
- Emit diagnostics for field name collisions
- Sort output alphabetically by field name
Closes: pdftract-2qum
pdftract-2qum: AcroForm + XFA Combiner Implementation
- Bead: pdftract-2qum
- Title: 7.4.4: AcroForm + XFA combiner with XFA-wins precedence
- Status: COMPLETE
- Date: 2026-05-24
Summary
Implemented Phase 7.4.4: AcroForm + XFA field combiner that merges form field values from both sources with XFA-wins precedence. This enables pdftract to handle hybrid PDF forms that contain both AcroForm and XFA representations.
Implementation
Files Created
- `crates/pdftract-core/src/forms/combiner.rs` (385 lines)
  - `FormFieldValue` enum with `Text`, `Button`, `Choice`, `Signature` variants
  - `ChoiceValue` enum for single/multiple choice selections
  - `combine()` function that merges AcroForm and XFA fields
  - `parse_xfa_boolean()` for XFA boolean string conversion
  - `merge_xfa_value_with_acro_type()` for type-preserving XFA value injection
  - `infer_xfa_field_type()` for XFA-only field type inference
Files Modified
- `crates/pdftract-core/src/forms/mod.rs`
  - Added `pub mod combiner;` declaration
  - Re-exported `combine`, `ChoiceValue`, `FormFieldValue`
- `crates/pdftract-core/src/lib.rs`
  - Added re-exports: `combine`, `ChoiceValue`, `FormFieldValue`
API Design
FormFieldValue Enum
```rust
pub enum FormFieldValue {
    // Text field (/FT /Tx)
    Text {
        value: Option<String>,
        default: Option<String>,
        multiline: bool,
        max_length: Option<u32>,
    },
    // Button field (/FT /Btn): checkbox, radio button, or pushbutton
    Button {
        selected: bool,
        default_selected: Option<bool>,
        is_radio: bool,
        is_pushbutton: bool,
    },
    // Choice field (/FT /Ch): combo box or list box
    Choice {
        value: ChoiceValue, // Single or Multiple
        default: Option<ChoiceValue>,
        options: Vec<(String, String)>,
        is_combo: bool,
        is_multi_select: bool,
    },
    // Signature field (/FT /Sig)
    Signature {
        signature_ref: Option<u32>,
    },
}
```
combine() Function
```rust
pub fn combine(
    acro_fields: Vec<(String, FormFieldValue)>,
    xfa_fields: Vec<(String, String)>,
) -> (Vec<(String, FormFieldValue)>, Vec<Diagnostic>)
```
Behavior:
- Insert AcroForm fields first
- Insert XFA fields second (overwrites on collision; signature fields are the one exception)
- Track which fields came from both sources
- Convert XFA boolean strings ("true"/"false"/"1"/"0") to Button::selected when the AcroForm field is a Button
- Preserve AcroForm type hints when XFA provides the value
- Empty XFA values overwrite non-empty AcroForm values (XFA is canonical)
- Emit diagnostic for each collision
- Sort output alphabetically by full_name
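The behavior list above can be sketched in Rust as follows. This is a simplified illustration under stated assumptions, not the code in `combiner.rs`: `FormFieldValue` is cut down to three variants, `Diagnostic` is a hypothetical stand-in struct, and the diagnostic message text is invented. A `BTreeMap` covers both the XFA-wins overwrite and the sorted output.

```rust
use std::collections::BTreeMap;

/// Cut-down stand-in for `FormFieldValue` (hypothetical, three variants only).
#[derive(Debug, Clone, PartialEq)]
pub enum FormFieldValue {
    Text { value: Option<String> },
    Button { selected: bool },
    Signature { signature_ref: Option<u32> },
}

/// Hypothetical stand-in for the crate's `Diagnostic` type.
#[derive(Debug, Clone, PartialEq)]
pub struct Diagnostic {
    pub message: String,
}

pub fn combine(
    acro_fields: Vec<(String, FormFieldValue)>,
    xfa_fields: Vec<(String, String)>,
) -> (Vec<(String, FormFieldValue)>, Vec<Diagnostic>) {
    // 1. AcroForm first. A BTreeMap keeps its keys sorted, which yields the
    //    deterministic output order without a separate sort step.
    let mut merged: BTreeMap<String, FormFieldValue> = acro_fields.into_iter().collect();
    let mut diagnostics = Vec::new();

    // 2. XFA second: on collision the XFA value overwrites the AcroForm one.
    for (name, raw) in xfa_fields {
        let new_value = match merged.get(&name) {
            // XFA-only field: default to Text rather than guess a type.
            None => FormFieldValue::Text { value: Some(raw) },
            // Signatures cannot be represented as strings: keep the AcroForm value.
            Some(FormFieldValue::Signature { .. }) => {
                diagnostics.push(Diagnostic {
                    message: format!("{name}: signature field cannot be overridden by XFA"),
                });
                continue;
            }
            // Any other collision: report both values, then XFA wins.
            Some(acro) => {
                diagnostics.push(Diagnostic {
                    message: format!("{name}: AcroForm {acro:?} overridden by XFA {raw:?}"),
                });
                match acro {
                    // Keep the Button type and map XFA booleans onto `selected`.
                    FormFieldValue::Button { selected } => FormFieldValue::Button {
                        selected: match raw.trim() {
                            "true" | "1" => true,
                            "false" | "0" => false,
                            // Assumption: unrecognised strings keep the AcroForm state.
                            _ => *selected,
                        },
                    },
                    // Text: the XFA string wins verbatim, even when it is empty.
                    _ => FormFieldValue::Text { value: Some(raw) },
                }
            }
        };
        merged.insert(name, new_value);
    }

    (merged.into_iter().collect(), diagnostics)
}
```

Because `String` ordering is byte-wise on UTF-8, this "alphabetical" order matches dictionary order only for ASCII names of consistent case.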
Acceptance Criteria Status
Critical Test: Hybrid XFA+AcroForm - XFA values preferred
PASS - test_combine_both_overlapping verifies that XFA values overwrite AcroForm values on collision.
Unit Tests
| Test | Status | Description |
|---|---|---|
| `test_combine_no_overlap` | PASS | 3 AcroForm + 2 XFA, no overlap |
| `test_combine_both_overlapping` | PASS | 3 AcroForm + 2 XFA, overlapping on 2 fields |
| `test_xfa_boolean_to_checkbox` | PASS | XFA boolean string converts to Button selected state |
| `test_empty_xfa_wins_over_nonempty_acro` | PASS | Empty XFA value overwrites non-empty AcroForm value |
| `test_parse_xfa_boolean` | PASS | Boolean string parsing (true/false/1/0/yes/no) |
| `test_sort_order_deterministic` | PASS | Alphabetical sorting verified |
| `test_choice_value_single` | PASS | Single choice value merge |
| `test_choice_value_multi_select` | PASS | Multi-select comma-separated parsing |
Diagnostics
PASS - Collisions emit Diagnostic with field name, AcroForm value, and XFA value.
Public API
PASS - `combine(acro, xfa) -> (Vec<(String, FormFieldValue)>, Vec<Diagnostic>)` is public and re-exported from both `forms` and the crate root.
Sort Order
PASS - Output is sorted alphabetically by full_name for deterministic ordering.
Test Results
```console
$ cargo test --lib forms
test result: ok. 26 passed; 0 failed; 0 ignored; 0 measured; 1504 filtered out
```
All 26 forms tests pass:
- 18 existing tests from `forms/mod.rs` (AcroForm field walking)
- 8 new tests from `forms/combiner.rs` (XFA combiner)
Design Decisions
1. Type Preservation on Collision
When XFA overwrites an AcroForm value, we preserve the AcroForm's type metadata (multiline, max_length, is_radio, etc.) and inject only the XFA value string. This ensures that type information from the AcroForm dictionary is not lost when XFA provides the current value.
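Type preservation can be illustrated with a sketch of `merge_xfa_value_with_acro_type()`. The function name comes from this report, but its signature here (a borrowed AcroForm value plus the XFA string, returning a new value), the `ChoiceValue` variants, and the handling of unrecognised button strings are assumptions.

```rust
/// Hypothetical shape of `ChoiceValue`.
#[derive(Debug, Clone, PartialEq)]
pub enum ChoiceValue {
    Single(String),
    Multiple(Vec<String>),
}

/// Same variants and fields as the documented `FormFieldValue`.
#[derive(Debug, Clone, PartialEq)]
pub enum FormFieldValue {
    Text {
        value: Option<String>,
        default: Option<String>,
        multiline: bool,
        max_length: Option<u32>,
    },
    Button {
        selected: bool,
        default_selected: Option<bool>,
        is_radio: bool,
        is_pushbutton: bool,
    },
    Choice {
        value: ChoiceValue,
        default: Option<ChoiceValue>,
        options: Vec<(String, String)>,
        is_combo: bool,
        is_multi_select: bool,
    },
    Signature {
        signature_ref: Option<u32>,
    },
}

/// Clone the AcroForm entry (keeping multiline, max_length, is_radio,
/// options, ...) and replace only its current value with the XFA string.
pub fn merge_xfa_value_with_acro_type(acro: &FormFieldValue, xfa: &str) -> FormFieldValue {
    let mut merged = acro.clone();
    match &mut merged {
        // Text: the XFA string wins verbatim, including the empty string.
        FormFieldValue::Text { value, .. } => *value = Some(xfa.to_string()),
        // Button: XFA booleans map onto `selected`.
        FormFieldValue::Button { selected, .. } => match xfa.trim() {
            "true" | "1" => *selected = true,
            "false" | "0" => *selected = false,
            // Assumption: unrecognised strings keep the AcroForm state.
            _ => {}
        },
        // Choice: multi-select values are comma-separated in XFA.
        FormFieldValue::Choice { value, is_multi_select, .. } => {
            *value = if *is_multi_select {
                ChoiceValue::Multiple(
                    xfa.split(',')
                        .map(|s| s.trim().to_string())
                        .filter(|s| !s.is_empty())
                        .collect(),
                )
            } else {
                ChoiceValue::Single(xfa.to_string())
            };
        }
        // Signatures are never overridden by XFA.
        FormFieldValue::Signature { .. } => {}
    }
    merged
}
```

Cloning first and then mutating only the value field means every piece of AcroForm metadata survives by construction, without listing each field by hand.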
2. Boolean String Conversion
XFA represents boolean values as strings ("true", "false", "1", "0"). We convert these to Button::selected when the AcroForm type is Button. For XFA-only fields, we default to Text to avoid misclassifying text fields that happen to contain boolean-like strings.
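A sketch of `parse_xfa_boolean()`, assuming it returns `Option<bool>` so callers can tell "false" apart from "not a boolean". It accepts the four strings above plus "yes"/"no", which `test_parse_xfa_boolean` lists; trimming whitespace and case-insensitive matching are assumptions of this sketch.

```rust
/// Hypothetical signature: `Some(bool)` for a recognised XFA boolean string,
/// `None` otherwise, so the caller can keep the raw value as text instead.
pub fn parse_xfa_boolean(raw: &str) -> Option<bool> {
    // Assumption: surrounding whitespace and ASCII case are ignored.
    match raw.trim().to_ascii_lowercase().as_str() {
        "true" | "1" | "yes" => Some(true),
        "false" | "0" | "no" => Some(false),
        _ => None,
    }
}
```

A caller would consult this only when the AcroForm entry is a Button; an XFA-only field keeps its raw string as Text, so a text field whose value is "1" is not reclassified as a checkbox.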
3. Empty XFA Values Win
Per the PDF 1.7 spec and Adobe Reader convention, XFA is the canonical source for form values. Even when XFA provides an empty string, it overwrites a non-empty AcroForm value, so fields cleared in XFA appear empty in the output.
4. Signature Fields Cannot Be Overridden
Signature fields (/FT /Sig) contain cryptographic signature data that cannot be represented as a string. When XFA provides a value for a signature field, we keep the AcroForm value and emit a diagnostic explaining that signatures cannot be overridden by XFA.
Integration Points
This combiner is designed to be used by:
- Phase 7.4.5 (pdftract-5qca): form_fields JSON output + schema integration
- Phase 7.3 (signature discovery): filters AcroForm fields to /FT /Sig type
The combine() function accepts:
- AcroForm fields: `Vec<(String, FormFieldValue)>` (from Phase 7.4.2, not yet implemented)
- XFA fields: `Vec<(String, String)>` (from Phase 7.4.3, already implemented as `extract_xfa_fields`)
Note: Phase 7.4.2 (type-specific AcroForm value extraction) is not yet implemented. Currently, walk_acroform_fields returns Vec<AcroFormField> with raw PdfObject values. A future bead will implement the conversion from AcroFormField to FormFieldValue.
References
- Plan: lines 2622-2645 (Phase 7.4 AcroForm and XFA Field Extraction)
- Plan: line 2637 ("If both AcroForm and XFA are present, prefer XFA values")
- Plan: line 2645 ("Hybrid XFA+AcroForm: XFA values preferred")
- Bead pdftract-2qum description
Commits
forms: implement FormFieldValue enum and combine() function for XFA-wins precedence
WARN Items
None. All acceptance criteria pass.