- Fixed test_extract_combo_with_multi_select_flag: combo boxes are always single-select regardless of multi-select flag
- Fixed test_extract_default_none_becomes_none: empty string defaults are valid and should not be filtered out
- Added is_truly_empty() method to distinguish between no value (None) and empty string value
- Updated verification note for pdftract-5t92

Refs: pdftract-5t92, plan 7.4.2
pdftract-5t92: AcroForm Value Extraction for Tx/Btn/Ch Types
Summary
Completed Phase 7.4.2: AcroForm value extraction for Tx / Btn / Ch field types. The implementation was already present in the codebase - this task involved fixing two test failures and verifying complete functionality.
Work Done
Bug Fixes
- Fixed test_extract_combo_with_multi_select_flag (value_choice.rs:473-491)
  - Problem: When both Combo and MultiSelect flags were set (malformed but possible), the code returned ChoiceValue::Multiple instead of ChoiceValue::Single(Some(_)).
  - Root Cause: extract_selected_value was called with is_multi_select=true for all fields, but combo boxes are always single-select regardless of the multi-select flag.
  - Fix: Modified extract_choice_value to pass is_multi_select && !is_combo to extract_selected_value calls (lines 199-205).
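The flag masking in this fix can be illustrated with a minimal sketch. The ChoiceValue enum and both function bodies below are simplified stand-ins written for this note, not the code in value_choice.rs; in particular, the `selected` slice is a hypothetical substitute for the parsed /V entry.

```rust
/// Hypothetical stand-in for the crate's ChoiceValue (only the two variants named in this note).
#[derive(Debug, PartialEq)]
pub enum ChoiceValue {
    Single(Option<String>),
    Multiple(Vec<String>),
}

/// Simplified helper (assumption): `selected` stands in for the already-parsed /V entry.
fn extract_selected_value(selected: &[&str], is_multi_select: bool) -> ChoiceValue {
    if is_multi_select {
        ChoiceValue::Multiple(selected.iter().map(|s| s.to_string()).collect())
    } else {
        // Single-select: keep the first selected entry, if any.
        ChoiceValue::Single(selected.first().map(|s| s.to_string()))
    }
}

/// Combo boxes are always single-select, so the MultiSelect flag is masked
/// out before it reaches the selection helper.
pub fn extract_choice_value(selected: &[&str], is_combo: bool, is_multi_select: bool) -> ChoiceValue {
    extract_selected_value(selected, is_multi_select && !is_combo)
}
```

With this masking, a malformed field that sets both flags yields Single(Some(_)), while a list box with MultiSelect still yields Multiple.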
- Fixed test_extract_default_none_becomes_none (value_choice.rs:626-637)
  - Problem: Empty string defaults (Single(Some(""))) were being filtered out because is_empty() returns true for empty strings.
  - Root Cause: The filter default_val.filter(|v| !v.is_empty()) treated Single(Some("")) as empty and removed it.
  - Semantics: An explicit empty string default is different from no default at all. /DV "" means "default to empty", whereas no /DV means "no default specified".
  - Fix: Added a new is_truly_empty() method that returns true only for Single(None) and empty Multiple, not for Single(Some("")). Changed the filter to use is_truly_empty() instead of is_empty() (line 210).
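The difference between the two emptiness checks can be sketched as follows. The enum is a stand-in, the is_empty() body is an assumption consistent with the behaviour described above (its handling of Multiple is a guess), and keep_default is a hypothetical wrapper around the filter call.

```rust
/// Hypothetical stand-in for the crate's ChoiceValue.
#[derive(Debug, Clone, PartialEq)]
pub enum ChoiceValue {
    Single(Option<String>),
    Multiple(Vec<String>),
}

impl ChoiceValue {
    /// Assumed behaviour: true for None, for "", and for an empty list.
    pub fn is_empty(&self) -> bool {
        match self {
            ChoiceValue::Single(v) => v.as_deref().map_or(true, str::is_empty),
            ChoiceValue::Multiple(vs) => vs.is_empty(),
        }
    }

    /// True only when no value exists at all; Single(Some("")) counts as a value.
    pub fn is_truly_empty(&self) -> bool {
        match self {
            ChoiceValue::Single(v) => v.is_none(),
            ChoiceValue::Multiple(vs) => vs.is_empty(),
        }
    }
}

/// Hypothetical wrapper showing the corrected filter on the default value.
pub fn keep_default(default_val: Option<ChoiceValue>) -> Option<ChoiceValue> {
    default_val.filter(|v| !v.is_truly_empty())
}
```

Under these assumptions an explicit empty default survives the filter, while Single(None) and an empty Multiple are still dropped.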
Verification
All acceptance criteria from the plan are met:
| Criterion | Status | Notes |
|---|---|---|
| Critical test (text, checkbox, dropdown) | PASS | test_extract_values_tx_btn_ch_critical passes |
| Unit test: unselected checkbox | PASS | test_extract_values_unselected_checkbox passes |
| Unit test: selected radio | PASS | test_extract_values_selected_radio passes |
| Unit test: multi-select list | PASS | test_extract_values_multi_select_list passes |
| Unit test: combo with /Opt 2-tuple entries | PASS | test_extract_values_combo_with_opt_tuples passes |
| Unit test: multi-line text | PASS | test_extract_values_multiline_text passes |
| Public API extract_values function | PASS | pub fn extract_values(fields: &[AcroFormField]) -> Vec<(String, FormFieldValue)> exists |
| Sig fields are skipped | PASS | test_extract_values_skips_sig_fields passes |
| All /Ff bits preserved | PASS | FormFieldValue variants preserve all flags via multiline, pushbutton, radio, is_combo, is_multi_select fields |
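For reference, the /Ff bits behind the flags in the last row can be decoded as sketched below. The bit positions follow the PDF 1.7 field-flag tables (bit position n corresponds to mask 1 << (n - 1)); the FieldFlags struct and decode_ff function are hypothetical names for illustration, since the crate carries these flags on FormFieldValue variants rather than on a separate struct.

```rust
// /Ff masks; PDF bit positions are 1-based, so position n maps to 1 << (n - 1).
pub const FF_MULTILINE: u32 = 1 << 12; // bit 13, text fields
pub const FF_RADIO: u32 = 1 << 15; // bit 16, button fields
pub const FF_PUSHBUTTON: u32 = 1 << 16; // bit 17, button fields
pub const FF_COMBO: u32 = 1 << 17; // bit 18, choice fields
pub const FF_MULTI_SELECT: u32 = 1 << 21; // bit 22, choice fields

/// Hypothetical container mirroring the flag names listed in the table.
#[derive(Debug, Default, PartialEq)]
pub struct FieldFlags {
    pub multiline: bool,
    pub pushbutton: bool,
    pub radio: bool,
    pub is_combo: bool,
    pub is_multi_select: bool,
}

/// Decode a raw /Ff integer into the individual flags.
pub fn decode_ff(ff: u32) -> FieldFlags {
    FieldFlags {
        multiline: ff & FF_MULTILINE != 0,
        pushbutton: ff & FF_PUSHBUTTON != 0,
        radio: ff & FF_RADIO != 0,
        is_combo: ff & FF_COMBO != 0,
        is_multi_select: ff & FF_MULTI_SELECT != 0,
    }
}
```

Note that the decoder reports both is_combo and is_multi_select when a malformed field sets both bits; resolving that conflict is left to the choice-value extraction.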
Implementation Details
The implementation consists of:
- forms/mod.rs: Main entry point extract_values() and acro_field_to_value() - converts AcroFormField to FormFieldValue.
- forms/value_text.rs: Text field extraction with PDFDocEncoding/UTF-16BE BOM decoding via decode_pdf_string().
- forms/value_button.rs: Button field extraction distinguishing pushbutton, checkbox, and radio button types via /Ff flags.
- forms/value_choice.rs: Choice field extraction for combo/list boxes with single/multi-select support.
- forms/combiner.rs: FormFieldValue enum definition for type-safe values.
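The BOM-based dispatch used for text values can be sketched roughly as follows. This is not the Phase 1 decode_pdf_string() implementation: the function name is hypothetical, and the non-BOM branch approximates PDFDocEncoding with Latin-1, which agrees with it for printable ASCII and most of 0xA1-0xFF but not for the 0x18-0x1F and 0x80-0xA0 ranges.

```rust
/// Illustrative decoder (hypothetical name): UTF-16BE when the string starts
/// with the FE FF byte-order mark, otherwise a Latin-1 approximation of
/// PDFDocEncoding.
pub fn decode_text_string_sketch(bytes: &[u8]) -> String {
    if bytes.starts_with(&[0xFE, 0xFF]) {
        // Skip the BOM and combine big-endian byte pairs into UTF-16 code units.
        // A trailing odd byte is ignored by chunks_exact.
        let units: Vec<u16> = bytes[2..]
            .chunks_exact(2)
            .map(|pair| u16::from_be_bytes([pair[0], pair[1]]))
            .collect();
        String::from_utf16_lossy(&units)
    } else {
        // Assumption: treat each byte as the Unicode code point of the same value.
        bytes.iter().map(|&b| b as char).collect()
    }
}
```

A full decoder would replace the fallback branch with the PDFDocEncoding table from the PDF specification.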
Files Modified
crates/pdftract-core/src/forms/value_choice.rs: Fixed multi-select flag handling for combo boxes and empty string default filtering.
Test Results
test result: ok. 96 passed; 0 failed
All forms module tests pass:
- 16 tests in forms::tests (main module)
- 27 tests in forms::value_text::tests
- 31 tests in forms::value_button::tests
- 22 tests in forms::value_choice::tests
References
- Plan section 7.4 lines 2610-2613 (Tx/Btn/Ch)
- PDF 1.7 spec 12.7.4.2 (Btn), 12.7.4.3 (Tx), 12.7.4.4 (Ch)
- Phase 1 PdfString decoder (reused for text decoding)
- Phase 7.4.1 (input walker - provides AcroFormField)
- Phase 7.4.4 (combiner consumer - uses FormFieldValue)