The implementation in value_text.rs already handles all requirements:

- TextValue struct with value, default, multiline, max_length fields
- PDFDocEncoding and UTF-16BE BOM decoding
- All 12 tests passing
- Proper integration into FormFieldValue enum

No code changes required. All acceptance criteria PASS.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
pdftract-34hxw: AcroForm Tx (text field) value extraction
Status: PASS (implementation already present)
Summary
The AcroForm /Tx (text field) value extraction is already fully implemented in crates/pdftract-core/src/forms/value_text.rs. The implementation correctly handles all requirements from the bead description.
Implementation Verification
Module Location
- File: `crates/pdftract-core/src/forms/value_text.rs` (737 lines)
- Exports: `TextValue` struct and `extract_text_value` function
- Re-exports in: `crates/pdftract-core/src/forms/mod.rs`
TextValue Struct
```rust
pub struct TextValue {
    pub value: Option<String>,   // Current value (/V)
    pub default: Option<String>, // Default value (/DV)
    pub multiline: bool,         // /Ff bit 12, zero-based (bit position 13 in the PDF spec; mask 1 << 12 = 0x1000)
    pub max_length: Option<u32>, // /MaxLen (negative → None)
}
```
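The two flag-derived fields can be computed from the raw /Ff and /MaxLen integers as sketched below. This is a minimal sketch, not pdftract's code. The helper names `is_multiline` and `max_length_from` and the use of `i64` for PDF integers are assumptions made for illustration.

```rust
/// Multiline flag for text fields: bit 12 counted from zero
/// (bit position 13 in the PDF spec's 1-based numbering).
const FF_MULTILINE: i64 = 1 << 12; // 0x1000

/// Hypothetical helper: true if the /Ff integer has the Multiline bit set.
fn is_multiline(ff: i64) -> bool {
    (ff & FF_MULTILINE) != 0
}

/// Hypothetical helper: maps a raw /MaxLen integer to Option<u32>.
/// Negative values (and values too large for u32) are treated as absent.
fn max_length_from(raw: i64) -> Option<u32> {
    if raw >= 0 && raw <= u32::MAX as i64 {
        Some(raw as u32)
    } else {
        None
    }
}
```

Because the mask is tested with a bitwise AND, other /Ff bits set alongside Multiline do not affect the result. The combined-flags test case depends on this behavior.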
FormFieldValue::Text Variant
The FormFieldValue::Text variant is properly defined in combiner.rs:
```rust
pub enum FormFieldValue {
    Text {
        value: Option<String>,
        default: Option<String>,
        multiline: bool,
        max_length: Option<u32>,
    },
    // ... other variants
}
```
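Because `TextValue` and the `Text` variant have identical fields, converting one into the other is a plain field-for-field move. The sketch below shows one way to write that conversion. It assumes a `From` impl, but this report does not say how `acro_field_to_value()` actually performs the conversion. Both types are trimmed local copies so the block compiles on its own.

```rust
// Local copies of the types shown above, trimmed so the sketch is
// self-contained; the real FormFieldValue has other variants.
#[derive(Debug, Clone, PartialEq)]
pub struct TextValue {
    pub value: Option<String>,
    pub default: Option<String>,
    pub multiline: bool,
    pub max_length: Option<u32>,
}

#[derive(Debug, Clone, PartialEq)]
pub enum FormFieldValue {
    Text {
        value: Option<String>,
        default: Option<String>,
        multiline: bool,
        max_length: Option<u32>,
    },
}

// Hypothetical conversion: moves each TextValue field into the matching
// field of FormFieldValue::Text without copying the strings.
impl From<TextValue> for FormFieldValue {
    fn from(tv: TextValue) -> Self {
        FormFieldValue::Text {
            value: tv.value,
            default: tv.default,
            multiline: tv.multiline,
            max_length: tv.max_length,
        }
    }
}
```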
PDFDocEncoding/UTF-16BE Decoding
The decode_pdf_string() function correctly implements:
- UTF-16BE BOM detection (`0xFE 0xFF` prefix)
- UTF-16BE decoding (with and without BOM)
- PDFDocEncoding fallback (full 29-character override table from PDF spec Annex D.2)
- Heuristic UTF-16BE detection for malformed inputs
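The decoding order in this list is BOM check first, then PDFDocEncoding. A simplified sketch of it follows. This is not pdftract's `decode_pdf_string()`. The override table is cut down to a few Annex D entries (bullet, dashes, curly quotes, euro), and the heuristic BOM-less UTF-16BE detection is left out. Dropping an odd trailing byte after the BOM is also an assumption, not confirmed behavior.

```rust
/// Maps one PDFDocEncoding byte to a char. Abbreviated for the sketch:
/// only a few Annex D overrides are listed; every other byte falls back
/// to the Latin-1 code point with the same value.
fn pdfdoc_byte_to_char(b: u8) -> char {
    match b {
        0x80 => '\u{2022}', // bullet
        0x84 => '\u{2014}', // em dash
        0x85 => '\u{2013}', // en dash
        0x8D => '\u{201C}', // left double quotation mark
        0x8E => '\u{201D}', // right double quotation mark
        0x8F => '\u{2018}', // left single quotation mark
        0x90 => '\u{2019}', // right single quotation mark
        0xA0 => '\u{20AC}', // euro sign
        _ => b as char,     // Latin-1 fallback
    }
}

/// Decodes a PDF text string: UTF-16BE if it starts with the 0xFE 0xFF BOM,
/// otherwise PDFDocEncoding. Never panics: unpaired surrogates become
/// U+FFFD, and (by assumption) an odd trailing byte after the BOM is dropped.
fn decode_pdf_string(bytes: &[u8]) -> String {
    if bytes.len() >= 2 && bytes[0] == 0xFE && bytes[1] == 0xFF {
        // Pair the remaining bytes into big-endian UTF-16 code units.
        let units = bytes[2..]
            .chunks_exact(2)
            .map(|pair| u16::from_be_bytes([pair[0], pair[1]]));
        std::char::decode_utf16(units)
            .map(|r| r.unwrap_or(std::char::REPLACEMENT_CHARACTER))
            .collect()
    } else {
        bytes.iter().map(|&b| pdfdoc_byte_to_char(b)).collect()
    }
}
```

Replacing invalid sequences with U+FFFD instead of returning an error is one way to meet the never-panics requirement that `test_decode_pdf_string_never_panics` checks.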
Acceptance Criteria Status
| Criterion | Status | Covering test |
|---|---|---|
| Text field with /V → FormFieldValue::Text { value: Some(...), ... } | ✅ PASS | test_extract_text_value_basic |
| UTF-16BE BOM-prefixed /V → correct Unicode decode | ✅ PASS | test_extract_text_value_utf16be_bom |
| /Ff multiline bit set → multiline: true | ✅ PASS | test_extract_text_value_multiline |
| /MaxLen 50 → max_length: Some(50) | ✅ PASS | test_extract_text_value_with_max_length |
| Empty /V → value: Some("") | ✅ PASS | test_extract_text_value_empty_value |
| Missing /V → value: None | ✅ PASS | test_extract_text_value_no_value |
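The last two rows distinguish a missing /V from an empty one. The sketch below illustrates that distinction using a hypothetical object model. It also accepts a name object as text, since `test_extract_text_value_name_as_value` suggests the real code does so. `PdfObject`, `text_entry`, and the `HashMap`-backed dictionary are illustrative assumptions, not pdftract types. `String::from_utf8_lossy` stands in for real PDF string decoding.

```rust
use std::collections::HashMap;

// Hypothetical, trimmed-down object model for the sketch.
#[derive(Debug, Clone)]
enum PdfObject {
    Str(Vec<u8>),
    Name(String),
    Integer(i64),
}

/// Hypothetical helper: reads a text entry such as /V or /DV.
/// - key absent            -> None
/// - empty string object   -> Some("")
/// - name object           -> Some(name)
/// - any other object type -> None (unrecognized type)
fn text_entry(dict: &HashMap<String, PdfObject>, key: &str) -> Option<String> {
    match dict.get(key)? {
        // from_utf8_lossy stands in for decode_pdf_string in this sketch.
        PdfObject::Str(bytes) => Some(String::from_utf8_lossy(bytes).into_owned()),
        PdfObject::Name(name) => Some(name.clone()),
        PdfObject::Integer(_) => None,
    }
}
```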
Test Results
All 12 text_value tests passed:
```
PASS [ 0.014s] test_extract_text_value_name_as_value
PASS [ 0.016s] test_extract_text_value_with_max_length
PASS [ 0.016s] test_extract_text_value_utf16be_bom
PASS [ 0.017s] test_text_value_empty_constructor
PASS [ 0.016s] test_extract_text_value_multiline
PASS [ 0.017s] test_text_value_equality
PASS [ 0.017s] test_extract_text_value_with_default
PASS [ 0.018s] test_extract_text_value_basic
PASS [ 0.018s] test_extract_text_value_no_value
PASS [ 0.020s] test_extract_text_value_negative_max_length_ignored
PASS [ 0.020s] test_extract_text_value_combined_flags
PASS [ 0.021s] test_extract_text_value_empty_value
Summary [ 0.028s] 12 tests run: 12 passed, 2660 skipped
```
Additional Test Coverage
The module also includes tests for the string-decoding helpers (PDFDocEncoding, UTF-16BE, and unrecognized value types):
- test_decode_pdf_string_ascii
- test_decode_pdf_string_utf16be_bom
- test_decode_pdf_string_utf16be_bom_odd_length
- test_decode_pdf_string_pdfdocencoding_latin1
- test_decode_pdf_string_pdfdocencoding_lower_latin1
- test_decode_pdf_string_pdfdocencoding_bullet
- test_decode_pdf_string_pdfdocencoding_em_dash
- test_decode_pdf_string_pdfdocencoding_quotes
- test_decode_pdf_string_empty
- test_looks_like_utf16be
- test_extract_string_from_value_unrecognized_type
- test_decode_pdf_string_never_panics
Integration
The implementation is properly integrated:
- Exported from `forms/mod.rs` as `pub use value_text::{extract_text_value, TextValue};`
- Used by the `acro_field_to_value()` function for Tx field conversion
- Consumed by the `combine()` function in `combiner.rs`
- Part of the `FormFieldValue` enum for JSON serialization
Conclusion
No code changes were required. The implementation is complete, well-tested, and properly integrated into the forms pipeline. All acceptance criteria are met.