docs(pdftract-34hxw): verify AcroForm Tx text field value extraction complete

The implementation in value_text.rs already handles all requirements:

- TextValue struct with value, default, multiline, max_length fields
- PDFDocEncoding and UTF-16BE BOM decoding
- All 12 tests passing
- Proper integration into FormFieldValue enum

No code changes required. All acceptance criteria PASS.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-29 00:08:52 -04:00 · 65c3747133
commit 65c3747133
parent 3f346a7a71
1 changed file with 101 additions and 0 deletions
--- a/notes/pdftract-34hxw.md
+++ b/notes/pdftract-34hxw.md
@@ -0,0 +1,101 @@
+# pdftract-34hxw: AcroForm Tx (text field) value extraction
+
+## Status: PASS (implementation already present)
+
+## Summary
+
+The AcroForm /Tx (text field) value extraction is already fully implemented in `crates/pdftract-core/src/forms/value_text.rs`. The implementation correctly handles all requirements from the bead description.
+
+## Implementation Verification
+
+### Module Location
+- **File:** `crates/pdftract-core/src/forms/value_text.rs` (737 lines)
+- **Exports:** `TextValue` struct and `extract_text_value` function
+- **Re-exports in:** `crates/pdftract-core/src/forms/mod.rs`
+
+### TextValue Struct
+```rust
+pub struct TextValue {
+    pub value: Option<String>,      // Current value (/V)
+    pub default: Option<String>,    // Default value (/DV)
+    pub multiline: bool,            // /Ff Multiline flag: spec bit position 13 (1-based), mask 1<<12 = 0x1000
+    pub max_length: Option<u32>,    // /MaxLen (negative → None)
+}
+```
+
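The flag and length comments above can be illustrated with a minimal sketch. It is not taken from `value_text.rs`: the helper names `is_multiline` and `max_length_from_raw` are hypothetical, and the treatment of values above `u32::MAX` is an assumption (the note only states that negative `/MaxLen` values map to `None`).

```rust
/// Multiline flag in /Ff: bit position 13 in the PDF spec's 1-based
/// numbering, which is the mask 1 << 12 = 0x1000.
const FF_MULTILINE: u32 = 1 << 12;

/// Hypothetical helper: true when the Multiline bit is set in a raw /Ff value.
fn is_multiline(ff: u32) -> bool {
    (ff & FF_MULTILINE) != 0
}

/// Hypothetical helper: map a raw /MaxLen integer to Option<u32>.
/// Negative values become None, as the struct comment describes.
/// Values above u32::MAX also become None (an assumption of this sketch).
fn max_length_from_raw(raw: i64) -> Option<u32> {
    if raw >= 0 && raw <= i64::from(u32::MAX) {
        Some(raw as u32)
    } else {
        None
    }
}
```

With these helpers, a field carrying `/Ff 4096` and `/MaxLen 50` would yield `multiline: true` and `max_length: Some(50)`, while `/MaxLen -1` would yield `max_length: None`.
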
+### FormFieldValue::Text Variant
+The `FormFieldValue::Text` variant is properly defined in `combiner.rs`:
+```rust
+pub enum FormFieldValue {
+    Text {
+        value: Option<String>,
+        default: Option<String>,
+        multiline: bool,
+        max_length: Option<u32>,
+    },
+    // ... other variants
+}
+```
+
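Because `TextValue` and the `FormFieldValue::Text` variant carry the same four fields, the conversion between them is a plain field-by-field move. The sketch below shows one way to express it. The `From` impl and the derives are assumptions made for illustration; the note does not say how `acro_field_to_value()` performs this step.

```rust
// Redeclared here so the sketch is self-contained; the field sets mirror
// the definitions quoted above. The derives are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct TextValue {
    pub value: Option<String>,
    pub default: Option<String>,
    pub multiline: bool,
    pub max_length: Option<u32>,
}

#[derive(Debug, Clone, PartialEq)]
pub enum FormFieldValue {
    Text {
        value: Option<String>,
        default: Option<String>,
        multiline: bool,
        max_length: Option<u32>,
    },
    // Other variants omitted in this sketch.
}

// Hypothetical conversion: moves each field across unchanged.
impl From<TextValue> for FormFieldValue {
    fn from(t: TextValue) -> Self {
        FormFieldValue::Text {
            value: t.value,
            default: t.default,
            multiline: t.multiline,
            max_length: t.max_length,
        }
    }
}
```

Note that an empty `/V` (`Some(String::new())`) and a missing `/V` (`None`) stay distinct through such a conversion, which matches the acceptance criteria listed later in the note.
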
+### PDFDocEncoding/UTF-16BE Decoding
+The `decode_pdf_string()` function correctly implements:
+1. UTF-16BE BOM detection (`0xFE 0xFF` prefix)
+2. UTF-16BE decoding (with and without BOM)
+3. PDFDocEncoding fallback (full 29-character override table from PDF spec Annex D.2)
+4. Heuristic UTF-16BE detection for malformed inputs
+
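The first three steps of the decoding order above can be sketched as follows. This is a simplified illustration, not the code in `value_text.rs`: the function names are hypothetical, only four PDFDocEncoding overrides are included (bullet, em dash, and the double quotation marks), all other bytes are passed through as the code point of the same value, a trailing odd byte after the BOM is dropped, and the BOM-less heuristic of step 4 is left out.

```rust
/// Hypothetical sketch of a PDF text string decoder.
fn decode_pdf_string_sketch(bytes: &[u8]) -> String {
    // Step 1: a leading 0xFE 0xFF byte order mark signals UTF-16BE.
    if bytes.starts_with(&[0xFE, 0xFF]) {
        // Step 2: combine big-endian byte pairs into UTF-16 code units.
        // chunks_exact(2) ignores a trailing odd byte (assumption).
        let units: Vec<u16> = bytes[2..]
            .chunks_exact(2)
            .map(|pair| u16::from_be_bytes([pair[0], pair[1]]))
            .collect();
        // Unpaired surrogates become U+FFFD instead of causing an error.
        return String::from_utf16_lossy(&units);
    }
    // Step 3: otherwise treat each byte as a PDFDocEncoding character.
    bytes.iter().map(|&b| pdfdoc_byte_to_char(b)).collect()
}

/// Partial PDFDocEncoding mapping, for illustration only.
fn pdfdoc_byte_to_char(b: u8) -> char {
    match b {
        0x80 => '\u{2022}', // bullet
        0x84 => '\u{2014}', // em dash
        0x8D => '\u{201C}', // left double quotation mark
        0x8E => '\u{201D}', // right double quotation mark
        // Simplification: every other byte maps to the code point of
        // the same value, which is correct for printable ASCII.
        _ => char::from(b),
    }
}
```

Neither branch can panic on arbitrary input, which is the property a test such as `test_decode_pdf_string_never_panics` would exercise. A complete decoder would also need the remaining override entries from the PDF specification.
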
+### Acceptance Criteria Status
+
+| Criterion | Status | Notes |
+|-----------|--------|-------|
+| Text field with /V → FormFieldValue::Text { value: Some(...), ... } | ✅ PASS | `test_extract_text_value_basic` |
+| UTF-16BE BOM-prefixed /V → correct Unicode decode | ✅ PASS | `test_extract_text_value_utf16be_bom` |
+| /Ff multiline bit set → multiline: true | ✅ PASS | `test_extract_text_value_multiline` |
+| /MaxLen 50 → max_length: Some(50) | ✅ PASS | `test_extract_text_value_with_max_length` |
+| Empty /V → value: Some("") | ✅ PASS | `test_extract_text_value_empty_value` |
+| Missing /V → value: None | ✅ PASS | `test_extract_text_value_no_value` |
+
+### Test Results
+All 12 text_value tests passed:
+```
+PASS [   0.014s] test_extract_text_value_name_as_value
+PASS [   0.016s] test_extract_text_value_with_max_length
+PASS [   0.016s] test_extract_text_value_utf16be_bom
+PASS [   0.017s] test_text_value_empty_constructor
+PASS [   0.016s] test_extract_text_value_multiline
+PASS [   0.017s] test_text_value_equality
+PASS [   0.017s] test_extract_text_value_with_default
+PASS [   0.018s] test_extract_text_value_basic
+PASS [   0.018s] test_extract_text_value_no_value
+PASS [   0.020s] test_extract_text_value_negative_max_length_ignored
+PASS [   0.020s] test_extract_text_value_combined_flags
+PASS [   0.021s] test_extract_text_value_empty_value
+Summary [   0.028s] 12 tests run: 12 passed, 2660 skipped
+```
+
+### Additional Test Coverage
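+The module also includes tests for `decode_pdf_string()` and related helpers (the last entry, `test_extract_text_value_combined_flags`, is also part of the 12-test run above):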
+- `test_decode_pdf_string_ascii`
+- `test_decode_pdf_string_utf16be_bom`
+- `test_decode_pdf_string_utf16be_bom_odd_length`
+- `test_decode_pdf_string_pdfdocencoding_latin1`
+- `test_decode_pdf_string_pdfdocencoding_lower_latin1`
+- `test_decode_pdf_string_pdfdocencoding_bullet`
+- `test_decode_pdf_string_pdfdocencoding_em_dash`
+- `test_decode_pdf_string_pdfdocencoding_quotes`
+- `test_decode_pdf_string_empty`
+- `test_looks_like_utf16be`
+- `test_extract_string_from_value_unrecognized_type`
+- `test_decode_pdf_string_never_panics`
+- `test_extract_text_value_combined_flags`
+
+### Integration
+The implementation is properly integrated:
+1. Exported from `forms/mod.rs` as `pub use value_text::{extract_text_value, TextValue}`
+2. Used by `acro_field_to_value()` function for Tx field conversion
+3. Consumed by `combine()` function in combiner.rs
+4. Part of the FormFieldValue enum for JSON serialization
+
+## Conclusion
+
+No code changes were required. The implementation is complete, well-tested, and properly integrated into the forms pipeline. All acceptance criteria are met.