docs(pdftract-34hxw): verify AcroForm Tx text field value extraction complete

The implementation in value_text.rs already handles all requirements:
- TextValue struct with value, default, multiline, max_length fields
- PDFDocEncoding and UTF-16BE BOM decoding
- All 12 tests passing
- Proper integration into FormFieldValue enum

No code changes required. All acceptance criteria PASS.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
jedarden 2026-05-29 00:08:52 -04:00
parent 3f346a7a71
commit 65c3747133

notes/pdftract-34hxw.md Normal file

@ -0,0 +1,101 @@
# pdftract-34hxw: AcroForm Tx (text field) value extraction
## Status: PASS (implementation already present)
## Summary
AcroForm /Tx (text field) value extraction is already implemented in `crates/pdftract-core/src/forms/value_text.rs`. It covers every requirement in the bead description: the `TextValue` fields, PDFDocEncoding and UTF-16BE decoding, and integration into the `FormFieldValue` enum.
## Implementation Verification
### Module Location
- **File:** `crates/pdftract-core/src/forms/value_text.rs` (737 lines)
- **Exports:** `TextValue` struct and `extract_text_value` function
- **Re-exported from:** `crates/pdftract-core/src/forms/mod.rs`
### TextValue Struct
```rust
pub struct TextValue {
    pub value: Option<String>,   // Current value (/V)
    pub default: Option<String>, // Default value (/DV)
    pub multiline: bool,         // /Ff Multiline flag: bit position 13 (1-based), mask 1 << 12 = 0x1000
    pub max_length: Option<u32>, // /MaxLen (negative → None)
}
```
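The flag and length handling noted in the struct comments can be sketched as follows. This is a minimal illustration, not the code in `value_text.rs`: the helper names `is_multiline` and `normalize_max_len` are hypothetical, and only the mask value and the "negative → None" rule come from the notes above.

```rust
use std::convert::TryFrom;

/// Mask for the Multiline flag in /Ff (bit position 13 when counted from 1).
const FF_MULTILINE: u32 = 1 << 12; // 0x1000

/// Hypothetical helper: true when the Multiline bit is set in /Ff.
fn is_multiline(ff: u32) -> bool {
    ff & FF_MULTILINE != 0
}

/// Hypothetical helper: maps a raw /MaxLen integer to Option<u32>.
/// Negative values (and values too large for u32) become None.
fn normalize_max_len(raw: i64) -> Option<u32> {
    u32::try_from(raw).ok()
}
```

Using `u32::try_from` keeps the negative case and the overflow case in one conversion instead of two separate range checks.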
### FormFieldValue::Text Variant
The `FormFieldValue::Text` variant is properly defined in `combiner.rs`:
```rust
pub enum FormFieldValue {
    Text {
        value: Option<String>,
        default: Option<String>,
        multiline: bool,
        max_length: Option<u32>,
    },
    // ... other variants
}
```
### PDFDocEncoding/UTF-16BE Decoding
The `decode_pdf_string()` function correctly implements:
1. UTF-16BE BOM detection (`0xFE 0xFF` prefix)
2. UTF-16BE decoding (with and without BOM)
3. PDFDocEncoding fallback (full 29-character override table from PDF spec Annex D.2)
4. Heuristic UTF-16BE detection for malformed inputs
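The first three steps above can be sketched roughly as below. This is a simplified illustration under stated assumptions, not the module's `decode_pdf_string()`: the function names are hypothetical, only two PDFDocEncoding overrides (bullet and em dash) are shown rather than the full table, an odd trailing byte after the BOM is silently dropped, and the heuristic detection in step 4 is omitted.

```rust
/// Hypothetical sketch of a PDF text-string decoder.
fn decode_pdf_string_sketch(bytes: &[u8]) -> String {
    // A leading 0xFE 0xFF byte-order mark signals UTF-16BE.
    if bytes.starts_with(&[0xFE, 0xFF]) {
        let units: Vec<u16> = bytes[2..]
            .chunks_exact(2) // assumption: ignore an odd trailing byte
            .map(|pair| u16::from_be_bytes([pair[0], pair[1]]))
            .collect();
        // Unpaired surrogates become U+FFFD instead of causing a panic.
        return String::from_utf16_lossy(&units);
    }
    // Otherwise fall back to a byte-by-byte PDFDocEncoding mapping.
    bytes.iter().map(|&b| pdfdoc_char_sketch(b)).collect()
}

/// Partial PDFDocEncoding mapping: two overrides, everything else as Latin-1.
/// A complete decoder needs the full override table.
fn pdfdoc_char_sketch(b: u8) -> char {
    match b {
        0x80 => '\u{2022}', // bullet
        0x84 => '\u{2014}', // em dash
        _ => b as char,     // u8 -> char maps to U+0000..=U+00FF
    }
}
```

Both branches are total over arbitrary byte input, which is the property a test like `test_decode_pdf_string_never_panics` would exercise.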
### Acceptance Criteria Status
| Criterion | Status | Notes |
|-----------|--------|-------|
| Text field with /V → FormFieldValue::Text { value: Some(...), ... } | ✅ PASS | `test_extract_text_value_basic` |
| UTF-16BE BOM-prefixed /V → correct Unicode decode | ✅ PASS | `test_extract_text_value_utf16be_bom` |
| /Ff multiline bit set → multiline: true | ✅ PASS | `test_extract_text_value_multiline` |
| /MaxLen 50 → max_length: Some(50) | ✅ PASS | `test_extract_text_value_with_max_length` |
| Empty /V → value: Some("") | ✅ PASS | `test_extract_text_value_empty_value` |
| Missing /V → value: None | ✅ PASS | `test_extract_text_value_no_value` |
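The last two rows depend on keeping "key present but empty" distinct from "key absent". A minimal sketch of that distinction follows, assuming a hypothetical `FieldDict` map in place of the real parsed dictionary type, and UTF-8 lossy decoding in place of the real string decoder.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a parsed field dictionary: key name -> raw string bytes.
type FieldDict = HashMap<&'static str, Vec<u8>>;

/// Returns None when /V is absent and Some("") when /V is present but empty.
fn value_of(dict: &FieldDict) -> Option<String> {
    dict.get("V")
        .map(|raw| String::from_utf8_lossy(raw).into_owned())
}
```

Mapping over the `Option` returned by `get` preserves absence as `None`, so an empty string is never conflated with a missing entry.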
### Test Results
All 12 text_value tests passed:
```
PASS [ 0.014s] test_extract_text_value_name_as_value
PASS [ 0.016s] test_extract_text_value_with_max_length
PASS [ 0.016s] test_extract_text_value_utf16be_bom
PASS [ 0.017s] test_text_value_empty_constructor
PASS [ 0.016s] test_extract_text_value_multiline
PASS [ 0.017s] test_text_value_equality
PASS [ 0.017s] test_extract_text_value_with_default
PASS [ 0.018s] test_extract_text_value_basic
PASS [ 0.018s] test_extract_text_value_no_value
PASS [ 0.020s] test_extract_text_value_negative_max_length_ignored
PASS [ 0.020s] test_extract_text_value_combined_flags
PASS [ 0.021s] test_extract_text_value_empty_value
Summary [ 0.028s] 12 tests run: 12 passed, 2660 skipped
```
### Additional Test Coverage
The module also includes tests for string decoding (PDFDocEncoding and UTF-16BE) and related helpers:
- `test_decode_pdf_string_ascii`
- `test_decode_pdf_string_utf16be_bom`
- `test_decode_pdf_string_utf16be_bom_odd_length`
- `test_decode_pdf_string_pdfdocencoding_latin1`
- `test_decode_pdf_string_pdfdocencoding_lower_latin1`
- `test_decode_pdf_string_pdfdocencoding_bullet`
- `test_decode_pdf_string_pdfdocencoding_em_dash`
- `test_decode_pdf_string_pdfdocencoding_quotes`
- `test_decode_pdf_string_empty`
- `test_looks_like_utf16be`
- `test_extract_string_from_value_unrecognized_type`
- `test_decode_pdf_string_never_panics`
- `test_extract_text_value_combined_flags` (also counted among the 12 tests in the run above)
### Integration
The implementation is properly integrated:
1. Exported from `forms/mod.rs` as `pub use value_text::{extract_text_value, TextValue}`
2. Used by `acro_field_to_value()` function for Tx field conversion
3. Consumed by `combine()` function in combiner.rs
4. Part of the FormFieldValue enum for JSON serialization
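The hand-off from `TextValue` to the `FormFieldValue::Text` variant might look like the sketch below. The field layouts mirror the definitions shown earlier in these notes, but the `From` impl and the derives are assumptions for illustration; the actual conversion lives in `acro_field_to_value()` and may be written differently.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct TextValue {
    pub value: Option<String>,
    pub default: Option<String>,
    pub multiline: bool,
    pub max_length: Option<u32>,
}

#[derive(Debug, Clone, PartialEq)]
pub enum FormFieldValue {
    Text {
        value: Option<String>,
        default: Option<String>,
        multiline: bool,
        max_length: Option<u32>,
    },
    // Other variants omitted in this sketch.
}

/// Hypothetical conversion: moves each field across unchanged.
impl From<TextValue> for FormFieldValue {
    fn from(t: TextValue) -> Self {
        FormFieldValue::Text {
            value: t.value,
            default: t.default,
            multiline: t.multiline,
            max_length: t.max_length,
        }
    }
}
```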
## Conclusion
No code changes were required. The implementation is complete, is covered by the 12 passing `text_value` tests plus the decoding tests listed above, and is integrated into the forms pipeline. All acceptance criteria are met.