pdftract/notes/pdftract-31ag5.md
jedarden 06fb0a8625 docs(pdftract-31ag5): verify Span struct implementation already complete
All acceptance criteria pass:
- Span constructible with all 10 fields per plan
- CssHexColor newtype validates #rrggbb format
- SpanFlags constants (BOLD=1, ITALIC=2, SMALLCAPS=4, SUBSCRIPT=8, SUPERSCRIPT=16)
- ConfidenceSource enum (Native, Heuristic, Ocr)
- Serde JSON serialization round-trips
- Span Clone is cheap (Arc<str> shared)

24/24 tests pass. Implementation matches plan lines 1622-1646.
2026-05-27 21:55:11 -04:00


# pdftract-31ag5: Span struct definition verification
## Summary
The Span struct definition (10 fields per plan) is **already implemented** in `/home/coding/pdftract/crates/pdftract-core/src/span/mod.rs`. All acceptance criteria pass.
## Implementation verified
### Span struct (10 fields)
- `text: String` - concatenated text content
- `bbox: [f32; 4]` - union of member glyph bboxes
- `font: Arc<str>` - font name (shared via Arc)
- `size: f32` - font size in points
- `color: Option<CssHexColor>` - CSS hex color or None
- `rendering_mode: u8` - text rendering mode (0-7)
- `confidence: f32` - minimum confidence across member glyphs, in the range [0.0, 1.0]
- `confidence_source: ConfidenceSource` - enum (Native, Heuristic, Ocr)
- `lang: Option<Arc<str>>` - language tag (filled in Phase 7)
- `flags: u8` - SpanFlags bitmask
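
As a rough illustration of the field list above, the struct could be declared along these lines. This is a sketch only: the derives, the stand-in `ConfidenceSource` and `CssHexColor` definitions, and the `sample_span` helper with its values are assumptions made so the snippet compiles on its own, not a copy of `span/mod.rs`.

```rust
use std::sync::Arc;

/// Stand-in for the enum in `confidence.rs` (assumed shape).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ConfidenceSource {
    Native,
    Heuristic,
    Ocr,
}

/// Stand-in for the validated color newtype (validation omitted here).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct CssHexColor(String);

/// The ten fields listed in this note, in the same order.
#[derive(Debug, Clone, PartialEq)]
pub struct Span {
    pub text: String,
    pub bbox: [f32; 4],
    pub font: Arc<str>,
    pub size: f32,
    pub color: Option<CssHexColor>,
    pub rendering_mode: u8,
    pub confidence: f32,
    pub confidence_source: ConfidenceSource,
    pub lang: Option<Arc<str>>,
    pub flags: u8,
}

/// Hypothetical helper: builds a span with illustrative values.
pub fn sample_span() -> Span {
    Span {
        text: "Hello".to_string(),
        bbox: [72.0, 700.0, 110.0, 712.0],
        font: Arc::from("Helvetica"),
        size: 12.0,
        color: None,
        rendering_mode: 0,
        confidence: 1.0,
        confidence_source: ConfidenceSource::Native,
        lang: None,
        flags: 0,
    }
}
```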
### CssHexColor newtype
- Validates #rrggbb format at construction
- `CssHexColor::new("#ff0000")` -> Ok
- `CssHexColor::new("red")` -> Err
- Lowercases input for consistency
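
The validation rules above can be sketched as a small newtype. The error type `InvalidCssHexColor` and the `as_str` accessor are hypothetical names chosen for this example; the real constructor may report errors differently and may store the string in another form.

```rust
/// Sketch of a validating newtype for `#rrggbb` colors.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct CssHexColor(String);

/// Hypothetical error type used only in this sketch.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct InvalidCssHexColor;

impl CssHexColor {
    /// Accepts exactly `#` followed by six ASCII hex digits and
    /// stores the lowercased form.
    pub fn new(input: &str) -> Result<Self, InvalidCssHexColor> {
        let bytes = input.as_bytes();
        // The length check runs first, so the indexing below cannot panic.
        let valid = bytes.len() == 7
            && bytes[0] == b'#'
            && bytes[1..].iter().all(|b| b.is_ascii_hexdigit());
        if valid {
            Ok(CssHexColor(input.to_ascii_lowercase()))
        } else {
            Err(InvalidCssHexColor)
        }
    }

    /// Returns the normalized `#rrggbb` string.
    pub fn as_str(&self) -> &str {
        &self.0
    }
}
```

Validating at construction means any `CssHexColor` that exists is already well formed, so downstream code does not need to re-check it.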
### SpanFlags constants
- `BOLD = 1 << 0` (bit 0)
- `ITALIC = 1 << 1` (bit 1)
- `SMALLCAPS = 1 << 2` (bit 2)
- `SUBSCRIPT = 1 << 3` (bit 3)
- `SUPERSCRIPT = 1 << 4` (bit 4)
- Bits 5-7 reserved
- Combinable: `BOLD | ITALIC == 3`
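
A minimal sketch of the bit layout above, assuming the constants are exposed as associated `u8` constants on a unit struct (they could just as well be free constants). The `has_flag` helper is hypothetical and only illustrates how a bit would be tested.

```rust
/// Namespace for the flag bits; bits 5-7 are left unused (reserved).
pub struct SpanFlags;

impl SpanFlags {
    pub const BOLD: u8 = 1 << 0;
    pub const ITALIC: u8 = 1 << 1;
    pub const SMALLCAPS: u8 = 1 << 2;
    pub const SUBSCRIPT: u8 = 1 << 3;
    pub const SUPERSCRIPT: u8 = 1 << 4;
}

/// Hypothetical helper: true if every bit of `flag` is set in `flags`.
pub fn has_flag(flags: u8, flag: u8) -> bool {
    (flags & flag) == flag
}
```

Combining flags is a bitwise OR, for example `SpanFlags::BOLD | SpanFlags::ITALIC`, and a test such as `has_flag(flags, SpanFlags::BOLD)` reads one bit back.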
### ConfidenceSource enum
- Located in `/home/coding/pdftract/crates/pdftract-core/src/confidence.rs`
- Three variants: `Native`, `Heuristic`, `Ocr`
- Serde serialization to lowercase strings
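
To show the lowercase mapping without pulling in a dependency, the sketch below spells it out by hand with `Display` and `FromStr`. This is a stand-in, not the crate's code: with Serde the same names would typically come from `#[serde(rename_all = "lowercase")]` on the enum, and the `String` error type here is an assumption.

```rust
use std::fmt;
use std::str::FromStr;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ConfidenceSource {
    Native,
    Heuristic,
    Ocr,
}

impl ConfidenceSource {
    /// Lowercase name, matching the serialized form described above.
    pub fn as_str(self) -> &'static str {
        match self {
            ConfidenceSource::Native => "native",
            ConfidenceSource::Heuristic => "heuristic",
            ConfidenceSource::Ocr => "ocr",
        }
    }
}

impl fmt::Display for ConfidenceSource {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(self.as_str())
    }
}

impl FromStr for ConfidenceSource {
    type Err = String;

    /// Parses only the exact lowercase names.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "native" => Ok(ConfidenceSource::Native),
            "heuristic" => Ok(ConfidenceSource::Heuristic),
            "ocr" => Ok(ConfidenceSource::Ocr),
            other => Err(format!("unknown confidence source: {other}")),
        }
    }
}
```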
## Acceptance criteria status
| Criterion | Status | Test |
|-----------|--------|------|
| Span constructible with all fields | PASS | `test_span_constructible_with_all_fields` |
| Span Clone is cheap (`Arc<str>` shared) | PASS | `test_span_clone_is_cheap` |
| Serde JSON serialization round-trips | PASS | `test_span_serde_json_roundtrip` |
| SpanFlags constants distinct and combinable | PASS | `test_span_flags_combinable` |
| `CssHexColor::new("#ff0000")` -> Ok | PASS | `test_css_hex_color_new_valid_lowercase` |
| `CssHexColor::new("red")` -> Err | PASS | `test_css_hex_color_new_invalid_no_hash` |
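
The "Clone is cheap" criterion can be checked by confirming that a clone points at the same `Arc<str>` allocation instead of copying the font name. The sketch below uses a reduced, hypothetical `FontRef` struct with only the two `Arc`-backed fields; it is not the body of `test_span_clone_is_cheap`.

```rust
use std::sync::Arc;

/// Reduced stand-in holding only the Arc-backed fields of a span.
#[derive(Debug, Clone)]
pub struct FontRef {
    pub font: Arc<str>,
    pub lang: Option<Arc<str>>,
}

/// True if both values share one font-name allocation.
pub fn shares_font(a: &FontRef, b: &FontRef) -> bool {
    Arc::ptr_eq(&a.font, &b.font)
}
```

Cloning an `Arc<str>` only increments a reference count, so the cost does not grow with the length of the font name.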
## Test results
```
running 24 tests
test span::tests::test_css_hex_color_clone_is_cheap ... ok
test span::tests::test_css_hex_color_from_rgb ... ok
test span::tests::test_css_hex_color_new_invalid_no_hash ... ok
test span::tests::test_css_hex_color_new_invalid_non_hex ... ok
test span::tests::test_css_hex_color_new_invalid_too_long ... ok
test span::tests::test_css_hex_color_new_invalid_too_short ... ok
test span::tests::test_css_hex_color_new_valid_lowercase ... ok
test span::tests::test_css_hex_color_new_valid_mixed_case ... ok
test span::tests::test_css_hex_color_new_valid_uppercase ... ok
test span::tests::test_span_clone_is_cheap ... ok
test span::tests::test_span_combined_flags ... ok
test span::tests::test_span_confidence_source_variants ... ok
test span::tests::test_span_constructible_with_all_fields ... ok
test span::tests::test_span_empty ... ok
test span::tests::test_span_flags_bold_bit ... ok
test span::tests::test_span_flags_combinable ... ok
test span::tests::test_span_is_bold ... ok
test span::tests::test_span_is_italic ... ok
test span::tests::test_span_is_smallcaps ... ok
test span::tests::test_span_is_subscript ... ok
test span::tests::test_span_is_superscript ... ok
test span::tests::test_span_size_within_budget ... ok
test span::tests::test_span_with_none_color_serializes ... ok
test span::tests::test_span_serde_json_roundtrip ... ok
test result: ok. 24 passed; 0 failed
```
## Struct size
Actual `Span` struct size: 104 bytes (within the acceptable budget of ~120 bytes)
- `Arc<str>` for `font` and `lang` enables cheap cloning
- `text: String` allocates separately
- `CssHexColor` wraps `String`
- `bbox` is 16 bytes (4 × `f32`)
- Scalar fields total 20 bytes
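
A size budget like this can be checked with `std::mem::size_of`. The sketch below measures a hypothetical mirror struct, `SpanLayout`, built from the field types listed in this note, with `Option<String>` standing in for `Option<CssHexColor>`. Its size depends on the target and on those stand-in types, so it need not equal the figure reported for the real struct; only the budget comparison is illustrated.

```rust
use std::mem::size_of;
use std::sync::Arc;

/// Stand-in enum with three unit variants.
#[allow(dead_code)]
pub enum ConfidenceSource {
    Native,
    Heuristic,
    Ocr,
}

/// Hypothetical mirror of the span layout, for measurement only.
#[allow(dead_code)]
pub struct SpanLayout {
    text: String,
    bbox: [f32; 4],
    font: Arc<str>,
    size: f32,
    color: Option<String>, // stand-in for Option<CssHexColor>
    rendering_mode: u8,
    confidence: f32,
    confidence_source: ConfidenceSource,
    lang: Option<Arc<str>>,
    flags: u8,
}

/// Budget quoted in this note (approximate upper bound, in bytes).
pub const SPAN_SIZE_BUDGET: usize = 120;

/// Size of the mirror struct on the current target.
pub fn span_layout_size() -> usize {
    size_of::<SpanLayout>()
}

/// True if the mirror struct fits within the budget.
pub fn within_budget() -> bool {
    span_layout_size() <= SPAN_SIZE_BUDGET
}
```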
## Files
- `/home/coding/pdftract/crates/pdftract-core/src/span/mod.rs` - Span struct, CssHexColor, SpanFlags
- `/home/coding/pdftract/crates/pdftract-core/src/confidence.rs` - ConfidenceSource enum
- `/home/coding/pdftract/crates/pdftract-core/src/span_flags.rs` - Flag detection logic (separate module)