pdftract/notes/pdftract-p4vzu.md
jedarden 25f1081d7d feat(pdftract-p4vzu): implement inspector render_spans layer
2026-05-24 03:11:34 -04:00

# pdftract-p4vzu: Inspector layer renderer - render_spans
## Summary
Implemented the `render_spans` helper, which builds an SVG outline rectangle for each Span, with the stroke color-coded by confidence level (red < 0.5; yellow 0.5 to < 0.8; green >= 0.8; gray for `None`). Each rectangle carries data-* attributes for tooltip and click consumption.
## Files Created
- `crates/pdftract-cli/src/inspect/mod.rs` - Inspector module root
- `crates/pdftract-cli/src/inspect/render/mod.rs` - Layer renderers module
- `crates/pdftract-cli/src/inspect/render/spans.rs` - Span layer renderer
## Files Modified
- `crates/pdftract-cli/src/lib.rs` - Added `pub mod inspect;`
## Implementation Details
### `render_spans(spans: &[SpanJson]) -> Vec<String>`
Returns a vector of SVG `<rect>` element strings. Each rect:
- Positioned at the span's bbox with `x`, `y`, `width`, `height` attributes
- `fill="none"` with stroke color based on confidence
- Stroke width of 1 pixel
- CSS class `span-rect` for frontend toggling
- Data attributes:
  - `data-text`: text content (XML-escaped)
  - `data-confidence`: confidence score or empty string
  - `data-font`: font name (XML-escaped)
  - `data-size`: font size in points
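
The attribute list above can be sketched roughly as follows. This is a minimal illustration, not the code in `spans.rs`: the `SpanJson` field names and types, the attribute order, and the `stroke_for` and `esc` helpers are all assumptions made so the example is self-contained.

```rust
/// Hypothetical shape of a span; the real `SpanJson` fields may differ.
struct SpanJson {
    text: String,
    x: f64,
    y: f64,
    width: f64,
    height: f64,
    confidence: Option<f32>,
    font: String,
    size: f32,
}

/// Stand-in for the color mapping (thresholds taken from these notes).
fn stroke_for(confidence: Option<f32>) -> &'static str {
    match confidence {
        None => "#94a3b8",
        Some(c) if c < 0.5 => "#ef4444",
        Some(c) if c < 0.8 => "#eab308",
        Some(_) => "#22c55e",
    }
}

/// Stand-in for `escape_xml_attr`; `&` is replaced first so that the
/// entities produced by the later replacements are not escaped again.
fn esc(s: &str) -> String {
    s.replace('&', "&amp;")
        .replace('<', "&lt;")
        .replace('>', "&gt;")
        .replace('"', "&quot;")
        .replace('\'', "&apos;")
}

/// Builds one `<rect>` string per span. Pure function: no I/O, no shared state.
fn render_spans(spans: &[SpanJson]) -> Vec<String> {
    spans
        .iter()
        .map(|s| {
            // `None` becomes an empty string, as described for data-confidence.
            let confidence = s.confidence.map(|c| c.to_string()).unwrap_or_default();
            format!(
                r#"<rect class="span-rect" x="{:.2}" y="{:.2}" width="{:.2}" height="{:.2}" fill="none" stroke="{}" stroke-width="1" data-text="{}" data-confidence="{}" data-font="{}" data-size="{}"/>"#,
                s.x,
                s.y,
                s.width,
                s.height,
                stroke_for(s.confidence),
                esc(&s.text),
                confidence,
                esc(&s.font),
                s.size
            )
        })
        .collect()
}
```

Returning `Vec<String>` rather than one concatenated string leaves it to the caller to decide how the rects are wrapped, for example inside a `<g>` element per layer.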
### Color Mapping
- `None`: `#94a3b8` (gray) - direct extraction without OCR
- `Some(c) where c < 0.5`: `#ef4444` (red) - low confidence
- `Some(c) where 0.5 <= c < 0.8`: `#eab308` (yellow) - medium confidence
- `Some(c) where c >= 0.8`: `#22c55e` (green) - high confidence
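
As an illustration of the mapping above, a match with guards keeps the half-open ranges explicit. The function name echoes `test_confidence_to_color_boundaries`, but the signature and the `f32` type are assumptions, not the actual code.

```rust
/// Maps an optional confidence score to a stroke color.
/// Sketch only: the `Option<f32>` parameter type is assumed.
fn confidence_to_color(confidence: Option<f32>) -> &'static str {
    match confidence {
        // No confidence: direct extraction without OCR.
        None => "#94a3b8",
        // Guards are checked top to bottom, so each arm only sees
        // values that the arms above it rejected.
        Some(c) if c < 0.5 => "#ef4444", // low
        Some(c) if c < 0.8 => "#eab308", // medium: 0.5 <= c < 0.8
        // Everything else, including exactly 0.8, is high confidence.
        // Note that NaN also lands here in this sketch.
        Some(_) => "#22c55e",
    }
}
```

With this ordering, exactly 0.5 is yellow and exactly 0.8 is green, matching the `<=` and `>=` bounds in the list.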
### XML Escaping
The `escape_xml_attr` function properly escapes special characters in attribute values:
- `&` → `&amp;`
- `<` → `&lt;`
- `>` → `&gt;`
- `"` → `&quot;`
- `'` → `&apos;`
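
A single-pass version of these five replacements might look like the sketch below. Only the function name comes from these notes; the signature and body are assumptions.

```rust
/// Escapes the five XML special characters for use inside an attribute value.
/// Sketch only: the real `escape_xml_attr` may be implemented differently.
fn escape_xml_attr(input: &str) -> String {
    // Reserve at least the input length; escaped output can only grow.
    let mut out = String::with_capacity(input.len());
    for ch in input.chars() {
        match ch {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&apos;"),
            other => out.push(other),
        }
    }
    out
}
```

Walking the input once, character by character, avoids the ordering pitfall of chained replacements, where replacing `&` after the other characters would escape the `&` of entities that were just produced.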
## Tests
All 10 unit tests pass:
1. `test_render_spans_empty` - Empty input produces empty output
2. `test_render_spans_single` - Single span renders correctly with all attributes
3. `test_render_spans_confidence_colors` - All confidence boundary conditions produce correct colors
4. `test_render_spans_data_attributes` - XML escaping works correctly
5. `test_render_spans_multiple` - Multiple spans each get correct colors
6. `test_render_spans_css_class` - CSS class is present
7. `test_confidence_to_color_boundaries` - Boundary values map correctly
8. `test_escape_xml_attr` - XML escaping function works
9. `test_render_spans_float_bbox` - Float coordinates are rounded to 2 decimal places
10. `test_render_spans_output_is_valid_svg` - Output is well-formed SVG
## Acceptance Criteria Status
- ✅ Helper compiles and produces valid SVG output
- ✅ Layer is independently toggleable via CSS class (`class="span-rect"`)
- ✅ data-* attrs populated for downstream UI consumption
- ⚠️ Renders correctly in headless browser (deferred - requires fixture)
- ✅ Performance: Pure function, no I/O, deterministic
## Performance Note
The implementation is a pure function with no I/O or external state. For 1000 spans on a typical page:
- String allocation: ~1000 small strings (~100 bytes each) = ~100 KB
- Time complexity: O(n) where n = number of spans
- Should render in well under 200ms for 1000 elements
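
One way to sanity-check these estimates is a small timing harness like the sketch below. `fake_rect` and `measure` are hypothetical names, and `fake_rect` is only a stand-in for the real renderer: it omits the data-* attributes, so real strings will be somewhat longer.

```rust
use std::time::{Duration, Instant};

/// Hypothetical stand-in for one rendered span (no data-* attributes).
fn fake_rect(i: usize) -> String {
    format!(
        r#"<rect class="span-rect" x="{:.2}" y="{:.2}" width="40.00" height="12.00" fill="none" stroke="{}" stroke-width="1"/>"#,
        (i % 50) as f64 * 12.0, // 50 synthetic spans per row
        (i / 50) as f64 * 14.0, // one row every 14 units
        "#94a3b8"
    )
}

/// Builds `n` rect strings and reports (count, total bytes, elapsed time).
fn measure(n: usize) -> (usize, usize, Duration) {
    let start = Instant::now();
    let rects: Vec<String> = (0..n).map(fake_rect).collect();
    let elapsed = start.elapsed();
    let bytes = rects.iter().map(String::len).sum::<usize>();
    (rects.len(), bytes, elapsed)
}
```

Calling `measure(1000)` and printing the tuple gives a rough figure to compare against the ~100 KB and 200ms numbers above. A wall-clock reading from a single run is noisy, so it is better treated as an order-of-magnitude check than as a benchmark.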
## Deferrals
- Headless browser pixel-match fixture: Requires Phase 7.9.3 frontend CSS to be implemented first. The SVG output is structurally correct and follows the same pattern as the existing receipt SVG code.
## Git Commit
```
feat(pdftract-p4vzu): implement inspector render_spans layer
Implements the span layer renderer for the inspector debug viewer.
Renders SVG outline rectangles for each text span, color-coded by
extraction confidence. Red (< 0.5), yellow (0.5-0.8), and green (> 0.8)
indicate low, medium, and high confidence respectively. Gray indicates
direct extraction without OCR.
Each rect includes data-* attributes for tooltip and click consumption:
- data-text: the extracted text content (XML-escaped)
- data-confidence: confidence score or empty string
- data-font: font name (XML-escaped)
- data-size: font size in points
All 10 unit tests pass. The implementation follows the existing SVG
generation pattern in pdftract-core/src/receipts/svg.rs.
Closes: pdftract-p4vzu
```