Implemented Phase 7.6.3: extract non-link annotations with subtype-specific fields including:
- TextMarkup (Highlight/Squiggly/StrikeOut/Underline) with /QuadPoints
- Stamp with /Name icon
- FreeText with /DA default appearance
- Text (sticky notes) with /Open, /State, /StateModel
- Ink with /InkList stroke paths
- Line with /L endpoints
- Polygon/PolyLine with /Vertices
- FileAttachment with /FS filespec reference
- Other (Circle, Square, Caret, Redact, etc.) with no extra fields

Added AnnotationSpecific enum to capture subtype-specific extras while preserving the stable AnnotationCommon struct.
Unknown subtypes emit as Other without diagnostics (future: emit unhandled_annotation_subtype).
Comprehensive unit tests for all subtypes including edge cases.
Fixed pre-existing borrow issue in content_stream.rs.

Closes: pdftract-3r77
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Verification Note: pdftract-3r77
Bead
7.6.3: Non-link annotation extractor (Highlight/Stamp/FreeText/Note/etc.)
Summary
Implemented subtype-specific field extraction for non-link annotations.
Changes Made
1. Annotation Struct Enhancement
- Added AnnotationSpecific enum to capture subtype-specific fields:
  - TextMarkup - for Highlight/Squiggly/StrikeOut/Underline with /QuadPoints
  - Stamp - for /Name icon name
  - FreeText - for /DA default appearance string
  - Text - for sticky notes with /Open, /State, /StateModel
  - Ink - for /InkList stroke paths
  - Line - for /L endpoints
  - Polygon - for /Vertices
  - FileAttachment - for /FS filespec reference
  - Other - for Circle, Square, Caret, Redact, Sound, Movie, Screen, PrinterMark, TrapNet, Watermark, 3D
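For orientation, the enum and the subtype-to-variant mapping described above might look roughly like the sketch below. Only the variant names and the subtype groupings come from this note. The field names, field types, and the `variant_for_subtype` helper are assumptions made for illustration and may not match the code in other.rs.

```rust
/// Sketch of the subtype-specific extras.
/// Variant names follow this note; every field name and type is an assumption.
#[derive(Debug, Clone, PartialEq)]
pub enum AnnotationSpecific {
    TextMarkup { quads: Vec<[f64; 8]> },
    Stamp { icon_name: Option<String> },
    FreeText { default_appearance: Option<String> },
    Text {
        open: Option<bool>,
        state: Option<String>,
        state_model: Option<String>,
    },
    Ink { strokes: Vec<Vec<(f64, f64)>> },
    Line { endpoints: Option<[f64; 4]> },
    Polygon { vertices: Vec<(f64, f64)> },
    /// Assumed to be an indirect reference: (object number, generation).
    FileAttachment { filespec: Option<(u32, u16)> },
    Other,
}

/// Hypothetical helper: names the variant a /Subtype value maps to.
/// The match is case-sensitive, so the original name casing is never normalized.
pub fn variant_for_subtype(subtype: &str) -> &'static str {
    match subtype {
        "Highlight" | "Squiggly" | "StrikeOut" | "Underline" => "TextMarkup",
        "Stamp" => "Stamp",
        "FreeText" => "FreeText",
        "Text" => "Text",
        "Ink" => "Ink",
        "Line" => "Line",
        "Polygon" | "PolyLine" => "Polygon",
        "FileAttachment" => "FileAttachment",
        // Circle, Square, Caret, Redact, ... and unknown subtypes.
        _ => "Other",
    }
}
```

Routing everything unrecognized through a single catch-all arm matches the behavior described in this note, where unknown subtypes are emitted as Other without a diagnostic.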
2. Implementation Files
- crates/pdftract-core/src/annotation/other.rs - Complete rewrite with subtype-specific extraction
- crates/pdftract-core/src/annotation/mod.rs - Updated dispatcher to pass resolver
3. Test Coverage
Added comprehensive unit tests for:
- Highlight with QuadPoints
- Stamp with /Name "Approved"
- FreeText with /DA
- Text (sticky note) with /Open, /State, /StateModel
- Ink with multiple strokes
- Line with endpoints
- Polygon/PolyLine with vertices
- FileAttachment with /FS reference
- Circle, Square (Other type)
- Unknown subtypes
- Edge cases (no quads, no name, invalid arrays)
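The edge cases listed above (invalid arrays in particular) could be exercised against helpers along these lines. This is a minimal sketch, not the code under test: `parse_line_endpoints` and `parse_ink_list` are hypothetical names, the inputs are assumed to be already-resolved numeric arrays, and treating a malformed `/L` as `None` or skipping a malformed stroke is an assumed policy.

```rust
use std::convert::TryFrom;

/// Hypothetical helper: /L is expected to hold exactly four finite numbers
/// (x1 y1 x2 y2). Anything else is treated as absent (assumed policy).
pub fn parse_line_endpoints(raw: &[f64]) -> Option<[f64; 4]> {
    let arr = <[f64; 4]>::try_from(raw).ok()?;
    if arr.iter().all(|v| v.is_finite()) {
        Some(arr)
    } else {
        None
    }
}

/// Hypothetical helper: /InkList is an array of strokes, each a flat list of
/// x/y pairs. Empty or odd-length strokes are skipped (assumed policy).
pub fn parse_ink_list(raw: &[Vec<f64>]) -> Vec<Vec<(f64, f64)>> {
    raw.iter()
        .filter(|stroke| !stroke.is_empty() && stroke.len() % 2 == 0)
        .map(|stroke| stroke.chunks_exact(2).map(|p| (p[0], p[1])).collect())
        .collect()
}
```

With these definitions, a three-element `/L` array yields `None`, and an `/InkList` containing one odd-length stroke keeps only the well-formed strokes.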
Acceptance Criteria Status
- [PASS] Critical test: page with Highlight and Note - both extract with correct subtypes
- [PASS] Critical test: annotation with no /Contents -> contents: None
- [PASS] Unit tests: Highlight with QuadPoints
- [PASS] Unit tests: Stamp with /Name "Approved"
- [PASS] Unit tests: FreeText with /DA
- [PASS] Unit tests: Ink with multiple strokes
- [PASS] Public extract_annotation(AnnotationCommon, dict, resolver) -> Annotation
- [PASS] INV: subtype taxonomy stable (all subtypes preserved as-is)
Compilation Status
- [PASS] cargo check --all-targets
- [PASS] cargo fmt
- [WARN] cargo clippy has pre-existing warnings in other modules (not introduced by this change)
Notes
- Preserved original /Subtype name casing (do not normalize to lowercase per spec)
- /QuadPoints format is (x1,y1, x2,y2, x3,y3, x4,y4) per quad in reading order
- Color array length varies (1, 3, or 4) and is preserved as-is
- Unknown subtypes emit with AnnotationSpecific::Other (no diagnostic in current implementation)
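The /QuadPoints layout noted above (eight numbers per quad) can be illustrated with a small sketch. The function name `split_quad_points` is hypothetical, and rejecting an array whose length is not a positive multiple of 8 is an assumption about how the "no quads" and "invalid arrays" cases are handled.

```rust
/// Hypothetical helper: splits a flat /QuadPoints array into quads of
/// eight numbers (x1,y1, x2,y2, x3,y3, x4,y4), preserving array order.
/// Returns None for an empty array or a length that is not a multiple of 8
/// (assumed policy).
pub fn split_quad_points(flat: &[f64]) -> Option<Vec<[f64; 8]>> {
    if flat.is_empty() || flat.len() % 8 != 0 {
        return None;
    }
    let quads = flat
        .chunks_exact(8)
        .map(|chunk| {
            let mut quad = [0.0_f64; 8];
            quad.copy_from_slice(chunk);
            quad
        })
        .collect();
    Some(quads)
}
```

A two-line highlight would therefore arrive as 16 numbers and come out as two `[f64; 8]` entries in the same order as in the source array.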
Related Files
- crates/pdftract-core/src/annotation/other.rs
- crates/pdftract-core/src/annotation/mod.rs
- crates/pdftract-core/src/content_stream.rs (fixed pre-existing borrow issue)