docs(pdftract-2hag2): add verification note for all_tr3_with_full_page_image signal evaluator
Bead-Id: pdftract-3779n
parent 0c08bd0d9a
commit c263189361
1 changed file with 101 additions and 0 deletions
notes/pdftract-2hag2.md (normal file)
@@ -0,0 +1,101 @@
# Verification Note: pdftract-2hag2

## Bead: Signal evaluator: all_tr3_with_full_page_image (Tr=3 + image >= 95% page -> BrokenVector definitive)

## Summary

The `all_tr3_with_full_page_image` signal evaluator was implemented in commit `39ca6a3` as part of bead pdftract-2b7ff (image_coverage_fraction). The implementation is correct and all acceptance criteria are met.

## Implementation Location

- **File**: `crates/pdftract-core/src/classify.rs`
- **Function**: `pub fn all_tr3_with_full_page_image(ctx: &PageContext) -> Option<Vote>` (lines 339-357)
- **Tests**: Lines 2459-2695

## Acceptance Criteria Verification

### 1. text_op_count=10, tr3_op_count=10, full_page_image=true → Some(Vote { 0.99, BrokenVector })

- **Test**: `test_all_tr3_with_full_page_image_exact_match` (line 2462)
- **Status**: PASS

### 2. text_op_count=10, tr3_op_count=5 → None (mix of Tr=3 and visible)

- **Test**: `test_all_tr3_with_full_page_image_mixed_tr3` (line 2516)
- **Status**: PASS

### 3. text_op_count=0 → None (no text)

- **Test**: `test_all_tr3_with_full_page_image_no_text` (line 2532)
- **Status**: PASS

### 4. full_page_image=false → None

- **Test**: `test_all_tr3_with_full_page_image_no_full_page_image` (line 2548)
- **Status**: PASS

### 5. Unit tests

- **Total tests**: 10 tests covering all edge cases
- **Additional tests**:
  - Exactly 95% coverage (threshold edge case)
  - Just below 95% threshold
  - Multiple images (one large enough)
  - Zero page area (division by zero guard)
  - Empty image areas
  - Definitive short-circuit verification
  - Standard US Letter and A4 page sizes
- **Status**: PASS
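
The threshold edge cases listed above can be made concrete with a little arithmetic. The sketch below assumes the usual point dimensions for US Letter (612 x 792) and A4 (595 x 842, rounded); the note does not state which dimensions the tests use. The helper names `coverage_fraction` and `meets_threshold` are hypothetical and are not taken from `classify.rs`.

```rust
/// Assumed threshold, matching the ">= 95% of page" wording of the bead.
pub const FULL_PAGE_THRESHOLD: f64 = 0.95;

/// Fraction of the page covered by one image, or `None` for a degenerate page.
/// Hypothetical helper for illustration, not the crate's API.
pub fn coverage_fraction(image_area: f64, width: f64, height: f64) -> Option<f64> {
    let page_area = width * height;
    if page_area > 0.0 {
        Some(image_area / page_area)
    } else {
        None
    }
}

/// True when a single image reaches the threshold on a non-degenerate page.
pub fn meets_threshold(image_area: f64, width: f64, height: f64) -> bool {
    coverage_fraction(image_area, width, height)
        .map_or(false, |fraction| fraction >= FULL_PAGE_THRESHOLD)
}

// US Letter: 612 * 792 = 484_704 sq pt, so 95% is 460_468.8 sq pt.
// A4 (rounded): 595 * 842 = 500_990 sq pt, so 95% is 475_940.5 sq pt.
```

When writing an exact-threshold test, it is safest to pick an image area whose quotient is unambiguous: computing `0.95 * page_area` and dividing back may not reproduce `0.95` bit for bit in floating point.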

## Implementation Review

### Signature

```rust
pub fn all_tr3_with_full_page_image(ctx: &PageContext) -> Option<Vote>
```

**Status**: ✓ Correct

### Logic

```rust
let all_tr3 = ctx.text_op_count > 0 && ctx.tr3_op_count == ctx.text_op_count;
let page_area = ctx.width * ctx.height;
let full_page_image = if page_area > 0.0 {
    ctx.image_xobject_areas.iter().any(|&area| area / page_area >= 0.95)
} else {
    false
};
if all_tr3 && full_page_image {
    return Some(Vote::broken_vector(0.99));
}
None
```

**Status**: ✓ Correct
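
To try the logic outside the crate, the excerpt can be wrapped in stand-in types. This is a minimal sketch: the field names come from the excerpt, but the definitions of `PageContext`, `Vote`, and `PageClass` below are assumptions made for illustration, and the real types in `classify.rs` may differ.

```rust
// Hypothetical stand-ins for the crate's types (assumed shapes).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum PageClass {
    BrokenVector,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Vote {
    pub strength: f64,
    pub class: PageClass,
}

impl Vote {
    pub fn broken_vector(strength: f64) -> Self {
        Vote { strength, class: PageClass::BrokenVector }
    }
}

#[derive(Debug, Clone, Default)]
pub struct PageContext {
    pub text_op_count: usize,
    pub tr3_op_count: usize,
    pub width: f64,
    pub height: f64,
    pub image_xobject_areas: Vec<f64>,
}

pub fn all_tr3_with_full_page_image(ctx: &PageContext) -> Option<Vote> {
    // Every text operator must be invisible (Tr=3), and there must be text at all.
    let all_tr3 = ctx.text_op_count > 0 && ctx.tr3_op_count == ctx.text_op_count;
    let page_area = ctx.width * ctx.height;
    // Each image is compared on its own; areas are never summed.
    let full_page_image = page_area > 0.0
        && ctx.image_xobject_areas.iter().any(|&area| area / page_area >= 0.95);
    if all_tr3 && full_page_image {
        Some(Vote::broken_vector(0.99))
    } else {
        None
    }
}
```

The sketch folds the `if`/`else` guard into a short-circuiting `&&`, which behaves the same way for the inputs covered by the acceptance criteria.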

### Key Features

- All text operators must be Tr=3 (not just some): enforced by `tr3_op_count == text_op_count`
- At least one image XObject must cover >= 95% of the page area: `iter().any()` tests each image individually, so areas are not summed
- Definitive strength 0.99 for short-circuit behavior in PageClassifier
- Division-by-zero guard: when `page_area <= 0.0`, no image qualifies and the evaluator returns `None`
- Returns `None` for all non-matching cases
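
The guard matters because `f64` division by zero does not panic in Rust. A positive area divided by `0.0` yields positive infinity, which passes the `>= 0.95` comparison. The sketch below contrasts an unguarded check with a guarded one; both function names are hypothetical and exist only to show the failure mode.

```rust
/// Unguarded variant, shown only to illustrate the failure mode.
pub fn full_page_image_unguarded(areas: &[f64], page_area: f64) -> bool {
    // With page_area == 0.0, a positive area gives +inf, and +inf >= 0.95 is true.
    areas.iter().any(|&area| area / page_area >= 0.95)
}

/// Guarded variant, equivalent in effect to the `if page_area > 0.0` check.
pub fn full_page_image_guarded(areas: &[f64], page_area: f64) -> bool {
    page_area > 0.0 && areas.iter().any(|&area| area / page_area >= 0.95)
}
```

Without the guard, a zero-area page carrying any positive-area image would be reported as fully covered. A zero-area image on a zero-area page gives NaN, which fails every comparison, so that case is rejected either way.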

## Integration

The signal is integrated into the PageClassifier via `InvisibleTextWithImageSignal` (lines 195-207):

```rust
struct InvisibleTextWithImageSignal;

impl SignalEvaluator for InvisibleTextWithImageSignal {
    fn evaluate(&self, ctx: &PageContext) -> Option<Vote> {
        all_tr3_with_full_page_image(ctx)
    }

    fn name(&self) -> &'static str {
        "all_tr3_with_full_page_image"
    }
}
```
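
The note says only that a 0.99 vote short-circuits the PageClassifier; how the classifier combines votes is not shown here. The following is a minimal sketch of one way such a loop could work, assuming evaluators run in order, a vote at or above 0.99 ends evaluation immediately, and otherwise the strongest vote wins. `classify`, `FixedSignal`, `DEFINITIVE_STRENGTH`, and the type definitions are all hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum PageClass {
    BrokenVector,
    Vector,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Vote {
    pub strength: f64,
    pub class: PageClass,
}

/// Placeholder; the real `PageContext` carries operator counts and image areas.
pub struct PageContext;

pub trait SignalEvaluator {
    fn evaluate(&self, ctx: &PageContext) -> Option<Vote>;
    fn name(&self) -> &'static str;
}

/// Test double that always returns the same vote.
pub struct FixedSignal {
    pub name: &'static str,
    pub vote: Option<Vote>,
}

impl SignalEvaluator for FixedSignal {
    fn evaluate(&self, _ctx: &PageContext) -> Option<Vote> {
        self.vote
    }

    fn name(&self) -> &'static str {
        self.name
    }
}

/// Assumed cutoff at which a vote is treated as definitive.
pub const DEFINITIVE_STRENGTH: f64 = 0.99;

/// Runs evaluators in order and stops at the first definitive vote.
/// Otherwise returns the strongest vote seen, with the name of its evaluator.
pub fn classify(
    evaluators: &[Box<dyn SignalEvaluator>],
    ctx: &PageContext,
) -> Option<(&'static str, Vote)> {
    let mut best: Option<(&'static str, Vote)> = None;
    for evaluator in evaluators {
        if let Some(vote) = evaluator.evaluate(ctx) {
            if vote.strength >= DEFINITIVE_STRENGTH {
                return Some((evaluator.name(), vote)); // short-circuit
            }
            let stronger = best.map_or(true, |(_, current)| vote.strength > current.strength);
            if stronger {
                best = Some((evaluator.name(), vote));
            }
        }
    }
    best
}
```

Under these assumptions, evaluators placed after a definitive signal never run, so ordering the cheap, deterministic signals first avoids unnecessary work.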

## References

- EC-12: Tr=3 + full-page-image is a deterministic BrokenVector signal
- Plan section: Phase 5.1.2
- Commit: 39ca6a3 (feat(pdftract-2b7ff): implement image_coverage_fraction signal evaluator)

## Conclusion

**PASS**: All acceptance criteria are met. The implementation is correct, complete, and fully tested.