pdftract/notes/pdftract-3jekw.md
jedarden 336e48a7dd feat(pdftract-3jekw): implement watermark and formula detection stubs
Add Phase 4 stub classifiers for Watermark and Formula block kinds.
Full detection deferred to Phase 7 per plan section 4.4 (line 1709)
and 4.6 watermark note (line 1752).

Changes:
- Create crates/pdftract-core/src/layout/watermark_formula.rs with
  classify_watermark() and classify_formula() stubs returning false
- Update crates/pdftract-core/src/layout/mod.rs to export the stubs
- Add comprehensive module documentation linking to Phase 7 research

Acceptance criteria:
- BlockKind::Watermark and BlockKind::Formula variants exist (pre-existing)
- classify_watermark always false
- classify_formula always false
- No v0.1.0 block has kind=Watermark or Formula

Refs: pdftract-3jekw
2026-05-27 23:32:22 -04:00


# pdftract-3jekw: Watermark / Formula Detection Stubs (Phase 7 Deferred)
## Work Completed
### 1. Module Created
Created `crates/pdftract-core/src/layout/watermark_formula.rs` with stub implementations:
- `classify_watermark(block) -> bool`: Always returns `false` (Phase 4 stub)
- `classify_formula(block) -> bool`: Always returns `false` (Phase 4 stub)
### 2. Module Integration
Updated `crates/pdftract-core/src/layout/mod.rs`:
- Added module declaration: `pub mod watermark_formula;`
- Added public exports: `pub use watermark_formula::{classify_formula, classify_watermark};`
- Updated module documentation to reference the stub classifiers
### 3. Module Documentation
The module-level doc comments provide:
- Links to the Phase 7 research notes for watermark detection (`docs/research/watermark-and-background-separation.md`)
- References to the plan.md Phase 7.1 (watermark) and Phase 7.2 (formula) specifications
- TODO comments outlining what the full implementation requires
### 4. Tests
The module includes four tests covering the stub behavior:
- `test_classify_watermark_always_false`: Verifies watermark stub returns false
- `test_classify_formula_always_false`: Verifies formula stub returns false
- `test_watermark_stub_documentation`: Documents Phase 4 behavior
- `test_formula_stub_documentation`: Documents Phase 4 behavior
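As a rough illustration of what the two behavioral checks exercise, here is a minimal, self-contained sketch. The `Block<S>` struct, its `text` and `source` fields, and the helpers `sample_blocks` and `stubs_reject_all` are hypothetical stand-ins introduced for this example; the real `Block` type in `pdftract-core` is not shown in these notes and may look quite different.

```rust
// Hypothetical stand-in for the real `Block<S>` type (fields are assumptions).
#[derive(Debug, Default)]
pub struct Block<S> {
    pub text: String,
    pub source: S,
}

// Phase 4 stubs, mirroring the signatures listed above.
pub fn classify_watermark<S>(_block: &Block<S>) -> bool {
    false
}

pub fn classify_formula<S>(_block: &Block<S>) -> bool {
    false
}

/// Hypothetical sample inputs, including text that a real Phase 7
/// detector might plausibly flag.
pub fn sample_blocks() -> Vec<Block<()>> {
    vec![
        Block { text: "CONFIDENTIAL".to_string(), source: () },
        Block { text: "E = mc^2".to_string(), source: () },
        Block::default(),
    ]
}

/// Returns true when neither stub flags any of the given blocks.
pub fn stubs_reject_all<S>(blocks: &[Block<S>]) -> bool {
    blocks
        .iter()
        .all(|b| !classify_watermark(b) && !classify_formula(b))
}
```

A test in this style would simply assert `stubs_reject_all(&sample_blocks())`; the point is that the result does not depend on the block contents at all.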
## Acceptance Criteria
| Criterion | Status | Notes |
|-----------|--------|-------|
| BlockKind::Watermark variant exists | PASS | Already present in `parser/struct_tree.rs:1424` |
| BlockKind::Formula variant exists | PASS | Already present in `parser/struct_tree.rs:1422` |
| classify_watermark always false | PASS | Stub function returns `false` |
| classify_formula always false | PASS | Stub function returns `false` |
| No v0.1.0 block has kind=Watermark or Formula | PASS | Both stubs always return `false`, so the layout classifiers never assign either kind |
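To make the last criterion concrete, the sketch below shows one way a kind-assignment step could consult the stubs. It is an assumption-laden illustration, not the actual Phase 4.4 logic: `assign_kind`, the `Paragraph` fallback variant, and the `Block<S>` fields are all hypothetical names introduced here.

```rust
// Reduced, hypothetical version of the enum; the real one has more variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BlockKind {
    Paragraph, // assumed fallback variant, not confirmed by the notes
    Formula,
    Watermark,
}

// Hypothetical stand-in for the real `Block<S>` type.
pub struct Block<S> {
    pub text: String,
    pub source: S,
}

// Phase 4 stubs: always false.
pub fn classify_watermark<S>(_block: &Block<S>) -> bool {
    false
}

pub fn classify_formula<S>(_block: &Block<S>) -> bool {
    false
}

/// Hypothetical kind assignment: the special-kind branches exist,
/// but they are unreachable in practice while the stubs return false.
pub fn assign_kind<S>(block: &Block<S>) -> BlockKind {
    if classify_watermark(block) {
        BlockKind::Watermark
    } else if classify_formula(block) {
        BlockKind::Formula
    } else {
        BlockKind::Paragraph
    }
}
```

Under these assumptions, replacing the stub bodies in Phase 7 would activate the `Watermark` and `Formula` branches without touching the caller.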
## Plan References
- Phase 4.4 (line 1709): Block formation and kind assignment
- Phase 4.6 (line 1752): Watermark exclusion note ("Prior to Phase 7, watermarks are not excluded from --text output; kind: 'watermark' blocks are not emitted")
- Phase 7.1: Watermark detection (deferred)
- Phase 7.2: Formula detection (deferred)
## Implementation Details
### BlockKind Enum (existing)
```rust
// crates/pdftract-core/src/parser/struct_tree.rs
pub enum BlockKind {
    // ...
    Formula,   // Line 1422
    Watermark, // Line 1424 (commented: "Phase 7 stub - always false")
    // ...
}
```
### Stub Functions (new)
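```rust
// crates/pdftract-core/src/layout/watermark_formula.rs

/// Phase 4 stub: never classifies a block as a watermark.
pub fn classify_watermark<S>(_block: &Block<S>) -> bool {
    false // Phase 4 stub
}

/// Phase 4 stub: never classifies a block as a formula.
pub fn classify_formula<S>(_block: &Block<S>) -> bool {
    false // Phase 4 stub
}
```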
## Verification Note
The stubs behave as specified: both return `false` for every input. Full detection logic is deferred to Phase 7. Having the stubs in place now lets downstream consumers (JSON schema, markdown, profile extraction) be coded against the full taxonomy, so enabling real detection later should not require breaking changes.
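A downstream consumer coded against the full taxonomy might look like the following sketch. The function `kind_label`, the `Paragraph` variant, and the exact label strings are assumptions for illustration (only the `'watermark'` spelling appears in the plan excerpt quoted above); the real serializers are not shown in these notes.

```rust
// Reduced, hypothetical version of the enum; the real one has more variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BlockKind {
    Paragraph, // assumed variant, for illustration only
    Formula,
    Watermark,
}

/// Hypothetical mapping from block kind to an output label.
/// The match has no wildcard arm, so the compiler rejects this function
/// if a variant is added to the enum and not handled here.
pub fn kind_label(kind: BlockKind) -> &'static str {
    match kind {
        BlockKind::Paragraph => "paragraph",
        BlockKind::Formula => "formula",
        BlockKind::Watermark => "watermark",
    }
}
```

Because the `Formula` and `Watermark` arms already exist, a consumer written this way would need no changes when Phase 7 starts emitting those kinds.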
## Files Modified
- `crates/pdftract-core/src/layout/watermark_formula.rs` (new)
- `crates/pdftract-core/src/layout/mod.rs` (module declaration and exports)