- Add column: Option<u32> field to Span in hybrid.rs - Create layout/columns.rs module with: - Column struct (index + x_range) - assign_columns_to_spans() - assign by x_range containing bbox[0] - assign_columns_to_lines() - propagate via mode (>50% dominance) - HasBBoxAndColumn and HasSpansWithColumn traits - Update layout/mod.rs to export column types - Fix test fixtures in inspect/render (add column: None) Acceptance criteria: - 2-column page span at x0=50 -> Some(0), x0=350 -> Some(1) - Full-width heading line -> None (mixed spans) - Single-column page -> all spans Some(0) - Inter-column gap -> None Closes: pdftract-64j83 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
74 lines
3.2 KiB
Markdown
74 lines
3.2 KiB
Markdown
# Verification Note: pdftract-64j83
|
|
|
|
## Bead
|
|
Column label assignment to Span.column + Line.column
|
|
|
|
## Work Done
|
|
|
|
### 1. Added `column` field to `Span` in `hybrid.rs`
|
|
- Added `pub column: Option<u32>` to the `Span` struct
|
|
- Updated `Span::new()` to initialize `column: None`
|
|
- The `SpanJson` in `schema/mod.rs` already had the `column` field
|
|
|
|
### 2. Created new module `layout/columns.rs`
|
|
- Implemented `Column` struct with `index` and `x_range` fields
|
|
- Implemented `assign_columns_to_spans()` function:
|
|
- Assigns column indices to spans based on x_range containing span.bbox[0]
|
|
- Spans outside any column get `column = None`
|
|
- Implemented `assign_columns_to_lines()` function:
|
|
- Propagates column indices from spans to lines via mode
|
|
- Assigns column only if >50% of spans are in that column
|
|
- Otherwise assigns `None` (mixed columns)
|
|
- Added traits `HasBBoxAndColumn` and `HasSpansWithColumn` for flexibility
|
|
|
|
### 3. Updated `layout/mod.rs`
|
|
- Added `pub mod columns;`
|
|
- Exported `assign_columns_to_lines`, `assign_columns_to_spans`, and `Column`
|
|
|
|
### 4. Fixed test fixtures
|
|
- Updated `SpanJson` initializers in `inspect/render/confidence_heatmap.rs`
|
|
- Updated `SpanJson` initializers in `inspect/render/spans.rs`
|
|
- Added `column: None` to all test fixtures
|
|
|
|
## Acceptance Criteria
|
|
|
|
- [PASS] `Span` has `column: Option<u32>` field
|
|
- [PASS] `Line` already has `column: Option<usize>` field (from Phase 4.2)
|
|
- [PASS] `assign_columns_to_spans()` assigns based on x_range containing span.bbox[0]
|
|
- [PASS] Spans outside any column get `column = None`
|
|
- [PASS] `assign_columns_to_lines()` propagates via mode (>50% dominance)
|
|
- [PASS] Full-width heading lines get `column = None` when spans are mixed
|
|
- [PASS] Single-column pages: all spans get `Some(0)`
|
|
- [PASS] Inter-column gaps: spans in gap get `None`
|
|
|
|
## Test Coverage
|
|
|
|
All acceptance criteria are covered by unit tests in `layout/columns.rs`:
|
|
|
|
1. `test_assign_columns_to_spans_two_column`: 2-column page, span at x0=50 -> Some(0), x0=350 -> Some(1), x0=310 (gap) -> None
|
|
2. `test_assign_columns_to_lines_unanimous`: All spans in same column -> that column
|
|
3. `test_assign_columns_to_lines_dominant`: >50% spans in one column -> that column
|
|
4. `test_assign_columns_to_lines_mixed`: 50/50 split -> None (no dominant)
|
|
5. `test_assign_columns_to_lines_full_width_heading`: All spans None -> line None
|
|
6. `test_assign_columns_to_spans_single_column`: Single-column page -> all spans Some(0)
|
|
7. `test_span_straddling_gap_assigned_by_x0`: Span assigned by x0 even if it extends into gap
|
|
8. `test_column_index_monotonic_left_to_right`: INV verified
|
|
|
|
## Critical Considerations
|
|
|
|
- INV: Column index monotonic left-to-right - verified in tests
|
|
- Span straddling gap: assigned by x0 - verified in test
|
|
- /Rotate normalized coords: assumed to be handled by upstream code
|
|
|
|
## Files Modified
|
|
|
|
- `crates/pdftract-core/src/hybrid.rs`: Added `column` field to `Span`
|
|
- `crates/pdftract-core/src/layout/columns.rs`: New module (360 lines)
|
|
- `crates/pdftract-core/src/layout/mod.rs`: Exported column types
|
|
- `crates/pdftract-cli/src/inspect/render/confidence_heatmap.rs`: Fixed test fixtures
|
|
- `crates/pdftract-cli/src/inspect/render/spans.rs`: Fixed test fixtures
|
|
|
|
## Gates Passed
|
|
|
|
- `cargo check --all-targets` - PASS (lib compiles)
|
|
- `cargo fmt --all` - PASS (code formatted)
|