pdftract/notes/pdftract-64j83.md
jedarden cce26bb6b6 feat(pdftract-64j83): implement column label assignment to Span.column + Line.column
- Add column: Option<u32> field to Span in hybrid.rs
- Create layout/columns.rs module with:
  - Column struct (index + x_range)
  - assign_columns_to_spans() - assign by x_range containing bbox[0]
  - assign_columns_to_lines() - propagate via mode (>50% dominance)
  - HasBBoxAndColumn and HasSpansWithColumn traits
- Update layout/mod.rs to export column types
- Fix test fixtures in inspect/render (add column: None)

Acceptance criteria:
- 2-column page span at x0=50 -> Some(0), x0=350 -> Some(1)
- Full-width heading line -> None (mixed spans)
- Single-column page -> all spans Some(0)
- Inter-column gap -> None

Closes: pdftract-64j83

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-24 14:45:19 -04:00


# Verification Note: pdftract-64j83
## Bead
Column label assignment to Span.column + Line.column
## Work Done
### 1. Added `column` field to `Span` in `hybrid.rs`
- Added `pub column: Option<u32>` to the `Span` struct
- Updated `Span::new()` to initialize `column: None`
- The `SpanJson` in `schema/mod.rs` already had the `column` field
### 2. Created new module `layout/columns.rs`
- Implemented `Column` struct with `index` and `x_range` fields
- Implemented `assign_columns_to_spans()` function:
  - Assigns column indices to spans based on the `x_range` containing `span.bbox[0]`
  - Spans outside any column get `column = None`
- Implemented `assign_columns_to_lines()` function:
  - Propagates column indices from spans to lines via mode
  - Assigns a column only if >50% of the line's spans are in that column
  - Otherwise assigns `None` (mixed columns)
- Added traits `HasBBoxAndColumn` and `HasSpansWithColumn` for flexibility
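
The span rule above can be sketched as follows. This is a minimal illustration, not the code in `layout/columns.rs`: it uses a concrete, hypothetical `Span` instead of the `HasBBoxAndColumn` trait, assumes `bbox` is `[x0, y0, x1, y1]`, and assumes `x_range` is a half-open `(min, max)` pair. The note does not state whether the real range is inclusive.

```rust
/// Hypothetical stand-in for the real `Column` (index + x_range).
#[derive(Debug, Clone, PartialEq)]
pub struct Column {
    pub index: u32,
    /// Assumed half-open interval: x_range.0 <= x < x_range.1.
    pub x_range: (f32, f32),
}

/// Hypothetical minimal span; the real `Span` in `hybrid.rs` has more fields.
#[derive(Debug, Clone)]
pub struct Span {
    /// Assumed layout: [x0, y0, x1, y1].
    pub bbox: [f32; 4],
    pub column: Option<u32>,
}

/// Label each span with the column whose x_range contains its left edge (x0).
/// Spans whose x0 falls in no column (e.g. in the inter-column gap) get `None`.
pub fn assign_columns_to_spans(spans: &mut [Span], columns: &[Column]) {
    for span in spans.iter_mut() {
        let x0 = span.bbox[0];
        span.column = columns
            .iter()
            .find(|c| c.x_range.0 <= x0 && x0 < c.x_range.1)
            .map(|c| c.index);
    }
}
```

Because only `x0` is consulted, a span that starts inside a column and extends into the gap still receives that column's index.
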
### 3. Updated `layout/mod.rs`
- Added `pub mod columns;`
- Exported `assign_columns_to_lines`, `assign_columns_to_spans`, and `Column`
### 4. Fixed test fixtures
- Updated `SpanJson` initializers in `inspect/render/confidence_heatmap.rs`
- Updated `SpanJson` initializers in `inspect/render/spans.rs`
- Added `column: None` to all test fixtures
## Acceptance Criteria
- [PASS] `Span` has `column: Option<u32>` field
- [PASS] `Line` already has `column: Option<usize>` field (from Phase 4.2)
- [PASS] `assign_columns_to_spans()` assigns based on x_range containing span.bbox[0]
- [PASS] Spans outside any column get `column = None`
- [PASS] `assign_columns_to_lines()` propagates via mode (>50% dominance)
- [PASS] Full-width heading lines get `column = None` when spans are mixed
- [PASS] Single-column pages: all spans get `Some(0)`
- [PASS] Inter-column gaps: spans in gap get `None`
## Test Coverage
All acceptance criteria are covered by unit tests in `layout/columns.rs`:
1. `test_assign_columns_to_spans_two_column`: 2-column page, span at x0=50 -> Some(0), x0=350 -> Some(1), x0=310 (gap) -> None
2. `test_assign_columns_to_lines_unanimous`: All spans in same column -> that column
3. `test_assign_columns_to_lines_dominant`: >50% spans in one column -> that column
4. `test_assign_columns_to_lines_mixed`: 50/50 split -> None (no dominant)
5. `test_assign_columns_to_lines_full_width_heading`: All spans None -> line None
6. `test_assign_columns_to_spans_single_column`: Single-column page -> all spans Some(0)
7. `test_span_straddling_gap_assigned_by_x0`: Span assigned by x0 even if it extends into gap
8. `test_column_index_monotonic_left_to_right`: Invariant (column index increases left to right) verified
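
The line-level cases in tests 2 through 5 all follow from one dominance rule, sketched below. The helper name `dominant_column` is hypothetical, and the sketch assumes that spans with `column = None` count toward the total; the note does not say how the real `assign_columns_to_lines()` treats them. The return type is `Option<usize>` to match `Line.column`.

```rust
use std::collections::HashMap;

/// Return the column held by strictly more than half of the spans, if any.
/// Assumption: spans labelled `None` count toward the total, so a line made
/// mostly of unlabelled spans never gets a column.
pub fn dominant_column(span_columns: &[Option<u32>]) -> Option<usize> {
    // Count how many spans carry each column index.
    let mut counts: HashMap<u32, usize> = HashMap::new();
    for col in span_columns.iter().flatten() {
        *counts.entry(*col).or_insert(0) += 1;
    }
    let total = span_columns.len();
    // At most one column can exceed 50%, so the first match is the only match.
    counts
        .into_iter()
        .find(|&(_, n)| n * 2 > total)
        .map(|(col, _)| col as usize)
}
```

Under this rule a 50/50 split yields `None` because neither column strictly exceeds half, and an empty line also yields `None`.
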
## Critical Considerations
- Invariant: column indices increase monotonically from left to right (verified in tests)
- A span straddling a gap is assigned by its x0 (verified in a test)
- `/Rotate` normalization: coordinates are assumed to be already normalized by upstream code
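
One way to check the left-to-right invariant is sketched below. The function name and the `(min, max)` shape of `x_range` are assumptions made for illustration; the actual test in `layout/columns.rs` may verify the invariant differently.

```rust
/// Hypothetical stand-in for the real `Column` (index + x_range).
pub struct Column {
    pub index: u32,
    /// Assumed to be (min, max) in page coordinates.
    pub x_range: (f32, f32),
}

/// True when column indices strictly increase as columns are visited
/// from the smallest left edge to the largest.
pub fn indices_monotonic_left_to_right(columns: &[Column]) -> bool {
    let mut sorted: Vec<&Column> = columns.iter().collect();
    // Order by left edge; `total_cmp` gives a total order over f32.
    sorted.sort_by(|a, b| a.x_range.0.total_cmp(&b.x_range.0));
    sorted.windows(2).all(|w| w[0].index < w[1].index)
}
```
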
## Files Modified
- `crates/pdftract-core/src/hybrid.rs`: Added `column` field to `Span`
- `crates/pdftract-core/src/layout/columns.rs`: New module (360 lines)
- `crates/pdftract-core/src/layout/mod.rs`: Exported column types
- `crates/pdftract-cli/src/inspect/render/confidence_heatmap.rs`: Fixed test fixtures
- `crates/pdftract-cli/src/inspect/render/spans.rs`: Fixed test fixtures
## Gates Passed
- `cargo check --all-targets` - PASS (lib compiles)
- `cargo fmt --all` - PASS (code formatted)