pdftract/notes/pdftract-64j83.md
jedarden cce26bb6b6 feat(pdftract-64j83): implement column label assignment to Span.column + Line.column
- Add column: Option<u32> field to Span in hybrid.rs
- Create layout/columns.rs module with:
  - Column struct (index + x_range)
  - assign_columns_to_spans() - assign by x_range containing bbox[0]
  - assign_columns_to_lines() - propagate via mode (>50% dominance)
  - HasBBoxAndColumn and HasSpansWithColumn traits
- Update layout/mod.rs to export column types
- Fix test fixtures in inspect/render (add column: None)

Acceptance criteria:
- 2-column page span at x0=50 -> Some(0), x0=350 -> Some(1)
- Full-width heading line -> None (mixed spans)
- Single-column page -> all spans Some(0)
- Inter-column gap -> None

Closes: pdftract-64j83

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-24 14:45:19 -04:00

3.2 KiB

Verification Note: pdftract-64j83

Bead

Column label assignment to Span.column + Line.column

Work Done

1. Added column field to Span in hybrid.rs

  • Added pub column: Option<u32> to the Span struct
  • Updated Span::new() to initialize column: None
  • The SpanJson in schema/mod.rs already had the column field

2. Created new module layout/columns.rs

  • Implemented Column struct with index and x_range fields
  • Implemented assign_columns_to_spans() function:
    • Assigns column indices to spans based on x_range containing span.bbox[0]
    • Spans outside any column get column = None
  • Implemented assign_columns_to_lines() function:
    • Propagates column indices from spans to lines via mode
    • Assigns column only if >50% of spans are in that column
    • Otherwise assigns None (mixed columns)
  • Added traits HasBBoxAndColumn and HasSpansWithColumn for flexibility

3. Updated layout/mod.rs

  • Added pub mod columns;
  • Exported assign_columns_to_lines, assign_columns_to_spans, and Column

4. Fixed test fixtures

  • Updated SpanJson initializers in inspect/render/confidence_heatmap.rs
  • Updated SpanJson initializers in inspect/render/spans.rs
  • Added column: None to all test fixtures

Acceptance Criteria

  • [PASS] Span has column: Option<u32> field
  • [PASS] Line already has column: Option<usize> field (from Phase 4.2)
  • [PASS] assign_columns_to_spans() assigns based on x_range containing span.bbox[0]
  • [PASS] Spans outside any column get column = None
  • [PASS] assign_columns_to_lines() propagates via mode (>50% dominance)
  • [PASS] Full-width heading lines get column = None when spans are mixed
  • [PASS] Single-column pages: all spans get Some(0)
  • [PASS] Inter-column gaps: spans in gap get None

Test Coverage

All acceptance criteria are covered by unit tests in layout/columns.rs:

  1. test_assign_columns_to_spans_two_column: 2-column page, span at x0=50 -> Some(0), x0=350 -> Some(1), x0=310 (gap) -> None
  2. test_assign_columns_to_lines_unanimous: All spans in same column -> that column
  3. test_assign_columns_to_lines_dominant: >50% spans in one column -> that column
  4. test_assign_columns_to_lines_mixed: 50/50 split -> None (no dominant)
  5. test_assign_columns_to_lines_full_width_heading: All spans None -> line None
  6. test_assign_columns_to_spans_single_column: Single-column page -> all spans Some(0)
  7. test_span_straddling_gap_assigned_by_x0: Span assigned by x0 even if it extends into gap
  8. test_column_index_monotonic_left_to_right: INV verified

Critical Considerations

  • INV: Column index monotonic left-to-right - verified in tests
  • Span straddling gap: assigned by x0 - verified in test
  • /Rotate normalized coords: assumed to be handled by upstream code

Files Modified

  • crates/pdftract-core/src/hybrid.rs: Added column field to Span
  • crates/pdftract-core/src/layout/columns.rs: New module (360 lines)
  • crates/pdftract-core/src/layout/mod.rs: Exported column types
  • crates/pdftract-cli/src/inspect/render/confidence_heatmap.rs: Fixed test fixtures
  • crates/pdftract-cli/src/inspect/render/spans.rs: Fixed test fixtures

Gates Passed

  • cargo check --all-targets - PASS (lib compiles)
  • cargo fmt --all - PASS (code formatted)