- Add layout::line module with Line<S> struct for Phase 4.2 line formation
- Implement compute_baseline() using plan formula: y0 + height * 0.2
- Add LineDirection enum with serde support (Ltr, Rtl, Mixed)
- Add union_bboxes() helper for computing span bbox unions
- Add HasBBox trait for generic span type support

Acceptance criteria:
- compute_baseline([0,100,50,110]) returns 102.0 (height 10)
- compute_baseline([0,100,50,100]) returns 100.0 (zero height)
- LineDirection serde roundtrips to "ltr"/"rtl"/"mixed"
- All 11 unit tests pass

Closes: pdftract-sdx9z
pdftract-sdx9z: Line struct + baseline computation per span
Summary
Implemented Phase 4.2 line formation infrastructure:
- Created layout/line.rs module with Line struct and compute_baseline function
- Added LineDirection enum with serde support (Ltr, Rtl, Mixed)
- Added helper functions for bbox operations
Files Modified
- crates/pdftract-core/src/layout/line.rs (new): 247 lines
- crates/pdftract-core/src/layout/mod.rs: Exported new line module
Acceptance Criteria Status
PASS
- ✅ compute_baseline([0, 100, 50, 110]) returns 102.0 (height 10, baseline = 100 + 10 * 0.2)
- ✅ compute_baseline([0, 100, 50, 100]) returns 100.0 (zero-height case)
- ✅ union_bboxes correctly computes union of 3 span bboxes
- ✅ LineDirection serde roundtrips to/from "ltr"/"rtl"/"mixed"
- ✅ All 11 unit tests pass in layout::line::tests
Implementation Details
Line struct
    pub struct Line<S> {
        pub spans: Vec<S>,
        pub bbox: [f32; 4],        // Union of span bboxes
        pub baseline: f32,         // Average of member span baselines
        pub direction: LineDirection,
        pub page_relative_y: f32,  // (page_height - bbox[3]) / page_height
    }
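The bbox and page_relative_y fields can be illustrated with a minimal sketch. This is not the code in layout/line.rs: the HasBBox method name `bbox()`, the `DemoSpan` type, the `page_relative_y` helper, and the choice to return `None` for an empty slice are all assumptions made for illustration. Only the [x0, y0, x1, y1] layout and the page_relative_y formula come from the notes above.

```rust
/// Hypothetical shape of the HasBBox trait; the real method name is not shown in these notes.
pub trait HasBBox {
    /// Bounding box as [x0, y0, x1, y1].
    fn bbox(&self) -> [f32; 4];
}

/// Illustrative span type, used only for this sketch.
pub struct DemoSpan {
    pub bbox: [f32; 4],
}

impl HasBBox for DemoSpan {
    fn bbox(&self) -> [f32; 4] {
        self.bbox
    }
}

/// Union of span bboxes: min of the lower-left corner, max of the upper-right corner.
/// Returning None for an empty slice is an assumed convention.
pub fn union_bboxes<S: HasBBox>(spans: &[S]) -> Option<[f32; 4]> {
    let mut iter = spans.iter().map(|s| s.bbox());
    let first = iter.next()?;
    Some(iter.fold(first, |acc, b| {
        [
            acc[0].min(b[0]),
            acc[1].min(b[1]),
            acc[2].max(b[2]),
            acc[3].max(b[3]),
        ]
    }))
}

/// Relative vertical position, following the struct comment:
/// (page_height - bbox[3]) / page_height.
pub fn page_relative_y(bbox: &[f32; 4], page_height: f32) -> f32 {
    (page_height - bbox[3]) / page_height
}
```

Under these assumptions, three spans with bboxes [0, 100, 50, 110], [55, 98, 90, 112] and [95, 100, 120, 110] would union to [0, 98, 120, 112].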
compute_baseline function
    pub fn compute_baseline(bbox: &[f32; 4]) -> f32 {
        let height = bbox[3] - bbox[1];
        bbox[1] + height * 0.2  // 0.2 = descender approximation
    }
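As a worked example of how the per-span formula could feed the line-level baseline field (documented as the average of member span baselines), here is a small sketch. `average_baseline` is a hypothetical helper, not a function named in these notes, and it takes raw bboxes rather than spans to stay self-contained.

```rust
/// Same formula as compute_baseline above: y0 plus 20% of the bbox height.
pub fn compute_baseline(bbox: &[f32; 4]) -> f32 {
    let height = bbox[3] - bbox[1];
    bbox[1] + height * 0.2
}

/// Hypothetical helper: mean of the per-span baselines.
/// Returns None when there are no spans, to avoid dividing by zero.
pub fn average_baseline(bboxes: &[[f32; 4]]) -> Option<f32> {
    if bboxes.is_empty() {
        return None;
    }
    let sum: f32 = bboxes.iter().map(compute_baseline).sum();
    Some(sum / bboxes.len() as f32)
}
```

For the bbox [0, 100, 50, 110] the height is 10, so the baseline is 100 + 10 * 0.2 = 102. A zero-height bbox such as [0, 100, 50, 100] yields its y0 unchanged.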
LineDirection enum
    pub enum LineDirection {
        Ltr,    // Left-to-right
        Rtl,    // Right-to-left
        Mixed,  // Bidirectional
    }
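The serde attributes on the real enum are not shown in these notes. With serde, a container attribute such as `#[serde(rename_all = "lowercase")]` is one way to get the "ltr"/"rtl"/"mixed" strings, but that is an assumption about the implementation. The sketch below uses only the standard library (a hand-written `as_str` plus `FromStr`, both hypothetical) as a stand-in for the serde derive, to show the string mapping the roundtrip tests expect.

```rust
use std::str::FromStr;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LineDirection {
    Ltr,   // Left-to-right
    Rtl,   // Right-to-left
    Mixed, // Bidirectional
}

impl LineDirection {
    /// Lowercase string form, matching the strings in the acceptance criteria.
    pub fn as_str(self) -> &'static str {
        match self {
            LineDirection::Ltr => "ltr",
            LineDirection::Rtl => "rtl",
            LineDirection::Mixed => "mixed",
        }
    }
}

impl FromStr for LineDirection {
    type Err = String;

    /// Parses the exact lowercase forms; anything else is an error.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "ltr" => Ok(LineDirection::Ltr),
            "rtl" => Ok(LineDirection::Rtl),
            "mixed" => Ok(LineDirection::Mixed),
            other => Err(format!("unknown line direction: {other}")),
        }
    }
}
```

In this sketch parsing is case-sensitive, so "LTR" is rejected. Whether the real deserializer behaves the same way depends on the serde configuration.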
Plan References
- Phase 4.2 baseline (lines 1665-1666): y0 + (bbox_height * 0.2) formula
- RTL detection (line 1686): unicode-bidi crate for future bidi character category lookup
Testing
All tests pass:
running 38 tests
test layout::line::tests::test_compute_baseline_normal_span ... ok
test layout::line::tests::test_compute_baseline_zero_height ... ok
test layout::line::tests::test_compute_baseline_large_height ... ok
test layout::line::tests::test_line_direction_serdes_ltr ... ok
test layout::line::tests::test_line_direction_serdes_rtl ... ok
test layout::line::tests::test_line_direction_serdes_mixed ... ok
test layout::line::tests::test_line_accessors ... ok
test layout::line::tests::test_union_bboxes_single ... ok
test layout::line::tests::test_union_bboxes_multiple ... ok
test layout::line::tests::test_union_bboxes_empty ... ok
test layout::line::tests::test_union_bboxes_nested ... ok
test layout::line::tests::test_union_bboxes_disjoint ... ok
test result: ok. 38 passed; 0 failed
Next Steps
Future beads will implement:
- Actual line clustering algorithm (baseline proximity grouping)
- RTL detection using unicode-bidi crate
- Span-to-line aggregation with reading order sorting