pdftract/notes/pdftract-sdx9z.md
jedarden 2cf02c6b2b feat(pdftract-sdx9z): implement Line struct and baseline computation
- Add layout::line module with Line<S> struct for Phase 4.2 line formation
- Implement compute_baseline() using plan formula: y0 + height * 0.2
- Add LineDirection enum with serde support (Ltr, Rtl, Mixed)
- Add union_bboxes() helper for computing span bbox unions
- Add HasBBox trait for generic span type support

Acceptance criteria:
- compute_baseline([0,100,50,110]) returns 102.0 (height 10)
- compute_baseline([0,100,50,100]) returns 100.0 (zero height)
- LineDirection serde roundtrips to "ltr"/"rtl"/"mixed"
- All 11 unit tests pass

Closes: pdftract-sdx9z
2026-05-24 02:54:00 -04:00


# pdftract-sdx9z: Line struct + baseline computation per span
## Summary
Implemented Phase 4.2 line formation infrastructure:
- Created `layout/line.rs` module with `Line` struct and `compute_baseline` function
- Added `LineDirection` enum with serde support (Ltr, Rtl, Mixed)
- Added bbox helpers: `union_bboxes()` for computing the union of span bboxes, and a `HasBBox` trait so `Line<S>` works with generic span types
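This note does not show `union_bboxes()` or the `HasBBox` trait. The sketch below is a minimal guess at how they might fit together. It assumes the `[x0, y0, x1, y1]` layout used by `compute_baseline`. It also assumes an empty input yields `None`, since the real return type is not shown. `Span` is a hypothetical stand-in for the crate's span type.

```rust
/// Assumed shape of `HasBBox`: anything with an axis-aligned [x0, y0, x1, y1] box.
pub trait HasBBox {
    fn bbox(&self) -> [f32; 4];
}

/// Hypothetical span type, for illustration only.
#[derive(Debug, Clone)]
pub struct Span {
    pub text: String,
    pub bbox: [f32; 4],
}

impl HasBBox for Span {
    fn bbox(&self) -> [f32; 4] {
        self.bbox
    }
}

/// Smallest box enclosing every span: min of the lower corners, max of the upper corners.
/// Returns `None` for an empty slice (an assumption; the crate's behaviour is not documented here).
pub fn union_bboxes<S: HasBBox>(spans: &[S]) -> Option<[f32; 4]> {
    let mut boxes = spans.iter().map(|s| s.bbox());
    let first = boxes.next()?;
    Some(boxes.fold(first, |acc, b| {
        [
            acc[0].min(b[0]),
            acc[1].min(b[1]),
            acc[2].max(b[2]),
            acc[3].max(b[3]),
        ]
    }))
}
```

Making `union_bboxes` generic over `HasBBox` is what would let `Line<S>` stay agnostic about the concrete span type.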
## Files Modified
- `crates/pdftract-core/src/layout/line.rs` (new): 247 lines
- `crates/pdftract-core/src/layout/mod.rs`: exports the new `line` module
## Acceptance Criteria Status
### PASS
- ✅ `compute_baseline([0, 100, 50, 110])` returns `102.0` (height 10, baseline = 100 + 10 × 0.2)
- ✅ `compute_baseline([0, 100, 50, 100])` returns `100.0` (zero-height case)
- ✅ `union_bboxes` correctly computes the union of 3 span bboxes
- ✅ `LineDirection` serde roundtrips to/from `"ltr"`/`"rtl"`/`"mixed"`
- ✅ All 12 unit tests in `layout::line::tests` pass
## Implementation Details
### `Line` struct
```rust
pub struct Line<S> {
    pub spans: Vec<S>,
    pub bbox: [f32; 4],       // Union of span bboxes
    pub baseline: f32,        // Average of member span baselines
    pub direction: LineDirection,
    pub page_relative_y: f32, // (page_height - bbox[3]) / page_height
}
```
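The field comments above can be read as a recipe for building a `Line` from its spans. The sketch below follows that recipe under stated assumptions. The `Line::from_spans` constructor is hypothetical, as are the assumed `HasBBox` shape and the choice to return `None` for an empty input. Only the three field formulas come from the comments. The example uses a bare `[f32; 4]` as the span type.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LineDirection {
    Ltr,
    Rtl,
    Mixed,
}

/// Assumed shape of the crate's `HasBBox` trait: [x0, y0, x1, y1], y-up.
pub trait HasBBox {
    fn bbox(&self) -> [f32; 4];
}

/// Lets a bare bbox stand in for a span in this example.
impl HasBBox for [f32; 4] {
    fn bbox(&self) -> [f32; 4] {
        *self
    }
}

/// Same formula as `compute_baseline` in this note.
pub fn compute_baseline(bbox: &[f32; 4]) -> f32 {
    let height = bbox[3] - bbox[1];
    bbox[1] + height * 0.2
}

pub struct Line<S> {
    pub spans: Vec<S>,
    pub bbox: [f32; 4],
    pub baseline: f32,
    pub direction: LineDirection,
    pub page_relative_y: f32,
}

impl<S: HasBBox> Line<S> {
    /// Hypothetical constructor; returns `None` for an empty span list.
    pub fn from_spans(spans: Vec<S>, direction: LineDirection, page_height: f32) -> Option<Self> {
        let first = spans.first()?.bbox();
        // Union of span bboxes: min of the lower corners, max of the upper corners.
        let bbox = spans.iter().skip(1).map(|s| s.bbox()).fold(first, |a, b| {
            [a[0].min(b[0]), a[1].min(b[1]), a[2].max(b[2]), a[3].max(b[3])]
        });
        // Mean of the per-span baselines.
        let baseline =
            spans.iter().map(|s| compute_baseline(&s.bbox())).sum::<f32>() / spans.len() as f32;
        // Distance from the page top to the line top, as a fraction of page height.
        let page_relative_y = (page_height - bbox[3]) / page_height;
        Some(Line { spans, bbox, baseline, direction, page_relative_y })
    }
}
```

Consider two spans with baselines 102 and 104. The resulting line's baseline is 103, even though the union bbox is taller than either span.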
### `compute_baseline` function
```rust
pub fn compute_baseline(bbox: &[f32; 4]) -> f32 {
    let height = bbox[3] - bbox[1];
    bbox[1] + height * 0.2 // 0.2 = descender approximation
}
```
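Here is the same rule written as a formula. Take $y_0$ = `bbox[1]` as the bottom edge and $y_1$ = `bbox[3]` as the top edge. This reads PDF user space as y-up, which is also what the `page_relative_y` comment assumes.

$$
\text{baseline} = y_0 + 0.2\,(y_1 - y_0)
$$

The baseline therefore sits one fifth of the box height above the bottom edge. The lowest 20% of the box is treated as descender space, and a zero-height box collapses to $y_0$.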
### `LineDirection` enum
```rust
pub enum LineDirection {
    Ltr,   // Left-to-right
    Rtl,   // Right-to-left
    Mixed, // Bidirectional
}
```
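The enum above omits its serde attributes. With serde, lowercase names like `"ltr"`/`"rtl"`/`"mixed"` are usually produced by `#[serde(rename_all = "lowercase")]`. This note does not say whether the crate uses that or per-variant renames. The std-only sketch below mirrors the expected wire names without pulling in serde, which makes the round trip easy to check. `as_str` and the `FromStr` impl are hypothetical helpers, not part of the documented API.

```rust
use std::fmt;
use std::str::FromStr;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LineDirection {
    Ltr,
    Rtl,
    Mixed,
}

impl LineDirection {
    /// Lowercase wire name, matching the expected serde output.
    pub fn as_str(self) -> &'static str {
        match self {
            LineDirection::Ltr => "ltr",
            LineDirection::Rtl => "rtl",
            LineDirection::Mixed => "mixed",
        }
    }
}

impl fmt::Display for LineDirection {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(self.as_str())
    }
}

impl FromStr for LineDirection {
    type Err = String;

    /// Case-sensitive, like serde's renamed variants.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "ltr" => Ok(LineDirection::Ltr),
            "rtl" => Ok(LineDirection::Rtl),
            "mixed" => Ok(LineDirection::Mixed),
            other => Err(format!("unknown line direction: {other:?}")),
        }
    }
}
```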
## Plan References
- Phase 4.2 baseline (lines 1665-1666): `y0 + (bbox_height * 0.2)` formula
- RTL detection (line 1686): `unicode-bidi` crate for future bidi character category lookup
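The plan defers per-character direction to the `unicode-bidi` crate's bidi-class lookup. Until that lands, a rough std-only stand-in can show how span text might map to `Ltr`/`Rtl`/`Mixed`. `is_strong_rtl` and `classify_direction` are hypothetical. `is_strong_rtl` only checks the main Hebrew/Arabic-family code-point blocks, so it also counts Arabic-Indic digits and combining marks as RTL. The aggregation policy in `classify_direction` is an assumption, not the planned rule.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LineDirection {
    Ltr,
    Rtl,
    Mixed,
}

/// Rough placeholder for a bidi-class lookup: true for code points in the
/// Hebrew..Arabic Extended-A blocks and the Hebrew/Arabic presentation forms.
fn is_strong_rtl(c: char) -> bool {
    matches!(c as u32, 0x0590..=0x08FF | 0xFB1D..=0xFDFF | 0xFE70..=0xFEFF)
}

/// Hypothetical policy: RTL and LTR letters together give Mixed, only RTL
/// letters give Rtl, and everything else (including no letters at all) is Ltr.
pub fn classify_direction(text: &str) -> LineDirection {
    let mut has_rtl = false;
    let mut has_ltr = false;
    for c in text.chars() {
        if is_strong_rtl(c) {
            has_rtl = true;
        } else if c.is_alphabetic() {
            has_ltr = true;
        }
    }
    match (has_rtl, has_ltr) {
        (true, true) => LineDirection::Mixed,
        (true, false) => LineDirection::Rtl,
        _ => LineDirection::Ltr,
    }
}
```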
## Testing
All tests pass. Excerpt of the run (38 tests total; only the `layout::line` tests are shown):
```
running 38 tests
test layout::line::tests::test_compute_baseline_normal_span ... ok
test layout::line::tests::test_compute_baseline_zero_height ... ok
test layout::line::tests::test_compute_baseline_large_height ... ok
test layout::line::tests::test_line_direction_serdes_ltr ... ok
test layout::line::tests::test_line_direction_serdes_rtl ... ok
test layout::line::tests::test_line_direction_serdes_mixed ... ok
test layout::line::tests::test_line_accessors ... ok
test layout::line::tests::test_union_bboxes_single ... ok
test layout::line::tests::test_union_bboxes_multiple ... ok
test layout::line::tests::test_union_bboxes_empty ... ok
test layout::line::tests::test_union_bboxes_nested ... ok
test layout::line::tests::test_union_bboxes_disjoint ... ok
test result: ok. 38 passed; 0 failed
```
## Next Steps
Future beads will implement:
- Actual line clustering algorithm (baseline proximity grouping)
- RTL detection using `unicode-bidi` crate
- Span-to-line aggregation with reading order sorting
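The baseline-proximity grouping listed above could take many shapes; here is one, as a hedged sketch rather than the planned algorithm. Spans are sorted top-to-bottom by `compute_baseline`. Each span then joins the current line while its baseline stays within `tolerance` of that line's running mean. `group_into_lines`, the `Span` type, the tolerance parameter, and the greedy single-pass strategy are all assumptions. RTL ordering within a line is left out.

```rust
/// Hypothetical span: only the bbox ([x0, y0, x1, y1], y-up) is used for grouping.
#[derive(Debug, Clone)]
pub struct Span {
    pub text: String,
    pub bbox: [f32; 4],
}

fn compute_baseline(bbox: &[f32; 4]) -> f32 {
    bbox[1] + (bbox[3] - bbox[1]) * 0.2
}

/// Greedy grouping: a span joins the current line if its baseline is within
/// `tolerance` of that line's mean baseline; otherwise it starts a new line.
pub fn group_into_lines(mut spans: Vec<Span>, tolerance: f32) -> Vec<Vec<Span>> {
    // y-up coordinates: a larger baseline is higher on the page, so sort descending.
    spans.sort_by(|a, b| compute_baseline(&b.bbox).total_cmp(&compute_baseline(&a.bbox)));

    let mut lines: Vec<Vec<Span>> = Vec::new();
    let mut mean = 0.0_f32; // running mean baseline of the current (last) line
    for span in spans {
        let b = compute_baseline(&span.bbox);
        if !lines.is_empty() && (b - mean).abs() <= tolerance {
            let line = lines.last_mut().expect("checked non-empty");
            let n = line.len() as f32;
            mean = (mean * n + b) / (n + 1.0);
            line.push(span);
        } else {
            mean = b;
            lines.push(vec![span]);
        }
    }
    // Left-to-right reading order inside each line.
    for line in &mut lines {
        line.sort_by(|a, b| a.bbox[0].total_cmp(&b.bbox[0]));
    }
    lines
}
```

A single greedy pass is simple but order-sensitive. For example, a superscript or slightly skewed span may start a spurious line unless the tolerance is chosen with the font size in mind.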