docs(pdftract-53liu): verify Phase 4.2 Line Formation coordinator
All 4 child beads closed with verification:
- Line struct + baseline computation (pdftract-sdx9z)
- Baseline clustering algorithm (pdftract-6bwq4)
- Within-line span sorting (pdftract-1jkme)
- RTL direction detection (pdftract-1ofnz)

Acceptance criteria:
- ✅ All 4 children closed
- ✅ Two-column layout: columns NOT merged into one line (test_two_column_separate_blocks)
- ✅ Superscript span at higher y: clustered with baseline text
- ✅ Arabic text: bidi R characters detected, spans sorted right-to-left
- ✅ Mixed Latin+Arabic line: detected as "mixed" direction

44/44 tests pass in layout::line module.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
parent 96e3cc8a91
commit 9f377d1609
1 changed file with 62 additions and 0 deletions

notes/pdftract-53liu.md

@@ -0,0 +1,62 @@

# pdftract-53liu: Phase 4.2 Line Formation (coordinator)

## Summary

Coordinator bead for Phase 4.2 Line Formation. All 4 child beads completed successfully:
- pdftract-sdx9z: Line struct + baseline computation
- pdftract-6bwq4: Baseline clustering algorithm (0.5 * median_font_size)
- pdftract-1jkme: Within-line span sorting (LTR/RTL)
- pdftract-1ofnz: RTL direction detection (unicode-bidi)

## Acceptance Criteria Status

| Criterion | Status | Evidence |
|-----------|--------|----------|
| All 4 children closed | PASS | All 4 children verified closed |
| Two-column layout: columns NOT merged into one line | PASS | test_two_column_separate_blocks (Phase 4.4) |
| Superscript span at higher y: clustered with baseline text | PASS | test_cluster_spans_superscript_stays_on_same_line |
| Arabic text: bidi R characters detected, spans sorted right-to-left | PASS | test_detect_line_direction_arabic_text |
| Mixed Latin+Arabic line: detected as "mixed" direction | PASS | test_detect_line_direction_mixed_latin_arabic |

## Implementation Summary

### Line struct (`layout/line.rs`)
- `Line<S>` generic struct with `spans`, `bbox`, `baseline`, `direction`, `page_relative_y`
- `LineDirection` enum (`Ltr`, `Rtl`, `Mixed`) with serde support
- `compute_baseline(bbox) = y0 + (bbox_height * 0.2)` per plan formula

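As a rough illustration of the shapes listed above, here is a minimal sketch of the struct and the baseline formula. The `BBox` type, its field names, and all numeric types are assumptions made for this example, and the serde derives are omitted so the snippet builds with the standard library alone; the actual definitions in `layout/line.rs` may differ.

```rust
/// Hypothetical bounding box; the field names are assumptions for this sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct BBox {
    pub x0: f64,
    pub y0: f64,
    pub x1: f64,
    pub y1: f64,
}

/// Reading direction of a line (serde derives left out of this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LineDirection {
    Ltr,
    Rtl,
    Mixed,
}

/// A line of text, generic over the span type `S`.
#[derive(Debug, Clone)]
pub struct Line<S> {
    pub spans: Vec<S>,
    pub bbox: BBox,
    pub baseline: f64,
    pub direction: LineDirection,
    pub page_relative_y: f64,
}

/// Plan formula: baseline = y0 + 0.2 * bbox_height.
pub fn compute_baseline(bbox: BBox) -> f64 {
    let height = bbox.y1 - bbox.y0;
    bbox.y0 + height * 0.2
}
```

For a box spanning y0 = 100 to y1 = 110, the height is 10, so the formula places the baseline at 100 + 10 * 0.2 = 102.
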
### Baseline clustering
- `cluster_spans_into_lines(spans, median_font_size)` groups spans by baseline proximity
- Threshold: `0.5 * median_font_size` (not hardcoded)
- Handles superscripts correctly (small font, slightly higher baseline stays with main line)
- Sorts spans by x0 within each line (LTR default)

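The clustering rule can be sketched as a greedy pass over spans sorted by baseline. This is only one plausible reading of the notes: the `Span` type, the return type, the choice to compare each span against the first baseline of the current cluster, and the top-to-bottom ordering (y growing upward) are all assumptions of this example, not the confirmed implementation.

```rust
use std::cmp::Ordering;

/// Hypothetical span type; field names are assumptions for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub struct Span {
    pub text: String,
    pub x0: f64,
    pub y0: f64,
    pub x1: f64,
    pub y1: f64,
}

impl Span {
    pub fn new(text: &str, x0: f64, y0: f64, x1: f64, y1: f64) -> Self {
        Span { text: text.to_string(), x0, y0, x1, y1 }
    }
}

/// Same formula as `compute_baseline`: y0 + 0.2 * height.
fn baseline(s: &Span) -> f64 {
    s.y0 + (s.y1 - s.y0) * 0.2
}

/// Greedy clustering: a span joins the current line when its baseline is
/// within 0.5 * median_font_size of the baseline that opened that line.
pub fn cluster_spans_into_lines(mut spans: Vec<Span>, median_font_size: f64) -> Vec<Vec<Span>> {
    let threshold = 0.5 * median_font_size;
    // Assumption: y grows upward, so descending baselines read top to bottom.
    spans.sort_by(|a, b| baseline(b).partial_cmp(&baseline(a)).unwrap_or(Ordering::Equal));

    let mut lines: Vec<Vec<Span>> = Vec::new();
    let mut anchor: Option<f64> = None;
    for span in spans {
        let b = baseline(&span);
        let same_line = anchor.map_or(false, |a| (b - a).abs() <= threshold);
        if same_line {
            if let Some(line) = lines.last_mut() {
                line.push(span);
            }
        } else {
            anchor = Some(b);
            lines.push(vec![span]);
        }
    }
    // LTR default: order spans left to right within each line.
    for line in &mut lines {
        line.sort_by(|a, b| a.x0.partial_cmp(&b.x0).unwrap_or(Ordering::Equal));
    }
    lines
}
```

With a median font size of 12 the threshold is 6. A 12-unit-tall span at y0 = 100 has baseline 102.4, and a superscript at y0 = 105 with y1 = 112 has baseline 106.4; the 4-unit gap is under the threshold, so the superscript stays on the main line, while a span at y0 = 80 (baseline 82.4) starts a new one.
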
### RTL detection
- `detect_line_direction(text)` using `unicode-bidi` crate
- Counts L vs R/AL bidi classes
- Returns `Ltr` if ltr > rtl OR both zero (empty/neutral)
- Returns `Rtl` if rtl > ltr
- Returns `Mixed` if tied (both > 0)

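The decision rule above can be illustrated without any dependencies. Note the substitution: instead of the `unicode-bidi` crate, this sketch uses a crude hand-written classifier that treats the basic Hebrew and Arabic letter ranges as strong right-to-left and every other alphabetic character as strong left-to-right. That classifier is an assumption made to keep the example self-contained and is far less complete than real bidi class data.

```rust
use std::cmp::Ordering;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LineDirection {
    Ltr,
    Rtl,
    Mixed,
}

/// Crude stand-in for bidi classes R/AL: basic Hebrew and Arabic letters only.
fn is_strong_rtl(c: char) -> bool {
    matches!(c, '\u{05D0}'..='\u{05EA}' | '\u{0620}'..='\u{064A}')
}

/// Crude stand-in for bidi class L: any other alphabetic character.
fn is_strong_ltr(c: char) -> bool {
    c.is_alphabetic() && !is_strong_rtl(c)
}

/// Counts strong LTR and RTL characters and applies the rule from the notes.
pub fn detect_line_direction(text: &str) -> LineDirection {
    let rtl = text.chars().filter(|&c| is_strong_rtl(c)).count();
    let ltr = text.chars().filter(|&c| is_strong_ltr(c)).count();
    match ltr.cmp(&rtl) {
        Ordering::Greater => LineDirection::Ltr,
        Ordering::Less => LineDirection::Rtl,
        // Tie with no strong characters at all: empty or neutral text.
        Ordering::Equal if ltr == 0 => LineDirection::Ltr,
        // Tie with strong characters on both sides.
        Ordering::Equal => LineDirection::Mixed,
    }
}
```

Under this rule `Mixed` is returned only when the two counts are exactly equal and non-zero; a line with three Latin letters and one Arabic letter is still classified as `Ltr`.
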
### Within-line sorting
- `sort_spans_in_line(line)` handles LTR (x0 ascending), RTL (x1 descending), Mixed (falls back to x0 ascending)
- Stable sort preserves insertion order on ties
- NaN bbox coordinates compare as `Ordering::Equal`

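A minimal sketch of the sorting rules above, assuming a simplified `Line` that holds only spans and a direction, and a hypothetical `Span` with just the two x coordinates the comparison needs. The real `sort_spans_in_line` operates on the generic `Line<S>` and may be structured differently.

```rust
use std::cmp::Ordering;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LineDirection {
    Ltr,
    Rtl,
    Mixed,
}

/// Hypothetical span with only the fields this sketch needs.
#[derive(Debug, Clone, PartialEq)]
pub struct Span {
    pub text: String,
    pub x0: f64,
    pub x1: f64,
}

/// Simplified line: spans plus a direction.
#[derive(Debug, Clone)]
pub struct Line {
    pub spans: Vec<Span>,
    pub direction: LineDirection,
}

/// Compares two coordinates, treating any NaN comparison as a tie.
fn cmp_coord(a: f64, b: f64) -> Ordering {
    a.partial_cmp(&b).unwrap_or(Ordering::Equal)
}

/// `sort_by` is a stable sort, so tied spans keep their insertion order.
pub fn sort_spans_in_line(line: &mut Line) {
    match line.direction {
        // RTL: rightmost edge first.
        LineDirection::Rtl => line.spans.sort_by(|a, b| cmp_coord(b.x1, a.x1)),
        // LTR, and Mixed as a fallback: leftmost edge first.
        LineDirection::Ltr | LineDirection::Mixed => {
            line.spans.sort_by(|a, b| cmp_coord(a.x0, b.x0))
        }
    }
}
```

Mapping NaN to `Ordering::Equal` mirrors the behaviour described in the notes. `f64::total_cmp` is a standard-library alternative that gives NaN a defined position in the ordering instead of a tie.
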
## Test Results

```
Summary [ 0.040s] 44 tests run: 44 passed, 2409 skipped
```

All line module tests pass, including:
- 11 baseline computation tests
- 11 clustering algorithm tests
- 12 RTL direction detection tests
- 7 span sorting tests
- 3 block formation tests (including two-column)

## References

- Plan: Phase 4.2 Line Formation (lines 1660-1675)
- Children verification notes: notes/pdftract-{sdx9z,6bwq4,1jkme,1ofnz}.md