pdftract-1ofnz: RTL direction detection (unicode-bidi majority bidi class)
Summary
Implemented detect_line_direction(line_text) -> LineDirection function in crates/pdftract-core/src/layout/line.rs.
Implementation Details
Location: crates/pdftract-core/src/layout/line.rs:458-496
Algorithm:
- Walk each character in the text
- Count L (Left-to-Right) vs. R/AL (Right-to-Left / Arabic Letter) characters using unicode_bidi::bidi_class
- All other bidi classes (EN, ES, ET, AN, CS, NSM, BN, B, S, WS, ON, etc.) are ignored per INV
- Return:
  - LineDirection::Ltr if LTR count > RTL count OR both counts are zero (empty/neutral-only)
  - LineDirection::Rtl if RTL count > LTR count
  - LineDirection::Mixed if counts are equal (and both > 0)
Key design decision: Empty strings and neutral-only text (digits, punctuation) default to Ltr per bead acceptance criteria.
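The counting rule above can be sketched as follows. This is a minimal, std-only illustration and not the code in line.rs: the LineDirection enum is a local stand-in using the variant names from this note, and approx_strong_class is a hypothetical, coarse substitute for unicode_bidi::bidi_class that only recognizes Hebrew and Arabic letter ranges as RTL. Other RTL scripts would be misclassified by this sketch.

```rust
use std::cmp::Ordering;

/// Local stand-in for the crate's `LineDirection` (variant names from the notes).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LineDirection {
    Ltr,
    Rtl,
    Mixed,
}

/// Coarse grouping of bidi classes used only by this sketch.
enum StrongClass {
    Ltr,     // stands in for bidi class L
    Rtl,     // stands in for bidi classes R and AL
    Neutral, // everything else (digits, punctuation, whitespace, ...)
}

/// ASSUMPTION: hypothetical approximation of `unicode_bidi::bidi_class`.
/// Only Hebrew and Arabic letter ranges are treated as RTL; any other
/// alphabetic character is treated as LTR.
fn approx_strong_class(c: char) -> StrongClass {
    match c {
        // Hebrew letters (bidi class R)
        '\u{05D0}'..='\u{05EA}' => StrongClass::Rtl,
        // Arabic letters (bidi class AL)
        '\u{0620}'..='\u{064A}' | '\u{0671}'..='\u{06D3}' => StrongClass::Rtl,
        _ if c.is_alphabetic() => StrongClass::Ltr,
        _ => StrongClass::Neutral,
    }
}

/// Majority vote over strong characters; neutral characters never vote.
pub fn detect_line_direction(line_text: &str) -> LineDirection {
    let (mut ltr, mut rtl) = (0usize, 0usize);
    for c in line_text.chars() {
        match approx_strong_class(c) {
            StrongClass::Ltr => ltr += 1,
            StrongClass::Rtl => rtl += 1,
            StrongClass::Neutral => {}
        }
    }
    match ltr.cmp(&rtl) {
        Ordering::Greater => LineDirection::Ltr,
        Ordering::Less => LineDirection::Rtl,
        // Both counts zero: empty or neutral-only text defaults to Ltr.
        Ordering::Equal if ltr == 0 => LineDirection::Ltr,
        // Equal, non-zero counts: no majority.
        Ordering::Equal => LineDirection::Mixed,
    }
}
```

With this sketch, detect_line_direction("123 456") and detect_line_direction("") both fall into the zero-count branch and return Ltr, while a line with two Latin and two Hebrew letters returns Mixed.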
Acceptance Criteria Status
| Criterion | Status | Notes |
|---|---|---|
| "Hello, World!" -> Ltr | PASS | Test: test_detect_line_direction_latin_text |
| "مرحبا بالعالم" -> Rtl | PASS | Test: test_detect_line_direction_arabic_text |
| Mixed Latin+Arabic: Mixed or dominant | PASS | Tests: test_detect_line_direction_mixed_latin_arabic, test_detect_line_direction_latin_more_than_arabic, test_detect_line_direction_arabic_more_than_latin |
| "123 456" digits only: Ltr default | PASS | Test: test_detect_line_direction_digits_only |
| "" -> Ltr | PASS | Test: test_detect_line_direction_empty_string |
Additional Test Coverage
- test_detect_line_direction_punctuation_only: Punctuation-only text -> Ltr
- test_detect_line_direction_latin_dominant: Latin with punctuation/digits -> Ltr
- test_detect_line_direction_arabic_dominant: Arabic with digits -> Rtl
- test_detect_line_direction_hebrew_text: Hebrew text -> Rtl
- test_detect_line_direction_cyrillic_text: Cyrillic text -> Ltr
Tests Executed
cargo nextest run --package pdftract-core --lib 'layout::line::tests::test_detect_line_direction'
Result: 12/12 tests passed (all RTL direction detection tests)
Module tests: 44/44 tests passed (entire line module)
Code Changes
Files modified:
- crates/pdftract-core/src/layout/line.rs: Added detect_line_direction function with comprehensive documentation and tests
- crates/pdftract-core/src/layout/header_footer.rs: Fixed pre-existing compilation error (removed nonexistent reading_order_rank field from test helper)
Commit: 4ab89e1 feat(pdftract-1ofnz): implement detect_line_direction with unicode-bidi
INV Compliance
- Numerals (EN, AN) are not strong bidi types and do not drive direction
- Punctuation is neutral
- Empty lines default to Ltr
References
- Plan section: Phase 4.2 RTL detection (line 1668)