The indent trigger was using .abs(), which fired on both increased indent (non-indented → indented) AND decreased indent (indented → non-indented). This caused drop-cap style paragraphs (indented first line, flush-left continuation) to be incorrectly split into two blocks.

Per plan Phase 4.4 heuristic #2, an indent change should only trigger when the current line is MORE indented (to the right, larger x0) than the block average, i.e., a new paragraph starting after non-indented text. It should NOT trigger for decreased indent (first line indented, rest flush-left).

Fix: Remove .abs() and only check if line_x0 - block_avg_x0 > threshold.

Tests:
- test_indented_first_line_new_block: PASS (non-indented → indented splits)
- test_indented_first_line_of_paragraph_not_split: PASS (drop cap stays together)
- All 179 line module tests: PASS
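The signed comparison described above can be sketched as follows. This is a minimal illustration, not the actual pdftract code: the function names, the `f32` coordinate type, and the numeric values in the usage note are assumptions made for the example. Only the `line_x0 - block_avg_x0 > threshold` check and the removed `.abs()` come from the description.

```rust
/// Hypothetical helper: decides whether the indent heuristic starts a new block.
/// `line_x0` is the left edge of the current line, `block_avg_x0` is the average
/// left edge of the lines already in the block, and `threshold` is the minimum
/// rightward shift that counts as an indent.
fn indent_starts_new_block(line_x0: f32, block_avg_x0: f32, threshold: f32) -> bool {
    // Signed difference: positive only when the line sits to the right of the block.
    line_x0 - block_avg_x0 > threshold
}

/// The earlier behavior, kept here only for comparison: the absolute difference
/// fires on a shift in either direction.
fn indent_starts_new_block_abs(line_x0: f32, block_avg_x0: f32, threshold: f32) -> bool {
    (line_x0 - block_avg_x0).abs() > threshold
}
```

With illustrative values (threshold 5.0, flush-left text at x0 = 72.0, indented text at x0 = 90.0), a line moving from 72.0 to 90.0 triggers a split in both versions, while a line moving from 90.0 back to 72.0 triggers a split only in the `.abs()` version. The second case is the one that broke paragraphs with an indented first line.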
16 lines
614 B
Rust
// Quick test to understand serialization format
use pdftract_core::fingerprint::canonicalize::serialize_dict_canonical;
use pdftract_core::parser::object::{PdfDict, PdfObject};
use std::sync::Arc;

#[test]
fn debug_serialization() {
    // Insert keys out of alphabetical order to see how the serializer orders them.
    let mut dict = PdfDict::new();
    dict.insert(Arc::from("/Z"), PdfObject::Integer(3));
    dict.insert(Arc::from("/A"), PdfObject::Integer(1));
    dict.insert(Arc::from("/M"), PdfObject::Integer(2));

    // Print both a lossy string view and the raw bytes (run with --nocapture to see them).
    let bytes = serialize_dict_canonical(&dict);
    println!("serialize_dict_canonical output: {}", String::from_utf8_lossy(&bytes));
    println!("bytes: {:?}", bytes);
}