pdftract/crates/pdftract-core/tests
jedarden d0f52751ce fix(pdftract-39gey): fix indent trigger to not split drop-cap paragraphs
The indent trigger used .abs(), so it fired on both increased indent
(non-indented → indented) AND decreased indent (indented → non-indented).
As a result, drop-cap style paragraphs (indented first line, flush-left
continuation) were incorrectly split into two blocks.

Per plan Phase 4.4 heuristic #2, an indent change should trigger only when
the current line is MORE indented (to the right, larger x0) than the block
average, i.e., when a new paragraph starts after non-indented text. It should
NOT trigger for decreased indent (first line indented, rest flush-left).

Fix: Remove .abs() and only check if line_x0 - block_avg_x0 > threshold.
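
A minimal sketch of the before/after comparison described above. It is not the actual pdftract-core code: the function names, the f32 type, and the parameter layout are assumptions for illustration; only line_x0, block_avg_x0, and threshold come from the commit message.

```rust
/// Hypothetical pre-fix trigger: the absolute difference fires on
/// any indent change, in either direction.
pub fn indent_trigger_old(line_x0: f32, block_avg_x0: f32, threshold: f32) -> bool {
    (line_x0 - block_avg_x0).abs() > threshold
}

/// Hypothetical post-fix trigger: the signed difference fires only when
/// the line starts further to the right (larger x0) than the block average.
pub fn indent_trigger_new(line_x0: f32, block_avg_x0: f32, threshold: f32) -> bool {
    line_x0 - block_avg_x0 > threshold
}
```

With an assumed threshold of 5.0, a flush-left block (average x0 of 72.0) followed by an indented line (x0 of 90.0) triggers a split in both versions. An indented first line (block average 90.0) followed by a flush-left continuation (x0 of 72.0) triggers a split only in the old version, which is the drop-cap case this commit fixes.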

Tests:
- test_indented_first_line_new_block: PASS (non-indented → indented splits)
- test_indented_first_line_of_paragraph_not_split: PASS (drop cap stays together)
- All 179 line module tests: PASS
2026-06-07 13:43:19 -04:00
document_model fix(pyo3): correct extract_text_fn call in extract_markdown stub 2026-05-28 20:28:25 -04:00
fixtures feat(pdftract-2m3gl): implement PHP SDK with Packagist publishing 2026-06-01 10:27:03 -04:00
object_parser/fixtures test(pdftract-4fa9): object parser fixture corpus + proptest harness + critical-test suite 2026-06-01 17:30:29 -04:00
remote/fixtures fix(pdftract-39gey): fix indent trigger to not split drop-cap paragraphs 2026-06-07 13:43:19 -04:00
sdk-conformance/fixtures fix(pdftract-39gey): fix indent trigger to not split drop-cap paragraphs 2026-06-07 13:43:19 -04:00
acceptance_crit_verification.rs fix(pdftract-39gey): fix indent trigger to not split drop-cap paragraphs 2026-06-07 13:43:19 -04:00
cjk_encoding.rs fix(pdftract-39gey): fix indent trigger to not split drop-cap paragraphs 2026-06-07 13:43:19 -04:00
classifier_corpus.rs fix: resolve compilation errors across codebase 2026-05-25 08:38:04 -04:00
conformance.rs fix(bf-4mkhv): clean up unused imports in hash.rs 2026-06-01 09:43:48 -04:00
debug_content_streams.rs feat(pdftract-2m3gl): implement PHP SDK with Packagist publishing 2026-06-01 10:27:03 -04:00
debug_fingerprint.rs fix(pdftract-39gey): fix indent trigger to not split drop-cap paragraphs 2026-06-07 13:43:19 -04:00
debug_fingerprint_fixtures.rs fix(pdftract-39gey): fix indent trigger to not split drop-cap paragraphs 2026-06-07 13:43:19 -04:00
debug_page_parsing.rs fix(pdftract-39gey): fix indent trigger to not split drop-cap paragraphs 2026-06-07 13:43:19 -04:00
debug_serialization.rs fix(pdftract-39gey): fix indent trigger to not split drop-cap paragraphs 2026-06-07 13:43:19 -04:00
document_model.rs fix(pyo3): correct extract_text_fn call in extract_markdown stub 2026-05-28 20:28:25 -04:00
encoding_recovery.rs fix(pdftract-39gey): fix indent trigger to not split drop-cap paragraphs 2026-06-07 13:43:19 -04:00
encryption_aes_128_test.rs fix(pdftract-495uv): AES-128 test buffer allocation for PKCS#7 padding 2026-05-28 01:56:26 -04:00
encryption_aes_256_test.rs feat(pdftract-1z0qt): implement encryption detection + RC4/AES-128/AES-256 decryption 2026-05-28 03:22:36 -04:00
encryption_integration_tests.rs fix(pdftract-4pnmd): build.rs doc comment format string parsing 2026-05-28 14:36:45 -04:00
encryption_rc4_test.rs test(pdftract-4isj9): add RC4 encryption integration tests 2026-05-26 20:26:52 -04:00
error_recovery_integration.rs feat(pdftract-4li3d): implement security constraints for serve mode 2026-05-26 18:47:51 -04:00
fingerprint_debug_content_edit.rs fix(pdftract-39gey): fix indent trigger to not split drop-cap paragraphs 2026-06-07 13:43:19 -04:00
fingerprint_reproducibility.rs fix(pyo3): correct extract_text_fn call in extract_markdown stub 2026-05-28 20:28:25 -04:00
generate_document_model_golden.rs fix(pyo3): correct extract_text_fn call in extract_markdown stub 2026-05-28 20:28:25 -04:00
hint_stream_integration.rs fix(pyo3): correct extract_text_fn call in extract_markdown stub 2026-05-28 20:28:25 -04:00
http_range_integration.rs chore(pdftract-36glh): remove unused JpxDecoder import and add verification note 2026-05-28 05:23:13 -04:00
json_schema.rs docs(pdftract-3eohy): add rustdoc examples to Glyph and Span types 2026-06-01 01:16:24 -04:00
memory_guard.rs feat(bf-2ervu): implement mmap-backed PdfSource via memmap2 2026-05-24 08:40:11 -04:00
memory_guard_tests.rs feat(bf-2ervu): implement mmap-backed PdfSource via memmap2 2026-05-24 08:40:11 -04:00
object_parser.rs test(pdftract-4fa9): object parser fixture corpus + proptest harness + critical-test suite 2026-06-01 17:30:29 -04:00
object_parser_proptest.proptest-regressions test(pdftract-4fa9): object parser fixture corpus + proptest harness + critical-test suite 2026-06-01 17:30:29 -04:00
object_parser_proptest.rs test(pdftract-4fa9): object parser fixture corpus + proptest harness + critical-test suite 2026-06-01 17:30:29 -04:00
ocr_integration.rs fix: resolve compilation errors across codebase 2026-05-25 08:38:04 -04:00
page_classification.rs docs(pdftract-3eohy): add rustdoc examples to Glyph and Span types 2026-06-01 01:16:24 -04:00
remote_fetch_integration.rs wip: AcroForm improvements, debug tooling, test corpus, and fixture updates 2026-05-30 09:48:14 -04:00
remote_fetch_sequence.rs fix(bf-4mkhv): clean up unused imports in hash.rs 2026-06-01 09:43:48 -04:00
remote_forward_scan_disable.rs wip: AcroForm improvements, debug tooling, test corpus, and fixture updates 2026-05-30 09:48:14 -04:00
remote_http_source_tests.rs fix(pyo3): correct extract_text_fn call in extract_markdown stub 2026-05-28 20:28:25 -04:00
remote_integration.rs fix(pdftract-39gey): fix indent trigger to not split drop-cap paragraphs 2026-06-07 13:43:19 -04:00
remote_mock_server_tests.rs feat(pdftract-2m3gl): implement PHP SDK with Packagist publishing 2026-06-01 10:27:03 -04:00
remote_tls_tests.rs wip: intermediate state from previous work 2026-05-29 08:25:23 -04:00
schema_validate_fixtures.rs fix(pdftract-39gey): fix indent trigger to not split drop-cap paragraphs 2026-06-07 13:43:19 -04:00
stream_decoder_fixtures.rs fix(pdftract-25igv): fix emit! macro usage in codespace parser 2026-05-28 07:29:33 -04:00
struct_tree_coverage.rs feat(pdftract-2m3gl): implement PHP SDK with Packagist publishing 2026-06-01 10:27:03 -04:00
test_416_debug.rs fix(pdftract-39gey): fix indent trigger to not split drop-cap paragraphs 2026-06-07 13:43:19 -04:00
test_basic_extraction.rs fix(pdftract-39gey): fix indent trigger to not split drop-cap paragraphs 2026-06-07 13:43:19 -04:00
test_cycle_detection.rs feat(pdftract-2m3gl): implement PHP SDK with Packagist publishing 2026-06-01 10:27:03 -04:00
test_decoder_debug.rs wip: intermediate state from previous work 2026-05-29 08:25:23 -04:00
test_filter_array_debug.rs fix(pdftract-39gey): fix indent trigger to not split drop-cap paragraphs 2026-06-07 13:43:19 -04:00
test_fixture_read.rs fix(pdftract-39gey): fix indent trigger to not split drop-cap paragraphs 2026-06-07 13:43:19 -04:00
test_lzw_debug.rs feat(pdftract-91e1i): HTTP fetch sequence implementation 2026-05-28 13:17:00 -04:00
test_sdk_extraction_simple.rs fix(pdftract-39gey): fix indent trigger to not split drop-cap paragraphs 2026-06-07 13:43:19 -04:00
test_xref_debug.rs feat(pdftract-3s2i): implement Phase 5.5.2 validation filter 2026-05-24 04:57:17 -04:00
TH-01-stream-bomb.rs feat(pdftract-3h9xo): implement threads JSON output + schema integration 2026-05-25 13:40:15 -04:00
TH-03-mcp-no-auth.rs test(pdftract-5m3hp): implement TH-03 MCP no-auth bind security tests 2026-05-24 18:43:52 -04:00
TH-04-js-presence.rs feat(pdftract-4li3d): implement security constraints for serve mode 2026-05-26 18:47:51 -04:00
TH-07-ps-leak.rs fix: resolve compilation errors across codebase 2026-05-25 08:38:04 -04:00
TH-10-cache-poison.rs feat(pdftract-2okbq): implement TH-10 cache poisoning protection 2026-05-26 21:09:54 -04:00
th06_checksum_test.rs feat(pdftract-4li3d): implement security constraints for serve mode 2026-05-26 18:47:51 -04:00
th_05_ssrf_block.rs feat(pdftract-3s2i): implement Phase 5.5.2 validation filter 2026-05-24 04:57:17 -04:00
verify_proptest_catches_bugs.rs fix(pdftract-39gey): fix indent trigger to not split drop-cap paragraphs 2026-06-07 13:43:19 -04:00
xref_helpers.rs feat(bf-2ervu): implement mmap-backed PdfSource via memmap2 2026-05-24 08:40:11 -04:00
xref_integration_test.rs feat(bf-2ervu): implement mmap-backed PdfSource via memmap2 2026-05-24 08:40:11 -04:00