pdftract/scripts/find_pub_items_without_examples.sh
jedarden d0f52751ce fix(pdftract-39gey): fix indent trigger to not split drop-cap paragraphs
The indent trigger was using .abs() which fired on both increased indent
(non-indented → indented) AND decreased indent (indented → non-indented).
This caused drop-cap style paragraphs (indented first line, flush-left
continuation) to incorrectly split into two blocks.

Per plan Phase 4.4 heuristic #2, indent change should only trigger when the
current line is MORE indented (to the right, larger x0) than the block
average - i.e., a new paragraph starting after non-indented text. It should
NOT trigger for decreased indent (first line indented, rest flush-left).

Fix: Remove .abs() and only check if line_x0 - block_avg_x0 > threshold.
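
The signed comparison described in the fix can be sketched as follows. This is a minimal illustration, not the actual pdftract code: the function name `indent_triggers_split`, the `f32` types, and the sample coordinates are assumptions; only `line_x0`, `block_avg_x0`, and `threshold` come from the message above.

```rust
/// Hypothetical helper (name and types are assumptions for illustration).
/// Returns true when a line should start a new block because it is
/// indented further right than the block average by more than `threshold`.
fn indent_triggers_split(line_x0: f32, block_avg_x0: f32, threshold: f32) -> bool {
    // Signed difference: only a shift to the right (larger x0) can fire.
    // With .abs() here, a shift to the left would fire as well.
    line_x0 - block_avg_x0 > threshold
}

fn main() {
    // Non-indented block (average x0 = 72) followed by an indented line: split.
    assert!(indent_triggers_split(90.0, 72.0, 10.0));

    // Indented first line (block average x0 = 90) followed by a flush-left
    // continuation at 72: no split, so the paragraph stays together.
    assert!(!indent_triggers_split(72.0, 90.0, 10.0));

    // The absolute difference exceeds the threshold in the second case,
    // which is why the .abs() version split such paragraphs.
    assert!((72.0_f32 - 90.0).abs() > 10.0);
}
```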

Tests:
- test_indented_first_line_new_block: PASS (non-indented → indented splits)
- test_indented_first_line_of_paragraph_not_split: PASS (drop cap stays together)
- All 179 line module tests: PASS
2026-06-07 13:43:19 -04:00

#!/bin/bash
# Find public items in pdftract-core that lack examples
cd crates/pdftract-core/src || exit 1

# Read file names line by line so paths containing spaces survive
find . -name "*.rs" | sort | while IFS= read -r file; do
    echo "=== $file ==="
    # Find pub items and check the doc block directly above each one
    awk '
        BEGIN { in_doc = 0; has_example = 0 }

        # Track doc blocks and note whether they contain a fenced example
        /^\/\/\// || /^\/\/!/ {
            in_doc = 1
            if ($0 ~ /```rust/ || $0 ~ /```no_run/ || $0 ~ /```ignore/) {
                has_example = 1
            }
            next
        }

        # Attributes such as #[derive(...)] may sit between docs and item
        /^#\[/ { next }

        # Public item: report it if the doc block above it has no example
        /^pub (fn|struct|enum|trait|type|const|mod) / {
            # Extract item name, dropping everything after the identifier
            item_name = $3
            sub(/[^A-Za-z0-9_].*/, "", item_name)
            if (in_doc && !has_example) {
                print "NO_EXAMPLE: " item_name " (line " NR ")"
            }
            in_doc = 0
            has_example = 0
            next
        }

        # Any other line (blank, code, plain comment) ends the doc block
        {
            in_doc = 0
            has_example = 0
        }
    ' "$file"
done
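
For reference, here is a small Rust sketch of the two cases the script distinguishes. The functions `add` and `sub` are hypothetical and are not taken from pdftract-core. The first has a fenced example in its doc block, so the check is satisfied. The second is documented but has no fenced example, which is the kind of item the script is meant to report.

```rust
/// Adds two numbers.
///
/// ```rust
/// assert_eq!(add(2, 3), 5);
/// ```
pub fn add(a: i32, b: i32) -> i32 {
    a + b
}

/// Subtracts `b` from `a`.
pub fn sub(a: i32, b: i32) -> i32 {
    a - b
}

fn main() {
    // Exercise both hypothetical functions so the sketch runs on its own.
    assert_eq!(add(2, 3), 5);
    assert_eq!(sub(5, 3), 2);
}
```

The script only looks at items that start at column zero with `pub`, so methods inside `impl` blocks are outside its scope.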