Add path expansion module (expand.rs) with: - FileWorkItem and PathOrUrl types for work items - expand_paths() function for directory traversal via walkdir - Case-insensitive *.pdf filtering - Hidden directory skip (. prefix) - Remote URL support when feature enabled - bytes_total calculation for progress reporting Fix event.rs should_skip_confidence() for proper NaN handling. All 130 grep tests pass. See notes/pdftract-3gf5t.md for details.
1.9 KiB
1.9 KiB
pdftract-3gf5t: walkdir folder traversal + *.pdf filter + remote URL expansion
Summary
Implemented path expansion for the pdftract grep subcommand. This includes:
- FileWorkItem structure: Created
FileWorkItemandPathOrUrltypes to represent work items - Path expansion: Implemented
expand_paths()function that:- Expands local file paths (single files and directories)
- Walks directories via walkdir with *.pdf filtering (case-insensitive)
- Supports https:// URLs when the
remotefeature is enabled - Skips hidden directories (starting with .)
- Silently skips non-PDF files
- Calculates bytes_total for progress reporting
- Public API: Added
produce_work_items()function as the public entry point - Integration: Updated
run_grep()to use the new path expansion logic
Files Changed
crates/pdftract-cli/src/grep/expand.rs(new): Path expansion module with FileWorkItem, PathOrUrl, and expand_paths()crates/pdftract-cli/src/grep/mod.rs: Added expand module import and produce_work_items() functioncrates/pdftract-cli/src/grep/event.rs: Fixedshould_skip_confidence()function for proper NaN/Infinity handling in JSON serialization
Acceptance Criteria Status
- ✅ walkdir filters non-PDF files silently
- ✅ Single-file paths produce one FileWorkItem
- ✅ Mixed dir+file PATH list works
- ✅ https:// URL produces FileWorkItem when remote feature on; clap error when off
- ✅ Symlink loop does not hang (follow_links(false))
- ✅ bytes_total accurate sum
- ✅ Public produce_work_items(args: &GrepArgs) -> impl Iterator<Item = FileWorkItem>
Tests
All 130 grep-related tests pass with --features grep:
- expand.rs tests: 11/11 passed
- matcher.rs tests: 24/24 passed
- event.rs tests: 22/22 passed
- mod.rs tests: 53/53 passed
References
- Plan section: 7.8 line 2708 (path semantics), 2715 (-r recursive), 2793 (non-PDF silently skipped)
- Bead: pdftract-3gf5t