Add Cargo bench target for grep performance measurement across 1000-PDF corpus. Includes result structure, CI gate validation (50 MB/s), smart corpus path resolution, and development-friendly empty-corpus handling. Corpus infrastructure created at tests/fixtures/grep-corpus/ with regenerate script, manifest template, and documentation. Benchmark ready to wire to actual grep implementation once 7.8.3-7.8.8 sub-tasks complete. Closes: pdftract-5bzpg Files: - crates/pdftract-cli/Cargo.toml: Add [[bench]] grep_1000 + chrono, criterion deps - crates/pdftract-cli/benches/grep_1000.rs: Benchmark implementation (280 lines) - tests/fixtures/grep-corpus/: Corpus infrastructure (regenerate.sh, manifest, README) - notes/pdftract-5bzpg.md: Verification note with acceptance criteria status Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
291 B
291 B
| 1 | # grep-corpus manifest |
|---|---|
| 2 | # Format: filename,size_bytes,expected_matches_for_pattern_the |
| 3 | # |
| 4 | # This file documents the expected properties of each PDF in the corpus. |
| 5 | # Used by the benchmark to validate correctness. |
| 6 | # |
| 7 | # TODO: Populate with actual corpus data (blocks on 7.8.x grep implementation) |