jedarden
e2c1e2817b
feat(pdftract-2i6rt): implement cache CLI subcommand and HTTP integration
This commit implements Phase 6.9.6: surfacing the cache as user-visible
CLI and HTTP affordances.
## Changes
- Add `pdftract cache` subcommand with stats/clear/purge actions
  - `stats DIR`: show entry count, size, hit ratio, age distribution
  - `stats DIR --json`: emit JSON with same fields
  - `clear DIR`: delete all entries (preserves index.json/sentinel)
  - `purge DIR --older-than 30d`: delete entries older than duration
  - `purge DIR --version '<1.0.0'`: version constraint purge (stub)
- Add global flags to extract-style subcommands
  - `--cache-dir DIR`: enable cache at directory
  - `--cache-size SIZE`: set LRU size limit (default 1 GiB)
  - `--no-cache`: disable cache for this call
- Add `X-Pdftract-Cache: hit|miss|skipped` HTTP header on /extract endpoints
  - Set in response headers before body streaming
- Add JSON metadata fields
  - `metadata.cache_status`: "hit" | "miss" | "skipped"
  - `metadata.cache_age_seconds`: integer seconds (present only on hit)
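
The header and the two metadata fields above carry the same three-state value, so one way to keep them in sync is a single status type. The sketch below is illustrative only: `CacheStatus`, `as_str`, and `age_seconds` are hypothetical names, not the types actually defined in pdftract-core.

```rust
use std::time::Duration;

/// Hypothetical status type; the real definition in pdftract-core may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CacheStatus {
    /// Entry was served from the cache; `age` is the time since it was stored.
    Hit { age: Duration },
    /// Cache was enabled but held no entry for this input.
    Miss,
    /// Cache was bypassed for this call (for example via `--no-cache`).
    Skipped,
}

impl CacheStatus {
    /// String used for both `X-Pdftract-Cache` and `metadata.cache_status`.
    pub fn as_str(&self) -> &'static str {
        match self {
            CacheStatus::Hit { .. } => "hit",
            CacheStatus::Miss => "miss",
            CacheStatus::Skipped => "skipped",
        }
    }

    /// Value for `metadata.cache_age_seconds`; `None` unless the status is a hit.
    pub fn age_seconds(&self) -> Option<u64> {
        match self {
            CacheStatus::Hit { age } => Some(age.as_secs()),
            _ => None,
        }
    }
}
```

Returning `Option<u64>` from `age_seconds` mirrors the "present only on hit" rule: a serializer can omit the field when the value is `None`.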
## Acceptance Criteria
- ✅ `pdftract cache stats` on empty dir: "Entries: 0"
- ✅ `pdftract cache stats` on populated dir: correct counts and ratios
- ✅ `pdftract cache clear -y`: deletes entries, preserves index/sentinel
- ✅ `pdftract cache purge --older-than`: deletes old entries
- ✅ `extract --cache-dir`: `metadata.cache_status` populated
- ✅ `extract` second run: `cache_status` "hit" with age
- ✅ `extract --no-cache`: `cache_status` "skipped"
- ✅ HTTP serve: `X-Pdftract-Cache` header present
- ✅ `--cache-size` parsing: 4GiB → 4 * 1024^3 bytes
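
The `--cache-size` criterion above relies on binary (1024-based) units. A minimal sketch of such a parser follows, assuming integer values and the suffixes `B`, `KiB`, `MiB`, `GiB`, and `TiB`. The function name `parse_cache_size` and the accepted suffix set are assumptions, not the parser shipped in this commit.

```rust
/// Parse a human-readable size such as "4GiB" or "512MiB" into bytes.
/// Hypothetical helper: the name and accepted suffixes are assumptions.
pub fn parse_cache_size(input: &str) -> Result<u64, String> {
    let s = input.trim();
    // Split at the first non-digit character: "4GiB" -> ("4", "GiB").
    let split = s.find(|c: char| !c.is_ascii_digit()).unwrap_or(s.len());
    let (digits, suffix) = s.split_at(split);
    let value: u64 = digits
        .parse()
        .map_err(|_| format!("invalid size number in {input:?}"))?;
    // Binary multipliers: each step is a factor of 1024.
    let multiplier: u64 = match suffix.trim() {
        "" | "B" => 1,
        "KiB" => 1 << 10,
        "MiB" => 1 << 20,
        "GiB" => 1 << 30,
        "TiB" => 1 << 40,
        other => return Err(format!("unknown size suffix {other:?}")),
    };
    // Reject values that would not fit in a u64 instead of wrapping.
    value
        .checked_mul(multiplier)
        .ok_or_else(|| format!("size {input:?} overflows u64"))
}
```

With this sketch, `parse_cache_size("4GiB")` yields `4 * 1024 * 1024 * 1024` bytes, and an unrecognized suffix is reported as an error rather than silently treated as bytes.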
## Modules
- crates/pdftract-cli/src/cache_cmd.rs: subcommand implementation
- crates/pdftract-cli/src/serve.rs: HTTP handler integration
- crates/pdftract-cli/src/main.rs: CLI flag definitions
- crates/pdftract-core/src/cache/mod.rs: extract_with_cache() integration
- crates/pdftract-core/src/extract.rs: cache_status metadata fields
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-23 06:33:43 -04:00