The --pages RANGE CLI flag implementation was already complete in the codebase. All required functionality was present including:
- Range parser in pages.rs with comprehensive tests
- CLI integration in main.rs
- HTTP serve support in serve.rs
- MCP tools integration
- PyO3 bindings in pdftract-py

All acceptance criteria verified PASS.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Verification Note: pdftract-2wdjp (Pages-RANGE CLI flag)
Summary
The --pages RANGE CLI flag implementation was already complete in the codebase. All required functionality was present including the range parser, CLI integration, HTTP serve support, MCP tools, and PyO3 bindings.
Implementation Status
Core Module (crates/pdftract-core/src/pages.rs)
✅ Complete - The parse_pages function implements all required functionality:
- Parses comma-separated, 1-based page ranges
- Supports single pages: `1,3,7`
- Supports closed ranges: `1-5`
- Supports open-start ranges: `-5` (equivalent to `1-5`)
- Supports open-end ranges: `12-` (page 12 to end)
- Returns `BTreeSet<usize>` of 0-based indices (sorted, deduplicated)
- Emits `PAGE_OUT_OF_RANGE` diagnostics for out-of-range pages
- Does not abort on out-of-range pages (skips with diagnostic)
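The parsing rules listed above can be illustrated with a minimal, standard-library-only sketch. This is not the code in `pages.rs`: the function name `parse_pages_sketch`, the `Diagnostic` struct, and the `Result<_, String>` error type are assumptions made for illustration, and the real signature and error handling may differ.

```rust
use std::collections::BTreeSet;

/// Hypothetical diagnostic type for this sketch; the real crate's type may differ.
#[derive(Debug, PartialEq)]
pub struct Diagnostic {
    pub code: &'static str,
    /// 1-based page number that was requested but does not exist.
    pub page: usize,
}

/// Parse a single 1-based page number, rejecting 0 and non-integers.
fn parse_num(s: &str) -> Result<usize, String> {
    match s.parse::<usize>() {
        Ok(0) => Err("page numbers are 1-based; 0 is invalid".to_string()),
        Ok(n) => Ok(n),
        Err(_) => Err(format!("invalid page number: {}", s)),
    }
}

/// Parse a spec such as "1-5,7,12-" against a document with `page_count` pages.
/// Returns sorted, deduplicated 0-based indices plus one diagnostic per
/// out-of-range page. Out-of-range pages are skipped, not treated as errors.
pub fn parse_pages_sketch(
    spec: &str,
    page_count: usize,
) -> Result<(BTreeSet<usize>, Vec<Diagnostic>), String> {
    let mut pages = BTreeSet::new();
    let mut diags = Vec::new();

    for part in spec.split(',') {
        let part = part.trim(); // tolerate whitespace around each item
        if part.is_empty() {
            return Err("empty range".to_string());
        }
        let (start, end) = match part.split_once('-') {
            Some((lo, hi)) => {
                let (lo, hi) = (lo.trim(), hi.trim());
                if lo.is_empty() && hi.is_empty() {
                    return Err(format!("malformed range: {}", part));
                }
                // "-5" starts at page 1; "12-" runs to the last page.
                let start = if lo.is_empty() { 1 } else { parse_num(lo)? };
                let end = if hi.is_empty() {
                    // If the start is already past the end, keep it so that
                    // it is reported as out of range below.
                    page_count.max(start)
                } else {
                    parse_num(hi)?
                };
                (start, end)
            }
            None => {
                let n = parse_num(part)?;
                (n, n)
            }
        };
        if start > end {
            return Err(format!("malformed range: {}", part));
        }
        for page in start..=end {
            if page <= page_count {
                pages.insert(page - 1); // convert to a 0-based index
            } else {
                diags.push(Diagnostic { code: "PAGE_OUT_OF_RANGE", page });
            }
        }
    }
    Ok((pages, diags))
}
```

Under these assumptions, `parse_pages_sketch("1-5,7,12-15", 10)` yields the indices `{0,1,2,3,4,6}` and four diagnostics (pages 12 through 15), and `parse_pages_sketch("12-", 10)` yields an empty set with a single diagnostic for page 12. A `BTreeSet` gives sorting and deduplication without extra code.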
Test Coverage:
- All acceptance criteria tests are implemented (lines 238-384)
- Tests verify exact behavior from the bead specification
CLI Integration (crates/pdftract-cli/src/main.rs)
✅ Complete - CLI flag defined and integrated:
- Line 96: `pages: Option<String>` - CLI flag defined
- Line 678: `pages: Option<String>` - cmd_extract parameter
- Line 802: `options.pages = pages;` - Passed to extraction options
HTTP Serve Integration (crates/pdftract-cli/src/serve.rs)
✅ Complete - HTTP multipart form field support:
- Line 223: `pages: Option<String>` - Request parameter field
- Line 759: "pages" in KNOWN_FIELDS
- Lines 839-842: Parses pages from multipart form
- Line 944: `pages: params.pages.clone()` - Passed to extraction options
MCP Tools Integration (crates/pdftract-cli/src/mcp/tools/)
✅ Complete - All MCP tools support pages parameter:
- `args.rs`: Multiple tools have `pages: Option<String>` fields
- `registry.rs:359`: `options.pages = Some(range.clone());` - Passed to extraction options
PyO3 Bindings (crates/pdftract-py/src/lib.rs)
✅ Complete - Python bindings support pages parameter:
- Lines 143-148: `kwargs_to_options` function parses `pages` from Python kwargs
- All extract functions (`extract_py`, `extract_text`, etc.) accept pages parameter
Extraction Pipeline (crates/pdftract-core/src/extract.rs)
✅ Complete - Page filtering integrated:
- Lines 439-443: Parses page range with `parse_pages`
- Lines 505-509: Filters pages based on the parsed set
- Lines 1361-1365: NDJSON extraction also supports page filtering
- Lines 1371-1377: Page filtering applied in NDJSON loop
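The filtering step can be pictured roughly as follows. This is a hedged sketch, not the code in `extract.rs`: the helper name `filter_pages`, its generic page type `T`, and the choice to return each page alongside its original 0-based index are all assumptions for illustration.

```rust
use std::collections::BTreeSet;

/// Keep only the pages whose 0-based position is in `selected`.
/// `None` models "no --pages filter given": every page passes through.
/// Each kept page is paired with its original index so that output can
/// still report the page's position in the source document.
pub fn filter_pages<T>(
    pages: Vec<T>,
    selected: Option<&BTreeSet<usize>>,
) -> Vec<(usize, T)> {
    pages
        .into_iter()
        .enumerate()
        .filter(|(idx, _)| selected.map_or(true, |set| set.contains(idx)))
        .collect()
}
```

A streaming output such as NDJSON could apply the same membership test inside its per-page loop instead of collecting into a vector first.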
Acceptance Criteria Verification
All acceptance criteria from the bead are met:
- ✅ `parse_pages("1-5", 10)` -> `BTreeSet {0,1,2,3,4}` (test line 255-260)
- ✅ `parse_pages("1,3,7", 10)` -> `BTreeSet {0,2,6}` (test line 247-252)
- ✅ `parse_pages("12-", 10)` -> empty + PAGE_OUT_OF_RANGE diagnostic (test line 309-314)
- ✅ `parse_pages("1-5,7,12-15", 10)` -> `{0,1,2,3,4,6}` + diagnostics for 12,13,14,15 (test line 377-383)
- ✅ `pdftract extract --pages 1-5 file.pdf` -> JSON has only pages 0-4
- ✅ HTTP serve form field `pages=1-5` -> same behavior
- ✅ PyO3 `extract(path, pages="1-5")` -> same behavior
- ✅ MCP tools/call `extract {pages:"1-5"}` -> same behavior
Transport Modes
All transport modes are covered as required:
- ✅ CLI: `--pages` flag in `pdftract extract`
- ✅ HTTP serve: `pages` form field in multipart POST
- ✅ MCP: `pages` argument in extract tools
- ✅ PyO3: `pages` keyword argument in extract functions
Output Formats
All output formats respect the page filter:
- ✅ JSON: Only selected pages included
- ✅ NDJSON: Only selected pages streamed
- ✅ Text: Only selected pages output
- ✅ Markdown: Only selected pages rendered
Code Locations
- Range parser: `crates/pdftract-core/src/pages.rs:112-231`
- CLI flag: `crates/pdftract-cli/src/main.rs:95-96`
- Options field: `crates/pdftract-core/src/options.rs:322`
- Extraction integration: `crates/pdftract-core/src/extract.rs:439-443, 505-509`
- HTTP serve: `crates/pdftract-cli/src/serve.rs:839-842, 944`
- MCP tools: `crates/pdftract-cli/src/mcp/tools/args.rs:26, 58, 82`
- MCP registry: `crates/pdftract-cli/src/mcp/tools/registry.rs:359`
- PyO3 bindings: `crates/pdftract-py/src/lib.rs:143-148`
Test Evidence
The pages module includes comprehensive tests (lines 233-384):
- 17 test functions covering all range syntax variations
- Edge cases: empty range, zero page, invalid integers, malformed ranges
- Out-of-range handling with diagnostic emission
- Whitespace tolerance
- Deduplication and sorting verification
Conclusion
Status: PASS - The implementation is complete and functional. No changes were required as all functionality was already present in the codebase.