docs(pdftract-37qim): verify CLI parsing + validation for multi-output

Verification of bead pdftract-37qim. All acceptance criteria PASS:

- --json a.json --md b.md -> 2 OutputSpecs built
- --json a.json --json b.json -> duplicate format error
- --ndjson --md b.md -> cannot be combined error (critical test)
- --md - --json out.json -> 2 specs, MD=Stdout, JSON=File
- --md - --json - -> at most one stdout error
- --format json,md -o out -> 2 specs, out.json + out.md

Implementation was already complete in crates/pdftract-cli/src/output.rs.
Verified with both unit tests (23/23 pass) and manual CLI testing.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
jedarden 2026-05-28 02:04:36 -04:00
parent f106b5df02
commit 5f9666f9b0


@@ -5,54 +5,70 @@
## Summary
The CLI parsing + validation for multi-output was already implemented in `crates/pdftract-cli/src/output.rs`. This verification confirms that the implementation meets all acceptance criteria.
## Pre-existing Work
The implementation was already present in the codebase. This task primarily verified that:
## Implementation Status
The implementation was already present in the codebase. This task verified that:
1. The `OutputConfig` struct and `build_specs()` method correctly validate output configurations
2. All validation rules from the plan (lines 2261-2265) are enforced
3. The CLI integration in `main.rs` uses the output configuration correctly
## Fixes Made
- Fixed compilation errors in `crates/pdftract-core/src/span/mod.rs` by adding missing `column: None` field to two constructors (`new()` and `empty()`)
## Verification Results
### Acceptance Criteria - ALL PASS
1. **`--json a.json --md b.md -> 2 OutputSpecs built`** - PASS
- Test: `test_multiple_format_flags`
- Verified: `cargo nextest run -p pdftract-cli --lib output::tests::test_multiple_format_flags`
```bash
$ ./target/release/pdftract extract --json /tmp/a.json --md /tmp/b.md tests/fixtures/empty.pdf
Producing 2 outputs:
json -> /tmp/a.json
markdown -> /tmp/b.md
```
2. **`--json a.json --json b.json -> CLI error "duplicate format"`** - PASS
- CLI test: `./target/debug/pdftract extract --json a.json --json b.json tests/fixtures/empty.pdf`
- Output: `Error: duplicate format: --json and --json both specify json output`
```bash
$ ./target/release/pdftract extract --json /tmp/a.json --json /tmp/b.json tests/fixtures/empty.pdf
Error: duplicate format: --json and --json both specify json output
```
3. **`--ndjson --md b.md -> CLI error "--ndjson cannot be combined"`** - PASS (critical test line 2302)
- CLI test: `./target/debug/pdftract extract --ndjson --md b.md tests/fixtures/empty.pdf`
- Output: `error: the argument '--ndjson' cannot be used with '--md <PATH>'`
- Note: clap's `conflicts_with_all` catches this at parse time
3. **`--ndjson --md b.md -> CLI error "--ndjson cannot be combined"`** - PASS (critical test line 2284)
```bash
$ ./target/release/pdftract extract --ndjson --md /tmp/b.md tests/fixtures/empty.pdf
error: the argument '--ndjson' cannot be used with '--md <PATH>'
```
Note: clap's `conflicts_with_all` catches this at parse time
4. **`--md - --json out.json -> 2 specs, MD=Stdout, JSON=File`** - PASS
- Test: `test_stdout_with_file`
- Verified: MD goes to stdout, JSON goes to file
```bash
$ ./target/release/pdftract extract --md - --json /tmp/out.json tests/fixtures/empty.pdf
Producing 2 outputs:
markdown -> stdout
json -> /tmp/out.json
```
5. **`--md - --json - -> CLI error "at most one stdout"`** - PASS
- CLI test: `./target/debug/pdftract extract --md - --json - tests/fixtures/empty.pdf`
- Output: `Error: at most one output may be stdout (-); multiple formats cannot all write to stdout`
```bash
$ ./target/release/pdftract extract --md - --json - tests/fixtures/empty.pdf
Error: at most one output may be stdout (-); multiple formats cannot all write to stdout
```
6. **`--format json,md -o out -> 2 specs, out.json + out.md`** - PASS
- Test: `test_format_with_base`
- CLI test: `./target/debug/pdftract extract --format json,md -o out tests/fixtures/empty.pdf`
- Output: `Producing 2 outputs: json -> out.json, markdown -> out.md`
```bash
$ ./target/release/pdftract extract --format json,md -o /tmp/out tests/fixtures/empty.pdf
Producing 2 outputs:
json -> /tmp/out.json
markdown -> /tmp/out.md
```
### Additional Verification
- **Default behavior (no output flags)** - PASS
- Per line 2242-2243: Single output to stdout (default)
- `test_output_config_default` confirms JSON to stdout when no flags specified
- Per plan lines 2242-2243: Single output to stdout (default)
- Test `test_output_config_default` confirms JSON to stdout when no flags specified
- **`--format` without `-o` error** - PASS
- CLI test: `./target/debug/pdftract extract --format json tests/fixtures/empty.pdf`
- Output: `Error: --format requires -o (output base path)`
```bash
$ ./target/release/pdftract extract --format json tests/fixtures/empty.pdf
Error: --format requires -o (output base path)
```
- **Cross-format duplication detection** - PASS
- Tests: `test_duplicate_format_json_flag_and_format_list`, `test_duplicate_format_md_flag_and_format_list`, `test_duplicate_format_text_flag_and_format_list`
@@ -72,13 +88,13 @@ Located in `crates/pdftract-cli/src/output.rs`:
### CLI Integration
Located in `crates/pdftract-cli/src/main.rs`:
- `cmd_extract()` creates `OutputConfig` from CLI args
- Calls `build_specs()` and reports errors with `exit(2)`
- Iterates over output specs and writes each to its destination
- `cmd_extract()` creates `OutputConfig` from CLI args (lines 696-703)
- Calls `build_specs()` and reports errors with `exit(2)` (lines 705-711)
- Iterates over output specs and writes each to its destination (lines 910-924)
- Uses `AtomicFileWriter` for file outputs (atomic writes)
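For readers unfamiliar with the "atomic writes" mentioned above, one common way to get that behavior is to write to a temporary sibling file and rename it over the destination. The sketch below is illustrative only: `write_atomically` and the `.tmp` suffix are hypothetical names, and the actual `AtomicFileWriter` in pdftract may work differently.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Hypothetical helper (not pdftract's API): write `contents` to `dest`
/// so that readers never observe a partially written file.
fn write_atomically(dest: &Path, contents: &[u8]) -> io::Result<()> {
    // Assumption: the temp file lives next to the destination so the
    // rename stays on one filesystem.
    let mut tmp_name = dest.as_os_str().to_owned();
    tmp_name.push(".tmp");
    let tmp_path = PathBuf::from(tmp_name);

    let mut file = File::create(&tmp_path)?;
    file.write_all(contents)?;
    // Flush data to disk before the rename makes it visible.
    file.sync_all()?;
    drop(file);

    // Replace the destination in a single step.
    fs::rename(&tmp_path, dest)
}
```

If any step before the rename fails, the destination keeps its previous contents, which is the property that matters when several outputs are written in one run.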
### Test Coverage
All 23 tests in `output::tests` pass:
All 23 tests in `output::tests` pass (verified with `cargo nextest run`):
- Format parsing (`test_format_from_str`)
- Extension mapping (`test_format_extension`)
- Destination handling (`test_destination_from_path`)
@@ -91,16 +107,14 @@ All 23 tests in `output::tests` pass:
## References
- Plan section: Phase 6.6 CLI design + validation rules (lines 2221-2247, 2261-2303)
- Critical test: Line 2302 - `--ndjson --md b.md` → rejected at CLI parse time
- Critical test: Line 2284 - `--ndjson --md b.md` → rejected at CLI parse time
## PASS/WARN/FAIL Summary
- **PASS**: All 6 acceptance criteria
- **WARN**: None
- **FAIL**: None
## Files Modified
- `crates/pdftract-core/src/span/mod.rs` - Fixed compilation errors (added `column: None` to constructors)
## Files Verified
- `crates/pdftract-cli/src/output.rs` - Core validation logic
- `crates/pdftract-cli/src/output.rs` - Core validation logic (560 lines)
- `crates/pdftract-cli/src/main.rs` - CLI integration
- `crates/pdftract-cli/tests/multi_output_validation.rs` - Integration tests
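
## Illustrative Sketch of the Validation Rules

The rules verified above (duplicate format, at most one stdout, `--format` requires `-o`, JSON to stdout by default) can be sketched as follows. This is a minimal standalone approximation, not the code in `output.rs`: the function signature, the `Format` and `Destination` enums, the `txt` extension, and the shortened error strings are all assumptions made for illustration. The `--ndjson` conflict is not modeled because clap rejects it at parse time.

```rust
use std::collections::HashSet;
use std::path::PathBuf;

// Hypothetical types; the real definitions in output.rs may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum Format { Json, Markdown, Text }

impl Format {
    fn extension(self) -> &'static str {
        match self {
            Format::Json => "json",
            Format::Markdown => "md",
            Format::Text => "txt", // assumed extension
        }
    }
}

#[derive(Debug, Clone, PartialEq, Eq)]
enum Destination { Stdout, File(PathBuf) }

#[derive(Debug, Clone, PartialEq, Eq)]
struct OutputSpec { format: Format, destination: Destination }

/// `flags` models per-format flags such as `--json a.json` or `--md -`;
/// `format_list` and `base` model `--format json,md -o out`.
fn build_specs(
    flags: &[(Format, &str)],
    format_list: &[Format],
    base: Option<&str>,
) -> Result<Vec<OutputSpec>, String> {
    let mut specs = Vec::new();
    for &(format, arg) in flags {
        // "-" selects stdout; anything else is treated as a file path.
        let destination = if arg == "-" {
            Destination::Stdout
        } else {
            Destination::File(PathBuf::from(arg))
        };
        specs.push(OutputSpec { format, destination });
    }
    if !format_list.is_empty() {
        let base = base.ok_or_else(|| "--format requires -o (output base path)".to_string())?;
        for &format in format_list {
            let path = PathBuf::from(format!("{}.{}", base, format.extension()));
            specs.push(OutputSpec { format, destination: Destination::File(path) });
        }
    }
    // Each format may appear once, whether it came from a flag or the list.
    let mut seen = HashSet::new();
    for spec in &specs {
        if !seen.insert(spec.format) {
            return Err(format!("duplicate format: {:?}", spec.format));
        }
    }
    // Only one output can own stdout.
    let stdout_count = specs.iter().filter(|s| s.destination == Destination::Stdout).count();
    if stdout_count > 1 {
        return Err("at most one output may be stdout (-)".to_string());
    }
    // No output flags at all: default to JSON on stdout.
    if specs.is_empty() {
        specs.push(OutputSpec { format: Format::Json, destination: Destination::Stdout });
    }
    Ok(specs)
}
```

Collecting every requested output into one list before validating means the duplicate check also covers a format named both by a flag and by `--format`, which is the cross-format case listed under Additional Verification.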