Commit graph

2 commits

Author SHA1 Message Date
jedarden
e6bf3dd290 feat(pdftract-3s2i): implement Phase 5.5.2 validation filter
Implement a per-word validation filter for the assisted-OCR BrokenVector path.

Changes:
- Add SpanSource::OcrAssisted variant to hybrid.rs
- Add Span::ocr_assisted() helper method
- Implement validate_ocr_with_position_hints() in ocr.rs
  - 5pt distance threshold for position validation
  - 0.4 confidence cap for rejected words
  - Linear scan for nearest-neighbor lookup
- Add unit tests for validation filter
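
The filter described in the list above can be sketched as follows. This is a minimal illustration, not the code in ocr.rs: the types OcrWord and PositionHint, the function names, and the choice to treat a word with no hints at all as rejected are assumptions made for the example. Only the 5pt threshold, the 0.4 confidence cap, and the linear nearest-neighbor scan come from the commit message.

```rust
// Hypothetical stand-in types; the real definitions in ocr.rs / hybrid.rs
// are not shown in the commit message.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
struct OcrWord {
    text: String,
    x: f32,
    y: f32,
    confidence: f32,
}

#[derive(Debug, Clone, Copy)]
struct PositionHint {
    x: f32,
    y: f32,
}

// Values taken from the commit message.
const MAX_DISTANCE_PT: f32 = 5.0;
const REJECTED_CONFIDENCE_CAP: f32 = 0.4;

/// Linear scan over all hints; returns the Euclidean distance to the
/// nearest one, or None when there are no hints.
fn nearest_hint_distance(word: &OcrWord, hints: &[PositionHint]) -> Option<f32> {
    hints
        .iter()
        .map(|h| ((word.x - h.x).powi(2) + (word.y - h.y).powi(2)).sqrt())
        .fold(None, |best, d| match best {
            Some(b) if b <= d => Some(b),
            _ => Some(d),
        })
}

/// Caps the confidence of every word whose nearest hint is farther than
/// MAX_DISTANCE_PT. Assumption: a word with no hints at all is rejected.
fn validate_with_hints(words: &mut [OcrWord], hints: &[PositionHint]) {
    for word in words.iter_mut() {
        let accepted =
            matches!(nearest_hint_distance(word, hints), Some(d) if d <= MAX_DISTANCE_PT);
        if !accepted {
            // min() only lowers confidence; already-low scores are kept.
            word.confidence = word.confidence.min(REJECTED_CONFIDENCE_CAP);
        }
    }
}
```

A linear scan costs O(words x hints) per page, which is usually acceptable for page-sized inputs; a spatial index would only pay off for much larger hint sets.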

Closes: pdftract-3s2i

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-24 04:57:17 -04:00
jedarden
7eed5ca55a feat(pdftract-24kut): enforce MCP transport mutual exclusion at CLI parse
Per ADR-006: stdio and HTTP transports are mutually exclusive because they
have opposite stdout discipline (stdio: JSON-RPC sink; HTTP: log channel).

Changes:
- Add clap ArgGroup with multiple(false) so --stdio and --bind cannot be combined
- Default to stdio mode when neither flag is specified
- Change --bind from required String to Option<String>
- Add ADR-006 reference to help text and doc comments
- Add unit tests for CLI argument validation

Acceptance criteria:
- pdftract mcp → launches in stdio mode (default)
- pdftract mcp --stdio → launches in stdio mode
- pdftract mcp --bind ADDR → launches in HTTP+SSE mode
- pdftract mcp --stdio --bind ADDR → exits 2 with clap conflict error
- pdftract mcp --help shows mutual exclusivity note
- Unit test verifies ArgGroup conflict on dual-transport invocation
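
The acceptance criteria above amount to a small decision table. The sketch below expresses that table with the standard library only, so it runs without dependencies; it is a stand-in for illustration and does not use clap or its ArgGroup. The names Transport, CliError, and select_transport are hypothetical and do not come from the commit.

```rust
// Hypothetical types for illustration; the real CLI uses clap.
#[derive(Debug, PartialEq)]
enum Transport {
    Stdio,
    Http { bind: String },
}

#[derive(Debug, PartialEq)]
enum CliError {
    /// Both --stdio and --bind were given.
    Conflict,
    /// --bind was given without an address.
    MissingBindAddr,
    UnknownArg(String),
}

/// Picks the transport from the arguments that follow `pdftract mcp`.
fn select_transport(args: &[&str]) -> Result<Transport, CliError> {
    let mut stdio = false;
    let mut bind: Option<String> = None;

    let mut iter = args.iter();
    while let Some(arg) = iter.next() {
        match *arg {
            "--stdio" => stdio = true,
            "--bind" => {
                let addr = iter.next().ok_or(CliError::MissingBindAddr)?;
                bind = Some(addr.to_string());
            }
            other => return Err(CliError::UnknownArg(other.to_string())),
        }
    }

    match (stdio, bind) {
        // Dual-transport invocation is rejected at parse time.
        (true, Some(_)) => Err(CliError::Conflict),
        (false, Some(bind)) => Ok(Transport::Http { bind }),
        // --stdio alone, or no flag at all, selects stdio (the default).
        _ => Ok(Transport::Stdio),
    }
}
```

In the real CLI the conflict case is reported by clap, which per the criteria above exits with status 2; the sketch only returns an error value and leaves the exit code to the caller.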

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-23 00:41:47 -04:00