Commit graph

2 commits

Author SHA1 Message Date
jedarden
e6bf3dd290 feat(pdftract-3s2i): implement Phase 5.5.2 validation filter
Implement per-word validation filter for assisted-OCR BrokenVector path.

Changes:
- Add SpanSource::OcrAssisted variant to hybrid.rs
- Add Span::ocr_assisted() helper method
- Implement validate_ocr_with_position_hints() in ocr.rs
  - 5pt distance threshold for position validation
  - 0.4 confidence cap for rejected words
  - Linear scan for nearest-neighbor lookup
- Add unit tests for validation filter

Closes: pdftract-3s2i

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-24 04:57:17 -04:00
jedarden
c4ff5194dd feat(pdftract-67tm8): implement MCP stdio transport with integration tests
Implements the stdio transport for the MCP server, enabling communication
with local agents (Claude Desktop, Claude Code, Continue, Cursor) over
standard input/output with Content-Length framing.

Core features:
- LSP-style Content-Length framing with \r\n terminators
- JSON-RPC 2.0 message parsing and serialization
- INV-9 compliance: stdout contains only JSON-RPC frames
- Panic hook redirects panics to stderr
- SIGTERM handler for graceful shutdown
- Parse errors return -32700 with id: null, then continue

Acceptance criteria:
-  Piping tools/list with framing produces expected response < 50ms
-  EOF on stdin → clean exit within 100ms
-  Malformed JSON → -32700 error, subsequent requests work
-  No println!/log output to stdout (INV-9 enforced)
-  Panics go to stderr, no partial JSON on stdout
-  SIGTERM → exit 0, SIGINT → immediate non-zero exit

Tests added:
- crates/pdftract-cli/tests/mcp-stdio.rs (8 integration tests, all pass)
- All 49 existing unit tests continue to pass

Refs: pdftract-67tm8, plan Phase 6.7.2
2026-05-23 00:16:42 -04:00