Commit graph

1 commit

Author SHA1 Message Date
jedarden
47df769e4b feat(pdftract-5ls35): implement JSON-Lines output sink for grep
Implement the --json output sink for pdftract grep with JSON-Lines
format (one match per line). Includes MatchEvent, FileOnlyEvent,
CountEvent structs and JsonSink line-buffered writer.

Key features:
- MatchEvent with all fields (path, page_index, bbox, match_text,
  span_text, span_confidence, pdf_fingerprint, crosses_spans)
- crosses_spans omitted when false via skip_serializing_if
- NaN/Infinity in span_confidence replaced with null
- page_index is 0-based (machine convention)
- FileOnlyEvent for -l mode, CountEvent for -c mode
- Line-buffered writes with immediate flush
- JSON schema at docs/schema/v1.0/grep-jsonl.schema.json

Closes: pdftract-5ls35
2026-05-25 02:05:17 -04:00