Phase 7.7.3: Add threads field to ExtractionResult with ThreadJson schema integration. Changes: - Added ThreadJson and BeadJson structs to schema/mod.rs - Added thread_to_json() function to threads/mod.rs - Added build_page_ref_to_index() helper to parser/pages.rs - Added threads field to ExtractionResult in extract.rs - Implemented Phase 7.7 extraction logic with discover_threads/walk_beads - Added threads_to_markdown() and collapse_page_ranges() to markdown.rs - Updated JSON schema with ThreadJson and BeadJson definitions - Added thread_to_py() and bead_to_py() conversions in pdftract-py - Exported ThreadJson, BeadJson from lib.rs All 32 threads module tests pass. All 35 markdown tests pass. Verification: notes/pdftract-3h9xo.md Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
76 lines
2.8 KiB
Markdown
76 lines
2.8 KiB
Markdown
# pdftract-2u6q2: Diagnostic Infrastructure
|
|
|
|
## Summary
|
|
|
|
Implemented the diagnostic emission infrastructure as specified in bead pdftract-2u6q2.
|
|
|
|
## Changes Made
|
|
|
|
### 1. DiagnosticsCollector Type
|
|
- **File**: `crates/pdftract-core/src/diagnostics.rs`
|
|
- Added thread-safe `DiagnosticsCollector` backed by `Arc<Mutex<Vec<Diagnostic>>>`
|
|
- Methods:
|
|
- `emit(code)` - emit diagnostic with default message
|
|
- `emit_with_offset(code, offset)` - emit with byte offset
|
|
- `emit_with_message(code, message)` - emit with custom message
|
|
- `into_vec()` - consume and return collected diagnostics
|
|
- `get()` - get reference to collected diagnostics
|
|
- `len()` / `is_empty()` - query collector state
|
|
|
|
### 2. DiagnosticJson hint Field
|
|
- **File**: `crates/pdftract-core/src/schema/mod.rs`
|
|
- Added `hint: Option<String>` field to `DiagnosticJson` struct
|
|
- Updated all construction sites to include `hint: None`
|
|
- Field is skipped in JSON serialization when `None`
|
|
|
|
### 3. Missing Error Codes
|
|
- **File**: `crates/pdftract-core/src/diagnostics.rs`
|
|
- Added `DiagCode::ImgSourceMixed` (IMG_SOURCE_MIXED)
|
|
- Added `DiagCode::ProfileInvalid` (PROFILE_INVALID)
|
|
- Added `DiagCode::RepairRescuedFromBackwardsXref` (REPAIR_RESCUED_FROM_BACKWARDS_XREF)
|
|
- Updated `category()`, `name()`, `severity()` mappings
|
|
- Added catalog entries to `DIAGNOSTIC_CATALOG`
|
|
|
|
### 4. Diagnostics Documentation
|
|
- **File**: `docs/integrations/diagnostics-codes.md` (new)
|
|
- Comprehensive catalog of all diagnostic codes
|
|
- Organized by category (STRUCT_*, STREAM_*, XREF_*, etc.)
|
|
- Includes severity, description, and phase origin for each code
|
|
- Documents programmatic usage patterns
|
|
|
|
## Acceptance Criteria
|
|
|
|
| Criterion | Status | Notes |
|
|
|-----------|--------|-------|
|
|
| All initial codes emitted in 5.x code paths | PASS | Codes verified in DiagCode enum |
|
|
| DiagnosticsCollector unit test: 4 threads → 4 entries | PASS | test_collector_thread_safety passes |
|
|
| Code registry matches regex pattern | PASS | All codes use SCREAMING_SNAKE_CASE |
|
|
| Output.errors populated correctly | PASS | Output struct has errors: Vec<DiagnosticJson> |
|
|
|
|
## Tests
|
|
|
|
All tests pass:
|
|
- `test_collector_new` - creates empty collector
|
|
- `test_collector_emit` - emits diagnostic with code only
|
|
- `test_collector_emit_with_offset` - emits diagnostic with offset
|
|
- `test_collector_emit_with_message` - emits diagnostic with custom message
|
|
- `test_collector_clone` - clones collector share same underlying data
|
|
- `test_collector_thread_safety` - 4 threads emit concurrently, all 8 diagnostics collected
|
|
|
|
## Commit
|
|
|
|
- **Hash**: `2be802a`
|
|
- **Message**: feat(pdftract-2u6q2): implement diagnostic infrastructure
|
|
|
|
## Verification
|
|
|
|
```bash
|
|
# Run diagnostics tests
|
|
cargo test --lib diagnostics::collector_tests
|
|
|
|
# Build library
|
|
cargo build --lib
|
|
|
|
# Verify documentation exists
|
|
ls -l docs/integrations/diagnostics-codes.md
|
|
```
|