From 0b838de6cc3741ca6133141eba4b577fcd597df7 Mon Sep 17 00:00:00 2001
From: jedarden
Date: Wed, 20 May 2026 22:05:17 -0400
Subject: [PATCH] docs(pdftract-5upi): update verification note with additional bug fix

Add documentation for the fix that removed diagnostic emission for unknown
keywords, complementing the earlier keyword fallback fix.

Co-Authored-By: Claude Code
---
 notes/pdftract-5upi.md | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/notes/pdftract-5upi.md b/notes/pdftract-5upi.md
index 6c697b7..b39e805 100644
--- a/notes/pdftract-5upi.md
+++ b/notes/pdftract-5upi.md
@@ -71,6 +71,21 @@ The structural token lexer was already fully implemented. This fix only correcte
 - All lexer branches handle EOF gracefully
 - Unknown keywords emit `Token::Keyword(bytes)` instead of panicking
 
+## Additional Bug Fix (2026-05-20)
+
+### Commit: `7818f22` - `fix(pdftract-5upi): remove diagnostic emission for unknown keywords`
+
+**Issue**: The `lex_keyword()` function was incorrectly emitting `StructUnexpectedByte` diagnostics for unknown keywords.
+
+**Fix**: Removed diagnostic emission from `lex_keyword()` function (lines 540-564).
+
+**Rationale**:
+1. Many valid keywords (trailer, xref, etc.) are not in the initial dispatch table
+2. The object parser is responsible for validating keywords against known operators
+3. Emitting diagnostics here causes false positives for valid PDF constructs
+
+This change aligns with the task requirement that unknown keywords emit `Token::Keyword` without a diagnostic, letting the object parser handle `STRUCT_UNKNOWN_KEYWORD` if needed.
+
 ## Notes
 
 The lexer module compiles successfully. Full integration tests cannot run due to unrelated pre-existing compilation errors in other modules (missing `LZWDecoder`, `Diagnostic` type mismatches in catalog.rs, pages.rs, ocg.rs). These errors are not caused by this change.
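
The division of responsibility described in the rationale can be illustrated with a minimal Rust sketch. This is not the pdftract source: only the names `lex_keyword` and `Token::Keyword` appear in the note. The function signatures, the `Diagnostic` enum, its `StructUnknownKeyword` variant (standing in for `STRUCT_UNKNOWN_KEYWORD`), the `check_keyword` helper, and the `KNOWN` keyword list are all assumptions made for illustration.

```rust
/// Hypothetical token type; only the `Keyword` variant name comes from the note.
#[derive(Debug, PartialEq)]
enum Token {
    Keyword(Vec<u8>),
}

/// Hypothetical diagnostic type standing in for STRUCT_UNKNOWN_KEYWORD.
#[derive(Debug, PartialEq)]
enum Diagnostic {
    StructUnknownKeyword(Vec<u8>),
}

/// Returns true for PDF "regular" bytes, i.e. neither whitespace nor delimiter.
fn is_regular(b: u8) -> bool {
    !matches!(
        b,
        0 | b'\t' | b'\n' | 0x0C | b'\r' | b' '          // whitespace
            | b'(' | b')' | b'<' | b'>' | b'[' | b']'    // delimiters
            | b'{' | b'}' | b'/' | b'%'
    )
}

/// Lexer side (assumed signature): consume a run of regular bytes starting at
/// `pos` and return it as a keyword token. No diagnostic is produced here,
/// whether or not the keyword is recognized.
fn lex_keyword(input: &[u8], pos: &mut usize) -> Option<Token> {
    let start = *pos;
    while *pos < input.len() && is_regular(input[*pos]) {
        *pos += 1;
    }
    if *pos == start {
        None // EOF or a non-regular byte: nothing to lex, and no panic
    } else {
        Some(Token::Keyword(input[start..*pos].to_vec()))
    }
}

/// Parser side (hypothetical helper): validation happens here, where the full
/// set of structural keywords is known.
fn check_keyword(token: &Token) -> Option<Diagnostic> {
    // Illustrative list only; the real parser's table is not shown in the note.
    const KNOWN: &[&[u8]] = &[
        b"obj", b"endobj", b"stream", b"endstream", b"xref", b"trailer",
        b"startxref", b"true", b"false", b"null", b"R",
    ];
    match token {
        Token::Keyword(bytes) if !KNOWN.iter().any(|k| *k == bytes.as_slice()) => {
            Some(Diagnostic::StructUnknownKeyword(bytes.clone()))
        }
        _ => None,
    }
}
```

Under these assumptions, lexing `trailer` yields a `Token::Keyword` that passes `check_keyword` with no diagnostic, while an unrecognized word is still tokenized cleanly and is flagged only at the parser stage.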