Add documentation for the fix that removed diagnostic emission for
unknown keywords, complementing the earlier keyword fallback fix.
Co-Authored-By: Claude Code <noreply@anthropic.com>
Fixed incorrect fallback behavior in keyword lexer functions. Four
functions (lex_e_keyword, lex_o_keyword, lex_r_keyword, lex_n_keyword)
called lex_name() instead of lex_keyword() when no keyword matched.
When a PDF contains an unrecognized word starting with e/o/n/R
(e.g., "endob" instead of "endobj"), the lexer should fall back to
generic keyword parsing (Token::Keyword(bytes)), not name parsing.
Names always start with /, so calling lex_name() on input without
a leading / would incorrectly skip the first byte.
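A minimal sketch of the difference between the two fallbacks follows.
It is illustrative only: the Token enum shape, is_regular(), run_len()
and the function bodies are assumptions, not the pdftract source. Only
the function names and Token::Keyword(bytes) come from this commit.

```rust
// Hypothetical token type; only Token::Keyword(bytes) is named in the commit.
#[derive(Debug, PartialEq)]
enum Token<'a> {
    Keyword(&'a [u8]),
    Name(&'a [u8]),
}

// PDF whitespace and delimiter bytes end a run of regular characters.
fn is_regular(b: u8) -> bool {
    !matches!(
        b,
        0 | b'\t' | b'\n' | 0x0C | b'\r' | b' '
            | b'(' | b')' | b'<' | b'>' | b'[' | b']'
            | b'{' | b'}' | b'/' | b'%'
    )
}

// Length of the leading run of regular characters.
fn run_len(input: &[u8]) -> usize {
    input.iter().take_while(|&&b| is_regular(b)).count()
}

// Name lexing assumes input[0] is '/' and skips it without checking.
fn lex_name(input: &[u8]) -> Token<'_> {
    let rest = input.get(1..).unwrap_or(&[]);
    Token::Name(&rest[..run_len(rest)])
}

// Generic keyword fallback keeps every byte of the word.
fn lex_keyword(input: &[u8]) -> Token<'_> {
    Token::Keyword(&input[..run_len(input)])
}

// Assumed shape of a per-letter lexer after the fix: recognized
// keywords such as "endobj" would be matched first (omitted here),
// and anything else falls through to lex_keyword().
fn lex_e_keyword(input: &[u8]) -> Token<'_> {
    lex_keyword(input)
}
```

Under these assumptions, lex_e_keyword(b"endob 1 0") yields
Token::Keyword(b"endob"), while the old lex_name() fallback would
have produced Token::Name(b"ndob"), losing the leading "e".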
References:
- Bead: pdftract-5upi
- Notes: notes/pdftract-5upi.md
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The structural token lexer was already fully implemented. All 84 lexer
tests pass, covering all acceptance criteria:
- Array/dict delimiters ([], <<>>)
- Keywords (true, false, null, obj, endobj, stream, endstream, R)
- Hex string vs dict ambiguity (< vs <<)
- Stream header validation (\n or \r\n only; a lone \r is invalid)
- Case-sensitive keyword matching
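Two of the criteria above (< vs << and the end-of-line rule after the
stream keyword) can be sketched as below. The Open enum and both
function names are hypothetical and chosen for illustration; they are
not taken from the lexer itself.

```rust
// Hypothetical classification of the two constructs that start with '<'.
#[derive(Debug, PartialEq)]
enum Open {
    DictStart, // "<<"
    HexString, // "<" followed by anything else
}

// Look one byte past '<' to decide between a dictionary and a hex
// string. Returns None if the input does not start with '<'.
fn classify_angle(input: &[u8]) -> Option<Open> {
    match input {
        [b'<', b'<', ..] => Some(Open::DictStart),
        [b'<', ..] => Some(Open::HexString),
        _ => None,
    }
}

// Given the bytes immediately after the `stream` keyword, return the
// length of the end-of-line marker, or None if the marker is invalid.
fn stream_eol_len(after_keyword: &[u8]) -> Option<usize> {
    match after_keyword {
        [b'\r', b'\n', ..] => Some(2),
        [b'\n', ..] => Some(1),
        _ => None, // includes a lone '\r'
    }
}
```

Matching the two-byte pattern before the one-byte pattern is what
resolves both ambiguities: "<<" wins over "<", and "\r\n" is accepted
while a bare "\r" falls through to None.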
This commit fixes a pre-existing compilation error in xref.rs:
forward_scan_memory() called parse_obj_header_at_memory(), which did
not exist. Added the missing function as a byte-slice variant of
parse_obj_header_at() for efficient memory-based scanning.
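For orientation, a byte-slice object header parser could look roughly
like the sketch below. The signature, the return type, and the helpers
skip_ws() and read_uint() are assumptions made for illustration; the
actual function in xref.rs may differ.

```rust
// Skip PDF whitespace starting at `pos`; returns the new position.
fn skip_ws(data: &[u8], mut pos: usize) -> usize {
    while pos < data.len()
        && matches!(data[pos], 0 | b'\t' | b'\n' | 0x0C | b'\r' | b' ')
    {
        pos += 1;
    }
    pos
}

// Read an unsigned decimal integer at `pos`; returns (value, end).
fn read_uint(data: &[u8], pos: usize) -> Option<(u64, usize)> {
    let mut end = pos;
    let mut value: u64 = 0;
    while end < data.len() && data[end].is_ascii_digit() {
        value = value
            .checked_mul(10)?
            .checked_add(u64::from(data[end] - b'0'))?;
        end += 1;
    }
    if end == pos { None } else { Some((value, end)) }
}

// Hypothetical signature: parse "N G obj" at `offset` in an in-memory
// buffer and return (object number, generation number).
// Simplification: does not check what follows the "obj" keyword.
fn parse_obj_header_at_memory(data: &[u8], offset: usize) -> Option<(u32, u16)> {
    let (num, pos) = read_uint(data, offset)?;
    let after_num = skip_ws(data, pos);
    if after_num == pos {
        return None; // whitespace is required between the numbers
    }
    let (generation, pos) = read_uint(data, after_num)?;
    let after_gen = skip_ws(data, pos);
    if after_gen == pos || !data[after_gen..].starts_with(b"obj") {
        return None;
    }
    if num > u64::from(u32::MAX) || generation > u64::from(u16::MAX) {
        return None;
    }
    Some((num as u32, generation as u16))
}
```

Working directly on a &[u8] avoids seeking and re-reading from a file
handle for every candidate offset, which is presumably what makes a
forward scan over an in-memory buffer cheaper.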
Verification: notes/pdftract-5upi.md
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>