jedarden/pdftract

Commit graph


Author

SHA1

Message

Date

jedarden

0b838de6cc

docs(pdftract-5upi): update verification note with additional bug fix

Add documentation for the fix that removed diagnostic emission for
unknown keywords, complementing the earlier keyword fallback fix.

Co-Authored-By: Claude Code <noreply@anthropic.com>

2026-05-20 22:05:17 -04:00

jedarden

fee6ed8afd

fix(pdftract-5upi): correct keyword fallback in lexer

Fixed incorrect fallback behavior in keyword lexer functions. Four
functions (lex_e_keyword, lex_o_keyword, lex_r_keyword, lex_n_keyword)
were incorrectly calling lex_name() instead of lex_keyword() when
keywords didn't match.

When a PDF contains an unrecognized word starting with e/o/n/R
(e.g., "endob" instead of "endobj"), the lexer should fall back to
generic keyword parsing (Token::Keyword(bytes)), not name parsing.
Names always start with /, so calling lex_name() on input without
a leading / would incorrectly skip the first byte.
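The fallback described above can be sketched as follows. This is a minimal illustration, not the repository's code: the `Token` variants other than `Token::Keyword(bytes)`, the `scan_word` helper, and the function signatures are all assumptions made for the example.

```rust
/// Hypothetical token type. Only `Token::Keyword(bytes)` is named in the
/// commit message; the other variants are assumptions.
#[derive(Debug, PartialEq)]
enum Token<'a> {
    EndObj,
    EndStream,
    Keyword(&'a [u8]),
}

/// PDF whitespace and delimiter bytes terminate a run of regular characters.
fn is_regular(b: u8) -> bool {
    !matches!(
        b,
        0x00 | b'\t' | b'\n' | 0x0C | b'\r' | b' '
            | b'(' | b')' | b'<' | b'>' | b'[' | b']' | b'{' | b'}' | b'/' | b'%'
    )
}

/// Hypothetical helper: returns the run of regular characters starting at
/// `pos` and the position just past it. No byte is skipped up front, which is
/// the key difference from name parsing (names begin with `/`).
fn scan_word(input: &[u8], pos: usize) -> (&[u8], usize) {
    let len = input[pos..].iter().take_while(|&&b| is_regular(b)).count();
    (&input[pos..pos + len], pos + len)
}

/// Sketch of an `e`-keyword lexer: known keywords map to dedicated tokens,
/// anything else falls back to a generic keyword token.
fn lex_e_keyword(input: &[u8], pos: usize) -> (Token<'_>, usize) {
    let (word, next) = scan_word(input, pos);
    let tok = match word {
        b"endobj" => Token::EndObj,
        b"endstream" => Token::EndStream,
        _ => Token::Keyword(word), // e.g. "endob": kept whole, first byte included
    };
    (tok, next)
}
```

With this shape, a truncated word such as `endob` comes back as `Token::Keyword(b"endob")` with its leading `e` intact, rather than being treated as a name.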

References:
- Bead: pdftract-5upi
- Notes: notes/pdftract-5upi.md

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-05-20 21:55:55 -04:00

jedarden

a88353069a

fix(pdftract-5upi): add parse_obj_header_at_memory for xref forward scan

The structural token lexer was already fully implemented. All 84 lexer
tests pass, covering all acceptance criteria:

- Array/dict delimiters ([], <<>>)
- Keywords (true, false, null, obj, endobj, stream, endstream, R)
- Hex string vs dict ambiguity (< vs <<)
- Stream header validation (\n or \r\n only, lone \r is invalid)
- Case-sensitive keyword matching
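Two of the criteria above, the stream header line ending and the `<` vs `<<` ambiguity, can be illustrated with a short sketch. The function and type names here are hypothetical and are not taken from the repository.

```rust
/// Given the bytes immediately after the `stream` keyword, returns the length
/// of a valid end-of-line marker: 1 for "\n", 2 for "\r\n". A lone "\r" (or
/// anything else) is rejected.
fn stream_eol_len(after_keyword: &[u8]) -> Option<usize> {
    match after_keyword {
        [b'\n', ..] => Some(1),
        [b'\r', b'\n', ..] => Some(2),
        _ => None,
    }
}

/// Hypothetical classification of what an opening `<` introduces.
#[derive(Debug, PartialEq)]
enum AngleOpen {
    Dict,      // "<<"
    HexString, // "<" followed by anything else
}

/// Looks one byte ahead to tell a dictionary opener from a hex string.
/// Returns `None` if `pos` is out of range or does not point at `<`.
fn classify_angle(input: &[u8], pos: usize) -> Option<AngleOpen> {
    match input.get(pos..)? {
        [b'<', b'<', ..] => Some(AngleOpen::Dict),
        [b'<', ..] => Some(AngleOpen::HexString),
        _ => None,
    }
}
```

The one-byte lookahead is enough to resolve the ambiguity because a hex string body never begins with `<`.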

This commit fixes a pre-existing compilation error in xref.rs:
forward_scan_memory() called parse_obj_header_at_memory(), which did not
exist. Added the missing function as a byte-slice variant of
parse_obj_header_at() for efficient memory-based scanning.
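As a rough illustration of what a byte-slice object-header parser might look like, here is a sketch that reads `<num> <gen> obj` at a given offset. The signature, return type, and helper functions are assumptions; the actual function in xref.rs may differ.

```rust
use std::convert::TryFrom;

fn is_pdf_whitespace(b: u8) -> bool {
    matches!(b, 0x00 | b'\t' | b'\n' | 0x0C | b'\r' | b' ')
}

/// Reads a run of ASCII digits at `*pos`, advancing past it.
/// Returns `None` if there are no digits or the value overflows.
fn read_uint(data: &[u8], pos: &mut usize) -> Option<u64> {
    let start = *pos;
    let mut value: u64 = 0;
    while let Some(&b) = data.get(*pos) {
        if !b.is_ascii_digit() {
            break;
        }
        value = value.checked_mul(10)?.checked_add(u64::from(b - b'0'))?;
        *pos += 1;
    }
    if *pos == start { None } else { Some(value) }
}

/// Skips PDF whitespace at `*pos`, requiring at least one whitespace byte.
fn skip_whitespace(data: &[u8], pos: &mut usize) -> Option<()> {
    let start = *pos;
    while data.get(*pos).map_or(false, |&b| is_pdf_whitespace(b)) {
        *pos += 1;
    }
    if *pos == start { None } else { Some(()) }
}

/// Hypothetical signature: parses an object header at `offset` and returns
/// (object number, generation number), or `None` if no header starts there.
fn parse_obj_header_at_memory(data: &[u8], offset: usize) -> Option<(u32, u16)> {
    let mut pos = offset;
    let number = u32::try_from(read_uint(data, &mut pos)?).ok()?;
    skip_whitespace(data, &mut pos)?;
    let generation = u16::try_from(read_uint(data, &mut pos)?).ok()?;
    skip_whitespace(data, &mut pos)?;
    let rest = data.get(pos..)?;
    if !rest.starts_with(b"obj") {
        return None;
    }
    // Reject longer words such as "objx": `obj` must be followed by end of
    // input, whitespace, or a delimiter.
    let terminated = rest
        .get(3)
        .map_or(true, |&b| is_pdf_whitespace(b) || b"()<>[]{}/%".contains(&b));
    if terminated { Some((number, generation)) } else { None }
}
```

A forward scan could call such a function at each candidate offset and record the offsets where it returns `Some`, working directly on the in-memory buffer instead of a seekable reader.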

Verification: notes/pdftract-5upi.md

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-05-18 02:54:35 -04:00

3 commits