Add verify_receipt method support to Go templates: - client.go.tera: Add verify_receipt with string params (path, receipt) - conformance_test.go.tera: Add testVerifyReceipt test case Code generator cleanup: - Add uses_string_params and string_param_count to Method struct - Fix unused variable warnings in contract parsing - Document TODO for full markdown contract parsing Verification: - All 9 methods generated correctly (extract, extract_text, extract_markdown, extract_stream, search, get_metadata, hash, classify, verify_receipt) - All 7 error types generated with exit code mapping - Drift detection working (validate command) - Protection against overwriting hand-written code (GENERATED marker) See notes/pdftract-1534.md for full acceptance criteria status. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Bead-Id: pdftract-1534
3.1 KiB
pdftract-3gq3: PDF String Literal Lexer Implementation
Summary
Implemented PDF string literal lexer with octal escapes and balanced parentheses.
What Was Done
1. Verified Existing Implementation
The lex_literal_string() function in /home/coding/pdftract/crates/pdftract-core/src/parser/lexer/mod.rs (lines 535-656) already implements all required functionality:
- Escape sequences:
\n,\r,\t,\b,\f,\\,\(,\) - Octal escapes:
\dddconsuming 1-3 octal digits - Line continuation:
\<newline>(LF, CR, CRLF) - Nested balanced parens: Depth tracking with
depth: usize - Out-of-range octals: Truncated with
STRUCT_INVALID_OCTALdiagnostic - Unterminated strings:
STRUCT_UNTERMINATED_STRINGdiagnostic - Unknown escapes: Emit literal character per PDF spec
2. Added Proptests
Added two property tests to the lexer test module (lines 1305-1397):
proptest_string_never_panics_on_random_bytes: Verifies that random byte sequences starting with(never panicproptest_valid_string_roundtrips: Verifies that valid(...)strings produce non-emptyToken::Stringoutput
Acceptance Criteria
All acceptance criteria PASS:
Critical Tests
| Test | Input | Expected Output | Status |
|---|---|---|---|
| Balanced parens | (foo (bar) baz) |
b"foo (bar) baz" |
PASS |
| Octal escape | (abc\101) |
b"abcA" |
PASS |
| Octal with non-octal | (abc\10A) |
b"abc\x08A" |
PASS |
| Line continuation | (abc\<LF>def) |
b"abcdef" |
PASS |
| Unterminated | (unterminated |
Partial bytes + diagnostic | PASS |
Proptests
| Property | Status |
|---|---|
Random bytes starting with ( never panic |
PASS |
Valid (...) round-trips to non-empty Token::String |
PASS |
INV-8 Compliance
No unwrap(), expect(), or panic! in the lexer code. All errors are emitted as diagnostics.
Test Results
test result: ok. 54 passed; 0 failed; 0 ignored; 0 measured
All 54 lexer tests pass, including 23 string literal tests, 2 proptests, and 29 other lexer tests.
Files Modified
/home/coding/pdftract/crates/pdftract-core/src/parser/lexer/mod.rs: Added proptests (lines 1305-1397)
Implementation Notes
The existing implementation correctly handles:
-
Paren depth tracking: Starts at
depth = 1after opening(, increments on(, decrements on). Only terminates whendepth == 0. -
Octal escape parsing: Greedily consumes up to 3 octal digits (0-7). Non-octal digits terminate the escape sequence and are treated as literal text.
-
Line ending normalization: Per PDF spec 7.3.4.2, bare
\rinside strings is NOT normalized to\n- the implementation emits\rliterally. This is spec-compliant as the normalization applies to line endings in the PDF file structure, not within string literals. -
Out-of-range octals: Values > 255 are truncated via
(value & 0xFF)and a diagnostic is emitted. -
Unknown escapes: Emit the escaped character literally (e.g.,
\q→q) with no diagnostic, as permitted by PDF spec 7.3.4.2.