pdftract/tests/lexer/fixtures/every_token.pdf.tokens.txt
jedarden 585d861efc test(pdftract-sy8x): implement lexer proptest harness and curated corpus
Add property-based testing infrastructure for the lexer module, with 6+
property tests covering INV-8 (no panic), string and hex-string roundtrips,
name length bounds, and position monotonicity. Create 8 curated fixture files
with golden token outputs for critical edge cases, including the EC-01
empty-file case and whitespace-only input.
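
To illustrate the hex roundtrip property named above, here is a minimal std-only sketch. The helper names `encode_hex_string` and `decode_hex_string` are hypothetical and are not taken from pdftract; the real harness uses proptest against the actual lexer. The sketch assumes the usual PDF hex-string rules: whitespace between digits is ignored, and an odd trailing digit is padded with 0.

```rust
/// Hypothetical helper (not pdftract's API): encode bytes as a PDF hex
/// string literal such as `<48656C6C6F>`.
fn encode_hex_string(bytes: &[u8]) -> String {
    let mut out = String::with_capacity(bytes.len() * 2 + 2);
    out.push('<');
    for b in bytes {
        out.push_str(&format!("{:02X}", b));
    }
    out.push('>');
    out
}

/// Hypothetical helper (not pdftract's API): decode a PDF hex string literal.
/// Returns None if the delimiters are missing or a non-hex digit appears.
/// Simplification: only ASCII whitespace is skipped; PDF also treats NUL
/// as whitespace, which this sketch does not handle.
fn decode_hex_string(src: &str) -> Option<Vec<u8>> {
    let inner = src.strip_prefix('<')?.strip_suffix('>')?;
    let mut digits: Vec<u8> = Vec::new();
    for c in inner.chars() {
        if c.is_ascii_whitespace() {
            continue;
        }
        digits.push(c.to_digit(16)? as u8);
    }
    // An odd number of digits is completed with a trailing 0.
    if digits.len() % 2 == 1 {
        digits.push(0);
    }
    Some(digits.chunks(2).map(|p| (p[0] << 4) | p[1]).collect())
}
```

The roundtrip property is then `decode_hex_string(&encode_hex_string(bytes)) == Some(bytes.to_vec())` for arbitrary `bytes`; a property-testing framework would generate those inputs instead of listing them by hand.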

Changes:
- Add prop_string_roundtrip to tests/proptest/lexer.rs
- Create tests/lexer/fixtures/ with 8 fixtures + .tokens.txt golden files
- Add gen_lexer_golden.rs binary for regenerating golden outputs
- Fix missing ObjRef import in marked_content_operators.rs

Acceptance criteria:
- cargo test --features proptest -p pdftract-core: 105 lexer tests pass
- tests/lexer/fixtures/ contains 8 fixtures with .tokens.txt outputs
- EC-01 empty file test: 0-byte input -> Token::Eof, no panic
- Whitespace-only file test passes
- INV-8 verified by prop_lexer_never_panics
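
The EC-01 and whitespace-only criteria above can be pictured with a toy stand-in lexer. Everything in this sketch (`Token`, `next_token`, `is_pdf_whitespace`) is hypothetical and much simpler than the pdftract-core lexer; it only shows the shape of the check, namely that empty or all-whitespace input yields `Eof` without panicking.

```rust
/// Toy token type for illustration only; the real lexer has many more variants.
#[derive(Debug, PartialEq)]
enum Token {
    Keyword(Vec<u8>),
    Eof,
}

/// The six PDF whitespace bytes: NUL, HT, LF, FF, CR, SP.
fn is_pdf_whitespace(b: u8) -> bool {
    matches!(b, 0x00 | 0x09 | 0x0A | 0x0C | 0x0D | 0x20)
}

/// Hypothetical stand-in: skip whitespace, then return either Eof or the
/// next run of non-whitespace bytes as a Keyword. All indexing is bounds
/// checked by the loop conditions, so no input can cause a panic.
fn next_token(input: &[u8], pos: &mut usize) -> Token {
    while *pos < input.len() && is_pdf_whitespace(input[*pos]) {
        *pos += 1;
    }
    if *pos >= input.len() {
        return Token::Eof;
    }
    let start = *pos;
    while *pos < input.len() && !is_pdf_whitespace(input[*pos]) {
        *pos += 1;
    }
    Token::Keyword(input[start..*pos].to_vec())
}
```

With this stand-in, calling `next_token(b"", &mut pos)` with `pos = 0` returns `Token::Eof`, and so does any input made up solely of whitespace bytes.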

Closes: pdftract-sy8x
2026-05-24 02:36:37 -04:00


Bool(true)
Bool(false)
Null
Integer(123)
Integer(-42)
Real(3.14)
Real(-0.5)
String([72, 101, 108, 108, 111, 32, 87, 111, 114, 108, 100])
String([110, 101, 115, 116, 101, 100, 32, 40, 112, 97, 114, 101, 110, 115, 41])
String([72, 101, 108, 108, 111])
Name([84, 121, 112, 101])
Name([70, 111, 110, 116, 32, 70, 105, 108, 101])
Name([32, 115, 112, 97, 99, 101])
ArrayStart
ArrayEnd
DictStart
DictEnd
Stream
EndStream
Obj
Keyword([101, 110, 100, 111, 98, 106])
IndirectRef
Keyword([120, 114, 101, 102])
Keyword([116, 114, 97, 105, 108, 101, 114])
Keyword([115, 116, 97, 114, 116, 120, 114, 101, 102])
Eof
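
The `String`, `Name`, and `Keyword` payloads above are printed as decimal byte lists, which is hard to review by eye. A small helper along the following lines can turn a golden line back into readable bytes. It is a sketch under the assumption that payload lines always have the form `Variant([b0, b1, ...])`; `decode_golden_line` is a hypothetical name and is not part of `gen_lexer_golden.rs`.

```rust
/// Hypothetical helper: split a golden line such as
/// `Name([84, 121, 112, 101])` into its variant name and byte payload.
/// Lines without a `([...])` payload (e.g. `Eof`, `Integer(123)`) give None,
/// as do lines whose list items are not valid u8 values.
fn decode_golden_line(line: &str) -> Option<(String, Vec<u8>)> {
    let (variant, rest) = line.trim().split_once("([")?;
    let list = rest.strip_suffix("])")?;
    let mut bytes = Vec::new();
    for item in list.split(',') {
        let item = item.trim();
        if item.is_empty() {
            continue; // tolerates an empty list such as `String([])`
        }
        bytes.push(item.parse::<u8>().ok()?);
    }
    Some((variant.to_string(), bytes))
}
```

Passing the payload through `String::from_utf8_lossy` shows, for example, that the first `String` entry is `Hello World`, the second is `nested (parens)`, the `Name` entries are `Type`, `Font File`, and ` space`, and the `Keyword` entries are `endobj`, `xref`, `trailer`, and `startxref`.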