# pdftract-go

Go SDK for pdftract: PDF extraction and conformance testing.

## Installation

```bash
go get github.com/jedarden/pdftract-go@v{{ version }}
```

## Usage

### Basic extract

```go
package main

import (
	"fmt"
	"log"

	pdftract "github.com/jedarden/pdftract-go"
)

func main() {
	// Create a client that invokes the local pdftract binary.
	client := pdftract.NewClient()

	// nil options: extract with the SDK defaults.
	doc, err := client.Extract("document.pdf", nil)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("Pages: %d\n", len(doc.Pages))
}
```

### Extract with OCR

```go
// Reuses the client from the previous example.
options := &pdftract.ExtractOptions{
	OCRLanguage:  "eng",
	OCRThreshold: 0.7,
}
doc, err := client.Extract("scanned.pdf", options)
if err != nil {
	log.Fatal(err)
}
```

### Search

```go
matches, err := client.Search("document.pdf", "invoice", &pdftract.SearchOptions{
	CaseInsensitive: true,
})
if err != nil {
	log.Fatal(err)
}
for _, match := range matches {
	fmt.Printf("Found on page %d: %s\n", match.Page, match.Text)
}
```

## Binary version compatibility

This SDK requires pdftract binary version {{ version }}. Download it from:

https://github.com/jedarden/pdftract/releases/tag/v{{ version }}
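
## Troubleshooting

### Binary not found

The SDK locates the `pdftract` executable by searching the directories on your `PATH`. Make sure the binary is installed and that its directory is on `PATH` for the process running your Go program.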

### Version mismatch

The SDK refuses to invoke a `pdftract` binary whose version does not match the one it requires. Install pdftract {{ version }} from the release page linked above.

### Network failure

When extracting from a remote URL, check your network connection and confirm that the server's TLS certificate chain is trusted by your system.