@pdftract/sdk

Node.js SDK for pdftract - PDF extraction and conformance testing.

Installation

npm install @pdftract/sdk@1.0.0

Usage

Basic extract

import { Client, path } from '@pdftract/sdk';

const client = new Client();
const doc = await client.extract(path('document.pdf'));
console.log(`Pages: ${doc.pages.length}`);

Extract with OCR

import { Client, path } from '@pdftract/sdk';

const client = new Client();
const doc = await client.extract(path('scanned.pdf'), {
  ocrLanguage: 'eng',
  ocrThreshold: 0.7
});

Search

import { Client, path } from '@pdftract/sdk';

const client = new Client();
for await (const match of client.search(path('document.pdf'), 'invoice')) {
  console.log(`Found on page ${match.page}: ${match.text}`);
}

Stream extraction

import { Client, path } from '@pdftract/sdk';

const client = new Client();
for await (const page of client.extractStream(path('large.pdf'))) {
  console.log(`Page ${page.page}: ${page.blocks.length} blocks`);
}

Binary version compatibility

This SDK requires the pdftract 1.0.0 binary. Download it from: https://github.com/jedarden/pdftract/releases/tag/v1.0.0

Troubleshooting

Binary not found

Ensure the pdftract executable is on your PATH. The SDK locates it by probing the directories listed in PATH.
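
To see which directory (if any) would satisfy such a probe, a lookup along the following lines can help. This is a minimal sketch, not the SDK's own lookup code: `findOnPath` and its `options` are hypothetical names, and the existence check is passed in as a function so the logic does not depend on the filesystem.

```javascript
// Hypothetical helper: not part of @pdftract/sdk.
// `exists` is injected by the caller; in practice it could be fs.existsSync.
function findOnPath(name, pathEnv, exists, options = {}) {
  const delimiter = options.delimiter ?? ':';    // ';' on Windows
  const separator = options.separator ?? '/';    // '\\' on Windows
  const extensions = options.extensions ?? ['']; // e.g. ['.exe', '.cmd'] on Windows

  for (const dir of pathEnv.split(delimiter)) {
    if (dir === '') continue; // skip empty PATH entries
    for (const ext of extensions) {
      // Strip trailing slashes so the joined path has exactly one separator.
      const candidate = dir.replace(/[\\/]+$/, '') + separator + name + ext;
      if (exists(candidate)) return candidate;
    }
  }
  return null; // nothing on PATH matched
}
```

A typical call would be `findOnPath('pdftract', process.env.PATH ?? '', fs.existsSync)` with `fs` taken from `node:fs`. A `null` result means no PATH entry contains the file. Note that `fs.existsSync` only confirms the file is present, not that it is executable.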

Version mismatch

The SDK refuses to invoke a pdftract binary whose version does not match the one it requires. Install the required version (1.0.0).
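
When diagnosing a mismatch by hand, it can help to compare the version your binary reports against the required one. The sketch below assumes that "matching" means exact equality and that the reported text contains a MAJOR.MINOR.PATCH triple somewhere. Neither the binary's version output format nor the SDK's actual comparison rule is documented here, so `parseVersion` and `checkVersion` are hypothetical helpers.

```javascript
// Hypothetical helpers: not part of @pdftract/sdk.
const REQUIRED_VERSION = '1.0.0'; // the version this README says the SDK requires

// Pull the first MAJOR.MINOR.PATCH triple out of arbitrary text.
function parseVersion(text) {
  const match = /(\d+)\.(\d+)\.(\d+)/.exec(text);
  return match ? match[0] : null;
}

// Assumption: a binary "matches" only if its version equals the required one exactly.
function checkVersion(reported, required = REQUIRED_VERSION) {
  const found = parseVersion(reported);
  if (found === null) {
    return { ok: false, reason: `no version found in "${reported}"` };
  }
  if (found !== required) {
    return { ok: false, reason: `found ${found}, need ${required}` };
  }
  return { ok: true, reason: `version ${found} matches` };
}
```

Pass in whatever text your pdftract binary prints when asked for its version. The `reason` field is meant to be readable enough to paste into a bug report.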

Network failure

For remote URLs, check your network connection and TLS certificate chain.
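
One way to separate network or TLS problems from extraction problems is to probe the URL directly before handing it to the SDK. This is a minimal sketch under stated assumptions: `probeUrl` is a hypothetical helper, not an SDK function, and it relies on the global `fetch` available in Node.js 18 and later.

```javascript
// Hypothetical helper: not part of @pdftract/sdk.
// `fetchImpl` defaults to the global fetch available in Node.js 18+.
async function probeUrl(url, fetchImpl = fetch) {
  try {
    // HEAD avoids downloading the PDF body just to test connectivity.
    const response = await fetchImpl(url, { method: 'HEAD' });
    return { reachable: true, status: response.status };
  } catch (error) {
    // Network and TLS failures reject the promise; Node's fetch usually
    // attaches the underlying error (with a `code`) as `error.cause`.
    const cause = error?.cause ?? error;
    return { reachable: false, error: String(cause?.code ?? cause?.message ?? cause) };
  }
}
```

A result such as `{ reachable: false, error: 'CERT_HAS_EXPIRED' }` points at the certificate chain, while `ENOTFOUND` or `ECONNREFUSED` points at DNS or connectivity. Some servers reject HEAD requests, so a non-2xx `status` with `reachable: true` still shows that the network path and TLS handshake work.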