jedarden b0b73c3c4a docs(pdftract-45vo7): document Ruby SDK completion status

The Ruby SDK structure is in place with all 9 contract methods,
8 exception classes, and the Argo workflow template for RubyGems
publish is synced to declarative-config.

This is a v1.1+ deferred task. Ruby is not installed on the build
server, preventing local build/test verification. The SDK should
be moved to a separate repo (github.com/jedarden/pdftract-ruby)
when the v1.1+ release wave begins.

Verification note: notes/pdftract-45vo7.md

2026-06-01 10:20:43 -04:00


Bead pdftract-45vo7: Ruby SDK + RubyGems Publish

Status: Partially Complete (v1.1+ Deferred)

This bead is deferred to v1.1+ because it has lower priority than the v1.0 SDKs. The Ruby SDK structure is in place, but Ruby is not installed on the build server, so the build and tests could not be verified locally.

Completed Components

1. Ruby SDK Structure (pdftract-ruby/)

All required files are present and properly structured:

lib/pdftract.rb - Main module with convenience methods
lib/pdftract/client.rb - Client class with all 9 contract methods:
- extract(source, **options) → Document
- extract_text(source, **options) → String
- extract_markdown(source, **options) → String
- extract_stream(source, **options) → Enumerator
- search(source, pattern, **options) → Enumerator
- get_metadata(source, **options) → Metadata
- hash(source, **options) → Fingerprint
- classify(source) → Classification
- verify_receipt(path, receipt) → Boolean
lib/pdftract/models.rb - Data classes using Ruby 3.2+ Data.define
lib/pdftract/errors.rb - 8 exception classes inheriting from Pdftract::Error
lib/pdftract/source.rb - Source classes (PathSource, URLSource, BytesSource)
pdftract.gemspec - Gem specification (version 1.0.0)
Rakefile - Test and build tasks
test/conformance_test.rb - Minitest conformance suite
GENERATED marker file

2. Argo Workflow Template (.ci/argo-workflows/pdftract-ruby-publish.yaml)

The RubyGems publish workflow is complete and synced to declarative-config:

Location: ~/declarative-config/k8s/iad-ci/argo-workflows/pdftract-ruby-publish.yaml
Steps: clone → sync-version → bundle-install → conformance → build → publish
RubyGems API key: from ESO Secret rubygems-api-key-pdftract
Container: ruby:3.2-slim
Idempotent re-run logic (checks if version already exists)
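
The idempotency check can be sketched as follows. This is a minimal illustration, not the workflow's actual script: the helper name `already_published?` is hypothetical, and it assumes the step compares the gemspec version against the JSON array returned by the public RubyGems versions endpoint (`https://rubygems.org/api/v1/versions/<gem>.json`), where each entry carries a `number` field. A canned response stands in for the HTTP call.

```ruby
require "json"

# Hypothetical helper: returns true when `version` already appears in the
# body returned by the RubyGems versions endpoint for a gem.
def already_published?(versions_json, version)
  JSON.parse(versions_json).any? { |entry| entry["number"] == version }
end

# Canned response standing in for a real HTTP request (assumption).
sample = '[{"number":"0.9.0"},{"number":"1.0.0"}]'

if already_published?(sample, "1.0.0")
  puts "1.0.0 is already on RubyGems, skipping publish"
else
  puts "publishing 1.0.0"
end
```

With a check like this, re-running the workflow for an already published version becomes a no-op rather than a failed push.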

3. Code Generator Templates

Templates exist at templates/sdk-skeleton/ruby/:

pdftract.gemspec.tera
lib/pdftract.rb.tera
lib/pdftract/codegen/types.rb.tera
lib/pdftract/codegen/errors.rb.tera
lib/pdftract/codegen/methods.rb.tera
test/codegen/conformance_test.rb.tera
README.md.tera
GENERATED.tera

Remaining Work (Requires Ruby Installation)

1. Separate Repo Setup

The task specifies that the Ruby SDK should live in its own repository, github.com/jedarden/pdftract-ruby. It currently lives in pdftract-ruby/ within the main pdftract repo.

Action needed:

Create github.com/jedarden/pdftract-ruby repository
Move pdftract-ruby/ contents to the new repo
Update Argo workflow to clone from the correct location

2. Build Verification

Ruby is not installed on the build server:

$ gem build pdftract.gemspec
bash: gem: command not found

Action needed:

Install Ruby 3.2+ and Bundler on the build server, or
Run build and test only in the CI container (ruby:3.2-slim)

3. Conformance Test Verification

The conformance test exists but requires:

A built pdftract binary in PATH
Test fixtures from tests/sdk-conformance/fixtures/

Action needed:

Ensure test fixtures are included in Ruby SDK repo
Run bundle exec rake test:conformance in CI environment

Implementation Details

Error Handling (Exit Code → Exception)

Exit Code	Exception	Description
2	CorruptPdfError	The PDF file is corrupt or invalid
3	EncryptionError	PDF is encrypted, password missing/wrong
4	SourceUnreachableError	Source (file/URL) is unreadable
5	RemoteFetchInterruptedError	Network interrupted during fetch
6	TlsError	TLS certificate validation failed
10	ReceiptVerifyError	Receipt verification failed
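
One way to express the table above in code is a frozen lookup hash. This sketch covers only the six exit codes listed here. The constant `EXIT_CODE_ERRORS`, the helper `error_for`, and the fallback to the base `Pdftract::Error` for unlisted codes are assumptions made for illustration, not the SDK's confirmed implementation.

```ruby
module Pdftract
  # Base class named in the source; the subclasses mirror the table.
  class Error < StandardError; end
  class CorruptPdfError < Error; end
  class EncryptionError < Error; end
  class SourceUnreachableError < Error; end
  class RemoteFetchInterruptedError < Error; end
  class TlsError < Error; end
  class ReceiptVerifyError < Error; end

  # Hypothetical constant: exit code => exception class.
  EXIT_CODE_ERRORS = {
    2  => CorruptPdfError,
    3  => EncryptionError,
    4  => SourceUnreachableError,
    5  => RemoteFetchInterruptedError,
    6  => TlsError,
    10 => ReceiptVerifyError
  }.freeze

  # Hypothetical helper: build the exception for a failed CLI run.
  # Unlisted exit codes fall back to the base Error (assumption).
  def self.error_for(exit_code, stderr = "")
    klass = EXIT_CODE_ERRORS.fetch(exit_code, Error)
    message = stderr.strip
    message = "pdftract exited with code #{exit_code}" if message.empty?
    klass.new(message)
  end
end

begin
  raise Pdftract.error_for(3, "password required\n")
rescue Pdftract::Error => e
  puts "#{e.class}: #{e.message}"
end
```

Because every class inherits from `Pdftract::Error`, callers can rescue the base class to catch any SDK failure, or a specific subclass to handle one case.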

Option Naming Convention

CLI --ocr-language → Ruby ocr_language (snake_case kwargs)

Example:

client.extract(source, ocr_language: "eng", pages: "1-3")
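
The reverse mapping, from snake_case kwargs to CLI flags, might look like the sketch below. The helper name `cli_flags` is hypothetical. Its handling of booleans (`true` emits a bare flag, `false` and `nil` emit nothing) is an assumption, since the source only shows string-valued options.

```ruby
# Hypothetical helper: turn Ruby keyword options into CLI arguments.
def cli_flags(**options)
  options.flat_map do |key, value|
    flag = "--#{key.to_s.tr('_', '-')}"
    case value
    when true       then [flag]            # assumption: bare switch
    when false, nil then []                # assumption: option omitted
    else                 [flag, value.to_s]
    end
  end
end

p cli_flags(ocr_language: "eng", pages: "1-3")
# → ["--ocr-language", "eng", "--pages", "1-3"]
```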

Streaming with Enumerator

The extract_stream and search methods return Ruby Enumerators that lazily parse NDJSON:

client.extract_stream(source).each do |page|
  puts "Page #{page.page}: #{page.spans.map(&:text).join}"
end
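
For illustration, lazy NDJSON parsing can be built on `Enumerator.new`, which only reads the next line when the consumer asks for another element. This sketch is not the SDK's code: the helper `each_ndjson` is hypothetical, it yields plain hashes rather than the SDK's model objects, and a `StringIO` stands in for the CLI's stdout pipe. The field names follow the example above.

```ruby
require "json"
require "stringio"

# Hypothetical helper: wrap an IO-like object in an Enumerator that
# parses one JSON document per line, on demand.
def each_ndjson(io)
  Enumerator.new do |yielder|
    io.each_line do |line|
      line = line.strip
      next if line.empty?          # tolerate blank lines
      yielder << JSON.parse(line)
    end
  end
end

# Stand-in for the subprocess output (assumption).
ndjson = <<~NDJSON
  {"page": 1, "spans": [{"text": "Hello "}, {"text": "world"}]}
  {"page": 2, "spans": [{"text": "Second page"}]}
NDJSON

each_ndjson(StringIO.new(ndjson)).each do |page|
  text = page["spans"].map { |span| span["text"] }.join
  puts "Page #{page["page"]}: #{text}"
end
```

Because the Enumerator pulls lines one at a time, calling `.first` or `.take(1)` on it stops reading after the first page instead of parsing the whole stream.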

Source Handling

Three source types:

PathSource - local filesystem path
URLSource - remote URL (passed as --url <url>)
BytesSource - in-memory bytes (written to temp file, cleaned up)
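
The temp-file handling described for BytesSource can be sketched with the standard library's `Tempfile`. The helper name `with_bytes_as_path` and the block-based shape are assumptions made for illustration. The point is that `ensure` removes the file even if the block raises.

```ruby
require "tempfile"

# Hypothetical helper: write in-memory bytes to a temp file, yield its
# path, and delete the file afterwards, even on error.
def with_bytes_as_path(bytes)
  file = Tempfile.new(["pdftract", ".pdf"])
  file.binmode                 # PDFs are binary, avoid newline translation
  file.write(bytes)
  file.flush                   # make sure the bytes are on disk before use
  yield file.path
ensure
  file&.close!                 # close and unlink in one step
end

seen_path = nil
with_bytes_as_path("%PDF-1.7\n") do |path|
  seen_path = path
  puts "temp file holds #{File.size(path)} bytes"
end
puts "still exists afterwards? #{File.exist?(seen_path)}"
```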

Recommendations

Since this is v1.1+ deferred:

Keep current structure - The in-tree pdftract-ruby/ is fine for now
Complete on v1.1+ release - When the Ruby SDK becomes a priority:
- Create separate github.com/jedarden/pdftract-ruby repo
- Set up Ruby build environment in CI
- Run full conformance test suite
- Publish to RubyGems

Files Modified

None - this is a documentation bead. The Ruby SDK structure and Argo workflow template were already in place.

Commit

docs(pdftract-45vo7): document Ruby SDK completion status

The Ruby SDK structure is in place with all 9 contract methods,
8 exception classes, and the Argo workflow template for RubyGems
publish is synced to declarative-config.

This is a v1.1+ deferred task. Ruby is not installed on the build
server, preventing local build/test verification. The SDK should
be moved to a separate repo (github.com/jedarden/pdftract-ruby)
when the v1.1+ release wave begins.

Verification note: notes/pdftract-45vo7.md
