jedarden 0ab3b42e13 Add sandbox isolation: CLAUDE_CONFIG_DIR, transcript forwarding, isolation tests

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-06-07 10:49:00 -04:00


claude-print Plan

Overview

Single Rust binary that is a drop-in replacement for claude -p. It drives the Claude Code interactive TUI via PTY, extracts the response via the Stop hook and JSONL transcript, and emits claude -p-compatible output — all while billing against the subscription (cc_entrypoint=cli) rather than the Agent SDK credit pool.

Background

Starting June 15, 2026, Anthropic moves claude -p (headless) usage into its own monthly credit pool. Only the interactive TUI (cc_entrypoint=cli) continues to draw from the unlimited subscription. claude-print wraps the TUI in a PTY so callers get claude -p wire-compatible output while billing against the subscription.

The billing classification is determined by isatty(stdout) inside the claude binary at startup:

PTY slave as stdout → isatty() returns true → TUI mode → cc_entrypoint=cli → subscription
Pipe as stdout → isatty() returns false → print mode → cc_entrypoint=sdk-cli → credit pool
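
As a rough illustration of the rule above, the same check can be written with Rust's std::io::IsTerminal. This models the classification as this plan describes it; it is not code from the claude binary, and entrypoint_for / current_entrypoint are hypothetical names.

```rust
use std::io::IsTerminal;

/// Hypothetical model of the classification described above:
/// a terminal on stdout maps to "cli", anything else to "sdk-cli".
fn entrypoint_for(stdout_is_tty: bool) -> &'static str {
    if stdout_is_tty {
        "cli"
    } else {
        "sdk-cli"
    }
}

/// Applies the rule to this process's own stdout.
fn current_entrypoint() -> &'static str {
    entrypoint_for(std::io::stdout().is_terminal())
}
```

A process whose stdout is a PTY slave sees is_terminal() return true, which is the condition claude-print arranges for the child.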

Delivery

Single statically-linked binary. No Python, no runtime dependencies, no pip packages.

claude-print          # the binary
install.sh            # copies binary to ~/.local/bin/, installs NEEDLE agent config

Built with:

cargo build --release --target x86_64-unknown-linux-musl   # fully static, no libc dep

Distribution: GitHub Release artifact via claude-print-ci Argo WorkflowTemplate (same pattern as NEEDLE, SIGIL, ARMOR).

Architecture

caller
  │  prompt (stdin, arg, or --input-file)
  ▼
claude-print (single Rust binary)
  ├── CLI parser       flags forwarded to claude subprocess (clap)
  ├── Hook installer   per-run temp dir: settings.json + hook.sh + stop.fifo
  ├── PTY spawner      nix::pty::openpty() + fork() + login_tty()
  ├── Event loop       poll() on master_fd; dispatches to:
  │     ├── Terminal emu   responds to DA1/DA2/DSR/XTVERSION/window-size probes
  │     ├── Startup seq    phase 1: trust dismiss  phase 2: bracketed-paste inject
  │     └── FIFO poller    polls stop.fifo (non-blocking) until Stop hook fires
  ├── Transcript rdr   JSONL parse → final text + token counts (retry loop)
  ├── Emitter          text / json / stream-json to stdout
  └── Cleanup          FIFO, temp dir, master_fd, waitpid

Sandbox Isolation

The inner claude process must not:

Register itself in the live session registry (~/.claude/sessions/) where ccdash and trail-boss can see it
Fire the user's global hooks (ccdash session tracking, trail-boss telemetry emitter) on Start/Stop/PermissionRequest
Pollute ~/.claude/history.jsonl with headless prompts

But its output (transcript JSONL + token counts) must be forwarded to ~/.claude/projects/ so the normal stats pipeline can aggregate usage.

Mechanism: `CLAUDE_CONFIG_DIR`

CLAUDE_CONFIG_DIR is confirmed present in the Claude Code binary. When set, Claude Code uses that directory instead of ~/.claude for all file I/O:

CLAUDE_CONFIG_DIR → sessions/, projects/, history.jsonl, settings.json, stats-cache.json, etc.

claude-print sets CLAUDE_CONFIG_DIR to a subdirectory inside its per-run temp dir before execvp:

$TMPDIR/claude-print-<pid>-<rand>/      ← tempfile::TempDir root
├── claude-home/                         ← CLAUDE_CONFIG_DIR value
│   ├── .credentials.json → ~/.claude/.credentials.json  (symlink)
│   ├── settings.json                    ← Stop hook only
│   ├── sessions/                        ← subprocess session files (isolated)
│   └── projects/
│       └── <cwd-slug>/
│           └── <session-id>.jsonl       ← subprocess transcript
├── hook.sh
└── stop.fifo

The credentials symlink gives the child access to OAuth auth without copying secrets into the temp dir.

What the Inner Process Writes (Sandbox)

File	Written by child	Disposition after session
`sessions/<pid>.json`	Yes	discarded (in temp dir, cleaned up)
`projects/<slug>/<id>.jsonl`	Yes	copied to `~/.claude/projects/<slug>/<id>.jsonl`
`history.jsonl`	Yes	discarded (headless prompts not in interactive history)
`stats-cache.json`	Yes	discarded (rebuilt from projects/)

Transcript Forwarding

After the Stop hook fires and the transcript is read:

Ensure ~/.claude/projects/<cwd-slug>/ exists (create if absent)
Copy $CLAUDE_CONFIG_DIR/projects/<cwd-slug>/<session-id>.jsonl to ~/.claude/projects/<cwd-slug>/<session-id>.jsonl
The stats cache rebuilds naturally on next interactive Claude Code startup — the transcript appears as a normal past session

This makes claude-print sessions visible in /status usage stats, preserves the billing audit trail, and lets the user see past prompts via /resume <session-id>.

Hooks Not Inherited

CLAUDE_CONFIG_DIR/settings.json contains only the per-run Stop hook. The user's ~/.claude/settings.json is not read. Therefore:

ccdash session tracking does not fire
trail-boss does not receive these session events
No PermissionRequest hook fires (the REPL trust dialog is dismissed via PTY instead)

Crate Dependencies

Crate	Purpose
`clap` (derive)	CLI argument parsing
`nix`	`openpty`, `fork`, `login_tty`, `setsid`, `ioctl`, `poll`, `mkfifo`, `signal`
`serde` + `serde_json`	JSONL parsing with schema-tolerant deserialization
`uuid`	Generate session IDs (for `--session-id` pre-assignment)
`tempfile`	Per-run temp directory with guaranteed cleanup

No async runtime. The PTY event loop uses nix::poll::poll() synchronously. stream-json output uses a separate thread tailing the transcript file.

Components

1. CLI Interface

Drop-in for claude -p:

Flag	Description
`prompt` (positional)	Prompt string; mutually exclusive with `--input-file` and stdin
`--input-file FILE`	Read prompt from file
`--model MODEL`	Forwarded to claude (default: `claude-sonnet-4-6`)
`--max-turns N`	Forwarded to claude (default: 30)
`--output-format FORMAT`	`text` (default), `json`, `stream-json`
`--allowedTools LIST`	Comma-separated, forwarded
`--disallowedTools LIST`	Forwarded
`--dangerously-skip-permissions`	Forwarded
`--timeout SECS`	Wall-clock timeout (default: 3600)
`--claude-binary PATH`	Override claude binary path (default: resolves `claude` from PATH)
`--version`	Print `claude-print <version> (wrapping claude <version>)` and exit
`--verbose`	Write timing traces to stderr

Stdin is accepted as the prompt when it is not a TTY and neither a positional prompt nor --input-file is given.
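
The prompt-source rules could be resolved along these lines. This is a sketch under assumptions: PromptSource and resolve_prompt_source are hypothetical names, and it assumes an explicit source (argument or --input-file) takes precedence over piped stdin, with only argument plus --input-file treated as a conflict.

```rust
use std::path::PathBuf;

/// Hypothetical representation of where the prompt comes from.
#[derive(Debug, PartialEq)]
enum PromptSource {
    Arg(String),
    File(PathBuf),
    Stdin,
}

/// Decide the prompt source from the parsed CLI values.
/// Assumption: explicit sources win over stdin; arg + file is an error.
fn resolve_prompt_source(
    positional: Option<String>,
    input_file: Option<PathBuf>,
    stdin_is_tty: bool,
) -> Result<PromptSource, String> {
    match (positional, input_file) {
        (Some(_), Some(_)) => {
            Err("prompt given both as an argument and via --input-file".to_string())
        }
        (Some(prompt), None) => Ok(PromptSource::Arg(prompt)),
        (None, Some(file)) => Ok(PromptSource::File(file)),
        // Stdin only counts as a prompt when something is piped in.
        (None, None) if !stdin_is_tty => Ok(PromptSource::Stdin),
        (None, None) => {
            Err("no prompt: pass an argument, --input-file, or pipe stdin".to_string())
        }
    }
}
```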

Exit codes:

0 — success
1 — assistant error (is_error: true in transcript)
2 — internal error (PTY spawn, hook setup, parse failure)
124 — timeout exceeded
130 — interrupted (SIGINT)
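
A minimal sketch of how the exit codes above might be kept in one place. Outcome is a hypothetical type; the subtype strings are the ones used in the error result in §9.

```rust
/// Hypothetical terminal state of one claude-print run.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Outcome {
    Success,
    AssistantError,
    InternalError,
    Timeout,
    Interrupted,
}

impl Outcome {
    /// Process exit code, following the list above.
    fn exit_code(self) -> i32 {
        match self {
            Outcome::Success => 0,
            Outcome::AssistantError => 1,
            Outcome::InternalError => 2,
            Outcome::Timeout => 124,
            Outcome::Interrupted => 130,
        }
    }

    /// Value for the "subtype" field of the emitted result object.
    fn subtype(self) -> &'static str {
        match self {
            Outcome::Success => "success",
            Outcome::AssistantError => "assistant_error",
            Outcome::InternalError => "internal_error",
            Outcome::Timeout => "timeout",
            Outcome::Interrupted => "interrupted",
        }
    }
}
```

Keeping both mappings on one enum makes it harder for the emitted subtype and the exit code to drift apart.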

2. Hook Installer / Sandbox Builder

Creates $TMPDIR/claude-print-<pid>-<rand>/ via tempfile::Builder with this layout:

<temp>/
├── claude-home/                     ← CLAUDE_CONFIG_DIR (set in child env)
│   ├── .credentials.json            ← symlink → ~/.claude/.credentials.json
│   └── settings.json                ← Stop hook only (no user hooks)
├── hook.sh                          ← executed by Claude Code on Stop
└── stop.fifo                        ← POSIX named pipe for hook→parent IPC

claude-home/settings.json — the only settings file the child reads:

{
  "hooks": {
    "Stop": [{
      "hooks": [{"type": "command", "command": "<temp>/hook.sh", "timeout": 10}]
    }]
  }
}

hook.sh (executed by Claude Code on Stop; receives payload on stdin):

#!/bin/sh
cat > <temp>/stop.fifo

stop.fifo — POSIX named pipe created with nix::unistd::mkfifo().

Child process environment additions:

CLAUDE_CONFIG_DIR=<temp>/claude-home

CLAUDE_CONFIG_DIR is set in the child's env via the fork/exec path — it is not set in the parent process. This ensures the parent's own Claude Code session (if any) is unaffected.

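tempfile::TempDir removes the directory when it is dropped, which covers normal exit, early return, and unwinding panics. Destructors do not run on std::process::exit or when the process is killed by a signal, so the exit code should be returned from main after the TempDir has been dropped. Transcript copying (see Sandbox Isolation §) runs before the temp dir is dropped.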

The user's ~/.claude/settings.json is never touched.
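
The layout above could be built roughly as follows using only the standard library. This is a sketch, not the planned implementation: build_sandbox is a hypothetical name, the root directory is passed in rather than created with tempfile, FIFO creation is omitted (std has no mkfifo; the plan uses nix::unistd::mkfifo), and paths are assumed to contain no quotes or backslashes since they are interpolated without escaping.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::{symlink, PermissionsExt};
use std::path::{Path, PathBuf};

/// Build the sandbox files under `root` and return the directory
/// intended as the CLAUDE_CONFIG_DIR value for the child.
fn build_sandbox(root: &Path, real_credentials: &Path) -> io::Result<PathBuf> {
    let claude_home = root.join("claude-home");
    fs::create_dir_all(&claude_home)?;

    // Symlink instead of copy, so no secret is duplicated into the temp dir.
    symlink(real_credentials, claude_home.join(".credentials.json"))?;

    // hook.sh forwards the Stop payload from stdin into the FIFO.
    let hook = root.join("hook.sh");
    let fifo = root.join("stop.fifo");
    fs::write(&hook, format!("#!/bin/sh\ncat > '{}'\n", fifo.display()))?;
    fs::set_permissions(&hook, fs::Permissions::from_mode(0o755))?;

    // settings.json registers only the per-run Stop hook.
    let settings = format!(
        r#"{{"hooks":{{"Stop":[{{"hooks":[{{"type":"command","command":"{}","timeout":10}}]}}]}}}}"#,
        hook.display()
    );
    fs::write(claude_home.join("settings.json"), settings)?;

    Ok(claude_home)
}
```

The returned path would be placed in the child's environment only, for example by setting it after fork() and before exec.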

3. PTY Spawner

use nix::pty::{openpty, OpenptyResult};
use nix::unistd::{execvp, fork, ForkResult, login_tty};

let OpenptyResult { master, slave } = openpty(None, None)?;

// Set window size on master before fork.
// Borrow here: master is still needed by both branches below.
set_winsize(&master, rows, cols);

match unsafe { fork()? } {
    ForkResult::Child => {
        drop(master);
        login_tty(slave)?;   // setsid + TIOCSCTTY + dup2(slave, 0/1/2)
        // claude_path: CString (PATH lookup or --claude-binary); args: Vec<CString>.
        // execvp returns only on failure.
        execvp(&claude_path, &args)?;
        unreachable!()
    }
    ForkResult::Parent { child } => {
        drop(slave);
        run_event_loop(master, child, ...)
    }
}

login_tty(slave) follows login_tty(3) semantics: setsid() → TIOCSCTTY → dup2(slave, 0/1/2) → close(slave).

Window size is read from /dev/tty via TIOCGWINSZ; if that fails, it falls back to 220 columns × 50 rows.

Cleanup on any exit path: SIGTERM → 2 s → SIGKILL → waitpid.

4. Event Loop

Single poll() call on two fds, with the poll timeout driving the wall-clock check:

master_fd   POLLIN → read PTY output, dispatch to TerminalEmu + StartupSeq
stop_fifo   POLLIN → Stop hook fired; read payload, begin transcript extraction
timer       —      → check wall-clock timeout

TerminalEmu runs on every chunk of PTY output, scanning for escape sequences and queueing responses. Responses are written to master_fd on the next writable poll.

StartupSeq tracks phase (Waiting / TrustDismiss / PromptInjected) and transitions based on heuristics (see §6).

FifoPoller opens stop.fifo for reading with O_NONBLOCK and watches it for data via the same poll() call.

5. Terminal Emulator (Ink probe responder)

Ink sends DEC terminal queries at startup and hangs if unanswered. The emulator scans raw bytes for known probe patterns:

Probe bytes	Response bytes	Notes
`ESC [ c` or `ESC [ 0 c`	`ESC [ ? 6 c`	DA1
`ESC [ > c` or `ESC [ > 0 c`	`ESC [ > 0 ; 0 ; 0 c`	DA2
`ESC [ 6 n`	`ESC [ 1 ; 1 R`	DSR cursor position
`ESC [ > q`	`ESC P > \| claude-print ESC \`	XTVERSION (DCS string)
`ESC [ 1 8 t`	`ESC [ 8 ; <rows> ; <cols> t`	Window size

Version-resilience rule: Unknown escape sequences (ESC [ ... <letter> not in the table above) are silently discarded — never treated as an error. If Ink adds new probe types in future versions, they are ignored and the session proceeds via the startup sequencer timeout.

Each probe type is acknowledged at most once per session (dedup bitmask).
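
One way the probe table, the dedup bitmask, and the discard rule could fit together is sketched below. ProbeResponder and its methods are hypothetical names, and the parser is deliberately simplified: it only recognises CSI sequences (ESC [ ... final byte) and carries an incomplete trailing sequence over to the next read.

```rust
/// Hypothetical probe responder for the table above.
struct ProbeResponder {
    answered: u8,     // dedup bitmask, one bit per probe type
    pending: Vec<u8>, // incomplete sequence carried across reads
    rows: u16,
    cols: u16,
}

impl ProbeResponder {
    fn new(rows: u16, cols: u16) -> Self {
        Self { answered: 0, pending: Vec::new(), rows, cols }
    }

    /// Feed one chunk of PTY output; returns bytes to write back to the master fd.
    fn feed(&mut self, chunk: &[u8]) -> Vec<u8> {
        let mut buf = std::mem::take(&mut self.pending);
        buf.extend_from_slice(chunk);
        let mut out = Vec::new();
        let mut i = 0;
        while i < buf.len() {
            if buf[i] != 0x1b {
                i += 1;
                continue;
            }
            if i + 1 >= buf.len() {
                break; // lone ESC at the end of the chunk: wait for more bytes
            }
            if buf[i + 1] != b'[' {
                i += 1; // not a CSI sequence
                continue;
            }
            let body = i + 2;
            // A CSI sequence ends at the first byte in 0x40..=0x7e.
            match buf[body..].iter().position(|b| (0x40..=0x7e).contains(b)) {
                Some(off) => {
                    let end = body + off;
                    self.respond(&buf[body..end], buf[end], &mut out);
                    i = end + 1;
                }
                None => break, // incomplete: keep it for the next read
            }
        }
        self.pending = buf[i..].to_vec();
        out
    }

    /// Queue the reply for a known probe once; unknown sequences are dropped.
    fn respond(&mut self, params: &[u8], final_byte: u8, out: &mut Vec<u8>) {
        let (bit, reply): (u8, Vec<u8>) = match (params, final_byte) {
            (b"" | b"0", b'c') => (1 << 0, b"\x1b[?6c".to_vec()),
            (b">" | b">0", b'c') => (1 << 1, b"\x1b[>0;0;0c".to_vec()),
            (b"6", b'n') => (1 << 2, b"\x1b[1;1R".to_vec()),
            (b">", b'q') => (1 << 3, b"\x1bP>|claude-print\x1b\\".to_vec()),
            (b"18", b't') => (
                1 << 4,
                format!("\x1b[8;{};{}t", self.rows, self.cols).into_bytes(),
            ),
            _ => return,
        };
        if self.answered & bit == 0 {
            self.answered |= bit;
            out.extend_from_slice(&reply);
        }
    }
}
```

A real implementation would probably also cap the size of the pending buffer so that malformed output cannot grow it without bound.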

6. Startup Sequencer

Phase 1 — Trust/welcome dismiss:

The trust dialog asks the user to confirm before allowing tool use. Detection uses keyword scanning, not exact string match, to survive UI text changes across Claude Code versions:

If any output line contains two or more of: trust, Allow, continue, folder, permission, proceed → send \r immediately
Fallback: after 0.8 s with no new PTY bytes and ≥ 200 bytes received total → send \r (covers any welcome/confirmation prompt)
Hard timeout 45 s with zero bytes → exit 2 (binary not found or hung)
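
The Phase 1 heuristics might look like this. It is a sketch under assumptions: matching is done case-insensitively on substrings (the plan lists the keywords in mixed case and does not say which), and fallback_due is a hypothetical helper name.

```rust
use std::time::Duration;

/// Keywords from the list above, lowercased for case-insensitive matching.
const TRUST_KEYWORDS: [&str; 6] =
    ["trust", "allow", "continue", "folder", "permission", "proceed"];

/// True when a single output line contains two or more trust keywords.
fn should_dismiss(line: &str) -> bool {
    let lower = line.to_lowercase();
    TRUST_KEYWORDS
        .iter()
        .filter(|kw| lower.contains(**kw))
        .count()
        >= 2
}

/// Fallback rule: at least 200 bytes seen and the PTY idle for 0.8 s.
fn fallback_due(bytes_received: usize, idle: Duration) -> bool {
    bytes_received >= 200 && idle >= Duration::from_millis(800)
}
```

Requiring two keywords keeps an ordinary line such as "Press Enter to continue" from triggering the dismiss on its own; that case is left to the idle fallback.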

Phase 2 — Prompt injection:

After Phase 1 CR, wait until PTY is idle for 2.0 s (REPL re-renders)
Send via bracketed paste: \x1b[200~<prompt>\x1b[201~\r
Bracketed paste treats embedded \n as literals (no premature Enter)
Prompts > 32 KB: write to $TMPDIR/claude-print-.../prompt.txt; send /read <path>\r
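
The framing for Phase 2 is small enough to show directly. The function and constant names below are hypothetical; the byte sequences and the 32 KB threshold are the ones listed above.

```rust
const PASTE_START: &[u8] = b"\x1b[200~";
const PASTE_END: &[u8] = b"\x1b[201~";
/// Prompts larger than this go through the file relay instead.
const MAX_INLINE_PROMPT: usize = 32 * 1024;

/// Wrap the prompt in bracketed-paste markers and append the submitting CR.
fn bracketed_paste(prompt: &str) -> Vec<u8> {
    let mut out = Vec::with_capacity(prompt.len() + PASTE_START.len() + PASTE_END.len() + 1);
    out.extend_from_slice(PASTE_START);
    out.extend_from_slice(prompt.as_bytes()); // embedded \n stays literal inside the paste
    out.extend_from_slice(PASTE_END);
    out.push(b'\r');
    out
}

/// True when the prompt exceeds the inline limit.
fn needs_file_relay(prompt: &str) -> bool {
    prompt.len() > MAX_INLINE_PROMPT
}
```

This sketch does not handle a prompt that itself contains the paste-end sequence, which would close the paste early; whether to strip or reject such input is left open here.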

7. Stop Poller

Reads from stop.fifo (non-blocking open; polled via the main poll() loop). On data available:

Read one line → parse JSON with lenient schema (all fields Option<T>)
Extract session_id and transcript_path (either direct or derived from session_id + cwd)
Signal the event loop to exit
Send \x1b[201~\r/exit\r to PTY child to trigger graceful shutdown

If Stop never fires within --timeout seconds: emit timeout result, SIGTERM child, exit 124.

8. Transcript Reader

On Stop receipt:

1. Open transcript_path (derived if not in payload)
2. Scan for unique API turns (usage-fingerprint dedup)
3. Collect final turn's text blocks
4. Sum token counts across all unique turns
5. Retry loop if final_text is empty (race window): 40 × 50 ms
6. Fallback to last_assistant_message from Stop payload if retries exhausted
7. If both empty: is_error=true, exit 1
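
Steps 5 to 7 can be sketched as a retry helper plus a fallback. The names are hypothetical, and the attempt count and delay are parameters so the logic can be exercised quickly; the plan's values would be 40 attempts and a 50 ms delay (about 2 s in total).

```rust
use std::thread::sleep;
use std::time::Duration;

/// Call `read` up to `attempts` times, sleeping `delay` between tries,
/// and return the first non-empty result.
fn retry_until_nonempty<F>(attempts: u32, delay: Duration, mut read: F) -> Option<String>
where
    F: FnMut() -> String,
{
    for attempt in 0..attempts {
        let text = read();
        if !text.is_empty() {
            return Some(text);
        }
        if attempt + 1 < attempts {
            sleep(delay);
        }
    }
    None
}

/// Transcript text first, then the Stop payload's last_assistant_message.
/// None means both were empty, which the plan maps to is_error=true, exit 1.
fn final_text<F>(
    attempts: u32,
    delay: Duration,
    read_transcript: F,
    last_assistant_message: Option<String>,
) -> Option<String>
where
    F: FnMut() -> String,
{
    retry_until_nonempty(attempts, delay, read_transcript)
        .or_else(|| last_assistant_message.filter(|m| !m.is_empty()))
}
```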

Token aggregation (usage dedup):

Multiple consecutive assistant events share identical message.usage objects (streaming chunks). Count a new turn only when (input_tokens, output_tokens, cache_creation_input_tokens, cache_read_input_tokens) changes:

let mut prev_key: Option<UsageKey> = None;
let mut turns: Vec<Usage> = vec![];
for event in parse_events(path) {
    if let Event::Assistant { message } = event {
        let key = UsageKey::from(&message.usage);
        if Some(&key) != prev_key.as_ref() {
            turns.push(message.usage.clone());
            prev_key = Some(key);
        }
        // accumulate text blocks from current chunk
    }
}
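
The loop above collects unique turns but does not show the summation from step 4. A self-contained sketch of both, using plain structs instead of the serde types (the aggregate function and the array-based key are assumptions made for this example):

```rust
/// The four counters the plan aggregates; absent or null values count as 0.
#[derive(Debug, Clone, Copy, Default, PartialEq)]
struct Usage {
    input_tokens: Option<u64>,
    output_tokens: Option<u64>,
    cache_creation_input_tokens: Option<u64>,
    cache_read_input_tokens: Option<u64>,
}

impl Usage {
    /// Dedup fingerprint: the four counters with None treated as 0.
    fn key(&self) -> [u64; 4] {
        [
            self.input_tokens.unwrap_or(0),
            self.output_tokens.unwrap_or(0),
            self.cache_creation_input_tokens.unwrap_or(0),
            self.cache_read_input_tokens.unwrap_or(0),
        ]
    }
}

/// Returns (number of unique turns, per-field totals in key() order).
/// A new turn is counted only when the fingerprint differs from the previous event.
fn aggregate(events: &[Usage]) -> (usize, [u64; 4]) {
    let mut prev: Option<[u64; 4]> = None;
    let mut turns = 0;
    let mut totals = [0u64; 4];
    for usage in events {
        let key = usage.key();
        if prev != Some(key) {
            turns += 1;
            for (total, value) in totals.iter_mut().zip(key) {
                *total += value;
            }
            prev = Some(key);
        }
    }
    (turns, totals)
}
```

Note that two genuinely distinct consecutive turns with identical counters would be merged by this rule; that is a property of the fingerprint approach itself.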

Schema tolerance (serde config for all JSONL structs):

#[derive(Clone, Deserialize, Default)]   // Clone: the dedup loop stores message.usage.clone()
#[serde(default)]          // missing fields → Default::default()
pub struct Usage {
    pub input_tokens:                Option<u64>,
    pub output_tokens:               Option<u64>,
    pub cache_creation_input_tokens: Option<u64>,
    pub cache_read_input_tokens:     Option<u64>,
    // Unknown fields are silently ignored (no deny_unknown_fields)
}

#[derive(Deserialize)]
#[serde(tag = "type", rename_all = "kebab-case")]
pub enum Event {
    Assistant { message: AssistantMessage },
    User { message: UserMessage },
    Result(ResultEvent),
    #[serde(other)]         // any unknown type → skip, no error
    Unknown,
}

#[derive(Deserialize)]
#[serde(tag = "type", rename_all = "snake_case")]   // block types are snake_case, e.g. "tool_use"
pub enum ContentBlock {
    Text { text: String },
    ToolUse { name: String },
    Thinking { thinking: String },
    #[serde(other)]
    Unknown,
}

8b. Transcript Forwarding

After extraction completes (regardless of success or failure):

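let src = sandbox_claude_home
    .join("projects")
    .join(&cwd_slug)
    .join(format!("{}.jsonl", session_id));
let dst_dir = real_claude_dir.join("projects").join(&cwd_slug);
let dst = dst_dir.join(format!("{}.jsonl", session_id));
// Best-effort: a failure is logged but must not change the exit code, so no `?` here.
if let Err(e) = std::fs::create_dir_all(&dst_dir).and_then(|_| std::fs::copy(&src, &dst)) {
    eprintln!("claude-print: warning: transcript forwarding failed: {e}");
}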

real_claude_dir is $HOME/.claude (not CLAUDE_CONFIG_DIR, which is the sandbox). The copy runs before the TempDir is dropped.

After the copy, the session appears in ~/.claude/projects/ exactly like any other Claude Code session. It is visible in /status usage stats and resumable via claude --resume <session-id>.

If the copy fails (disk full, permissions): log a warning to stderr but do not change the exit code. Forwarding is best-effort; the exit code reflects extraction only.

9. Emitter

text (default): {response_text}\n

json:

{
  "type": "result",
  "subtype": "success",
  "is_error": false,
  "result": "<response text>",
  "session_id": "<uuid>",
  "num_turns": 3,
  "duration_ms": 4200,
  "cost_usd": 0,
  "claude_version": "2.1.168",
  "usage": {
    "input_tokens": 6224,
    "output_tokens": 43079,
    "cache_creation_input_tokens": 107205,
    "cache_read_input_tokens": 4066110
  }
}

stream-json: Spawns a reader thread that tails the transcript JSONL from prompt_injected_at timestamp, forwarding each new raw event line to stdout as it is written by Claude Code. After Stop fires, drains remaining lines. Output is raw JSONL (one JSON object per line), compatible with claude -p --output-format stream-json.
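
The tailing step could be built on a helper like the one below, which the reader thread would call in a loop and print from. It is a sketch: read_new_lines is a hypothetical name, and it tracks a byte offset rather than the prompt_injected_at timestamp mentioned above. A trailing partial line is left unread until its newline arrives, so only complete JSON objects are forwarded.

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};
use std::path::Path;

/// Return the complete lines appended since `*offset` and advance the
/// offset past them. A partial last line is not consumed.
fn read_new_lines(path: &Path, offset: &mut u64) -> io::Result<Vec<String>> {
    let mut file = File::open(path)?;
    file.seek(SeekFrom::Start(*offset))?;
    let mut buf = Vec::new();
    file.read_to_end(&mut buf)?;

    // Consume only up to and including the last newline.
    let complete = match buf.iter().rposition(|&b| b == b'\n') {
        Some(idx) => idx + 1,
        None => return Ok(Vec::new()),
    };
    *offset += complete as u64;

    let text = String::from_utf8_lossy(&buf[..complete]);
    let lines: Vec<String> = text.lines().map(str::to_owned).collect();
    Ok(lines)
}
```

After Stop fires, one final call drains whatever complete lines remain.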

claude_version field (new, not in claude -p wire format): included in the json result object for version-change debugging (text output carries only the response, and stream-json forwards raw transcript lines). Callers that parse strictly by field name are unaffected by the extra field.

Error result:

{"type": "result", "subtype": "timeout|interrupted|internal_error|assistant_error",
 "is_error": true, "error_message": "..."}

10. NEEDLE Agent Config

claude-print.yaml → ~/.needle/agents/:

name: claude-print
description: Claude Code interactive mode — subscription billing (cc_entrypoint=cli)
agent_cli: claude-print
version_command: "claude-print --version"
input_method:
  method: stdin
invoke_template: "cd {workspace} && claude-print --model {model} --max-turns 30 --dangerously-skip-permissions"
timeout_secs: 3600
provider: anthropic
model: claude-sonnet-4-6
output_transform: needle-transform-claude
cost:
  type: use_or_lose

11. Install Script

install.sh:

Detect arch (uname -m) and select binary from release assets
Verify claude is on $PATH
Install binary to ~/.local/bin/claude-print (mode 755)
Install claude-print.yaml to ~/.needle/agents/ (mode 644, skipped if NEEDLE not installed)
Run claude-print --version to confirm
Print detected claude version for version-compat record

Data Models

Stop Hook Payload (received from Claude Code — all fields optional)

{
  "hook_event_name": "Stop",
  "session_id": "abc123",
  "transcript_path": "/home/coding/.claude/projects/.../abc123.jsonl",
  "last_assistant_message": "...",
  "cwd": "/home/coding/..."
}

transcript_path absent → derive from session_id + cwd. last_assistant_message absent → retry loop only (no string fallback).

JSONL Transcript — Full Usage Object (as observed v2.1.168)

{
  "input_tokens": 6178,
  "output_tokens": 295,
  "cache_creation_input_tokens": 825,
  "cache_read_input_tokens": 26442,
  "server_tool_use": {"web_search_requests": 0, "web_fetch_requests": 0},
  "service_tier": "standard",
  "cache_creation": {"ephemeral_5m_input_tokens": 0, "ephemeral_1h_input_tokens": 825},
  "inference_geo": "",
  "iterations": [{"input_tokens": 6178, "output_tokens": 295, ...}],
  "speed": "standard"
}

Only input_tokens, output_tokens, cache_creation_input_tokens, cache_read_input_tokens are aggregated. All other fields ignored.

Emitted Result (--output-format json)

{
  "type": "result",
  "subtype": "success",
  "is_error": false,
  "result": "response text",
  "session_id": "abc123",
  "num_turns": 1,
  "duration_ms": 4200,
  "cost_usd": 0,
  "claude_version": "2.1.168",
  "usage": {
    "input_tokens": 1240,
    "output_tokens": 380,
    "cache_creation_input_tokens": 0,
    "cache_read_input_tokens": 900
  }
}

Error Handling

Condition	Detection	Action	Exit
`claude` binary not found	PATH lookup fails at startup	emit error	2
Credentials file missing	symlink target absent	emit error	2
PTY open fails	`openpty()` returns Err	emit error	2
Sandbox build fails	temp dir / mkfifo / symlink error	emit error	2
Transcript copy fails	I/O error on forwarding	warning to stderr, continue	—
No PTY output within 45 s	startup timer	kill child, emit error	2
Child exits before Stop	`waitpid` returns	emit error with child exit code	2
Wall-clock timeout	poll timer	SIGTERM child, emit timeout	124
Stop hook never fires	FIFO timeout	SIGTERM child, emit timeout	124
SIGINT	signal handler	SIGTERM child, emit interrupt result	130
Transcript empty + fallback empty	retry exhausted	emit error	1
`is_error: true` in transcript	result event or error block	emit error result	1
Rate limit / API error	error content in transcript	emit error result	1

Implementation Phases

Phase 1: Crate scaffold — Cargo.toml with pinned deps, src/main.rs with CLI parsing (clap), --version output including detected claude --version
Phase 2: Sandbox builder + PTY spawner — temp dir, CLAUDE_CONFIG_DIR subdirectory, credentials symlink, sandboxed settings.json, hook.sh, mkfifo, then nix fork/exec with CLAUDE_CONFIG_DIR in child env, window-size probe, login_tty, SIGTERM/SIGKILL cleanup, waitpid
Phase 3: Event loop — poll() on master_fd + FIFO fd + timeout; read buffer; EIO detection
Phase 4: Terminal emulator — probe scanner, response table, dedup bitmask; unknown-probe passthrough
Phase 5: Startup sequencer — keyword-based trust dismiss, idle-gap timing, bracketed paste injection, large-prompt file relay
Phase 6: Hook installer — tempfile::TempDir, write settings.json and hook.sh, mkfifo, FIFO polling
Phase 7: Transcript reader — JSONL parse with lenient serde, usage dedup, text extraction, retry loop, Stop-payload fallback, path derivation
Phase 8: Emitter — text/json/stream-json formats, claude_version field, error result objects, exit code mapping
Phase 9: NEEDLE integration — claude-print.yaml, install.sh, claude-print-ci WorkflowTemplate in declarative-config
Phase 10: Tests — unit + mock PTY + version-resilience (see Testing section)
Phase 11: CI — claude-print-ci Argo WorkflowTemplate: fmt + clippy + test + release binary

Testing

Unit Tests (`src/` inline + `tests/`)

Terminal probe responder (tests/terminal.rs):

DA1 bytes in → ESC[?6c response bytes out
DA2 bytes in → ESC[>0;0;0c out
DSR bytes in → ESC[1;1R out
XTVERSION bytes in → correct DCS string out
Window-size query → ESC[8;50;220t with actual configured dimensions
Multiple probes in one chunk → all answered in order
Probe dedup: send DA1 twice → response emitted only once
Unknown escape sequence (ESC[99t) → ignored, no response, no panic
Partial probe at chunk boundary (probe split across two reads) → matched and answered on second read

JSONL parser (tests/transcript.rs):

Single assistant turn, single text block → correct text
Multi-block content: text + tool_use + thinking + text → text blocks concatenated, others skipped
Multi-turn: 3 unique usage keys → 3 unique turns, last turn's text returned
Streaming duplicate dedup: 5 consecutive events with identical usage → counted as 1 turn
Token aggregation: 45 unique turns → correct sum across all 4 token fields
Missing cache_creation_input_tokens in usage → defaults to 0, no panic
input_tokens: null in usage → treated as 0
Unknown event type ("type": "new-future-event") → silently skipped, parse continues
Unknown content block type ("type": "image") → silently skipped, text blocks still extracted
Unknown fields in usage object → silently ignored, known fields still parsed
Malformed JSONL line (truncated JSON) → line skipped, subsequent lines parsed
Empty file → returns empty text, zero token counts (no panic)

Stop hook parser (tests/hook.rs):

Full payload → all fields extracted
Missing transcript_path → fallback path derived from session_id + cwd
Missing last_assistant_message → None (retry-only fallback)
Unknown top-level fields in payload → silently ignored
Malformed JSON → Err, triggers exit 2

Emitter (tests/emitter.rs):

text: correct string, trailing newline, no extra whitespace
json: valid JSON, all required fields present, claude_version included
json: usage fields are integers not strings
stream-json: each line parses as independent JSON object
Error result: is_error: true, correct subtype string, non-zero exit
Zero token counts when fallback path taken: usage present with all-zero values

Startup sequencer (tests/startup.rs):

Trust keywords trust + Allow in same line → CR sent immediately
Trust keywords in different lines of same chunk → CR sent
Alternative wording continue + folder → CR sent (keyword union logic)
Arbitrary unknown welcome text (no keywords) → fallback: CR after 0.8 s idle
No output for 45 s → error returned
199 bytes received then idle 0.8 s → no CR yet (minimum 200 bytes enforced)
200 bytes received then idle 0.8 s → CR sent

CLI (tests/cli.rs):

Positional prompt → forwarded correctly
--input-file overrides stdin
Stdin used when not a TTY and no other prompt source
Conflicting prompt sources → error with clear message
--timeout 0 → error (must be positive)
--output-format invalid → error listing valid values
--claude-binary /custom/path → spawns that binary, not PATH lookup
--version output parses as "claude-print X.Y.Z (wrapping claude A.B.C)"

Mock PTY Integration Tests (`tests/integration/`)

A mock_claude binary (compiled as a test fixture, not a shell script) simulates Claude Code's startup behavior. Built in a separate Cargo workspace member test-fixtures/mock-claude/ so it compiles to a native binary with controlled behavior. Controlled via env vars:

Env var	Effect
`MOCK_TRUST_DIALOG=1`	Emit trust dialog text before REPL
`MOCK_TRUST_WORDING=alternate`	Use different trust wording (`Continue` instead of `Allow`)
`MOCK_OMIT_TRANSCRIPT_PATH=1`	Omit `transcript_path` from Stop payload
`MOCK_OMIT_LAST_MESSAGE=1`	Omit `last_assistant_message` from Stop payload
`MOCK_DELAY_JSONL=<ms>`	Write final JSONL event after N ms delay (race simulation)
`MOCK_UNKNOWN_PROBE=1`	Emit unknown ESC sequence before DA1
`MOCK_UNKNOWN_EVENT_TYPE=1`	Write unknown event type to transcript JSONL
`MOCK_UNKNOWN_USAGE_FIELDS=1`	Add extra fields to usage object
`MOCK_RESPONSE=<text>`	Response text to write into transcript
`MOCK_TURNS=<n>`	Number of assistant turns to simulate
`MOCK_EXIT_BEFORE_STOP=1`	Exit without firing Stop hook
`MOCK_DELAY_STOP=<ms>`	Fire Stop after delay
`MOCK_IS_ERROR=1`	Write `is_error: true` to transcript result event

Integration test scenarios:

Scenario	Mock config	Assertion
Happy path	defaults	exit 0, correct response text, non-zero token counts
Trust dialog (standard wording)	`MOCK_TRUST_DIALOG=1`	exit 0
Trust dialog (alternate wording)	`MOCK_TRUST_DIALOG=1 MOCK_TRUST_WORDING=alternate`	exit 0 (resilience)
No startup output	emit nothing	exit 2 after timeout
Child exits before Stop	`MOCK_EXIT_BEFORE_STOP=1`	exit 2
Stop hook never fires	`MOCK_DELAY_STOP=99999`	exit 124
Transcript race	`MOCK_DELAY_JSONL=100`	retry loop fires, exit 0
Missing `transcript_path`	`MOCK_OMIT_TRANSCRIPT_PATH=1`	path derived, exit 0
Missing `last_assistant_message`	`MOCK_OMIT_LAST_MESSAGE=1`	retry-only path, exit 0
Both omitted + delayed JSONL	`MOCK_OMIT_LAST_MESSAGE=1 MOCK_DELAY_JSONL=200`	retries suffice, exit 0
Error in transcript	`MOCK_IS_ERROR=1`	exit 1, `is_error: true` in output
SIGINT	`MOCK_DELAY_STOP=5000` + send SIGINT at 1 s	exit 130, child killed
Multi-turn	`MOCK_TURNS=3`	last turn text returned, 3 turns in token sum
Large prompt (>32KB)	33000-byte prompt	file relay used, exit 0
Unknown probe emitted	`MOCK_UNKNOWN_PROBE=1`	probe ignored, session completes
Unknown event type in JSONL	`MOCK_UNKNOWN_EVENT_TYPE=1`	parse succeeds, text extracted
Unknown usage fields	`MOCK_UNKNOWN_USAGE_FIELDS=1`	ignored, token counts correct
Output format json	defaults	output parses as valid JSON
Output format stream-json	defaults	each output line parses as valid JSON

Sandbox Isolation Tests (`tests/sandbox.rs`)

These tests verify that the inner claude process is contained and that transcripts are forwarded correctly to ~/.claude/projects/.

CLAUDE_CONFIG_DIR isolation:

Spawn mock_claude with a controlled CLAUDE_CONFIG_DIR; verify the child writes its session file inside that dir, not in ~/.claude/sessions/
Spawn with CLAUDE_CONFIG_DIR set; verify real ~/.claude/sessions/ contains no new entry after the run
Verify real ~/.claude/settings.json hooks (read the file before and after a mock run) are not modified

Credentials symlink:

Verify sandbox dir contains .credentials.json as a symlink pointing to real credentials file
Verify the symlink resolves to the real file (not a copy)
Run with credentials symlink absent: expect graceful error, not hang

Transcript forwarding:

After a successful mock run, verify ~/.claude/projects/<cwd-slug>/<session-id>.jsonl was created
Verify its contents match the sandbox transcript byte-for-byte
Verify the temp dir is cleaned up after the run (no leftover files in $TMPDIR)
Run with ~/.claude/projects/ unwritable: verify warning to stderr but exit 0 (forwarding is best-effort)

Hooks not inherited:

Write a test hook script to a temp file and register it in a stand-in for the user's ~/.claude/settings.json (the test redirects the "real" config location to a temp directory rather than touching the actual home directory); verify the test hook does NOT fire during a subprocess run, because the subprocess reads only its sandboxed settings.json

--verbose sandbox trace:

With --verbose, verify stderr includes lines for: temp dir path, CLAUDE_CONFIG_DIR value, transcript copy src→dst

Version-Resilience Test Suite (`tests/version_compat.rs`)

A dedicated test module that verifies the binary survives schema changes across Claude Code versions. These tests are run in CI on every push and also on a weekly schedule.

Schema migration tests (property-based, using serde_json::Value to construct arbitrary payloads):

Stop payload with 50 unknown extra fields → parsed without error
Usage object with 20 new numeric fields → all ignored, 4 known fields correct
Content block of a known type with a new extra field → field ignored, block still parsed; content block with an unknown type tag → #[serde(other)] catches it as Unknown
JSONL with events in a new order (e.g., summary before user) → no assumption on ordering

claude --version compatibility tracker:

use std::process::Command;

#[test]
fn test_claude_version_recorded() {
    // Requires a real claude binary on PATH.
    let output = Command::new("claude").arg("--version").output().unwrap();
    let version_str = String::from_utf8_lossy(&output.stdout);
    // Verify output is parseable (not checking the specific version)
    assert!(version_str.contains("Claude Code"), "unexpected claude --version format: {}", version_str);
    // Write to test artifact for CI diff tracking
    std::fs::write("target/last-claude-version.txt", version_str.as_bytes()).ok();
}

CI stores last-claude-version.txt as a build artifact. On the next run, if the version changed, a warning is printed and the full integration suite re-runs.

Startup heuristic stability test:

Generate 20 different trust dialog phrasings (varied keyword combinations)
For each: verify should_dismiss(line) returns true
Generate 10 non-dialog lines (ANSI art, progress bars, empty lines)
For each: verify should_dismiss(line) returns false

Token count regression test:

Fixture: tests/fixtures/transcript_v2.1.168.jsonl — a real captured transcript
Assert: token sum matches hardcoded expected values
When a new Claude version produces transcripts with a different schema, add a new fixture and assert on the new values. Both old and new fixtures must pass simultaneously (the parser handles both)

End-to-End Tests (credential-required, excluded from CI, run manually)

# Basic
echo "Say hello" | claude-print
claude-print --output-format json "What is 2+2?"
claude-print --output-format stream-json "List 5 animals"

# Tool use
claude-print --allowedTools Bash --dangerously-skip-permissions "Run: echo hello"

# Billing verification
# After running: check transcript entrypoint field
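python3 -c "
import glob, json, os
paths = glob.glob('/home/coding/.claude/projects/**/*.jsonl', recursive=True)
# Pick the most recently modified transcript (sorting by name is not chronological)
for path in sorted(paths, key=os.path.getmtime)[-1:]:
    for line in open(path):
        obj = json.loads(line)
        if ep := obj.get('entrypoint'):
            print('entrypoint:', ep)
            break
"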
# Expected: entrypoint: cli  (not sdk-cli)

# NEEDLE integration
needle run --agent claude-print --workspace /home/coding/some-project

Open Questions

--settings merge behavior: Does Claude Code merge multiple --settings files, or does the last one win? If merge, per-run hooks layer cleanly on user hooks. If last-wins, the user's hooks are shadowed. Needs verification; may require reading user settings and merging in-process rather than relying on Claude Code's merge.
Multiline prompt > 32 KB: Does the /read <path> slash command accept absolute paths? Does it block tool use (--allowedTools)? Needs end-to-end verification.
FIFO open race: hook.sh opens the FIFO for writing; the parent opens it for reading. Both sides block until the other end connects. The parent must open the read end before the Stop hook fires. If the Stop hook fires before the FIFO read end is open, the write blocks and eventually times out. Mitigation: open the read end before injecting the prompt (before Stop could fire). Verify timing.
musl vs glibc: openpty and login_tty are BSD-derived library functions, not POSIX. Musl provides openpty; whether login_tty is available on the target musl version needs to be verified. If it is not, inline the login_tty implementation (setsid + TIOCSCTTY ioctl + dup2).
Credentials lookup with CLAUDE_CONFIG_DIR: Confirmed CLAUDE_CONFIG_DIR overrides all file I/O. The child reads .credentials.json from $CLAUDE_CONFIG_DIR/.credentials.json. Symlink to the real file is the right approach — it avoids copying secrets and stays current if the token is refreshed. Verify the child follows symlinks (it should; it uses normal file open).
Other CLAUDE_* env vars: The binary reads many env vars. Confirm none of them cause the child to bypass CLAUDE_CONFIG_DIR for session or history I/O. In particular, CLAUDE_CODE_SESSION_ID, CLAUDE_CODE_SESSION_KIND, and CLAUDE_JOB_DIR may need to be unset/overridden in the child env to avoid inheriting the parent session's identity.
