claude-print/docs/research/claude-code-internals.md
jedarden 8ab946e1ef Add PTY mechanics and Claude Code internals research
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 10:41:46 -04:00

# Claude Code Internals
## Session System
### Session Identity
Every Claude Code process creates a live session record at:
```
~/.claude/sessions/<pid>.json
```
Contents (observed from v2.1.168):
```json
{
  "pid": 360946,
  "sessionId": "37f84004-275c-46fd-8947-54348867302a",
  "cwd": "/home/coding",
  "startedAt": 1780834744503,
  "procStart": "117958",
  "version": "2.1.168",
  "peerProtocol": 1,
  "kind": "interactive",
  "entrypoint": "cli",
  "status": "idle",
  "updatedAt": 1780836672456
}
```
`kind` is `interactive` for TUI sessions, `print` for `-p` runs. `entrypoint` is `cli` for both (the billing field is set separately in request headers). `status` transitions: `working` while a turn is in progress, `idle` when waiting for input.
The session file is deleted when the process exits. Enumerating `~/.claude/sessions/` gives all currently-running Claude Code processes.
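As a rough sketch of the enumeration described above, the following reads every record in the sessions directory. The function names `list_live_sessions` and `idle_sessions` are illustrative, not part of Claude Code, and the field names are taken from the v2.1.168 record shown above; other versions may differ.

```python
import json
from pathlib import Path


def list_live_sessions(sessions_dir="~/.claude/sessions"):
    """Return the parsed session records found in sessions_dir.

    Records that vanish or fail to parse mid-scan are skipped, because a
    process may exit (and delete its record) while we enumerate.
    """
    records = []
    for path in sorted(Path(sessions_dir).expanduser().glob("*.json")):
        try:
            records.append(json.loads(path.read_text()))
        except (OSError, json.JSONDecodeError):
            continue
    return records


def idle_sessions(records):
    """Keep only the sessions whose status field is 'idle'."""
    return [r for r in records if r.get("status") == "idle"]
```

Calling `idle_sessions(list_live_sessions())` would then list the sessions currently waiting for input.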
### Transcript Storage
Each session writes its full conversation to:
```
~/.claude/projects/<cwd-slug>/<session-id>.jsonl
```
The `<cwd-slug>` is the working directory path with `/` replaced by `-` (e.g., `/home/coding/claude-print` → `-home-coding-claude-print`).
The JSONL file is **append-only**: every event is a single JSON line. The file is flushed incrementally during a session; when the Stop hook fires there is a race window (approximately 25 ms) in which the final assistant event may not yet be written.
To derive `transcript_path` from a session record:
```python
import os

def transcript_path(session_id, cwd):
    # The slug is the cwd with every '/' replaced by '-'
    slug = cwd.replace('/', '-')
    return os.path.expanduser(f'~/.claude/projects/{slug}/{session_id}.jsonl')
```
### Session Flags
Relevant CLI flags for session management:
| Flag | Effect |
|------|--------|
| `--session-id <uuid>` | Assigns a specific UUID as this session's ID; the session writes to the corresponding JSONL path |
| `-r/--resume <id>` | Resumes a prior session: reads existing JSONL as conversation history, continues appending |
| `-c/--continue` | Resume the most recent session in the current working directory |
| `--fork-session` | Used with `--resume`/`--continue`; generates a new session ID instead of reusing the original — branches the conversation |
| `-n/--name <name>` | Display name shown in TUI and `/resume` picker |
| `--no-session-persistence` | Disables JSONL writing entirely; nothing is persisted |
**Resuming a subprocess session into the parent:**
```bash
# Start a subprocess session with a known ID
claude --session-id a1b2c3d4-... --dangerously-skip-permissions --print "do work"
# Later, resume that session in interactive mode — full history is available
claude --resume a1b2c3d4-...
```
This is the cleanest way to "add an independent session to the main session": resume the subprocess session in the parent terminal. The history is already in the JSONL; `--resume` replays it as context.
### `--fork-session` Branching
`--resume <id> --fork-session` creates a new session ID and writes to a new JSONL file, but reads the prior session's JSONL as its initial conversation history. The prior session is unchanged. This is useful for branching from a completed subprocess session without modifying it.
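For callers that spawn `claude` from a script, the resume and fork variants differ by a single flag. The helper below is a minimal sketch that only builds the argument list from the flags listed in the table above; `resume_argv` is a hypothetical name, and nothing here inspects or validates the session ID.

```python
import shlex


def resume_argv(session_id, fork=False):
    """Build a claude command line that resumes session_id.

    With fork=True the new process gets its own session ID and JSONL file,
    leaving the original transcript untouched.
    """
    argv = ["claude", "--resume", session_id]
    if fork:
        argv.append("--fork-session")
    return argv


# Print the command instead of running it, so the sketch has no side effects.
print(shlex.join(resume_argv("<child-session-id>", fork=True)))
```

The list can be passed to `subprocess.run` once a real session ID is substituted.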
## JSONL Transcript Format
### Event Types
| Type | When written | Notes |
|------|-------------|-------|
| `file-history-snapshot` | Session start | File tracking for undo |
| `user` | Each user turn | Includes `entrypoint`, `cwd`, `sessionId`, `version`, `gitBranch` |
| `assistant` | Each API response chunk | One event per streaming chunk; all chunks for one turn share identical `message.usage` |
| `system` | Tool results, local commands | `subtype: "local_command"` for slash commands; `subtype: "tool_result"` etc. |
| `last-prompt` | After each turn | Records `lastPrompt`, `leafUuid`, `sessionId` |
| `attachment` | File/image attachments | |
| `result` | `--print` mode only | Final result object (see below) |
| `summary` | After compaction | Compressed context summary |
### `user` Event Fields
```json
{
  "parentUuid": "...",           // UUID of the event this follows in the message tree
  "isSidechain": false,
  "promptId": "...",             // groups all events for a single user turn
  "type": "user",
  "message": {
    "role": "user",
    "content": "<prompt text>"   // may be string or array of content blocks
  },
  "uuid": "...",
  "timestamp": "2026-06-07T...",
  "userType": "external",
  "entrypoint": "cli",           // "cli" for TUI, "sdk-cli" for -p after June 15
  "cwd": "/home/coding/...",
  "sessionId": "...",
  "version": "2.1.168",
  "gitBranch": "main"
}
```
Tool results appear as `user` events with `message.content` being an array of `{"type": "tool_result", "tool_use_id": "...", "content": [...]}` blocks.
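Because `message.content` may be either a plain string or a list of blocks, code that looks for tool results has to check the shape first. The following is a small sketch under that assumption; `tool_results` is an illustrative name.

```python
def tool_results(event):
    """Return the tool_result blocks carried by a user event, if any."""
    if event.get("type") != "user":
        return []
    content = event.get("message", {}).get("content")
    if not isinstance(content, list):
        # A plain string is an ordinary prompt, not a tool result
        return []
    return [
        block for block in content
        if isinstance(block, dict) and block.get("type") == "tool_result"
    ]
```

An ordinary prompt event yields an empty list, so the function can be mapped over every `user` event in a transcript without pre-filtering.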
### `assistant` Event — Streaming Chunks
A single API call (one LLM turn) produces **multiple consecutive `assistant` events** — one per streaming chunk. All chunks for the same API call carry **identical `message.usage`** objects.
This means: to identify unique API turns, detect when `message.usage` changes between consecutive `assistant` events.
```json
{
  "parentUuid": "...",
  "isSidechain": false,
  "type": "assistant",
  "message": {
    "role": "assistant",
    "content": [
      {"type": "thinking", "thinking": "..."},
      {"type": "text", "text": "..."},
      {"type": "tool_use", "id": "toolu_...", "name": "Bash", "input": {"command": "..."}}
    ],
    "model": "claude-sonnet-4-6",
    "usage": {
      "input_tokens": 6178,
      "output_tokens": 295,
      "cache_creation_input_tokens": 825,
      "cache_read_input_tokens": 26442,
      "server_tool_use": {"web_search_requests": 0, "web_fetch_requests": 0},
      "service_tier": "standard",
      "cache_creation": {
        "ephemeral_5m_input_tokens": 0,
        "ephemeral_1h_input_tokens": 825
      },
      "inference_geo": "",
      "iterations": [
        {
          "input_tokens": 6178,
          "output_tokens": 295,
          "cache_read_input_tokens": 26442,
          "cache_creation_input_tokens": 825,
          "cache_creation": {"ephemeral_5m_input_tokens": 0, "ephemeral_1h_input_tokens": 825},
          "type": "message"
        }
      ],
      "speed": "standard"
    }
  },
  "uuid": "...",
  "timestamp": "..."
}
```
Content block types within a single turn:
- `"type": "thinking"` — extended thinking scratchpad (not part of final response)
- `"type": "text"` — assistant prose (the human-visible response)
- `"type": "tool_use"` — a tool call (name + input object)
One turn often splits across several chunks: e.g., chunk 1 has a `thinking` block, chunk 2 has the first `text` block, chunk 3 has a `tool_use` block.
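Since a turn's blocks can arrive in separate events, it helps to sort each event's blocks by type before merging. The sketch below groups one event's content; `split_blocks` is an illustrative name, and the block fields are the ones shown in the example above.

```python
def split_blocks(event):
    """Group an assistant event's content blocks by their type."""
    grouped = {"thinking": [], "text": [], "tool_use": []}
    for block in event.get("message", {}).get("content", []):
        kind = block.get("type")
        if kind == "thinking":
            grouped["thinking"].append(block.get("thinking", ""))
        elif kind == "text":
            grouped["text"].append(block.get("text", ""))
        elif kind == "tool_use":
            # Keep the tool name and its input object together
            grouped["tool_use"].append((block.get("name"), block.get("input")))
    return grouped
```

Unknown block types are ignored rather than rejected, which keeps the function usable if new types appear.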
### `last-prompt` Event
```json
{
  "type": "last-prompt",
  "lastPrompt": "full text of the last user prompt...",
  "leafUuid": "...",
  "sessionId": "..."
}
```
Written after every assistant turn (points to the most recent user message). Useful for finding session boundaries without scanning all events.
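A minimal sketch of looking up the most recent `last-prompt` event follows. It assumes the event shape shown above and skips lines that do not parse, since the transcript may still be mid-write; `last_prompt` is an illustrative name.

```python
import json


def last_prompt(jsonl_path):
    """Return the final last-prompt event in the transcript, or None."""
    latest = None
    with open(jsonl_path, encoding="utf-8") as f:
        for line in f:
            try:
                obj = json.loads(line)
            except json.JSONDecodeError:
                continue  # blank or partially written line
            if isinstance(obj, dict) and obj.get("type") == "last-prompt":
                latest = obj
    return latest
```

The returned event's `leafUuid` can then be matched against `uuid` fields elsewhere in the file.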
### `result` Event (`--print` mode)
In `--print` mode, the final event in the JSONL is a `result` object:
```json
{
  "type": "result",
  "subtype": "success",
  "is_error": false,
  "result": "final response text",
  "session_id": "...",
  "num_turns": 3,
  "duration_ms": 12400,
  "cost_usd": 0,
  "usage": {
    "input_tokens": 1240,
    "output_tokens": 380,
    "cache_creation_input_tokens": 0,
    "cache_read_input_tokens": 900
  }
}
```
**This event is absent from interactive-mode transcripts.** In interactive mode, token totals must be computed by aggregating across unique API turns in the `assistant` events.
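When a `result` event is available, a caller usually needs only a few of its fields. The sketch below condenses an already-parsed event using the field names from the example above; `result_summary` and `USAGE_KEYS` are illustrative names, and treating `subtype == "success"` together with `is_error == false` as the success test is an assumption.

```python
USAGE_KEYS = (
    "input_tokens",
    "output_tokens",
    "cache_creation_input_tokens",
    "cache_read_input_tokens",
)


def result_summary(event):
    """Condense a result event into the fields a caller usually needs."""
    if event.get("type") != "result":
        raise ValueError("not a result event")
    usage = event.get("usage", {})
    return {
        "ok": event.get("subtype") == "success" and not event.get("is_error", False),
        "text": event.get("result", ""),
        "num_turns": event.get("num_turns"),
        "duration_s": event.get("duration_ms", 0) / 1000,
        "tokens": {key: usage.get(key, 0) for key in USAGE_KEYS},
    }
```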
## Token Counting
### Problem: Streaming Duplicates
Every streaming chunk event for the same API call carries the same `usage` object. Naively summing `output_tokens` across all `assistant` events over-counts by the number of chunks per turn.
### Correct Approach: Dedup by Usage Fingerprint
Treat two consecutive `assistant` events as belonging to the same API call when their `message.usage` objects agree on `input_tokens`, `output_tokens`, `cache_creation_input_tokens`, and `cache_read_input_tokens`. This is a heuristic: two back-to-back calls that happened to report exactly the same four counts would be merged. Detect turn boundaries when any of these values changes:
```python
import json

def extract_turns(jsonl_path):
    """Return one usage dict per unique API turn, in transcript order."""
    turns = []
    prev_usage_key = None
    with open(jsonl_path) as f:
        for line in f:
            obj = json.loads(line)
            if obj.get('type') != 'assistant':
                continue
            usage = obj.get('message', {}).get('usage', {})
            key = (
                usage.get('input_tokens'),
                usage.get('output_tokens'),
                usage.get('cache_creation_input_tokens'),
                usage.get('cache_read_input_tokens'),
            )
            # A changed fingerprint marks the first chunk of a new API call
            if key != prev_usage_key:
                turns.append(usage)
                prev_usage_key = key
    return turns

def sum_tokens(turns):
    """Total each counter across the deduplicated turns."""
    return {
        'input_tokens': sum(t.get('input_tokens', 0) for t in turns),
        'output_tokens': sum(t.get('output_tokens', 0) for t in turns),
        'cache_creation_input_tokens': sum(t.get('cache_creation_input_tokens', 0) for t in turns),
        'cache_read_input_tokens': sum(t.get('cache_read_input_tokens', 0) for t in turns),
    }
```
Observed behavior on a 176-line transcript: 45 unique API turns (not 65 assistant events).
### `iterations` Array
Each `usage` object also has an `iterations` array with one entry per API sub-call within the turn (used for extended thinking or multi-step internal reasoning). For standard turns, `len(iterations) == 1`. Sum `iterations[i].output_tokens` if you need granular per-sub-call data.
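A short sketch of reading the per-sub-call counts is shown below. It assumes the `iterations` layout from the example event earlier in this document and tolerates a missing or null array; `iteration_output_tokens` is an illustrative name.

```python
def iteration_output_tokens(usage):
    """Return the output token count of each entry in usage['iterations']."""
    iterations = usage.get("iterations") or []
    return [entry.get("output_tokens", 0) for entry in iterations]
```

For the example event, the single entry reports the same `output_tokens` as the top-level `usage` object.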
### Extracting the Final Response Text
The final assistant message's text is the concatenation of all `"type": "text"` blocks from the last unique API turn:
```python
import json

def extract_final_text(jsonl_path):
    """Concatenate the text blocks of the last unique API turn."""
    last_text_blocks = []
    prev_usage_key = None
    with open(jsonl_path) as f:
        for line in f:
            try:
                obj = json.loads(line)
            except json.JSONDecodeError:
                continue  # blank or partially written line
            if obj.get('type') != 'assistant':
                continue
            msg = obj.get('message', {})
            usage = msg.get('usage', {})
            key = (usage.get('input_tokens'), usage.get('output_tokens'),
                   usage.get('cache_creation_input_tokens'), usage.get('cache_read_input_tokens'))
            # A new fingerprint starts a new turn, so discard the previous turn's text
            if key != prev_usage_key:
                last_text_blocks = []
                prev_usage_key = key
            for block in msg.get('content', []):
                if block.get('type') == 'text':
                    last_text_blocks.append(block['text'])
    return ''.join(last_text_blocks)
```
Skip `thinking` and `tool_use` blocks — they are not part of the human-visible response.
### Race Condition: Stop Hook Fires Before JSONL Flush
The Stop hook fires approximately 25 ms before Claude Code flushes the final `assistant` event to the JSONL. If the transcript is read immediately on Stop:
- The final API turn may be missing from the JSONL
- Or the last chunk may be partially written (truncated JSON line)
**Retry strategy:**
```python
import time

def read_with_retry(jsonl_path, max_retries=40, interval=0.05):
    for attempt in range(max_retries):
        try:
            text = extract_final_text(jsonl_path)
        except (OSError, ValueError):
            # Transcript not created yet, or its last line is still being written
            text = ''
        if text:
            return text
        time.sleep(interval)
    return None  # use Stop hook payload fallback
```
40 × 50 ms = 2 s maximum wait. Observed: text available within 1–3 retries (50–150 ms after Stop fires).
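One way to tolerate the partially written last line described above is a reader that yields only the lines that already parse. This is a sketch, not Claude Code behavior; `iter_complete_events` is an illustrative name, and a missing file is treated the same as an empty one.

```python
import json


def iter_complete_events(jsonl_path):
    """Yield parsed events, skipping lines that are not yet complete JSON."""
    try:
        f = open(jsonl_path, encoding="utf-8", errors="replace")
    except FileNotFoundError:
        return  # transcript not created yet
    with f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                yield json.loads(line)
            except json.JSONDecodeError:
                continue  # truncated line; a later read will see it whole
```

Because the file is append-only, a line skipped on one pass will parse on a later pass once the flush completes.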
## Hook System
### Available Hook Events
From `~/.claude/settings.json` (observed on v2.1.168):
| Hook event | When it fires | Stdin payload |
|------------|---------------|---------------|
| `SessionStart` | Claude Code process starts | `{session_id, cwd, ...}` |
| `SessionEnd` | Process exits | `{session_id, ...}` |
| `Stop` | Assistant finishes a turn, waiting for next input | `{session_id, transcript_path, last_assistant_message, ...}` |
| `UserPromptSubmit` | User submits a new message | `{session_id, prompt, ...}` |
| `PreToolUse` | Before each tool call | `{session_id, tool_name, tool_input, ...}` |
| `PermissionRequest` | Before granting a permission | `{session_id, permission, ...}` |
### Stop Hook Payload
```json
{
  "hook_event_name": "Stop",
  "session_id": "37f84004-275c-46fd-8947-54348867302a",
  "transcript_path": "/home/coding/.claude/projects/-home-coding-claude-print/37f84004-....jsonl",
  "last_assistant_message": "The final text of the last assistant turn",
  "cwd": "/home/coding/claude-print"
}
```
`last_assistant_message` is the extracted text of the final turn — available directly without reading the JSONL. Useful as a fallback when the JSONL isn't flushed yet and the retry loop is exhausted.
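A hook written in Python could unpack the payload roughly as follows. The field names come from the payload above; `parse_stop_payload` is an illustrative name, and treating `transcript_path` and `last_assistant_message` as optional is a defensive assumption.

```python
import json


def parse_stop_payload(raw):
    """Parse a Stop hook payload and return the fields the caller needs."""
    payload = json.loads(raw)
    if payload.get("hook_event_name") != "Stop":
        raise ValueError("not a Stop payload")
    return {
        "session_id": payload["session_id"],
        "transcript_path": payload.get("transcript_path"),
        # Fallback text for when the JSONL has not been flushed yet
        "last_assistant_message": payload.get("last_assistant_message", ""),
    }
```

Inside a real hook script the argument would be `sys.stdin.read()`.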
### Hook Configuration
Hooks are configured in `~/.claude/settings.json` (user-global), `.claude/settings.json` (project), or `.claude/settings.local.json` (local override). The `--settings <path>` flag specifies an additional settings file. Settings are merged; all matching hooks fire.
Per-run settings overlay (the `claude-print` approach):
```json
{
  "hooks": {
    "Stop": [{
      "hooks": [{"type": "command", "command": "/tmp/claude-print-PID/hook.sh", "timeout": 10}]
    }]
  }
}
```
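The hook script receives the JSON payload on stdin. A hook that only records the payload should exit 0: Claude Code's hook documentation treats exit code 2 as a blocking error (for `Stop`, it keeps the session from stopping), so the exit code is not ignored in every case. Timeout (seconds) aborts the hook process if it runs too long.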
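The overlay above can be generated per run rather than kept on disk. The sketch below writes the same structure to a temporary file; `write_stop_overlay` is a hypothetical helper, and the file name prefix is arbitrary.

```python
import json
import os
import tempfile


def write_stop_overlay(hook_command, timeout=10, directory=None):
    """Write a settings file that registers one Stop hook; return its path."""
    settings = {
        "hooks": {
            "Stop": [{
                "hooks": [{
                    "type": "command",
                    "command": hook_command,
                    "timeout": timeout,  # seconds
                }]
            }]
        }
    }
    fd, path = tempfile.mkstemp(prefix="claude-settings-", suffix=".json", dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(settings, f)
    return path
```

The returned path would be passed to the child process with `--settings <path>`, and the caller is responsible for deleting the file afterwards.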
### Existing Hooks on This Server
The following hooks are active in `~/.claude/settings.json` and will fire for all claude sessions including subprocess ones:
- `PermissionRequest` → `trail-boss/trailboss-emit.sh`
- `PreToolUse` → `~/.ccdash/hooks/pre-tool-use.sh`
- `SessionEnd` → `~/.ccdash/hooks/session-end.sh` + `trailboss-emit.sh`
- `SessionStart` → `~/.ccdash/hooks/session-start.sh` + `trailboss-emit.sh`
- `Stop` → `~/.ccdash/hooks/stop.sh`
- `UserPromptSubmit` → (ccdash hook)
`trailboss-emit.sh` silently exits 0 if `$TMUX_PANE` is not set — subprocess sessions are unaffected. `ccdash` hooks update the session registry, which is correct behavior.
## Retrieving Output from an Independent Session
### Method 1: Stop Hook + JSONL (Primary)
The subprocess session fires the Stop hook when done. `claude-print` pre-installs an additional per-run hook via `--settings` overlay. The hook writes the Stop payload to a named FIFO. The parent reads the FIFO, gets `transcript_path` and `last_assistant_message`, then reads the JSONL for full text and token counts.
This is the most reliable method. Latency: the Stop hook fires within 50–200 ms of the final token being generated.
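The parent side of the FIFO handoff might look like the sketch below. It assumes a POSIX system and that the hook writes exactly one JSON payload and then closes the FIFO; `make_fifo` and `read_stop_payload` are illustrative names, not `claude-print`'s actual functions.

```python
import json
import os
import tempfile


def make_fifo(directory=None):
    """Create a private run directory containing a named FIFO; return its path."""
    run_dir = tempfile.mkdtemp(prefix="claude-print-", dir=directory)
    fifo_path = os.path.join(run_dir, "stop.fifo")
    os.mkfifo(fifo_path, 0o600)
    return fifo_path


def read_stop_payload(fifo_path):
    """Block until a writer sends one JSON payload and closes the FIFO."""
    # open() blocks until the hook opens the FIFO for writing;
    # read() returns once the writer closes its end.
    with open(fifo_path, encoding="utf-8") as f:
        return json.loads(f.read())
```

This sketch has no timeout, so a real caller would also need to watch for the child exiting without ever firing the hook.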
### Method 2: Session-ID Pre-assignment (`--session-id`)
Assign a known UUID to the subprocess session at spawn time:
```python
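import os
import uuid
cwd_slug = os.getcwd().replace('/', '-')  # assumes the child runs in the current directory
child_session_id = str(uuid.uuid4())
# expanduser is needed because open() does not expand '~'
transcript_path = os.path.expanduser(
    f'~/.claude/projects/{cwd_slug}/{child_session_id}.jsonl')
args = ['claude', '--session-id', child_session_id, '--dangerously-skip-permissions', ...]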
```
The parent knows the JSONL path before the session starts. Can poll the file directly without waiting for a Stop hook payload. Combine with the Stop FIFO for reliable completion signaling.
### Method 3: Resume (`--resume`) — Adding to the Main Session
After a subprocess session completes, its full history (user prompts + assistant responses) is in its JSONL. The main session (or any subsequent session) can incorporate it:
```bash
# Branch from the subprocess session's history
claude --resume <child-session-id> --fork-session
```
This creates a new session that has the subprocess session's entire conversation as its history. The user (or next automated prompt) continues from that point.
Alternatively: the calling session can read the subprocess session's final response and inject it as context in the next user turn. This avoids merging session histories but achieves the same goal.
### Method 4: Structured Output Re-injection
`claude-print` emits a structured result object (`--output-format json`). The caller (e.g., NEEDLE) treats this as the final response. The caller's own session (the NEEDLE worker session) receives the result as a tool output. The subprocess session's token usage is reported in the structured result and can be forwarded to any accounting system.
This is how `claude-print` integrates with NEEDLE: NEEDLE's session sees the result as if it were a tool call output; the actual LLM work happened in the subprocess session billed separately.
## Billing Classification
The `entrypoint` field is set in `user` events in the JSONL. Observed values: `"cli"` for interactive TUI sessions. The billing classification (`cc_entrypoint` header sent to the API) is determined by the process mode at startup — if `claude` has a real TTY (checked via `isatty()`), it enters TUI mode and uses `cli`. If stdout is a pipe, it uses `sdk-cli`.
Running under `claude-print`'s PTY: `isatty(slave_fd)` returns `true` → TUI mode → `cli` billing.
Running as `claude -p`: `isatty(stdout)` returns `false` → print mode → `sdk-cli` billing.
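The TTY test itself is easy to reproduce. The sketch below only mirrors the decision rule described above, using Python's `pty` and `os` modules on a POSIX system; it is not Claude Code's actual code, and `classify_fd` is an illustrative name.

```python
import os
import pty


def classify_fd(fd):
    """Apply the rule from the text: a terminal maps to 'cli', anything else to 'sdk-cli'."""
    return "cli" if os.isatty(fd) else "sdk-cli"


# A PTY slave looks like a real terminal to the child process; a pipe does not.
master_fd, slave_fd = pty.openpty()
read_fd, write_fd = os.pipe()
print("pty slave:", classify_fd(slave_fd))
print("pipe:", classify_fd(write_fd))
for fd in (master_fd, slave_fd, read_fd, write_fd):
    os.close(fd)
```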