# Claude Code Internals

## Session System

### Session Identity

Every Claude Code process creates a live session record at:
```
~/.claude/sessions/<pid>.json
```

Contents (observed from v2.1.168):
```json
{
  "pid": 360946,
  "sessionId": "37f84004-275c-46fd-8947-54348867302a",
  "cwd": "/home/coding",
  "startedAt": 1780834744503,
  "procStart": "117958",
  "version": "2.1.168",
  "peerProtocol": 1,
  "kind": "interactive",
  "entrypoint": "cli",
  "status": "idle",
  "updatedAt": 1780836672456
}
```

`kind` is `interactive` for TUI sessions, `print` for `-p` runs. `entrypoint` is `cli` for both (the billing field is set separately in request headers). `status` transitions: `working` while a turn is in progress, `idle` when waiting for input.

The session file is deleted when the process exits. Enumerating `~/.claude/sessions/` gives all currently-running Claude Code processes.

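The enumeration described above can be sketched as follows. This is a minimal sketch rather than anything Claude Code ships: `list_live_sessions` is a hypothetical helper name, and it assumes every `*.json` file in the directory is a session record. It does not check that the recorded `pid` is still alive.

```python
import json
from pathlib import Path


def list_live_sessions(sessions_dir):
    """Return the parsed session records found in sessions_dir, sorted by pid."""
    records = []
    for path in Path(sessions_dir).glob("*.json"):
        try:
            records.append(json.loads(path.read_text()))
        except (OSError, json.JSONDecodeError):
            # The process may have exited (file deleted) or be mid-write; skip it.
            continue
    return sorted(records, key=lambda r: r.get("pid", 0))
```

A typical call would be `list_live_sessions(Path.home() / ".claude" / "sessions")`, filtering the result on `status` or `kind` as needed.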

### Transcript Storage

Each session writes its full conversation to:
```
~/.claude/projects/<cwd-slug>/<session-id>.jsonl
```

The `<cwd-slug>` is the working directory path with `/` replaced by `-` (e.g., `/home/coding/claude-print` → `-home-coding-claude-print`).

The JSONL file is **append-only**: every event is a single JSON line. The file is flushed incrementally during a session; at Stop-hook fire time there is a race window (2–5 ms) where the final assistant event may not yet be written.

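Because the file can be read while it is still being appended to, a reader should tolerate a partially written last line. The sketch below shows one way to do that; `iter_events` is a hypothetical helper name, and treating every unparsable line as "still being written" is an assumption, not documented behavior.

```python
import json


def iter_events(lines):
    """Yield parsed events, skipping blank lines and lines that fail to parse."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            # Most likely a truncated final line caught mid-flush; skip it.
            continue
```

It accepts any iterable of strings, so an open file object can be passed directly: `with open(path) as f: events = list(iter_events(f))`.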

To derive `transcript_path` from a session record:
```python
import os


def transcript_path(session_id, cwd):
    # The slug is the working directory with every '/' replaced by '-'
    slug = cwd.replace('/', '-')
    return os.path.expanduser(f'~/.claude/projects/{slug}/{session_id}.jsonl')
```

### Session Flags

Relevant CLI flags for session management:

| Flag | Effect |
|------|--------|
| `--session-id <uuid>` | Assigns a specific UUID as this session's ID; the session writes to the corresponding JSONL path |
| `-r/--resume <id>` | Resumes a prior session: reads existing JSONL as conversation history, continues appending |
| `-c/--continue` | Resumes the most recent session in the current working directory |
| `--fork-session` | Used with `--resume`/`--continue`; generates a new session ID instead of reusing the original, which branches the conversation |
| `-n/--name <name>` | Display name shown in TUI and `/resume` picker |
| `--no-session-persistence` | Disables JSONL writing entirely; nothing is persisted |

**Resuming a subprocess session into the parent:**
```bash
# Start a subprocess session with a known ID
claude --session-id a1b2c3d4-... --dangerously-skip-permissions --print "do work"

# Later, resume that session in interactive mode; full history is available
claude --resume a1b2c3d4-...
```

This is the cleanest way to "add an independent session to the main session": resume the subprocess session in the parent terminal. The history is already in the JSONL; `--resume` replays it as context.

### `--fork-session` Branching

`--resume <id> --fork-session` creates a new session ID and writes to a new JSONL file, but reads the prior session's JSONL as its initial conversation history. The prior session is unchanged. This is useful for branching from a completed subprocess session without modifying it.

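When spawning these commands from a script, the argument list can be assembled programmatically. The following is a minimal sketch: `resume_argv` is a hypothetical helper, and it only uses the flags listed in the table above (`--resume`, `--fork-session`, `--print`). It builds the list and does not launch anything.

```python
def resume_argv(session_id, fork=False, prompt=None):
    """Build an argv list that resumes (or forks from) an existing session."""
    argv = ["claude", "--resume", session_id]
    if fork:
        # Branch into a new session ID; the original transcript stays untouched.
        argv.append("--fork-session")
    if prompt is not None:
        # Non-interactive run with a single prompt, as in the bash example above.
        argv += ["--print", prompt]
    return argv
```

The result can be handed to `subprocess.run(argv)`; keeping it as a list avoids shell quoting issues with the prompt text.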

## JSONL Transcript Format

### Event Types

| Type | When written | Notes |
|------|-------------|-------|
| `file-history-snapshot` | Session start | File tracking for undo |
| `user` | Each user turn | Includes `entrypoint`, `cwd`, `sessionId`, `version`, `gitBranch` |
| `assistant` | Each API response chunk | One event per streaming chunk; all chunks for one turn share identical `message.usage` |
| `system` | Tool results, local commands | `subtype: "local_command"` for slash commands; `subtype: "tool_result"` etc. |
| `last-prompt` | After each turn | Records `lastPrompt`, `leafUuid`, `sessionId` |
| `attachment` | File/image attachments | |
| `result` | `--print` mode only | Final result object (see below) |
| `summary` | After compaction | Compressed context summary |

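A quick way to see which of these event types a given transcript actually contains is to tally the `type` field. This is a small sketch under the assumption that every non-blank line is a complete JSON object; `count_event_types` is a hypothetical name.

```python
import json
from collections import Counter


def count_event_types(lines):
    """Count transcript events by their top-level "type" field."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        counts[event.get("type", "<missing>")] += 1
    return counts
```

Passing an open file works (`count_event_types(open(path))`), and `counts.most_common()` gives the types ordered by frequency.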

### `user` Event Fields

```json
{
  "parentUuid": "...",           // UUID of the event this follows in the message tree
  "isSidechain": false,
  "promptId": "...",             // groups all events for a single user turn
  "type": "user",
  "message": {
    "role": "user",
    "content": "<prompt text>"   // may be string or array of content blocks
  },
  "uuid": "...",
  "timestamp": "2026-06-07T...",
  "userType": "external",
  "entrypoint": "cli",           // "cli" for TUI, "sdk-cli" for -p after June 15
  "cwd": "/home/coding/...",
  "sessionId": "...",
  "version": "2.1.168",
  "gitBranch": "main"
}
```

Tool results appear as `user` events with `message.content` being an array of `{"type": "tool_result", "tool_use_id": "...", "content": [...]}` blocks.

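Since `message.content` may be either a plain string (a typed prompt) or an array of blocks (tool results), code that reads `user` events has to branch on the type. A minimal sketch, with `tool_results` as a hypothetical helper name:

```python
def tool_results(event):
    """Return the tool_result blocks carried by a user event (empty list if none)."""
    if event.get("type") != "user":
        return []
    content = event.get("message", {}).get("content")
    if not isinstance(content, list):
        return []  # plain-string prompt: no content blocks at all
    return [
        block for block in content
        if isinstance(block, dict) and block.get("type") == "tool_result"
    ]
```

Each returned block carries a `tool_use_id`, which can be matched against the `id` of a `tool_use` block in an earlier `assistant` event.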

### `assistant` Event: Streaming Chunks

A single API call (one LLM turn) produces **multiple consecutive `assistant` events**, one per streaming chunk. All chunks for the same API call carry **identical `message.usage`** objects.

This means: to identify unique API turns, detect when `message.usage` changes between consecutive `assistant` events.

```json
{
  "parentUuid": "...",
  "isSidechain": false,
  "type": "assistant",
  "message": {
    "role": "assistant",
    "content": [
      {"type": "thinking", "thinking": "..."},
      {"type": "text", "text": "..."},
      {"type": "tool_use", "id": "toolu_...", "name": "Bash", "input": {"command": "..."}}
    ],
    "model": "claude-sonnet-4-6",
    "usage": {
      "input_tokens": 6178,
      "output_tokens": 295,
      "cache_creation_input_tokens": 825,
      "cache_read_input_tokens": 26442,
      "server_tool_use": {"web_search_requests": 0, "web_fetch_requests": 0},
      "service_tier": "standard",
      "cache_creation": {
        "ephemeral_5m_input_tokens": 0,
        "ephemeral_1h_input_tokens": 825
      },
      "inference_geo": "",
      "iterations": [
        {
          "input_tokens": 6178,
          "output_tokens": 295,
          "cache_read_input_tokens": 26442,
          "cache_creation_input_tokens": 825,
          "cache_creation": {"ephemeral_5m_input_tokens": 0, "ephemeral_1h_input_tokens": 825},
          "type": "message"
        }
      ],
      "speed": "standard"
    }
  },
  "uuid": "...",
  "timestamp": "..."
}
```

Content block types within a single turn:
- `"type": "thinking"`: extended thinking scratchpad (not part of final response)
- `"type": "text"`: assistant prose (the human-visible response)
- `"type": "tool_use"`: a tool call (name + input object)

One turn often splits across several chunks: e.g., chunk 1 has a `thinking` block, chunk 2 has the first `text` block, chunk 3 has a `tool_use` block.

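To reassemble a turn from its chunks, consecutive `assistant` events with the same usage values can be grouped and their content blocks concatenated. The sketch below assumes the input is already filtered to `assistant` events in file order; `usage_key` and `merge_chunks` are hypothetical names.

```python
from itertools import groupby

USAGE_KEYS = ("input_tokens", "output_tokens",
              "cache_creation_input_tokens", "cache_read_input_tokens")


def usage_key(event):
    """Fingerprint an assistant event by its four token counters."""
    usage = event.get("message", {}).get("usage", {})
    return tuple(usage.get(k) for k in USAGE_KEYS)


def merge_chunks(assistant_events):
    """Group consecutive chunks with identical usage; return one block list per turn."""
    turns = []
    for _, chunk_group in groupby(assistant_events, key=usage_key):
        blocks = []
        for chunk in chunk_group:
            blocks.extend(chunk.get("message", {}).get("content", []))
        turns.append(blocks)
    return turns
```

`groupby` only merges adjacent items, which matches the "consecutive events" rule: two non-adjacent turns with equal counters stay separate.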

### `last-prompt` Event

```json
{
  "type": "last-prompt",
  "lastPrompt": "full text of the last user prompt...",
  "leafUuid": "...",
  "sessionId": "..."
}
```

Written after every assistant turn (points to the most recent user message). Useful for finding session boundaries without scanning all events.

### `result` Event (`--print` mode)

In `--print` mode, the final event in the JSONL is a `result` object:
```json
{
  "type": "result",
  "subtype": "success",
  "is_error": false,
  "result": "final response text",
  "session_id": "...",
  "num_turns": 3,
  "duration_ms": 12400,
  "cost_usd": 0,
  "usage": {
    "input_tokens": 1240,
    "output_tokens": 380,
    "cache_creation_input_tokens": 0,
    "cache_read_input_tokens": 900
  }
}
```

**This event is absent from interactive-mode transcripts.** In interactive mode, token totals must be computed by aggregating across unique API turns in the `assistant` events.

## Token Counting

### Problem: Streaming Duplicates

Every streaming chunk event for the same API call carries the same `usage` object. Naively summing `output_tokens` across all `assistant` events over-counts each turn by a factor equal to its number of chunks.

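A small worked example makes the over-count concrete. It reuses the `output_tokens` value of 295 from the sample event earlier and assumes, purely for illustration, that the turn was written as three chunks.

```python
# One API call written as three chunk events, each repeating the same usage.
chunk_usages = [{"output_tokens": 295}] * 3

# Naive: add up every event's counter, so the turn is counted three times.
naive_total = sum(u["output_tokens"] for u in chunk_usages)

# Deduplicated: identical consecutive usage means one turn, counted once.
deduped_total = chunk_usages[0]["output_tokens"]

overcount_factor = naive_total // deduped_total
```

Here the naive sum reports 885 output tokens for a turn that actually produced 295, an over-count factor of 3.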

### Correct Approach: Dedup by Usage Fingerprint

Two consecutive `assistant` events are treated as belonging to the same API call when their `message.usage` objects match on `input_tokens`, `output_tokens`, `cache_creation_input_tokens`, and `cache_read_input_tokens`. This is a heuristic: two back-to-back calls that happen to report identical counts would be merged into one. Detect turn boundaries when any of these values changes:

```python
import json


def extract_turns(jsonl_path):
    """Return one usage dict per unique API turn in the transcript."""
    turns = []
    prev_usage_key = None

    with open(jsonl_path) as f:
        for line in f:
            try:
                obj = json.loads(line)
            except json.JSONDecodeError:
                continue  # blank or partially written line
            if obj.get('type') != 'assistant':
                continue
            usage = obj.get('message', {}).get('usage', {})
            key = (
                usage.get('input_tokens'),
                usage.get('output_tokens'),
                usage.get('cache_creation_input_tokens'),
                usage.get('cache_read_input_tokens'),
            )
            # A changed fingerprint marks the first chunk of a new API turn
            if key != prev_usage_key:
                turns.append(usage)
                prev_usage_key = key

    return turns


def sum_tokens(turns):
    return {
        'input_tokens': sum(t.get('input_tokens', 0) for t in turns),
        'output_tokens': sum(t.get('output_tokens', 0) for t in turns),
        'cache_creation_input_tokens': sum(t.get('cache_creation_input_tokens', 0) for t in turns),
        'cache_read_input_tokens': sum(t.get('cache_read_input_tokens', 0) for t in turns),
    }
```

Observed behavior on a 176-line transcript: 45 unique API turns (not 65 assistant events).

### `iterations` Array

Each `usage` object also has an `iterations` array with one entry per API sub-call within the turn (used for extended thinking or multi-step internal reasoning). For standard turns, `len(iterations) == 1`. Read `iterations[i].output_tokens` if you need granular per-sub-call data.

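Reading the per-sub-call values is a one-liner once the `usage` object is in hand. The sketch below uses a hypothetical helper, `iteration_output_tokens`, and treats a missing `iterations` key as an empty list, which is an assumption about older or non-standard transcripts.

```python
def iteration_output_tokens(usage):
    """Return output_tokens for each entry in usage["iterations"], in order."""
    return [it.get("output_tokens", 0) for it in usage.get("iterations", [])]
```

For the sample `usage` object shown earlier, which has a single iteration, this yields a one-element list containing 295.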

### Extracting the Final Response Text

The final assistant message's text is the concatenation of all `"type": "text"` blocks from the last unique API turn:

```python
import json


def extract_final_text(jsonl_path):
    """Concatenate the text blocks of the last unique API turn."""
    last_text_blocks = []
    prev_usage_key = None

    with open(jsonl_path) as f:
        for line in f:
            try:
                obj = json.loads(line)
            except json.JSONDecodeError:
                continue  # blank or partially written line
            if obj.get('type') != 'assistant':
                continue
            msg = obj.get('message', {})
            usage = msg.get('usage', {})
            key = (usage.get('input_tokens'), usage.get('output_tokens'),
                   usage.get('cache_creation_input_tokens'), usage.get('cache_read_input_tokens'))
            # New fingerprint: a new API turn begins, so drop the previous turn's text
            if key != prev_usage_key:
                last_text_blocks = []
                prev_usage_key = key
            for block in msg.get('content', []):
                if block.get('type') == 'text':
                    last_text_blocks.append(block['text'])

    return ''.join(last_text_blocks)
```

Skip `thinking` and `tool_use` blocks; they are not part of the human-visible response.

### Race Condition: Stop Hook Fires Before JSONL Flush

The Stop hook fires approximately 2–5 ms before Claude Code flushes the final `assistant` event to the JSONL. If the transcript is read immediately on Stop:
- The final API turn may be missing from the JSONL
- Or the last chunk may be partially written (truncated JSON line)

**Retry strategy:**
```python
import time


def read_with_retry(jsonl_path, max_retries=40, interval=0.05):
    """Poll the transcript until final text appears or the retries run out."""
    for _ in range(max_retries):
        text = extract_final_text(jsonl_path)
        if text:
            return text
        time.sleep(interval)
    return None  # use Stop hook payload fallback
```

40 × 50 ms = 2 s maximum wait. Observed: text available within 1–3 retries (50–150 ms after Stop fires). Note that a non-empty result does not prove the final turn has landed: if that turn is still unflushed, `extract_final_text` returns the text of the previous turn, so compare against `last_assistant_message` from the Stop payload when exactness matters.

## Hook System

### Available Hook Events

From `~/.claude/settings.json` (observed on v2.1.168):

| Hook event | When it fires | Stdin payload |
|------------|---------------|---------------|
| `SessionStart` | Claude Code process starts | `{session_id, cwd, ...}` |
| `SessionEnd` | Process exits | `{session_id, ...}` |
| `Stop` | Assistant finishes a turn, waiting for next input | `{session_id, transcript_path, last_assistant_message, ...}` |
| `UserPromptSubmit` | User submits a new message | `{session_id, prompt, ...}` |
| `PreToolUse` | Before each tool call | `{session_id, tool_name, tool_input, ...}` |
| `PermissionRequest` | Before granting a permission | `{session_id, permission, ...}` |

### Stop Hook Payload

```json
{
  "hook_event_name": "Stop",
  "session_id": "37f84004-275c-46fd-8947-54348867302a",
  "transcript_path": "/home/coding/.claude/projects/-home-coding-claude-print/37f84004-....jsonl",
  "last_assistant_message": "The final text of the last assistant turn",
  "cwd": "/home/coding/claude-print"
}
```

`last_assistant_message` is the extracted text of the final turn, available directly without reading the JSONL. Useful as a fallback when the JSONL isn't flushed yet and the retry loop is exhausted.

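A hook written in Python could pull the relevant fields out of this payload as sketched below. `parse_stop_payload` is a hypothetical helper; it assumes the payload has the shape shown above and uses `.get()` so that a missing field yields `None` rather than an exception.

```python
import json


def parse_stop_payload(raw):
    """Parse a Stop hook payload (JSON text) and return the fields of interest."""
    payload = json.loads(raw)
    if payload.get("hook_event_name") != "Stop":
        raise ValueError("not a Stop hook payload")
    return {
        "session_id": payload.get("session_id"),
        "transcript_path": payload.get("transcript_path"),
        "last_assistant_message": payload.get("last_assistant_message"),
    }
```

Inside a hook script the input would come from stdin, for example `parse_stop_payload(sys.stdin.read())`.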

### Hook Configuration

Hooks are configured in `~/.claude/settings.json` (user-global), `.claude/settings.json` (project), or `.claude/settings.local.json` (local override). The `--settings <path>` flag specifies an additional settings file. Settings are merged; all matching hooks fire.

Per-run settings overlay (the `claude-print` approach):
```json
{
  "hooks": {
    "Stop": [{
      "hooks": [{"type": "command", "command": "/tmp/claude-print-PID/hook.sh", "timeout": 10}]
    }]
  }
}
```

The hook script receives the JSON payload on stdin. The exit code is not ignored: per the Claude Code hooks documentation, exit code 0 means success, exit code 2 is a blocking error (for `Stop`, it prevents the session from stopping and feeds stderr back to the model), and any other non-zero code is reported as a non-blocking error. A hook that only captures output should therefore always exit 0. `timeout` (seconds) aborts the hook process if it runs too long.

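The overlay file can be generated per run rather than written by hand. This is a sketch of one way to do it, not a description of how `claude-print` is implemented; `build_stop_overlay` and `write_overlay` are hypothetical names, and the structure simply mirrors the JSON above.

```python
import json


def build_stop_overlay(hook_command, timeout=10):
    """Return a settings dict that registers one command hook on the Stop event."""
    return {
        "hooks": {
            "Stop": [{
                "hooks": [{"type": "command", "command": hook_command, "timeout": timeout}]
            }]
        }
    }


def write_overlay(path, hook_command, timeout=10):
    """Write the overlay as JSON so it can be passed via --settings <path>."""
    with open(path, "w") as f:
        json.dump(build_stop_overlay(hook_command, timeout), f, indent=2)
```

Because settings are merged, this overlay adds a hook alongside any already configured ones instead of replacing them.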

### Existing Hooks on This Server

The following hooks are active in `~/.claude/settings.json` and will fire for all claude sessions including subprocess ones:
- `PermissionRequest` → `trail-boss/trailboss-emit.sh`
- `PreToolUse` → `~/.ccdash/hooks/pre-tool-use.sh`
- `SessionEnd` → `~/.ccdash/hooks/session-end.sh` + `trailboss-emit.sh`
- `SessionStart` → `~/.ccdash/hooks/session-start.sh` + `trailboss-emit.sh`
- `Stop` → `~/.ccdash/hooks/stop.sh`
- `UserPromptSubmit` → (ccdash hook)

`trailboss-emit.sh` silently exits 0 if `$TMUX_PANE` is not set, so subprocess sessions are unaffected. `ccdash` hooks update the session registry, which is correct behavior.

## Retrieving Output from an Independent Session

### Method 1: Stop Hook + JSONL (Primary)

The subprocess session fires the Stop hook when done. `claude-print` pre-installs an additional per-run hook via `--settings` overlay. The hook writes the Stop payload to a named FIFO. The parent reads the FIFO, gets `transcript_path` and `last_assistant_message`, then reads the JSONL for full text and token counts.

This is the most reliable method. Latency: Stop hook fires within 50–200 ms of the final token being generated.

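The parent side of the FIFO handoff might look like the sketch below. It is an illustration under stated assumptions, not the `claude-print` source: `make_fifo` and `read_stop_payload` are hypothetical names, the hook is assumed to write exactly one JSON payload and then close the FIFO, and `os.mkfifo` is only available on Unix-like systems.

```python
import json
import os


def make_fifo(directory, name="stop.fifo"):
    """Create a named FIFO readable and writable by the current user only."""
    path = os.path.join(directory, name)
    os.mkfifo(path, 0o600)
    return path


def read_stop_payload(fifo_path):
    """Block until a writer sends one JSON payload, then return it parsed."""
    # Opening a FIFO for reading blocks until a writer opens the other end;
    # read() then returns once that writer closes it.
    with open(fifo_path) as fifo:
        return json.loads(fifo.read())
```

The blocking open doubles as the completion signal: the parent does not need to poll, it simply wakes up when the hook writes.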

### Method 2: Session-ID Pre-assignment (`--session-id`)

Assign a known UUID to the subprocess session at spawn time:
```python
import os
import uuid

cwd = os.getcwd()  # working directory the child session will run in
cwd_slug = cwd.replace('/', '-')
child_session_id = str(uuid.uuid4())
# expanduser is needed: a literal '~' is not resolved by open()
transcript_path = os.path.expanduser(
    f'~/.claude/projects/{cwd_slug}/{child_session_id}.jsonl')

# Append the prompt and any further flags before spawning
args = ['claude', '--session-id', child_session_id, '--dangerously-skip-permissions']
```

The parent knows the JSONL path before the session starts, so it can poll the file directly without waiting for a Stop hook payload. Combine this with the Stop FIFO for reliable completion signaling.

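Polling for the transcript could be sketched as below. `wait_for_file` is a hypothetical helper, and the default timeout and interval are arbitrary illustrative values; it only reports that the file exists and is non-empty, not that the session has finished.

```python
import os
import time


def wait_for_file(path, timeout=2.0, interval=0.05):
    """Return True once path exists and is non-empty, False if timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path) and os.path.getsize(path) > 0:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

`time.monotonic()` is used for the deadline so that wall-clock adjustments cannot shorten or extend the wait.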

### Method 3: Resume (`--resume`), Adding to the Main Session

After a subprocess session completes, its full history (user prompts + assistant responses) is in its JSONL. The main session (or any subsequent session) can incorporate it:

```bash
# Branch from the subprocess session's history
claude --resume <child-session-id> --fork-session
```

This creates a new session that has the subprocess session's entire conversation as its history. The user (or next automated prompt) continues from that point.

Alternatively, the calling session can read the subprocess session's final response and inject it as context in the next user turn. This avoids merging session histories but achieves the same goal.

### Method 4: Structured Output Re-injection

`claude-print` emits a structured result object (`--output-format json`). The caller (e.g., NEEDLE) treats this as the final response. The caller's own session (the NEEDLE worker session) receives the result as a tool output. The subprocess session's token usage is reported in the structured result and can be forwarded to any accounting system.

This is how `claude-print` integrates with NEEDLE: NEEDLE's session sees the result as if it were a tool call output; the actual LLM work happened in the subprocess session, which is billed separately.

## Billing Classification

The `entrypoint` field is set in `user` events in the JSONL. Observed values: `"cli"` for interactive TUI sessions. The billing classification (`cc_entrypoint` header sent to the API) is determined by the process mode at startup: if `claude` has a real TTY (checked via `isatty()`), it enters TUI mode and uses `cli`; if stdout is a pipe, it uses `sdk-cli`.

- Running under `claude-print`'s PTY: `isatty(slave_fd)` returns `true` → TUI mode → `cli` billing.
- Running as `claude -p`: `isatty(stdout)` returns `false` → print mode → `sdk-cli` billing.

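The difference between the two cases can be reproduced with the standard library alone. The sketch below demonstrates only the `isatty()` behavior of a PTY slave versus a pipe; `classify_entrypoint` is a hypothetical helper that encodes the rule as described in this section, not Claude Code's actual logic, and `pty.openpty()` requires a Unix-like system.

```python
import os
import pty


def classify_entrypoint(fd):
    """Map a file descriptor to the label described above (illustrative only)."""
    return "cli" if os.isatty(fd) else "sdk-cli"


# A pseudo-terminal pair: the slave end looks like a real terminal.
master_fd, slave_fd = pty.openpty()
# An ordinary pipe: neither end is a terminal.
read_fd, write_fd = os.pipe()

pty_label = classify_entrypoint(slave_fd)
pipe_label = classify_entrypoint(write_fd)

for fd in (master_fd, slave_fd, read_fd, write_fd):
    os.close(fd)
```

This is why wrapping the child in a PTY changes the outcome: from the child's point of view, its stdout is the slave end and passes the `isatty()` check.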