jedarden 8ab946e1ef Add PTY mechanics and Claude Code internals research

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-06-07 10:41:46 -04:00

Claude Code Internals

Session System

Session Identity

Every Claude Code process creates a live session record at:

~/.claude/sessions/<pid>.json

Contents (observed from v2.1.168):

{
  "pid": 360946,
  "sessionId": "37f84004-275c-46fd-8947-54348867302a",
  "cwd": "/home/coding",
  "startedAt": 1780834744503,
  "procStart": "117958",
  "version": "2.1.168",
  "peerProtocol": 1,
  "kind": "interactive",
  "entrypoint": "cli",
  "status": "idle",
  "updatedAt": 1780836672456
}

kind is interactive for TUI sessions and print for -p runs. entrypoint is cli for both (the billing field is set separately in request headers). status is working while a turn is in progress and idle while waiting for input.

The session file is deleted when the process exits. Enumerating ~/.claude/sessions/ gives all currently-running Claude Code processes.
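
The sketch below enumerates that directory. It assumes the record layout shown above. The os.kill(pid, 0) liveness check is our addition, not observed Claude Code behavior. It only guards against a leftover record if a process dies without deleting its file, and we have not verified that this can happen.

```python
import json
import os
from pathlib import Path

def running_sessions(sessions_dir="~/.claude/sessions"):
    """Return parsed session records whose PID still refers to a live process."""
    records = []
    for path in sorted(Path(os.path.expanduser(sessions_dir)).glob("*.json")):
        try:
            rec = json.loads(path.read_text())
        except (OSError, ValueError):
            continue  # file removed mid-scan, or not valid JSON
        pid = rec.get("pid")
        if not isinstance(pid, int):
            continue
        try:
            os.kill(pid, 0)  # signal 0: existence check only, nothing is delivered
        except ProcessLookupError:
            continue  # stale record: the process is gone
        except PermissionError:
            pass  # the process exists but belongs to another user
        records.append(rec)
    return records
```

Each returned record carries sessionId and cwd, which is everything needed to locate that session's transcript.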

Transcript Storage

Each session writes its full conversation to:

~/.claude/projects/<cwd-slug>/<session-id>.jsonl

The <cwd-slug> is the working directory path with / replaced by - (e.g., /home/coding/claude-print → -home-coding-claude-print).

The JSONL file is append-only — every event is a single JSON line. The file is flushed incrementally during a session; at Stop-hook fire time there is a race window (2–5 ms) where the final assistant event may not yet be written.

To derive transcript_path from a session record:

import os

def transcript_path(session_id, cwd):
    # '/home/coding/claude-print' -> '-home-coding-claude-print'
    slug = cwd.replace('/', '-')
    return os.path.expanduser(f'~/.claude/projects/{slug}/{session_id}.jsonl')
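
Only the / → - substitution is documented above. Paths containing other punctuation, such as dots or underscores, were not checked. To avoid depending on the slug rule, you can find the transcript by session ID alone, assuming IDs are unique across project directories. A sketch (find_transcript is a hypothetical helper name):

```python
import glob
import os

def find_transcript(session_id, projects_dir="~/.claude/projects"):
    """Locate <session-id>.jsonl under any project directory, or return None."""
    pattern = os.path.join(os.path.expanduser(projects_dir), "*", f"{session_id}.jsonl")
    matches = glob.glob(pattern)
    # Session IDs are UUIDs, so several matches are not expected; if they ever
    # occur, prefer the most recently modified file.
    return max(matches, key=os.path.getmtime) if matches else None
```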

Session Flags

Relevant CLI flags for session management:

Flag	Effect
`--session-id <uuid>`	Assigns a specific UUID as this session's ID; the session writes to the corresponding JSONL path
`-r/--resume <id>`	Resumes a prior session: reads existing JSONL as conversation history, continues appending
`-c/--continue`	Resumes the most recent session in the current working directory
`--fork-session`	Used with `--resume`/`--continue`; generates a new session ID instead of reusing the original — branches the conversation
`-n/--name <name>`	Display name shown in TUI and `/resume` picker
`--no-session-persistence`	Disables JSONL writing entirely; nothing is persisted

Resuming a subprocess session into the parent:

# Start a subprocess session with a known ID
claude --session-id a1b2c3d4-... --dangerously-skip-permissions --print "do work"

# Later, resume that session in interactive mode — full history is available
claude --resume a1b2c3d4-...

This is the cleanest way to "add an independent session to the main session": resume the subprocess session in the parent terminal. The history is already in the JSONL; --resume replays it as context.

`--fork-session` Branching

--resume <id> --fork-session creates a new session ID and writes to a new JSONL file, but reads the prior session's JSONL as its initial conversation history. The prior session is unchanged. This is useful for branching from a completed subprocess session without modifying it.

JSONL Transcript Format

Event Types

Type	When written	Notes
`file-history-snapshot`	Session start	File tracking for undo
`user`	Each user turn	Includes `entrypoint`, `cwd`, `sessionId`, `version`, `gitBranch`
`assistant`	Each API response chunk	One event per streaming chunk; all chunks for one turn share identical `message.usage`
`system`	Local commands and other system notices	`subtype: "local_command"` for slash commands; other subtypes also occur (tool results themselves are written as `user` events)
`last-prompt`	After each turn	Records `lastPrompt`, `leafUuid`, `sessionId`
`attachment`	File/image attachments
`result`	`--print` mode only	Final result object (see below)
`summary`	After compaction	Compressed context summary
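
A quick tally shows which of these types a given transcript actually contains. The sketch assumes only that each line is one JSON object with a top-level type field. Lines that fail to parse, such as a truncated final line, are counted separately rather than raising an error.

```python
import json
from collections import Counter

def event_type_counts(jsonl_path):
    """Tally transcript events by their top-level "type" field."""
    counts = Counter()
    with open(jsonl_path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                counts[json.loads(line).get("type", "<missing>")] += 1
            except ValueError:
                counts["<unparseable>"] += 1  # e.g. a line still being written
    return counts
```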

`user` Event Fields

{
  "parentUuid": "...",          // UUID of the event this follows in the message tree
  "isSidechain": false,
  "promptId": "...",            // groups all events for a single user turn
  "type": "user",
  "message": {
    "role": "user",
    "content": "<prompt text>"  // may be string or array of content blocks
  },
  "uuid": "...",
  "timestamp": "2026-06-07T...",
  "userType": "external",
  "entrypoint": "cli",          // "cli" for TUI, "sdk-cli" for -p after June 15
  "cwd": "/home/coding/...",
  "sessionId": "...",
  "version": "2.1.168",
  "gitBranch": "main"
}

Tool results appear as user events with message.content being an array of {"type": "tool_result", "tool_use_id": "...", "content": [...]} blocks.
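
The sketch below pulls those tool_result blocks out of a transcript. It assumes the layout just described: ordinary prompts have string content, and tool results have a list of blocks. The function name tool_results is ours.

```python
import json

def tool_results(jsonl_path):
    """Yield (tool_use_id, content) for every tool_result block in user events."""
    with open(jsonl_path) as f:
        for line in f:
            try:
                obj = json.loads(line)
            except ValueError:
                continue  # blank or truncated line
            if obj.get("type") != "user":
                continue
            content = obj.get("message", {}).get("content")
            if not isinstance(content, list):
                continue  # a plain-string prompt, not a tool result
            for block in content:
                if isinstance(block, dict) and block.get("type") == "tool_result":
                    yield block.get("tool_use_id"), block.get("content")
```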

`assistant` Event — Streaming Chunks

A single API call (one LLM turn) produces multiple consecutive assistant events — one per streaming chunk. All chunks for the same API call carry identical message.usage objects.

This means: to identify unique API turns, detect when message.usage changes between consecutive assistant events.

{
  "parentUuid": "...",
  "isSidechain": false,
  "type": "assistant",
  "message": {
    "role": "assistant",
    "content": [
      {"type": "thinking", "thinking": "..."},
      {"type": "text", "text": "..."},
      {"type": "tool_use", "id": "toolu_...", "name": "Bash", "input": {"command": "..."}}
    ],
    "model": "claude-sonnet-4-6",
    "usage": {
      "input_tokens": 6178,
      "output_tokens": 295,
      "cache_creation_input_tokens": 825,
      "cache_read_input_tokens": 26442,
      "server_tool_use": {"web_search_requests": 0, "web_fetch_requests": 0},
      "service_tier": "standard",
      "cache_creation": {
        "ephemeral_5m_input_tokens": 0,
        "ephemeral_1h_input_tokens": 825
      },
      "inference_geo": "",
      "iterations": [
        {
          "input_tokens": 6178,
          "output_tokens": 295,
          "cache_read_input_tokens": 26442,
          "cache_creation_input_tokens": 825,
          "cache_creation": {"ephemeral_5m_input_tokens": 0, "ephemeral_1h_input_tokens": 825},
          "type": "message"
        }
      ],
      "speed": "standard"
    }
  },
  "uuid": "...",
  "timestamp": "..."
}

Content block types within a single turn:

"type": "thinking" — extended thinking scratchpad (not part of final response)
"type": "text" — assistant prose (the human-visible response)
"type": "tool_use" — a tool call (name + input object)

One turn often splits across several chunks: e.g., chunk 1 has a thinking block, chunk 2 has the first text block, chunk 3 has a tool_use block.
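
Because one turn's blocks are spread over several events, collecting tool calls means walking every assistant event. The sketch below lists each tool_use block once. Deduplicating on the block id is a precaution we added in case a block repeats across chunks; we have not observed that happening.

```python
import json

def tool_calls(jsonl_path):
    """Return [(name, input)] for each tool_use block, deduplicated by block id."""
    seen, calls = set(), []
    with open(jsonl_path) as f:
        for line in f:
            try:
                obj = json.loads(line)
            except ValueError:
                continue  # blank or truncated line
            if obj.get("type") != "assistant":
                continue
            for block in obj.get("message", {}).get("content", []):
                if not isinstance(block, dict) or block.get("type") != "tool_use":
                    continue
                if block.get("id") in seen:
                    continue  # same call repeated in a later chunk
                seen.add(block.get("id"))
                calls.append((block.get("name"), block.get("input", {})))
    return calls
```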

`last-prompt` Event

{
  "type": "last-prompt",
  "lastPrompt": "full text of the last user prompt...",
  "leafUuid": "...",
  "sessionId": "..."
}

Written after every assistant turn (points to the most recent user message). Useful for finding session boundaries without scanning all events.
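
The sketch below performs that lookup, assuming last-prompt events appear as described. Scanning backwards stops at the most recent one, so only the lines after it are parsed. This simple version still reads the whole file into memory.

```python
import json

def last_prompt(jsonl_path):
    """Return lastPrompt from the most recent last-prompt event, or None."""
    with open(jsonl_path) as f:
        lines = f.readlines()
    for line in reversed(lines):
        try:
            obj = json.loads(line)
        except ValueError:
            continue  # blank or truncated line
        if obj.get("type") == "last-prompt":
            return obj.get("lastPrompt")
    return None
```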

`result` Event (`--print` mode)

In --print mode, the final event in the JSONL is a result object:

{
  "type": "result",
  "subtype": "success",
  "is_error": false,
  "result": "final response text",
  "session_id": "...",
  "num_turns": 3,
  "duration_ms": 12400,
  "cost_usd": 0,
  "usage": {
    "input_tokens": 1240,
    "output_tokens": 380,
    "cache_creation_input_tokens": 0,
    "cache_read_input_tokens": 900
  }
}

This event is absent from interactive-mode transcripts. In interactive mode, token totals must be computed by aggregating across unique API turns in the assistant events.
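
One function can handle both modes. The sketch below assumes the result event's usage already covers the whole run; the sample above suggests this, but we have not confirmed it. For interactive transcripts it applies the same usage-fingerprint rule as the Token Counting section.

```python
import json

USAGE_KEYS = ("input_tokens", "output_tokens",
              "cache_creation_input_tokens", "cache_read_input_tokens")

def session_token_totals(jsonl_path):
    """Prefer the --print result event's usage; otherwise sum unique API turns."""
    result_usage, prev_key = None, None
    totals = dict.fromkeys(USAGE_KEYS, 0)
    with open(jsonl_path) as f:
        for line in f:
            try:
                obj = json.loads(line)
            except ValueError:
                continue  # blank or truncated line
            if obj.get("type") == "result":
                result_usage = obj.get("usage", {})
            elif obj.get("type") == "assistant":
                usage = obj.get("message", {}).get("usage", {})
                key = tuple(usage.get(k) for k in USAGE_KEYS)
                if key != prev_key:  # first chunk of a new API call: count it once
                    for k in USAGE_KEYS:
                        totals[k] += usage.get(k) or 0
                    prev_key = key
    if result_usage is not None:
        return {k: result_usage.get(k, 0) for k in USAGE_KEYS}
    return totals
```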

Token Counting

Problem: Streaming Duplicates

Every streaming chunk event for the same API call carries the same usage object. Naively summing output_tokens across all assistant events therefore counts each turn once per chunk instead of once.

Correct Approach: Dedup by Usage Fingerprint

In practice, two consecutive assistant events belong to the same API call when their message.usage objects are identical (same input_tokens, output_tokens, cache_creation_input_tokens, cache_read_input_tokens). A turn boundary is any point where one of these values changes:

import json

def extract_turns(jsonl_path):
    """Return one usage dict per unique API call, in transcript order."""
    turns = []
    prev_usage_key = None

    with open(jsonl_path) as f:
        for line in f:
            try:
                obj = json.loads(line)
            except ValueError:
                continue  # blank line, or a truncated line still being written
            if obj.get('type') != 'assistant':
                continue
            usage = obj.get('message', {}).get('usage', {})
            key = (
                usage.get('input_tokens'),
                usage.get('output_tokens'),
                usage.get('cache_creation_input_tokens'),
                usage.get('cache_read_input_tokens'),
            )
            # A changed fingerprint marks the first chunk of a new API call.
            if key != prev_usage_key:
                turns.append(usage)
                prev_usage_key = key

    return turns

def sum_tokens(turns):
    keys = ('input_tokens', 'output_tokens',
            'cache_creation_input_tokens', 'cache_read_input_tokens')
    # `or 0` also covers keys that are present but null.
    return {k: sum(t.get(k) or 0 for t in turns) for k in keys}

Observed behavior on a 176-line transcript: 45 unique API turns (not 65 assistant events).
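
The sample assistant event above does not show a message.id, but Anthropic API message objects carry one (msg_...). If your transcripts record it, grouping by that ID avoids a rare failure of the fingerprint rule: two consecutive API calls that report identical usage would otherwise be merged. The sketch below uses message.id when it is present and falls back to the fingerprint otherwise. The presence of message.id in the JSONL is an assumption to check against your own files.

```python
import json

USAGE_KEYS = ("input_tokens", "output_tokens",
              "cache_creation_input_tokens", "cache_read_input_tokens")

def extract_turns_by_id(jsonl_path):
    """Dedup assistant events by message.id (assumed field), else by usage fingerprint."""
    turns, seen_ids, prev_fingerprint = [], set(), None
    with open(jsonl_path) as f:
        for line in f:
            try:
                obj = json.loads(line)
            except ValueError:
                continue  # blank or truncated line
            if obj.get("type") != "assistant":
                continue
            msg = obj.get("message", {})
            usage = msg.get("usage", {})
            msg_id = msg.get("id")
            if msg_id is not None:
                is_new = msg_id not in seen_ids  # catches non-adjacent repeats too
                seen_ids.add(msg_id)
            else:
                fingerprint = tuple(usage.get(k) for k in USAGE_KEYS)
                is_new = fingerprint != prev_fingerprint
                prev_fingerprint = fingerprint
            if is_new:
                turns.append(usage)
    return turns
```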

`iterations` Array

Each usage object also has an iterations array with one entry per API sub-call within the turn (used for extended thinking or multi-step internal reasoning). For standard turns, len(iterations) == 1. Read the individual iterations[i] entries (for example iterations[i].output_tokens) when you need per-sub-call figures.
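
The sketch below reads those per-sub-call numbers, assuming the iterations layout shown in the sample event. The function name is hypothetical. When the array is absent, the sketch treats the whole usage object as one call.

```python
def iteration_breakdown(usage):
    """Return (input_tokens, output_tokens) for each sub-call in one usage object."""
    iterations = usage.get("iterations") or []
    if not iterations:
        # No iterations array: treat the usage object itself as a single call.
        return [(usage.get("input_tokens", 0), usage.get("output_tokens", 0))]
    return [(it.get("input_tokens", 0), it.get("output_tokens", 0)) for it in iterations]
```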

Extracting the Final Response Text

The final assistant message's text is the concatenation of all "type": "text" blocks from the last unique API turn:

import json

def extract_final_text(jsonl_path):
    """Concatenate the text blocks of the last unique API turn."""
    last_text_blocks = []
    prev_usage_key = None

    with open(jsonl_path) as f:
        lines = f.readlines()

    for line in lines:
        try:
            obj = json.loads(line)
        except ValueError:
            continue  # blank line, or a truncated line that is still being flushed
        if obj.get('type') != 'assistant':
            continue
        msg = obj.get('message', {})
        usage = msg.get('usage', {})
        key = (usage.get('input_tokens'), usage.get('output_tokens'),
               usage.get('cache_creation_input_tokens'), usage.get('cache_read_input_tokens'))
        if key != prev_usage_key:
            # First chunk of a new API call: drop text collected from the previous turn.
            last_text_blocks = []
            prev_usage_key = key
        for block in msg.get('content', []):
            if block.get('type') == 'text':
                last_text_blocks.append(block['text'])

    return ''.join(last_text_blocks)

Skip thinking and tool_use blocks — they are not part of the human-visible response.

Race Condition: Stop Hook Fires Before JSONL Flush

The Stop hook fires approximately 2–5 ms before Claude Code flushes the final assistant event to the JSONL. If the transcript is read immediately on Stop:

The final API turn may be missing from the JSONL entirely
The last chunk may be partially written (a truncated JSON line)

Retry strategy:

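import time

def read_with_retry(jsonl_path, max_retries=40, interval=0.05):
    for _ in range(max_retries):
        try:
            text = extract_final_text(jsonl_path)
        except (OSError, ValueError):
            # File not created yet, or a truncated line failed to parse.
            text = ''
        if text:
            return text
        time.sleep(interval)
    return None  # fall back to last_assistant_message from the Stop payload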

40 × 50 ms = 2 s maximum wait. Observed: text available within 1–3 retries (50–150 ms after Stop fires).
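
One caveat: read_with_retry returns the first non-empty text it finds. If the final turn has not been flushed yet, the last unique turn in the file may be an earlier one, for example a "Let me check" text block followed by a tool_use. That earlier text would then be returned instead of the final response. The stricter sketch below rests on two assumptions of ours. First, the final turn of a finished session contains no tool_use block. Second, last_assistant_message from the Stop payload (passed as expected) matches the concatenated text blocks exactly. If the second assumption fails, the loop simply runs out and returns the payload text.

```python
import json
import time

USAGE_KEYS = ("input_tokens", "output_tokens",
              "cache_creation_input_tokens", "cache_read_input_tokens")

def last_turn_blocks(jsonl_path):
    """Return all content blocks of the last unique API turn (empty if none)."""
    blocks, prev_key = [], None
    with open(jsonl_path) as f:
        for line in f:
            try:
                obj = json.loads(line)
            except ValueError:
                continue  # truncated line still being written
            if obj.get("type") != "assistant":
                continue
            msg = obj.get("message", {})
            usage = msg.get("usage", {})
            key = tuple(usage.get(k) for k in USAGE_KEYS)
            if key != prev_key:
                blocks, prev_key = [], key
            blocks.extend(msg.get("content", []))
    return blocks

def read_final_turn(jsonl_path, expected=None, max_retries=40, interval=0.05):
    """Retry until the last turn looks final; fall back to `expected` otherwise."""
    for _ in range(max_retries):
        try:
            blocks = last_turn_blocks(jsonl_path)
        except OSError:
            blocks = []  # transcript not created yet
        text = "".join(b.get("text", "") for b in blocks if b.get("type") == "text")
        has_tool_call = any(b.get("type") == "tool_use" for b in blocks)
        matches = expected is None or text.strip() == expected.strip()
        if text and not has_tool_call and matches:
            return text
        time.sleep(interval)
    return expected  # last_assistant_message from the Stop payload, or None
```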

Hook System

Available Hook Events

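Hook events configured in ~/.claude/settings.json on this server (observed on v2.1.168). Claude Code defines further events, for example PostToolUse, Notification, SubagentStop, and PreCompact, that are not used here: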

Hook event	When it fires	Stdin payload
`SessionStart`	Claude Code process starts	`{session_id, cwd, ...}`
`SessionEnd`	Process exits	`{session_id, ...}`
`Stop`	Assistant finishes a turn, waiting for next input	`{session_id, transcript_path, last_assistant_message, ...}`
`UserPromptSubmit`	User submits a new message	`{session_id, prompt, ...}`
`PreToolUse`	Before each tool call	`{session_id, tool_name, tool_input, ...}`
`PermissionRequest`	Before granting a permission	`{session_id, permission, ...}`

Stop Hook Payload

{
  "hook_event_name": "Stop",
  "session_id": "37f84004-275c-46fd-8947-54348867302a",
  "transcript_path": "/home/coding/.claude/projects/-home-coding-claude-print/37f84004-....jsonl",
  "last_assistant_message": "The final text of the last assistant turn",
  "cwd": "/home/coding/claude-print"
}

last_assistant_message is the extracted text of the final turn — available directly without reading the JSONL. Useful as a fallback when the JSONL isn't flushed yet and the retry loop is exhausted.

Hook Configuration

Hooks are configured in ~/.claude/settings.json (user-global), .claude/settings.json (project), or .claude/settings.local.json (local override). The --settings <path> flag specifies an additional settings file. Settings are merged; all matching hooks fire.

Per-run settings overlay (the claude-print approach):

{
  "hooks": {
    "Stop": [{
      "hooks": [{"type": "command", "command": "/tmp/claude-print-PID/hook.sh", "timeout": 10}]
    }]
  }
}

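The hook script receives the JSON payload on stdin. Its exit code is not ignored. Exit code 0 means success. Exit code 2 is a blocking error: for a Stop hook it keeps Claude from stopping and feeds stderr back to the model. Any other non-zero code is reported as a non-blocking error. A hook that only forwards the payload should therefore always exit 0. The timeout value is in seconds; a hook that runs longer is aborted.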
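
The sketch below writes that overlay for each run. The temporary-directory layout is our choice: mkdtemp adds a random suffix rather than the PID used in the example path. The hook command is a placeholder, and write_stop_hook_overlay is a hypothetical helper name.

```python
import json
import os
import tempfile

def write_stop_hook_overlay(hook_command, timeout=10):
    """Write a per-run settings file that adds one Stop hook; return its path."""
    run_dir = tempfile.mkdtemp(prefix="claude-print-")
    overlay = {
        "hooks": {
            "Stop": [{
                "hooks": [{"type": "command", "command": hook_command, "timeout": timeout}]
            }]
        }
    }
    path = os.path.join(run_dir, "settings.json")
    with open(path, "w") as f:
        json.dump(overlay, f)
    return path
```

Pass the returned path to claude as --settings <path>, alongside the other flags for the run. Because settings are merged, the global hooks still fire as well.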

Existing Hooks on This Server

The following hooks are active in ~/.claude/settings.json and will fire for all Claude Code sessions, including subprocess ones:

PermissionRequest → trail-boss/trailboss-emit.sh
PreToolUse → ~/.ccdash/hooks/pre-tool-use.sh
SessionEnd → ~/.ccdash/hooks/session-end.sh + trailboss-emit.sh
SessionStart → ~/.ccdash/hooks/session-start.sh + trailboss-emit.sh
Stop → ~/.ccdash/hooks/stop.sh
UserPromptSubmit → (ccdash hook)

trailboss-emit.sh silently exits 0 if $TMUX_PANE is not set — subprocess sessions are unaffected. ccdash hooks update the session registry, which is correct behavior.

Retrieving Output from an Independent Session

Method 1: Stop Hook + JSONL (Primary)

The subprocess session fires the Stop hook when done. claude-print pre-installs an additional per-run hook via --settings overlay. The hook writes the Stop payload to a named FIFO. The parent reads the FIFO, gets transcript_path and last_assistant_message, then reads the JSONL for full text and token counts.

This is the most reliable method. Latency: Stop hook fires within 50–200 ms of the final token being generated.
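
The sketch below shows both ends of that FIFO. It assumes the child answers a single prompt, so Stop fires once. Opening a FIFO blocks until the other end is also open, so the parent should create the FIFO with os.mkfifo before spawning the child. If the parent is not reading, the hook's own timeout bounds how long the hook waits. Both function names are hypothetical.

```python
import json

def forward_stop_payload(payload_text, fifo_path):
    """Hook side: write the Stop payload (read from stdin) into the FIFO."""
    # Opening for writing blocks until the parent has the FIFO open for reading.
    with open(fifo_path, "w") as fifo:
        fifo.write(payload_text)

def read_stop_payload(fifo_path):
    """Parent side: block until the hook writes and closes, then parse the payload."""
    with open(fifo_path) as fifo:
        data = fifo.read()  # returns at EOF, i.e. once the hook closes its end
    payload = json.loads(data)
    return payload.get("transcript_path"), payload.get("last_assistant_message")
```

Inside the hook script itself, the call would be forward_stop_payload(sys.stdin.read(), fifo_path), followed by exit status 0. How the script learns fifo_path is left open; one option is to write it into the generated script.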

Method 2: Session-ID Pre-assignment (`--session-id`)

Assign a known UUID to the subprocess session at spawn time:

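import os
import uuid

child_session_id = str(uuid.uuid4())
cwd_slug = os.getcwd().replace('/', '-')  # assumes the child starts in this cwd
transcript_path = os.path.expanduser(
    f'~/.claude/projects/{cwd_slug}/{child_session_id}.jsonl')

args = ['claude', '--session-id', child_session_id, '--dangerously-skip-permissions', ...]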

The parent knows the JSONL path before the session starts, so it can poll the file directly without waiting for a Stop hook payload. Combine this with the Stop FIFO for reliable completion signaling.

Method 3: Resume (`--resume`) — Adding to the Main Session

After a subprocess session completes, its full history (user prompts + assistant responses) is in its JSONL. The main session (or any subsequent session) can incorporate it:

# Branch from the subprocess session's history
claude --resume <child-session-id> --fork-session

This creates a new session that has the subprocess session's entire conversation as its history. The user (or next automated prompt) continues from that point.

Alternatively: the calling session can read the subprocess session's final response and inject it as context in the next user turn. This avoids merging session histories but achieves the same goal.

Method 4: Structured Output Re-injection

claude-print emits a structured result object (--output-format json). The caller (e.g., NEEDLE) treats this as the final response. The caller's own session (the NEEDLE worker session) receives the result as a tool output. The subprocess session's token usage is reported in the structured result and can be forwarded to any accounting system.

This is how claude-print integrates with NEEDLE: NEEDLE's session sees the result as if it were a tool call output; the actual LLM work happened in the subprocess session billed separately.
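
The exact schema of claude-print's JSON output is not given here. The sketch below assumes it mirrors the result event shown earlier (result, is_error, subtype, session_id, usage). On that assumption, a caller could unpack it as follows; parse_print_result is a hypothetical name.

```python
import json

def parse_print_result(stdout_text):
    """Unpack a JSON result object into response text and token counts."""
    obj = json.loads(stdout_text)
    if obj.get("is_error"):
        # Surface failures instead of treating an error message as the answer.
        raise RuntimeError(f"subprocess session failed ({obj.get('subtype')})")
    usage = obj.get("usage", {})
    return {
        "text": obj.get("result", ""),
        "session_id": obj.get("session_id"),
        "input_tokens": usage.get("input_tokens", 0),
        "output_tokens": usage.get("output_tokens", 0),
        "cache_creation_input_tokens": usage.get("cache_creation_input_tokens", 0),
        "cache_read_input_tokens": usage.get("cache_read_input_tokens", 0),
    }
```

The returned session_id is also what --resume needs if the caller later wants to continue that conversation.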

Billing Classification

The entrypoint field is set in user events in the JSONL. Observed values: "cli" for interactive TUI sessions. The billing classification (cc_entrypoint header sent to the API) is determined by the process mode at startup — if claude has a real TTY (checked via isatty()), it enters TUI mode and uses cli. If stdout is a pipe, it uses sdk-cli.

Running under claude-print's PTY: isatty(slave_fd) returns true → TUI mode → cli billing. Running as claude -p: isatty(stdout) returns false → print mode → sdk-cli billing.
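
You can reproduce the distinction directly. The sketch below is POSIX-only. It illustrates the isatty() behavior the paragraph relies on and is not Claude Code's own detection code.

```python
import os

def isatty_pty_vs_pipe():
    """Return (isatty on a PTY slave, isatty on a pipe's write end)."""
    master_fd, slave_fd = os.openpty()
    read_fd, write_fd = os.pipe()
    try:
        return os.isatty(slave_fd), os.isatty(write_fd)
    finally:
        for fd in (master_fd, slave_fd, read_fd, write_fd):
            os.close(fd)
```

On Linux this returns (True, False), matching the two cases above.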
