From 675c1e531db074c97b78df374b803fdd91038ff8 Mon Sep 17 00:00:00 2001 From: jedarden Date: Mon, 25 May 2026 15:18:56 -0400 Subject: [PATCH] =?UTF-8?q?docs:=20gap-review=20round=201=20=E2=80=94=20re?= =?UTF-8?q?concile=20supporting=20docs=20+=20close=20spec=20gaps?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Fix 10 gaps found by fresh-eyes review: - CRITICAL: mechanics doc still documented send-keys-relay/Agent-SDK delivery and the Notification hook; rewrote DELIVER to the navigator model, dropped Notification throughout - HIGH: decisions.md + related-work.md still referenced priority "ranking", /reply dispatch, send-keys/SDK delivery, and the dead plan/question/idle reason taxonomy; aligned to FIFO + navigation + permission/stopped - collector and daemon stated as one process; pull-not-broadcast presentation - added trust boundary (loopback-only ingest, single-host trust assumption) - resolved auto-advance focus-steal hazard (operator-initiated jump) and specified skip re-ordering (move to tail + cooldown) to avoid livelock - downgraded SessionEnd to "not yet probed" (probe only saw SessionStart/Stop) - dropped forked collector's WebSocket layer; most-stuck → oldest-stuck (FIFO) Co-Authored-By: Claude Opus 4.7 (1M context) --- docs/notes/decisions.md | 3 +- docs/plan/plan.md | 60 +++++++++++----- docs/research/claude-code-mechanics.md | 98 ++++++++++++-------------- docs/research/related-work.md | 27 +++---- 4 files changed, 105 insertions(+), 83 deletions(-) diff --git a/docs/notes/decisions.md b/docs/notes/decisions.md index 52b5046..4870a89 100644 --- a/docs/notes/decisions.md +++ b/docs/notes/decisions.md @@ -11,7 +11,8 @@ it right. The metaphor maps cleanly onto the mechanism: - **a steer bogs down or strays** → a `Stop` / `PermissionRequest` hook fires; the collector flags the session stuck - **the trail boss rides over and sets it right** → you read the context and give the order - (reply) or wave it on (skip); ranking surfaces the most-stuck first + (reply) or wave it on (skip); the queue surfaces stuck sessions oldest-first (flat FIFO, no + priority ranking) ### Names considered and rejected diff --git a/docs/plan/plan.md b/docs/plan/plan.md index dc0532a..e238411 100644 --- a/docs/plan/plan.md +++ b/docs/plan/plan.md @@ -99,8 +99,11 @@ interaction sidesteps this entirely. 1. **A blocked-state signal from every session** — Claude Code hooks (`Stop`, `PermissionRequest`), POSTing their stdin JSON to a local collector. -2. **A central collector + live state store** — tracks every session's status, holds the - `session_id → pane` registry, broadcasts the queue. +2. **A central collector + live state store** — tracks every session's status and holds the + `session_id → pane` registry. The collector **is the daemon** (see Architecture); its HTTP + ingest endpoint is one face of that single process. Presentation reads current state on + demand (the `display-popup` *pulls*; there is no push/broadcast channel), so no WebSocket is + required. An optional status-line "N stuck" segment, if added, polls the daemon. 3. **Context extraction** — *what* each session is asking. Largely free from the hook payload (see below); transcript tail for deeper/permission context. 4. **The Trail Boss queue** — a FIFO depletion surface (oldest-stuck first), keyboard-driven. @@ -119,7 +122,8 @@ stuck conditions: a session waiting at a permission prompt is mid-turn and does | `Stop` | Turn finished; session waiting for the next instruction. | Confirmed firing in interactive and `-p` (probe 2026-05-25); payload carries `last_assistant_message` | | `PermissionRequest` | Session blocked mid-turn on approval — emits **no** `Stop`, so this is the only signal for the permission case. | Exists; firing/payload **not yet probed** (phase 1) | | `UserPromptSubmit` | Input submitted → session unstuck → **dequeue**. | Confirmed primitive | -| `SessionStart` / `SessionEnd` | Register / retire the session (and re-assert `session_id → pane`). | Confirmed firing (probe) | +| `SessionStart` | Register the session; re-assert `session_id → pane`. | Confirmed firing (probe) | +| `SessionEnd` | Retire the session. | Exists; firing not yet probed | | ~~`Notification`~~ | Dropped — `Stop` + `PermissionRequest` cover every stuck case; the dead-letter queue just fills and drains. | Not used | Both `Stop` and `PermissionRequest` enqueue a plain stuck item with no priority difference. @@ -165,8 +169,8 @@ the host provides every needed primitive (all confirmed present): `switch-client`, `select-window`, `select-pane`, `link-window`/`unlink-window`, `join-pane`/`break-pane`, `pipe-pane`, `capture-pane`, `display-popup`. -- **Minimal (recommended start):** route the operator's client to the most-stuck pane — - `switch-client` + `select-window`/`select-pane %id`. This *is* "eliminate manual +- **Minimal (recommended start):** route the operator's client to the head-of-queue + (oldest-stuck) pane — `switch-client` + `select-window`/`select-pane %id`. This *is* "eliminate manual tab-switching," with zero relay. - **Embedded (optional polish):** `link-window` the target session's window into a Trail Boss view so the queue and live pane are co-visible, then `unlink-window`. Non-destructive (tmux @@ -219,7 +223,7 @@ the live pane. No special edit affordance or `canUseTool` round-trip is required display-popup (queue overlay) ──select──▶ switch-client / + optional status-line "N stuck" select-window/pane → you land on the - live most-stuck pane + live head-of-queue pane ``` ### Daemon vs. presentation split @@ -288,12 +292,18 @@ loads. - **Membership:** every stuck session (from `Stop` or `PermissionRequest`). No priority between reasons; `reason` is display-only. Reconcile removes any that have progressed. - **Order:** oldest-stuck first (FIFO). The head of the queue is "the current session." -- **Auto-advance:** the operator's focus is navigated to the current session. When that session - resolves — `UserPromptSubmit` fires (you responded) or you `skip` — Trail Boss **loads the - next** stuck session into focus. The operator drains the queue one session at a time without - ever choosing "which next." -- **Skip:** advances to the next without acting; the skipped session stays in the queue and - re-surfaces later in the cycle. +- **Auto-advance (no forced focus-steal):** the operator works the *current* session in its + live pane. When that session resolves — `UserPromptSubmit` fires (you responded) or you + `skip` — the daemon **computes** the next head of queue, but the actual jump is + **operator-initiated** (a keypress / re-invoking the popup), never an automatic + `switch-client`. This matters because responding *is* what fires `UserPromptSubmit`: a forced + jump would teleport you out of the pane the instant you hit Enter. So depletion is "resolve → + next is ready → you press to advance," not a focus-steal. (Whether to offer an opt-in + "auto-jump on resolve" toggle is deferred.) +- **Skip:** advances to the next without acting. The skipped session is **moved to the tail** of + the FIFO (and stamped with a short re-surface cooldown) so depletion always progresses and a + single item can't be re-selected immediately. In a one-item queue, skip lands you on empty; + the item re-surfaces on its next event or after the cooldown. - **Dequeue:** transcript advances past the last stuck point, or `UserPromptSubmit` fires, or `SessionEnd`. - **Empty queue:** nothing stuck → no auto-advance; the operator is free. New stuck sessions @@ -335,8 +345,8 @@ Still **unverified** (probe before depending on): `PermissionRequest` firing + p 3. **Daemon (control plane)** — ingest endpoint behind the normalized stuck/unstuck adapter contract, SQLite state, self-healing registry, the transcript reconcile loop, FIFO queue. Runs in its own tmux window. -4. **Navigation** — `switch-client`/`select-window`/`select-pane` to route to a pane by id, and - auto-advance to the next stuck session on resolve/skip. +4. **Navigation** — `switch-client`/`select-window`/`select-pane` to route to a pane by id; + compute the next head on resolve/skip and jump on operator action (no forced focus-steal). 5. **Presentation** — `display-popup` queue overlay + keybinding; optional status-line segment. 6. **Close the loop (walking skeleton)** — stuck pane → loaded into focus → interact → reconcile dequeues it → next stuck session auto-loads. This end-to-end depletion path is the first @@ -353,6 +363,20 @@ Still **unverified** (probe before depending on): `PermissionRequest` firing + p - *The queue never lies:* a displayed item reflects current transcript state (reconcile is authoritative over hook events). +**Trust boundary** +- The collector/daemon binds its ingest endpoint to **loopback only** (`127.0.0.1:4000`) — it is + never exposed off-host. Per the single-operator/single-host non-goal, **all local processes are + trusted**: hook POSTs are unauthenticated and the `session_id` / `$TMUX_PANE` they carry are + taken on trust. +- This is acceptable on a single-user host. On a **multi-user host** it is not: any local process + could POST a forged event with an attacker-chosen `$TMUX_PANE`, causing the daemon to navigate + the operator to — or, if the optional `send-keys` path is enabled, type into — an arbitrary + pane. Mitigations if that ever matters: a unix-domain socket with file-mode restrictions, or a + shared secret in the hook POSTs. +- The optional `send-keys` delivery path is the only way a forged event could inject + *keystrokes*; with it disabled (navigation-only, the default), a forged event can at worst + mis-route the operator's focus — annoying, not destructive. + **Failure modes** - *Hook POST dropped (collector down/slow):* hook exits 0, event lost → reconcile sweep recovers it from transcripts. @@ -376,9 +400,11 @@ Still **unverified** (probe before depending on): `PermissionRequest` firing + p harnesses? See "Layering" above. 2. **`PermissionRequest` specifics** — confirm it fires for the gate types you hit and what its payload carries (the proposed command, for display). Detection coverage depends on it; phase 1. -3. **Auto-advance trigger** — exactly what counts as "done with the current session" → - load next: `UserPromptSubmit` and explicit `skip` are clear; should manually navigating away - also advance? And is the jump immediate or on a keypress? +3. **Auto-advance residuals** — the trigger and jump model are decided (resolve via + `UserPromptSubmit`/`skip` → next is computed → operator-initiated jump, never a forced + focus-steal; see "Queue & interaction loop"). Residual: should manually navigating away from + the current pane also count as advancing, and is an opt-in "auto-jump on resolve" toggle + worth offering? Decide after the walking skeleton. 4. **Presentation UX** — `display-popup` queue + jump vs. a dedicated always-visible window; decide after the walking skeleton. diff --git a/docs/research/claude-code-mechanics.md b/docs/research/claude-code-mechanics.md index d4a835a..fbf4d2a 100644 --- a/docs/research/claude-code-mechanics.md +++ b/docs/research/claude-code-mechanics.md @@ -19,7 +19,6 @@ the collector is enough: { "hooks": [ { "type": "command", "command": "curl -s -X POST http://localhost:4000/event --data-binary @-" } ] } ], - "Notification": [ { "hooks": [ { "type": "command", "command": "~/.claude/trailboss-emit.sh" } ] } ], "Stop": [ { "hooks": [ { "type": "command", "command": "~/.claude/trailboss-emit.sh" } ] } ], "UserPromptSubmit": [ { "hooks": [ { "type": "command", "command": "~/.claude/trailboss-emit.sh" } ] } ], "SessionStart": [ { "hooks": [ { "type": "command", "command": "~/.claude/trailboss-register.sh" } ] } ], @@ -32,21 +31,20 @@ Events relevant to "needs a human": | Event | Meaning for the queue | Confidence | |-------|----------------------|------------| -| `Stop` | **Turn finished; session idle, waiting for the next prompt.** The reliable idle signal — fires every time a turn completes. Enqueue "ready for next." | Confirmed | -| `PermissionRequest` | Hard block: a tool wants approval. Enqueue allow/deny. | Confirmed exists | -| `Notification` | Claude notifying the user (permission prompt, long-idle nudge). Supplementary; **trigger conditions less documented.** | **(verify, optional)** | +| `Stop` | **Turn finished; session waiting for the next instruction.** Enqueue a stuck item. | Confirmed firing (probe) | +| `PermissionRequest` | Session blocked mid-turn on approval. Emits **no** `Stop`, so it is the only signal for the permission case. Enqueue a stuck item. | Exists; firing/payload not yet probed | | `SubagentStop` | A subagent finished — usually *not* a human-input point; ignore. | Confirmed | | `UserPromptSubmit` | Human submitted input → block resolved → **dequeue**. | Confirmed | -| `SessionStart` / `SessionEnd` | Register / retire the session. | Confirmed | +| `SessionStart` | Register the session; capture `$TMUX_PANE`. | Confirmed firing (probe) | +| `SessionEnd` | Retire the session. | Exists; firing not yet probed | | `PreToolUse` / `PostToolUse` | Activity telemetry (the "running" state), not blocks. | Confirmed | -> **Detection model (settled):** the two load-bearing signals are `Stop` (idle, waiting for the -> next prompt) and `PermissionRequest` (a hard block). `Stop` makes the "wants the next -> instruction" case free — no polling. `Notification` is supplementary; the only open question -> is whether it surfaces anything those two miss (e.g., a long-idle nudge), and if not it can -> be dropped. Optional probe: log `Notification` alongside `Stop`/`PermissionRequest` across -> (a) a permission prompt, (b) plan-mode approval, (c) a clarifying question, (d) a finished -> turn at an empty prompt, and see if it ever adds signal. +> **Detection model (settled):** the two enqueue triggers are `Stop` (turn finished, waiting) +> and `PermissionRequest` (blocked mid-turn). **Both are required** — a permission-blocked +> session is mid-turn and emits no `Stop`, so without `PermissionRequest` it would never be +> detected. They are treated identically (a flat stuck item; `reason` is display-only, never a +> priority). `Notification` was evaluated and **dropped** — `Stop` + `PermissionRequest` cover +> every stuck case. See `../plan/plan.md` ("Detection model" and the resolved-questions list). --- @@ -108,44 +106,41 @@ prompt. Cheap; poll on demand. --- -## 4. DELIVER — get the reply back into the exact session +## 4. DELIVER — route the operator to the live session (navigation, not relay) -### Substrate A — tmux `send-keys` (overlays a terminal workflow) +Trail Boss does **not** inject answers. It navigates the operator to the live pane, where they +interact with the real prompt directly (see `../plan/plan.md`, "Navigator, not relay" and the +delivery decision in `../notes/decisions.md`). The relevant primitives, all tmux-server-global +so they work from outside tmux: ```bash -# free-text reply (a clarifying answer or the next instruction) -tmux send-keys -t -l 'shard by tenant; tenants are the hard isolation boundary' -tmux send-keys -t Enter - -# permission menu choice (send the option key the prompt expects) -tmux send-keys -t '1' # or 'y' / arrow+Enter, depending on the prompt +# bring the operator's client to the stuck pane (pane ids like %446 are global) +tmux switch-client -t "$(tmux display -p -t %446 '#{session_name}')" +tmux select-window -t %446 +tmux select-pane -t %446 +# optional co-display: link the target window into a Trail Boss view, then unlink +tmux link-window -s : -t trailboss: ; tmux unlink-window -t trailboss: ``` -- `-l` sends the argument literally; send `Enter` separately. -- Requires the `session_id → pane` mapping from the `SessionStart` hook. -- **Verify** multi-line/paste fidelity and how each prompt type consumes keystrokes. -- Works regardless of how the session was launched — it stays a normal interactive session. +- Primary delivery is **navigation** — the operator types into the genuine CLI, so there is no + keystroke-fidelity problem and "edit before allow" is native. +- **Secondary (optional):** `tmux send-keys -t %446 -l ''` then `send-keys -t %446 Enter` + for plain-text submission (basic submission confirmed in the probe). Not the primary path; the + daemon never sends *synthesized* input — only human-authored text, and only if this path is + enabled. -### Substrate B — Agent SDK (the clean rewrite) +### Rejected delivery alternatives -If sessions run under the Python/TypeScript Agent SDK: - -- **`canUseTool` callback** (`can_use_tool` in Python) — fires for any tool needing approval and - for clarifying questions. It can **await indefinitely** while the orchestrator collects your - remote decision, then returns `{ behavior: "allow" | "deny", updatedInput?, message? }` — - including a *modified* tool input. The cleanest "answer from elsewhere" hook: execution pauses - until you respond. -- **Streaming input** — the SDK accepts an async-iterable prompt stream, so follow-up turns are - delivered programmatically over a long-lived channel. -- **Resume** — `resume: ` to continue a specific session. - -### Why headless `claude -p` is NOT a delivery path - -`claude -p` (print/headless) with `--output-format stream-json` / `--input-format stream-json` -is **one-shot**: it cannot stream additional user messages into an already-running session, and -interactive permission prompting is unavailable (you must pre-authorize with `--allowedTools` -or wire `--permission-prompt-tool `, whose payload schema is undocumented — -**(verify)**). Use the SDK, not `-p`, for Substrate B. +- **Resume-to-deliver** (`claude --resume ` in a second process) does **not** reach the + original live pane — a live interactive CLI holds in-memory state and does not re-read its + transcript; concurrent attach risks divergence. `--fork-session` confirms `--resume` reuses + the session. Only viable in a no-resident-process model, which is rejected for v1. +- **Agent SDK `canUseTool` + streaming input** would allow programmatic permission gating with + `updatedInput`, but requires running sessions under the SDK instead of the terminal — + deferred; the tmux-navigator model fits the existing workflow. +- **`claude --remote-control`** routes to the claude.ai / desktop / mobile surface, not a local + channel — useless for a same-host tool. +- **Headless `claude -p`** is one-shot and cannot stream input into a running session. --- @@ -153,13 +148,12 @@ or wire `--permission-prompt-tool `, whose payload schema is undocumen | Need | Primitive | Identifier / flag | Confidence | |------|-----------|-------------------|------------| -| Detect idle / waiting for next prompt | `Stop` hook | stdin JSON | confirmed (primary idle signal) | -| Detect permission block | `PermissionRequest` hook | stdin JSON | confirmed | -| Detect long-idle nudge (optional) | `Notification` hook | stdin JSON | **(verify, optional)** | +| Detect waiting for next instruction | `Stop` hook | stdin JSON | confirmed firing (probe) | +| Detect permission block | `PermissionRequest` hook | stdin JSON | exists; not yet probed | | Detect resolved block | `UserPromptSubmit` hook | `session_id` | confirmed | -| Correlate to session/repo | any hook payload | `session_id`, `cwd`, `transcript_path` | confirmed | -| Correlate to tmux pane | `SessionStart` hook | `$TMUX_PANE` (capture in script) | confirmed (you wire it) | -| Read the question | transcript JSONL / `capture-pane` | `transcript_path` / pane | confirmed | -| Deliver reply (overlay) | tmux `send-keys -t ` | pane id | confirmed | -| Deliver reply (rewrite) | Agent SDK `canUseTool` + streaming input | `session_id` | confirmed (SDK only) | -| Deliver reply (headless) | — | not viable (`-p` is one-shot) | confirmed limitation | +| Correlate to session/repo | any hook payload + env | `session_id`, `cwd`, `transcript_path`, `CLAUDE_CODE_SESSION_ID` | confirmed (probe) | +| Correlate to tmux pane | any emit hook | `$TMUX_PANE` (in hook env) | confirmed (probe) | +| Read the question | `Stop` payload `last_assistant_message` / transcript / `capture-pane` | payload / `transcript_path` / pane | confirmed (probe) | +| Deliver (primary) | tmux navigation (`switch-client`/`select-window`/`select-pane`) | pane id | confirmed primitives | +| Deliver (secondary, optional) | tmux `send-keys -t ` (human-authored text only) | pane id | basic submission confirmed | +| Rejected: resume / SDK / remote-control / `-p` | — | — | see "Rejected" above | diff --git a/docs/research/related-work.md b/docs/research/related-work.md index 1b6285e..cc9a4a8 100644 --- a/docs/research/related-work.md +++ b/docs/research/related-work.md @@ -13,10 +13,11 @@ activity (sessions, tool calls, errors) over WebSocket. It proves the **detectio layer** Trail Boss needs: hooks → collector → live UI, keyed by `session_id`. **How Trail Boss differs:** it is *observability*, not *action*. It shows *that* a session is -waiting; Trail Boss adds the missing half — surfacing the blocked session as an actionable -queue item and **delivering your reply back into the exact session**. Its collector is a strong -starting point to fork; Trail Boss extends the read-only event store with a session→pane -registry and a `/reply` dispatch path. +waiting; Trail Boss adds the missing half — surfacing the blocked session and **navigating the +operator to the live pane** to act on it (it never injects input; see "Navigator, not relay" in +the plan). Its collector is a strong starting point to fork; Trail Boss extends the read-only +event store with a self-healing session→pane registry and a tmux navigation layer. It reuses the +SQLite event store, not the WebSocket/browser-UI layer (presentation is tmux, not a web app). ### [`langchain-ai/agent-inbox`](https://github.com/langchain-ai/agent-inbox) @@ -26,10 +27,10 @@ public analog to Trail Boss's core idea — a **queue of agents waiting on a hum **How Trail Boss differs:** Agent Inbox is bound to the LangGraph runtime and its `interrupt` primitive. Trail Boss targets **interactive terminal coding agents** (e.g. Claude Code -sessions): detection comes from Claude Code **hooks** rather than framework interrupts, and -delivery is via **tmux `send-keys`** (overlaying a real terminal workflow with no rewrite) or -the **Agent SDK** — not a graph runtime. It also adds **skip / re-surface** semantics and a -**most-stuck-first** ranking tuned for an operator triaging many live sessions at once. +sessions): detection comes from Claude Code **hooks** rather than framework interrupts, and the +operator acts by being **navigated to the live tmux pane** — not by the tool replying through a +graph runtime. It also adds **skip / re-surface** semantics and an **auto-advance FIFO depletion +loop** (oldest-stuck first, no priority) tuned for an operator draining many live sessions. ## Positioning @@ -41,8 +42,8 @@ the **Agent SDK** — not a graph runtime. It also adds **skip / re-surface** se ## Concrete reuse - **Collector backend** → the hook-native event store from `disler/...observability` - (WebSocket + SQLite) is a clean fork point; extend it with the session→pane registry and a - `/reply` endpoint. -- **Inbox UX patterns** → `langchain-ai/agent-inbox` for the accept/edit/respond/ignore - interaction model, mapped onto Trail Boss's item reasons (`permission`, `plan`, `question`, - `idle`). + (SQLite + hook ingest) is a clean fork point; extend it with the self-healing session→pane + registry and the tmux navigation layer. Drop its WebSocket/browser-UI half — Trail Boss's + presentation is tmux (`display-popup`), not a web client. +- **Inbox UX patterns** → `langchain-ai/agent-inbox` for the accept/respond/ignore interaction + model, mapped onto Trail Boss's two stuck reasons (`permission`, `stopped`).