docs: gap-review round 1 — reconcile supporting docs + close spec gaps

Fix 10 gaps found by fresh-eyes review: - CRITICAL: mechanics doc still documented send-keys-relay/Agent-SDK delivery and the Notification hook; rewrote DELIVER to the navigator model, dropped Notification throughout - HIGH: decisions.md + related-work.md still referenced priority "ranking", /reply dispatch, send-keys/SDK delivery, and the dead plan/question/idle reason taxonomy; aligned to FIFO + navigation + permission/stopped - collector and daemon stated as one process; pull-not-broadcast presentation - added trust boundary (loopback-only ingest, single-host trust assumption) - resolved auto-advance focus-steal hazard (operator-initiated jump) and specified skip re-ordering (move to tail + cooldown) to avoid livelock - downgraded SessionEnd to "not yet probed" (probe only saw SessionStart/Stop) - dropped forked collector's WebSocket layer; most-stuck → oldest-stuck (FIFO) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 15:18:56 -04:00 · 2026-05-25 15:18:56 -04:00 · 675c1e531d
commit 675c1e531d
parent ca08cbbead
4 changed files with 105 additions and 83 deletions
--- a/docs/notes/decisions.md
+++ b/docs/notes/decisions.md
@ -11,7 +11,8 @@ it right. The metaphor maps cleanly onto the mechanism:
 - **a steer bogs down or strays** → a `Stop` / `PermissionRequest` hook fires; the collector
  flags the session stuck
 - **the trail boss rides over and sets it right** → you read the context and give the order
-  (reply) or wave it on (skip); ranking surfaces the most-stuck first
+  (reply) or wave it on (skip); the queue surfaces stuck sessions oldest-first (flat FIFO, no
+  priority ranking)

 ### Names considered and rejected

--- a/docs/plan/plan.md
+++ b/docs/plan/plan.md
@ -99,8 +99,11 @@ interaction sidesteps this entirely.

 1. **A blocked-state signal from every session** — Claude Code hooks (`Stop`,
   `PermissionRequest`), POSTing their stdin JSON to a local collector.
-2. **A central collector + live state store** — tracks every session's status, holds the
-   `session_id → pane` registry, broadcasts the queue.
+2. **A central collector + live state store** — tracks every session's status and holds the
+   `session_id → pane` registry. The collector **is the daemon** (see Architecture); its HTTP
+   ingest endpoint is one face of that single process. Presentation reads current state on
+   demand (the `display-popup` *pulls*; there is no push/broadcast channel), so no WebSocket is
+   required. An optional status-line "N stuck" segment, if added, polls the daemon.
 3. **Context extraction** — *what* each session is asking. Largely free from the hook payload
   (see below); transcript tail for deeper/permission context.
 4. **The Trail Boss queue** — a FIFO depletion surface (oldest-stuck first), keyboard-driven.
@ -119,7 +122,8 @@ stuck conditions: a session waiting at a permission prompt is mid-turn and does
 | `Stop` | Turn finished; session waiting for the next instruction. | Confirmed firing in interactive and `-p` (probe 2026-05-25); payload carries `last_assistant_message` |
 | `PermissionRequest` | Session blocked mid-turn on approval — emits **no** `Stop`, so this is the only signal for the permission case. | Exists; firing/payload **not yet probed** (phase 1) |
 | `UserPromptSubmit` | Input submitted → session unstuck → **dequeue**. | Confirmed primitive |
-| `SessionStart` / `SessionEnd` | Register / retire the session (and re-assert `session_id → pane`). | Confirmed firing (probe) |
+| `SessionStart` | Register the session; re-assert `session_id → pane`. | Confirmed firing (probe) |
+| `SessionEnd` | Retire the session. | Exists; firing not yet probed |
 | ~~`Notification`~~ | Dropped — `Stop` + `PermissionRequest` cover every stuck case; the dead-letter queue just fills and drains. | Not used |

 Both `Stop` and `PermissionRequest` enqueue a plain stuck item with no priority difference.
@ -165,8 +169,8 @@ the host provides every needed primitive (all confirmed present):
 `switch-client`, `select-window`, `select-pane`, `link-window`/`unlink-window`,
 `join-pane`/`break-pane`, `pipe-pane`, `capture-pane`, `display-popup`.

- **Minimal (recommended start):** route the operator's client to the most-stuck pane —
-  `switch-client` + `select-window`/`select-pane %id`. This *is* "eliminate manual
+- **Minimal (recommended start):** route the operator's client to the head-of-queue
+  (oldest-stuck) pane — `switch-client` + `select-window`/`select-pane %id`. This *is* "eliminate manual
  tab-switching," with zero relay.
 - **Embedded (optional polish):** `link-window` the target session's window into a Trail Boss
  view so the queue and live pane are co-visible, then `unlink-window`. Non-destructive (tmux
@ -219,7 +223,7 @@ the live pane. No special edit affordance or `canUseTool` round-trip is required
           display-popup (queue overlay)  ──select──▶  switch-client /
           + optional status-line "N stuck"            select-window/pane
                                                        → you land on the
-                                                          live most-stuck pane
+                                                          live head-of-queue pane
 ```

 ### Daemon vs. presentation split
@ -288,12 +292,18 @@ loads.
 - **Membership:** every stuck session (from `Stop` or `PermissionRequest`). No priority between
  reasons; `reason` is display-only. Reconcile removes any that have progressed.
 - **Order:** oldest-stuck first (FIFO). The head of the queue is "the current session."
- **Auto-advance:** the operator's focus is navigated to the current session. When that session
-  resolves — `UserPromptSubmit` fires (you responded) or you `skip` — Trail Boss **loads the
-  next** stuck session into focus. The operator drains the queue one session at a time without
-  ever choosing "which next."
- **Skip:** advances to the next without acting; the skipped session stays in the queue and
-  re-surfaces later in the cycle.
+- **Auto-advance (no forced focus-steal):** the operator works the *current* session in its
+  live pane. When that session resolves — `UserPromptSubmit` fires (you responded) or you
+  `skip` — the daemon **computes** the next head of queue, but the actual jump is
+  **operator-initiated** (a keypress / re-invoking the popup), never an automatic
+  `switch-client`. This matters because responding *is* what fires `UserPromptSubmit`: a forced
+  jump would teleport you out of the pane the instant you hit Enter. So depletion is "resolve →
+  next is ready → you press to advance," not a focus-steal. (Whether to offer an opt-in
+  "auto-jump on resolve" toggle is deferred.)
+- **Skip:** advances to the next without acting. The skipped session is **moved to the tail** of
+  the FIFO (and stamped with a short re-surface cooldown) so depletion always progresses and a
+  single item can't be re-selected immediately. In a one-item queue, skip lands you on empty;
+  the item re-surfaces on its next event or after the cooldown.
 - **Dequeue:** transcript advances past the last stuck point, or `UserPromptSubmit` fires, or
  `SessionEnd`.
 - **Empty queue:** nothing stuck → no auto-advance; the operator is free. New stuck sessions
@ -335,8 +345,8 @@ Still **unverified** (probe before depending on): `PermissionRequest` firing + p
 3. **Daemon (control plane)** — ingest endpoint behind the normalized stuck/unstuck adapter
   contract, SQLite state, self-healing registry, the transcript reconcile loop, FIFO queue.
   Runs in its own tmux window.
-4. **Navigation** — `switch-client`/`select-window`/`select-pane` to route to a pane by id, and
-   auto-advance to the next stuck session on resolve/skip.
+4. **Navigation** — `switch-client`/`select-window`/`select-pane` to route to a pane by id;
+   compute the next head on resolve/skip and jump on operator action (no forced focus-steal).
 5. **Presentation** — `display-popup` queue overlay + keybinding; optional status-line segment.
 6. **Close the loop (walking skeleton)** — stuck pane → loaded into focus → interact → reconcile
   dequeues it → next stuck session auto-loads. This end-to-end depletion path is the first
@ -353,6 +363,20 @@ Still **unverified** (probe before depending on): `PermissionRequest` firing + p
 - *The queue never lies:* a displayed item reflects current transcript state (reconcile is
  authoritative over hook events).

+**Trust boundary**
+- The collector/daemon binds its ingest endpoint to **loopback only** (`127.0.0.1:4000`) — it is
+  never exposed off-host. Per the single-operator/single-host non-goal, **all local processes are
+  trusted**: hook POSTs are unauthenticated and the `session_id` / `$TMUX_PANE` they carry are
+  taken on trust.
+- This is acceptable on a single-user host. On a **multi-user host** it is not: any local process
+  could POST a forged event with an attacker-chosen `$TMUX_PANE`, causing the daemon to navigate
+  the operator to — or, if the optional `send-keys` path is enabled, type into — an arbitrary
+  pane. Mitigations if that ever matters: a unix-domain socket with file-mode restrictions, or a
+  shared secret in the hook POSTs.
+- The optional `send-keys` delivery path is the only way a forged event could inject
+  *keystrokes*; with it disabled (navigation-only, the default), a forged event can at worst
+  mis-route the operator's focus — annoying, not destructive.
+
 **Failure modes**
 - *Hook POST dropped (collector down/slow):* hook exits 0, event lost → reconcile sweep
  recovers it from transcripts.
@ -376,9 +400,11 @@ Still **unverified** (probe before depending on): `PermissionRequest` firing + p
   harnesses? See "Layering" above.
 2. **`PermissionRequest` specifics** — confirm it fires for the gate types you hit and what its
   payload carries (the proposed command, for display). Detection coverage depends on it; phase 1.
-3. **Auto-advance trigger** — exactly what counts as "done with the current session" →
-   load next: `UserPromptSubmit` and explicit `skip` are clear; should manually navigating away
-   also advance? And is the jump immediate or on a keypress?
+3. **Auto-advance residuals** — the trigger and jump model are decided (resolve via
+   `UserPromptSubmit`/`skip` → next is computed → operator-initiated jump, never a forced
+   focus-steal; see "Queue & interaction loop"). Residual: should manually navigating away from
+   the current pane also count as advancing, and is an opt-in "auto-jump on resolve" toggle
+   worth offering? Decide after the walking skeleton.
 4. **Presentation UX** — `display-popup` queue + jump vs. a dedicated always-visible window;
   decide after the walking skeleton.

--- a/docs/research/claude-code-mechanics.md
+++ b/docs/research/claude-code-mechanics.md
@ -19,7 +19,6 @@ the collector is enough:
      { "hooks": [ { "type": "command",
                     "command": "curl -s -X POST http://localhost:4000/event --data-binary @-" } ] }
    ],
-    "Notification":      [ { "hooks": [ { "type": "command", "command": "~/.claude/trailboss-emit.sh" } ] } ],
    "Stop":              [ { "hooks": [ { "type": "command", "command": "~/.claude/trailboss-emit.sh" } ] } ],
    "UserPromptSubmit":  [ { "hooks": [ { "type": "command", "command": "~/.claude/trailboss-emit.sh" } ] } ],
    "SessionStart":      [ { "hooks": [ { "type": "command", "command": "~/.claude/trailboss-register.sh" } ] } ],
@ -32,21 +31,20 @@ Events relevant to "needs a human":

 | Event | Meaning for the queue | Confidence |
 |-------|----------------------|------------|
-| `Stop` | **Turn finished; session idle, waiting for the next prompt.** The reliable idle signal — fires every time a turn completes. Enqueue "ready for next." | Confirmed |
-| `PermissionRequest` | Hard block: a tool wants approval. Enqueue allow/deny. | Confirmed exists |
-| `Notification` | Claude notifying the user (permission prompt, long-idle nudge). Supplementary; **trigger conditions less documented.** | **(verify, optional)** |
+| `Stop` | **Turn finished; session waiting for the next instruction.** Enqueue a stuck item. | Confirmed firing (probe) |
+| `PermissionRequest` | Session blocked mid-turn on approval. Emits **no** `Stop`, so it is the only signal for the permission case. Enqueue a stuck item. | Exists; firing/payload not yet probed |
 | `SubagentStop` | A subagent finished — usually *not* a human-input point; ignore. | Confirmed |
 | `UserPromptSubmit` | Human submitted input → block resolved → **dequeue**. | Confirmed |
-| `SessionStart` / `SessionEnd` | Register / retire the session. | Confirmed |
+| `SessionStart` | Register the session; capture `$TMUX_PANE`. | Confirmed firing (probe) |
+| `SessionEnd` | Retire the session. | Exists; firing not yet probed |
 | `PreToolUse` / `PostToolUse` | Activity telemetry (the "running" state), not blocks. | Confirmed |

-> **Detection model (settled):** the two load-bearing signals are `Stop` (idle, waiting for the
-> next prompt) and `PermissionRequest` (a hard block). `Stop` makes the "wants the next
-> instruction" case free — no polling. `Notification` is supplementary; the only open question
-> is whether it surfaces anything those two miss (e.g., a long-idle nudge), and if not it can
-> be dropped. Optional probe: log `Notification` alongside `Stop`/`PermissionRequest` across
-> (a) a permission prompt, (b) plan-mode approval, (c) a clarifying question, (d) a finished
-> turn at an empty prompt, and see if it ever adds signal.
+> **Detection model (settled):** the two enqueue triggers are `Stop` (turn finished, waiting)
+> and `PermissionRequest` (blocked mid-turn). **Both are required** — a permission-blocked
+> session is mid-turn and emits no `Stop`, so without `PermissionRequest` it would never be
+> detected. They are treated identically (a flat stuck item; `reason` is display-only, never a
+> priority). `Notification` was evaluated and **dropped** — `Stop` + `PermissionRequest` cover
+> every stuck case. See `../plan/plan.md` ("Detection model" and the resolved-questions list).

 ---

@ -108,44 +106,41 @@ prompt. Cheap; poll on demand.

 ---

-## 4. DELIVER — get the reply back into the exact session
+## 4. DELIVER — route the operator to the live session (navigation, not relay)

-### Substrate A — tmux `send-keys` (overlays a terminal workflow)
+Trail Boss does **not** inject answers. It navigates the operator to the live pane, where they
+interact with the real prompt directly (see `../plan/plan.md`, "Navigator, not relay" and the
+delivery decision in `../notes/decisions.md`). The relevant primitives, all tmux-server-global
+so they work from outside tmux:

 ```bash
-# free-text reply (a clarifying answer or the next instruction)
-tmux send-keys -t <pane> -l 'shard by tenant; tenants are the hard isolation boundary'
-tmux send-keys -t <pane> Enter
-
-# permission menu choice (send the option key the prompt expects)
-tmux send-keys -t <pane> '1'           # or 'y' / arrow+Enter, depending on the prompt
+# bring the operator's client to the stuck pane (pane ids like %446 are global)
+tmux switch-client -t "$(tmux display -p -t %446 '#{session_name}')"
+tmux select-window -t %446
+tmux select-pane   -t %446
+# optional co-display: link the target window into a Trail Boss view, then unlink
+tmux link-window -s <src-session>:<window> -t trailboss: ; tmux unlink-window -t trailboss:<n>
 ```

- `-l` sends the argument literally; send `Enter` separately.
- Requires the `session_id → pane` mapping from the `SessionStart` hook.
- **Verify** multi-line/paste fidelity and how each prompt type consumes keystrokes.
- Works regardless of how the session was launched — it stays a normal interactive session.
+- Primary delivery is **navigation** — the operator types into the genuine CLI, so there is no
+  keystroke-fidelity problem and "edit before allow" is native.
+- **Secondary (optional):** `tmux send-keys -t %446 -l '<text>'` then `send-keys -t %446 Enter`
+  for plain-text submission (basic submission confirmed in the probe). Not the primary path; the
+  daemon never sends *synthesized* input — only human-authored text, and only if this path is
+  enabled.

-### Substrate B — Agent SDK (the clean rewrite)
+### Rejected delivery alternatives

-If sessions run under the Python/TypeScript Agent SDK:
-
- **`canUseTool` callback** (`can_use_tool` in Python) — fires for any tool needing approval and
-  for clarifying questions. It can **await indefinitely** while the orchestrator collects your
-  remote decision, then returns `{ behavior: "allow" | "deny", updatedInput?, message? }` —
-  including a *modified* tool input. The cleanest "answer from elsewhere" hook: execution pauses
-  until you respond.
- **Streaming input** — the SDK accepts an async-iterable prompt stream, so follow-up turns are
-  delivered programmatically over a long-lived channel.
- **Resume** — `resume: <session_id>` to continue a specific session.
-
-### Why headless `claude -p` is NOT a delivery path
-
-`claude -p` (print/headless) with `--output-format stream-json` / `--input-format stream-json`
-is **one-shot**: it cannot stream additional user messages into an already-running session, and
-interactive permission prompting is unavailable (you must pre-authorize with `--allowedTools`
-or wire `--permission-prompt-tool <mcp_tool>`, whose payload schema is undocumented —
-**(verify)**). Use the SDK, not `-p`, for Substrate B.
+- **Resume-to-deliver** (`claude --resume <id>` in a second process) does **not** reach the
+  original live pane — a live interactive CLI holds in-memory state and does not re-read its
+  transcript; concurrent attach risks divergence. `--fork-session` confirms `--resume` reuses
+  the session. Only viable in a no-resident-process model, which is rejected for v1.
+- **Agent SDK `canUseTool` + streaming input** would allow programmatic permission gating with
+  `updatedInput`, but requires running sessions under the SDK instead of the terminal —
+  deferred; the tmux-navigator model fits the existing workflow.
+- **`claude --remote-control`** routes to the claude.ai / desktop / mobile surface, not a local
+  channel — useless for a same-host tool.
+- **Headless `claude -p`** is one-shot and cannot stream input into a running session.

 ---

@ -153,13 +148,12 @@ or wire `--permission-prompt-tool <mcp_tool>`, whose payload schema is undocumen

 | Need | Primitive | Identifier / flag | Confidence |
 |------|-----------|-------------------|------------|
-| Detect idle / waiting for next prompt | `Stop` hook | stdin JSON | confirmed (primary idle signal) |
-| Detect permission block | `PermissionRequest` hook | stdin JSON | confirmed |
-| Detect long-idle nudge (optional) | `Notification` hook | stdin JSON | **(verify, optional)** |
+| Detect waiting for next instruction | `Stop` hook | stdin JSON | confirmed firing (probe) |
+| Detect permission block | `PermissionRequest` hook | stdin JSON | exists; not yet probed |
 | Detect resolved block | `UserPromptSubmit` hook | `session_id` | confirmed |
-| Correlate to session/repo | any hook payload | `session_id`, `cwd`, `transcript_path` | confirmed |
-| Correlate to tmux pane | `SessionStart` hook | `$TMUX_PANE` (capture in script) | confirmed (you wire it) |
-| Read the question | transcript JSONL / `capture-pane` | `transcript_path` / pane | confirmed |
-| Deliver reply (overlay) | tmux `send-keys -t <pane>` | pane id | confirmed |
-| Deliver reply (rewrite) | Agent SDK `canUseTool` + streaming input | `session_id` | confirmed (SDK only) |
-| Deliver reply (headless) | — | not viable (`-p` is one-shot) | confirmed limitation |
+| Correlate to session/repo | any hook payload + env | `session_id`, `cwd`, `transcript_path`, `CLAUDE_CODE_SESSION_ID` | confirmed (probe) |
+| Correlate to tmux pane | any emit hook | `$TMUX_PANE` (in hook env) | confirmed (probe) |
+| Read the question | `Stop` payload `last_assistant_message` / transcript / `capture-pane` | payload / `transcript_path` / pane | confirmed (probe) |
+| Deliver (primary) | tmux navigation (`switch-client`/`select-window`/`select-pane`) | pane id | confirmed primitives |
+| Deliver (secondary, optional) | tmux `send-keys -t <pane>` (human-authored text only) | pane id | basic submission confirmed |
+| Rejected: resume / SDK / remote-control / `-p` | — | — | see "Rejected" above |
--- a/docs/research/related-work.md
+++ b/docs/research/related-work.md
@ -13,10 +13,11 @@ activity (sessions, tool calls, errors) over WebSocket. It proves the **detectio
 layer** Trail Boss needs: hooks → collector → live UI, keyed by `session_id`.

 **How Trail Boss differs:** it is *observability*, not *action*. It shows *that* a session is
-waiting; Trail Boss adds the missing half — surfacing the blocked session as an actionable
-queue item and **delivering your reply back into the exact session**. Its collector is a strong
-starting point to fork; Trail Boss extends the read-only event store with a session→pane
-registry and a `/reply` dispatch path.
+waiting; Trail Boss adds the missing half — surfacing the blocked session and **navigating the
+operator to the live pane** to act on it (it never injects input; see "Navigator, not relay" in
+the plan). Its collector is a strong starting point to fork; Trail Boss extends the read-only
+event store with a self-healing session→pane registry and a tmux navigation layer. It reuses the
+SQLite event store, not the WebSocket/browser-UI layer (presentation is tmux, not a web app).

 ### [`langchain-ai/agent-inbox`](https://github.com/langchain-ai/agent-inbox)

@ -26,10 +27,10 @@ public analog to Trail Boss's core idea — a **queue of agents waiting on a hum

 **How Trail Boss differs:** Agent Inbox is bound to the LangGraph runtime and its `interrupt`
 primitive. Trail Boss targets **interactive terminal coding agents** (e.g. Claude Code
-sessions): detection comes from Claude Code **hooks** rather than framework interrupts, and
-delivery is via **tmux `send-keys`** (overlaying a real terminal workflow with no rewrite) or
-the **Agent SDK** — not a graph runtime. It also adds **skip / re-surface** semantics and a
-**most-stuck-first** ranking tuned for an operator triaging many live sessions at once.
+sessions): detection comes from Claude Code **hooks** rather than framework interrupts, and the
+operator acts by being **navigated to the live tmux pane** — not by the tool replying through a
+graph runtime. It also adds **skip / re-surface** semantics and an **auto-advance FIFO depletion
+loop** (oldest-stuck first, no priority) tuned for an operator draining many live sessions.

 ## Positioning

@ -41,8 +42,8 @@ the **Agent SDK** — not a graph runtime. It also adds **skip / re-surface** se
 ## Concrete reuse

 - **Collector backend** → the hook-native event store from `disler/...observability`
-  (WebSocket + SQLite) is a clean fork point; extend it with the session→pane registry and a
-  `/reply` endpoint.
- **Inbox UX patterns** → `langchain-ai/agent-inbox` for the accept/edit/respond/ignore
-  interaction model, mapped onto Trail Boss's item reasons (`permission`, `plan`, `question`,
-  `idle`).
+  (SQLite + hook ingest) is a clean fork point; extend it with the self-healing session→pane
+  registry and the tmux navigation layer. Drop its WebSocket/browser-UI half — Trail Boss's
+  presentation is tmux (`display-popup`), not a web client.
+- **Inbox UX patterns** → `langchain-ai/agent-inbox` for the accept/respond/ignore interaction
+  model, mapped onto Trail Boss's two stuck reasons (`permission`, `stopped`).