docs: gap-review round 1 — reconcile supporting docs + close spec gaps

Fix 10 gaps found by fresh-eyes review:
- CRITICAL: mechanics doc still documented send-keys-relay/Agent-SDK delivery
  and the Notification hook; rewrote DELIVER to the navigator model, dropped
  Notification throughout
- HIGH: decisions.md + related-work.md still referenced priority "ranking",
  /reply dispatch, send-keys/SDK delivery, and the dead plan/question/idle
  reason taxonomy; aligned to FIFO + navigation + permission/stopped
- collector and daemon stated as one process; pull-not-broadcast presentation
- added trust boundary (loopback-only ingest, single-host trust assumption)
- resolved auto-advance focus-steal hazard (operator-initiated jump) and
  specified skip re-ordering (move to tail + cooldown) to avoid livelock
- downgraded SessionEnd to "not yet probed" (probe only saw SessionStart/Stop)
- dropped forked collector's WebSocket layer; most-stuck → oldest-stuck (FIFO)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
jedarden 2026-05-25 15:18:56 -04:00
parent ca08cbbead
commit 675c1e531d
4 changed files with 105 additions and 83 deletions

View file

@ -11,7 +11,8 @@ it right. The metaphor maps cleanly onto the mechanism:
- **a steer bogs down or strays** → a `Stop` / `PermissionRequest` hook fires; the collector
flags the session stuck
- **the trail boss rides over and sets it right** → you read the context and give the order
(reply) or wave it on (skip); ranking surfaces the most-stuck first
(reply) or wave it on (skip); the queue surfaces stuck sessions oldest-first (flat FIFO, no
priority ranking)
### Names considered and rejected

View file

@ -99,8 +99,11 @@ interaction sidesteps this entirely.
1. **A blocked-state signal from every session** — Claude Code hooks (`Stop`,
`PermissionRequest`), POSTing their stdin JSON to a local collector.
2. **A central collector + live state store** — tracks every session's status, holds the
`session_id → pane` registry, broadcasts the queue.
2. **A central collector + live state store** — tracks every session's status and holds the
`session_id → pane` registry. The collector **is the daemon** (see Architecture); its HTTP
ingest endpoint is one face of that single process. Presentation reads current state on
demand (the `display-popup` *pulls*; there is no push/broadcast channel), so no WebSocket is
required. An optional status-line "N stuck" segment, if added, polls the daemon.
3. **Context extraction***what* each session is asking. Largely free from the hook payload
(see below); transcript tail for deeper/permission context.
4. **The Trail Boss queue** — a FIFO depletion surface (oldest-stuck first), keyboard-driven.
@ -119,7 +122,8 @@ stuck conditions: a session waiting at a permission prompt is mid-turn and does
| `Stop` | Turn finished; session waiting for the next instruction. | Confirmed firing in interactive and `-p` (probe 2026-05-25); payload carries `last_assistant_message` |
| `PermissionRequest` | Session blocked mid-turn on approval — emits **no** `Stop`, so this is the only signal for the permission case. | Exists; firing/payload **not yet probed** (phase 1) |
| `UserPromptSubmit` | Input submitted → session unstuck → **dequeue**. | Confirmed primitive |
| `SessionStart` / `SessionEnd` | Register / retire the session (and re-assert `session_id → pane`). | Confirmed firing (probe) |
| `SessionStart` | Register the session; re-assert `session_id → pane`. | Confirmed firing (probe) |
| `SessionEnd` | Retire the session. | Exists; firing not yet probed |
| ~~`Notification`~~ | Dropped — `Stop` + `PermissionRequest` cover every stuck case; the dead-letter queue just fills and drains. | Not used |
Both `Stop` and `PermissionRequest` enqueue a plain stuck item with no priority difference.
@ -165,8 +169,8 @@ the host provides every needed primitive (all confirmed present):
`switch-client`, `select-window`, `select-pane`, `link-window`/`unlink-window`,
`join-pane`/`break-pane`, `pipe-pane`, `capture-pane`, `display-popup`.
- **Minimal (recommended start):** route the operator's client to the most-stuck pane —
`switch-client` + `select-window`/`select-pane %id`. This *is* "eliminate manual
- **Minimal (recommended start):** route the operator's client to the head-of-queue
(oldest-stuck) pane — `switch-client` + `select-window`/`select-pane %id`. This *is* "eliminate manual
tab-switching," with zero relay.
- **Embedded (optional polish):** `link-window` the target session's window into a Trail Boss
view so the queue and live pane are co-visible, then `unlink-window`. Non-destructive (tmux
@ -219,7 +223,7 @@ the live pane. No special edit affordance or `canUseTool` round-trip is required
display-popup (queue overlay) ──select──▶ switch-client /
+ optional status-line "N stuck" select-window/pane
→ you land on the
live most-stuck pane
live head-of-queue pane
```
### Daemon vs. presentation split
@ -288,12 +292,18 @@ loads.
- **Membership:** every stuck session (from `Stop` or `PermissionRequest`). No priority between
reasons; `reason` is display-only. Reconcile removes any that have progressed.
- **Order:** oldest-stuck first (FIFO). The head of the queue is "the current session."
- **Auto-advance:** the operator's focus is navigated to the current session. When that session
resolves — `UserPromptSubmit` fires (you responded) or you `skip` — Trail Boss **loads the
next** stuck session into focus. The operator drains the queue one session at a time without
ever choosing "which next."
- **Skip:** advances to the next without acting; the skipped session stays in the queue and
re-surfaces later in the cycle.
- **Auto-advance (no forced focus-steal):** the operator works the *current* session in its
live pane. When that session resolves — `UserPromptSubmit` fires (you responded) or you
`skip` — the daemon **computes** the next head of queue, but the actual jump is
**operator-initiated** (a keypress / re-invoking the popup), never an automatic
`switch-client`. This matters because responding *is* what fires `UserPromptSubmit`: a forced
jump would teleport you out of the pane the instant you hit Enter. So depletion is "resolve →
next is ready → you press to advance," not a focus-steal. (Whether to offer an opt-in
"auto-jump on resolve" toggle is deferred.)
- **Skip:** advances to the next without acting. The skipped session is **moved to the tail** of
the FIFO (and stamped with a short re-surface cooldown) so depletion always progresses and a
single item can't be re-selected immediately. In a one-item queue, skip lands you on empty;
the item re-surfaces on its next event or after the cooldown.
- **Dequeue:** transcript advances past the last stuck point, or `UserPromptSubmit` fires, or
`SessionEnd`.
- **Empty queue:** nothing stuck → no auto-advance; the operator is free. New stuck sessions
@ -335,8 +345,8 @@ Still **unverified** (probe before depending on): `PermissionRequest` firing + p
3. **Daemon (control plane)** — ingest endpoint behind the normalized stuck/unstuck adapter
contract, SQLite state, self-healing registry, the transcript reconcile loop, FIFO queue.
Runs in its own tmux window.
4. **Navigation**`switch-client`/`select-window`/`select-pane` to route to a pane by id, and
auto-advance to the next stuck session on resolve/skip.
4. **Navigation**`switch-client`/`select-window`/`select-pane` to route to a pane by id;
compute the next head on resolve/skip and jump on operator action (no forced focus-steal).
5. **Presentation**`display-popup` queue overlay + keybinding; optional status-line segment.
6. **Close the loop (walking skeleton)** — stuck pane → loaded into focus → interact → reconcile
dequeues it → next stuck session auto-loads. This end-to-end depletion path is the first
@ -353,6 +363,20 @@ Still **unverified** (probe before depending on): `PermissionRequest` firing + p
- *The queue never lies:* a displayed item reflects current transcript state (reconcile is
authoritative over hook events).
**Trust boundary**
- The collector/daemon binds its ingest endpoint to **loopback only** (`127.0.0.1:4000`) — it is
never exposed off-host. Per the single-operator/single-host non-goal, **all local processes are
trusted**: hook POSTs are unauthenticated and the `session_id` / `$TMUX_PANE` they carry are
taken on trust.
- This is acceptable on a single-user host. On a **multi-user host** it is not: any local process
could POST a forged event with an attacker-chosen `$TMUX_PANE`, causing the daemon to navigate
the operator to — or, if the optional `send-keys` path is enabled, type into — an arbitrary
pane. Mitigations if that ever matters: a unix-domain socket with file-mode restrictions, or a
shared secret in the hook POSTs.
- The optional `send-keys` delivery path is the only way a forged event could inject
*keystrokes*; with it disabled (navigation-only, the default), a forged event can at worst
mis-route the operator's focus — annoying, not destructive.
**Failure modes**
- *Hook POST dropped (collector down/slow):* hook exits 0, event lost → reconcile sweep
recovers it from transcripts.
@ -376,9 +400,11 @@ Still **unverified** (probe before depending on): `PermissionRequest` firing + p
harnesses? See "Layering" above.
2. **`PermissionRequest` specifics** — confirm it fires for the gate types you hit and what its
payload carries (the proposed command, for display). Detection coverage depends on it; phase 1.
3. **Auto-advance trigger** — exactly what counts as "done with the current session" →
load next: `UserPromptSubmit` and explicit `skip` are clear; should manually navigating away
also advance? And is the jump immediate or on a keypress?
3. **Auto-advance residuals** — the trigger and jump model are decided (resolve via
`UserPromptSubmit`/`skip` → next is computed → operator-initiated jump, never a forced
focus-steal; see "Queue & interaction loop"). Residual: should manually navigating away from
the current pane also count as advancing, and is an opt-in "auto-jump on resolve" toggle
worth offering? Decide after the walking skeleton.
4. **Presentation UX**`display-popup` queue + jump vs. a dedicated always-visible window;
decide after the walking skeleton.

View file

@ -19,7 +19,6 @@ the collector is enough:
{ "hooks": [ { "type": "command",
"command": "curl -s -X POST http://localhost:4000/event --data-binary @-" } ] }
],
"Notification": [ { "hooks": [ { "type": "command", "command": "~/.claude/trailboss-emit.sh" } ] } ],
"Stop": [ { "hooks": [ { "type": "command", "command": "~/.claude/trailboss-emit.sh" } ] } ],
"UserPromptSubmit": [ { "hooks": [ { "type": "command", "command": "~/.claude/trailboss-emit.sh" } ] } ],
"SessionStart": [ { "hooks": [ { "type": "command", "command": "~/.claude/trailboss-register.sh" } ] } ],
@ -32,21 +31,20 @@ Events relevant to "needs a human":
| Event | Meaning for the queue | Confidence |
|-------|----------------------|------------|
| `Stop` | **Turn finished; session idle, waiting for the next prompt.** The reliable idle signal — fires every time a turn completes. Enqueue "ready for next." | Confirmed |
| `PermissionRequest` | Hard block: a tool wants approval. Enqueue allow/deny. | Confirmed exists |
| `Notification` | Claude notifying the user (permission prompt, long-idle nudge). Supplementary; **trigger conditions less documented.** | **(verify, optional)** |
| `Stop` | **Turn finished; session waiting for the next instruction.** Enqueue a stuck item. | Confirmed firing (probe) |
| `PermissionRequest` | Session blocked mid-turn on approval. Emits **no** `Stop`, so it is the only signal for the permission case. Enqueue a stuck item. | Exists; firing/payload not yet probed |
| `SubagentStop` | A subagent finished — usually *not* a human-input point; ignore. | Confirmed |
| `UserPromptSubmit` | Human submitted input → block resolved → **dequeue**. | Confirmed |
| `SessionStart` / `SessionEnd` | Register / retire the session. | Confirmed |
| `SessionStart` | Register the session; capture `$TMUX_PANE`. | Confirmed firing (probe) |
| `SessionEnd` | Retire the session. | Exists; firing not yet probed |
| `PreToolUse` / `PostToolUse` | Activity telemetry (the "running" state), not blocks. | Confirmed |
> **Detection model (settled):** the two load-bearing signals are `Stop` (idle, waiting for the
> next prompt) and `PermissionRequest` (a hard block). `Stop` makes the "wants the next
> instruction" case free — no polling. `Notification` is supplementary; the only open question
> is whether it surfaces anything those two miss (e.g., a long-idle nudge), and if not it can
> be dropped. Optional probe: log `Notification` alongside `Stop`/`PermissionRequest` across
> (a) a permission prompt, (b) plan-mode approval, (c) a clarifying question, (d) a finished
> turn at an empty prompt, and see if it ever adds signal.
> **Detection model (settled):** the two enqueue triggers are `Stop` (turn finished, waiting)
> and `PermissionRequest` (blocked mid-turn). **Both are required** — a permission-blocked
> session is mid-turn and emits no `Stop`, so without `PermissionRequest` it would never be
> detected. They are treated identically (a flat stuck item; `reason` is display-only, never a
> priority). `Notification` was evaluated and **dropped**`Stop` + `PermissionRequest` cover
> every stuck case. See `../plan/plan.md` ("Detection model" and the resolved-questions list).
---
@ -108,44 +106,41 @@ prompt. Cheap; poll on demand.
---
## 4. DELIVER — get the reply back into the exact session
## 4. DELIVER — route the operator to the live session (navigation, not relay)
### Substrate A — tmux `send-keys` (overlays a terminal workflow)
Trail Boss does **not** inject answers. It navigates the operator to the live pane, where they
interact with the real prompt directly (see `../plan/plan.md`, "Navigator, not relay" and the
delivery decision in `../notes/decisions.md`). The relevant primitives, all tmux-server-global
so they work from outside tmux:
```bash
# free-text reply (a clarifying answer or the next instruction)
tmux send-keys -t <pane> -l 'shard by tenant; tenants are the hard isolation boundary'
tmux send-keys -t <pane> Enter
# permission menu choice (send the option key the prompt expects)
tmux send-keys -t <pane> '1' # or 'y' / arrow+Enter, depending on the prompt
# bring the operator's client to the stuck pane (pane ids like %446 are global)
tmux switch-client -t "$(tmux display -p -t %446 '#{session_name}')"
tmux select-window -t %446
tmux select-pane -t %446
# optional co-display: link the target window into a Trail Boss view, then unlink
tmux link-window -s <src-session>:<window> -t trailboss: ; tmux unlink-window -t trailboss:<n>
```
- `-l` sends the argument literally; send `Enter` separately.
- Requires the `session_id → pane` mapping from the `SessionStart` hook.
- **Verify** multi-line/paste fidelity and how each prompt type consumes keystrokes.
- Works regardless of how the session was launched — it stays a normal interactive session.
- Primary delivery is **navigation** — the operator types into the genuine CLI, so there is no
keystroke-fidelity problem and "edit before allow" is native.
- **Secondary (optional):** `tmux send-keys -t %446 -l '<text>'` then `send-keys -t %446 Enter`
for plain-text submission (basic submission confirmed in the probe). Not the primary path; the
daemon never sends *synthesized* input — only human-authored text, and only if this path is
enabled.
### Substrate B — Agent SDK (the clean rewrite)
### Rejected delivery alternatives
If sessions run under the Python/TypeScript Agent SDK:
- **`canUseTool` callback** (`can_use_tool` in Python) — fires for any tool needing approval and
for clarifying questions. It can **await indefinitely** while the orchestrator collects your
remote decision, then returns `{ behavior: "allow" | "deny", updatedInput?, message? }`
including a *modified* tool input. The cleanest "answer from elsewhere" hook: execution pauses
until you respond.
- **Streaming input** — the SDK accepts an async-iterable prompt stream, so follow-up turns are
delivered programmatically over a long-lived channel.
- **Resume**`resume: <session_id>` to continue a specific session.
### Why headless `claude -p` is NOT a delivery path
`claude -p` (print/headless) with `--output-format stream-json` / `--input-format stream-json`
is **one-shot**: it cannot stream additional user messages into an already-running session, and
interactive permission prompting is unavailable (you must pre-authorize with `--allowedTools`
or wire `--permission-prompt-tool <mcp_tool>`, whose payload schema is undocumented —
**(verify)**). Use the SDK, not `-p`, for Substrate B.
- **Resume-to-deliver** (`claude --resume <id>` in a second process) does **not** reach the
original live pane — a live interactive CLI holds in-memory state and does not re-read its
transcript; concurrent attach risks divergence. `--fork-session` confirms `--resume` reuses
the session. Only viable in a no-resident-process model, which is rejected for v1.
- **Agent SDK `canUseTool` + streaming input** would allow programmatic permission gating with
`updatedInput`, but requires running sessions under the SDK instead of the terminal —
deferred; the tmux-navigator model fits the existing workflow.
- **`claude --remote-control`** routes to the claude.ai / desktop / mobile surface, not a local
channel — useless for a same-host tool.
- **Headless `claude -p`** is one-shot and cannot stream input into a running session.
---
@ -153,13 +148,12 @@ or wire `--permission-prompt-tool <mcp_tool>`, whose payload schema is undocumen
| Need | Primitive | Identifier / flag | Confidence |
|------|-----------|-------------------|------------|
| Detect idle / waiting for next prompt | `Stop` hook | stdin JSON | confirmed (primary idle signal) |
| Detect permission block | `PermissionRequest` hook | stdin JSON | confirmed |
| Detect long-idle nudge (optional) | `Notification` hook | stdin JSON | **(verify, optional)** |
| Detect waiting for next instruction | `Stop` hook | stdin JSON | confirmed firing (probe) |
| Detect permission block | `PermissionRequest` hook | stdin JSON | exists; not yet probed |
| Detect resolved block | `UserPromptSubmit` hook | `session_id` | confirmed |
| Correlate to session/repo | any hook payload | `session_id`, `cwd`, `transcript_path` | confirmed |
| Correlate to tmux pane | `SessionStart` hook | `$TMUX_PANE` (capture in script) | confirmed (you wire it) |
| Read the question | transcript JSONL / `capture-pane` | `transcript_path` / pane | confirmed |
| Deliver reply (overlay) | tmux `send-keys -t <pane>` | pane id | confirmed |
| Deliver reply (rewrite) | Agent SDK `canUseTool` + streaming input | `session_id` | confirmed (SDK only) |
| Deliver reply (headless) | — | not viable (`-p` is one-shot) | confirmed limitation |
| Correlate to session/repo | any hook payload + env | `session_id`, `cwd`, `transcript_path`, `CLAUDE_CODE_SESSION_ID` | confirmed (probe) |
| Correlate to tmux pane | any emit hook | `$TMUX_PANE` (in hook env) | confirmed (probe) |
| Read the question | `Stop` payload `last_assistant_message` / transcript / `capture-pane` | payload / `transcript_path` / pane | confirmed (probe) |
| Deliver (primary) | tmux navigation (`switch-client`/`select-window`/`select-pane`) | pane id | confirmed primitives |
| Deliver (secondary, optional) | tmux `send-keys -t <pane>` (human-authored text only) | pane id | basic submission confirmed |
| Rejected: resume / SDK / remote-control / `-p` | — | — | see "Rejected" above |

View file

@ -13,10 +13,11 @@ activity (sessions, tool calls, errors) over WebSocket. It proves the **detectio
layer** Trail Boss needs: hooks → collector → live UI, keyed by `session_id`.
**How Trail Boss differs:** it is *observability*, not *action*. It shows *that* a session is
waiting; Trail Boss adds the missing half — surfacing the blocked session as an actionable
queue item and **delivering your reply back into the exact session**. Its collector is a strong
starting point to fork; Trail Boss extends the read-only event store with a session→pane
registry and a `/reply` dispatch path.
waiting; Trail Boss adds the missing half — surfacing the blocked session and **navigating the
operator to the live pane** to act on it (it never injects input; see "Navigator, not relay" in
the plan). Its collector is a strong starting point to fork; Trail Boss extends the read-only
event store with a self-healing session→pane registry and a tmux navigation layer. It reuses the
SQLite event store, not the WebSocket/browser-UI layer (presentation is tmux, not a web app).
### [`langchain-ai/agent-inbox`](https://github.com/langchain-ai/agent-inbox)
@ -26,10 +27,10 @@ public analog to Trail Boss's core idea — a **queue of agents waiting on a hum
**How Trail Boss differs:** Agent Inbox is bound to the LangGraph runtime and its `interrupt`
primitive. Trail Boss targets **interactive terminal coding agents** (e.g. Claude Code
sessions): detection comes from Claude Code **hooks** rather than framework interrupts, and
delivery is via **tmux `send-keys`** (overlaying a real terminal workflow with no rewrite) or
the **Agent SDK** — not a graph runtime. It also adds **skip / re-surface** semantics and a
**most-stuck-first** ranking tuned for an operator triaging many live sessions at once.
sessions): detection comes from Claude Code **hooks** rather than framework interrupts, and the
operator acts by being **navigated to the live tmux pane** — not by the tool replying through a
graph runtime. It also adds **skip / re-surface** semantics and an **auto-advance FIFO depletion
loop** (oldest-stuck first, no priority) tuned for an operator draining many live sessions.
## Positioning
@ -41,8 +42,8 @@ the **Agent SDK** — not a graph runtime. It also adds **skip / re-surface** se
## Concrete reuse
- **Collector backend** → the hook-native event store from `disler/...observability`
(WebSocket + SQLite) is a clean fork point; extend it with the session→pane registry and a
`/reply` endpoint.
- **Inbox UX patterns**`langchain-ai/agent-inbox` for the accept/edit/respond/ignore
interaction model, mapped onto Trail Boss's item reasons (`permission`, `plan`, `question`,
`idle`).
(SQLite + hook ingest) is a clean fork point; extend it with the self-healing session→pane
registry and the tmux navigation layer. Drop its WebSocket/browser-UI half — Trail Boss's
presentation is tmux (`display-popup`), not a web client.
- **Inbox UX patterns**`langchain-ai/agent-inbox` for the accept/respond/ignore interaction
model, mapped onto Trail Boss's two stuck reasons (`permission`, `stopped`).