Sophon

Streaming model

A single agent turn is one interaction, identified by an interaction_id. Every event SAP fires during that turn carries the same id, so the iOS client can route deltas to the right bubble.

Event order in a normal text-only turn

1. POST /v1/me/sessions/:id/send         → returns { interaction_id, message_id }
2. message_added (role=user, interaction_id, …)        ← server echo
3. message_added (role=agent, text=" ", interaction_id) ← placeholder bubble
4. message_delta × N (interaction_id, delta="…")
5. message_finalized (interaction_id, text=full, usage)
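The routing rule implied by this sequence can be sketched as a small reducer keyed by interaction_id. The event shapes below are inferred from the list above, not the exact wire schema:

```typescript
// Hypothetical event shapes, inferred from the turn sequence above.
type TurnEvent =
  | { type: "message_added"; interaction_id: string; role: "user" | "agent"; text: string }
  | { type: "message_delta"; interaction_id: string; delta: string }
  | { type: "message_finalized"; interaction_id: string; text: string };

// Routes each event to the bubble keyed by interaction_id:
// deltas append, finalize replaces with the authoritative full text.
class TurnRouter {
  private bubbles = new Map<string, string>();

  handle(ev: TurnEvent): void {
    switch (ev.type) {
      case "message_added":
        if (ev.role === "agent") this.bubbles.set(ev.interaction_id, ev.text);
        break;
      case "message_delta":
        this.bubbles.set(ev.interaction_id, (this.bubbles.get(ev.interaction_id) ?? "") + ev.delta);
        break;
      case "message_finalized":
        this.bubbles.set(ev.interaction_id, ev.text); // server's full text wins
        break;
    }
  }

  text(interactionId: string): string {
    return this.bubbles.get(interactionId) ?? "";
  }
}
```

Because every event carries the same interaction_id, concurrent turns in different sessions never cross-contaminate bubbles.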

Step 3 is the placeholder bubble — your bridge POSTs it via /v1/bridge/sendMessage so iOS can show "Thinking…" with the right bubble shape immediately. Without it the user stares at a blank chat for the whole LLM round-trip.
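A minimal sketch of that placeholder POST, assuming a JSON body with role/text/interaction_id fields — the real /v1/bridge/sendMessage schema may differ. fetch is injected so the sketch stays testable:

```typescript
// Injected fetch shape, so tests can run without a network.
type Fetch = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean }>;

// Posts the placeholder agent bubble (text=" ") for an interaction.
// The payload field names are assumptions, not the confirmed schema.
async function postPlaceholder(baseUrl: string, interactionId: string, doFetch: Fetch): Promise<boolean> {
  const res = await doFetch(`${baseUrl}/v1/bridge/sendMessage`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ role: "agent", text: " ", interaction_id: interactionId }),
  });
  return res.ok;
}
```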

Tool calls inside a turn

Tool invocations slot between steps 4 and 5 (or alongside deltas, if the model streams while the tool runs):

…
4a. task_created  (task_id, kind=exec, status_label="ls -la")
4b. task_progress (task_id, partial_result?)
4c. task_completed | task_failed | task_cancelled
5.  message_finalized

iOS coalesces consecutive task_* events for the same interaction into a single ToolGroupView ("Ran 3 commands") collapsible row. Tap to drill into per-tool args and result. The same data also surfaces inline as AssistantSegment.toolCall / AssistantSegment.toolResult on the agent bubble, so a user opening the chat next month can still see what happened.
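The coalescing rule can be sketched as follows; the grouping criterion (task_* events sharing an interaction collapse into one group) and the "Ran N commands" label format are assumptions based on the description above:

```typescript
// Hypothetical task event shape for the coalescing sketch.
type TaskEvent = {
  type: "task_created" | "task_progress" | "task_completed" | "task_failed" | "task_cancelled";
  interaction_id: string;
  task_id: string;
};

// Collapses runs of task_* events for the same interaction into one
// group row; deltas streaming alongside do not break the run.
function coalesceTaskGroups(events: TaskEvent[]): { interactionId: string; taskIds: string[]; label: string }[] {
  const groups: { interactionId: string; taskIds: string[]; label: string }[] = [];
  for (const ev of events) {
    const last = groups[groups.length - 1];
    if (last && last.interactionId === ev.interaction_id) {
      // Same task firing created/progress/completed counts once.
      if (!last.taskIds.includes(ev.task_id)) {
        last.taskIds.push(ev.task_id);
        last.label = `Ran ${last.taskIds.length} commands`;
      }
    } else {
      groups.push({ interactionId: ev.interaction_id, taskIds: [ev.task_id], label: "Ran 1 command" });
    }
  }
  return groups;
}
```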

Approvals (HITL)

If the agent needs user permission mid-turn it pauses and emits an approval_requested:

4a. task_created            (task_id, kind=exec)
4b. approval_requested      (approval_id, command, severity, …)
    ↓ user taps Allow / Allow always / Deny in iOS
4c. POST /v1/me/approvals/:id  { decision }
4d. approval_resolved       (approval_id, decision)
4e. task_completed | task_failed

The bridge listens for approval.resolved on the bus and unblocks the agent. See Tool calls & approvals for the full HITL contract.
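One way a bridge can implement that blocking is a map of pending promises keyed by approval_id — a hedged sketch, not the OpenClaw bridge's actual code:

```typescript
type Decision = "allow" | "allow_always" | "deny";

// Parks a promise per approval_id when approval_requested fires, and
// resolves it when the matching approval_resolved event arrives,
// unblocking the tool runner that awaited it.
class ApprovalGate {
  private pending = new Map<string, (d: Decision) => void>();

  // Awaited by the tool runner before executing the command.
  wait(approvalId: string): Promise<Decision> {
    return new Promise((resolve) => this.pending.set(approvalId, resolve));
  }

  // Called from the bus listener when the approval is resolved.
  resolve(approvalId: string, decision: Decision): void {
    const waiter = this.pending.get(approvalId);
    if (waiter) {
      this.pending.delete(approvalId);
      waiter(decision);
    }
  }
}
```

Note the sketch assumes wait() is registered before the resolution arrives; a production gate would also buffer early resolutions.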

Resume on reconnect

The server keeps a 5-minute / 256-event ring buffer per user. When the iOS app reconnects, it sends Last-Event-ID automatically; the server replays everything past that id, then continues live.
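A sketch of the replay rule under the stated 256-event cap, assuming monotonically increasing event ids (the 5-minute half of the window is omitted for brevity):

```typescript
// Per-user ring buffer: keeps the newest `capacity` events and replays
// everything strictly after a client's Last-Event-ID.
class RingBuffer {
  private events: { id: number; data: string }[] = [];

  constructor(private capacity = 256) {}

  push(id: number, data: string): void {
    this.events.push({ id, data });
    if (this.events.length > this.capacity) this.events.shift(); // drop oldest
  }

  // Events strictly after lastEventId, oldest first, then the caller
  // switches the connection over to live delivery.
  replayAfter(lastEventId: number): { id: number; data: string }[] {
    return this.events.filter((e) => e.id > lastEventId);
  }
}
```

If the client's Last-Event-ID has already been evicted, the replay silently starts from the oldest retained event — which is exactly why the snapshot endpoint below exists for longer gaps.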

For gaps older than 5 minutes — the user backgrounded the app for half an hour, the LTE connection died on a train — sessions and messages are already durable (/v1/me + /v1/me/sessions/:id/messages on cold launch). What the ring buffer drops is live ephemeral state: the "may I run this command?" approval that fired while the app was killed. The cold-launch snapshot endpoint plugs that gap:

GET /v1/me/snapshot
→ { ts, pending_approvals: [...] }

iOS calls this in refreshSnapshot() between refreshMe() and the SSE attach. Each entry folds through the same handler the live approval_requested event uses, so re-emits dedupe by approval_id. See the Idempotency & resume page for the exact response shape.
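The dedupe-by-approval_id behaviour might look like this — a sketch in which onApprovalRequested is a hypothetical shared handler fed by both the live event and the snapshot entries:

```typescript
// Hypothetical pending-approval entry; field names are assumptions.
type Approval = { approval_id: string; command: string };

// Single entry point for live approval_requested events and snapshot
// pending_approvals entries; re-emits of the same id are no-ops.
class ApprovalStore {
  private byId = new Map<string, Approval>();

  onApprovalRequested(a: Approval): boolean {
    if (this.byId.has(a.approval_id)) return false; // duplicate, ignore
    this.byId.set(a.approval_id, a);
    return true;
  }

  pending(): Approval[] {
    return [...this.byId.values()];
  }
}
```

Because both paths converge on one handler, the order of snapshot fetch vs. SSE attach stops mattering: whichever delivers first wins, the other is dropped.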

Pending-run resume on the iOS side

When you kill the app mid-stream, iOS persists (session_id, bubble_id, run_id) to disk. On cold launch tryResumePendingRuns() walks each record:

  1. Run id present — adapter.resumeRun(sessionId, runId) opens a fresh continuation. A watchdog polls GET /v1/me/sessions/:id/messages every 3 s for up to 24 s; if the agent's reply with that interaction_id is already in the DB we resolve as if SSE had delivered it.
  2. Run id absent — user killed the app inside the /sessions/:id/send round-trip. We just reload history; the server-stored turn surfaces on the next loadHistory().

Either way the user reopens the chat and sees what the agent was doing, even if the SSE stream missed every event in the gap.
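The watchdog in case 1 boils down to a bounded polling loop. This sketch (in TypeScript rather than the Swift the iOS side actually uses) injects the fetch and sleep functions; 8 attempts × 3 s matches the 24 s budget described above:

```typescript
// Polls the message history until the reply with our interaction_id
// appears, or gives up after maxAttempts. fetchMessages and sleep are
// injected so the policy is testable without a server or real timers.
async function watchForReply(
  interactionId: string,
  fetchMessages: () => Promise<{ interaction_id: string }[]>,
  sleep: (ms: number) => Promise<void>,
  intervalMs = 3000,
  maxAttempts = 8, // 8 × 3 s = 24 s budget
): Promise<boolean> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const messages = await fetchMessages();
    if (messages.some((m) => m.interaction_id === interactionId)) return true; // reply landed
    await sleep(intervalMs);
  }
  return false; // caller falls back to a plain history reload
}
```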

Backpressure

Bridges should serialise their writes back to Sophon, so a fast burst of deltas doesn't race the placeholder-bubble create. A simple in-flight queue per (session_id, interaction_id) is enough — the OpenClaw bridge in connectors/openclaw-bridge/src/sophon.ts does exactly this.
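A per-key promise chain is one minimal way to build that in-flight queue (a sketch, not the actual connectors/openclaw-bridge/src/sophon.ts code):

```typescript
// Serialises async jobs per (session_id, interaction_id) key by
// chaining each job onto the previous one's promise, so the
// placeholder-bubble create always lands before the deltas behind it.
class SerialQueue {
  private tails = new Map<string, Promise<void>>();

  enqueue(sessionId: string, interactionId: string, job: () => Promise<void>): Promise<void> {
    const key = `${sessionId}:${interactionId}`;
    const next = (this.tails.get(key) ?? Promise.resolve()).then(job);
    // Swallow rejections on the stored tail so one failed write
    // doesn't wedge the whole chain; callers still see `next` reject.
    this.tails.set(key, next.catch(() => {}));
    return next;
  }
}
```

Different interactions get independent chains, so serialising one turn's writes never stalls another session's traffic.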

If you push deltas faster than the rate-limit bucket allows (delta bucket: 200 capacity, 100/s refill), you'll start getting 429 rate_limited with Retry-After. See Errors & rate limits for the bucket table.
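Honouring Retry-After on the write path might look like this sketch, where send and sleep are injected and the retry budget is an arbitrary choice, not a documented Sophon limit:

```typescript
// Retries a delta write on 429, sleeping for the server-supplied
// Retry-After duration (defaulting to 1 s if it's absent) before
// each retry; gives up after maxRetries extra attempts.
async function sendWithRetry(
  send: () => Promise<{ status: number; retryAfterSec?: number }>,
  sleep: (ms: number) => Promise<void>,
  maxRetries = 3,
): Promise<boolean> {
  for (let i = 0; i <= maxRetries; i++) {
    const res = await send();
    if (res.status !== 429) return res.status < 400; // success or hard failure
    await sleep((res.retryAfterSec ?? 1) * 1000); // honour Retry-After
  }
  return false; // still rate-limited after the retry budget
}
```

With the stated bucket (200 capacity, 100/s refill), a well-behaved bridge that sleeps for Retry-After should drain bursts rather than loop on 429s.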