Runtime Adapter Current State
Purpose
Runtime Adapter is the daemon layer responsible for adapting local AI agent CLIs. It converts Open Design's unified generation requests into the actual command-line invocations for each CLI, and converts CLI output into streaming events that the frontend can consume.
The current implementation is concentrated in:
- apps/daemon/src/agents.ts: agent definitions, detection, model lists, argument construction, model validation.
- apps/daemon/src/server.ts: /api/chat request orchestration, prompt composition, spawn() subprocesses, SSE forwarding.
- apps/daemon/src/claude-stream.ts: parsing Claude Code structured JSONL output.
- apps/daemon/src/json-event-stream.ts: parsing structured JSON/JSONL output from Codex, Gemini, OpenCode, and Cursor Agent.
- apps/daemon/src/acp.ts: model detection and streaming session orchestration for the ACP JSON-RPC runtime.
Currently Supported Runtimes
AGENT_DEFS in apps/daemon/src/agents.ts defines 8 local runtimes:
| id | Name | CLI | Output format | Model list source |
|---|---|---|---|---|
| claude | Claude Code | claude | claude-stream-json | Static fallback |
| codex | Codex CLI | codex | json-event-stream | Static fallback |
| gemini | Gemini CLI | gemini | json-event-stream | Static fallback |
| opencode | OpenCode | opencode | json-event-stream | opencode models + fallback |
| hermes | Hermes | hermes | acp-json-rpc | session/new from hermes acp + fallback |
| kimi | Kimi CLI | kimi | acp-json-rpc | session/new from kimi acp + fallback |
| cursor-agent | Cursor Agent | cursor-agent | json-event-stream | cursor-agent models + fallback |
| qwen | Qwen Code | qwen | plain | Static fallback |
Each runtime definition contains:
- id / name / bin: used for frontend display and process startup.
- versionArgs: used to detect the version.
- fallbackModels: static fallback options for the model selector.
- listModels: optional model discovery command.
- fetchModels: optional custom model detection logic, suitable for runtimes such as ACP that require a handshake before the model list is available.
- reasoningOptions: optional reasoning effort options, currently used by Codex.
- buildArgs(): converts unified input into the CLI's argv; it can also read runtimeContext at runtime, currently used to explicitly pass execution context such as cwd.
- streamFormat: tells the daemon how to interpret stdout.
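As an illustration of the shape described above, here is a minimal TypeScript sketch of a runtime definition. The field names come from the list; every type, the `BuildInput` and `RuntimeContext` helper types, the `--version` flag, and the placeholder model name are assumptions rather than the actual contents of agents.ts.

```typescript
// Sketch only: field names follow the list above, types are assumed.
type StreamFormat =
  | "claude-stream-json"
  | "json-event-stream"
  | "acp-json-rpc"
  | "plain";

// Hypothetical unified input handed to buildArgs().
interface BuildInput {
  prompt: string;
  model?: string;
  reasoning?: string;
}

// Hypothetical execution context; the document only mentions cwd.
interface RuntimeContext {
  cwd: string;
}

interface AgentDef {
  id: string;
  name: string;
  bin: string;
  versionArgs: string[];
  fallbackModels: string[];
  listModels?: string[]; // assumed: argv of the discovery command
  fetchModels?: () => Promise<string[]>; // assumed signature
  reasoningOptions?: string[];
  streamFormat: StreamFormat;
  buildArgs(input: BuildInput, ctx?: RuntimeContext): string[];
}

// Example entry modeled on the Codex invocation documented in this file.
const codexLike: AgentDef = {
  id: "codex",
  name: "Codex CLI",
  bin: "codex",
  versionArgs: ["--version"], // assumed flag
  fallbackModels: ["default"], // placeholder, not a real model list
  streamFormat: "json-event-stream",
  buildArgs(input, ctx) {
    const args = ["exec", "--json", "--skip-git-repo-check", "--full-auto"];
    if (ctx) args.push("-C", ctx.cwd); // explicit working directory
    args.push(input.prompt);
    return args;
  },
};
```

Keeping buildArgs() as a function that returns an argv array, rather than a command string, is what lets the daemon pass the result straight to spawn() without shell quoting.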
Detection Flow
The detection entry point is detectAgents().
Flow:
- Iterate over AGENT_DEFS.
- Use resolveOnPath() to locate the CLI binary in PATH.
- After locating it, run versionArgs to get the version.
- Generate the model list through listModels, fetchModels, or fallbackModels, depending on runtime capabilities.
- Return the result to the frontend and refresh the runtime's model validation cache.
The detection result includes:
- available: whether the CLI is available.
- path: the actual binary path.
- version: version string.
- models: model list used by the frontend model menu.
- reasoningOptions: reasoning effort menu.
- streamFormat: output format hint.
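The detection flow and result shape can be sketched as follows. This is a simplified, synchronous illustration with injected probes so that it runs without touching the file system; the real detectAgents() presumably runs subprocesses asynchronously. The `DetectProbes` interface, the `MiniDef` type, and the choice to report fallback models for unavailable CLIs are all assumptions.

```typescript
// Reduced runtime definition with only the fields this sketch needs.
interface MiniDef {
  id: string;
  bin: string;
  versionArgs: string[];
  fallbackModels: string[];
  streamFormat: string;
}

// Hypothetical injected probes standing in for PATH lookup and subprocess calls.
interface DetectProbes {
  resolveOnPath(bin: string): string | null;
  runVersion(path: string, args: string[]): string;
  discoverModels(def: MiniDef): string[] | null; // null: unsupported or failed
}

interface DetectionResult {
  id: string;
  available: boolean;
  path?: string;
  version?: string;
  models: string[];
  streamFormat: string;
}

function detectAgentsSketch(defs: MiniDef[], probes: DetectProbes): DetectionResult[] {
  return defs.map((def) => {
    const path = probes.resolveOnPath(def.bin);
    if (path === null) {
      // Assumption: unavailable runtimes still report their static fallback list.
      return {
        id: def.id,
        available: false,
        models: def.fallbackModels,
        streamFormat: def.streamFormat,
      };
    }
    const discovered = probes.discoverModels(def);
    return {
      id: def.id,
      available: true,
      path,
      version: probes.runVersion(path, def.versionArgs).trim(),
      // Fall back to the static list when discovery yields nothing.
      models: discovered !== null && discovered.length > 0 ? discovered : def.fallbackModels,
      streamFormat: def.streamFormat,
    };
  });
}
```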
Runtime Flow
Actual execution happens in POST /api/chat in apps/daemon/src/server.ts.
Flow:
- The frontend submits agentId, user message, system prompt, project ID, attachments, model, and reasoning options.
- The daemon uses getAgentDef(agentId) to find the runtime definition.
- The daemon creates or locates .od/projects/<projectId>/ as the agent working directory.
- The daemon validates uploaded image paths and project attachment paths.
- The daemon combines the system prompt, working directory hint, existing file list, attachment list, and user request into one prompt.
- The daemon prepares additional readable directories: skills/ and design-systems/.
- The daemon validates the model and reasoning option.
- It calls def.buildArgs(...) to generate CLI arguments; currently it also passes runtimeContext = { cwd } for CLIs that need an explicit workspace argument.
- It starts the local runtime with spawn(def.bin, args, { cwd }); plain / Claude use read-only stdin, and ACP runtimes use writable stdin.
- The daemon forwards runtime output to the frontend through SSE.
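The final forwarding step relies on the Server-Sent Events wire format, where each message is an `event:` line, a `data:` line, and a blank line. A minimal sketch of the framing, assuming JSON payloads; the function name `formatSseEvent` and the payload shape are hypothetical and not taken from server.ts:

```typescript
// Frame one SSE message. JSON.stringify never emits raw newlines,
// so the payload always fits on a single data: line.
function formatSseEvent(event: string, data: Record<string, unknown>): string {
  return "event: " + event + "\n" + "data: " + JSON.stringify(data) + "\n\n";
}

// Example: a structured agent event (payload shape is an assumption).
const frame = formatSseEvent("agent", { type: "text_delta", text: "hi" });
```

On the browser side, an EventSource-style consumer would then dispatch on the event name and parse the data line as JSON.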
Output Stream Handling
There are currently four output formats:
Claude Code: Structured JSONL
Claude Code uses:
claude -p <prompt> --output-format stream-json --verbose --include-partial-messages
The daemon parses stdout through createClaudeStreamHandler() and converts Claude Code JSONL events into UI events:
- status
- text_delta
- thinking_delta
- thinking_start
- tool_use
- tool_result
- usage
These events are sent to the frontend through the SSE agent event.
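Parsing JSONL from a subprocess requires buffering, because stdout chunks do not align with line boundaries. The following is a minimal sketch of that buffering, not the actual createClaudeStreamHandler() implementation. It assumes chunks have already been decoded to strings, and the name `createJsonlSplitter` is hypothetical.

```typescript
// Accumulate chunks, emit one parsed object per complete line.
function createJsonlSplitter(onEvent: (event: unknown) => void) {
  let buffer = "";

  function emitLine(raw: string): void {
    const line = raw.trim();
    if (line === "") return;
    let parsed: unknown;
    try {
      parsed = JSON.parse(line);
    } catch {
      return; // skip lines that are not valid JSON
    }
    onEvent(parsed);
  }

  return {
    // Feed one decoded stdout chunk.
    push(chunk: string): void {
      buffer += chunk;
      let newline = buffer.indexOf("\n");
      while (newline !== -1) {
        emitLine(buffer.slice(0, newline));
        buffer = buffer.slice(newline + 1);
        newline = buffer.indexOf("\n");
      }
    },
    // Call when the process exits to handle a final unterminated line.
    flush(): void {
      emitLine(buffer);
      buffer = "";
    },
  };
}
```

Parsing happens outside the callback invocation so that an exception thrown by the event handler is not mistaken for a malformed line.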
Codex / Gemini / OpenCode / Cursor Agent: Structured JSON Event Stream
These four runtimes currently use the unified json-event-stream output format, with stdout parsed by apps/daemon/src/json-event-stream.ts.
Codex
Codex currently uses:
codex exec --json --skip-git-repo-check --full-auto -C <cwd> <prompt>
The current integration uses the lightweight structured path through exec --json. Compared with the original plain-text codex exec, this path adds:
- --json: structured event output
- --skip-git-repo-check: allows running in a temporary working directory
- --full-auto: non-interactive automatic execution
- -C <cwd>: explicit working directory
The daemon currently maps:
- thread.started → status(initializing)
- turn.started → status(running)
- item.completed(agent_message) → text_delta
- turn.completed.usage → usage
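The mapping above can be expressed as a small pure function. This sketch assumes each Codex event carries a `type` field, that completed items expose `item.type` and `item.text`, and that the UI events use `label`, `text`, and `usage` fields; those field names are assumptions and may differ from what json-event-stream.ts handles.

```typescript
// Assumed shape of the unified UI events.
type UiEvent =
  | { type: "status"; label: string }
  | { type: "text_delta"; text: string }
  | { type: "usage"; usage: unknown };

// Assumed shape of a Codex JSON event.
interface CodexEvent {
  type: string;
  item?: { type?: string; text?: string };
  usage?: unknown;
}

function mapCodexEvent(ev: CodexEvent): UiEvent[] {
  switch (ev.type) {
    case "thread.started":
      return [{ type: "status", label: "initializing" }];
    case "turn.started":
      return [{ type: "status", label: "running" }];
    case "item.completed": {
      const item = ev.item;
      // Only agent messages become visible text; other item types are dropped.
      if (item && item.type === "agent_message" && typeof item.text === "string") {
        return [{ type: "text_delta", text: item.text }];
      }
      return [];
    }
    case "turn.completed":
      return ev.usage !== undefined ? [{ type: "usage", usage: ev.usage }] : [];
    default:
      return []; // unknown events are ignored
  }
}
```

Returning an array keeps the call site uniform: one input event may produce zero, one, or several UI events.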
Gemini
Gemini currently uses:
GEMINI_CLI_TRUST_WORKSPACE=true gemini --output-format stream-json --yolo
The daemon delivers the prompt over stdin rather than argv. It currently maps:
- init → status(initializing)
- message(role=assistant) → text_delta
- result.stats → usage
Gemini may still output some workspace scan warnings on stderr at runtime; the main flow remains unaffected.
OpenCode
OpenCode currently uses:
opencode run --format json --dangerously-skip-permissions <prompt>
When the user selects a model, --model <id> is appended.
The daemon currently maps:
- step_start → status(running)
- text → text_delta
- tool_use → tool_use
- Completed tool_use.state → tool_result
- step_finish.part.tokens → usage
Cursor Agent
Cursor Agent currently uses:
cursor-agent --print --output-format stream-json --stream-partial-output --force --trust --workspace <cwd> -p <prompt>
When the user selects a model, --model <id> is appended.
The daemon currently maps:
- system(subtype=init) → status(initializing)
- assistant partial chunks with timestamp_ms → text_delta
- result.usage → usage
Cursor outputs both partial assistant chunks and the final aggregated assistant message. The daemon currently prioritizes partial chunks and ignores the final aggregated text after partial chunks have appeared, avoiding duplicate rendering.
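The deduplication rule can be sketched as a small stateful filter. It assumes that partial chunks are distinguished by the presence of `timestamp_ms` and, for brevity, that the text is available as a flat `text` field; the real event nesting and the name `createCursorTextFilter` are assumptions.

```typescript
// Simplified assistant event; the real payload is likely more nested.
interface CursorAssistantEvent {
  type: "assistant";
  text: string;
  timestamp_ms?: number; // present on partial chunks only (assumed)
}

// Returns the text to forward, or null when the event should be suppressed.
function createCursorTextFilter() {
  let sawPartial = false;
  return function filter(ev: CursorAssistantEvent): string | null {
    if (ev.timestamp_ms !== undefined) {
      sawPartial = true;
      return ev.text; // partial chunk: always forward
    }
    // Aggregated message: forward only if no partials were streamed.
    return sawPartial ? null : ev.text;
  };
}
```

Falling back to the aggregated message when no partial chunk ever arrived keeps output visible even if partial streaming is unavailable.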
Qwen: Plain Text Pass-through
Qwen currently still uses the plain output format.
The daemon directly forwards stdout chunks to the frontend through the SSE stdout event, and stderr chunks through the stderr event.
Hermes / Kimi: ACP JSON-RPC
Hermes uses:
hermes acp --accept-hooks
Kimi uses:
kimi acp
The daemon starts an ACP session over stdio through apps/daemon/src/acp.ts:
- initialize
- session/new
- Optional session/set_model
- session/prompt
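The call sequence above can be sketched with a tiny JSON-RPC 2.0 request builder. The method names come from the list; the newline-delimited framing and the helper names `createRpcWriter` and `planAcpSession` are assumptions about how acp.ts talks to the subprocess, and request parameters are left empty because this document does not specify them.

```typescript
// Build JSON-RPC 2.0 requests with monotonically increasing ids.
function createRpcWriter() {
  let nextId = 1;
  return {
    // Returns one serialized request, assuming newline-delimited framing on stdin.
    request(method: string, params: Record<string, unknown>): string {
      const message = { jsonrpc: "2.0", id: nextId++, method, params };
      return JSON.stringify(message) + "\n";
    },
  };
}

// Order of methods for one session; set_model is sent only when a model was chosen.
function planAcpSession(model?: string): string[] {
  const steps = ["initialize", "session/new"];
  if (model !== undefined && model !== "") steps.push("session/set_model");
  steps.push("session/prompt");
  return steps;
}

// Example: serialize the plan for a session with an explicit (placeholder) model.
const writer = createRpcWriter();
const frames = planAcpSession("some-model").map((method) => writer.request(method, {}));
```

Because each request carries its own id, responses can be matched to the stage that issued them, which is also what makes per-stage timeouts practical.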
When an ACP runtime sends a session/request_permission request, the daemon prefers the approve_for_session option. This enables headless automatic approval for CLIs such as Kimi that require approval before tool calls.
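A sketch of that preference, assuming the permission request carries a list of options with an `optionId` and an optional `kind`. The option shape, the fallback to any option whose kind begins with "allow", and the name `choosePermissionOption` are assumptions for illustration; only the approve_for_session preference comes from this document.

```typescript
// Assumed shape of one permission option offered by the runtime.
interface PermissionOption {
  optionId: string;
  kind?: string;
}

// Pick the option id to answer with, or null when nothing can be auto-approved.
function choosePermissionOption(options: PermissionOption[]): string | null {
  // First pass: the preferred session-wide approval.
  for (let i = 0; i < options.length; i++) {
    if (options[i].optionId === "approve_for_session") return options[i].optionId;
  }
  // Second pass (assumed fallback): any option that grants permission.
  for (let i = 0; i < options.length; i++) {
    const kind = options[i].kind ?? "";
    if (kind.indexOf("allow") === 0) return options[i].optionId;
  }
  return null;
}
```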
The session/new response returns sessionId, models.availableModels, and models.currentModelId. The daemon reuses this information for model detection and runtime status reporting.
It then converts Hermes / Kimi session/update events into frontend-consumable agent events:
- agent_thought_chunk → thinking_start / thinking_delta
- agent_message_chunk → text_delta
- Final usage from session/prompt → usage
At runtime, two additional status events are added:
- Emit status(model) after session/new returns the default model.
- Emit status(streaming) when the first text token arrives, including ttftMs.
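The time-to-first-token measurement behind status(streaming) can be sketched as a one-shot tracker. The clock is injected to keep the example deterministic. The starting point of the measurement (tracker creation, for example when the prompt is sent), the event shape, and the name `createTtftTracker` are assumptions.

```typescript
// Assumed shape of the streaming status event.
interface StreamingStatus {
  type: "status";
  label: "streaming";
  ttftMs: number;
}

// Emits exactly one streaming status, on the first text delta.
function createTtftTracker(now: () => number) {
  const startedAt = now(); // assumed start: when the tracker is created
  let emitted = false;
  return {
    onTextDelta(): StreamingStatus | null {
      if (emitted) return null; // later tokens do not re-emit
      emitted = true;
      return { type: "status", label: "streaming", ttftMs: now() - startedAt };
    },
  };
}
```

In the daemon, `now` would typically be something like `() => Date.now()`.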
Model detection also reuses ACP: during detection, the daemon reads models.availableModels and models.currentModelId from the session/new response.
The current Kimi MVP integration directly reuses the Hermes ACP orchestrator. Automatic permission approval has been added to the shared ACP layer. multica also contains Kimi-specific tool title normalization and provider error sniffing; this repository currently keeps a lighter implementation.
Prompt Injection Approach
Local CLIs currently use a unified approach of folding the system prompt into the user message.
The reason is that the command-line entry points of most local code-agent CLIs have no separate channel for a system prompt. The daemon therefore composes the following content into a single input:
- systemPrompt: base output contract + skill content + design system content.
- cwdHint: current working directory and file writing rules.
- filesListBlock: existing file list in the project directory.
- attachmentHint: attachments uploaded or selected by the user.
- message: original user request.
- safeImages: temporary uploaded image paths appended in @path form.
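A minimal sketch of that composition. The part names match the list above; the ordering, the blank-line separator, the skipping of empty sections, and the function name `composePrompt` are assumptions rather than the exact behavior of server.ts.

```typescript
interface PromptParts {
  systemPrompt: string;
  cwdHint: string;
  filesListBlock: string;
  attachmentHint: string;
  message: string;
  safeImages: string[]; // already validated upload paths
}

function composePrompt(parts: PromptParts): string {
  // Keep only sections that actually contain text.
  const sections = [
    parts.systemPrompt,
    parts.cwdHint,
    parts.filesListBlock,
    parts.attachmentHint,
    parts.message,
  ].filter((section) => section.trim() !== "");

  // Image paths are appended in @path form after the user message.
  const imageRefs = parts.safeImages.map((imagePath) => "@" + imagePath);

  return sections.concat(imageRefs).join("\n\n");
}
```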
Claude Code additionally exposes skills/ and design-systems/ through --add-dir, making it easier for the agent to read skill seeds, templates, and design system files.
Safety and Validation
Existing protections include:
- Process startup uses spawn() argument arrays, avoiding shell string concatenation.
- Model IDs are first compared with the model list exposed by the most recent /api/agents response.
- Custom model IDs are validated by sanitizeCustomModel(), limiting length, character set, and starting character.
- Reasoning options must exist in the runtime definition's reasoningOptions.
- Image paths must be located inside the daemon temporary upload directory.
- Attachment paths must be located inside the project working directory.
- Agent working directories are constrained to .od/projects/<projectId>/.
- ACP runtimes have timeout protection for the initialize, session/new, session/set_model, and session/prompt stages.
- ACP runtimes listen for stdin errors and proactively clean up detection processes after model detection completes.
- When the SSE connection closes, the daemon sends SIGTERM to the subprocess.
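The two-stage model check (known list first, then custom ID sanitization) can be sketched as below. The length limit, the permitted character set, and the function names are assumptions; the actual rules in sanitizeCustomModel() may differ. Requiring an alphanumeric first character is one plausible way to keep a model ID such as `--help` from being read as a CLI flag.

```typescript
const MAX_MODEL_ID_LENGTH = 128; // assumed limit
// Assumed character set: must start with a letter or digit.
const MODEL_ID_PATTERN = /^[A-Za-z0-9][A-Za-z0-9._:\/-]*$/;

// Returns the cleaned id, or null when it must be rejected.
function sanitizeCustomModelSketch(raw: string): string | null {
  const id = raw.trim();
  if (id.length === 0 || id.length > MAX_MODEL_ID_LENGTH) return null;
  return MODEL_ID_PATTERN.test(id) ? id : null;
}

// Accept ids from the most recent detection result as-is; sanitize anything else.
function validateModelSketch(requested: string, knownModels: string[]): string | null {
  if (knownModels.indexOf(requested) !== -1) return requested;
  return sanitizeCustomModelSketch(requested);
}
```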
Current Capability Boundaries
The current runtime adapter is a lightweight adaptation layer that already covers discovery, startup, argument construction, model selection, and streaming forwarding.
Main boundaries:
- The adapter is still a declarative object array and has not yet been split into independent adapter classes or directories.
- The capability model is thin and currently mainly exposes models, reasoning, and output format.
- Claude Code, Codex, Gemini, OpenCode, Cursor Agent, Hermes, and Kimi already have structured event parsing.
- Qwen currently still uses plain text pass-through.
- Skill injection mainly relies on prompt composition; only Claude Code uses --add-dir to support reading external directories.
- Hermes currently only integrates the core ACP text session path and has not mapped more session/update types into unified UI events.
- Cancellation is triggered by HTTP connection closure and SIGTERM; there is no explicit runId / cancel API yet.
- Resume, auth state, permission modes, and capability gating have not yet formed a unified interface.
- API fallback belongs to the frontend provider path and is currently outside the daemon runtime adapter layer.
Gap from the Target Architecture
docs/agent-adapters.md describes a more complete target shape: each agent adapter has interfaces such as detect(), capabilities(), run(), cancel(), and resume(), and outputs unified AgentEvents.
The current implementation already has the core outline of the target architecture:
- detectAgents() corresponds to detect().
- AGENT_DEFS corresponds to the adapter registry.
- buildArgs() corresponds to runtime-specific invocation.
- streamFormat + claude-stream.ts + json-event-stream.ts + acp.ts correspond to stream normalization.
- /api/chat corresponds to unified run orchestration.