Zakaria/open-design

Zakaria a46764fb1b

first-commit

2026-05-04 14:58:14 -04:00

Runtime Adapter Current State

Purpose

The Runtime Adapter is the daemon layer that adapts local AI agent CLIs. It converts Open Design's unified generation requests into the command-line invocation each CLI expects, and converts CLI output into streaming events that the frontend can consume.

The current implementation is concentrated in:

apps/daemon/src/agents.ts: agent definitions, detection, model lists, argument construction, model validation.
apps/daemon/src/server.ts: /api/chat request orchestration, prompt composition, spawn() subprocesses, SSE forwarding.
apps/daemon/src/claude-stream.ts: parsing Claude Code structured JSONL output.
apps/daemon/src/json-event-stream.ts: parsing structured JSON/JSONL output from Codex, Gemini, OpenCode, and Cursor Agent.
apps/daemon/src/acp.ts: model detection and streaming session orchestration for the ACP JSON-RPC runtime.

Currently Supported Runtimes

AGENT_DEFS in apps/daemon/src/agents.ts defines 8 local runtimes:

id	Name	CLI	Output format	Model list source
`claude`	Claude Code	`claude`	`claude-stream-json`	Static fallback
`codex`	Codex CLI	`codex`	`json-event-stream`	Static fallback
`gemini`	Gemini CLI	`gemini`	`json-event-stream`	Static fallback
`opencode`	OpenCode	`opencode`	`json-event-stream`	`opencode models` + fallback
`hermes`	Hermes	`hermes`	`acp-json-rpc`	`session/new` from `hermes acp` + fallback
`kimi`	Kimi CLI	`kimi`	`acp-json-rpc`	`session/new` from `kimi acp` + fallback
`cursor-agent`	Cursor Agent	`cursor-agent`	`json-event-stream`	`cursor-agent models` + fallback
`qwen`	Qwen Code	`qwen`	`plain`	Static fallback

Each runtime definition contains:

id / name / bin: used for frontend display and process startup.
versionArgs: used to detect the version.
fallbackModels: static fallback options for the model selector.
listModels: optional model discovery command.
fetchModels: optional custom model detection logic, suitable for runtimes such as ACP that require a handshake before the model list is available.
reasoningOptions: optional reasoning effort options, currently used by Codex.
buildArgs(): converts the unified input into the CLI's argv. It also receives runtimeContext, which is currently used to pass execution context such as cwd explicitly.
streamFormat: tells the daemon how to interpret stdout.
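
The field list above can be read as a TypeScript shape. The following is a minimal sketch, not the actual contents of apps/daemon/src/agents.ts: the type names (`AgentDefSketch`, `ModelOption`, `BuildArgsInput`, `RuntimeContext`), the field types, and the placeholder values for versionArgs, fallbackModels, and reasoningOptions are assumptions made for illustration. Only the field names and the Codex argv come from this document.

```typescript
// Hypothetical shapes for illustration only; the real definitions live in
// apps/daemon/src/agents.ts and may differ.
type StreamFormat = "claude-stream-json" | "json-event-stream" | "acp-json-rpc" | "plain";

interface ModelOption {
  id: string;
  label: string;
}

interface BuildArgsInput {
  prompt: string;
  model?: string;
  reasoning?: string;
}

interface RuntimeContext {
  cwd: string;
}

interface AgentDefSketch {
  id: string;
  name: string;
  bin: string;
  versionArgs: string[];
  fallbackModels: ModelOption[];
  listModels?: string[]; // argv of an optional discovery command (assumed type)
  fetchModels?: () => Promise<ModelOption[]>; // custom discovery, e.g. an ACP handshake
  reasoningOptions?: string[];
  buildArgs(input: BuildArgsInput, runtimeContext: RuntimeContext): string[];
  streamFormat: StreamFormat;
}

// Example entry shaped after the Codex row of the table.
const codexDefSketch: AgentDefSketch = {
  id: "codex",
  name: "Codex CLI",
  bin: "codex",
  versionArgs: ["--version"], // assumed
  fallbackModels: [{ id: "default", label: "Default" }], // placeholder
  reasoningOptions: ["low", "medium", "high"], // placeholder values
  // argv follows the Codex invocation documented in the Codex section.
  buildArgs: (input, ctx) => [
    "exec",
    "--json",
    "--skip-git-repo-check",
    "--full-auto",
    "-C",
    ctx.cwd,
    input.prompt,
  ],
  streamFormat: "json-event-stream",
};
```

Keeping buildArgs() as a function that returns an argv array, rather than a command string, fits the spawn()-based startup described under Safety and Validation.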

Detection Flow

The detection entry point is detectAgents().

Flow:

1. Iterate over AGENT_DEFS.
2. Use resolveOnPath() to locate the CLI binary in PATH.
3. Once the binary is found, run it with versionArgs to get the version.
4. Generate the model list through listModels, fetchModels, or fallbackModels, depending on runtime capabilities.
5. Return the result to the frontend and refresh the runtime's model validation cache.

The detection result includes:

available: whether the CLI is available.
path: the actual binary path.
version: version string.
models: model list used by the frontend model menu.
reasoningOptions: reasoning effort menu.
streamFormat: output format hint.
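
The detection flow and result shape above can be sketched as a pure function. This is an illustrative simplification, not the real detectAgents(): the probe interface, the synchronous style, the newline-separated model output, and the choice to return fallback models for unavailable CLIs are all assumptions, and fetchModels is left out for brevity.

```typescript
// Hypothetical, simplified definition: only the fields detection needs.
interface DetectableDef {
  id: string;
  bin: string;
  versionArgs: string[];
  fallbackModels: string[];
  listModels?: string[];
}

// Injected side effects, so the flow can be exercised without real CLIs.
interface DetectionProbe {
  resolveOnPath(bin: string): string | null;
  run(binPath: string, args: string[]): string | null; // stdout, or null on failure
}

interface DetectedAgent {
  id: string;
  available: boolean;
  path: string | null;
  version: string | null;
  models: string[];
}

function detectAgentsSketch(defs: DetectableDef[], probe: DetectionProbe): DetectedAgent[] {
  return defs.map((def): DetectedAgent => {
    const binPath = probe.resolveOnPath(def.bin);
    if (binPath === null) {
      // CLI not installed: report it as unavailable (fallback models are an assumption).
      return { id: def.id, available: false, path: null, version: null, models: def.fallbackModels };
    }
    const version = probe.run(binPath, def.versionArgs)?.trim() ?? null;

    // Prefer discovered models; fall back to the static list when discovery yields nothing.
    let models = def.fallbackModels;
    if (def.listModels) {
      const out = probe.run(binPath, def.listModels) ?? "";
      const discovered = out
        .split("\n")
        .map((line) => line.trim())
        .filter((line) => line.length > 0);
      if (discovered.length > 0) models = discovered;
    }
    return { id: def.id, available: true, path: binPath, version, models };
  });
}
```

Injecting the probe keeps the ordering of the steps testable in isolation; the real daemon presumably performs these steps asynchronously.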

Runtime Flow

Actual execution happens in POST /api/chat in apps/daemon/src/server.ts.

Flow:

1. The frontend submits agentId, user message, system prompt, project ID, attachments, model, and reasoning options.
2. The daemon uses getAgentDef(agentId) to find the runtime definition.
3. The daemon creates or locates .od/projects/<projectId>/ as the agent working directory.
4. The daemon validates uploaded image paths and project attachment paths.
5. The daemon combines the system prompt, working directory hint, existing file list, attachment list, and user request into one prompt.
6. The daemon prepares additional readable directories: skills/ and design-systems/.
7. The daemon validates the model and reasoning option.
8. The daemon calls def.buildArgs(...) to generate CLI arguments; it currently also passes runtimeContext = { cwd } for CLIs that need an explicit workspace argument.
9. The daemon starts the local runtime with spawn(def.bin, args, { cwd }); plain and Claude runtimes use read-only stdin, and ACP runtimes use writable stdin.
10. The daemon forwards runtime output to the frontend through SSE.
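
The final forwarding step relies on the Server-Sent Events wire format, where each message is an `event:` line, a `data:` line, and a blank line. A minimal sketch of a frame formatter follows; the function name `formatSseFrame` and the JSON payload encoding are assumptions, while the event names agent, stdout, and stderr are the ones this document mentions.

```typescript
// Serialize one SSE message. JSON.stringify never emits raw newlines,
// so the payload always fits on a single "data:" line.
function formatSseFrame(eventName: string, payload: unknown): string {
  const data = JSON.stringify(payload ?? null);
  return `event: ${eventName}\ndata: ${data}\n\n`;
}

// Example frames: a normalized agent event and a raw stdout chunk.
// The payload field names here are illustrative, not the daemon's actual schema.
const agentFrameExample = formatSseFrame("agent", { type: "text_delta", text: "Hello" });
const stdoutFrameExample = formatSseFrame("stdout", { chunk: "plain output\n" });
```

Each returned string would be written to the HTTP response as-is; the trailing blank line is what tells the browser's EventSource that the message is complete.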

Output Stream Handling

There are currently four output formats: claude-stream-json, json-event-stream, plain, and acp-json-rpc.

Claude Code: Structured JSONL

Claude Code uses:

claude -p <prompt> --output-format stream-json --verbose --include-partial-messages

The daemon parses stdout through createClaudeStreamHandler() and converts Claude Code JSONL events into UI events:

status
text_delta
thinking_delta
thinking_start
tool_use
tool_result
usage

These events are sent to the frontend through the SSE agent event.
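
Parsing JSONL from a subprocess requires buffering, because stdout chunks do not align with line boundaries. The sketch below shows that one technique in isolation. It is not createClaudeStreamHandler(); the name `createJsonlBuffer` and the decision to skip unparseable lines silently are assumptions.

```typescript
// Accumulates stdout chunks and emits one parsed object per complete line.
function createJsonlBuffer(onObject: (value: unknown) => void): {
  push(chunk: string): void;
  flush(): void;
} {
  let pending = "";

  const emitLine = (line: string): void => {
    const trimmed = line.trim();
    if (trimmed.length === 0) return;
    let parsed: unknown;
    try {
      parsed = JSON.parse(trimmed);
    } catch {
      return; // not JSON (for example a stray log line): skip it
    }
    onObject(parsed);
  };

  return {
    push(chunk: string): void {
      pending += chunk;
      const lines = pending.split("\n");
      // The last element is an incomplete line (or ""); keep it for the next chunk.
      pending = lines.pop() ?? "";
      for (const line of lines) emitLine(line);
    },
    flush(): void {
      // Call when the process exits, in case the last line had no trailing newline.
      const rest = pending;
      pending = "";
      emitLine(rest);
    },
  };
}
```

A stream handler would call push() from the stdout data callback and flush() on process close, then translate each parsed object into the UI events listed above.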

Codex / Gemini / OpenCode / Cursor Agent: Structured JSON Event Stream

These four runtimes currently use the unified json-event-stream output format, with stdout parsed by apps/daemon/src/json-event-stream.ts.

Codex

Codex currently uses:

codex exec --json --skip-git-repo-check --full-auto -C <cwd> <prompt>

The current integration uses the lightweight structured path through exec --json. Compared with the original plain-text codex exec, this path adds:

--json: structured event output
--skip-git-repo-check: allows running in a temporary working directory
--full-auto: non-interactive automatic execution
-C <cwd>: explicit working directory

The daemon currently maps:

thread.started → status(initializing)
turn.started → status(running)
item.completed(agent_message) → text_delta
turn.completed.usage → usage
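
The Codex mapping above can be sketched as a single function over parsed JSON lines. The event type names come from the list above; the nested field names (`item.type`, `item.text`, `usage`) and the UI event shape (`CodexUiEvent`) are assumptions for illustration and may not match json-event-stream.ts.

```typescript
// Hypothetical normalized event shape for this sketch.
type CodexUiEvent =
  | { type: "status"; label: string }
  | { type: "text_delta"; text: string }
  | { type: "usage"; usage: Record<string, unknown> };

function isJsonObject(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Returns the UI event for one Codex JSON line, or null when the line is not mapped.
function mapCodexEvent(raw: unknown): CodexUiEvent | null {
  if (!isJsonObject(raw)) return null;
  switch (raw["type"]) {
    case "thread.started":
      return { type: "status", label: "initializing" };
    case "turn.started":
      return { type: "status", label: "running" };
    case "item.completed": {
      const item = raw["item"];
      if (!isJsonObject(item) || item["type"] !== "agent_message") return null;
      const text = item["text"]; // assumed field name
      return typeof text === "string" ? { type: "text_delta", text } : null;
    }
    case "turn.completed": {
      const usage = raw["usage"];
      return isJsonObject(usage) ? { type: "usage", usage } : null;
    }
    default:
      return null;
  }
}
```

Returning null for unmapped events lets the caller ignore item types that the list above does not cover without failing the stream.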

Gemini

Gemini currently uses:

GEMINI_CLI_TRUST_WORKSPACE=true gemini --output-format stream-json --yolo

The daemon delivers the prompt over stdin rather than argv. It currently maps:

init → status(initializing)
message(role=assistant) → text_delta
result.stats → usage

Gemini may still print workspace scan warnings to stderr at runtime; these do not affect the main flow.
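
The Gemini invocation differs from the others in two ways: a trust flag is passed through the environment, and the prompt travels over stdin instead of argv. A minimal sketch of how that launch could be described as data follows; `SpawnPlan` and `planGeminiSpawn` are hypothetical names, and the real daemon may structure this differently.

```typescript
// Hypothetical description of a subprocess launch, kept free of side effects.
interface SpawnPlan {
  bin: string;
  args: string[];
  env: Record<string, string>;
  stdinPayload: string | null; // text to write to stdin before closing it
}

function planGeminiSpawn(prompt: string, baseEnv: Record<string, string>): SpawnPlan {
  return {
    bin: "gemini",
    // The prompt is deliberately absent from argv.
    args: ["--output-format", "stream-json", "--yolo"],
    // Inherit the caller's environment and add the workspace trust flag.
    env: { ...baseEnv, GEMINI_CLI_TRUST_WORKSPACE: "true" },
    stdinPayload: prompt,
  };
}
```

A caller would pass `plan.args` and `plan.env` to spawn(), write `plan.stdinPayload` to the child's stdin, and then end the stream so the CLI knows the prompt is complete.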

OpenCode

OpenCode currently uses:

opencode run --format json --dangerously-skip-permissions <prompt>

When the user selects a model, --model <id> is appended.

The daemon currently maps:

step_start → status(running)
text → text_delta
tool_use → tool_use
Completed tool_use.state → tool_result
step_finish.part.tokens → usage

Cursor Agent

Cursor Agent currently uses:

cursor-agent --print --output-format stream-json --stream-partial-output --force --trust --workspace <cwd> -p <prompt>

When the user selects a model, --model <id> is appended.

The daemon currently maps:

system(subtype=init) → status(initializing)
assistant partial chunks with timestamp_ms → text_delta
result.usage → usage

Cursor outputs both partial assistant chunks and the final aggregated assistant message. The daemon currently prioritizes partial chunks and ignores the final aggregated text after partial chunks have appeared, avoiding duplicate rendering.
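
The de-duplication rule described above can be sketched as a small stateful filter. The presence of timestamp_ms as the marker for partial chunks comes from the mapping list; the flattened `text` field and the names `CursorAssistantEvent` and `createCursorTextFilter` are assumptions, since the real events carry their text in a nested structure.

```typescript
// Simplified view of a Cursor "assistant" event for this sketch.
interface CursorAssistantEvent {
  type: "assistant";
  text: string; // assumed: text already extracted from the message content
  timestamp_ms?: number; // present on partial chunks only
}

// Returns the text to forward as a text_delta, or null when the event should be dropped.
function createCursorTextFilter(): (ev: CursorAssistantEvent) => string | null {
  let sawPartial = false;
  return (ev: CursorAssistantEvent): string | null => {
    if (ev.timestamp_ms !== undefined) {
      sawPartial = true;
      return ev.text;
    }
    // No timestamp: this is the final aggregated message.
    // Forward it only if no partial chunk was streamed before it.
    return sawPartial ? null : ev.text;
  };
}
```

One filter instance would be created per run, so that state from one response does not suppress the final message of the next.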

Qwen: Plain Text Pass-through

Qwen currently still uses the plain output format.

The daemon directly forwards stdout chunks to the frontend through the SSE stdout event, and stderr chunks through the stderr event.

Hermes / Kimi: ACP JSON-RPC

Hermes uses:

hermes acp --accept-hooks

Kimi uses:

kimi acp

The daemon starts an ACP session over stdio through apps/daemon/src/acp.ts:

1. initialize
2. session/new
3. Optional session/set_model
4. session/prompt

When an ACP runtime emits session/request_permission, the daemon prefers approve_for_session. This enables headless automatic approval for CLIs such as Kimi that require approval before tool calls.

The session/new response returns sessionId, models.availableModels, and models.currentModelId. The daemon reuses this information for model detection and runtime status reporting.
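
The session setup above is a sequence of JSON-RPC 2.0 requests written to the runtime's stdin. The sketch below shows the request ordering and framing only. It assumes newline-delimited JSON messages and uses hypothetical helper names (`createAcpRequestWriter`, `acpMethodSequence`); the params each method requires are not shown here and should be taken from the ACP specification or apps/daemon/src/acp.ts.

```typescript
interface JsonRpcRequestSketch {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params: Record<string, unknown>;
}

// Returns a function that serializes requests with monotonically increasing ids.
function createAcpRequestWriter(): (method: string, params: Record<string, unknown>) => string {
  let nextId = 1;
  return (method: string, params: Record<string, unknown>): string => {
    const request: JsonRpcRequestSketch = { jsonrpc: "2.0", id: nextId, method, params };
    nextId += 1;
    // One JSON document per line (assumed framing).
    return JSON.stringify(request) + "\n";
  };
}

// Ordered method names for one run; session/set_model is sent only when a model was chosen.
function acpMethodSequence(selectedModel?: string): string[] {
  const methods = ["initialize", "session/new"];
  if (selectedModel !== undefined) methods.push("session/set_model");
  methods.push("session/prompt");
  return methods;
}
```

Because sessionId is only known after the session/new response arrives, each request has to wait for the previous response, matched by `id`, before the next one is written.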

The daemon then converts Hermes / Kimi session/update events into agent events that the frontend can consume:

agent_thought_chunk → thinking_start / thinking_delta
agent_message_chunk → text_delta
Final usage from session/prompt → usage

The daemon also emits two additional status events at runtime:

status(model), after session/new returns the default model.
status(streaming), when the first text token arrives, including ttftMs.
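
The session/update mapping and the status(streaming) event can be sketched as a stateful mapper. The update kinds and target event names come from the lists above. The flattened update shape, the injected clock, measuring ttftMs from the moment the prompt was sent, and emitting the status event before the first text_delta are assumptions of this sketch.

```typescript
type AcpUiEvent =
  | { type: "thinking_start" }
  | { type: "thinking_delta"; text: string }
  | { type: "text_delta"; text: string }
  | { type: "status"; label: "streaming"; ttftMs: number };

// Simplified update: the real payload nests its text inside a content object.
interface AcpUpdateSketch {
  sessionUpdate: "agent_thought_chunk" | "agent_message_chunk";
  text: string;
}

function createAcpUpdateMapper(
  promptSentAtMs: number,
  nowMs: () => number,
): (update: AcpUpdateSketch) => AcpUiEvent[] {
  let thinkingOpen = false;
  let firstTextSeen = false;

  return (update: AcpUpdateSketch): AcpUiEvent[] => {
    const out: AcpUiEvent[] = [];
    if (update.sessionUpdate === "agent_thought_chunk") {
      // Open a thinking block on the first thought chunk of a run of thoughts.
      if (!thinkingOpen) {
        thinkingOpen = true;
        out.push({ type: "thinking_start" });
      }
      out.push({ type: "thinking_delta", text: update.text });
      return out;
    }
    // A message chunk ends the current thinking block (assumption).
    thinkingOpen = false;
    if (!firstTextSeen) {
      firstTextSeen = true;
      out.push({ type: "status", label: "streaming", ttftMs: nowMs() - promptSentAtMs });
    }
    out.push({ type: "text_delta", text: update.text });
    return out;
  };
}
```

Passing the clock in as a function keeps the time-to-first-token calculation deterministic under test; production code would pass `Date.now`.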

Model detection also reuses ACP: during detection, the daemon reads models.availableModels and models.currentModelId from the session/new response.

The current Kimi MVP integration directly reuses the Hermes ACP orchestrator, with automatic permission approval added to the shared ACP layer. By comparison, multica also contains Kimi-specific tool title normalization and provider error sniffing; this repository currently keeps a lighter implementation.

Prompt Injection Approach

All local CLIs currently receive the system prompt folded into the user message, because most local code-agent CLI entry points lack a separate system-prompt channel. The daemon composes the following content into a single input:

systemPrompt: base output contract + skill content + design system content.
cwdHint: current working directory and file writing rules.
filesListBlock: existing file list in the project directory.
attachmentHint: attachments uploaded or selected by the user.
message: original user request.
safeImages: temporary uploaded image paths appended in @path form.
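
The composition above can be sketched as a function that joins the non-empty parts in order. The part names come from the list above; the blank-line separator, the trimming, and placing all @path references on one final line are assumptions, and `composePromptSketch` is a hypothetical name.

```typescript
interface PromptParts {
  systemPrompt: string;
  cwdHint: string;
  filesListBlock: string;
  attachmentHint: string;
  message: string;
  safeImages: string[]; // already validated upload paths
}

function composePromptSketch(parts: PromptParts): string {
  // Keep the documented order and drop parts that are empty for this request.
  const sections = [
    parts.systemPrompt,
    parts.cwdHint,
    parts.filesListBlock,
    parts.attachmentHint,
    parts.message,
  ]
    .map((section) => section.trim())
    .filter((section) => section.length > 0);

  // Images are appended in @path form after the user request.
  if (parts.safeImages.length > 0) {
    sections.push(parts.safeImages.map((imagePath) => `@${imagePath}`).join(" "));
  }
  return sections.join("\n\n");
}
```

Dropping empty parts avoids stacked blank lines in the prompt when, for example, a project has no files or attachments yet.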

Claude Code additionally exposes skills/ and design-systems/ through --add-dir, making it easier for the agent to read skill seeds, templates, and design system files.

Safety and Validation

Existing protections include:

Process startup uses spawn() argument arrays, avoiding shell string concatenation.
Model IDs are first compared with the model list exposed by the most recent /api/agents response.
Custom model IDs are validated by sanitizeCustomModel(), limiting length, character set, and starting character.
Reasoning options must exist in the runtime definition's reasoningOptions.
Image paths must be located inside the daemon temporary upload directory.
Attachment paths must be located inside the project working directory.
Agent working directories are constrained to .od/projects/<projectId>/.
ACP runtimes have timeout protection for the initialize, session/new, session/set_model, and session/prompt stages.
ACP runtimes listen for stdin errors and proactively clean up detection processes after model detection completes.
When the SSE connection closes, the daemon sends SIGTERM to the subprocess.
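
The two-step model check described above (known list first, then custom ID validation) can be sketched as follows. This is not the actual sanitizeCustomModel(): the length limit, the allowed character set, and the function names are assumptions chosen for illustration. One plausible reason for restricting the starting character is to stop a value such as `--help` from being read as a CLI flag.

```typescript
// Hypothetical limits; the real values live in apps/daemon/src/agents.ts.
const MAX_MODEL_ID_LENGTH = 128;
// Must start with a letter or digit, then letters, digits, and a few separators.
const MODEL_ID_PATTERN = /^[A-Za-z0-9][A-Za-z0-9._:\/@-]*$/;

// Returns the cleaned ID, or null when the input is rejected.
function sanitizeCustomModelSketch(input: string): string | null {
  const candidate = input.trim();
  if (candidate.length === 0 || candidate.length > MAX_MODEL_ID_LENGTH) return null;
  return MODEL_ID_PATTERN.test(candidate) ? candidate : null;
}

// Known models pass directly; anything else must survive sanitization.
function resolveModelSketch(requested: string, knownModels: string[]): string | null {
  if (knownModels.includes(requested)) return requested;
  return sanitizeCustomModelSketch(requested);
}
```

Because the process is started with an argument array, a rejected character set is a second line of defense and not the only barrier against injection.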

Current Capability Boundaries

The current runtime adapter is a lightweight adaptation layer that already covers discovery, startup, argument construction, model selection, and streaming forwarding.

Main boundaries:

The adapter is still a declarative object array and has not yet been split into independent adapter classes or directories.
The capability model is thin and currently mainly exposes models, reasoning, and output format.
Claude Code, Codex, Gemini, OpenCode, Cursor Agent, Hermes, and Kimi already have structured event parsing.
Qwen currently still uses plain text pass-through.
Skill injection mainly relies on prompt composition; only Claude Code uses --add-dir to support reading external directories.
Hermes currently only integrates the core ACP text session path and has not mapped more session/update types into unified UI events.
Cancellation is triggered by HTTP connection closure and SIGTERM; there is no explicit runId / cancel API yet.
Resume, auth state, permission modes, and capability gating have not yet formed a unified interface.
API fallback belongs to the frontend provider path and is currently outside the daemon runtime adapter layer.

Gap from the Target Architecture

docs/agent-adapters.md describes a more complete target shape: each agent adapter exposes methods such as detect(), capabilities(), run(), cancel(), and resume(), and emits unified AgentEvents.

The current implementation already has the core outline of the target architecture:

detectAgents() corresponds to detect().
AGENT_DEFS corresponds to the adapter registry.
buildArgs() corresponds to runtime-specific invocation.
streamFormat + claude-stream.ts + json-event-stream.ts + acp.ts correspond to stream normalization.
/api/chat corresponds to unified run orchestration.
