Build a Long-Horizon Agent
Run multi-hour agent loops with cost ceilings, resumable state, and voice input
This cookbook assumes you have an OpenRouter API key and are using the Agent
SDK (@openrouter/agent). If you are starting from scratch, read the
Agent SDK overview and the
callModel reference first.
Goal: Run an agent that can keep working for hours, not seconds — research
projects, multi-stage migrations, voice-driven assistants, or background jobs
that span days. The same callModel loop works for all of them once you wire
up four primitives.
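Outcome: A long-horizon agent that:
- Stops as soon as a step, token, or cost ceiling is reached.
- Persists its ConversationState and resumes after a restart, deploy, or human approval.
- Streams progress while it is still running.
- Accepts voice input through transcription.
- Notifies the user when the run terminates.
- Repeats research and adversarial review until it emits a [DONE] sentinel.
You can hand this page to your coding agent as the implementation brief. Adapt the storage, ceilings, and surface (CLI, API, queue worker) to your app rather than scaffolding a separate project.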
Prerequisites: an OPENROUTER_API_KEY and @openrouter/agent installed.
Long-horizon agents must terminate. Combine multiple stop conditions so the
loop ends as soon as the first one fires. The most useful for long runs are
maxCost, stepCountIs, and maxTokensUsed.
See the Stop Conditions reference
for the full list (stepCountIs, hasToolCall, maxTokensUsed, maxCost,
finishReasonIs) and how to compose custom predicates.
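The first-to-fire behaviour described above amounts to a logical OR over predicates. The sketch below does not use the SDK: RunStats, stepLimit, tokenLimit, costLimit, and anyOf are hypothetical local stand-ins that only illustrate the idea. The real stepCountIs, maxTokensUsed, and maxCost helpers may take different arguments, so check the Stop Conditions reference for their signatures.

```typescript
// Hypothetical run statistics; the SDK's real step and usage objects may differ.
interface RunStats {
  steps: number;
  totalTokens: number;
  costUsd: number;
}

type StopCondition = (stats: RunStats) => boolean;

// Local stand-ins that mirror the intent of stepCountIs, maxTokensUsed, and maxCost.
const stepLimit = (max: number): StopCondition => (s) => s.steps >= max;
const tokenLimit = (max: number): StopCondition => (s) => s.totalTokens >= max;
const costLimit = (maxUsd: number): StopCondition => (s) => s.costUsd >= maxUsd;

// The loop ends as soon as any one condition fires.
const anyOf = (...conds: StopCondition[]): StopCondition =>
  (s) => conds.some((c) => c(s));

// Illustrative ceilings only; start much smaller while iterating.
const shouldStop = anyOf(stepLimit(200), tokenLimit(2000000), costLimit(5));
```

A custom predicate, such as a wall-clock deadline, can be added to the same anyOf call without touching the others.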
Long-horizon runs spend real credits. Always set both a step ceiling and a cost ceiling before you start a multi-hour run, and start small while you are iterating.
A multi-hour run must survive restarts, deploys, and human approvals.
callModel accepts a StateAccessor that loads and saves
ConversationState between steps. Back it with whatever storage your app
already uses.
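A file-backed accessor might look like the sketch below. It assumes the accessor is an object with async load and save methods; the exact StateAccessor and ConversationState types come from the SDK and are described in Tool Approval & State. SavedState, FileStateAccessor, createFileStateAccessor, and the file name are hypothetical names used for illustration.

```typescript
import { promises as fs } from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Assumption: the SDK's StateAccessor is an object with async load/save.
// SavedState stands in for ConversationState, whose real fields may differ.
type SavedState = Record<string, unknown>;

interface FileStateAccessor {
  load(): Promise<SavedState | null>;
  save(state: SavedState): Promise<void>;
}

function createFileStateAccessor(filePath: string): FileStateAccessor {
  return {
    async load() {
      try {
        return JSON.parse(await fs.readFile(filePath, "utf8")) as SavedState;
      } catch (err) {
        // No checkpoint yet: signal a fresh run.
        if ((err as { code?: string }).code === "ENOENT") return null;
        throw err;
      }
    },
    async save(state) {
      // Write to a temp file, then rename, so a crash mid-write cannot
      // leave a truncated checkpoint behind.
      const tmp = `${filePath}.tmp`;
      await fs.writeFile(tmp, JSON.stringify(state), "utf8");
      await fs.rename(tmp, filePath);
    },
  };
}

// Hypothetical location; use a durable path in a real deployment.
const accessor = createFileStateAccessor(
  path.join(os.tmpdir(), "long-horizon-agent-state.json"),
);
```

The write-then-rename step is a design choice rather than an SDK requirement: it keeps the previous checkpoint readable if the process dies while saving.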
To resume after a crash, deploy, or human review, call callModel again with
the same StateAccessor. Pass input: [] to signal “no new user turn —
continue from saved state”; the SDK loads the checkpoint and keeps going.
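The resume contract can be pictured with a small simulation. Nothing here calls the SDK: runOnce is a hypothetical stand-in for a single callModel invocation, and Message, State, Accessor, and memoryAccessor are illustrative types. The sketch only shows why a second call with empty input extends the saved history instead of restarting it.

```typescript
// Hypothetical stand-ins: this simulates the checkpoint contract, not the SDK.
interface Message { role: "user" | "assistant"; content: string }
interface State { messages: Message[] }
interface Accessor {
  load(): Promise<State | null>;
  save(state: State): Promise<void>;
}

// In-memory accessor; a real one would write to durable storage.
function memoryAccessor(): Accessor {
  let saved: string | null = null;
  return {
    load: async () => (saved ? (JSON.parse(saved) as State) : null),
    save: async (s) => { saved = JSON.stringify(s); },
  };
}

// Stand-in for one invocation: load the checkpoint, append any new user
// turns, then run up to maxSteps steps, saving after each one.
async function runOnce(
  accessor: Accessor,
  input: Message[],
  maxSteps: number,
): Promise<State> {
  const state = (await accessor.load()) ?? { messages: [] };
  state.messages.push(...input);
  await accessor.save(state);
  for (let step = 0; step < maxSteps; step++) {
    state.messages.push({ role: "assistant", content: `work item ${state.messages.length}` });
    await accessor.save(state); // checkpoint between steps
  }
  return state;
}

async function demo(): Promise<number> {
  const accessor = memoryAccessor();
  await runOnce(accessor, [{ role: "user", content: "Migrate the schema" }], 2);
  // Simulated restart: empty input means "continue from saved state".
  const resumed = await runOnce(accessor, [], 2);
  return resumed.messages.length; // 1 user turn + 4 assistant steps = 5
}
```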
For production, swap the file accessor for one backed by Postgres, Redis, or an object store. See Tool Approval & State for the full StateAccessor and resumption contract.
A run that lasts an hour should not block your UI for an hour. callModel
returns a result object with several streams you can consume independently:
- result.getTextStream(): token deltas for the user-facing response.
- result.getToolCallsStream(): tool calls as they complete.
- result.getFullResponsesStream(): the full event stream, including tool preliminary results.
- result.getResponse(): the final, fully-resolved response with usage data.
See the callModel API reference for every stream method and event type.
Wire publishToDashboard to whatever transport you already use — Server-Sent
Events, WebSockets, a database table, or a pubsub channel.
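One way to forward progress is to drain the text and tool-call streams concurrently and publish each item as it arrives. The sketch below assumes the stream methods return values that can be consumed with for await; DashboardEvent, forward, and streamRun are hypothetical names, and publishToDashboard is whatever function you supply for your transport.

```typescript
// Hypothetical event envelope for whatever transport backs the dashboard.
interface DashboardEvent {
  runId: string;
  kind: "text" | "tool_call";
  payload: unknown;
}

type Publish = (event: DashboardEvent) => Promise<void> | void;

// Drain one stream, forwarding each item as it arrives.
async function forward<T>(
  stream: AsyncIterable<T>,
  runId: string,
  kind: DashboardEvent["kind"],
  publishToDashboard: Publish,
): Promise<void> {
  for await (const payload of stream) {
    await publishToDashboard({ runId, kind, payload });
  }
}

// Consume both streams concurrently so neither blocks the other.
async function streamRun(
  runId: string,
  textStream: AsyncIterable<string>,
  toolCallStream: AsyncIterable<unknown>,
  publishToDashboard: Publish,
): Promise<void> {
  await Promise.all([
    forward(textStream, runId, "text", publishToDashboard),
    forward(toolCallStream, runId, "tool_call", publishToDashboard),
  ]);
}
```

With the SDK you would pass result.getTextStream() and result.getToolCallsStream() as the two stream arguments, provided they are async iterables as assumed here.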
A single pass through callModel often leaves gaps — unverified citations,
missing edge cases, or stale data. Wrap the run in an outer self-ask loop:
research, adversarial review, repeat until the agent emits a [DONE]
sentinel. Each iteration appends a new user turn to the state persisted
through the StateAccessor, so the agent builds on its prior work instead of
starting over.
The [DONE] sentinel is intentionally cheap: any model can produce it, and a
plain String.includes check keeps the control flow obvious. Swap the review
prompt or the reviewer model (for example a faster
~anthropic/claude-sonnet-latest critiquing an Opus researcher) without
changing the loop. Three layers of ceilings keep cost bounded:
SELF_ASK_MAX_ITERATIONS caps the number of review rounds, and each round
gets its own stepCountIs and maxCost budget.
Pair this with the state accessor from step 2 so the loop survives crashes
mid-review. On resume, re-enter the loop from the saved state and continue
reviewing.
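The outer loop described above can be sketched as follows. The SDK call is abstracted behind runRound, a hypothetical function that runs one bounded callModel pass for a given user turn and returns the agent's final text. The review prompt wording and the iteration cap of 5 are illustrative assumptions.

```typescript
const SELF_ASK_MAX_ITERATIONS = 5; // illustrative cap on review rounds
const DONE = "[DONE]";

// Hypothetical: one bounded pass that appends `userTurn` to the saved state,
// runs with its own step and cost ceilings, and returns the final text.
type RunRound = (userTurn: string) => Promise<string>;

// Illustrative wording; swap in your own review prompt.
const REVIEW_PROMPT =
  "Review your previous answer adversarially and fix any gaps you find. " +
  `Reply with ${DONE} only when nothing is left to fix.`;

interface SelfAskResult {
  rounds: number; // review rounds actually run
  done: boolean;  // true if the sentinel was seen before the cap
  lastText: string;
}

async function selfAskLoop(task: string, runRound: RunRound): Promise<SelfAskResult> {
  let lastText = await runRound(task); // initial research pass
  for (let round = 1; round <= SELF_ASK_MAX_ITERATIONS; round++) {
    lastText = await runRound(REVIEW_PROMPT); // new user turn on saved state
    // Plain substring check keeps the control flow obvious.
    if (lastText.includes(DONE)) return { rounds: round, done: true, lastText };
  }
  return { rounds: SELF_ASK_MAX_ITERATIONS, done: false, lastText };
}
```

Returning done: false when the cap is hit lets the caller distinguish a finished run from one that simply ran out of review rounds.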
Drive the same agent loop from a voice memo, phone call, or push-to-talk app.
OpenRouter exposes a dedicated
/api/v1/audio/transcriptions
endpoint with a single STT model parameter. Hand the transcript to
callModel exactly like a text prompt.
For a streaming microphone, capture audio chunks on the client, send them to
your server, and call createTranscription once silence is detected. Use the
STT cookbook for the full request and
response shape.
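The silence-detection step on the server can be sketched independently of the STT request. The example below assumes 16-bit mono PCM chunks and a fixed RMS threshold. SILENCE_RMS, SILENCE_CHUNKS_TO_FLUSH, and createUtteranceBuffer are hypothetical, and transcribe is a placeholder for your own wrapper around the transcription call.

```typescript
// Assumptions: 16-bit mono PCM and illustrative thresholds; tune for your mic.
const SILENCE_RMS = 500;
const SILENCE_CHUNKS_TO_FLUSH = 3;

// Root-mean-square energy of one chunk.
function rms(chunk: Int16Array): number {
  if (chunk.length === 0) return 0;
  const sumSquares = chunk.reduce((acc, s) => acc + s * s, 0);
  return Math.sqrt(sumSquares / chunk.length);
}

function concat(chunks: Int16Array[]): Int16Array {
  const total = chunks.reduce((n, c) => n + c.length, 0);
  const out = new Int16Array(total);
  let offset = 0;
  for (const c of chunks) {
    out.set(c, offset);
    offset += c.length;
  }
  return out;
}

// Buffers speech and calls `transcribe` once enough silent chunks follow it.
// Returns the transcript when an utterance is flushed, otherwise null.
function createUtteranceBuffer(transcribe: (audio: Int16Array) => Promise<string>) {
  let chunks: Int16Array[] = [];
  let silentRun = 0;
  let heardSpeech = false;

  return async function push(chunk: Int16Array): Promise<string | null> {
    if (rms(chunk) >= SILENCE_RMS) {
      heardSpeech = true;
      silentRun = 0;
      chunks.push(chunk);
      return null;
    }
    if (!heardSpeech) return null; // leading silence: nothing to send yet
    chunks.push(chunk); // keep short pauses inside the utterance
    silentRun += 1;
    if (silentRun < SILENCE_CHUNKS_TO_FLUSH) return null;
    const audio = concat(chunks);
    chunks = [];
    silentRun = 0;
    heardSpeech = false;
    return transcribe(audio);
  };
}
```

The returned transcript can then be passed to callModel exactly like a typed prompt.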
For voice-out, pipe the agent’s reply through
/api/v1/audio/speech and write the
resulting bytes to a file or stream them to the caller.
Long-horizon jobs usually run somewhere the user is not watching. Notify them
when the run terminates — by webhook, email, Slack message, or whatever your
stack uses. Trigger the notification once getResponse() resolves so the
agent has fully completed and ceilings have been honored.
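A minimal sketch of that ordering: await the final response, then send a single notification whether the run finished or failed. RunNotice, Notify, and notifyWhenDone are hypothetical names, and the notify callback stands in for your webhook, email, or Slack client.

```typescript
// Hypothetical summary sent to the user; shape it for your own channel.
interface RunNotice {
  runId: string;
  status: "completed" | "failed";
  detail: string;
}

type Notify = (notice: RunNotice) => Promise<void>;

// Awaits the final response promise (for example, the promise returned by
// result.getResponse()) and then sends one notification.
async function notifyWhenDone<T>(
  runId: string,
  finalResponse: Promise<T>,
  summarize: (response: T) => string,
  notify: Notify,
): Promise<void> {
  let notice: RunNotice;
  try {
    const response = await finalResponse;
    notice = { runId, status: "completed", detail: summarize(response) };
  } catch (err) {
    const detail = err instanceof Error ? err.message : String(err);
    notice = { runId, status: "failed", detail };
  }
  // Called outside the try block so a notify failure is never reported twice.
  await notify(notice);
}
```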
For agents that pause mid-run (for example, human-in-the-loop approvals), see Add Human-in-the-Loop Controls.
A correct long-horizon implementation should pass all of the following:
- A run with a low maxCost (for example, maxCost(0.10)) returns from callModel once the ceiling is hit, even if the agent has more work queued.
- After an interruption, a second callModel invocation with the same StateAccessor resumes from the saved ConversationState. The message history grows rather than starting over.
- getToolCallsStream() and getTextStream() yield events while the agent is still running, not only at the end.
- A test recording passed to sdk.stt.createTranscription returns the expected text, and feeding that text into callModel produces a response that references the spoken request.
- The completion notification fires only after getResponse() resolves.