Hermes Loop is the control room for the Hermes Agent.
Hermes Agent is Nous Research's autonomous agent — open source, MIT-licensed, an agent that lives on your server, remembers what it learns, and gets more capable the longer it runs. It works across Telegram, Discord, Slack, WhatsApp, Signal, Email, and CLI; spawns isolated subagents with their own conversations, terminals, and Python RPC; runs on local, Docker, SSH, Singularity, or Modal backends.
Hermes Loop is the operator surface for it: a triage inbox, a persistent job queue, named crews of subagents, governed memory, human approval gates, and a hashed receipt at the end that proves what happened. You don't paste a prompt and hope. You launch a mission and watch it.
Agent runs are treated like database transactions, not chat sessions.
One design decision drives everything else. A normal chat product treats agent runs like conversations: a transcript, a vibe, "the model said it would do X." Hermes Loop treats them like database transactions: every prompt, every raw response, every tool call, every approval, every memory injection is a discrete row you can query, replay, and prove. The four practical consequences:
Prompts, raw model responses, parsed JSON, tool I/O, approvals, memory usage — all persisted as rows in Prisma. Open prisma studio and inspect any past run, step by step. Replay any AgentRunStep in isolation. The dashboard isn't the source of truth — the database is.
Each mission ends with a deterministic SHA-hashed WorkflowReceipt: agent timeline, tool calls, approvals, memory injections, real per-model cost, risk score, integrity hash. Two reviewers can independently verify they're looking at the same artifact. Useful when 'the AI said so' isn't an answer your legal/compliance team accepts.
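The hashing idea is simple to sketch: canonicalize the receipt JSON (sort keys so serialization order can't change the digest), then SHA-256 it. This is an illustrative TypeScript sketch; `canonicalize` and `receiptHash` are hypothetical names, not the app's actual functions:

```typescript
import { createHash } from "node:crypto";

// Illustrative sketch: the hash must be deterministic, so the receipt JSON
// is canonicalized (keys sorted at every level) before hashing.
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(canonicalize).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const entries = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${canonicalize(obj[k])}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value);
}

function receiptHash(receipt: object): string {
  return createHash("sha256").update(canonicalize(receipt)).digest("hex");
}
```

Two reviewers hashing the same exported receipt get the same 64-character digest regardless of key order, which is what makes independent verification possible.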
Drafts, paper trades, exports, gated tool calls all halt the mission as ApprovalItem rows. You explicitly approve, reject, or request changes. Nothing leaves the desk autonomously — no rogue email, no real trade, no surprise webhook. The approval queue is the production safety net.
If a provider key isn't set, the tool refuses with the exact env var to add. No fake results, no degraded mode that looks fine but isn't. /settings/providers reads process.env live and tells you READY / NEEDS PROVIDER, never PARTIAL with hidden caveats.
Everything else — schema self-correction, retries with exponential backoff, real-cost accounting via live per-model pricing, memory governance, the trust ledger, the evals harness, the worker queue for queued missions — falls out of that one decision. If it's worth doing, it has a row. If it has a row, it's in the receipt.
The capabilities you get because of those four pillars.
Every agent has a Zod schema. Malformed JSON triggers a self-correction loop — the orchestrator re-prompts the model with the specific Zod error, capped at two retries. Downstream code never has to 'hope' the parse succeeds.
Live per-model rates fetched from the provider, multiplied by actual token counts, recorded on the receipt with priceSource. Not estimated. Not opaque. Switch a HERMES_MODEL_FAST role and watch the next receipt show the new cost.
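The arithmetic is deliberately boring: actual token counts times per-model rates, summed per step. A minimal sketch with hypothetical names and hard-coded rates (the app fetches live rates and records priceSource on the receipt):

```typescript
// Hedged sketch of per-run cost accounting. Illustrative names and rates.
type Step = { model: string; promptTokens: number; completionTokens: number };
type Rate = { inputPerMTok: number; outputPerMTok: number }; // USD per 1M tokens

function missionCostUSD(steps: Step[], rates: Record<string, Rate>): number {
  return steps.reduce((sum, s) => {
    const r = rates[s.model];
    if (!r) throw new Error(`no rate for model ${s.model}`); // refuse rather than estimate
    return sum + (s.promptTokens * r.inputPerMTok + s.completionTokens * r.outputPerMTok) / 1e6;
  }, 0);
}
```

Note the failure mode: an unknown model throws instead of silently estimating, matching the no-silent-fallbacks rule elsewhere in the app.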
Operator-approved persistent context. Memory items are diffable, hygiene-flagged (duplicate / conflict / stale), and attributed per-mission via MemoryUsage rows. No implicit retention — the model only sees what you accepted.
Aggregate reliability roll-up across every run. Per-crew approval rate, tool reliability, risk distribution. Real signal for which crews to trust on which jobs.
`npm run evals` runs full missions end-to-end against a live Hermes endpoint, asserts on receipt shape (status, deliverable type, integrity hash, cost), and exits with the number of failed cases as its exit code. Not synthetic, not mocked.
Mid-mission crashes resume from the failed AgentRunStep, not from scratch. Tool call retries honour Retry-After. Network blips don't burn a run.
/settings/providers tells you what's wired, what isn't, and the exact env var to add. The home page shows what's READY in your environment. No 'might work, might not' fog.
Prisma + Postgres. No vendor lock-in, no proprietary serialization, no vector DB you have to migrate. Hash, export, attach to a JIRA ticket — the receipt is just data.
No real trading (paper only, simulatedOnly=true). No automatic emails (drafts only). SSRF guard on browser_qa_audit blocks localhost / private / metadata IPs. Approval-gated tools refuse to run in the sandbox.
After every mission settles cleanly, Hermes Loop runs the JUDGE model on the full timeline and distils up to 3 reusable lessons (checklist / guardrail / playbook / anti-pattern / shortcut) into Skill rows scoped to the crew. Next run of that crew injects the top skills into its system prompt. Operators can review and disable any auto-generated skill on /skills.
Memory persists across every mission, every crew, every session. The orchestrator queries operator-approved memory before each run and injects the top matches into the Triage Agent and first-step prompts. Pinned memories are always in scope.
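The selection step can be sketched as a simple scoring pass. The names and the naive substring scorer here are illustrative stand-ins for the real recall logic; the one behaviour taken from the text is that pinned memories are always in scope:

```typescript
// Naive sketch of cross-session recall: score operator-approved memories
// against the objective and inject the top matches. Illustrative names.
type Memory = { id: string; text: string; pinned: boolean };

function selectMemories(objective: string, memories: Memory[], limit = 5): Memory[] {
  const words = objective.toLowerCase().split(/\W+/).filter((w) => w.length > 3);
  const scored = memories.map((m) => ({
    m,
    score: words.filter((w) => m.text.toLowerCase().includes(w)).length,
  }));
  const pinned = scored.filter((s) => s.m.pinned).map((s) => s.m);
  const rest = scored
    .filter((s) => !s.m.pinned && s.score > 0)
    .sort((a, b) => b.score - a.score)
    .map((s) => s.m);
  // Pinned first, then best substring matches, up to the limit.
  return [...pinned, ...rest].slice(0, Math.max(limit, pinned.length));
}
```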
Native MCP client. Configure MCP_SERVERS with a JSON array of remote MCP endpoints; Hermes Loop handshakes via initialize, discovers tools via tools/list, and exposes them through the same tool registry agents already use. JSON-RPC over HTTP, no external SDK.
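The handshake is plain JSON-RPC, which is easy to sketch. Helper names here are hypothetical and the exact param shapes may differ from the app's client; the transport is injected so the protocol logic stands alone:

```typescript
// Sketch of the MCP handshake as JSON-RPC. Illustrative names and shapes.
type JsonRpcPost = (payload: object) => Promise<{ result?: any; error?: { message: string } }>;

let nextId = 1;

async function rpc(post: JsonRpcPost, method: string, params: object = {}) {
  const body = await post({ jsonrpc: "2.0", id: nextId++, method, params });
  if (body.error) throw new Error(`${method}: ${body.error.message}`);
  return body.result;
}

async function discoverMcpTools(post: JsonRpcPost) {
  // 1. Handshake.
  await rpc(post, "initialize", {
    protocolVersion: "2024-11-05",
    capabilities: {},
    clientInfo: { name: "hermes-loop", version: "0.0.0" },
  });
  // 2. Tool discovery; results feed the shared tool registry.
  const { tools } = await rpc(post, "tools/list");
  return tools; // e.g. [{ name, description, inputSchema }, ...]
}

// Over HTTP the transport is just fetch, no SDK required:
const httpPost = (endpoint: string): JsonRpcPost => async (payload) => {
  const res = await fetch(endpoint, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify(payload),
  });
  return res.json();
};
```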
Hermes Agent is the engine. Hermes Loop is the control room.
Every Hermes Agent capability has a corresponding surface in this app that governs, audits, and proves it. Hermes Agent gives you autonomy; Hermes Loop gives you governance and proof.
What this app implements — and what it doesn't (yet).
Hermes Agent (Nous Research) is the upstream autonomous engine. Hermes Loop is one operator surface for it. This board is the source of truth on which engine features are wired up here. Coming-soon items live at the bottom.
- Crews — named, ordered, Zod-validated subagent sequences with reusable templates.
- Approval gates on risky outputs (drafts, trades, exports, gated tools).
- WorkflowReceipts with deterministic integrity hash + per-model cost accounting.
- Trust ledger — aggregate reliability + risk roll-up across all runs.
- Evals harness — real-Hermes test suite whose exit code is the number of failed cases.
- Schema self-correction loop with retries + backoff.
- Full audit trail: every prompt, response, tool call, and approval is a database row.
- Single operator-grade web surface for governance, instead of multi-platform chat.
Genuine roadmap. None of these are wired today. Listed so the parity board stays honest as the product grows.
- Persistent subagent terminals — long-lived PTY sessions per agent with streaming I/O captured to the audit log.
- SSH / Singularity / Modal execution backends — remote sandboxed terminal_exec + python_rpc for compute-heavy missions.
- Native Telegram + WhatsApp + Signal bots — first-class inbound + outbound, beyond the generic webhook.
- Outbound SMTP for approved email drafts + IMAP polling for the inbox.
- Slack outbound — DMs and Block Kit responses for richer approval flows in-channel.
- Vector memory recall — semantic search on top of the existing substring cross-session recall (Postgres FTS / pgvector).
- Cross-mission planner — chain missions by output (mission B reads mission A's deliverable).
- Multi-tenant auth (SSO via SAML / OIDC) — replace the single-operator profile in the MVP.
- Receipt versioning + approval workflows on receipts — sign-off chains for regulated environments.
- Audit log export to SIEM (Splunk / Datadog / Elastic) over signed JSON.
- Skill marketplace — shareable crew templates + memory packs across teams.
- Real broker integration for trading missions — paper-only today, behind a feature flag.
A chatbot gives you an answer. A real workflow needs proof.
When the work matters — auditing a website, drafting a refund letter, building a paper-trade thesis, triaging a messy request — a single chat reply isn't enough. You need to know which model ran, which tools it called, which memory it relied on, what went wrong if it failed, and what requires your sign-off before it leaves the desk. None of that is observable when you paste a prompt into a chat box.
Hermes Loop treats every run as a mission: a structured, ordered, validated execution of a crew of agents. Every prompt and every response is persisted. Every tool call is logged and Zod-validated. Every memory injection is recorded. Risky outputs (a draft email, a simulated trade, a report export) wait for you to approve them. When the run is over, a receipt with a deterministic integrity hash captures the whole thing.
Six steps from raw request to verifiable receipt.
1. Request — You type an objective, drop messy text in the inbox, or fire a schedule. Triage classifies messy input and proposes a crew + objective.
2. Crew — Pick a built-in crew (Bug Hunter, Paper Trading Desk, Life Admin) or build a custom one. A crew is an ordered list of specialist agents, each with its own system prompt and Zod schema.
3. Job — In production, the request becomes a RUN_MISSION job. The worker picks it up, runs it serially with retries + exponential backoff, and records every transition.
4. Tools + Memory — Each agent calls Hermes with its prompt + relevant memory. It can request sandboxed tools: browser_qa_audit (Playwright), terminal_exec, python_rpc, web_search (Tavily), vision_analyze (Gemini), image_generate, text_to_speech (ElevenLabs), web_snapshot, document_extract, price_series_lookup, deadline_create, report_export_draft. Tool calls are logged and validated.
5. Approval — Drafts, trade tickets, exports, and gated tools become ApprovalItems. Nothing leaves the desk on its own. You approve, reject, or request changes.
6. Receipt — When the run finishes, a WorkflowReceipt is generated: timeline, cost (real per-model rates), risk score, integrity hash. It rolls up into the trust ledger.
The orchestrator loop, end to end.
The orchestrator runs each agent of the crew in sequence. For every agent, it:
- Builds a system prompt + user prompt from the agent's spec, the mission objective, prior agents' outputs, the selected memories, and the tool catalog this agent is allowed to use.
- Calls `hermesChat` against `${HERMES_API_BASE}/chat/completions` with `response_format: json_object`. Retries 429 / 5xx / network failures with exponential backoff, honouring `Retry-After`.
- Parses the JSON. If the model wrapped it in `{final: …}` or a single named envelope, the orchestrator unwraps it. If JSON is malformed it re-prompts with the parser error.
- Validates the parsed payload against the agent's Zod schema. On failure, it re-prompts the model with a short list of the specific schema errors and asks for a corrected response. Capped at two corrections per step (audit events: `agent.json_parse_retry`, `agent.schema_correction`, `agent.output_unwrapped`, `agent.output_repaired`).
- If the model requested tools and the agent allows them, the tools run server-side. Their output feeds the next round. Tools flagged `requiresApproval` halt the mission and surface as an ApprovalItem.
- Persists an `AgentRunStep` row with the prompt, the raw response, the parsed JSON, latency, and token counts. Emits an `AuditEvent`.
- Merges the agent's output into the running context for the next agent.
When all agents finish, the crew-specific finalize step composes a Deliverable (BUG_REPORT, DRAFT, TRADE_JOURNAL, etc.) and the appropriate ApprovalItem(s). The mission moves to WAITING_APPROVAL. The receipt is generated, hashed, and rolled into the trust ledger.
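The backoff behaviour in the call step can be sketched on its own. Names are hypothetical; the real orchestrator wraps `hermesChat`, but the retry logic looks roughly like this:

```typescript
// Illustrative retry wrapper: exponential backoff on 429 / 5xx / network
// errors, honouring a Retry-After header (in seconds) when present.
const sleep = (ms: number) => new Promise((r) => setTimeout(r, ms));

type MinimalResponse = { status: number; headers: { get(name: string): string | null } };

async function callWithBackoff(
  doCall: () => Promise<MinimalResponse>,
  maxRetries = 3,
  baseMs = 250,
): Promise<MinimalResponse> {
  for (let attempt = 0; ; attempt++) {
    let res: MinimalResponse | null = null;
    try {
      res = await doCall();
    } catch (err) {
      if (attempt >= maxRetries) throw err; // network failure, out of retries
    }
    if (res && res.status !== 429 && res.status < 500) return res; // done
    if (res && attempt >= maxRetries) return res; // retryable status, out of retries
    const retryAfter = res?.headers.get("retry-after");
    const delayMs = retryAfter ? Number(retryAfter) * 1000 : baseMs * 2 ** attempt;
    await sleep(delayMs);
  }
}
```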
A normal chatbot vs Hermes Loop.
Every object you see in the UI is a real database row.
Hermes Loop is Prisma + SQLite locally, Postgres in production. The schema is the source of truth — you can run prisma studio any time and inspect every row.
UserProfile ─┐
├─< Mission ─┬─< MissionAgent
│ ├─< AgentRunStep (prompt + rawResponse + parsedOutput + tokens + latency)
│ ├─< ToolCall (name, input, output, status, agentRunStepId)
│ ├─< ApprovalItem (PENDING | APPROVED | REJECTED | NEEDS_CHANGES)
│ ├─< Deliverable (REPORT | DRAFT | TRADE_JOURNAL | BUG_REPORT)
│ ├─< AuditEvent (every prompt, response, tool, approval, error)
│ ├─< MemoryUsage (which MemoryItems were injected, why)
│ └─< WorkflowReceipt (status, integrityHash, riskLevel, content blob)
├─< AgentJob (TRIAGE_INBOX | RUN_MISSION | RUN_SCHEDULE | RESUME_AFTER_APPROVAL)
├─< ScheduledMission (cron-like cadences with timezone-aware nextRunAt)
├─< MemoryItem (operator-approved persistent context)
└─< InboxItem (raw request → triaged → converted into Mission)

What the desk will and will not do without you.
- No real trading. Paper Trading Desk creates `PaperTrade` rows with `simulatedOnly = true`. There is no broker integration.
- No automatic emails. Life Admin creates `EMAIL_DRAFT` approval items. Nothing is sent — drafts only.
- No silent fallbacks. When Hermes env vars are configured, network/API failures surface as real errors. The demo responder is only used when the env vars are entirely absent.
- SSRF guard on browser_qa_audit. Localhost / private / metadata IPs are blocked by default. The dev-only escape hatch requires both `NODE_ENV != production` and `ALLOW_LOCAL_BROWSER_QA = true`.
- Auditability is first-class. Every prompt, every response, every tool call, every approval is a database row. The receipt's integrity hash is deterministic and content-addressed.
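The SSRF check amounts to refusing a well-known set of address ranges before the browser ever navigates. An IPv4-only sketch with a hypothetical name (`isBlockedAddress`); a real guard would also resolve hostnames to addresses before checking:

```typescript
import { isIP } from "node:net";

// Sketch of the guard: refuse loopback, RFC1918 private ranges, link-local
// (which includes the 169.254.169.254 cloud metadata endpoint), and localhost.
function isBlockedAddress(host: string): boolean {
  if (host === "localhost") return true;
  if (isIP(host) !== 4) return false; // hostnames / IPv6 out of scope for this sketch
  const [a, b] = host.split(".").map(Number);
  if (a === 127) return true;                       // loopback
  if (a === 10) return true;                        // 10.0.0.0/8
  if (a === 172 && b >= 16 && b <= 31) return true; // 172.16.0.0/12
  if (a === 192 && b === 168) return true;          // 192.168.0.0/16
  if (a === 169 && b === 254) return true;          // link-local incl. metadata IP
  return false;
}
```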
If your work needs proof, this is for you.
Audit a site, get a screenshot-backed bug report, hand off the markdown export to the dev team. Run Bug Hunter →

Paste an order, a refund refusal, a policy doc. Get a draft message, a follow-up plan, and a record of what was claimed. Open inbox →

Build a custom crew that scouts, drafts, critiques, and produces a structured report you can prove ran. Build a crew →

Compare runs across models, see token + cost breakdowns at real per-model rates, gate every action with approvals. Open the desk →

Every page in the app, one line each.
- Dashboard — Mission status, attention queue, deliverables, audit feed.
- Crews — Built-in and custom agent templates.
- Tools — Sandbox surface for terminal, Python, web search, vision, images, TTS, integrations.
- Trust — Aggregate reliability + risk roll-up across all runs.
- Inbox — Drop messy requests; triage proposes a mission.
- Jobs — Persistent queue for triage, missions, schedules, resumes.
- Memory — Operator-approved context with usage records, diffs, hygiene.
- Approvals — The gate: drafts, trades, exports, gated tools.
- Schedules — Cron-like cadences materializing fresh missions.
- Receipts — Per-mission proof with integrity hash and exports.
- Settings — Provider config, demo workspace, diagnostics & runtime.
Hermes Agent is the autonomy. Hermes Loop is the control room.
Hermes Agent — built by Nous Research — is the autonomous engine. It lives on your server, holds persistent memory, schedules its own work via natural language cron, and spawns isolated subagents with their own conversations, terminals, and Python RPC. It can run on local, Docker, SSH, Singularity, or Modal backends. Native capabilities include web search, browser automation, vision, image generation, text-to-speech, and multi-model reasoning.
Hermes Loop is the operator-facing layer that turns those capabilities into a workflow you can ship: named crews, a job queue, approval gates, workflow receipts, a trust ledger, an evals harness, schema self-correction, retries with backoff, real per-run cost accounting, and an audit trail of every prompt, response, tool call, and approval.
Hermes Agent is open source under MIT. Learn more at hermes-agent.nousresearch.com.
The seven words that make up the whole product.
- Mission — One run of a crew end-to-end. Has a status (DRAFT → RUNNING → WAITING_APPROVAL → COMPLETED / FAILED) and a complete persistent trace.
- Crew — An ordered sequence of specialist agents, each with a system prompt, a Zod output schema, and an allowed tool list.
- Job — A background work item picked up by the worker process. Types: TRIAGE_INBOX, RUN_MISSION, RUN_SCHEDULE, RESUME_AFTER_APPROVAL.
- Tool — A sandboxed action agents can request. Live: browser_qa_audit (Playwright), terminal_exec, python_rpc, web_search (Tavily), vision_analyze (Gemini), image_generate, text_to_speech (ElevenLabs), web_snapshot, document_extract, price_series_lookup, deadline_create, report_export_draft.
- Memory — Operator-approved persistent context. Selected per mission, recorded, diffable. No implicit retention.
- Approval — A user decision gate on a risky output (EMAIL_DRAFT, TRADE_SIMULATION, REPORT_EXPORT, FOLLOW_UP, FORM_ACTION, TOOL_CALL).
- Receipt — A WorkflowReceipt: status, timeline, integrity hash, real cost, risk level, full content blob. The proof artifact.
A 3-minute path that exercises everything.
- 1. Load the demo workspace. Settings → Load demo. Three pre-completed missions, an active schedule, memories, receipts — instant context. Open settings →
- 2. Run Bug Hunter on /demo-target. Real Playwright crawl, real findings, real Hermes calls. Launch the demo run →
- 3. Approve the QA report. Watch the mission move from `WAITING_APPROVAL` to settled, deliverable accepted.
- 4. Open the receipt for that mission. See the timeline, the integrity hash, the real per-model cost. Open receipts →
- 5. Open /trust. Watch your run roll into the aggregate reliability picture. Open trust →
The /system page is the canonical product map (counts, flow, workflows, vocabulary). Onboarding has a 5-step checklist that forces you through the full loop once.