Hermes Loop is the control room for the Hermes Agent.
Hermes Agent is Nous Research's autonomous agent — open source, MIT-licensed, an agent that lives on your server, remembers what it learns, and gets more capable the longer it runs. It works across Telegram, Discord, Slack, WhatsApp, Signal, Email, and CLI; spawns isolated subagents with their own conversations, terminals, and Python RPC; runs on local, Docker, SSH, Singularity, or Modal backends.
Hermes Loop is the operator surface for it: a triage inbox, a persistent job queue, named crews of subagents, governed memory, human approval gates, and a hashed receipt at the end that proves what happened. You don't paste a prompt and hope. You launch a mission and watch it.
Agent runs are treated like database transactions, not chat sessions.
One design decision drives everything else. A normal chat product treats agent runs like conversations: a transcript, a vibe, "the model said it would do X." Hermes Loop treats them like database transactions: every prompt, every raw response, every tool call, every approval, every memory injection is a discrete row you can query, replay, and prove. The four practical consequences:
Prompts, raw model responses, parsed JSON, tool I/O, approvals, memory usage — all persisted as rows in Prisma. Open prisma studio and inspect any past run, step by step. Replay any AgentRunStep in isolation. The dashboard isn't the source of truth — the database is.
Each mission ends with a deterministic SHA-hashed WorkflowReceipt: agent timeline, tool calls, approvals, memory injections, real per-model cost, risk score, integrity hash. Two reviewers can independently verify they're looking at the same artifact. Useful when 'the AI said so' isn't an answer your legal/compliance team accepts.
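The hashing idea is simple to sketch: canonicalize the receipt JSON (sort keys so serialization order can't change the digest), then SHA-256 it. This is an illustrative TypeScript sketch; `canonicalize` and `receiptHash` are hypothetical names, not the app's actual functions:

```typescript
import { createHash } from "node:crypto";

// Illustrative sketch: the hash must be deterministic, so the receipt JSON
// is canonicalized (keys sorted at every level) before hashing.
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(canonicalize).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const entries = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${canonicalize(obj[k])}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value);
}

function receiptHash(receipt: object): string {
  return createHash("sha256").update(canonicalize(receipt)).digest("hex");
}
```

Two reviewers hashing the same exported receipt get the same 64-character digest regardless of key order, which is what makes independent verification possible.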
Drafts, paper trades, exports, gated tool calls all halt the mission as ApprovalItem rows. You explicitly approve, reject, or request changes. Nothing leaves the desk autonomously — no rogue email, no real trade, no surprise webhook. The approval queue is the production safety net.
If a provider key isn't set, the tool refuses with the exact env var to add. No fake results, no degraded mode that looks fine but isn't. /settings/providers reads process.env live and tells you READY / NEEDS PROVIDER, never PARTIAL with hidden caveats.
Everything else — schema self-correction, retries with exponential backoff, real-cost accounting via live per-model pricing, memory governance, the trust ledger, the evals harness, the worker queue for queued missions — falls out of that one decision. If it's worth doing, it has a row. If it has a row, it's in the receipt.
The capabilities you get because of those four pillars.
Every agent has a Zod schema. Malformed JSON triggers a self-correction loop — the orchestrator re-prompts the model with the specific Zod error, capped at two retries. Downstream code never has to 'hope' the parse succeeds.
Live per-model rates fetched from the provider, multiplied by actual token counts, recorded on the receipt with priceSource. Not estimated. Not opaque. Switch a HERMES_MODEL_FAST role and watch the next receipt show the new cost.
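The arithmetic is deliberately boring: actual token counts times per-model rates, summed per step. A minimal sketch with hypothetical names and hard-coded rates (the app fetches live rates and records priceSource on the receipt):

```typescript
// Hedged sketch of per-run cost accounting. Illustrative names and rates.
type Step = { model: string; promptTokens: number; completionTokens: number };
type Rate = { inputPerMTok: number; outputPerMTok: number }; // USD per 1M tokens

function missionCostUSD(steps: Step[], rates: Record<string, Rate>): number {
  return steps.reduce((sum, s) => {
    const r = rates[s.model];
    if (!r) throw new Error(`no rate for model ${s.model}`); // refuse rather than estimate
    return sum + (s.promptTokens * r.inputPerMTok + s.completionTokens * r.outputPerMTok) / 1e6;
  }, 0);
}
```

Note the failure mode: an unknown model throws instead of silently estimating, matching the no-silent-fallbacks rule elsewhere in the app.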
Operator-approved persistent context. Memory items are diffable, hygiene-flagged (duplicate / conflict / stale), and attributed per-mission via MemoryUsage rows. No implicit retention — the model only sees what you accepted.
Aggregate reliability roll-up across every run. Per-crew approval rate, tool reliability, risk distribution. Real signal for which crews to trust on which jobs.
`npm run evals` runs full missions end-to-end against a live Hermes endpoint, asserts on receipt shape (status, deliverable type, integrity hash, cost), and exits with the number of failed cases as its exit code. Not synthetic, not mocked.
Mid-mission crashes resume from the failed AgentRunStep, not from scratch. Tool call retries honour Retry-After. Network blips don't burn a run.
/settings/providers tells you what's wired, what isn't, and the exact env var to add. The home page shows what's READY in your environment. No 'might work, might not' fog.
Prisma + Postgres. No vendor lock-in, no proprietary serialization, no vector DB you have to migrate. Hash, export, attach to a JIRA ticket — the receipt is just data.
No real trading (paper only, simulatedOnly=true). No automatic emails (drafts only). SSRF guard on browser_qa_audit blocks localhost / private / metadata IPs. Approval-gated tools refuse to run in the sandbox.
After every mission settles cleanly, Hermes Loop runs the JUDGE model on the full timeline and distils up to 3 reusable lessons (checklist / guardrail / playbook / anti-pattern / shortcut) into Skill rows scoped to the crew. Next run of that crew injects the top skills into its system prompt. Operators can review and disable any auto-generated skill on /skills.
Memory persists across every mission, every crew, every session. The orchestrator queries operator-approved memory before each run and injects the top matches into the Triage Agent and first-step prompts. Pinned memories are always in scope.
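The selection step can be sketched as a simple scoring pass. The names and the naive substring scorer here are illustrative stand-ins for the real recall logic; the one behaviour taken from the text is that pinned memories are always in scope:

```typescript
// Naive sketch of cross-session recall: score operator-approved memories
// against the objective and inject the top matches. Illustrative names.
type Memory = { id: string; text: string; pinned: boolean };

function selectMemories(objective: string, memories: Memory[], limit = 5): Memory[] {
  const words = objective.toLowerCase().split(/\W+/).filter((w) => w.length > 3);
  const scored = memories.map((m) => ({
    m,
    score: words.filter((w) => m.text.toLowerCase().includes(w)).length,
  }));
  const pinned = scored.filter((s) => s.m.pinned).map((s) => s.m);
  const rest = scored
    .filter((s) => !s.m.pinned && s.score > 0)
    .sort((a, b) => b.score - a.score)
    .map((s) => s.m);
  // Pinned first, then best substring matches, up to the limit.
  return [...pinned, ...rest].slice(0, Math.max(limit, pinned.length));
}
```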
Native MCP client. Configure MCP_SERVERS with a JSON array of remote MCP endpoints; Hermes Loop handshakes via initialize, discovers tools via tools/list, and exposes them through the same tool registry agents already use. JSON-RPC over HTTP, no external SDK.
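The handshake is plain JSON-RPC, which is easy to sketch. Helper names here are hypothetical and the exact param shapes may differ from the app's client; the transport is injected so the protocol logic stands alone:

```typescript
// Sketch of the MCP handshake as JSON-RPC. Illustrative names and shapes.
type JsonRpcPost = (payload: object) => Promise<{ result?: any; error?: { message: string } }>;

let nextId = 1;

async function rpc(post: JsonRpcPost, method: string, params: object = {}) {
  const body = await post({ jsonrpc: "2.0", id: nextId++, method, params });
  if (body.error) throw new Error(`${method}: ${body.error.message}`);
  return body.result;
}

async function discoverMcpTools(post: JsonRpcPost) {
  // 1. Handshake.
  await rpc(post, "initialize", {
    protocolVersion: "2024-11-05",
    capabilities: {},
    clientInfo: { name: "hermes-loop", version: "0.0.0" },
  });
  // 2. Tool discovery; results feed the shared tool registry.
  const { tools } = await rpc(post, "tools/list");
  return tools; // e.g. [{ name, description, inputSchema }, ...]
}

// Over HTTP the transport is just fetch, no SDK required:
const httpPost = (endpoint: string): JsonRpcPost => async (payload) => {
  const res = await fetch(endpoint, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify(payload),
  });
  return res.json();
};
```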
Hermes Agent is the engine. Hermes Loop is the control room.
Every Hermes Agent capability has a corresponding surface in this app that governs, audits, and proves it. Hermes Agent gives you autonomy; Hermes Loop gives you governance and proof.
What this app implements — and what it doesn't (yet).
Hermes Agent (Nous Research) is the upstream autonomous engine. Hermes Loop is one operator surface for it. This board is the source of truth on which engine features are wired up here. Coming-soon items live at the bottom.
- Crews — named, ordered, Zod-validated subagent sequences with reusable templates.
- Approval gates on risky outputs (drafts, trades, exports, gated tools).
- WorkflowReceipts with deterministic integrity hash + per-model cost accounting.
- Trust ledger — aggregate reliability + risk roll-up across all runs.
- Evals harness — real-Hermes test suite whose exit code is the number of failed cases.
- Schema self-correction loop with retries + backoff.
- Full audit trail: every prompt, response, tool call, and approval is a database row.
- Single operator-grade web surface for governance, instead of multi-platform chat.
Genuine roadmap. None of these are wired today. Listed so the parity board stays honest as the product grows.
- Persistent subagent terminals — long-lived PTY sessions per agent with streaming I/O captured to the audit log.
- SSH / Singularity / Modal execution backends — remote sandboxed terminal_exec + python_rpc for compute-heavy missions.
- Native Telegram + WhatsApp + Signal bots — first-class inbound + outbound, beyond the generic webhook.
- Outbound SMTP for approved email drafts + IMAP polling for the inbox.
- Slack outbound — DMs and Block Kit responses for richer approval flows in-channel.
- Vector memory recall — semantic search on top of the existing substring cross-session recall (Postgres FTS / pgvector).
- Cross-mission planner — chain missions by output (mission B reads mission A's deliverable).
- Multi-tenant auth (SSO via SAML / OIDC) — replace the single-operator profile in the MVP.
- Receipt versioning + approval workflows on receipts — sign-off chains for regulated environments.
- Audit log export to SIEM (Splunk / Datadog / Elastic) over signed JSON.
- Skill marketplace — shareable crew templates + memory packs across teams.
- Real broker integration for trading missions — paper-only today, behind a feature flag.
A chatbot gives you an answer. A real workflow needs proof.
When the work matters — auditing a website, drafting a refund letter, building a paper-trade thesis, triaging a messy request — a single chat reply isn't enough. You need to know which model ran, which tools it called, which memory it relied on, what went wrong if it failed, and what requires your sign-off before it leaves the desk. None of that is observable when you paste a prompt into a chat box.
Hermes Loop treats every run as a mission: a structured, ordered, validated execution of a crew of agents. Every prompt and every response is persisted. Every tool call is logged and Zod-validated. Every memory injection is recorded. Risky outputs (a draft email, a simulated trade, a report export) wait for you to approve them. When the run is over, a receipt with a deterministic integrity hash captures the whole thing.
Six steps from raw request to verifiable receipt.
1. Request — You type an objective, drop messy text in the inbox, or fire a schedule. Triage classifies messy input and proposes a crew + objective.
2. Crew — Pick a built-in crew (Bug Hunter, Paper Trading Desk, Life Admin) or build a custom one. A crew is an ordered list of specialist agents, each with its own system prompt and Zod schema.
3. Job — In production, the request becomes a RUN_MISSION job. The worker picks it up, runs it serially with retries + exponential backoff, and records every transition.
4. Tools + Memory — Each agent calls Hermes with its prompt + relevant memory. It can request sandboxed tools: browser_qa_audit (Playwright), terminal_exec, python_rpc, web_search (Tavily), vision_analyze (Gemini), image_generate, text_to_speech (ElevenLabs), web_snapshot, document_extract, price_series_lookup, deadline_create, report_export_draft. Tool calls are logged and validated.
5. Approval — Drafts, trade tickets, exports, and gated tools become ApprovalItems. Nothing leaves the desk on its own. You approve, reject, or request changes.
6. Receipt — When the run finishes, a WorkflowReceipt is generated: timeline, cost (real per-model rates), risk score, integrity hash. It rolls up into the trust ledger.
The orchestrator loop, end to end.
The orchestrator runs each agent of the crew in sequence. For every agent, it:
- Builds a system prompt + user prompt from the agent's spec, the mission objective, prior agents' outputs, the selected memories, and the tool catalog this agent is allowed to use.
- Calls `hermesChat` against `${HERMES_API_BASE}/chat/completions` with `response_format: json_object`. Retries 429 / 5xx / network failures with exponential backoff, honouring `Retry-After`.
- Parses the JSON. If the model wrapped it in `{final: …}` or a single named envelope, the orchestrator unwraps it. If JSON is malformed it re-prompts with the parser error.
- Validates the parsed payload against the agent's Zod schema. On failure, it re-prompts the model with a short list of the specific schema errors and asks for a corrected response. Capped at two corrections per step (audit events: `agent.json_parse_retry`, `agent.schema_correction`, `agent.output_unwrapped`, `agent.output_repaired`).
- If the model requested tools and the agent allows them, the tools run server-side. Their output feeds the next round. Tools flagged `requiresApproval` halt the mission and surface as an ApprovalItem.
- Persists an `AgentRunStep` row with the prompt, the raw response, the parsed JSON, latency, and token counts. Emits an `AuditEvent`.
- Merges the agent's output into the running context for the next agent.
When all agents finish, the crew-specific finalize step composes a Deliverable (BUG_REPORT, DRAFT, TRADE_JOURNAL, etc.) and the appropriate ApprovalItem(s). The mission moves to WAITING_APPROVAL. The receipt is generated, hashed, and rolled into the trust ledger.
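The backoff behaviour in the call step can be sketched on its own. Names are hypothetical; the real orchestrator wraps `hermesChat`, but the retry logic looks roughly like this:

```typescript
// Illustrative retry wrapper: exponential backoff on 429 / 5xx / network
// errors, honouring a Retry-After header (in seconds) when present.
const sleep = (ms: number) => new Promise((r) => setTimeout(r, ms));

type MinimalResponse = { status: number; headers: { get(name: string): string | null } };

async function callWithBackoff(
  doCall: () => Promise<MinimalResponse>,
  maxRetries = 3,
  baseMs = 250,
): Promise<MinimalResponse> {
  for (let attempt = 0; ; attempt++) {
    let res: MinimalResponse | null = null;
    try {
      res = await doCall();
    } catch (err) {
      if (attempt >= maxRetries) throw err; // network failure, out of retries
    }
    if (res && res.status !== 429 && res.status < 500) return res; // done
    if (res && attempt >= maxRetries) return res; // retryable status, out of retries
    const retryAfter = res?.headers.get("retry-after");
    const delayMs = retryAfter ? Number(retryAfter) * 1000 : baseMs * 2 ** attempt;
    await sleep(delayMs);
  }
}
```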
A normal chatbot vs Hermes Loop.
Every object you see in the UI is a real database row.
Hermes Loop is Prisma + SQLite locally, Postgres in production. The schema is the source of truth — you can run prisma studio any time and inspect every row.
UserProfile ─┐
├─< Mission ─┬─< MissionAgent
│ ├─< AgentRunStep (prompt + rawResponse + parsedOutput + tokens + latency)
│ ├─< ToolCall (name, input, output, status, agentRunStepId)
│ ├─< ApprovalItem (PENDING | APPROVED | REJECTED | NEEDS_CHANGES)
│ ├─< Deliverable (REPORT | DRAFT | TRADE_JOURNAL | BUG_REPORT)
│ ├─< AuditEvent (every prompt, response, tool, approval, error)
│ ├─< MemoryUsage (which MemoryItems were injected, why)
│ └─< WorkflowReceipt (status, integrityHash, riskLevel, content blob)
├─< AgentJob (TRIAGE_INBOX | RUN_MISSION | RUN_SCHEDULE | RESUME_AFTER_APPROVAL)
├─< ScheduledMission (cron-like cadences with timezone-aware nextRunAt)
├─< MemoryItem (operator-approved persistent context)
└─< InboxItem (raw request → triaged → converted into Mission)

What the desk will and will not do without you.
- No real trading. Paper Trading Desk creates `PaperTrade` rows with `simulatedOnly = true`. There is no broker integration.
- No automatic emails. Life Admin creates `EMAIL_DRAFT` approval items. Nothing is sent — drafts only.
- No silent fallbacks. When Hermes env vars are configured, network/API failures surface as real errors. The demo responder is only used when the env vars are entirely absent.
- SSRF guard on browser_qa_audit. Localhost / private / metadata IPs are blocked by default. The dev-only escape hatch requires both `NODE_ENV != production` and `ALLOW_LOCAL_BROWSER_QA = true`.
- Auditability is first-class. Every prompt, every response, every tool call, every approval is a database row. The receipt's integrity hash is deterministic and content-addressed.
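The SSRF check amounts to refusing a well-known set of address ranges before the browser ever navigates. An IPv4-only sketch with a hypothetical name (`isBlockedAddress`); a real guard would also resolve hostnames to addresses before checking:

```typescript
import { isIP } from "node:net";

// Sketch of the guard: refuse loopback, RFC1918 private ranges, link-local
// (which includes the 169.254.169.254 cloud metadata endpoint), and localhost.
function isBlockedAddress(host: string): boolean {
  if (host === "localhost") return true;
  if (isIP(host) !== 4) return false; // hostnames / IPv6 out of scope for this sketch
  const [a, b] = host.split(".").map(Number);
  if (a === 127) return true;                       // loopback
  if (a === 10) return true;                        // 10.0.0.0/8
  if (a === 172 && b >= 16 && b <= 31) return true; // 172.16.0.0/12
  if (a === 192 && b === 168) return true;          // 192.168.0.0/16
  if (a === 169 && b === 254) return true;          // link-local incl. metadata IP
  return false;
}
```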
If your work needs proof, this is for you.
Audit a site, get a screenshot-backed bug report, hand off the markdown export to the dev team. Run Bug Hunter →

Paste an order, a refund refusal, a policy doc. Get a draft message, a follow-up plan, and a record of what was claimed. Open inbox →

Build a custom crew that scouts, drafts, critiques, and produces a structured report you can prove ran. Build a crew →

Compare runs across models, see token + cost breakdowns at real per-model rates, gate every action with approvals. Open the desk →

Every page in the app, one line each.
- Dashboard — Mission status, attention queue, deliverables, audit feed.
- Crews — Built-in and custom agent templates.
- Tools — Sandbox surface for terminal, Python, web search, vision, images, TTS, integrations.
- Trust — Aggregate reliability + risk roll-up across all runs.
- Inbox — Drop messy requests; triage proposes a mission.
- Jobs — Persistent queue for triage, missions, schedules, resumes.
- Memory — Operator-approved context with usage records, diffs, hygiene.
- Approvals — The gate: drafts, trades, exports, gated tools.
- Schedules — Cron-like cadences materializing fresh missions.
- Receipts — Per-mission proof with integrity hash and exports.
- Settings — Provider config, demo workspace, diagnostics & runtime.
Hermes Agent is the autonomy. Hermes Loop is the control room.
Hermes Agent — built by Nous Research — is the autonomous engine. It lives on your server, holds persistent memory, schedules its own work via natural language cron, and spawns isolated subagents with their own conversations, terminals, and Python RPC. It can run on local, Docker, SSH, Singularity, or Modal backends. Native capabilities include web search, browser automation, vision, image generation, text-to-speech, and multi-model reasoning.
Hermes Loop is the operator-facing layer that turns those capabilities into a workflow you can ship: named crews, a job queue, approval gates, workflow receipts, a trust ledger, an evals harness, schema self-correction, retries with backoff, real per-run cost accounting, and an audit trail of every prompt, response, tool call, and approval.
Hermes Agent is open source under MIT. Learn more at hermes-agent.nousresearch.com.
The seven words that make up the whole product.
- Mission — One run of a crew end-to-end. Has a status (DRAFT → RUNNING → WAITING_APPROVAL → COMPLETED / FAILED) and a complete persistent trace.
- Crew — An ordered sequence of specialist agents, each with a system prompt, a Zod output schema, and an allowed tool list.
- Job — A background work item picked up by the worker process. Types: TRIAGE_INBOX, RUN_MISSION, RUN_SCHEDULE, RESUME_AFTER_APPROVAL.
- Tool — A sandboxed action agents can request. Live: browser_qa_audit (Playwright), terminal_exec, python_rpc, web_search (Tavily), vision_analyze (Gemini), image_generate, text_to_speech (ElevenLabs), web_snapshot, document_extract, price_series_lookup, deadline_create, report_export_draft.
- Memory — Operator-approved persistent context. Selected per mission, recorded, diffable. No implicit retention.
- Approval — A user decision gate on a risky output (EMAIL_DRAFT, TRADE_SIMULATION, REPORT_EXPORT, FOLLOW_UP, FORM_ACTION, TOOL_CALL).
- Receipt — A WorkflowReceipt: status, timeline, integrity hash, real cost, risk level, full content blob. The proof artifact.
A 3-minute path that exercises everything.
- 1. Load the demo workspace. Settings → Load demo. Three pre-completed missions, an active schedule, memories, receipts — instant context. Open settings →
- 2. Run Bug Hunter on /demo-target. Real Playwright crawl, real findings, real Hermes calls. Launch the demo run →
- 3. Approve the QA report. Watch the mission move from `WAITING_APPROVAL` to settled, deliverable accepted.
- 4. Open the receipt for that mission. See the timeline, the integrity hash, the real per-model cost. Open receipts →
- 5. Open /trust. Watch your run roll into the aggregate reliability picture. Open trust →
The /system page is the canonical product map (counts, flow, workflows, vocabulary). Onboarding has a 5-step checklist that forces you through the full loop once.