
Operator demo checklist

13 stops, from the health probe to deployment env vars.

What this is

A practical 13-step path through every working capability. Status is computed live from your actual database and provider env vars.

What to do

Open each link in a new tab. An item shows DONE when the system can verify it, ACTION or READY when there is still work to do, and BLOCKED when something is broken. In the list below these appear as Done, Needs action, and Ready to test.

What happens next

When every item reads DONE or READY, you can ship. No fake green.
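
The ship rule above reduces to a gate over item statuses. A minimal sketch, assuming statuses are exactly the four labels named here; the `ChecklistStatus` type and the `canShip` / `blockingSteps` helpers are hypothetical, not the app's code:

```typescript
// Hypothetical status type mirroring the labels this checklist uses.
type ChecklistStatus = "DONE" | "READY" | "ACTION" | "BLOCKED";

// A status that does not hold the release.
function isShippable(s: ChecklistStatus): boolean {
  return s === "DONE" || s === "READY";
}

// Shipping is allowed only when every item is DONE or READY;
// any ACTION or BLOCKED item holds the release.
function canShip(statuses: ChecklistStatus[]): boolean {
  return statuses.every(isShippable);
}

// 1-based step numbers of the items that still hold the release.
function blockingSteps(statuses: ChecklistStatus[]): number[] {
  const steps: number[] = [];
  statuses.forEach((s, i) => {
    if (!isShippable(s)) steps.push(i + 1);
  });
  return steps;
}

console.log(canShip(["DONE", "READY", "DONE"])); // true
console.log(blockingSteps(["DONE", "ACTION", "READY", "BLOCKED"]).join(",")); // 2,4
```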

1. Hermes health
Expect ok=true, mode=hermes, and latency under ~500 ms.
Live at https://openrouter.ai/api/v1
Troubleshoot: If the endpoint is unreachable, check your VPN or outbound network access. If it reports demo mode, the Hermes env vars are absent.
Status: Done · Hit /api/hermes/health
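
Step 1's pass criteria can be written as a check over the health payload. A minimal sketch, assuming the endpoint returns JSON with `ok`, `mode`, and `latency` in milliseconds (the same `{mode,ok,latency}` the CLI prints in step 11); the `evaluateHealth` function, the exact 500 ms cutoff, and the mapping of a slow or demo-mode probe to ACTION are illustrative assumptions:

```typescript
// Assumed shape of the /api/hermes/health response (hypothetical field types).
interface HealthPayload {
  ok: boolean;
  mode: string; // "hermes" when live; "demo" when the Hermes env vars are absent
  latency: number; // milliseconds (assumed unit)
}

type HealthVerdict = { status: "DONE" | "ACTION" | "BLOCKED"; reason: string };

// Illustrative budget taken from the "~500ms" target above.
const LATENCY_BUDGET_MS = 500;

function evaluateHealth(p: HealthPayload): HealthVerdict {
  // A failed probe means something is broken.
  if (!p.ok) return { status: "BLOCKED", reason: "health probe returned ok=false" };
  // Demo mode is fixable by configuration, so it is work to do rather than breakage.
  if (p.mode !== "hermes") {
    return { status: "ACTION", reason: `mode=${p.mode}: Hermes env vars are likely absent` };
  }
  if (p.latency >= LATENCY_BUDGET_MS) {
    return { status: "ACTION", reason: `latency ${p.latency}ms exceeds ~${LATENCY_BUDGET_MS}ms` };
  }
  return { status: "DONE", reason: "live Hermes within latency budget" };
}

// Usage (assumes a dev server on localhost:3000):
// const res = await fetch("http://localhost:3000/api/hermes/health");
// console.log(evaluateHealth(await res.json()));
```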
2. Load demo workspace
Three completed demo missions, one active schedule, and sample memories.
10 missions on file
Troubleshoot: Click "Load demo command center" on /settings or on the dashboard. The load is idempotent: running it twice doesn't create duplicates.
Status: Done · Open Settings → Load demo
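
Idempotent seeding is commonly achieved by giving every demo record a stable ID and upserting instead of inserting. A minimal in-memory sketch of that pattern; the record IDs, the `Store` type, and `loadDemoWorkspace` are hypothetical, and the real app presumably writes to its database instead of a `Map`:

```typescript
// Hypothetical demo record; stable IDs are what make re-seeding safe.
interface DemoRecord {
  id: string;
  kind: "mission" | "schedule" | "memory";
  title: string;
}

// Stand-in for a database table keyed by primary key.
type Store = Map<string, DemoRecord>;

const DEMO_SEED: DemoRecord[] = [
  { id: "demo-mission-1", kind: "mission", title: "Demo mission 1" },
  { id: "demo-mission-2", kind: "mission", title: "Demo mission 2" },
  { id: "demo-mission-3", kind: "mission", title: "Demo mission 3" },
  { id: "demo-schedule-1", kind: "schedule", title: "Active demo schedule" },
  { id: "demo-memory-1", kind: "memory", title: "Sample memory" },
];

// Upsert by stable ID: a second run overwrites the same keys instead of
// appending new rows, so the record count does not grow.
function loadDemoWorkspace(store: Store): number {
  for (const record of DEMO_SEED) {
    store.set(record.id, { ...record });
  }
  return store.size;
}

const store: Store = new Map();
console.log(loadDemoWorkspace(store)); // 5
console.log(loadDemoWorkspace(store)); // 5
```

Existing non-demo records are untouched because only the seed's own keys are written.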
3. Run Browser QA against /demo-target
The Bug Hunter Crew runs a real Playwright crawl, and the mission reaches WAITING_APPROVAL.
Requires `npm run dev` to be running and ALLOW_LOCAL_BROWSER_QA=true.
Troubleshoot: Production deployments can't audit localhost; point the crawl at any public URL via EVAL_BROWSER_TARGET or the form.
Status: Ready to test · Launch Bug Hunter on /demo-target
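
The gating in step 3 amounts to a target-resolution rule. A minimal sketch, assuming a URL from the form takes precedence over EVAL_BROWSER_TARGET, that localhost targets are refused unless ALLOW_LOCAL_BROWSER_QA=true, and that the default target is http://localhost:3000/demo-target; the precedence order, the port, and `resolveBrowserTarget` are all hypothetical:

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical default: the dev server's demo page (port 3000 is an assumption).
const DEFAULT_TARGET = "http://localhost:3000/demo-target";

// Hostnames treated as "local" for the ALLOW_LOCAL_BROWSER_QA check.
const LOCAL_HOSTS = new Set(["localhost", "127.0.0.1", "[::1]", "0.0.0.0"]);

// Returns the URL to crawl, or throws with an actionable message.
function resolveBrowserTarget(env: Env, formUrl?: string): string {
  // Assumed precedence: form input, then EVAL_BROWSER_TARGET, then the local default.
  const raw = formUrl || env.EVAL_BROWSER_TARGET || DEFAULT_TARGET;
  const url = new URL(raw); // throws TypeError on malformed input
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    throw new Error(`unsupported protocol: ${url.protocol}`);
  }
  if (LOCAL_HOSTS.has(url.hostname) && env.ALLOW_LOCAL_BROWSER_QA !== "true") {
    throw new Error(
      "local target blocked: set ALLOW_LOCAL_BROWSER_QA=true in dev, or point at a public URL"
    );
  }
  return url.toString();
}
```

In production the flag stays unset, so only public URLs pass, which matches the troubleshooting note above.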
4. Approve a deliverable
Use the approval card on the mission page or in /approvals. The status flips to APPROVED, an audit row is recorded, and a Discord notification is sent (if configured).
5 approval(s) waiting
Status: Needs action · Open approvals inbox
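
Approval is a state transition with two side effects. A minimal sketch, assuming a deliverable moves from WAITING_APPROVAL to APPROVED, an audit entry is always appended, and a notification fires only when one is configured; every type and function here is hypothetical, with side effects injected so the transition stays testable:

```typescript
type DeliverableStatus = "WAITING_APPROVAL" | "APPROVED" | "REJECTED";

interface Deliverable { id: string; status: DeliverableStatus }
interface AuditEntry { action: string; deliverableId: string; actor: string; at: string }

interface ApprovalDeps {
  audit: AuditEntry[];
  // Present only when Discord is configured. A Discord webhook accepts a
  // JSON POST such as { "content": "..." }.
  notify?: (message: string) => void;
  now: () => Date;
}

function approve(d: Deliverable, actor: string, deps: ApprovalDeps): Deliverable {
  if (d.status !== "WAITING_APPROVAL") {
    throw new Error(`cannot approve ${d.id} from status ${d.status}`);
  }
  const approved: Deliverable = { ...d, status: "APPROVED" };
  // The audit row is written on every approval, notification or not.
  deps.audit.push({ action: "APPROVED", deliverableId: d.id, actor, at: deps.now().toISOString() });
  // The notification is optional.
  if (deps.notify) deps.notify(`Deliverable ${d.id} approved by ${actor}`);
  return approved;
}
```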
5. Generate the receipt
A WorkflowReceipt with a deterministic integrity hash, a riskLevel, and the real cost via openrouter-live.
4 receipt(s) on file
Status: Done · Open receipts
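
A deterministic integrity hash needs a canonical serialization, so the same receipt always hashes to the same value regardless of key order. A minimal sketch using SHA-256 over key-sorted JSON with Node's built-in `node:crypto`; the receipt fields other than riskLevel, and the choice of SHA-256 with this canonicalization, are assumptions rather than the app's documented scheme:

```typescript
import { createHash } from "node:crypto";

// Hypothetical receipt shape; only riskLevel is named in the checklist.
interface WorkflowReceiptBody {
  missionId: string;
  riskLevel: "LOW" | "MEDIUM" | "HIGH";
  costUsd: number;
  steps: string[];
}

type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Serialize with object keys sorted recursively so key order can't change the hash.
function canonicalize(value: Json): string {
  if (value === null || typeof value !== "object") return JSON.stringify(value);
  if (Array.isArray(value)) return `[${value.map(canonicalize).join(",")}]`;
  const keys = Object.keys(value).sort();
  return `{${keys.map((k) => `${JSON.stringify(k)}:${canonicalize(value[k] as Json)}`).join(",")}}`;
}

// Hex-encoded SHA-256 of the canonical form.
function integrityHash(body: WorkflowReceiptBody): string {
  return createHash("sha256").update(canonicalize(body as unknown as Json)).digest("hex");
}
```

Any change to a hashed field, such as the cost, yields a different digest, which is what lets a stored hash detect tampering.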
6. Trust ledger updates
Aggregate KPIs reflect the new mission, and the Browser QA and runtime panels pick up the run.
Status: Done · Open trust ledger
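
Aggregate KPIs like these are a fold over mission records, recomputed whenever a mission lands. A minimal sketch; the `MissionRecord` fields and the specific KPIs (completion rate, total cost, Browser QA runs, runtime tool calls) are hypothetical stand-ins for whatever the ledger actually reports:

```typescript
interface MissionRecord {
  id: string;
  status: "COMPLETED" | "WAITING_APPROVAL" | "FAILED";
  costUsd: number;
  usedBrowserQa: boolean;
  runtimeToolCalls: number;
}

interface LedgerKpis {
  missions: number;
  completionRate: number; // completed / missions; 0 when there are no missions
  totalCostUsd: number;
  browserQaRuns: number;
  runtimeToolCalls: number;
}

// One pass over all records; adding a mission changes every relevant KPI.
function aggregate(records: MissionRecord[]): LedgerKpis {
  let completed = 0;
  let cost = 0;
  let qaRuns = 0;
  let calls = 0;
  for (const r of records) {
    if (r.status === "COMPLETED") completed++;
    cost += r.costUsd;
    if (r.usedBrowserQa) qaRuns++;
    calls += r.runtimeToolCalls;
  }
  return {
    missions: records.length,
    completionRate: records.length === 0 ? 0 : completed / records.length,
    totalCostUsd: cost,
    browserQaRuns: qaRuns,
    runtimeToolCalls: calls,
  };
}
```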
7. Run the runtime terminal
Run the Codebase Debugger crew or a sandbox call. The backend chip shows local or docker; it is never silent.
7 runtime tool call(s) on file
Status: Ready to test · Launch Codebase Debugger
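
"Never silent" implies a tool call without backend metadata is an error rather than something to default. A minimal sketch of validating an untyped tool-call record (for example, one fetched from an API) before rendering its chip; `ToolCallRecord`, `parseToolCall`, and `backendChip` are hypothetical:

```typescript
type RuntimeBackend = "local" | "docker";

interface ToolCallRecord {
  tool: string;
  backend: RuntimeBackend;
}

// Refuse to guess the backend: a missing or unknown value throws
// instead of silently falling back to a default label.
function parseToolCall(raw: unknown): ToolCallRecord {
  if (typeof raw !== "object" || raw === null) {
    throw new Error("tool call is not an object");
  }
  const { tool, backend } = raw as { tool?: unknown; backend?: unknown };
  if (typeof tool !== "string") throw new Error("tool call has no tool name");
  if (backend !== "local" && backend !== "docker") {
    throw new Error(`tool call ${tool} has no valid backend (got ${String(backend)})`);
  }
  return { tool, backend };
}

// Chip label rendered next to each call.
function backendChip(rec: ToolCallRecord): string {
  return rec.backend.toUpperCase();
}
```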
8. Python RPC sandbox
Approval-gated python_rpc with backend metadata. Runs in Docker if available; otherwise it falls back to the host, and the fallback is flagged.
Docker not available: host fallback only.
Status: Ready to test · Open Python RPC history
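
The python_rpc behavior combines an approval gate with backend selection. A minimal sketch that decides where an approved call would run without executing anything; Docker availability is injected as a boolean, and `planPythonRpc`, `RpcRequest`, and `RpcMetadata` are hypothetical names:

```typescript
type Backend = "docker" | "host";

interface RpcRequest {
  code: string;
  approvedBy?: string; // set only after an operator approves the call
}

interface RpcMetadata {
  backend: Backend;
  hostFallback: boolean; // flagged whenever Docker was unavailable
  approvedBy: string;
}

// `dockerAvailable` is supplied by the caller; in Node it could come from a
// probe such as spawnSync("docker", ["info"]).status === 0 (an assumption,
// not necessarily the app's own check).
function planPythonRpc(req: RpcRequest, dockerAvailable: boolean): RpcMetadata {
  // Approval gate: nothing runs without an explicit approver.
  if (!req.approvedBy) {
    throw new Error("python_rpc requires approval before it can run");
  }
  const backend: Backend = dockerAvailable ? "docker" : "host";
  return { backend, hostFallback: backend === "host", approvedBy: req.approvedBy };
}
```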
9. Provider configuration matrix
Every provider (Hermes, search, vision, image, TTS, Discord, Slack, Email, runtime) shows READY, PARTIAL, or NOT_CONFIGURED.
13 ready · 0 partial · 1 not configured
Status: Ready to test · Open provider matrix
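
A READY / PARTIAL / NOT_CONFIGURED matrix can be derived from which of each provider's required env vars are set. A minimal sketch; the provider list and every env var name in it are hypothetical placeholders, not the app's actual configuration keys:

```typescript
type ProviderState = "READY" | "PARTIAL" | "NOT_CONFIGURED";
type Env = Record<string, string | undefined>;

// Hypothetical requirements table: [provider, env vars it needs].
const REQUIREMENTS: Array<[string, string[]]> = [
  ["hermes", ["HERMES_API_KEY", "HERMES_BASE_URL"]],
  ["discord", ["DISCORD_WEBHOOK_URL"]],
  ["email", ["SMTP_HOST", "SMTP_USER", "SMTP_PASSWORD"]],
];

// All required vars set -> READY, some -> PARTIAL, none -> NOT_CONFIGURED.
// Blank or whitespace-only values count as unset.
function classify(required: string[], env: Env): ProviderState {
  const present = required.filter((name) => (env[name] ?? "").trim() !== "").length;
  if (present === required.length) return "READY";
  return present === 0 ? "NOT_CONFIGURED" : "PARTIAL";
}

// Summary line in the same format the checklist shows.
function summarize(env: Env): string {
  const counts = { READY: 0, PARTIAL: 0, NOT_CONFIGURED: 0 };
  for (const [, required] of REQUIREMENTS) {
    counts[classify(required, env)]++;
  }
  return `${counts.READY} ready · ${counts.PARTIAL} partial · ${counts.NOT_CONFIGURED} not configured`;
}
```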
10. Inbound integrations
A POST creates an InboxItem and queues TRIAGE_INBOX; the Recent inbound table then shows the item.
Try /integrations/discord, /integrations/slack, and /integrations/email; each page has a copy-pasteable curl command.
Status: Ready to test · Open integrations
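
The inbound flow is a write followed by an enqueue. A minimal sketch with in-memory stand-ins for the InboxItem table and the job queue; the payload's `text` field, the ID format, and `handleInbound` / `recentInbound` are hypothetical:

```typescript
type Source = "discord" | "slack" | "email";

interface InboxItem { id: string; source: Source; body: string; receivedAt: string }
interface Job { type: "TRIAGE_INBOX"; inboxItemId: string }

// In-memory stand-ins for the database table and the job queue.
const inbox: InboxItem[] = [];
const queue: Job[] = [];
let nextId = 1;

// Handles one inbound POST: persist the item, then queue triage for it.
function handleInbound(source: Source, payload: { text?: unknown }, now: Date = new Date()): InboxItem {
  const text = payload.text;
  if (typeof text !== "string" || text.trim() === "") {
    throw new Error("inbound payload needs a non-empty text field");
  }
  const item: InboxItem = { id: `inbox-${nextId++}`, source, body: text, receivedAt: now.toISOString() };
  inbox.push(item);
  queue.push({ type: "TRIAGE_INBOX", inboxItemId: item.id });
  return item;
}

// Newest first, as a "Recent inbound" table would list them.
function recentInbound(limit = 5): InboxItem[] {
  return inbox.slice(-limit).reverse();
}
```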
11. CLI smoke
`npm run foundry -- health` prints {mode,ok,latency}. `jobs list`, `receipts list`, and `mission create` return JSON. Exit codes are honest (nonzero on failure).
Troubleshoot: If health fails, the env isn't loaded; make sure .env.local exists.
Status: Ready to test
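
An honest exit code means the process status agrees with the JSON it prints. A minimal sketch of that contract for the health command, assuming the `{mode,ok,latency}` shape; `healthExitCode` and `checkHonesty` are hypothetical helpers, not the foundry CLI's code:

```typescript
interface HealthOutput { mode: string; ok: boolean; latency: number }

// Exit 0 only when the probe succeeded; any failure must be nonzero so
// scripts and CI can trust the status without parsing stdout.
function healthExitCode(out: HealthOutput): number {
  return out.ok ? 0 : 1;
}

// Cross-check a captured run: parse stdout as JSON and confirm the exit
// status matches the reported `ok` flag.
function checkHonesty(stdout: string, exitStatus: number): boolean {
  let parsed: HealthOutput;
  try {
    parsed = JSON.parse(stdout) as HealthOutput;
  } catch {
    // Unparseable output is only honest if the process also failed.
    return exitStatus !== 0;
  }
  return (parsed.ok === true) === (exitStatus === 0);
}
```

When capturing stdout from a script, `npm run --silent foundry -- health` suppresses npm's own log output so that only the command's JSON remains.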
12. Deployment env var sanity
Every secret is in .env.local locally and in the Vercel/Railway dashboard for prod. ALLOW_LOCAL_BROWSER_QA stays unset in prod. In prod, set MISSION_RUN_MODE=queued and run a worker service.
Troubleshoot: See README → Deployment for the full Vercel + Railway recipe.
Status: Ready to test · Re-check providers
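
The env-var rules in step 12 are mechanical enough to lint. A minimal sketch, assuming the checker receives the production env as a plain object (for example `process.env`); ALLOW_LOCAL_BROWSER_QA and MISSION_RUN_MODE come from the checklist, while `REQUIRED_SECRETS` and `prodEnvProblems` are hypothetical. Whether a worker service is actually running can't be seen from env vars, so it is not checked here:

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical secret names; substitute the app's real list from its README.
const REQUIRED_SECRETS = ["HERMES_API_KEY", "DATABASE_URL"];

// Returns human-readable problems; an empty array means the env passes.
function prodEnvProblems(env: Env): string[] {
  const problems: string[] = [];
  for (const name of REQUIRED_SECRETS) {
    if (!env[name]) problems.push(`missing secret: ${name}`);
  }
  // "Stays unset" is read strictly: any value, even "false", is flagged.
  if (env.ALLOW_LOCAL_BROWSER_QA !== undefined) {
    problems.push("ALLOW_LOCAL_BROWSER_QA should be unset in prod");
  }
  if (env.MISSION_RUN_MODE !== "queued") {
    problems.push(`MISSION_RUN_MODE should be "queued" in prod (got ${env.MISSION_RUN_MODE ?? "unset"})`);
  }
  return problems;
}
```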
13. Mission deliverables surfaced
Each completed mission shows a deliverable (BUG_REPORT, DRAFT, TRADE_JOURNAL, or REPORT) on its mission page.
8 deliverable(s) on file
Status: Done · Open dashboard
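
Step 13's rule can be checked as "no completed mission without a recognized deliverable type". A minimal sketch; the four types come from the checklist, while the `Mission` shape and `missionsMissingDeliverables` are hypothetical:

```typescript
const DELIVERABLE_TYPES = ["BUG_REPORT", "DRAFT", "TRADE_JOURNAL", "REPORT"] as const;
type DeliverableType = (typeof DELIVERABLE_TYPES)[number];

interface Mission {
  id: string;
  status: string; // e.g. "COMPLETED"
  deliverable?: { type: string };
}

function isDeliverableType(t: string): t is DeliverableType {
  return (DELIVERABLE_TYPES as readonly string[]).indexOf(t) !== -1;
}

// IDs of completed missions that lack a deliverable of a recognized type.
function missionsMissingDeliverables(missions: Mission[]): string[] {
  return missions
    .filter((m) => m.status === "COMPLETED")
    .filter((m) => !m.deliverable || !isDeliverableType(m.deliverable.type))
    .map((m) => m.id);
}
```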

CLI smoke commands

npm run foundry -- health
npm run foundry -- jobs list --limit 3
npm run foundry -- receipts list --limit 3
npm run foundry -- mission create \
  --crew codebase-debugger \
  --title "CLI smoke" \
  --objective "Inspect repository status with safe commands and summarize."

Eval suites

npm run evals:tools     # cheap, no missions; gating + backend metadata
npm run evals           # real Hermes mission suite (~$0.005, ~70s)
# with dev server up:
npm run evals -- --crew=bug-hunter