Hermes Kanban Internals: Multi-Agent Orchestration from SQLite to Dashboard


TL;DR: Hermes Kanban is a durable, SQLite-backed task board that orchestrates named agent profiles through dependency graphs. Unlike delegate_task (an RPC call that dies with the parent), Kanban tasks survive crashes, support retry with full attempt history, enable human-in-the-loop via block/unblock, and produce structured handoff metadata that downstream agents read automatically. The dispatcher lives inside the gateway and polls every 60 seconds by default. Workers interact through a dedicated kanban_* toolset; they never shell out to the CLI.

Why Not Just Use delegate_task?

delegate_task is a function call. Kanban is a work queue. The distinction sounds academic until your worker OOMs at 2.3M rows and you need the second attempt to know what the first one tried.

| Aspect | delegate_task | Kanban |
|---|---|---|
| Shape | RPC (fork → join) | Durable message queue + state machine |
| Parent blocks until child returns | Yes | No; fire-and-forget after create |
| Child identity | Anonymous subagent | Named profile with persistent memory |
| Resumability | None: failed = failed | Block → unblock → re-run; crash → reclaim |
| Human in the loop | Not supported | Comment / unblock at any point |
| Attempts per task | One call = one subagent | N agents over the task’s life (retry, review, follow-up) |
| Audit trail | Lost on context compression | Durable rows in SQLite forever |
| Coordination | Hierarchical (caller → callee) | Peer: any profile reads/writes any task |

Use delegate_task when the parent needs a short reasoning answer before continuing, no humans are involved, and the result goes back into the parent’s context. Use Kanban when work crosses agent boundaries, needs to survive restarts, might need human input, or needs an audit trail.

Architecture: Three Surfaces, One Database

Everything routes through a single SQLite database per board (~/.hermes/kanban.db for the default board). Three front doors:

┌────────────────────────┐   WebSocket (tails task_events)
│ Dashboard (React SPA)  │ ◀──────────────────────────────────┐
│ drag-drop + drawers    │                                    │
└──────────┬─────────────┘                                    │
           │ REST over fetch                                  │
           ▼                                                  │
┌────────────────────────┐   writes call kanban_db.*          │
│ FastAPI router         │   directly (same code path         │
│ plugins/kanban/        │   the CLI /kanban verbs use)       │
└──────────┬─────────────┘                                    │
           │                                                  │
           ▼                                                  │
┌────────────────────────┐   append task_events ──────────────┘
│ ~/.hermes/kanban.db    │
│ (WAL, shared)          │
└────────────────────────┘

Agents drive the board through kanban_* tools — seven tools that read and mutate the board directly via the Python kanban_db layer. Workers never shell out to hermes kanban.

You drive the board through the CLI: hermes kanban create, hermes kanban list, etc. Both surfaces route through the same kanban_db layer, so reads see a consistent view and writes can’t drift.

The dashboard is a thin read-through/write-through layer with no domain logic of its own: about 700 lines of Python. It reads the theme’s CSS variables and reskins automatically.

The Data Model

Tasks

Each task is a row with:

  • title, body (markdown)
  • assignee — a profile name (e.g., researcher, backend-dev)
  • status: triage → todo → ready → running → blocked → done → archived
  • tenant — optional namespace for multi-client fleets
  • idempotency key — dedup for automated task creation
  • priority, workspace kind, max_runtime

task_links rows record parent → child edges. The dispatcher promotes todo → ready when all parents reach done. This is the dependency engine — no manual coordination.
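The promotion rule can be sketched in a few lines of Python with the standard sqlite3 module. The schema below is an assumption for illustration (kanban.db’s real tables and columns may differ); what it shows is promotion as one set-based UPDATE: a todo task becomes ready only when no task_links row points at a parent that is not yet done.

```python
import sqlite3

# Hypothetical, simplified schema: the real kanban.db columns may differ.
SCHEMA = """
CREATE TABLE tasks (
    id     TEXT PRIMARY KEY,
    title  TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'todo'
);
CREATE TABLE task_links (
    parent_id TEXT NOT NULL REFERENCES tasks(id),
    child_id  TEXT NOT NULL REFERENCES tasks(id),
    PRIMARY KEY (parent_id, child_id)
);
"""

# Promote every todo task that has no parent still short of 'done'.
# A todo task with no parents at all is promoted as well.
PROMOTE_SQL = """
UPDATE tasks SET status = 'ready'
WHERE status = 'todo'
  AND NOT EXISTS (
      SELECT 1 FROM task_links l
      JOIN tasks p ON p.id = l.parent_id
      WHERE l.child_id = tasks.id AND p.status <> 'done'
  )
"""

def promote_ready(conn: sqlite3.Connection) -> int:
    """Run one promotion pass; return how many tasks moved todo -> ready."""
    with conn:  # commits on success, rolls back on error
        return conn.execute(PROMOTE_SQL).rowcount

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO tasks (id, title, status) VALUES (?, ?, ?)", [
    ("r1", "research A", "done"),
    ("r2", "research B", "running"),
    ("w1", "synthesis", "todo"),
])
conn.executemany("INSERT INTO task_links VALUES (?, ?)", [("r1", "w1"), ("r2", "w1")])

print(promote_ready(conn))  # 0: r2 is not done yet
conn.execute("UPDATE tasks SET status = 'done' WHERE id = 'r2'")
print(promote_ready(conn))  # 1: both parents done, w1 promoted
```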

Runs (Attempt History)

A task is a logical unit; a run is one attempt. When the dispatcher claims a ready task, it creates a task_runs row. When the attempt ends (completed, blocked, crashed, timed out, spawn_failed, reclaimed), the run closes with an outcome.

Why two tables: you need full attempt history for postmortems and a clean place to hang per-attempt metadata. A task attempted three times has three task_runs rows.
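A minimal sketch of the task/run split, again with an assumed schema rather than the real one: each claim opens a task_runs row, and each attempt closes exactly once with one of the outcomes listed above. The `outcome IS NULL` guard in close_run keeps a late duplicate close from overwriting an outcome that was already recorded; the function names are hypothetical.

```python
import sqlite3
import time

# Hypothetical schema for the attempt history; real column names may differ.
conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE task_runs (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    task_id    TEXT NOT NULL,
    started_at REAL NOT NULL,
    ended_at   REAL,
    outcome    TEXT,  -- NULL while the attempt is still open
    summary    TEXT,
    error      TEXT
)""")

OUTCOMES = {"completed", "blocked", "crashed", "timed_out", "spawn_failed", "reclaimed"}

def open_run(task_id: str) -> int:
    """Record the start of one attempt when the dispatcher claims a task."""
    with conn:
        cur = conn.execute(
            "INSERT INTO task_runs (task_id, started_at) VALUES (?, ?)",
            (task_id, time.time()),
        )
    return cur.lastrowid

def close_run(run_id: int, outcome: str, summary=None, error=None) -> None:
    """Close an open attempt with exactly one outcome; closed runs are left alone."""
    if outcome not in OUTCOMES:
        raise ValueError(f"unknown outcome: {outcome}")
    with conn:
        conn.execute(
            "UPDATE task_runs SET ended_at = ?, outcome = ?, summary = ?, error = ?"
            " WHERE id = ? AND outcome IS NULL",
            (time.time(), outcome, summary, error, run_id),
        )

# Three attempts at one task -> three task_runs rows (illustrative data).
close_run(open_run("t_etl"), "crashed", error="OOM at 2.3M rows")
close_run(open_run("t_etl"), "timed_out", error="exceeded max_runtime")
close_run(open_run("t_etl"), "completed", summary="batched inserts, 10k rows per txn")

query = ("SELECT id, outcome, COALESCE(summary, error) FROM task_runs"
         " WHERE task_id = 't_etl' ORDER BY id")
for row in conn.execute(query):
    print(row)
```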

Events

Every transition appends a row to task_events. Three clusters:

  • Lifecycle: created, promoted, claimed, completed, blocked, unblocked, archived
  • Edits: assigned, edited, reprioritized, status (drag-drop)
  • Worker telemetry: spawned, heartbeat, reclaimed, crashed, timed_out, spawn_failed, gave_up

The Dispatcher

The dispatcher is a long-lived loop embedded in the gateway process. Every 60 seconds (configurable), it:

  1. Reclaims stale claims — TTL expired, task goes back to ready
  2. Reclaims crashed workers — PID gone but TTL not yet expired
  3. Promotes ready tasks: todo → ready when all parents are done
  4. Atomically claims and spawns — assigns profile to the task

# config.yaml
kanban:
  dispatch_in_gateway: true        # default
  dispatch_interval_seconds: 60    # default
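The four steps map naturally onto one polling pass. The sketch below is a hypothetical, single-process approximation over an in-memory SQLite database, not the gateway’s actual dispatcher: a TTL check, a PID probe (POSIX-only `os.kill(pid, 0)`), the promotion query, and a conditional UPDATE that lets only one claimer win each ready task. The `spawn` callback stands in for launching a worker and returning its PID.

```python
import os
import sqlite3
import time
from typing import Callable, List

# Hypothetical, simplified schema: real kanban.db tables/columns may differ.
conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit mode
conn.executescript("""
CREATE TABLE tasks (
    id TEXT PRIMARY KEY,
    status TEXT NOT NULL,
    claim_expires REAL,  -- claim TTL deadline (epoch seconds)
    worker_pid INTEGER
);
CREATE TABLE task_links (parent_id TEXT, child_id TEXT);
""")

def pid_alive(pid: int) -> bool:
    """POSIX-only probe: signal 0 checks that a process exists without signalling it."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but belongs to another user
    return True

RELEASE = "UPDATE tasks SET status='ready', worker_pid=NULL, claim_expires=NULL"

def tick(now: float, ttl: float, spawn: Callable[[str], int]) -> List[str]:
    """One dispatcher pass; returns the IDs of tasks claimed and spawned."""
    # 1. Stale claims: TTL expired, task goes back to ready.
    conn.execute(RELEASE + " WHERE status='running' AND claim_expires < ?", (now,))
    # 2. Crashed workers: PID gone even though the TTL has not expired.
    running = conn.execute("SELECT id, worker_pid FROM tasks WHERE status='running'").fetchall()
    for task_id, pid in running:
        if pid is not None and not pid_alive(pid):
            conn.execute(RELEASE + " WHERE id=?", (task_id,))
    # 3. Promote todo -> ready when every parent is done.
    conn.execute("""UPDATE tasks SET status='ready' WHERE status='todo' AND NOT EXISTS (
        SELECT 1 FROM task_links l JOIN tasks p ON p.id = l.parent_id
        WHERE l.child_id = tasks.id AND p.status <> 'done')""")
    # 4. Claim: the status='ready' guard lets exactly one claimer win each task.
    claimed = []
    for (task_id,) in conn.execute("SELECT id FROM tasks WHERE status='ready' ORDER BY id").fetchall():
        cur = conn.execute("UPDATE tasks SET status='running', claim_expires=? "
                           "WHERE id=? AND status='ready'", (now + ttl, task_id))
        if cur.rowcount == 1:
            conn.execute("UPDATE tasks SET worker_pid=? WHERE id=?", (spawn(task_id), task_id))
            claimed.append(task_id)
    return claimed

def fake_spawn(task_id: str) -> int:
    return os.getpid()  # stand-in: pretend this process is the worker

# Usage: 'b' depends on 'a'.
conn.executemany("INSERT INTO tasks (id, status) VALUES (?, ?)", [("a", "ready"), ("b", "todo")])
conn.execute("INSERT INTO task_links VALUES ('a', 'b')")
print(tick(time.time(), ttl=900, spawn=fake_spawn))  # ['a']  ('b' still waits on 'a')
conn.execute("UPDATE tasks SET status='done' WHERE id='a'")
print(tick(time.time(), ttl=900, spawn=fake_spawn))  # ['b']
```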

After ~5 consecutive spawn failures on the same task, the circuit breaker fires: the task auto-blocks with the last error as the reason. This prevents thrashing on tasks whose profile doesn’t exist or workspace can’t mount.
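The breaker logic itself is small. Here is an in-memory sketch, assuming the streak counts only consecutive spawn failures and resets on any successful spawn; the real implementation keeps this state in the database, and the class name is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SpawnBreaker:
    """Per-task spawn-failure streak (hypothetical in-memory sketch)."""
    failure_limit: int = 5
    consecutive_failures: int = 0
    status: str = "ready"
    block_reason: Optional[str] = None

    def record_success(self) -> None:
        self.consecutive_failures = 0  # any successful spawn resets the streak

    def record_failure(self, error: str) -> None:
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.failure_limit:
            # Trip: stop re-dispatching and surface the last error as the block reason.
            self.status = "blocked"
            self.block_reason = error

breaker = SpawnBreaker()
for attempt in range(1, 6):
    breaker.record_failure(f"profile 'researcher' not found (attempt {attempt})")
    print(attempt, breaker.status)  # 'ready' for attempts 1-4, 'blocked' on 5
print(breaker.block_reason)
```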

Worker Lifecycle: 6 Steps

Workers don’t use the CLI. They use seven dedicated tools, which are injected when the worker runs with the HERMES_KANBAN_TASK environment variable set.

Step 1 — Orient

# Worker tool call (NOT a shell command)
kanban_show()

Returns title, body, worker_context (parent handoffs, prior attempts, comment thread), workspace path, and tenant. The worker reads this to understand what to do and what’s already been tried.

Step 2 — Work

# cd to workspace, do the actual work
# terminal tool calls happen here

Step 3 — Heartbeat (for long operations)

kanban_heartbeat(note="scanned 1.2M/2.4M rows")

Send heartbeats every few minutes at most, and skip them for tasks that finish in under ~2 minutes.
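One way a worker could honor "every few minutes at most" is a small throttle in front of the heartbeat call. This is a hypothetical helper, not part of the kanban_* toolset: `send` stands in for kanban_heartbeat, and the 180-second default is an assumed value for "a few minutes".

```python
import time
from typing import Callable, List, Optional

class HeartbeatThrottle:
    """Forward progress notes at most once per `min_interval` seconds.

    `send` stands in for the kanban_heartbeat tool (hypothetical wiring);
    `clock` is injectable so the throttle can be exercised without sleeping.
    """

    def __init__(self, send: Callable[[str], None], min_interval: float = 180.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.send = send
        self.min_interval = min_interval
        self.clock = clock
        self._last: Optional[float] = None

    def beat(self, note: str) -> bool:
        """Send `note` if enough time has passed; return whether it was sent."""
        now = self.clock()
        if self._last is not None and now - self._last < self.min_interval:
            return False  # too soon: drop this note, a newer one will follow
        self._last = now
        self.send(note)
        return True

# Simulated 2.4M-row scan at 1,000 rows/s, reporting every 100k rows.
sent: List[str] = []
fake_now = [0.0]
hb = HeartbeatThrottle(sent.append, min_interval=180, clock=lambda: fake_now[0])
for rows in range(0, 2_400_001, 100_000):
    fake_now[0] = rows / 1000
    hb.beat(f"scanned {rows / 1e6:.1f}M/2.4M rows")
print(len(sent), sent[:2])  # 13 ['scanned 0.0M/2.4M rows', 'scanned 0.2M/2.4M rows']
```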

Step 4 — Complete or Block

kanban_complete(
    summary="migrated limiter.py to token-bucket; added 14 tests, all pass",
    metadata={
        "changed_files": ["limiter.py", "tests/test_limiter.py"],
        "tests_run": 14,
        "decisions": ["user_id primary, IP fallback for unauthenticated requests"],
    },
)

Or if stuck:

kanban_block(reason="Rate limit key choice: IP (simple, NAT-unsafe) or user_id?")

Step 5 — Handoff

The summary and metadata are the primary handoff channel. When a downstream worker calls kanban_show(), it sees:

  • Prior attempts on its own task (outcome, summary, error, metadata) — so retrying workers don’t repeat failed paths
  • Parent task results — the most-recent completed run’s summary and metadata — so downstream workers know what upstream decided

This replaces the “dig through comments and output” dance. A PM writes acceptance criteria in metadata; the engineer’s worker sees them structurally. An engineer records test results; the reviewer has that list before opening a diff.
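The two handoff channels can be sketched as a pure function over run records. The record shape and the function name are assumptions for illustration; the logic follows the description above: every prior attempt of the worker’s own task, plus the most recent completed run of each parent.

```python
from typing import Any, Dict, List

Run = Dict[str, Any]  # hypothetical shape of one task_runs row

def build_worker_context(task_id: str, runs: List[Run],
                         parents: Dict[str, List[str]]) -> Dict[str, Any]:
    """Assemble the handoff part of worker_context (hypothetical sketch).

    - prior_attempts: every earlier run of this task, oldest first
    - parent_results: for each parent, its most recent *completed* run
    """
    own = sorted((r for r in runs if r["task_id"] == task_id), key=lambda r: r["started_at"])
    prior_attempts = [
        {k: r.get(k) for k in ("outcome", "summary", "error", "metadata")} for r in own
    ]
    parent_results = {}
    for parent_id in parents.get(task_id, []):
        done = [r for r in runs if r["task_id"] == parent_id and r["outcome"] == "completed"]
        if done:
            latest = max(done, key=lambda r: r["started_at"])
            parent_results[parent_id] = {
                "summary": latest.get("summary"),
                "metadata": latest.get("metadata") or {},
            }
    return {"prior_attempts": prior_attempts, "parent_results": parent_results}

runs = [
    {"task_id": "t_spec", "started_at": 1, "outcome": "completed", "summary": "v1 spec",
     "metadata": {"acceptance": ["link expires in 15 min"]}},
    {"task_id": "t_spec", "started_at": 2, "outcome": "completed", "summary": "v2 spec",
     "metadata": {"acceptance": ["link expires in 15 min", "one active link per user"]}},
    {"task_id": "t_impl", "started_at": 3, "outcome": "crashed", "error": "worker OOM"},
]
ctx = build_worker_context("t_impl", runs, {"t_impl": ["t_spec"]})
print(ctx["parent_results"]["t_spec"]["summary"])  # v2 spec
print(ctx["prior_attempts"][0]["outcome"])         # crashed
```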

Step 6 — Cleanup

The dispatcher detects the worker is done (via the completed/blocked status change) and moves to the next ready task.

The Orchestrator Pattern

An orchestrator does not do the work. It decomposes, routes, and summarizes.

# Worker tool calls from an orchestrator profile
kanban_show()
t1 = kanban_create(
    title="research ICP funding, NA angle",
    assignee="researcher-a",
    body="focus on seed + series A, North America, AI-adjacent",
)
# → {"task_id": "t_r1"}
t2 = kanban_create(
    title="research ICP funding, EU angle",
    assignee="researcher-b",
    body="focus on EU digital sovereignty funds, AI Act compliance",
)
# → {"task_id": "t_r2"}
t3 = kanban_create(
    title="synthesize ICP funding research into launch post draft",
    assignee="writer",
    # IDs come from the create calls above rather than being hardcoded;
    # promoted to 'ready' when both parents complete
    parents=[t1["task_id"], t2["task_id"]],
    body="one-pager, neutral tone, cite sources inline",
)
# → {"task_id": "t_w1"}
kanban_complete(
    summary="decomposed into 2 parallel research tasks → 1 synthesis task",
)

The kanban-orchestrator skill enforces anti-temptation rules: the orchestrator profile should have restricted toolsets (no terminal, file, web) so it literally cannot execute implementation tasks even if it tries.

The 9 Collaboration Patterns

The board supports these without any new primitives:

| # | Pattern | Shape | Example |
|---|---|---|---|
| P1 | Fan-out | N siblings, same role | “research 5 angles in parallel” |
| P2 | Pipeline | Role chain: scout → editor → writer | Daily brief assembly |
| P3 | Voting / quorum | N siblings + 1 aggregator | 3 researchers → 1 reviewer picks |
| P4 | Long-running journal | Same profile + shared dir + cron | Obsidian vault maintenance |
| P5 | Human-in-the-loop | Worker blocks → user comments → unblock | Ambiguous decisions |
| P6 | @mention | Inline routing from prose | @reviewer look at this |
| P7 | Thread-scoped workspace | /kanban here in a thread | Per-project gateway threads |
| P8 | Fleet farming | One profile, N subjects | 50 social accounts, 12 monitored services |
| P9 | Triage specifier | Rough idea → triage → specifier expands → todo | “turn this one-liner into a spec” |

Fan-out + Fan-in (Most Common)

N researchers in parallel, one analyst synthesizing:

# Create parallel research tasks
R1=$(hermes kanban create "Postgres cost analysis" --assignee researcher --json | jq -r .id)
R2=$(hermes kanban create "Postgres perf benchmarks" --assignee researcher --json | jq -r .id)
R3=$(hermes kanban create "Postgres operational complexity" --assignee researcher --json | jq -r .id)
# Synthesis depends on all three
hermes kanban create "migration recommendation report" \
--assignee analyst \
--parent $R1 --parent $R2 --parent $R3 \
--body "1-page recommendation with explicit trade-offs and go/no-go call"

Only R1, R2, R3 start in ready. The synthesis task auto-promotes when all three hit done.
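Because promotion depends only on parent → child edges, the order in which tasks become ready can be previewed ahead of time with a topological sort (Kahn’s algorithm, grouped into waves). This is an illustrative planning helper, not something the dispatcher exposes; it also rejects cycles, which would otherwise leave tasks stuck in todo forever since no parent in the cycle can reach done.

```python
from collections import defaultdict
from typing import Dict, List

def execution_waves(parents: Dict[str, List[str]]) -> List[List[str]]:
    """Group tasks into waves: a task lands in the first wave after all its parents.

    `parents` maps task -> list of parent tasks (the task_links edges).
    Raises ValueError on a dependency cycle.
    """
    tasks = set(parents) | {p for ps in parents.values() for p in ps}
    indegree = {t: len(set(parents.get(t, []))) for t in tasks}
    children = defaultdict(list)
    for child, ps in parents.items():
        for p in set(ps):
            children[p].append(child)

    wave = sorted(t for t in tasks if indegree[t] == 0)  # tasks with no parents
    waves, seen = [], 0
    while wave:
        waves.append(wave)
        seen += len(wave)
        nxt = []
        for t in wave:
            for c in children[t]:
                indegree[c] -= 1
                if indegree[c] == 0:  # last parent just finished
                    nxt.append(c)
        wave = sorted(nxt)
    if seen != len(tasks):
        raise ValueError("dependency cycle: some tasks can never become ready")
    return waves

print(execution_waves({"report": ["R1", "R2", "R3"]}))  # [['R1', 'R2', 'R3'], ['report']]
```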

Pipeline with Gates

PM writes spec → engineer implements → reviewer approves or blocks → engineer iterates:

SPEC=$(hermes kanban create "spec: password reset flow" --assignee pm --json | jq -r .id)
IMPL=$(hermes kanban create "implement password reset" --assignee backend-dev --parent $SPEC --json | jq -r .id)
REVIEW=$(hermes kanban create "review password reset PR" --assignee reviewer --parent $IMPL --json | jq -r .id)

If the reviewer blocks, you don’t re-run the same task. You create a new task linked from the reviewer’s task, assigned back to the engineer. Each iteration is a fresh task with its own run history.

Human-in-the-Loop

Workers block when they need a decision. You respond via comment, then unblock:

# Worker blocked itself
# hermes kanban show t_xyz
# → status: blocked, reason: "Which schema: v1 (simple) or v2 (normalized)?"
hermes kanban comment t_xyz "Use v2 — normalized. We need the flexibility for the analytics pipeline."
hermes kanban unblock t_xyz

The next spawn of that task reads the comment thread in kanban_show(), so the worker sees your decision without you having to find its terminal session.

Workspaces

Three kinds, set per-task:

| Kind | What it is | Use when |
|---|---|---|
| scratch (default) | Fresh tmp dir, GC’d on archive | One-off tasks, no shared state |
| dir:<path> | Shared persistent directory | Obsidian vaults, data dirs, long-lived state |
| worktree | Git worktree for coding tasks | Parallel code changes on the same repo |

The workspace path is exposed to the worker through the HERMES_KANBAN_WORKSPACE environment variable. Workers cd there at the start of their run. For worktree, the worker runs git worktree add if .git doesn’t exist yet.
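How a workspace spec might resolve to a directory, as a hypothetical sketch: the spec strings follow the table above, but the scratch layout, the worktree location, and the function name are assumptions. For worktree it returns the git command instead of running it, so the sketch stays runnable without a repository.

```python
import os
import tempfile
from pathlib import Path
from typing import List, Optional, Tuple

def resolve_workspace(spec: str, task_id: str, scratch_root: Path,
                      repo: Optional[Path] = None) -> Tuple[Path, Optional[List[str]]]:
    """Map a workspace spec to (path, setup_command). Hypothetical sketch.

    - "scratch"    -> fresh per-task dir under scratch_root
    - "dir:<path>" -> the shared persistent directory itself
    - "worktree"   -> a per-task git worktree of `repo`; the command is returned, not run
    """
    if spec == "scratch":
        path = scratch_root / task_id
        path.mkdir(parents=True, exist_ok=False)  # fresh: refuse to reuse an old dir
        return path, None
    if spec.startswith("dir:"):
        raw = spec[len("dir:"):]
        if not raw:
            raise ValueError("dir: workspace needs a path")
        return Path(raw).expanduser(), None
    if spec == "worktree":
        if repo is None:
            raise ValueError("worktree workspace needs a git repo")
        path = scratch_root / "worktrees" / task_id
        return path, ["git", "-C", str(repo), "worktree", "add", str(path)]
    raise ValueError(f"unknown workspace kind: {spec!r}")

root = Path(tempfile.mkdtemp())
path, setup = resolve_workspace("scratch", "t_abc", root)
# The spawned worker would find its directory through this environment variable.
worker_env = {**os.environ, "HERMES_KANBAN_WORKSPACE": str(path)}
print(worker_env["HERMES_KANBAN_WORKSPACE"], setup)
```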

Multi-Board (Multi-Project)

One Hermes install can have many boards — one per project, repo, or domain. Each board has:

  • Separate SQLite DB (~/.hermes/kanban/boards/<slug>/kanban.db)
  • Separate workspaces/ and logs/ directories
  • Workers pinned to their board via HERMES_KANBAN_BOARD — they physically cannot see other boards
hermes kanban boards create atm10-server --name "ATM10 Server" --icon 🎮
hermes kanban --board atm10-server create "Restart server" --assignee ops
hermes kanban boards switch atm10-server
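Board isolation comes down to which file a process opens. Here is a sketch of path resolution following the two layouts given in this article (~/.hermes/kanban.db for the default board, ~/.hermes/kanban/boards/<slug>/kanban.db for named boards); the slug validation rule and the "unset HERMES_KANBAN_BOARD means the default board" behavior are assumptions.

```python
import os
import re
from pathlib import Path
from typing import Mapping, Optional

SLUG = re.compile(r"[a-z0-9][a-z0-9-]*")  # assumed slug rule; also blocks "../" tricks

def board_db_path(slug: Optional[str], home: Optional[Path] = None) -> Path:
    """Locate a board's SQLite file (hypothetical helper)."""
    base = (home or Path.home()) / ".hermes"
    if slug is None:
        return base / "kanban.db"  # default board
    if not SLUG.fullmatch(slug):
        raise ValueError(f"invalid board slug: {slug!r}")
    return base / "kanban" / "boards" / slug / "kanban.db"

def worker_db_path(env: Mapping[str, str] = os.environ, home: Optional[Path] = None) -> Path:
    """A pinned worker only ever opens the board named in HERMES_KANBAN_BOARD."""
    return board_db_path(env.get("HERMES_KANBAN_BOARD"), home)

print(worker_db_path({"HERMES_KANBAN_BOARD": "atm10-server"}, Path("/home/me")))
```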

Gateway Notifications

Create a task from Telegram/Discord/Slack and you’re automatically subscribed. You get one message per terminal event (completed, blocked, crashed, timed_out) — including the first line of the worker’s summary on completion.

# Explicit subscription from CLI
hermes kanban notify-subscribe t_abcd \
--platform telegram --chat-id 12345678 --thread-id 7
hermes kanban notify-list
hermes kanban notify-unsubscribe t_abcd \
--platform telegram --chat-id 12345678

Subscriptions auto-remove when the task reaches done or archived.

Production Best Practices

1. Write Structured Handoff Metadata

Every kanban_complete should include metadata that answers four questions for the next reader:

  1. What changed?
  2. How was it verified?
  3. What can unblock or retry this if it fails?
  4. What risk is still deliberately left open?

kanban_complete(
    summary="shipped rate limiter — token bucket, 14 tests pass",
    metadata={
        "changed_files": ["rate_limiter.py", "tests/test_rate_limiter.py"],
        "verification": ["pytest tests/ -q"],
        "dependencies": ["parent task t_schema"],
        "blocked_reason": None,
        "residual_risk": ["no load testing yet — needs staging deploy"],
    },
)
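A completion step can refuse thin handoffs before they reach the board. In this hypothetical sketch each of the four questions maps to one metadata key from the example above, and the helper builds (but does not run) a `hermes kanban complete <id> --summary ... --metadata '{...}'` call. The key-to-question mapping and the helper names are assumptions, not an enforced Hermes convention.

```python
import json
from typing import Any, Dict, List

# Hypothetical convention: one metadata key per handoff question.
HANDOFF_KEYS = {
    "changed_files": "What changed?",
    "verification": "How was it verified?",
    "dependencies": "What can unblock or retry this if it fails?",
    "residual_risk": "What risk is still deliberately left open?",
}

def unanswered(metadata: Dict[str, Any]) -> List[str]:
    """Questions whose key is absent. An explicit [] still counts as an answer."""
    return [q for key, q in HANDOFF_KEYS.items() if key not in metadata]

def complete_argv(task_id: str, summary: str, metadata: Dict[str, Any]) -> List[str]:
    """Build (not run) the CLI completion call, refusing incomplete handoffs."""
    missing = unanswered(metadata)
    if missing:
        raise ValueError("handoff metadata does not answer: " + "; ".join(missing))
    return ["hermes", "kanban", "complete", task_id,
            "--summary", summary, "--metadata", json.dumps(metadata)]

meta = {
    "changed_files": ["rate_limiter.py", "tests/test_rate_limiter.py"],
    "verification": ["pytest tests/ -q"],
    "dependencies": ["parent task t_schema"],
    "residual_risk": ["no load testing yet, needs staging deploy"],
}
print(complete_argv("t_rl", "shipped rate limiter, 14 tests pass", meta))
print(unanswered({"changed_files": ["a.py"]}))
```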

2. Restrict Orchestrator Toolsets

Pair the orchestrator with a profile that only has kanban, gateway, and memory tools. If the orchestrator can’t call terminal or file, it can’t “just fix this quickly” and break the routing contract.

3. Use Triage for Vague Ideas

Don’t create fully-specified tasks for half-baked ideas. Park them in triage:

hermes kanban create "something about the landing page" --assignee pm --triage

A specifier profile can then flesh out the body and promote to todo.

4. Set Max Runtime

Prevent zombie workers from burning API credits:

hermes kanban create "bulk translate 500 files" \
--assignee translator \
--max-runtime 2h

When the limit is exceeded, the dispatcher SIGTERMs the worker, then SIGKILLs after 5 seconds grace.
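The escalation described here is the standard terminate-then-kill pattern. A sketch with Python’s subprocess module follows; the duration format accepted by parse_duration (h/m/s suffixes such as 2h or 1h30m) is an assumption about what --max-runtime takes, and run_with_deadline is a hypothetical helper, not the dispatcher’s code.

```python
import re
import subprocess
import sys
from typing import List

def parse_duration(text: str) -> int:
    """Parse '90s', '45m', '2h' or combos like '1h30m' into seconds (assumed format)."""
    m = re.fullmatch(r"(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?", text.strip())
    if not m or not any(m.groups()):
        raise ValueError(f"bad duration: {text!r}")
    h, mins, s = (int(g) if g else 0 for g in m.groups())
    return h * 3600 + mins * 60 + s

def run_with_deadline(argv: List[str], max_runtime: float, grace: float = 5.0) -> int:
    """Run a worker; on overrun send SIGTERM, then SIGKILL if it outlives the grace period."""
    proc = subprocess.Popen(argv)
    try:
        return proc.wait(timeout=max_runtime)
    except subprocess.TimeoutExpired:
        proc.terminate()  # SIGTERM on POSIX: lets the worker flush and exit cleanly
        try:
            return proc.wait(timeout=grace)
        except subprocess.TimeoutExpired:
            proc.kill()   # SIGKILL on POSIX: cannot be caught or ignored
            return proc.wait()

print(parse_duration("2h"))  # 7200
sleeper = [sys.executable, "-c", "import time; time.sleep(60)"]
rc = run_with_deadline(sleeper, max_runtime=0.5, grace=1.0)
print(rc)  # negative on POSIX: the child was ended by a signal
```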

5. Use Idempotent Keys for Automation

Prevent duplicate tasks from cron jobs or webhooks:

hermes kanban create "nightly ops review" \
--assignee ops \
--idempotency-key "nightly-ops-$(date -u +%Y-%m-%d)" \
--json

First call creates the task. Subsequent calls with the same key return the existing task ID.
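Server-side, idempotency keys are usually enforced with a unique constraint so that check-and-insert is a single atomic statement, even when two cron runs race. A sketch with an assumed schema (not the real kanban.db layout); create_task is a hypothetical name.

```python
import sqlite3
import uuid
from datetime import datetime, timezone
from typing import Optional

# Hypothetical, simplified schema; the real kanban.db columns may differ.
conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE tasks (
    id              TEXT PRIMARY KEY,
    title           TEXT NOT NULL,
    idempotency_key TEXT UNIQUE  -- NULL keys never collide, so keyless creates still work
)""")

def create_task(title: str, idempotency_key: Optional[str] = None) -> str:
    """Insert a task, or return the existing ID if this key was used before."""
    new_id = "t_" + uuid.uuid4().hex
    with conn:
        # The UNIQUE constraint makes check-and-insert one atomic statement.
        conn.execute(
            "INSERT OR IGNORE INTO tasks (id, title, idempotency_key) VALUES (?, ?, ?)",
            (new_id, title, idempotency_key),
        )
    if idempotency_key is None:
        return new_id
    row = conn.execute("SELECT id FROM tasks WHERE idempotency_key = ?",
                       (idempotency_key,)).fetchone()
    return row[0]  # our new row, or the one an earlier call created

key = f"nightly-ops-{datetime.now(timezone.utc):%Y-%m-%d}"  # same shape as the CLI example
first = create_task("nightly ops review", key)
second = create_task("nightly ops review", key)  # e.g. a cron retry or duplicate webhook
print(first == second)  # True
```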

6. Profile Sessions Are Invisible to the Main Agent

hermes sessions list # ← your main agent only
hermes sessions list --profile researcher # ← profile sessions
hermes chat --profile researcher --resume <session-id>

7. Heartbeats Should Name Progress

Good: "epoch 12/50, loss 0.31", "uploaded 47/120 videos"
Bad: "still working", empty notes

8. Block Reasons Should Be One Sentence

The block message appears in dashboard notifications and gateway pings. Keep it scannable. Put the long context in a comment:

import os

kanban_comment(
    task_id=os.environ["HERMES_KANBAN_TASK"],
    body="Full context: I have user IPs from Cloudflare headers but some users are behind NATs with thousands of peers.",
)
kanban_block(reason="Rate limit key: IP (NAT-unsafe) or user_id (requires auth)?")

9. Don’t Bulk-Close with Shared Summaries

# This is REFUSED — structured handoff is per-run
hermes kanban complete a b c --summary "all done"
# This works — for admin/batch cleanup
hermes kanban complete a b c

10. Watch for Circuit Breaker Trips

After 5 consecutive spawn failures (configurable via --failure-limit), the task auto-blocks and a gave_up event is recorded. Check the error, fix the profile config or workspace, then unblock. The dashboard and hermes kanban runs <id> show the full failure history.

CLI Cheat Sheet

# Board lifecycle
hermes kanban init # create kanban.db
hermes kanban boards create <slug> # multi-project board
hermes kanban boards switch <slug> # change active board
# Task management
hermes kanban create "title" --assignee <profile> [--parent <id>] [--triage]
hermes kanban list [--mine] [--assignee P] [--status S] [--tenant T]
hermes kanban show <id>
hermes kanban complete <id> --summary "..." --metadata '{...}'
hermes kanban block <id> "reason"
hermes kanban unblock <id>
hermes kanban archive <id>
# Dependency management
hermes kanban link <parent_id> <child_id>
hermes kanban unlink <parent_id> <child_id>
# Monitoring
hermes kanban tail <id> # single task events
hermes kanban watch [--kinds completed,blocked] # board-wide stream
hermes kanban runs <id> # attempt history
hermes kanban stats # per-status + per-assignee counts
# Dispatcher
hermes kanban dispatch --dry-run # preview what would be claimed
hermes kanban dispatch --max 3 # one-shot pass
# Notifications
hermes kanban notify-subscribe <id> --platform <name> --chat-id <id>

All commands are also available as /kanban slash commands in the interactive CLI and gateway — and they bypass the running-agent guard, so you can use them mid-turn.

The Dashboard

Open with hermes dashboard and click the Kanban tab. Features:

  • Six columns (triage, todo, ready, running, blocked, done) with live WebSocket updates
  • Drag-drop between columns with confirmation on destructive transitions
  • Per-card drawer with editable title, body (markdown-rendered), dependencies, status actions, comment thread, and run history
  • “Lanes by profile” toggle sub-groups the Running column by assignee
  • Multi-select with bulk actions (archive, reassign, status transitions)
  • Filters for search, tenant, assignee, and archived toggle
  • “Nudge dispatcher” button to skip the 60s poll interval
  • Board switcher for multi-project setups

Case Study: Kanban + Cron Jobs for an AI News Pipeline

BoxminingAI (Superbash) documented a real-world migration from a single-agent cron job to a Kanban-powered multi-agent pipeline for daily AI news aggregation. Here’s what he learned.

The Old Pipeline: Single-Agent Cron

The original setup was one cron job firing at 9:00 AM HKT, spawning a single sub-agent that:

  1. Ran 14 web searches sequentially (no parallelism)
  2. Wrote a markdown report
  3. Updated a landing page
  4. Posted a Discord notification

Problems:

  • No parallel execution — one search failure could stall the entire pipeline
  • No separation of concerns — the same agent handled research, writing, and publishing
  • No verification or retry — failures were final
  • Sub-agent limitations — sub-agents only get AGENTS.md and tool docs, no memories or system prompts, making them less capable than the main agent
  • Shell date bug — the date command syntax in the prompt was passed literally to search queries instead of being executed, producing stale/literal date strings

The result: report quality degraded over time, with fewer sources and shorter articles.

The New Pipeline: Kanban Multi-Stage

The redesigned pipeline uses a parent task with nine children across three stages:

Stage 1 — Research (5 parallel workers)
├── Model Releases
├── Tool Releases
├── Agent Frameworks
├── Trending Workflows
└── Active Inputs (ad-hoc queries)
Stage 2 — Verification (2 editors, blocked by Stage 1)
├── Editor Alpha — filter duplicates, check dates, rank by importance
└── Editor Beta — cross-reference and fill gaps
Stage 3 — Publishing (2 publishers, blocked by Stage 2)
├── Write Report
└── Post Notifications

Results: more structured reports with proper tables, categorization, 48-hour verification, and broader source coverage.

Profile Setup Lessons

Key lessons from setting up specialist profiles:

  1. Feed the documentation first — don’t assume the agent knows about Kanban features. Link the official docs and ask it to understand them before designing the pipeline.
  2. API keys don’t auto-propagate — profile .env files are empty by default. Copy the relevant keys from your main agent’s .env to each profile.
  3. Remove empty API key fields from config.yaml — asterisk placeholders cause errors.
  4. Reuse the same API key across profiles — especially useful for coding plans with token quotas.
  5. Tune reasoning effort per role — research profiles at 90-100 (needs critical thinking), editors at 50-70 (synthesis), publishers at 20-30 (mechanical).

The Cron + Kanban Gap: Four Problems

Combining Kanban with cron jobs revealed fundamental friction:

Problem 1: Gateway exits early. The gateway dispatches ready tasks and then exits. If a child is waiting for a parent that completes after the gateway exits, the child never gets dispatched. The solution: run the gateway as a systemd service so it stays alive permanently (this works well on a VPS but is more costly on a local machine).

Problem 2: Duplicate task creation. During test runs, the agent created new parent tasks without checking if one already existed for that date. Orphaned test tasks then confused the production cron run, producing duplicate notifications.

Problem 3: No native synergy. Cron jobs fire on schedule and don’t check the Kanban board before creating tasks. Without custom deduplication logic, every cron run creates a fresh task set regardless of what’s already running or completed.

Problem 4: Task accumulation. Completed parent tasks stay on the board forever unless archived. After a week of daily runs, that’s 7 parents and 63 children cluttering the board, making monitoring harder and increasing the risk of accidental re-dispatch. There’s currently no delete button in the dashboard.

Final Architecture

The working solution combines cron + Kanban with custom safeguards:

Cron fires (9:00 AM HKT)
→ Deduplication check (search existing tasks for today's date)
→ Create Kanban parent task + 9 children
→ Gateway runs as systemd service (persistent)
→ Pipeline executes through all stages
→ Discord notifications on completion
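Two of the problems above (the literal date string and duplicate parents) can be addressed in the cron entrypoint by computing the date in code and deriving an idempotency key from it. In this hypothetical sketch the parent title and the orchestrator profile name are made up; the flags are the ones used earlier in this article. HKT is a fixed UTC+8 offset because Hong Kong does not observe daylight saving time.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional

HKT = timezone(timedelta(hours=8), "HKT")  # no DST in Hong Kong, so a fixed offset is exact

def run_date(now: Optional[datetime] = None) -> str:
    """Today's date in HKT, computed in code rather than left as shell syntax in a prompt."""
    now = now or datetime.now(timezone.utc)
    return now.astimezone(HKT).strftime("%Y-%m-%d")

def parent_create_argv(now: Optional[datetime] = None) -> List[str]:
    """Build (not run) the parent-task create call with a date-scoped idempotency key.

    "orchestrator" is a hypothetical profile name.
    """
    day = run_date(now)
    return ["hermes", "kanban", "create", f"AI news pipeline {day}",
            "--assignee", "orchestrator",
            "--idempotency-key", f"ai-news-{day}",
            "--json"]

print(parent_create_argv())
```

If --idempotency-key behaves as described in best practice 5, a second firing on the same HKT day would get the existing parent back instead of a fresh task set.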

The takeaway: Kanban is excellent for multi-agent orchestration, but pairing it with cron requires custom deduplication logic and persistent gateway management. For non-cron projects, Kanban works out of the box with no friction.

Video: Hermes Agent Kanban + Cron Job is POWERFUL (Setup Guide) by BoxminingAI (Superbash)

When Kanban Is Not the Right Tool

  • Single-shot reasoning: Just answer or use delegate_task
  • Multi-host coordination: Kanban is deliberately single-host (local SQLite, PID-based crash detection). For multi-host, run independent boards and bridge with delegate_task or a message queue
  • Sub-second latency requirements: The dispatcher polls every 60 seconds. Use hermes kanban dispatch or the Nudge button for immediate pickup
  • Tasks that need shared mutable state between concurrent workers: Workers are independent processes. Use dir: workspaces for file-level coordination, but there’s no locking primitive

References

  1. Hermes Agent Kanban Documentation: https://hermes-agent.nousresearch.com/docs/user-guide/features/kanban
  2. Hermes Agent Kanban Tutorial: https://hermes-agent.nousresearch.com/docs/user-guide/features/kanban-tutorial
  3. Multi-Agent Architecture Issue #344: https://github.com/NousResearch/hermes-agent/issues/344
  4. Hermes Agent Kanban Setup Guide (YouTube), BoxminingAI: https://www.youtube.com/watch?v=R_aLVXYzDac
  5. Hermes Agent Kanban + Cron Job Setup (YouTube), BoxminingAI (Superbash): https://www.youtube.com/watch?v=iN2fD36Sgdg
  6. Hermes Agent PM Guide: https://www.news.aakashg.com/p/hermes-agent-guide
  7. Kanban in Hermes for Self-Hosted LLM Workflows: https://www.glukhov.org/ai-systems/hermes/kanban-in-hermes/

This article was written by Hermes (glm-5-turbo | zai), based on the official Hermes Agent documentation, design spec, and community resources.