What Is Swival
Swival is a CLI coding agent built by Frank Denis (jedisct1, creator of libsodium). Pure Python, no framework. 575 commits, actively maintained.
The tagline: “A coding agent for any model.”
Most coding agents — Claude Code, Codex, Hermes, Aider — are built and tested against frontier models with 128K+ context windows. They assume clean tool calls, reliable instruction following, and plenty of room. When those assumptions break, the agent falls apart in ways that look like the model’s fault but are really the agent’s.
Swival takes the opposite approach. It assumes the model will have tight limits and rough edges, then does the extra work to keep the task moving anyway.
Source: github.com/swival/swival
Provider Support
Swival connects to eight provider types out of the box:
| Provider | Auth | Zero Config |
|---|---|---|
| LM Studio | none | auto-discovers loaded model |
| HuggingFace Inference API | HF_TOKEN | requires --model |
| OpenRouter | OPENROUTER_API_KEY | requires --model |
| Google Gemini | GEMINI_API_KEY | requires --model |
| ChatGPT Plus/Pro | browser OAuth on first run | requires --model |
| AWS Bedrock | AWS credential chain | requires --model |
| Generic (OpenAI-compat) | optional OPENAI_API_KEY | requires --model + --base-url |
| Command (external program) | none | requires --model as command string |
The generic provider covers ollama, llama.cpp, mlx_lm.server, vLLM, DeepSeek API, and anything that speaks the OpenAI chat completions protocol. The command provider can even wrap codex exec or custom scripts as a backend.
Popular Models
Swival’s userbase skews toward smaller and local models:
Local (primary targets):
- Qwen3-Coder-Next — top recommendation, great quality/speed on consumer hardware
- Qwen3.5-35B-A3B — via vLLM or HuggingFace Endpoints
- Gemma 4 26B A4B — via llama.cpp (GGUF)
- Qwen3-Coder-480B-A35B-4bit — via mlx_lm.server
- Qwen3 32B — via ollama
Cloud:
- GLM-5 via HuggingFace or OpenRouter (note: z.ai’s non-standard URL path `/api/coding/paas/v4` is incompatible — Swival auto-appends `/v1` to the base URL)
- Gemini 2.5 Flash via Google
- GPT-5.4 via ChatGPT Plus/Pro (no API key needed — browser OAuth)
- Claude Opus 4.6 via AWS Bedrock
- DeepSeek Chat via DeepSeek API
- Qwen3-Coder-480B via NVIDIA NIM (generic provider — rate limit issues, see below)
The heavy Qwen3 bias makes sense — Qwen3 has strong tool-calling at small quantized sizes, which is exactly where Swival shines.
Installation
uv (Python 3.13+):
```sh
uv tool install swival
uv tool upgrade swival
```

Homebrew (macOS):

```sh
brew install swival/tap/swival
brew upgrade swival
```

No npm, no Docker, no binary download. Pure Python, minimal dependencies.
Quick Start
With LM Studio (zero config)
- Install LM Studio and load a model with tool-calling support
- Start the LM Studio server
- Run:
```sh
swival "Refactor the error handling in src/api.py"
```

Swival auto-discovers the loaded model and connects. No flags needed.
With HuggingFace
```sh
export HF_TOKEN=hf_...
swival --provider huggingface --model zai-org/GLM-5 "Fix the bug in auth.ts"
```

With local llama.cpp
```sh
# Start llama-server first
llama-server --reasoning auto --fit on \
  -hf unsloth/gemma-4-26B-A4B-it-GGUF:UD-Q4_K_XL

# Then connect
swival --provider llamacpp "Add input validation to signup endpoint"
```

With NVIDIA NIM
NVIDIA NIM exposes an OpenAI-compatible API. Point Swival’s generic provider at it:
```sh
export OPENAI_API_KEY="nvapi-..."
swival --provider generic \
  --base-url https://integrate.api.nvidia.com/v1 \
  --model qwen/qwen3-coder-480b-a35b-instruct \
  "Add input validation to the signup endpoint"
```

Or pass the key inline with --api-key:

```sh
swival --provider generic \
  --base-url https://integrate.api.nvidia.com/v1 \
  --api-key "nvapi-..." \
  --model qwen/qwen3-coder-480b-a35b-instruct \
  "task"
```

Tested: NIM works via the generic provider. Note that the generic provider reads `OPENAI_API_KEY` for authentication, not `NVIDIA_API_KEY`.
Known Incompatibility: z.ai
z.ai uses a non-standard API path (/api/coding/paas/v4) that doesn’t follow the OpenAI convention. Swival’s generic provider auto-appends /v1 to the base URL, which produces an incorrect endpoint. As of this writing, z.ai cannot be used with Swival.
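The failure mode is easy to reproduce in a few lines. This is a hypothetical reconstruction of the auto-append behavior described above; `normalize_base_url` is our name for it, not Swival's actual function:

```python
def normalize_base_url(base_url: str) -> str:
    # Generic OpenAI-compatible clients conventionally expect endpoints
    # under /v1, so the base URL gets /v1 appended whenever it's missing.
    base = base_url.rstrip("/")
    if not base.endswith("/v1"):
        base = base + "/v1"
    return base

# Works for conventional servers:
normalize_base_url("https://integrate.api.nvidia.com")
# Breaks z.ai's non-standard path, producing an endpoint that doesn't exist:
normalize_base_url("https://api.z.ai/api/coding/paas/v4")
```

The second call yields `.../api/coding/paas/v4/v1`, which z.ai does not serve.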
NIM: Rate Limit Issues
NVIDIA NIM technically works via the generic provider, but Swival’s agent loop makes multiple API calls per task (tool calls, compaction, retries). This quickly exhausts NIM’s free tier rate quota. Unless you have a paid NIM deployment or generous quota, it’s not practical for real use.
Generic Provider Caveat: openai/ Prefix
Swival’s generic provider automatically prepends openai/ to the model name. So if you pass --model glm-5-turbo, Swival sends openai/glm-5-turbo to the API. This is intentional for routing within Swival’s internal logic, but it breaks cloud providers that don’t recognize the openai/ prefix — making agentic tool calling unusable for many cloud LLMs. If your provider rejects prefixed model names, there’s no workaround as of this writing.
With generic OpenAI-compatible server
```sh
swival --provider generic \
  --base-url http://127.0.0.1:8080 \
  --model my-model \
  "task description"
```

Interactive REPL

```sh
swival
```

The REPL carries conversation history across turns, good for exploratory work.
Stdin piping
```sh
swival -q < objective.md
cat prompts/review.md | swival --provider huggingface --model zai-org/GLM-5
```

CLI-native design: stdout is exclusively the final answer; all diagnostics go to stderr. Pipe output straight into files or other commands.
Deep Dive: Context Management
This is where Swival diverges most from other agents. The entire system is designed around one reality: small models have tight context windows, and context management is the agent’s job, not the model’s.
The Four-Level Compaction Pipeline
When the context window fills up, Swival runs a graduated compaction pipeline:
- Shrink old tool results — truncate large file reads and command outputs from earlier turns
- Drop low-value turns — score each turn by importance, drop the least valuable ones
- Nuclear drop — keep only the last two turns
- Shed tool schemas — remove tool definitions entirely, relying on the model’s prior knowledge
Each level only fires if the previous one wasn’t enough. This prevents over-aggressive compaction on tasks that don’t need it.
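A minimal sketch of such a graduated pipeline (illustrative only; the level functions, the crude token estimate, and the importance scoring below are our assumptions, not Swival's code):

```python
def tokens(history):
    # Crude proxy: roughly 4 characters per token.
    return sum(len(m["content"]) for m in history) // 4

def shrink_old_tool_results(history):
    # Level 1: truncate large tool outputs from all but the latest message.
    return [
        {**m, "content": m["content"][:200] + "...[truncated]"}
        if m["role"] == "tool" and i < len(history) - 1 and len(m["content"]) > 200
        else m
        for i, m in enumerate(history)
    ]

def drop_low_value_turns(history):
    # Level 2: drop the least "important" middle messages; shortest-first
    # here stands in for a richer importance score.
    middle = sorted(history[1:-2], key=lambda m: len(m["content"]))
    victims = {id(m) for m in middle[: len(middle) // 2]}
    return [m for m in history if id(m) not in victims]

def nuclear_drop(history):
    # Level 3: keep only the system message and the last two turns.
    return [history[0]] + history[-2:]

def shed_tool_schemas(history):
    # Level 4: in the real agent this strips tool definitions from the
    # request rather than the history; modeled here as a no-op.
    return history

def compact(history, budget):
    for level in (shrink_old_tool_results, drop_low_value_turns,
                  nuclear_drop, shed_tool_schemas):
        if tokens(history) <= budget:
            break  # each level fires only if the previous one wasn't enough
        history = level(history)
    return history
```

The key property is the early `break`: a task that fits after level 1 never sees the more destructive levels.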
Knowledge That Survives Compaction
This is the single most important design decision. Three things live outside the message history:
- Thinking notes — a `think` tool gives the model a structured scratchpad
- Todo checklist — a `todo` tool tracks work items
- Snapshot summaries — a `snapshot` tool lets the agent compress its investigation into a summary
Even after the most aggressive compaction (level 4), the agent still has its reasoning chain, task list, and accumulated knowledge. It can lose every message and keep working.
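The idea can be sketched as state that lives beside the message history rather than inside it (a simplified illustration; the class and method names are ours, not Swival's):

```python
class AgentState:
    """Knowledge kept outside the message history, where compaction can't touch it."""

    def __init__(self):
        self.notes = []      # think tool: reasoning scratchpad
        self.todos = {}      # todo tool: item -> done flag
        self.snapshot = ""   # snapshot tool: compressed investigation summary
        self.messages = []   # ordinary history: compaction may drop any of this

    def think(self, note):
        self.notes.append(note)

    def todo(self, item, done=False):
        self.todos[item] = done

    def render_context(self):
        # Durable state is re-injected into every request, even after
        # aggressive compaction has emptied self.messages.
        pending = [t for t, done in self.todos.items() if not done]
        return "\n".join([
            "NOTES: " + "; ".join(self.notes),
            "TODO: " + "; ".join(pending),
            "SNAPSHOT: " + self.snapshot,
        ])

state = AgentState()
state.think("auth bug is in token refresh, not login")
state.todo("fix refresh_token()")
state.messages.clear()  # simulate a level-3 nuclear drop
# render_context() still carries the reasoning chain and task list
```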
Bounded Tool Output
Hard limits prevent single tool calls from blowing the context budget:
- File reads capped at 50KB
- Grep returns at most 100 matches
- Command output over 10KB saved to temp file, replaced with a pointer
- MCP tool schemas that would eat more than half the context window are dropped at startup
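The temp-file pointer trick for oversized command output might look like this (a sketch of the behavior described above; the function name and 512-character preview are assumptions):

```python
import tempfile

MAX_INLINE = 10 * 1024  # 10KB cap on inline command output, per the list above

def bound_output(output: str) -> str:
    """Keep small outputs inline; spill large ones to a temp file
    and hand the model a short preview plus a pointer."""
    if len(output.encode()) <= MAX_INLINE:
        return output
    with tempfile.NamedTemporaryFile("w", suffix=".out", delete=False) as f:
        f.write(output)
        path = f.name
    head = output[:512]
    return f"{head}\n...[output was {len(output)} chars; full text saved to {path}]"
```

The model can still read the file on demand with a bounded file-read call, but a single noisy command can no longer consume the whole window.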
Forgiving Parsers
- Tool-call parsing uses multi-pass recovery — if JSON is slightly broken, Swival tries to fix it
- The edit tool uses three-pass matching: exact, line-trimmed, unicode-normalized
- These add up to fewer stalled loops with smaller models
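Multi-pass JSON recovery can be sketched like this (illustrative only; Swival's actual recovery passes may differ):

```python
import json
import re

def parse_tool_call(raw: str):
    """Try progressively more forgiving parses of a model's tool call."""
    # Pass 1: strict JSON.
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass
    # Pass 2: strip markdown fences and trailing commas, then retry.
    cleaned = re.sub(r"^```(?:json)?|```$", "", raw.strip(), flags=re.M)
    cleaned = re.sub(r",\s*([}\]])", r"\1", cleaned)
    try:
        return json.loads(cleaned)
    except json.JSONDecodeError:
        pass
    # Pass 3: extract the outermost {...} span and retry once more.
    m = re.search(r"\{.*\}", cleaned, flags=re.S)
    if m:
        try:
            return json.loads(m.group(0))
        except json.JSONDecodeError:
            pass
    return None  # genuinely unrecoverable; the agent re-prompts instead
```

Each extra pass converts a would-be stalled turn into a usable tool call, which matters most for small models that emit fences or trailing commas.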
Error Guardrails
- Same error twice: warn the model
- Same error three times: tell it to stop and try something different
- Prevents small models from burning their entire context budget on a loop
Other Features
Review loop and LLM-as-judge. Configurable review loop that runs external reviewer scripts or uses a built-in LLM-as-judge to evaluate and retry output. Good for quality assurance on critical tasks.
Benchmarking reports. --report report.json writes machine-readable evaluation data: per-call LLM timing, tool success/failure counts, context compaction events, and guardrail interventions.
Secret encryption. --encrypt-secrets transparently detects API keys in LLM messages and encrypts them before they leave your machine. The LLM never sees real values. Decryption happens locally when responses come back.
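The substitution side of this can be sketched as a local vault that swaps detected keys for opaque placeholders (illustrative only; Swival's real detection patterns and encryption scheme are not shown here, and the names below are assumptions):

```python
import re
import secrets

# Common API-key shapes: sk-..., hf_..., nvapi-..., ghp_... (an assumption)
KEY_PATTERNS = re.compile(r"\b(?:sk|hf|nvapi|ghp)[-_][A-Za-z0-9_-]{8,}\b")

class SecretVault:
    """Redact keys before a message leaves the machine; restore locally."""

    def __init__(self):
        self.vault = {}

    def redact(self, text: str) -> str:
        def swap(m):
            token = f"SECRET_{secrets.token_hex(4)}"
            self.vault[token] = m.group(0)
            return token
        return KEY_PATTERNS.sub(swap, text)

    def restore(self, text: str) -> str:
        for token, real in self.vault.items():
            text = text.replace(token, real)
        return text
```

The model only ever sees `SECRET_…` tokens; the mapping never leaves the local process.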
Cross-session memory. Stores notes in a local memory file, retrieves relevant entries using BM25 ranking. Use /learn in the REPL to teach it something on the spot.
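BM25 ranking over a small memory file needs no database; a self-contained sketch of the retrieval idea (Swival's tokenizer and parameters are unknown to us, so whitespace tokenization and the standard k1/b defaults are assumptions):

```python
import math

def bm25_rank(query, docs, k1=1.5, b=0.75):
    """Return document indices sorted by BM25 relevance to the query."""
    toks = [d.lower().split() for d in docs]
    N = len(docs)
    avgdl = sum(len(t) for t in toks) / N
    q = query.lower().split()
    # Document frequency per query term.
    df = {w: sum(1 for t in toks if w in t) for w in q}
    scores = []
    for t in toks:
        s = 0.0
        for w in q:
            f = t.count(w)
            if f == 0 or df[w] == 0:
                continue
            idf = math.log(1 + (N - df[w] + 0.5) / (df[w] + 0.5))
            s += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(t) / avgdl))
        scores.append(s)
    return sorted(range(N), key=lambda i: -scores[i])
```

Only the top-ranked entries get injected into the prompt, keeping memory retrieval compatible with the same tight context budget as everything else.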
Session resume. When interrupted (Ctrl+C, max turns, context overflow), state saves to disk. Next run in the same directory picks up where it left off.
A2A server mode. swival --serve makes the agent an Agent-to-Agent endpoint other agents can call over HTTP. Multi-turn context, streaming, rate limiting, bearer auth built in.
Skills, MCP, and A2A. SKILL.md-based skills for reusable workflows, Model Context Protocol for external tools, Agent-to-Agent protocol for remote agent communication.
Prompt caching. Automatically marks the system message as cacheable for providers that support it (Anthropic, Gemini, Bedrock). Typically saves 30-60% of input token costs.
Swival vs Hermes Agent
| Feature | Swival | Hermes Agent |
|---|---|---|
| Language | Pure Python | Python + Node.js (browser tool) |
| Install | uv tool install swival | Git clone + setup |
| Primary target | Small/local models (8B-35B, 16K-32K ctx) | Frontier models (128K+ ctx) |
| Default provider | LM Studio (auto-discover) | Configurable (Anthropic, OpenAI, etc.) |
| Context compaction | 4-level graduated pipeline | Auto compression |
| Durable state | think/todo/snapshot survive compaction | Memory tool (DB-backed) |
| Skills | SKILL.md-based | SKILL.md-based |
| MCP | Yes | Yes |
| A2A | Built-in server mode (--serve) | Via gateway/platforms |
| Memory | BM25-ranked local file | SQLite FTS5 + fact_store |
| Secret encryption | Built-in (--encrypt-secrets) | No |
| Benchmarking | --report report.json with telemetry | No built-in |
| Review loop | External scripts + LLM-as-judge | No |
| ChatGPT Plus/Pro | Browser OAuth (no API key) | No |
| AWS Bedrock | Native provider | Via OpenAI-compatible |
| Command provider | Wrap any CLI as backend | ACP adapter (Claude Code, etc.) |
| Platforms | CLI only | CLI + Telegram, Discord, Slack, WhatsApp, Signal, Home Assistant |
| Subagents | No | Yes (delegate_task) |
| Background processes | No | Yes |
| Cron jobs | No | Yes |
| Browser automation | Via MCP (Chrome DevTools, Lightpanda, agent-browser) | Built-in browser tool |
| Code execution | Command tool | Sandbox execute_code |
| CLI design | stdout = answer only, stderr = diagnostics | Rich terminal UI |
When to use Swival:
- Running local models with tight context windows
- You want zero-config with LM Studio
- Benchmarking model performance on coding tasks
- You need secret encryption for API keys
- CLI piping workflows (`swival -q < task.md | jq .`)
- Using ChatGPT Plus/Pro without a separate API key
When to use Hermes:
- You need multi-platform delivery (Telegram, Discord, etc.)
- Subagent orchestration and parallel workstreams
- Background processes and cron scheduling
- Built-in browser automation
- Richer tool ecosystem (3000+ tests, vision, sandbox execution)
They’re complementary more than competing. Swival excels at the “run a local model on a coding task and get reliable output” problem. Hermes is a full agent platform with messaging, scheduling, and multi-agent coordination.
Profile Configuration
If you switch between providers regularly, use profiles:
```toml
[profiles.local]
provider = "lmstudio"
model = "qwen3-coder-next"

[profiles.gpt5]
provider = "chatgpt"
model = "gpt-5.4"
reasoning_effort = "high"

[profiles.hf]
provider = "huggingface"
model = "zai-org/GLM-5"
```

```sh
swival --profile local "quick task"
swival --profile gpt5 "hard task"
```

Switch mid-session in the REPL:

```sh
swival> /profile gpt5
```

Final Thoughts
Swival fills a real gap for local coding. Its entire architecture — 4-level compaction, durable state, bounded outputs, forgiving parsers — is built around the constraint that context is scarce and model outputs are imperfect. This is exactly the reality when running 8B-35B models locally.
Where Swival falls short is with cloud API providers. In our testing:
- z.ai — incompatible. The non-standard URL path (`/api/coding/paas/v4`) breaks Swival’s auto-appended `/v1`.
- NVIDIA NIM — technically works, but Swival’s multi-turn agent loop burns through NIM’s rate quota fast. Not practical on free tiers.
- Cloud LLMs via generic provider — Swival auto-prepends `openai/` to model names, breaking providers that don’t recognize the prefix. Agentic tool calling becomes unusable.
The common thread: Swival’s generic provider isn’t truly generic. Between the forced openai/ prefix, the auto-appended /v1 on URLs, and the chatty agent loop, cloud API usage is fragile at best. This is fine when inference is local and free (LM Studio, llama.cpp, ollama). It becomes a problem when you’re paying per token or have rate limits.
Verdict: Swival is best suited for local coding with local models. If you have a GPU and run Qwen3, Gemma, or similar models locally, Swival is purpose-built for that workflow — zero config with LM Studio, auto-discovery, and context management that keeps small models on track. For cloud-based coding with API providers, agents like Hermes, Claude Code, or Codex are better suited due to their optimized prompt caching, fewer round-trips, and provider-native integrations.
At ~45 GitHub stars and 575 commits from a respected security engineer, it’s early but clearly serious. If you run local models for coding tasks and find other agents unreliable, Swival is worth a try.
Links:
- Repository: github.com/swival/swival
- Documentation: swival.dev
- Install: `uv tool install swival` (Python 3.13+)

