Swival: A Coding Agent Built to Not Break on Small Models


What Is Swival

Swival is a CLI coding agent built by Frank Denis (jedisct1, creator of libsodium). Pure Python, no framework. 575 commits, actively maintained.

The tagline: “A coding agent for any model.”

Most coding agents — Claude Code, Codex, Hermes, Aider — are built and tested against frontier models with 128K+ context windows. They assume clean tool calls, reliable instruction following, and plenty of room. When those assumptions break, the agent falls apart in ways that look like the model’s fault but are really the agent’s.

Swival takes the opposite approach. It assumes the model will have tight limits and rough edges, then does the extra work to keep the task moving anyway.

Source: github.com/swival/swival

Provider Support

Swival connects to eight provider types out of the box:

Provider | Auth | Zero config
--- | --- | ---
LM Studio | none | auto-discovers loaded model
HuggingFace Inference API | HF_TOKEN | requires --model
OpenRouter | OPENROUTER_API_KEY | requires --model
Google Gemini | GEMINI_API_KEY | requires --model
ChatGPT Plus/Pro | browser OAuth on first run | requires --model
AWS Bedrock | AWS credential chain | requires --model
Generic (OpenAI-compat) | optional OPENAI_API_KEY | requires --model + --base-url
Command (external program) | none | requires --model as command string

The generic provider covers ollama, llama.cpp, mlx_lm.server, vLLM, DeepSeek API, and anything that speaks the OpenAI chat completions protocol. The command provider can even wrap codex exec or custom scripts as a backend.

Swival’s userbase skews toward smaller and local models:

Local (primary targets):

  • Qwen3-Coder-Next — top recommendation, great quality/speed on consumer hardware
  • Qwen3.5-35B-A3B — via vLLM or HuggingFace Endpoints
  • Gemma 4 26B A4B — via llama.cpp (GGUF)
  • Qwen3-Coder-480B-A35B-4bit — via mlx_lm.server
  • Qwen3 32B — via ollama

Cloud:

  • GLM-5 via HuggingFace or OpenRouter (note: z.ai’s non-standard URL path /api/coding/paas/v4 is incompatible — Swival auto-appends /v1 to the base URL)
  • Gemini 2.5 Flash via Google
  • GPT-5.4 via ChatGPT Plus/Pro (no API key needed — browser OAuth)
  • Claude Opus 4.6 via AWS Bedrock
  • DeepSeek Chat via DeepSeek API
  • Qwen3-Coder-480B via NVIDIA NIM (generic provider — rate limit issues, see below)

The heavy Qwen3 bias makes sense — Qwen3 has strong tool-calling at small quantized sizes, which is exactly where Swival shines.

Installation

uv (Python 3.13+):

uv tool install swival
uv tool upgrade swival

Homebrew (macOS):

brew install swival/tap/swival
brew upgrade swival

No npm, no Docker, no binary download. Pure Python, minimal dependencies.

Quick Start

With LM Studio (zero config)

  1. Install LM Studio and load a model with tool-calling support
  2. Start the LM Studio server
  3. Run:
swival "Refactor the error handling in src/api.py"

Swival auto-discovers the loaded model and connects. No flags needed.

With HuggingFace

export HF_TOKEN=hf_...
swival --provider huggingface --model zai-org/GLM-5 "Fix the bug in auth.ts"

With local llama.cpp

# Start llama-server first
llama-server --reasoning auto --fit on \
-hf unsloth/gemma-4-26B-A4B-it-GGUF:UD-Q4_K_XL
# Then connect
swival --provider llamacpp "Add input validation to signup endpoint"

With NVIDIA NIM

NVIDIA NIM exposes an OpenAI-compatible API. Point Swival’s generic provider at it:

export OPENAI_API_KEY="nvapi-..."
swival --provider generic \
--base-url https://integrate.api.nvidia.com/v1 \
--model qwen/qwen3-coder-480b-a35b-instruct \
"Add input validation to the signup endpoint"

Or pass the key inline with --api-key:

swival --provider generic \
--base-url https://integrate.api.nvidia.com/v1 \
--api-key "nvapi-..." \
--model qwen/qwen3-coder-480b-a35b-instruct \
"task"

Tested: NIM works via the generic provider. Note that the generic provider reads OPENAI_API_KEY for authentication, not NVIDIA_API_KEY.

Known Incompatibility: z.ai

z.ai uses a non-standard API path (/api/coding/paas/v4) that doesn’t follow the OpenAI convention. Swival’s generic provider auto-appends /v1 to the base URL, which produces an incorrect endpoint. As of this writing, z.ai cannot be used with Swival.
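The failure mode is easy to reproduce. Assuming Swival unconditionally appends /v1 to whatever base URL it is given (a reconstruction of the behavior described above, not its actual code), the non-standard path ends up with a bogus suffix:

```python
# Hypothetical reconstruction of the base-URL handling described above;
# Swival's real normalization logic may differ in detail.
def normalize_base_url(base_url: str) -> str:
    # Assumed behavior: append /v1 unless the URL already ends with it.
    base = base_url.rstrip("/")
    if not base.endswith("/v1"):
        base += "/v1"
    return base

# An OpenAI-convention host gets the expected endpoint:
assert normalize_base_url("https://integrate.api.nvidia.com") == \
    "https://integrate.api.nvidia.com/v1"

# A z.ai-style non-standard path picks up a bogus trailing /v1
# (host is illustrative):
assert normalize_base_url("https://z.example/api/coding/paas/v4").endswith("/v4/v1")
```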

NIM: Rate Limit Issues

NVIDIA NIM technically works via the generic provider, but Swival’s agent loop makes multiple API calls per task (tool calls, compaction, retries). This quickly exhausts NIM’s free tier rate quota. Unless you have a paid NIM deployment or generous quota, it’s not practical for real use.

Generic Provider Caveat: openai/ Prefix

Swival’s generic provider automatically prepends openai/ to the model name. So if you pass --model glm-5-turbo, Swival sends openai/glm-5-turbo to the API. This is intentional for routing within Swival’s internal logic, but it breaks cloud providers that don’t recognize the openai/ prefix — making agentic tool calling unusable for many cloud LLMs. If your provider rejects prefixed model names, there’s no workaround as of this writing.

With generic OpenAI-compatible server

swival --provider generic \
--base-url http://127.0.0.1:8080 \
--model my-model \
"task description"

Interactive REPL

swival

The REPL carries conversation history across turns, which makes it well suited to exploratory work.

Stdin piping

swival -q < objective.md
cat prompts/review.md | swival --provider huggingface --model zai-org/GLM-5

CLI-native design: stdout is exclusively the final answer, all diagnostics go to stderr. Pipe output straight into files or other commands.

Deep Dive: Context Management

This is where Swival diverges most from other agents. The entire system is designed around one reality: small models have tight context windows, and context management is the agent’s job, not the model’s.

The Four-Level Compaction Pipeline

When the context window fills up, Swival runs a graduated compaction pipeline:

  1. Shrink old tool results — truncate large file reads and command outputs from earlier turns
  2. Drop low-value turns — score each turn by importance, drop the least valuable ones
  3. Nuclear drop — keep only the last two turns
  4. Shed tool schemas — remove tool definitions entirely, relying on the model’s prior knowledge

Each level only fires if the previous one wasn’t enough. This prevents over-aggressive compaction on tasks that don’t need it.
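As a rough illustration, the graduated pipeline can be sketched in a few lines of Python. All names, limits, and heuristics below are assumptions for the sketch, not Swival's actual implementation:

```python
# Illustrative sketch of a graduated compaction pipeline; names, limits,
# and scoring heuristics are assumptions, not Swival's code.

def estimate_tokens(messages):
    # Crude estimate: roughly 4 characters per token.
    return sum(len(m["content"]) for m in messages) // 4

def shrink_old_tool_results(messages, keep_last=2, cap=200):
    # Level 1: truncate large tool outputs from earlier turns.
    for m in messages[:-keep_last]:
        if m["role"] == "tool" and len(m["content"]) > cap:
            m["content"] = m["content"][:cap] + "[truncated]"
    return messages

def drop_low_value_turns(messages, keep=4):
    # Level 2: keep only the most important turns (recency stands in
    # for a real importance score here).
    return messages[-keep:]

def nuclear_drop(messages):
    # Level 3: keep only the last two turns.
    return messages[-2:]

def compact(messages, tools, budget):
    # Each level fires only if the previous one was not enough.
    for level in (shrink_old_tool_results, drop_low_value_turns, nuclear_drop):
        if estimate_tokens(messages) <= budget:
            return messages, tools
        messages = level(messages)
    if estimate_tokens(messages) > budget:
        tools = []  # Level 4: shed tool schemas entirely.
    return messages, tools
```

A task that fits the budget passes through untouched; an oversized history gets progressively cut down until only the last two turns, and eventually no tool schemas, remain.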

Knowledge That Survives Compaction

This is the single most important design decision. Three things live outside the message history:

  • Thinking notes — a think tool gives the model a structured scratchpad
  • Todo checklist — a todo tool tracks work items
  • Snapshot summaries — a snapshot tool lets the agent compress its investigation into a summary

Even after the most aggressive compaction (level 4), the agent still has its reasoning chain, task list, and accumulated knowledge. It can lose every message and keep working.

Bounded Tool Output

Hard limits prevent single tool calls from blowing the context budget:

  • File reads capped at 50KB
  • Grep returns at most 100 matches
  • Command output over 10KB saved to temp file, replaced with a pointer
  • MCP tool schemas that would eat more than half the context window are dropped at startup
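A minimal sketch of this kind of bounding, using the limits from the list above (function names and the pointer format are illustrative assumptions):

```python
import tempfile

# Illustrative sketch of tool-output bounding; the limits match the list
# above, but function names and the pointer format are assumptions.
MAX_FILE_READ = 50 * 1024    # file reads capped at 50KB
MAX_CMD_OUTPUT = 10 * 1024   # larger command output is spilled to disk

def bound_file_read(data: str) -> str:
    # Truncate oversized file contents instead of flooding the context.
    if len(data) > MAX_FILE_READ:
        return data[:MAX_FILE_READ] + "\n[truncated: file exceeds 50KB]"
    return data

def bound_command_output(output: str) -> str:
    # Save large output to a temp file and hand the model a pointer
    # plus a small preview.
    if len(output) > MAX_CMD_OUTPUT:
        with tempfile.NamedTemporaryFile(
            mode="w", suffix=".log", delete=False
        ) as f:
            f.write(output)
            return f"[full output saved to {f.name}]\n" + output[:1024]
    return output
```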

Forgiving Parsers

  • Tool-call parsing uses multi-pass recovery — if JSON is slightly broken, Swival tries to fix it
  • The edit tool uses three-pass matching: exact, line-trimmed, unicode-normalized
  • These add up to fewer stalled loops with smaller models

Error Guardrails

  • Same error twice: warn the model
  • Same error three times: tell it to stop and try something different
  • Prevents small models from burning their entire context budget on a loop

Other Features

Review loop and LLM-as-judge. Configurable review loop that runs external reviewer scripts or uses a built-in LLM-as-judge to evaluate and retry output. Good for quality assurance on critical tasks.

Benchmarking reports. --report report.json writes machine-readable evaluation data: per-call LLM timing, tool success/failure counts, context compaction events, and guardrail interventions.

Secret encryption. --encrypt-secrets transparently detects API keys in LLM messages and encrypts them before they leave your machine. The LLM never sees real values. Decryption happens locally when responses come back.
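Swival encrypts the detected values; the sketch below substitutes opaque placeholders instead, just to show the round-trip shape. The detection pattern and placeholder format are assumptions:

```python
import re
import secrets

# Illustrative sketch: Swival encrypts detected keys; this version uses
# placeholders to show the round trip. The pattern and placeholder
# format are assumptions, not Swival's.
KEY_PATTERN = re.compile(r"\b(?:sk|hf|nvapi)[-_][A-Za-z0-9_-]{16,}\b")

def mask_secrets(text: str, vault: dict[str, str]) -> str:
    # Replace each detected key with a placeholder; the real value stays
    # in a local vault and never reaches the LLM.
    def repl(match: re.Match) -> str:
        token = f"SECRET_{secrets.token_hex(4)}"
        vault[token] = match.group(0)
        return token
    return KEY_PATTERN.sub(repl, text)

def unmask_secrets(text: str, vault: dict[str, str]) -> str:
    # Restore real values locally when the response comes back.
    for token, value in vault.items():
        text = text.replace(token, value)
    return text
```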

Cross-session memory. Stores notes in a local memory file, retrieves relevant entries using BM25 ranking. Use /learn in the REPL to teach it something on the spot.
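BM25 itself is compact enough to sketch inline. This is a generic implementation of the ranking formula, not Swival's memory code:

```python
import math
from collections import Counter

def bm25_rank(query: str, docs: list[str],
              k1: float = 1.5, b: float = 0.75) -> list[int]:
    """Return document indices ranked by BM25 score against the query.
    Generic illustration; tokenization here is plain whitespace split."""
    tokenized = [d.lower().split() for d in docs]
    N = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / N
    df: Counter[str] = Counter()   # document frequency per term
    for toks in tokenized:
        df.update(set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (N - df[term] + 0.5) / (df[term] + 0.5))
            denom = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    # Highest-scoring notes first.
    return sorted(range(N), key=lambda i: -scores[i])
```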

Session resume. When interrupted (Ctrl+C, max turns, context overflow), state saves to disk. Next run in the same directory picks up where it left off.

A2A server mode. swival --serve makes the agent an Agent-to-Agent endpoint other agents can call over HTTP. Multi-turn context, streaming, rate limiting, bearer auth built in.

Skills, MCP, and A2A. SKILL.md-based skills for reusable workflows, Model Context Protocol for external tools, Agent-to-Agent protocol for remote agent communication.

Prompt caching. Automatically marks the system message as cacheable for providers that support it (Anthropic, Gemini, Bedrock). Typically saves 30-60% of input token costs.

Swival vs Hermes Agent

Feature | Swival | Hermes Agent
--- | --- | ---
Language | Pure Python | Python + Node.js (browser tool)
Install | uv tool install swival | Git clone + setup
Primary target | Small/local models (8B-35B, 16K-32K ctx) | Frontier models (128K+ ctx)
Default provider | LM Studio (auto-discover) | Configurable (Anthropic, OpenAI, etc.)
Context compaction | 4-level graduated pipeline | Auto compression
Durable state | think/todo/snapshot survive compaction | Memory tool (DB-backed)
Skills | SKILL.md-based | SKILL.md-based
MCP | Yes | Yes
A2A | Built-in server mode (--serve) | Via gateway/platforms
Memory | BM25-ranked local file | SQLite FTS5 + fact_store
Secret encryption | Built-in (--encrypt-secrets) | No
Benchmarking | --report report.json with telemetry | No built-in
Review loop | External scripts + LLM-as-judge | No
ChatGPT Plus/Pro | Browser OAuth (no API key) | No
AWS Bedrock | Native provider | Via OpenAI-compatible
Command provider | Wrap any CLI as backend | ACP adapter (Claude Code, etc.)
Platforms | CLI only | CLI + Telegram, Discord, Slack, WhatsApp, Signal, Home Assistant
Subagents | No | Yes (delegate_task)
Background processes | No | Yes
Cron jobs | No | Yes
Browser automation | Via MCP (Chrome DevTools, Lightpanda, agent-browser) | Built-in browser tool
Code execution | Command tool | Sandbox execute_code
CLI design | stdout = answer only, stderr = diagnostics | Rich terminal UI

When to use Swival:

  • Running local models with tight context windows
  • You want zero-config with LM Studio
  • Benchmarking model performance on coding tasks
  • You need secret encryption for API keys
  • CLI piping workflows (swival -q < task.md | jq .)
  • Using ChatGPT Plus/Pro without a separate API key

When to use Hermes:

  • You need multi-platform delivery (Telegram, Discord, etc.)
  • Subagent orchestration and parallel workstreams
  • Background processes and cron scheduling
  • Built-in browser automation
  • Richer tool ecosystem (3000+ tests, vision, sandbox execution)

They’re complementary more than competing. Swival excels at the “run a local model on a coding task and get reliable output” problem. Hermes is a full agent platform with messaging, scheduling, and multi-agent coordination.

Profile Configuration

If you switch between providers regularly, use profiles:

~/.config/swival/config.toml
[profiles.local]
provider = "lmstudio"
model = "qwen3-coder-next"
[profiles.gpt5]
provider = "chatgpt"
model = "gpt-5.4"
reasoning_effort = "high"
[profiles.hf]
provider = "huggingface"
model = "zai-org/GLM-5"

swival --profile local "quick task"
swival --profile gpt5 "hard task"

Switch mid-session in the REPL:

swival> /profile gpt5

Final Thoughts

Swival fills a real gap for local coding. Its entire architecture — 4-level compaction, durable state, bounded outputs, forgiving parsers — is built around the constraint that context is scarce and model outputs are imperfect. This is exactly the reality when running 8B-35B models locally.

Where Swival falls short is with cloud API providers. In our testing:

  • z.ai — incompatible. Non-standard URL path (/api/coding/paas/v4) breaks Swival’s auto-appended /v1.
  • NVIDIA NIM — technically works, but Swival’s multi-turn agent loop burns through NIM’s rate quota fast. Not practical on free tiers.
  • Cloud LLMs via generic provider — Swival auto-prepends openai/ to model names, breaking providers that don’t recognize the prefix. Agentic tool calling becomes unusable.

The common thread: Swival’s generic provider isn’t truly generic. Between the forced openai/ prefix, the auto-appended /v1 on URLs, and the chatty agent loop, cloud API usage is fragile at best. This is fine when inference is local and free (LM Studio, llama.cpp, ollama). It becomes a problem when you’re paying per token or have rate limits.

Verdict: Swival is best suited for local coding with local models. If you have a GPU and run Qwen3, Gemma, or similar models locally, Swival is purpose-built for that workflow — zero config with LM Studio, auto-discovery, and context management that keeps small models on track. For cloud-based coding with API providers, agents like Hermes, Claude Code, or Codex are better suited due to their optimized prompt caching, fewer round-trips, and provider-native integrations.

At ~45 GitHub stars and 575 commits from a respected security engineer, it’s early but clearly serious. If you run local models for coding tasks and find other agents unreliable, Swival is worth a try.

Links: