5 min read · Tags: youtube, ai, developer-tools

Context Mode: The MCP Server That Solves Claude Code's Context Bloat

The Context Bloat Problem

If you’ve been coding with Claude Code, you’ve likely encountered context bloat. Every MCP tool call dumps its full output directly into the model’s 200K context window. The more tools in your belt, the faster your context depletes.

Under typical scenarios, you’re looking at roughly 30 minutes of active agent use before context compaction occurs. That’s when the AI starts forgetting files, tasks, and crucial decisions—not to mention the token costs.

The Math Behind Context Exhaustion

| Operation | Context Cost |
| --- | --- |
| Playwright snapshot | ~56 KB |
| 20 GitHub issues | ~59 KB |
| Access log (500 requests) | ~45 KB |
| Git log (153 commits) | ~12 KB |

If you perform these operations multiple times during a planning phase, you’ve eaten 70% of your context window before the agent has written a single line of code.
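The compounding effect is easy to quantify. Here is a back-of-the-envelope sketch (assuming roughly 4 bytes per token, so a 200K-token window holds on the order of 800 KB of text; both the byte-per-token ratio and the repeat count are assumptions):

```python
# Approximate per-operation costs from the table above, in KB.
operations = {
    "playwright_snapshot": 56,
    "github_issues_20": 59,
    "access_log_500": 45,
    "git_log_153": 12,
}

WINDOW_KB = 800  # ~200K tokens at an assumed ~4 bytes/token

# Repeat each operation three times during a planning phase.
used_kb = sum(cost * 3 for cost in operations.values())
print(f"{used_kb} KB used, {used_kb / WINDOW_KB:.0%} of the window gone")
```

Three rounds of these four operations already consume 516 KB, roughly two-thirds of the window, before any code is written; a few extra reads push it past the 70% mark.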

Context Mode Architecture

Context Mode operates at the MCP protocol layer—not as a CLI output filter or cloud analytics dashboard. Raw data stays in a sandboxed subprocess and never enters your context window.

Key architectural principles:

  • Privacy-first: No telemetry, no cloud sync, no usage tracking, no account required
  • Local storage: SQLite databases live in your home directory
  • Process isolation: Each ctx_execute call spawns an isolated subprocess with its own process boundary

The Sandbox Tools

Context Mode provides 6 sandbox tools that intercept large outputs before they flood your context:

| Tool | Function | Context Reduction |
| --- | --- | --- |
| `ctx_execute` | Run code in 11 languages; only stdout enters context | 56 KB → 299 B |
| `ctx_execute_file` | Process files in the sandbox; raw content never leaves | 45 KB → 155 B |
| `ctx_batch_execute` | Run multiple commands + search queries in one call | 986 KB → 62 KB |
| `ctx_index` | Chunk markdown into FTS5 with BM25 ranking | 60 KB → 40 B |
| `ctx_search` | Query indexed content with multiple queries | On-demand retrieval |
| `ctx_fetch_and_index` | Fetch a URL, detect content type, chunk and index | 60 KB → 40 B |

How the Sandbox Works

Scripts run in isolated subprocesses that can’t access each other’s memory or state. The subprocess captures stdout, and only that enters the conversation context. The raw data—log files, API responses, snapshots—never leaves the sandbox.
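The mechanism can be sketched as a plain subprocess call: run the script in a child process, capture its stdout, and discard everything else. This is an illustrative sketch, not Context Mode's actual implementation:

```python
import subprocess
import sys

def sandboxed_execute(code, timeout=30):
    """Run Python code in a fresh subprocess; only stdout survives.

    The child has its own memory and process boundary, so nothing it
    loads (files, API responses, snapshots) can leak back except what
    it explicitly prints.
    """
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, timeout=timeout,
    )
    return result.stdout

summary = sandboxed_execute(
    "data = 'x' * 56_000\n"                        # pretend: a 56 KB snapshot
    "print(f'{len(data)} bytes, 1 button found')"  # only this line returns
)
print(summary.strip())  # → 56000 bytes, 1 button found
```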

Supported runtimes: JavaScript, TypeScript, Python, Shell, Ruby, Go, Rust, PHP, Perl, R, and Elixir. Bun is auto-detected for 3-5x faster JS/TS execution.

Intent-driven filtering: When output exceeds 5 KB and an intent is provided, Context Mode indexes the full output into the knowledge base, searches for sections matching your intent, and returns only relevant matches.
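The decision logic can be sketched as a simple threshold check; `index_into_kb` and `search_kb` below are naive stand-ins for the real FTS5 indexing and BM25 search:

```python
THRESHOLD = 5 * 1024  # 5 KB

def index_into_kb(output):
    # Stand-in for the real chunk-and-index step: split on blank lines.
    return [s for s in output.split("\n\n") if s.strip()]

def search_kb(sections, intent):
    # Stand-in for BM25 search: naive keyword match on the intent terms.
    terms = intent.lower().split()
    return [s for s in sections if any(t in s.lower() for t in terms)]

def filter_by_intent(output, intent=None):
    """Pass small outputs through; filter large ones down to intent matches."""
    if intent is None or len(output.encode()) <= THRESHOLD:
        return output
    return "\n---\n".join(search_kb(index_into_kb(output), intent))
```

With a 12 KB log and the intent `"error"`, only the sections mentioning errors would enter the conversation.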

The Knowledge Base: FTS5 + BM25

The ctx_index tool chunks markdown content by headings while preserving code blocks, then stores them in a SQLite FTS5 (Full-Text Search 5) virtual table.
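A minimal version of heading-aware chunking, with fence tracking so a chunk never splits inside a code block (illustrative, not the actual implementation):

```python
def chunk_markdown(text):
    """Split markdown into chunks at headings, keeping code fences intact."""
    chunks, current, title, in_fence = [], [], "(intro)", False
    for line in text.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence          # toggle fence state
        if line.startswith("#") and not in_fence:
            if current:                      # flush the previous chunk
                chunks.append({"title": title, "body": "\n".join(current)})
            title, current = line.lstrip("# ").strip(), []
        else:
            current.append(line)
    if current:
        chunks.append({"title": title, "body": "\n".join(current)})
    return chunks
```

Each chunk (title plus body) would then become one row in the FTS5 virtual table.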

Search Algorithm

Search uses BM25 ranking—a probabilistic relevance algorithm that scores documents based on:

  • Term frequency
  • Inverse document frequency
  • Document length normalization

Porter stemming is applied at index time so “running”, “runs”, and “ran” match the same stem. Titles and headings are weighted 5x in BM25 scoring.
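For intuition, here is the classic BM25 formula in isolation. FTS5 computes its own built-in variant, and the 5x title/heading weight would be applied as a column weight on top of this; the toy corpus below simply treats each document as a list of tokens:

```python
import math

def bm25_score(query_terms, doc, corpus, k1=1.2, b=0.75):
    """Score one document (a token list) against query terms with BM25."""
    N = len(corpus)
    avg_len = sum(len(d) for d in corpus) / N
    score = 0.0
    for term in query_terms:
        tf = doc.count(term)                          # term frequency
        df = sum(1 for d in corpus if term in d)      # document frequency
        if tf == 0 or df == 0:
            continue
        idf = math.log((N - df + 0.5) / (df + 0.5) + 1)   # inverse doc freq
        norm = 1 - b + b * len(doc) / avg_len             # length normalization
        score += idf * tf * (k1 + 1) / (tf + k1 * norm)
    return score

corpus = [["search", "index"], ["search", "search", "bm25"], ["unrelated"]]
scores = [bm25_score(["search"], d, corpus) for d in corpus]
# The document mentioning "search" twice outranks the one mentioning it once.
```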

Reciprocal Rank Fusion (RRF)

Search runs two parallel strategies and merges them:

  1. Porter stemming — FTS5 MATCH with porter tokenizer
  2. Trigram substring — FTS5 trigram tokenizer for partial strings

RRF merges both ranked lists, so documents that rank well in both strategies surface higher.
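RRF itself is only a few lines: each ranked list contributes 1/(k + rank) per document, with the constant k (commonly 60; the value used here is an assumption about Context Mode's internals) damping the influence of any single list:

```python
def rrf_merge(ranked_lists, k=60):
    """Merge ranked result lists with Reciprocal Rank Fusion."""
    scores = {}
    for results in ranked_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

stemmed = ["chunk-a", "chunk-b", "chunk-c"]   # porter-stemmed MATCH results
trigram = ["chunk-b", "chunk-d", "chunk-a"]   # trigram substring results
merged = rrf_merge([stemmed, trigram])
print(merged)  # → ['chunk-b', 'chunk-a', 'chunk-d', 'chunk-c']
```

`chunk-b` wins because it ranks near the top of both lists, even though it leads neither.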

Smart Snippets

Instead of returning the first N characters, Context Mode finds where query terms appear and returns windows around those matches.
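A minimal sketch of match-centered snippeting; the window size is an assumed parameter, not the real default:

```python
def smart_snippet(text, query, window=40):
    """Return a window centered on the first query match,
    instead of blindly truncating from the start."""
    pos = text.lower().find(query.lower())
    if pos == -1:
        return text[: 2 * window]            # no match: fall back to the head
    start = max(0, pos - window)
    end = min(len(text), pos + len(query) + window)
    prefix = "…" if start > 0 else ""        # mark elided context
    suffix = "…" if end < len(text) else ""
    return f"{prefix}{text[start:end]}{suffix}"
```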

Session Continuity

When the context window fills up, the agent compacts the conversation—dropping older messages. Without session tracking, the model forgets which files it was editing, what tasks are in progress, and what errors were resolved.

Event Capture

Context Mode captures every meaningful event during your session:

| Category | Events | Priority |
| --- | --- | --- |
| Files | read, edit, write, glob, grep | Critical (P1) |
| Tasks | create, update, complete | Critical (P1) |
| Decisions | User corrections ("use X instead") | High (P2) |
| Git | checkout, commit, merge, rebase | High (P2) |
| Errors | Tool failures, non-zero exit codes | High (P2) |
| Environment | cwd changes, venv, package installs | High (P2) |

How Sessions Survive Compaction

PreCompact fires
→ Read all session events from SQLite
→ Build priority-tiered XML snapshot (≤2 KB)
→ Store snapshot in session_resume table
SessionStart fires (source: "compact")
→ Retrieve stored snapshot
→ Write structured events file → auto-indexed into FTS5
→ Build Session Guide with 15 categories
→ Inject <session_knowledge> directive into context

The snapshot is built in priority tiers—if the 2 KB budget is tight, lower-priority events are dropped first while critical state is always preserved.
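The budgeting step can be sketched as a greedy fill: sort events by priority tier, then keep adding until the 2 KB budget runs out. The event shape here is illustrative, not the real schema:

```python
BUDGET = 2 * 1024  # 2 KB snapshot budget

def build_snapshot(events):
    """Keep highest-priority events first; drop the rest when over budget."""
    kept, used = [], 0
    # priority 1 = critical (files, tasks); 2 = high (decisions, git, errors)
    for event in sorted(events, key=lambda e: e["priority"]):
        size = len(str(event).encode())
        if used + size > BUDGET:
            continue          # budget tight: lower tiers are dropped first
        kept.append(event)
        used += size
    return kept
```

Because critical events are admitted first, a flood of lower-priority noise can never push file or task state out of the snapshot.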

Session Guide Categories

After compaction, the model receives a structured narrative whose categories include:

  • Last Request — user’s last prompt
  • Tasks — checkbox format with completion status
  • Key Decisions — user corrections and preferences
  • Files Modified — all files touched during session
  • Unresolved Errors — errors that haven’t been fixed
  • Git — operations performed
  • Project Rules — CLAUDE.md / GEMINI.md paths
  • MCP Tools Used — tool names with call counts

Platform Compatibility

| Platform | MCP Server | Hooks | Session Continuity |
| --- | --- | --- | --- |
| Claude Code | ✓ | ✓ (full) | Full |
| Gemini CLI / Qwen CLI | ✓ | ✓ | High |
| OpenCode | ✓ | ✓ (plugin) | High |
| Cline | ✓ | ✗ | Manual |
| Zed Editor | ✓ | ✗ | Manual |

Installation

Claude Code (recommended):

```
/plugin marketplace add mksglu/context-mode
/plugin install context-mode@context-mode
```

Restart Claude Code (or run /reload-plugins).

Verify:

```
/context-mode:ctx-doctor
```

All checks should show [x].

OpenCode

Add to opencode.json in your project root (or ~/.config/opencode/opencode.json for global):

```json
{
  "$schema": "https://opencode.ai/config.json",
  "mcp": {
    "context-mode": {
      "type": "local",
      "command": ["context-mode"]
    }
  },
  "plugin": ["context-mode"]
}
```

The `mcp` entry registers the 6 sandbox tools. The `plugin` entry enables hooks: OpenCode calls the plugin’s TypeScript functions directly before and after each tool execution, blocking dangerous commands and enforcing sandbox routing.

Gemini CLI / Qwen CLI

Both share the same config format. Add to ~/.gemini/settings.json:

```json
{
  "mcpServers": {
    "context-mode": {
      "command": "context-mode"
    }
  },
  "hooks": {
    "BeforeTool": [
      {
        "matcher": "run_shell_command|read_file|read_many_files|grep_search|search_file_content|web_fetch|activate_skill|mcp__plugin_context-mode",
        "hooks": [{ "type": "command", "command": "context-mode hook gemini-cli beforetool" }]
      }
    ],
    "AfterTool": [
      {
        "matcher": "",
        "hooks": [{ "type": "command", "command": "context-mode hook gemini-cli aftertool" }]
      }
    ],
    "PreCompress": [
      {
        "matcher": "",
        "hooks": [{ "type": "command", "command": "context-mode hook gemini-cli precompress" }]
      }
    ],
    "SessionStart": [
      {
        "matcher": "",
        "hooks": [{ "type": "command", "command": "context-mode hook gemini-cli sessionstart" }]
      }
    ]
  }
}
```

Restart Gemini CLI / Qwen CLI after editing.

Verify:

```
/mcp list
```

You should see `context-mode: ... - Connected`.

Cline (VS Code Extension)

  1. Open VS Code Settings (File > Preferences > Settings)
  2. Search for “cline.mcpServers”
  3. Add to your settings.json:

```json
{
  "cline.mcpServers": [
    {
      "name": "context-mode",
      "command": "context-mode"
    }
  ]
}
```

  4. Reload the VS Code window

Cline doesn’t support hooks. The sandbox tools still work for context savings, but session continuity requires manual management.

Zed Editor

  1. Add to ~/.config/zed/settings.json (Windows: %APPDATA%\Zed\settings.json):

```json
{
  "context_servers": {
    "context-mode": {
      "command": {
        "path": "context-mode"
      }
    }
  }
}
```

  2. Copy the routing instructions to your project root:

```shell
mkdir -p node_modules/context-mode
cp -r node_modules/context-mode/configs/zed/AGENTS.md ./AGENTS.md
```

Or create AGENTS.md manually:

```markdown
# Context Mode Routing
Use `ctx_execute`, `ctx_execute_file`, or `ctx_batch_execute` instead of raw shell commands when output may be large.
Use `ctx_index` + `ctx_search` instead of reading large files directly.
Use `ctx_fetch_and_index` instead of `web_fetch` for large responses.
```

  3. Restart Zed (or save settings.json; Zed auto-restarts context servers)

Zed has no hook support. The routing instructions file enforces ~60% compliance.

Utility Commands

Inside any AI session, just type the command. The LLM calls the MCP tool automatically:

```
ctx stats     # Context savings, call counts, session report
ctx doctor    # Diagnose runtimes, hooks, FTS5, versions
ctx upgrade   # Update from GitHub, rebuild, reconfigure
```

From your terminal—run directly without an AI session:

```shell
context-mode doctor
context-mode upgrade
```

Benchmarks

| Scenario | Raw | Context | Saved |
| --- | --- | --- | --- |
| Playwright snapshot | 56.2 KB | 299 B | 99% |
| GitHub Issues (20) | 58.9 KB | 1.1 KB | 98% |
| Access log (500 requests) | 45.1 KB | 155 B | 100% |
| Analytics CSV (500 rows) | 85.5 KB | 222 B | 100% |
| Git log (153 commits) | 11.6 KB | 107 B | 99% |
| Repo research (subagent) | 986 KB | 62 KB | 94% |

Over a full session: 315 KB of raw output becomes 5.4 KB. Session time extends from ~30 minutes to ~3 hours.

Security

Context Mode enforces the same permission rules you already use—but extends them to the MCP sandbox. If you block sudo, it’s also blocked inside sandbox tools.

```json
{
  "permissions": {
    "deny": [
      "Bash(sudo *)",
      "Bash(rm -rf /*)",
      "Read(**/.env*)"
    ]
  }
}
```

Conclusion

Context Mode solves both halves of the context problem:

  1. Context Saving — Sandbox tools keep raw data out of the context window (98% reduction)
  2. Session Continuity — Events are tracked in SQLite and restored after compaction

The goal isn’t just saving money on API costs—it’s maintaining model intelligence. When you clear noise from the context window, you leave more room for actual reasoning.

If you’re building complex projects with AI agents, Context Mode offers a practical solution to extend productive coding sessions significantly.

This post was generated by opencode with GLM-5 from Z.AI Coding Plan, based on content from https://www.youtube.com/watch?v=QUHrntlfPo4 and https://github.com/mksglu/context-mode