Context Mode: The MCP Server That Solves Claude Code's Context Bloat


The Context Bloat Problem

Model Context Protocol (MCP) tool calls return full output directly into the model’s context window. With 81+ tools active, up to 143K tokens (72% of a 200K window) can be consumed before the first user message. After 30 minutes of typical agent usage, 40% of available context is depleted. When the agent compacts to free space, it loses track of files, tasks, and decisions.

| Operation | Context Cost |
|---|---|
| Playwright snapshot | ~56 KB |
| 20 GitHub issues | ~59 KB |
| Access log (500 requests) | ~45 KB |
| Analytics CSV (500 rows) | ~85 KB |

If you perform these operations multiple times during a planning phase, you’ve eaten 70% of your context window before the agent has written a single line of code.

Architecture

Context Mode operates as an MCP server using stdio transport. Raw data stays in a sandboxed subprocess and never enters your context window.

AI Agent (200K context) → Context Mode MCP → Isolated Subprocess → Summary only

Key architectural principles:

  • Privacy-first: No telemetry, no cloud sync, no usage tracking, no account required
  • Local storage: SQLite databases live in your home directory
  • Process isolation: Each execution call spawns an isolated subprocess with its own process boundary. Scripts cannot access each other’s memory or state
  • Credential passthrough: Authenticated CLIs (gh, aws, gcloud, kubectl, docker) inherit environment variables and config paths without exposing them to context

The Sandbox Tools

Context Mode provides 9 tools — 6 sandbox tools and 3 utilities:

Core Sandbox Tools

| Tool | Function | Context Reduction |
|---|---|---|
| ctx_batch_execute | Run multiple commands + search queries in ONE call | 986 KB → 62 KB (94%) |
| ctx_execute | Run code in 11 languages, only stdout enters context | 56 KB → 299 B (99%) |
| ctx_execute_file | Process files in sandbox, raw content never leaves | 45 KB → 155 B (100%) |
| ctx_index | Chunk markdown into FTS5 with BM25 ranking | 60 KB → 40 B |
| ctx_search | Query indexed content with multiple queries | On-demand retrieval |
| ctx_fetch_and_index | Fetch URL, detect content type (HTML/JSON/text), chunk and index | 60 KB → 40 B |

Utility Tools

| Tool | Function |
|---|---|
| ctx_stats | Context savings breakdown, call counts, session statistics |
| ctx_doctor | Diagnose: runtimes, hooks, FTS5, versions |
| ctx_upgrade | Update from GitHub, rebuild, reconfigure hooks |

How the Sandbox Works

Scripts run in isolated subprocesses. The subprocess captures stdout, and only that enters the conversation context. Raw data — log files, API responses, snapshots — never leaves the sandbox.

Supported runtimes: JavaScript, TypeScript, Python, Shell, Ruby, Go, Rust, PHP, Perl, R, and Elixir. Bun is auto-detected for 3-5x faster JS/TS execution.
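The core pattern is simple to sketch. The following is a minimal, hypothetical Python illustration (not Context Mode's actual implementation): a script runs in a child process with its stdout captured, so any bulky data the script reads stays in the child and only the printed summary comes back.

```python
import os
import subprocess
import sys
import tempfile

def run_in_sandbox(code: str) -> str:
    """Run a script in an isolated subprocess and return only its stdout.

    Sketch of the sandbox pattern: raw data the script reads stays inside
    the child process; only the printed summary crosses the boundary.
    """
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        result = subprocess.run(
            [sys.executable, path],
            capture_output=True,  # buffer stdout/stderr instead of inheriting
            text=True,
            timeout=30,
        )
        return result.stdout.strip()
    finally:
        os.unlink(path)

# The script below could just as well read a 45 KB log file; only the
# one-line summary it prints would ever enter the model's context.
summary = run_in_sandbox("print('errors=3 warnings=12')")
```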

Intent-driven filtering: When output exceeds 5 KB and an intent is provided, Context Mode:

  1. Indexes full output into FTS5 knowledge base
  2. Searches for sections matching the intent
  3. Returns only relevant matches
  4. Provides vocabulary of searchable terms for follow-up queries
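The four steps above can be sketched as follows. This is an illustrative stand-in, not the real pipeline: plain section splitting and term overlap stand in for FTS5 indexing and BM25 search, and the 5 KB threshold comes from the description above.

```python
import re

def filter_by_intent(output: str, intent: str, threshold: int = 5 * 1024):
    """Sketch of intent-driven filtering: when output exceeds the
    threshold, return only sections matching the intent, plus a
    vocabulary of searchable terms for follow-up queries."""
    if len(output.encode()) <= threshold:
        return output, []  # small output passes through untouched
    # 1. Split output into sections (stand-in for FTS5 indexing)
    sections = [s for s in output.split("\n\n") if s.strip()]
    # 2. Score each section by overlap with the intent's terms
    terms = set(re.findall(r"\w+", intent.lower()))
    matches = [s for s in sections
               if terms & set(re.findall(r"\w+", s.lower()))]
    # 3. Build a vocabulary the model can search in follow-up queries
    vocab = sorted({w for s in sections
                    for w in re.findall(r"[a-z]{4,}", s.lower())})[:20]
    # 4. Only the matching sections enter the context
    return "\n\n".join(matches), vocab
```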

The Knowledge Base: FTS5 + BM25

The ctx_index tool chunks markdown content by headings while preserving code blocks, then stores them in a SQLite FTS5 (Full-Text Search 5) virtual table.

Storage

BM25 ranking scores documents based on:

  • Term frequency
  • Inverse document frequency
  • Document length normalization

Porter stemming is applied at index time so “running”, “runs”, and “ran” match the same stem. Titles and headings are weighted 5x in BM25 scoring.
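SQLite's built-in FTS5 makes this pattern easy to reproduce. The sketch below (the chunk schema and sample rows are invented for illustration) shows Porter stemming and a 5x title weight in `bm25()`, assuming an SQLite build with FTS5 enabled, which is the default in most Python distributions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# FTS5 virtual table with Porter stemming; the (title, body) chunk
# layout is an assumption for illustration.
conn.execute("""
    CREATE VIRTUAL TABLE chunks USING fts5(
        title, body, tokenize = 'porter unicode61'
    )
""")
conn.executemany(
    "INSERT INTO chunks VALUES (?, ?)",
    [
        ("Caching strategies", "How responses are cached at the edge."),
        ("Deployment", "Running the service in production."),
    ],
)
# bm25() returns a rank (lower is better in SQLite's convention);
# weighting title 5.0 vs body 1.0 reproduces the heading boost.
rows = conn.execute(
    """
    SELECT title FROM chunks
    WHERE chunks MATCH ?
    ORDER BY bm25(chunks, 5.0, 1.0)
    """,
    ("caches",),  # Porter stem also matches "cached" / "caching"
).fetchall()
```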

| Layer | Algorithm | Example |
|---|---|---|
| 1 | Porter stemming (FTS5 MATCH) | “caching” → “cached” |
| 2 | Trigram substring (FTS5 trigram) | “useEff” → “useEffect” |
| 3 | Levenshtein fuzzy correction | “kuberntes” → “kubernetes” |

Layer 1 is tried first. If no results, Layer 2 activates. If still no results, Layer 3 corrects typos and retries.
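The cascade can be sketched with stdlib stand-ins for each layer. Exact matching stands in for the stemmed FTS5 MATCH, substring search for the trigram index, and `difflib` for Levenshtein correction; the vocabulary is invented for illustration.

```python
from difflib import get_close_matches

VOCAB = ["kubernetes", "useEffect", "caching"]

def layered_search(query: str, vocab=VOCAB):
    """Sketch of the three-layer matching cascade: stemmed match,
    then substring, then fuzzy typo correction."""
    # Layer 1: exact match (stand-in for FTS5 Porter-stemmed MATCH)
    hits = [w for w in vocab if w.lower() == query.lower()]
    if hits:
        return hits
    # Layer 2: substring match (stand-in for the FTS5 trigram index)
    hits = [w for w in vocab if query.lower() in w.lower()]
    if hits:
        return hits
    # Layer 3: fuzzy correction (difflib as a Levenshtein stand-in)
    return get_close_matches(query, vocab, n=1, cutoff=0.7)
```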

Smart Snippets

Instead of truncating to first N characters, Context Mode finds where query terms appear and returns windows around those matches with heading context, preserving code blocks intact.
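A minimal sketch of the idea, assuming a single query term and a fixed character window (the real implementation also carries heading context and protects code blocks):

```python
def smart_snippet(text: str, query: str, window: int = 40) -> str:
    """Sketch of match-centered snippets: instead of truncating to the
    first N characters, return a window around the first query match,
    with ellipses marking the cut edges."""
    pos = text.lower().find(query.lower())
    if pos == -1:
        return text[:window]  # fallback: plain truncation
    start = max(0, pos - window // 2)
    end = min(len(text), pos + len(query) + window // 2)
    prefix = "…" if start > 0 else ""
    suffix = "…" if end < len(text) else ""
    return prefix + text[start:end] + suffix
```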

Progressive Throttling

Prevents the model from exhausting the knowledge base in a single turn:

  • Calls 1-3: Normal results (2 per query)
  • Calls 4-8: Reduced results (1 per query) + warning
  • Calls 9+: Blocked — redirects to ctx_batch_execute
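The schedule above is just a per-turn counter with three bands; a minimal sketch (field names are invented for illustration):

```python
class SearchThrottle:
    """Sketch of the per-turn throttling schedule: full results first,
    then reduced results with a warning, then a hard block."""

    def __init__(self):
        self.calls = 0

    def next(self):
        self.calls += 1
        if self.calls <= 3:
            return {"results_per_query": 2, "warning": None}
        if self.calls <= 8:
            return {"results_per_query": 1,
                    "warning": "approaching search limit"}
        return {"results_per_query": 0,
                "warning": "blocked: use ctx_batch_execute"}
```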

Source Scoping

Search can target a specific indexed source, so React docs don’t interfere with API references.

Session Continuity

When the context window fills up, the agent compacts the conversation — dropping older messages. Without session tracking, the model forgets which files it was editing, what tasks are in progress, and what errors were resolved.

Event Capture

Context Mode captures 15 event categories in per-project SQLite:

| Category | Events | Priority |
|---|---|---|
| Files | read, edit, write, glob, grep | P1 (Critical) |
| Tasks | create, update, complete | P1 |
| Rules | CLAUDE.md / GEMINI.md / AGENTS.md paths + content | P1 |
| User Prompts | Every user message (for last-prompt restore) | P1 |
| Decisions | User corrections (“use X instead”, “don’t do Y”) | P2 |
| Git | checkout, commit, merge, rebase, push, pull, diff, status | P2 |
| Errors | Tool failures, non-zero exit codes | P2 |
| Environment | cwd changes, venv, nvm, conda, package installs | P2 |
| MCP Tools | All mcp__* calls with usage counts | P3 |
| Subagents | Agent tool invocations | P3 |
| Skills | Slash command invocations | P3 |

How Sessions Survive Compaction

PreCompact fires
  → Read all session events from SQLite
  → Build priority-tiered XML snapshot (≤2 KB)
  → Store snapshot in session_resume table

SessionStart fires (source: "compact")
  → Retrieve stored snapshot
  → Write structured events file → auto-indexed into FTS5
  → Build Session Guide with 15 categories
  → Inject <session_knowledge> directive into context

Lower-priority events (intent, MCP tool counts) are dropped first if the 2 KB budget is tight. Critical state (files, tasks, rules, decisions) is always preserved.
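The budgeting logic amounts to a greedy pack by priority tier; a sketch (the event shape and tag names are invented for illustration, not the real snapshot format):

```python
def build_snapshot(events, budget=2048):
    """Sketch of the priority-tiered snapshot: P1 events (files, tasks,
    rules, decisions) are packed first; once the byte budget would be
    exceeded, everything at and below that point is dropped."""
    parts, used = [], 0
    for priority in (1, 2, 3):
        for category, text in events.get(priority, []):
            line = f"<{category}>{text}</{category}>"
            if used + len(line.encode()) > budget:
                return "".join(parts)  # budget hit: drop the rest
            parts.append(line)
            used += len(line.encode())
    return "".join(parts)

# A critical P1 file event survives; an oversized P3 entry is dropped.
snap = build_snapshot({1: [("files", "src/app.ts")],
                       3: [("mcp", "x" * 4000)]})
```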

Session Guide Categories

After compaction, the model receives a structured narrative:

  • Last Request — user’s last prompt, so the model continues without asking “what were we doing?”
  • Tasks — checkbox format with completion status ([x] completed, [ ] pending)
  • Key Decisions — user corrections and preferences
  • Files Modified — all files touched during session
  • Unresolved Errors — errors that haven’t been fixed
  • Git — operations performed (checkout, commit, push, status)
  • Project Rules — CLAUDE.md / GEMINI.md / AGENTS.md paths
  • MCP Tools Used — tool names with call counts
  • Subagent Tasks — delegated work summaries
  • Skills Used — slash commands invoked
  • Environment — working directory, env variables
  • Data References — large data pasted during session
  • Session Intent — mode classification (implement, investigate, review)
  • User Role — behavioral directives set during session

Platform Compatibility

| Feature | Claude Code | Gemini CLI | VS Code Copilot | Cursor | OpenCode | Codex CLI |
|---|---|---|---|---|---|---|
| MCP Server | | | | | | |
| PreToolUse Hook | | | | | Plugin | |
| PostToolUse Hook | | | | | Plugin | |
| SessionStart Hook | | | | | | |
| PreCompact Hook | | | | | Plugin | |
| Can Block Tools | | | | | Plugin | |
| Slash Commands | | | | | | |
| Session Completeness | Full | High | High | Partial | High | |

Routing Enforcement

Hook enforcement is critical: one unrouted Playwright snapshot (56 KB) can wipe out an entire session’s savings. Without hooks, routing compliance is ~60%; with hooks, ~98%.

| Platform | Hooks Available | With Hooks | Without Hooks |
|---|---|---|---|
| Claude Code | Auto | ~98% | ~60% |
| Gemini CLI | Yes | ~98% | ~60% |
| VS Code Copilot | Yes | ~98% | ~60% |
| OpenCode | Plugin | ~98% | ~60% |
| Codex CLI | No | n/a | ~60% |

Installation

/plugin marketplace add mksglu/context-mode
/plugin install context-mode@context-mode

Restart Claude Code. Includes: MCP server, hooks (PreToolUse, PostToolUse, PreCompact, SessionStart), CLAUDE.md routing, slash commands (/ctx-stats, /ctx-doctor, /ctx-upgrade).

Verify: /context-mode:ctx-doctor — all checks should show [x].

OpenCode

Add to opencode.json in your project root:

{
  "$schema": "https://opencode.ai/config.json",
  "mcp": {
    "context-mode": {
      "type": "local",
      "command": ["context-mode"]
    }
  },
  "plugin": ["context-mode"]
}

The plugin entry enables TypeScript hooks via tool.execute.before, tool.execute.after, and experimental.session.compacting.

Gemini CLI

Add to ~/.gemini/settings.json:

{
  "mcpServers": {
    "context-mode": {
      "command": "context-mode"
    }
  },
  "hooks": {
    "BeforeTool": [
      {
        "matcher": "run_shell_command|read_file|read_many_files|grep_search|search_file_content|web_fetch|activate_skill|mcp__plugin_context-mode",
        "hooks": [{ "type": "command", "command": "context-mode hook gemini-cli beforetool" }]
      }
    ],
    "AfterTool": [
      {
        "matcher": "",
        "hooks": [{ "type": "command", "command": "context-mode hook gemini-cli aftertool" }]
      }
    ],
    "PreCompress": [
      {
        "matcher": "",
        "hooks": [{ "type": "command", "command": "context-mode hook gemini-cli precompress" }]
      }
    ],
    "SessionStart": [
      {
        "matcher": "",
        "hooks": [{ "type": "command", "command": "context-mode hook gemini-cli sessionstart" }]
      }
    ]
  }
}

Restart Gemini CLI after editing. Verify: /mcp list — you should see context-mode: ... - Connected.

VS Code Copilot

Create .vscode/mcp.json in your project root:

{
  "servers": {
    "context-mode": {
      "command": "context-mode"
    }
  }
}

Create .github/hooks/context-mode.json for hooks:

{
  "hooks": {
    "PreToolUse": [
      { "type": "command", "command": "context-mode hook vscode-copilot pretooluse" }
    ],
    "PostToolUse": [
      { "type": "command", "command": "context-mode hook vscode-copilot posttooluse" }
    ],
    "SessionStart": [
      { "type": "command", "command": "context-mode hook vscode-copilot sessionstart" }
    ]
  }
}

Restart VS Code.

Cline (VS Code Extension)

  1. Open VS Code Settings > search “cline.mcpServers”
  2. Add to settings.json:
{
  "cline.mcpServers": [
    {
      "name": "context-mode",
      "command": "context-mode"
    }
  ]
}

Cline doesn’t support hooks. Sandbox tools work for context savings, but session continuity requires manual management.

Utility Commands

Inside any AI session, just type the command. The LLM calls the MCP tool automatically:

ctx stats # Context savings, call counts, session report
ctx doctor # Diagnose runtimes, hooks, FTS5, versions
ctx upgrade # Update from GitHub, rebuild, reconfigure

From your terminal — run directly without an AI session:

context-mode doctor
context-mode upgrade

Benchmarks

From official benchmarks with real tool outputs:

| Scenario | Raw | Context | Saved |
|---|---|---|---|
| Playwright snapshot | 56.2 KB | 299 B | 99% |
| GitHub Issues (20) | 58.9 KB | 1.1 KB | 98% |
| Access log (500 requests) | 45.1 KB | 155 B | 100% |
| Context7 React docs | 5.9 KB | 261 B | 96% |
| Analytics CSV (500 rows) | 85.5 KB | 222 B | 100% |
| Git log (153 commits) | 11.6 KB | 107 B | 99% |
| Test output (30 suites) | 6.0 KB | 337 B | 95% |
| Repo research (subagent) | 986 KB | 62 KB | 94% |

Over a full session: 315 KB of raw output becomes 5.4 KB. Session time extends from ~30 minutes to ~3 hours.

Security

Context Mode enforces the same permission rules you already use — but extends them to the MCP sandbox. Zero additional configuration required.

{
  "permissions": {
    "deny": [
      "Bash(sudo *)",
      "Bash(rm -rf /*)",
      "Read(.env)",
      "Read(**/.env*)"
    ],
    "allow": [
      "Bash(git:*)",
      "Bash(npm:*)"
    ]
  }
}
  • deny always wins over allow
  • Chained commands (&&, ;, |) are split and checked separately
  • Project-level rules override global configuration
  • Works across all platforms (reads Claude Code settings format)
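The split-and-check rule for chained commands can be sketched with glob matching. This is an illustrative simplification: the patterns below are bare globs rather than the `Bash(...)` rule syntax shown above, and real shell parsing is more involved than a regex split.

```python
import fnmatch
import re

# Simplified stand-ins for the deny/allow rules shown above.
DENY = ["sudo *", "rm -rf /*"]
ALLOW = ["git *", "npm *"]

def is_allowed(command: str) -> bool:
    """Sketch of the permission check: chained commands are split on
    &&, ;, and | and each segment is checked on its own; a deny match
    on any segment blocks the whole chain."""
    segments = [s.strip() for s in re.split(r"&&|;|\|", command) if s.strip()]
    for seg in segments:
        if any(fnmatch.fnmatch(seg, pat) for pat in DENY):
            return False  # deny always wins
    return all(
        any(fnmatch.fnmatch(seg, pat) for pat in ALLOW)
        for seg in segments
    )
```

Splitting first is what stops `git status && sudo rm -rf /` from sneaking a denied command through behind an allowed prefix.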

License

Elastic License 2.0 (ELv2) — free to use, modify, and share. You may not rebrand and redistribute it as a competing plugin, product, or managed service.

Conclusion

Context Mode solves both halves of the context problem:

  1. Context Saving — Sandbox tools keep raw data out of the context window (98% reduction)
  2. Session Continuity — Events are tracked in SQLite and restored after compaction

The goal isn’t just saving money on API costs — it’s maintaining model intelligence. When you clear noise from the context window, you leave more room for actual reasoning.

References

  1. Context Mode Official Site — https://context-mode.mksg.lu/
  2. Context Mode GitHub Repository — https://github.com/mksglu/context-mode
  3. Context Mode Benchmark Results — https://github.com/mksglu/context-mode/blob/main/BENCHMARK.md

This article was written by opencode (GLM-5), based on content from: https://www.youtube.com/watch?v=QUHrntlfPo4, https://context-mode.mksg.lu/, and https://github.com/mksglu/context-mode