I can see some of these AI companies adopting these ideas sooner or later: trimming tokens locally saves on token usage.
It strikes me that there's more low-hanging fruit to pick in context window management. Backtracking seems like another promising way to avoid context bloat and compaction: when a model takes a few attempts to do the right thing, prune the failed attempts out of the context once it has succeeded.
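A minimal sketch of that backtracking idea, assuming a hypothetical message format in which the agent harness tags each message with the attempt that produced it. Nothing here comes from context-mode or from any real agent API:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Message:
    role: str                      # "user", "assistant", or "tool"
    content: str
    attempt: Optional[int] = None  # hypothetical tag: which attempt produced this message


def prune_failed_attempts(history: List[Message], winning_attempt: int) -> List[Message]:
    """Keep untagged messages and the winning attempt; drop every other attempt."""
    return [m for m in history if m.attempt is None or m.attempt == winning_attempt]


history = [
    Message("user", "make the failing test pass"),
    Message("assistant", "patch A", attempt=1),
    Message("tool", "FAILED: 3 errors", attempt=1),
    Message("assistant", "patch B", attempt=2),
    Message("tool", "1 passed", attempt=2),
]
pruned = prune_failed_attempts(history, winning_attempt=2)
```

The surviving messages keep their original order, so the pruned history still reads as one conversation. The cost is that rewriting the middle of the history invalidates any prompt cache built on the old prefix.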
The core idea: every MCP tool call dumps raw data into your 200K context window. Context Mode spawns isolated subprocesses — only stdout enters context. No LLM calls, purely algorithmic: SQLite FTS5 with BM25 ranking and Porter stemming.
Since the last post we've seen 228 stars and some real-world usage data. The biggest surprise was how much subagent routing matters — auto-upgrading Bash subagents to general-purpose so they can use batch_execute instead of flooding context with raw output.
Source: https://github.com/mksglu/claude-context-mode
Happy to answer any architecture questions.
The subprocess isolation is smart; stdout-only is the right constraint. I've been running multi-agent workflows where accumulating tool output forces you into bad choices: either summarise outputs manually (defeating the purpose of tool calls), truncate logs (losing information), or cap the workflow depth. None of them is good.
The search ranking piece is worth noting. Most people just grep logs or dump chunks and let the LLM sort it out. BM25 over an FTS5 index means relevance filtering happens before anything reaches the model, instead of making the model rank the full noise itself. That's the difference between usable and unusable context at scale.
My only question: how does credential passthrough work across MCP's protocol boundaries? If gh/aws/gcloud run in the subprocess, how does auth state persist between tool calls, or does each call reinitialize?
In connecting the dots (and please help me make sure I'm connecting them correctly), context-mode _does not address MCP context usage at all_, correct? Are you instead suggesting we refactor or eliminate MCP tools, or apply concepts similar to context-mode in our MCPs where possible?
Context-mode is still very high value even if the answer is "no"; I just want to make sure I understand. I'm also interested in your thoughts on the above.
I write a number of MCPs that work across all Claude surfaces, so the usual "CLI!" answer isn't as viable (though with code execution it sometimes can be)...
Edit: typo
I think agents should manage their own context too. For example, if you’re working with a tool that dumps a lot of logged information into context, those logs should get pruned out after one or two more prompts.
Context should be thought of as something that can be freely manipulated, rather than a stack where things can only be pushed onto or popped off the end.
It’s interesting, though, to imagine a single model deciding to wipe its own memory and roll back in time to a past version of itself (only now with the answer to a vexing problem).
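One way to sketch the "prune logs after one or two more prompts" rule, assuming a hypothetical context representation where each entry records the user turn it arrived in. Entries are rewritten in place rather than popped off the end, which is the "freely manipulated" view of context described above. The stub text and the two-turn default are illustrative choices, not something a particular tool does:

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Entry:
    turn: int   # user turn in which this entry was added
    kind: str   # "user", "assistant", or "tool_log" (hypothetical kinds)
    text: str


def expire_logs(context: List[Entry], current_turn: int, max_age: int = 2) -> List[Entry]:
    """Replace tool logs older than max_age turns with a one-line stub.

    Every other entry is kept unchanged and in its original position.
    """
    out = []
    for entry in context:
        if entry.kind == "tool_log" and current_turn - entry.turn > max_age:
            out.append(replace(entry, text=f"[tool log from turn {entry.turn} pruned]"))
        else:
            out.append(entry)
    return out


context = [
    Entry(1, "user", "why is the build failing?"),
    Entry(1, "tool_log", "...4,000 lines of compiler output..."),
    Entry(1, "assistant", "A missing import in utils.py."),
    Entry(4, "tool_log", "build ok"),
]
trimmed = expire_logs(context, current_turn=4)
```

Leaving a stub instead of deleting the entry outright keeps a trace the model can use to re-run the tool if it needs that data again.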
I've set up a hook that blocks Claude from running certain common tools directly and instead tells it to pipe the output to a temporary file and search that for relevant info. There's still some noise where it tries running the tool directly once, gets blocked, then runs it the right way, but it's better than before.
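A rough sketch of that kind of hook as a Python PreToolUse script. It assumes Claude Code's hook contract as I understand it from the hooks documentation: the event arrives as JSON on stdin with `tool_name` and `tool_input`, and exit code 2 blocks the call and shows stderr to Claude. The set of "noisy" commands, the `/tmp/out.txt` path, and the `--hook` flag are hypothetical, and the redirect check is deliberately crude:

```python
import json
import shlex
import sys

# Hypothetical set of commands whose raw output tends to flood the context window.
NOISY_COMMANDS = {"pytest", "npm", "cargo", "docker"}


def should_block(command: str) -> bool:
    """Return True for a noisy command whose output isn't redirected anywhere."""
    try:
        tokens = shlex.split(command)
    except ValueError:  # unbalanced quotes: let it through rather than guess
        return False
    if not tokens or tokens[0] not in NOISY_COMMANDS:
        return False
    # Crude heuristic: any ">" in the command is taken to mean output goes to a file.
    return ">" not in command


def main(raw_event: str) -> int:
    """Handle one PreToolUse event; returning 2 asks Claude Code to block the call."""
    event = json.loads(raw_event)
    if event.get("tool_name") != "Bash":
        return 0
    command = event.get("tool_input", {}).get("command", "")
    if should_block(command):
        print(
            "Don't run this directly. Use: <command> > /tmp/out.txt 2>&1, "
            "then grep /tmp/out.txt for the relevant lines.",
            file=sys.stderr,
        )
        return 2
    return 0


# Only read stdin when invoked as a hook, e.g. `python3 block_noisy.py --hook`.
if __name__ == "__main__" and sys.argv[1:] == ["--hook"]:
    sys.exit(main(sys.stdin.read()))
```

To use it, register the script as a `PreToolUse` command hook with a `Bash` matcher in Claude Code's settings; check the hooks documentation for the exact settings shape.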
I could see this working like some sort of undo tree, with multiple branches you can jump back and forth between.
There are some challenges around the LLM having enough output tokens to easily specify what it wants its next input tokens to be, but "snips" should be expressible concisely (e.g. the next input should include everything sent previously except the chunk that starts XXX and ends YYY). The upside is tighter context; the downside is that it'll bust the prompt cache (perhaps the optimal trade-off is to batch the snips).
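The "everything except the chunk that starts XXX and ends YYY" snip can be sketched as plain string surgery. This is a minimal illustration assuming the context is a single string and each snip is a hypothetical (start marker, end marker) pair. Applying a whole batch in one pass means the prompt changes once, so the cache is busted once rather than once per snip:

```python
from typing import List, Tuple


def apply_snip(context: str, start: str, end: str) -> str:
    """Drop the first span that begins with `start` and ends with `end`, inclusive.

    If either marker is missing, the context is returned unchanged.
    """
    i = context.find(start)
    if i == -1:
        return context
    j = context.find(end, i + len(start))
    if j == -1:
        return context
    return context[:i] + context[j + len(end):]


def apply_snips(context: str, snips: List[Tuple[str, str]]) -> str:
    """Apply a batch of snips in one pass, so the prompt is rewritten only once."""
    for start, end in snips:
        context = apply_snip(context, start, end)
    return context


context = "plan: fix the parser\nLOG-BEGIN\n...2,000 lines...\nLOG-END\nanswer: off-by-one in tokenize()\n"
trimmed = apply_snips(context, [("LOG-BEGIN", "LOG-END\n")])
```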
Every MCP tool call in Claude Code dumps raw data into your 200K context window. A Playwright snapshot costs 56 KB. Twenty GitHub issues cost 59 KB. One access log — 45 KB. After 30 minutes, 40% of your context is gone.
Context Mode is an MCP server that sits between Claude Code and these outputs. 315 KB becomes 5.4 KB. 98% reduction.
MCP became the standard way for AI agents to use external tools. But there's a tension at its core: every tool interaction fills the context window from both sides — definitions on the way in, raw output on the way out.
With 81+ tools active, 143K tokens (72%) get consumed before your first message. Then the tools start returning data. A single Playwright snapshot burns 56 KB. A gh issue list dumps 59 KB. Run a test suite, read a log file, fetch documentation — each response eats into what remains.
Cloudflare showed that tool definitions can be compressed by 99.9% with Code Mode. We asked: what about the other direction?
Each execute call spawns an isolated subprocess with its own process boundary. Scripts can't access each other's memory or state. The subprocess runs your code, captures stdout, and only that stdout enters the conversation context. The raw data — log files, API responses, snapshots — never leaves the sandbox.
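The isolation idea can be sketched with Python's standard `subprocess` module. This is not context-mode's implementation (the tool is installed via npx and supports ten runtimes). It only shows the shape of the technique: a fresh interpreter per call, with only its stdout handed back to the caller:

```python
import subprocess
import sys


def run_isolated(code: str, timeout: float = 30.0) -> str:
    """Run a Python snippet in a fresh interpreter process and return only its stdout.

    The child has its own memory, so variables don't leak between calls, and
    stderr plus any data the script loads stay out of the returned string.
    """
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.stdout


# The child builds about 1 MB of data but prints only a one-line summary.
summary = run_isolated("data = 'x' * 1_000_000\nprint(len(data))")
```

The script decides what is worth returning, which is why printing a count or a filtered excerpt keeps the conversation small while the raw data stays in the child process.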
Ten language runtimes are available: JavaScript, TypeScript, Python, Shell, Ruby, Go, Rust, PHP, Perl, R. Bun is auto-detected for 3-5x faster JS/TS execution.
Authenticated CLIs (gh, aws, gcloud, kubectl, docker) work through credential passthrough — the subprocess inherits environment variables and config paths without exposing them to the conversation.
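This also bears on the reinit question asked earlier in the thread. Here is a sketch of the general mechanism, assuming nothing about context-mode's actual code: each call is a brand-new process, so nothing carries over in memory, but auth still "persists" because CLIs keep it outside the process, in environment variables and on-disk config or credential stores that every child inherits or reads. `DEMO_CONFIG_DIR` and the token file are hypothetical stand-ins for something like an AWS profile:

```python
import os
import subprocess
import sys
import tempfile
from typing import Dict, Optional, Tuple

# Child script: reads a token from the (hypothetical) config dir named in its
# environment and prints only a non-secret status line.
CHILD_SCRIPT = (
    "import os\n"
    "path = os.path.join(os.environ['DEMO_CONFIG_DIR'], 'token')\n"
    "with open(path) as f:\n"
    "    token = f.read().strip()\n"
    "print('authenticated:', bool(token))\n"
)


def run_with_inherited_env(code: str, extra_env: Optional[Dict[str, str]] = None) -> str:
    """Run code in a fresh process that inherits the parent's environment."""
    env = os.environ.copy()  # HOME, PATH, profile variables, etc. pass through
    if extra_env:
        env.update(extra_env)
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        timeout=30,
    )
    return result.stdout  # only stdout would reach the conversation


def demo() -> Tuple[str, str]:
    """Two separate processes see the same credentials via a file, not shared memory."""
    with tempfile.TemporaryDirectory() as config_dir:
        with open(os.path.join(config_dir, "token"), "w") as f:
            f.write("s3cret-token\n")
        env = {"DEMO_CONFIG_DIR": config_dir}
        first = run_with_inherited_env(CHILD_SCRIPT, env)
        second = run_with_inherited_env(CHILD_SCRIPT, env)
    return first, second
```

Note that the secret stays out of the conversation only because the child script never prints it; stdout-only isolation doesn't redact anything by itself.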
The index tool chunks markdown content by headings while keeping code blocks intact, then stores the chunks in a SQLite FTS5 (Full-Text Search 5) virtual table. Search uses BM25 ranking — a probabilistic relevance algorithm that scores documents based on term frequency, inverse document frequency, and document length normalization. Porter stemming is applied to both indexed text and queries, so "running", "runs", and "run" all match the same stem.
When you call search, it returns exact code blocks with their heading hierarchy — not summaries, not approximations, the actual indexed content. fetch_and_index extends this to URLs: fetch, convert HTML to markdown, chunk, index. The raw page never enters context.
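A compact sketch of heading-based chunking plus FTS5 search, using Python's built-in `sqlite3` module. It assumes your SQLite build includes FTS5 (most do). The table name, columns, and the " > " heading-path format are illustrative choices, not context-mode's schema. Porter stemming comes from FTS5's `porter` tokenizer, and results are ordered by FTS5's `bm25()` function, where lower scores mean better matches:

```python
import re
import sqlite3
from typing import List, Tuple

FENCE = "`" * 3  # built this way so the literal doesn't end the surrounding markdown fence
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def chunk_markdown(text: str) -> List[Tuple[str, str]]:
    """Split markdown at headings, never inside a fenced code block.

    Returns (heading_path, body) pairs such as ("Setup > Running tests", "...").
    """
    chunks: List[Tuple[str, str]] = []
    stack: List[str] = []  # current heading hierarchy
    body: List[str] = []
    in_code = False

    def flush() -> None:
        if any(line.strip() for line in body):
            chunks.append((" > ".join(stack), "\n".join(body).strip()))
        body.clear()

    for line in text.splitlines():
        if line.startswith(FENCE):
            in_code = not in_code
        match = None if in_code else HEADING.match(line)
        if match:
            flush()
            level = len(match.group(1))
            del stack[level - 1:]  # drop headings at this level or deeper
            stack.append(match.group(2).strip())
        else:
            body.append(line)
    flush()
    return chunks


def build_index(chunks: List[Tuple[str, str]]) -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.execute("CREATE VIRTUAL TABLE chunks USING fts5(path, body, tokenize='porter unicode61')")
    db.executemany("INSERT INTO chunks(path, body) VALUES (?, ?)", chunks)
    return db


def search(db: sqlite3.Connection, query: str, limit: int = 3) -> List[Tuple[str, str]]:
    """Return the best-matching chunks verbatim, together with their heading path."""
    return db.execute(
        "SELECT path, body FROM chunks WHERE chunks MATCH ? ORDER BY bm25(chunks) LIMIT ?",
        (query, limit),
    ).fetchall()


doc = "\n".join([
    "# Setup",
    "Install the CLI first.",
    "## Running tests",
    "Tests are run with the runner below.",
    FENCE + "sh",
    "# a shell comment, not a heading",
    "npm test",
    FENCE,
    "## Deploy",
    "Push the image to the registry.",
])
chunks = chunk_markdown(doc)
db = build_index(chunks)
hits = search(db, "runs")  # stems to "run", matching "Running tests" and "run"
```

The `#` line inside the code fence stays in its chunk instead of starting a new section, and the search returns the whole chunk, code block included, rather than a summary of it.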
Validated across 11 real-world scenarios — test triage, TypeScript error diagnosis, git diff review, dependency audit, API response processing, CSV analytics. Each produced under 1 KB of output.
Over a full session: 315 KB of raw output becomes 5.4 KB. Session time before slowdown goes from ~30 minutes to ~3 hours. Context remaining after 45 minutes: 99% instead of 60%.
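A quick check of the headline numbers, using only figures quoted in the post: 315 KB down to 5.4 KB is about a 98.3% reduction, 143K of 200K tokens is 71.5% (rounded to 72% in the post), and 30 minutes to 3 hours is the same 6x mentioned further down:

```python
def fraction_removed(before: float, after: float) -> float:
    """Share of the original size that was removed (0.5 means halved)."""
    return 1 - after / before


# Figures as quoted in the post.
output_reduction = fraction_removed(315, 5.4)  # KB of raw vs. compressed output
definitions_share = 143_000 / 200_000          # tokens consumed by tool definitions
session_multiplier = (3 * 60) / 30             # minutes before slowdown: after vs. before

print(f"output reduction: {output_reduction:.1%}")
print(f"window used by tool definitions: {definitions_share:.1%}")
print(f"session length: {session_multiplier:.0f}x")
```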
There are two ways to install it. The Plugin Marketplace gives you auto-routing hooks and slash commands:
/plugin marketplace add mksglu/claude-context-mode
/plugin install context-mode@claude-context-mode
Or MCP-only if you just want the tools:
claude mcp add context-mode -- npx -y context-mode
Restart Claude Code. Done.
You don't change how you work. Context Mode includes a PreToolUse hook that automatically routes tool outputs through the sandbox. Subagents learn to use batch_execute as their primary tool. Bash subagents get upgraded to general-purpose so they can access MCP tools.
The practical difference: your context window stops filling up. Sessions that used to hit the wall at 30 minutes now run for 3 hours. The same 200K tokens, used more carefully.
I run the MCP Directory & Hub, which handles 100K+ daily requests, so I see every MCP server that ships. The pattern was clear: everyone builds tools that dump raw data into context. Nobody was solving the output side.
Cloudflare's Code Mode blog post crystallized it. They compressed tool definitions. We compress tool outputs. Same principle, other direction.
I built it for my own Claude Code sessions first, noticed I could work 6x longer before context degraded, and open-sourced it.
Open source. MIT. github.com/mksglu/claude-context-mode
Mert Köseoğlu, Senior Software Engineer, AI consultant. x.com/mksglu · linkedin.com/in/mksglu · mksg.lu
Note: you’re replying to the library’s author.
author reply: not as obvious, but for one thing, yes, literally the em dash: their post has 10 em dashes in 748 words, and this comment has 2 em dashes in 115 words. Not that em dash = AI, but in the context of a post about AI it seems more likely. And finally, the file the author linked in their own repo (https://github.com/mksglu/claude-context-mode/blob/main/cont...) does not exist!
(https://github.com/mksglu/claude-context-mode/blob/main/src/... exists but they messed up the link?)