MCP Server Is Eating Your Context Window. There's a Simpler Way

I'm getting tired of everyone saying "MCP is dead, use CLIs!".

Yes, MCP eats up context windows, but agents can also be smarter about how they load the MCP context in the first place, using similar strategy to skills.

The problem with tossing it out entirely is that it leaves a lot more questions for handling security.

When using skills, there's no implicit way to be able to apply policies in the sane way across many different servers.

MCP gives us a registry such that we can enforce MCP chain policies, i.e. no doing web search after viewing financials.

Doing the same with skills is not possible in a programatic and deterministic way.

There needs to be a middle ground instead of throwing out MCP entirely.

The industry is talking in circles here. All you need is "composability".

UNIX solved this with files and pipes for data, and processes for compute.

AI agents are solving this this with sub-agents for data, and "code execution" for compute.

The UNIX approach is both technically correct and elegant, and what I strongly favor too.

The agent + MCP approach is getting there. But not every harness has sub-agents, or their invocation is non-deterministic, which is where "MCP context bloat" happens.

Source: building an small business agent at https://housecat.com/.

We do have APIs wrapped in MCP. But we only give the agent BASH, an CLI wrapper for the MCPs, and the ability to write code, and works great.

"It's a UNIX system! I know this!"

CLIs are great for some applications! But 'progressive disclosure' means more mistakes to be corrected and more round trips to the model - every time[1] you use the tool in a new thread. You're trading latency for lower cost/more free context. That might be great! But it might not be, and the opposite trade (more money/less context for lower latency) makes a lot of sense for some applications. esp. if the 'more money' part can be amortized over lots of users by keeping the tool definitions block cached.

[1]: one might say 'of course you can just add details about the CLI to the prompt' ... which reinvents MCP in an ad hoc underspecified non-portable mode in your prompt.

10 years from now: "Can you believe they did anything with such a small context window?"

One of the MCP Core Maintainers here, so take this with a boulder of salt if you're skeptical of my biases.

The debate around "MCP vs. CLI" is somewhat pointless to me personally. Use whatever gets the job done. MCP is much more than just tool calling - it also happens to provide a set of consistent rails for an agent to follow. Besides, we as developers often forget that the things we build are also consumed by non-technical folks - I have no desire to teach my parents to install random CLIs to get things done instead of plugging a URI to a hosted MCP server with a well-defined impact radius. The entire security posture of "Install this CLI with access to everything on your box" terrifies me.

The context window argument is also an agent harness challenge more than anything else - modern MCP clients do smart tool search that obviates the entire "I am sending the full list of tools back and forth" mode of operation. At this point it's just a trope that is repeated from blog post to blog post. This blog post too alludes to this and talks about the need for infrastructure to make it work, but it just isn't the case. It's a pattern that's being adopted broadly as we speak.

The trend is obviously towards larger and larger context windows. We moved from 200K to 1M tokens being standard just this year.

This might be a complete non issue in 6 months.

There's already an open source tool that does exactly the same thing: https://github.com/knowsuchagency/mcp2cli

The thing with CLIs is that you also need to return results efficiently. It if both MCP and CLI return results efficiently, CLI wins

We built a unified API with a large surface area and ran into a problem when building our MCP server: tool definitions alone burned 50,000+ tokens before the agent touched a single user message.

The fix that worked for us was giving agents a CLI instead. ~80 tokens in the system prompt, progressive discovery through --help, and permission enforcement baked into the binary rather than prompts.

The post covers the benchmarks (Scalekit's 75-run comparison showed 4-32x token overhead for MCP vs CLI), the architecture, and an honest section on where CLIs fall short (streaming, delegated auth, distribution).

Let me guess - another article about how CLI s are superior to MCP?

At this point, I feel like MCP servers are just not feasible at the current level of context windows and LLMs. Good idea, but we're way too early.