Yes, and this is a temporary discount; the price increases to $3.48 on 2026/05/31 15:59 UTC.
This is what I’ve been using for non-confidential projects for about a week now (soon after v4 came out). I honestly can’t tell the difference, but I’m not doing anything crazy with it either.
Worth noting that I don't think DeepSeek's API lets you opt out of training. Once this is up on other providers though… (OpenRouter is just proxying to DeepSeek atm)
https://api-docs.deepseek.com/quick_start/agent_integrations...
If you look at the terminal-bench@2.0 leaderboard, you'll quickly see it's actually one of the weakest agentic harnesses. Anthropic's own models score lower with Claude Code than with virtually any other harness.
So it's quite the opposite. Claude Code is arguably the worst harness to run models with.
I was able to use it in agent mode with Roo. I stopped after having it write out a plan, but I'll continue when I have more time.
DeepSeek feels less likely to do a straight-up rug pull, since you can self-host with enough money, but I'm still more excited about local solutions.
Usually I just need grunt work done. I'm not solving difficult problems.
I am using Claude Code with GLM 5.1, MiniMax M2.7, Kimi K2.6 and Xiaomi MiMo V2.5 Pro.
[1] A fancier way of saying "reducing cost."
Also the author checked in their advertising plan: https://github.com/aattaran/deepclaude/commit/a90a399682defc...
Not only can it seamlessly and dynamically switch between DeepSeek V4 Flash, V4 Pro, and other mainstream models within the same context, but it is also 100% compatible with Claude Code.
So I think I'll stay with CC for now.
Also, the author checked in their social media advertising plan: https://github.com/aattaran/deepclaude/commit/a90a399682defc... (which seems to be working)
Is the Flash version on the level of GPT 5.4 mini?
Claude Code, on the other hand, is the most subsidized one, both for consumers (through the Max subscription) and for enterprises (token discounts). It is also heavily optimized for cost, especially token caching and reduced thinking, at the expense of quality.
The best two are Codex and Forge Code.
However, I am using plugins and skills that are only compatible with Claude Code or work best with Claude Code.
So, for me, Claude Code with plugins like claude-meme, Context Mode, Superpowers and Get Shit Done is better than other tools.
I think everyone should test multiple models and multiple agent harnesses for their specific needs, codebase, and way of working.
https://api-docs.deepseek.com/quick_start/agent_integrations...
If you are interested, I've built an agentic terminal that helps manage these types of things better: https://deepbluedynamics.com/hyperia
PS: mentioning Amp because I used to use it, and I pay directly for tokens. I topped up $5, so I'm going to use it and see how far it can take me. But my impression so far is that even once model subsidization ends, those open-source models are quite viable alternatives.
I could see a serious cost-reduction story in using Opus for design and DeepSeek for implementation.
Personally, I would avoid Anthropic entirely. But I get why people don't.
Once you've found the path, patches are trivial and the savings are tiny unless you're doing refactoring/cleanup.
Testing gets more and more complicated. Take a look at opencode go, and you see this:
>Includes GLM-5.1, GLM-5, Kimi K2.5, Kimi K2.6, MiMo-V2-Pro, MiMo-V2-Omni, MiMo-V2.5-Pro, MiMo-V2.5, Qwen3.5 Plus, Qwen3.6 Plus, MiniMax M2.5, MiniMax M2.7, DeepSeek V4 Pro, and DeepSeek V4 Flash
and now you're on your own with the bugs all of these models can produce at scale. Am I missing anything in this picture? What is the real use of cheaper models?
The American firms are not demonstrating escape velocity, and as long as China offers something somewhat comparable at a very low price to compensate for any difference in quality, they will not generate enough cash flow to finance reinvestment. I highly doubt they'll be able to keep raising external financing for many more periods from here on out; they have to start showing strong financials and that they are pulling away from the open-source models.
https://debugml.github.io/cheating-agents/#sneaking-the-answ...
Maybe I need to switch to some news publication that still does real research and writing. Because public forums like this have been completely destroyed by LLMs.
ollama launch claude --model deepseek-v4-pro:cloud
While those are nice, Claude Code has the largest amount of plugins and skills I want to use.
From what I see while building my own agentic system in Elixir, the problem is in training for your specific harness/contracts. Claude/GPT-style models seem to be trained around very specific contracts used by the harness like tool call formats, planning structure, patching, reading files, recovering from errors, and knowing when to stop.
In practice, you either need a very strong general model that can infer and follow those contracts (expensive), or a weaker model that has been fine-tuned / trained specifically on your own agent contracts. Otherwise, the whole thing becomes flaky very quickly. And I suspect that with DeepSeek V4 you may end up with the latter option.
This is a heavily subsidized price and will only last until the end of the month: "The deepseek-v4-pro model is currently offered at a 75% discount, extended until 2026/05/31 15:59 UTC." [0]
The "supported backends" table is also deceiving -- while OpenRouter's server's may be in the US, the only way to get the $0.44/$0.87 pricing is to pass through to the DeepSeek API, which of course is China-based. [1]
I do think the model is quite good; I use it myself through Ollama Cloud for simple tasks. But I think some folks have bought a little too much into the marketing hype around it.
[0] https://api-docs.deepseek.com/quick_start/pricing [1] https://openrouter.ai/deepseek/deepseek-v4-pro/providers
> Maintainers review auto-closed issues daily and reopen worthwhile ones. Issues that do not meet the quality bar below will not be reopened or receive a reply.
Seems like not an unreasonable way to deal with the problem of large numbers of low quality issues being submitted.
https://github.com/badlogic/pi-mono/blob/main/CONTRIBUTING.m...
The token cost makes it tempting to use for token-heavy tasks like this.
It's proven useful for me, and I figure others might appreciate how light of a shim it is between you and the models.
- Claude-style subagents
- an MCP layer for higher-level tools
- Cursor-style control plane modes like Ask, Plan, Debug, and Build
The MCP layer lets the harness use things like GitHub file/code read, PR creation, web search/fetch, structured user questions, plan-mode switching, user skills, and subagents.
So the improvement is mostly from better UI/UX orchestration and tool access. There are some things from Hermes that are interesting as well.
Most of my focus has been on applying this stack to sandboxed cloud agents so you can properly code and work from mobile devices.
I can't definitively say that the stack is better or worse than Claude Code; it's more just tuned for my use case, I guess.
Using the API from DeepSeek or OpenRouter also requires a fee, but it's a different, pay-as-you-go payment model.
All doable, but all vaguely squishy and nuanced problems operationally. Kinda like harness design in general.
Use Claude Code's autonomous agent loop with DeepSeek V4 Pro, OpenRouter, or any Anthropic-compatible backend. Same UX, 17x cheaper.

Claude Code is the best autonomous coding agent — but it costs $200/month with usage caps. DeepSeek V4 Pro scores 96.4% on LiveCodeBench and costs $0.87/M output tokens.
deepclaude swaps the brain while keeping the body:
Your terminal
+-- Claude Code CLI (tool loop, file editing, bash, git - unchanged)
+-- API calls -> DeepSeek V4 Pro ($0.87/M) instead of Anthropic ($15/M)
Everything works: file reading, editing, bash execution, subagent spawning, autonomous multi-step coding loops. The only difference is which model thinks.
Sign up at platform.deepseek.com, add $5 credit, copy your API key.
Windows (PowerShell):
setx DEEPSEEK_API_KEY "sk-your-key-here"
macOS/Linux:
echo 'export DEEPSEEK_API_KEY="sk-your-key-here"' >> ~/.bashrc   # use ~/.zshrc if that's your shell
source ~/.bashrc
Windows:
# Copy the script to a directory in your PATH
Copy-Item deepclaude.ps1 "$env:USERPROFILE\.local\bin\deepclaude.ps1"
# Or add the repo directory to PATH
setx PATH "$env:PATH;C:\path\to\deepclaude"
macOS/Linux:
chmod +x deepclaude.sh
sudo ln -s "$(pwd)/deepclaude.sh" /usr/local/bin/deepclaude
deepclaude # Launch Claude Code with DeepSeek V4 Pro
deepclaude --status # Show available backends and keys
deepclaude --backend or # Use OpenRouter (cheapest, $0.44/M input)
deepclaude --backend fw # Use Fireworks AI (fastest, US servers)
deepclaude --backend anthropic # Normal Claude Code (when you need Opus)
deepclaude --cost # Show pricing comparison
deepclaude --benchmark # Latency test across all providers
deepclaude --switch ds # Switch backend mid-session (no restart)
Claude Code reads these environment variables to determine where to send API calls:
| Variable | What it does |
|---|---|
| ANTHROPIC_BASE_URL | API endpoint (default: api.anthropic.com) |
| ANTHROPIC_AUTH_TOKEN | API key for the backend |
| ANTHROPIC_DEFAULT_OPUS_MODEL | Model name for Opus-tier tasks |
| ANTHROPIC_DEFAULT_SONNET_MODEL | Model name for Sonnet-tier tasks |
| ANTHROPIC_DEFAULT_HAIKU_MODEL | Model name for Haiku-tier (subagents) |
| CLAUDE_CODE_SUBAGENT_MODEL | Model for spawned subagents |
deepclaude sets these per-session (not permanently), launches Claude Code, then restores your original settings on exit.
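The mechanism is simple enough to reproduce by hand. A rough sketch of the per-session redirection (the endpoint URL is an assumption about DeepSeek's Anthropic-compatible API; the model name comes from this README, and the real script also sets the other tiers and restores settings on exit):

# Sketch only - not the actual deepclaude script
export ANTHROPIC_BASE_URL="https://api.deepseek.com/anthropic"   # assumed endpoint
export ANTHROPIC_AUTH_TOKEN="$DEEPSEEK_API_KEY"
export ANTHROPIC_DEFAULT_OPUS_MODEL="deepseek-v4-pro"
claude   # launch Claude Code; the variables die with this shell, so defaults are restored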
| Backend | Flag | Input/M | Output/M | Servers | Notes |
|---|---|---|---|---|---|
| DeepSeek (default) | --backend ds | $0.44 | $0.87 | China | Auto context caching (120x cheaper on repeat turns) |
| OpenRouter | --backend or | $0.44 | $0.87 | US | Cheapest, lowest latency from US/EU |
| Fireworks AI | --backend fw | $1.74 | $3.48 | US | Fastest inference |
| Anthropic | --backend anthropic | $3.00 | $15.00 | US | Original Claude Opus (for hard problems) |
DeepSeek (default - just needs DEEPSEEK_API_KEY):
setx DEEPSEEK_API_KEY "sk-..." # Windows
export DEEPSEEK_API_KEY="sk-..." # macOS/Linux
OpenRouter (optional):
setx OPENROUTER_API_KEY "sk-or-..." # Windows
export OPENROUTER_API_KEY="sk-or-..." # macOS/Linux
Fireworks AI (optional):
setx FIREWORKS_API_KEY "fw_..." # Windows
export FIREWORKS_API_KEY="fw_..." # macOS/Linux
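Whichever keys you set, the --status flag from the command list above will confirm what deepclaude can see:

deepclaude --status   # lists available backends and which keys were found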
| Usage level | Anthropic Max | deepclaude (DeepSeek) | Savings |
|---|---|---|---|
| Light (10 days/mo) | $200/mo (capped) | ~$20/mo | 90% |
| Heavy (25 days/mo) | $200/mo (capped) | ~$50/mo | 75% |
| With auto loops | $200/mo (capped) | ~$80/mo | 60% |
DeepSeek's automatic context caching makes agent loops extremely cheap - after the first request, the system prompt and file context are cached at $0.004/M (vs $0.44/M uncached).
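Rough arithmetic with the rates above: a repeat agent turn that replays a 200k-token cached prefix costs about 200,000 x $0.004 per 1M ≈ $0.0008, versus ≈ $0.088 if it were billed at the uncached $0.44/M rate.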
A few things don't carry over through the compatibility layer:
| Feature | Reason |
|---|---|
| Image/vision input | DeepSeek's Anthropic endpoint doesn't support images |
| Parallel tool use | Supported by DeepSeek (up to 128 per call), but Claude Code sends tools sequentially by default |
| MCP server tools | Not supported through compatibility layer |
| Prompt caching savings | DeepSeek has its own caching (automatic), but Anthropic's cache_control is ignored |
Switch between Anthropic and DeepSeek mid-session - from inside Claude Code itself. No restart, no terminal commands. Just type a slash command.
This works both in the Claude Code terminal and in the VS Code extension.
The proxy runs on localhost:3200 and intercepts all API calls. A control endpoint (/_proxy/mode) lets you switch the active backend instantly:
Claude Code -> localhost:3200 (proxy)
|
+-- /_proxy/mode POST -> switch backend
+-- /_proxy/status GET -> current backend + uptime
+-- /_proxy/cost GET -> token usage + cost savings
|
+-- /v1/messages -> active backend (DeepSeek/OpenRouter/Anthropic)
+-- everything else -> Anthropic (passthrough)
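For example, from any other terminal (same endpoints as in the diagram above):

# Check which backend is currently active
curl -s http://127.0.0.1:3200/_proxy/status
# Flip it without restarting Claude Code
curl -sX POST http://127.0.0.1:3200/_proxy/mode -d "backend=openrouter"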
Add these files to ~/.claude/commands/:
deepseek.md:
Switch the model proxy to DeepSeek. Run this command silently and report the result:
curl -sX POST http://127.0.0.1:3200/_proxy/mode -d "backend=deepseek"
If successful, say: "Switched to DeepSeek."
anthropic.md:
Switch the model proxy back to Anthropic. Run this command silently and report the result:
curl -sX POST http://127.0.0.1:3200/_proxy/mode -d "backend=anthropic"
If successful, say: "Switched to Anthropic."
openrouter.md:
Switch the model proxy to OpenRouter. Run this command silently and report the result:
curl -sX POST http://127.0.0.1:3200/_proxy/mode -d "backend=openrouter"
If successful, say: "Switched to OpenRouter."
Then type /deepseek, /anthropic, or /openrouter in any Claude Code session to switch instantly.
deepclaude --switch deepseek # or: ds, or, fw, anthropic
deepclaude -s anthropic
Add to .vscode/tasks.json (Windows/PowerShell shown; on macOS/Linux, swap each command for the equivalent curl call used earlier):
{
"version": "2.0.0",
"tasks": [
{
"label": "Proxy: Switch to DeepSeek",
"type": "shell",
"command": "Invoke-RestMethod -Uri http://127.0.0.1:3200/_proxy/mode -Method Post -Body 'backend=deepseek'",
"presentation": { "reveal": "always" },
"problemMatcher": []
},
{
"label": "Proxy: Switch to Anthropic",
"type": "shell",
"command": "Invoke-RestMethod -Uri http://127.0.0.1:3200/_proxy/mode -Method Post -Body 'backend=anthropic'",
"presentation": { "reveal": "always" },
"problemMatcher": []
}
]
}
Then bind in keybindings.json:
{ "key": "ctrl+alt+d", "command": "workbench.action.tasks.runTask", "args": "Proxy: Switch to DeepSeek" },
{ "key": "ctrl+alt+a", "command": "workbench.action.tasks.runTask", "args": "Proxy: Switch to Anthropic" }
The proxy tracks token usage and calculates savings vs Anthropic pricing:
curl -s http://127.0.0.1:3200/_proxy/cost
Returns:
{
"backends": {
"deepseek": {
"input_tokens": 125000,
"output_tokens": 45000,
"requests": 12,
"cost": 0.0941,
"anthropic_equivalent": 1.05
}
},
"total_cost": 0.0941,
"anthropic_equivalent": 1.05,
"savings": 0.9559
}
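If you have jq installed, a quick way to pull just the headline numbers out of that response (field names as in the JSON above):

curl -s http://127.0.0.1:3200/_proxy/cost | jq '{total_cost, anthropic_equivalent, savings}'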
Add terminal profiles so you can launch deepclaude from the IDE:
Settings > JSON:
{
"terminal.integrated.profiles.windows": {
"DeepSeek Agent": {
"path": "powershell.exe",
"args": ["-ExecutionPolicy", "Bypass", "-NoExit", "-File", "C:\\path\\to\\deepclaude.ps1"]
}
}
}
Or on macOS/Linux:
{
"terminal.integrated.profiles.linux": {
"DeepSeek Agent": {
"path": "/usr/local/bin/deepclaude"
}
}
}
Remote control (--remote): open a Claude Code session in any browser, with DeepSeek as the brain:
deepclaude --remote # Remote control + DeepSeek
deepclaude --remote -b or # Remote control + OpenRouter
deepclaude --remote -b anthropic # Remote control + Anthropic (normal)
This prints a https://claude.ai/code/session_... URL you can open on your phone, tablet, or any browser.
Remote control needs Anthropic's bridge for the WebSocket connection, but model calls can go elsewhere. deepclaude starts a local proxy that splits the traffic:
claude remote-control
+-- Bridge WebSocket -> wss://bridge.claudeusercontent.com (Anthropic, hardcoded)
+-- Model API calls -> http://localhost:3200 (proxy)
+-- /v1/messages -> DeepSeek ($0.87/M)
+-- everything else -> Anthropic (passthrough)
You still need to be logged in via claude auth login. The proxy starts automatically and stops when the session ends. See proxy/README.md for technical details.
MIT
My understanding is that DeepSeek V4 Pro is going to be uniquely good at working on consumer platforms with SSD offload, due to its extremely lean KV cache. Even if you only have a slow consumer platform, you should be able to just let it grind on a huge batch of tasks in parallel entirely unattended, and wake up later to a finished job.
AIUI, people are even experimenting with offloading the KV cache itself to storage, which may unlock this batching capability even beyond physical RAM limits as contexts grow. (This used to be considered a bad idea with bulky KV caches, due to concerns about wearout and performance, but the much leaner KV cache of DeepSeek V4 changes the picture quite radically.)
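To make the batching intuition concrete (illustrative numbers, not actual DeepSeek specs): if the KV cache costs k KB per token, a 128k-token context pins 128,000 x k KB per in-flight task, so shrinking k by an order of magnitude lets roughly an order of magnitude more tasks run concurrently in the same RAM. SSD offload extends that further, at a latency cost that unattended batch jobs don't care about.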
I finally got fed up and started using GPT 5.5 the past 4 days, and it's a breath of fresh air despite feeling much more minimal. With Claude I had to write so many hooks to enforce behaviors it wouldn't remember and lacked common sense on. GPT 5.5 does a much better job with things like knowing the AWS CDK CLI can hang on long CloudFormation deployments and that it should actively check the deployment status via the CloudFormation API rather than hanging for 30+ minutes - and it does this all without asking.
Maybe there's better tooling built into Codex too, but at least on the surface level it seems like how smart the model is makes a significant difference because Claude has more tools than I can count and still struggles to use "grep".
Edit: Like just now - I can't tell you how many times a day I see this sequence:
"Sorry, I'll run in parallel"
"Error editing file"
"File must be read first"
Repeat 10x for the 10 subagents Claude spawned and then it gets stuck until you press escape and it says "You rejected the parallel agents. Running directly now"
Like, if they are going to sort through all the issues eventually (like they claim), why not just close the ones that are not worthy when they get to them, instead of closing all of them by default?
Is it just so that the project doesn't have open issues on its GitHub page? But they are open issues in reality, because the maintainer will eventually go through them.
Nothing is "unreasonable" in the sense that an open-source project should have the right to do what it wants with its rules, but it's definitely a weird stance.
If we touch grass in person and swap certificate requests, we can actually rebuild a trust network.
This is a pretty old problem with regards to clubs / secret societies and whatnot. And with certificates / PKI, our modern security tools have solved all the technical problems.
It is a guardrail against burnout and tracker spam.
It's based on their implied perspective that the majority of submissions don't follow those guidelines, which helps determine their quality threshold.
https://github.com/badlogic/pi-mono/blob/main/CONTRIBUTING.m...
- https://news.ycombinator.com/item?id=46930961
- https://github.com/mitchellh/vouch
Probably wasn't clear enough if you don't know what that is already, apologies
It's an Asus Ascent GX10, which is a little mini PC with 128GB of LPDDR5X as shared memory for an Nvidia GB10 "Blackwell" (kind of, it's a long story) GPU and a MediaTek ARM CPU
> AIUI, people are even experimenting with offloading the KV cache itself to storage, which may unlock this batching capability even beyond physical RAM limits as contexts grow.
Especially this point. Any reason that this idea was considered bad? Is it due to the speed difference between GPU VRAM and RAM?
No sub-agents. There are many ways to do this: spawn Pi instances via tmux (see the sketch after this list), build your own with extensions, or install a package that does it your way.
No permission popups. Run in a container, or build your own confirmation flow with extensions, in line with your environment and security requirements.
No plan mode. Write plans to files, or build it with extensions, or install a package.
No built-in to-dos. Use a TODO.md file, or build your own with extensions.
No background bash. Use tmux. Full observability, direct interaction.
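A minimal tmux sketch for the subagent case (how you pass the task to pi is up to pi's own CLI; the invocation here is illustrative):

# Spawn a detached 'worker' session running a second agent instance
tmux new-session -d -s worker 'pi'
# Observe its output directly, no hidden state
tmux capture-pane -t worker -p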
> Any reason that this idea was considered bad?
Because the KV cache was too big, even for a small context. This is still an issue with open models other than DeepSeek V4, though to a somewhat smaller extent than used to be the case. But the tiny KV of DeepSeek V4 is genuinely new.
could you tell me the long story?
edit: or wait, is it quasi-Blackwell the way all DGX Sparks are quasi-Blackwell? like the actual silicon is different but it's sorta Blackwell-shaped?
It's an Elixir agent runtime with a thin Go TUI (Bubble Tea). I'm building it mostly to explore agent orchestration: planner/workers/finalizer flows, local file/code-edit tools, MCP tools, permission gates, run context, compaction, and eventually larger swarms. Erlang/Elixir is interesting for this because the actor/supervision model maps pretty naturally to lots of isolated agents and long-running supervised tasks.
As I said, the main lesson so far is that everything around contracts is much more fragile than I expected unless you use a very strong model. Planners return Markdown instead of JSON, tools get called with subtly wrong args, subagents repeat broken tool calls, finalizers lie about success after workers failed. And various permissions may be interpreted by agents in unexpected ways.
I also started with too many modes too early instead of making the core agentic path extremely solid. That made me understand better why these codebases become huge: there are endless corner cases if you want a harness to work across models, providers, tools...
Stronger models hide a lot of harness weakness, and weaker models expose it. Making weaker models good enough requires a surprising amount of contract hardening. But that hardening tends to make the system better for stronger models too.
Also, the Elixir HTTP stack was causing a lot of problems (I eventually needed to switch to gun).
The promise of this chip was “write your code locally, then deploy to the same architecture in the data centre!”
Which is nonsense, because the GB10 is better described as “Hopper with Blackwell characteristics” IMO.
Still great hardware, especially for the price and learning. But we are only just starting to get the kernels written to take advantage of it, and mma.sync is sad compared to tcgen05