> Instead of a bloated API, an MCP should be a simple, secure gateway that provides a few powerful, high-level tools [...] In this model, MCP’s job isn’t to abstract reality for the agent; its job is to manage the auth, networking, and security boundaries and then get out of the way.
Em dash and "it's not X, it's Y" in one sentence. Tired of reading posts written by AI. Feels disrespectful to your readers
I've found myself doing similar workarounds. I'm guessing anthropic will just make the /compact command do this instead soon enough.
Most of the time I'm just pasting code blocks directly into Raycast, and once I've fixed the bug or got the transformed code into the shape I was aiming for, I paste it back into neovim. Next I'm going to try out "opencode"[0], because I've heard some good things about it. For now, I'm happy with my current workflow.
read the document at https://blog.sshh.io/p/how-i-use-every-claude-code-feature and tell me how to improve my Claude code setup
Are the CLI-based agents better (much better?) than the Cursor app? Why?
I like how easy it is to get Cursor to focus a particular piece of code. I select the text and Cmd-L, saying "fix this part, it's broken like this ____."
I haven't really tried a CLI agent; sending snippets of code by CLI sounds really annoying. "Fix login.ts lines 148-160, it's broken like this ___"
Just to avoid confusion: MCP is like an API, but the underlying API can execute a Skill. So it's not MCP vs. Skills as a contest. It's really the broad concept of a "flexible" skill vs. a "parameter"-based API. And parameter-based APIs can also be flexible depending on how we write them, except they lack the SKILL.md that, in the case of Skills, guides the LLM to be more generic than a pure API.
By the way, if you are a Mac user, you can execute Skills locally via OpenSkills[1], which I created using Apple containers.
1. OpenSkills - https://github.com/BandarLabs/open-skills
I have started experimenting with a skills/ directory in my open source software, and then made a plugin marketplace that just pulls them in. It works well, but I don't know how scalable it will be.
If no anonymous access is provided, is there a way to create an account with a noscript/basic (X)HTML classic web browser in order to get an API key secret?
Because I do not use web engines from the "WHATWG" cartel.
To add insult to injury, my email is self-hosted with IP literals to avoid funding the DNS people, who are now mostly in strong partnership with the "WHATWG" cartel (email with IP literals is "stronger" than SPF since it does the same and more). An email is often required for account registration.
But in general I still don't really use MCP. Agents are just so good at solving problems themselves. I wish MCP would mostly focus on the auth part instead of the tool part. Getting an agent access to an API with credentials usually gives it enough power to solve problems on its own.
[1]: https://x.com/mitsuhiko/status/1984756813850374578?s=46
The people who just copy paste output from ai and ship it as a blog post however, deserve significant condemnation for that.
Didn’t realize you were forced to read this?
> Feels disrespectful to your readers
I didn’t feel disrespected—I felt so respected I read the whole thing.
Really, the interface isn't a meaningful part of it. I also like Cmd-L, but Claude just does better at writing code.
...also, it's nice that Anthropic is just focusing on making cool stuff (like Skills), while the folks from Cursor are... I dunno. Whatever it is they're doing with Cursor 2.0 :shrug:
My concern with hardcoding paths inside a doc is that they will likely become outdated as the codebase evolves.
One solution would be to script it and have it run pre-commit to regenerate the CLAUDE.md with the new paths.
There's probably potential for even more dev tooling that 1. ensures reference paths are always correct, and 2. enforces standards for how references are documented in CLAUDE.md (and lints things like length).
Perhaps using some kind of inline documentation standard like JSDoc if it's a TS file, or a naming convention if it's an .md file.
Example:
// @claude.md // For complex … usage or if you encounter a FooBarError, see ${path} for advanced troubleshooting steps
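A rough sketch of that pre-commit idea, assuming a hypothetical `@claude.md` annotation convention and a `<!-- auto-refs -->` marker line in CLAUDE.md (all names and paths illustrative):

```bash
#!/usr/bin/env bash
# Hypothetical pre-commit step: rebuild the auto-generated "References" section of
# CLAUDE.md from "@claude.md" annotations in source files so paths never go stale.
# Assumes CLAUDE.md contains a "<!-- auto-refs -->" marker line; everything below
# it is regenerated on each commit.
set -euo pipefail

# Keep everything up to and including the marker, dropping the old generated section.
sed -i.bak '/<!-- auto-refs -->/q' CLAUDE.md

# Append one bullet per annotation, pointing at the file that contains it.
{ grep -rn "@claude.md" --include="*.ts" --include="*.md" src/ docs/ || true; } \
  | while IFS=: read -r file _line text; do
      note="${text#*@claude.md}"
      echo "- ${note# } (see ${file})" >> CLAUDE.md
    done
```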
I researched this the other day, the recommended (by Anthropic) way to do this is to have a CLAUDE.md with a single line in it:
@AGENTS.md
Then keep your actual content in the other file: https://docs.claude.com/en/docs/claude-code/claude-code-on-t...

Right now these are reading like a guide to Prolog in the 1980s.
This feels like a false economy to me for real sized changes, but maybe I’m just a weak code reviewer. For code I really don’t care about, I’m happy to do this, but if I ever need to understand that code I have an uphill battle. OTOH reading intermediate diffs and treating the process like actual pair programming has worked well for me, left me with changes I’m happy with, and codebases I understand well enough to debug.
I recommend using it directly instead of via the plugin
There is no customer advantage to developing cheap and fast if the delivered product isn't well conceived from a current and future customer-needs perspective, and a quickly shipped product full of bugs isn't going to help anyone.
I think the same goes for AI in general - CEOs are salivating over adopting "AI" (which people like Altman and Amodei are telling them will be human level tomorrow, or yesterday in the case of Amodei), and using it to reduce employee head count, but the technology is nowhere near the human level needed to actually benefit customers. An "AI" (i.e. LLM) customer service agent/chatbot is just going to piss off customers.
Latest version from 2 months ago, >4700 stars on GitHub
I use Claude Code. A lot.
As a hobbyist, I run it in a VM several times a week on side projects, often with --dangerously-skip-permissions to vibe code whatever idea is on my mind. Professionally, part of my team builds the AI-IDE rules and tooling for our engineering team that consumes several billion tokens per month just for codegen.
The CLI agent space is getting crowded and between Claude Code, Gemini CLI, Cursor, and Codex CLI, it feels like the real race is between Anthropic and OpenAI. But TBH when I talk to other developers, their choice often comes down to superficials—a "lucky" feature implementation or a system prompt "vibe" they just prefer. At this point these tools are all pretty good. I also feel like folks often over-index on the output style or UI. Like, to me the "you're absolutely right!" sycophancy isn't a notable bug; it's a signal that you're too in-the-loop. Generally my goal is to "shoot and forget": delegate, set the context, let it work, and judge the tool by the final PR rather than how it gets there.
Having stuck with Claude Code for the last few months, this post is my set of reflections on Claude Code's entire ecosystem. We'll cover nearly every feature I use (and, just as importantly, the ones I don't), from the foundational CLAUDE.md file and custom slash commands to the powerful world of Subagents, Hooks, and GitHub Actions. This post ended up a bit long, so I'd recommend treating it as a reference rather than something to read in its entirety.
The single most important file in your codebase for using Claude Code effectively is the root CLAUDE.md. This file is the agent’s “constitution,” its primary source of truth for how your specific repository works.
How you treat this file depends on the context. For my hobby projects, I let Claude dump whatever it wants in there.
For my professional work, our monorepo’s CLAUDE.md is strictly maintained and currently sits at 13KB (I could easily see it growing to 25KB).
It only documents tools and APIs used by 30% (arbitrary) or more of our engineers; otherwise, tools are documented in product- or library-specific markdown files.
We’ve even started allocating effectively a max token count for each internal tool’s documentation, almost like selling “ad space” to teams. If you can’t explain your tool concisely, it’s not ready for the CLAUDE.md.
Over time, we’ve developed a strong, opinionated philosophy for writing an effective CLAUDE.md.
Start with Guardrails, Not a Manual. Your CLAUDE.md should start small, documenting based on what Claude is getting wrong.
Don’t @-File Docs. If you have extensive documentation elsewhere, it’s tempting to @-mention those files in your CLAUDE.md. This bloats the context window by embedding the entire file on every run. But if you just mention the path, Claude will often ignore it. You have to pitch the agent on why and when to read the file. “For complex … usage or if you encounter a FooBarError, see path/to/docs.md for advanced troubleshooting steps.”
Don’t Just Say “Never.” Avoid negative-only constraints like “Never use the --foo-bar flag.” The agent will get stuck when it thinks it must use that flag. Always provide an alternative.
Use CLAUDE.md as a Forcing Function. If your CLI commands are complex and verbose, don’t write paragraphs of documentation to explain them. That’s patching a human problem. Instead, write a simple bash wrapper with a clear, intuitive API and document that. Keeping your CLAUDE.md as short as possible is a fantastic forcing function for simplifying your codebase and internal tooling.
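As an example of the kind of wrapper I mean (hypothetical script; the command and flags are placeholders), instead of documenting a verbose test invocation, ship a one-liner and document that:

```bash
#!/usr/bin/env bash
# tools/test.sh (hypothetical): hide the verbose test invocation so CLAUDE.md only
# needs one line: "Test with tools/test.sh [path]".
set -euo pipefail
exec pytest "${1:-.}" -q --maxfail=5 --disable-warnings
```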
Here’s a simplified snapshot:
# Monorepo
## Python
- Always ...
- Test with <command>
... 10 more ...
## <Internal CLI Tool>
... 10 bullets, focused on the 80% of use cases ...
- <usage example>
- Always ...
- Never <x>, prefer <Y>
For <complex usage> or <error> see path/to/<tool>_docs.md
...
Finally, we keep this file synced with an AGENTS.md file to maintain compatibility with other AI IDEs that our engineers might be using.
If you are looking for more tips for writing markdown for coding agents see “AI Can’t Read Your Docs”, “AI-powered Software Engineering”, and “How Cursor (AI IDE) Works”.
The Takeaway: Treat your CLAUDE.md as a high-level, curated set of guardrails and pointers. Use it to guide where you need to invest in more AI (and human) friendly tools, rather than trying to make it a comprehensive manual.
I recommend running /context mid coding session at least once to understand how you are using your 200k token context window (even with Sonnet-1M, I don’t trust that the full context window is actually used effectively). For us a fresh session in our monorepo costs a baseline ~20k tokens (10%) with the remaining 180k for making your change — which can fill up quite fast.
A screenshot of /context in one of my recent side projects. You can almost think of this like disk space that fills up as you work on a feature. After a few minutes or hours you’ll need to clear the messages (purple) to make space to continue.
I have three main workflows:
/compact (Avoid): I avoid this as much as possible. The automatic compaction is opaque, error-prone, and not well-optimized.
/clear + /catchup (Simple Restart): My default reboot. I /clear the state, then run a custom /catchup command to make Claude read all changed files in my git branch.
“Document & Clear” (Complex Restart): For large tasks. I have Claude dump its plan and progress into a .md, /clear the state, then start a new session by telling it to read the .md and continue.
The Takeaway: Don’t trust auto-compaction. Use /clear for simple reboots and the “Document & Clear” method to create durable, external “memory” for complex tasks.
I think of slash commands as simple shortcuts for frequently used prompts, nothing more. My setup is minimal:
/catchup: The command I mentioned earlier. It just prompts Claude to read all changed files in my current git branch (sketched below).
/pr: A simple helper to clean up my code, stage it, and prepare a pull request.
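Under the hood a custom slash command is just a markdown prompt file in `.claude/commands/`. A minimal sketch of what my /catchup could look like (wording illustrative; it assumes `origin/main` is your default branch):

```md
<!-- .claude/commands/catchup.md -->
Read every file that changed on this branch, then summarize the work in progress.

1. Run `git diff --name-only origin/main...HEAD` to list the changed files.
2. Read each file in that list.
3. Summarize what the branch is trying to accomplish and flag anything that looks unfinished.
```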
IMHO if you have a long list of complex, custom slash commands, you’ve created an anti-pattern. To me the entire point of an agent like Claude is that you can type almost whatever you want and get a useful, mergable result. The moment you force an engineer (or non-engineer) to learn a new, documented-somewhere list of essential magic commands just to get work done, you’ve failed.
The Takeaway: Use slash commands as simple, personal shortcuts, not as a replacement for building a more intuitive CLAUDE.md and better-tooled agent.
On paper, custom subagents are Claude Code’s most powerful feature for context management. The pitch is simple: a complex task requires X tokens of input context (e.g., how to run tests), accumulates Y tokens of working context, and produces a Z token answer. Running N tasks means (X + Y + Z) * N tokens in your main window.
The subagent solution is to farm out the (X + Y) * N work to specialized agents, which only return the final Z token answers, keeping your main context clean.
I find they're a powerful idea, but in practice custom subagents create two new problems:
They Gatekeep Context: If I make a PythonTests subagent, I’ve now hidden all testing context from my main agent. It can no longer reason holistically about a change. It’s now forced to invoke the subagent just to know how to validate its own code.
They Force Human Workflows: Worse, they force Claude into a rigid, human-defined workflow. I’m now dictating how it must delegate, which is the very problem I’m trying to get the agent to solve for me.
My preferred alternative is to use Claude’s built-in Task(...) feature to spawn clones of the general agent.
I put all my key context in the CLAUDE.md. Then, I let the main agent decide when and how to delegate work to copies of itself. This gives me all the context-saving benefits of subagents without the drawbacks. The agent manages its own orchestration dynamically.
In my “Building Multi-Agent Systems (Part 2)” post, I called this the “Master-Clone” architecture, and I strongly prefer it over the “Lead-Specialist” model that custom subagents encourage.
The Takeaway: Custom subagents are a brittle solution. Give your main agent the context (in CLAUDE.md) and let it use its own Task/Explore(...) feature to manage delegation.
On a simple level, I use claude --resume and claude --continue frequently. They’re great for restarting a bugged terminal or quickly rebooting an older session. I’ll often claude --resume a session from days ago just to ask the agent to summarize how it overcame a specific error, which I then use to improve our CLAUDE.md and internal tooling.
More in the weeds: Claude Code stores all session history in ~/.claude/projects/, which you can tap into for the raw historical session data. I have scripts that run meta-analysis on these logs, looking for common exceptions, permission requests, and error patterns to help improve agent-facing context.
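As a rough illustration (assuming the session logs are the JSONL files Claude Code writes under ~/.claude/projects/), even a shell one-liner can surface recurring failure modes:

```bash
# Count the most common error-ish strings across recent session logs
# (assumes the logs are the JSONL files under ~/.claude/projects/).
grep -rhoiE '"(error|exception|permission denied)[^"]{0,80}"' \
  --include='*.jsonl' ~/.claude/projects/ \
  | sort | uniq -c | sort -rn | head -20
```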
The Takeaway: Use claude --resume and claude --continue to restart sessions and uncover buried historical context.
Hooks are huge. I don’t use them for hobby projects, but they are critical for steering Claude in a complex enterprise repo. They are the deterministic “must-do” rules that complement the “should-do” suggestions in CLAUDE.md.
We use two types:
Block-at-Submit Hooks: This is our primary strategy. We have a PreToolUse hook that wraps any Bash(git commit) command. It checks for a /tmp/agent-pre-commit-pass file, which our test script only creates if all tests pass. If the file is missing, the hook blocks the commit, forcing Claude into a “test-and-fix” loop until the build is green.
Hint Hooks: These are simple, non-blocking hooks that provide “fire-and-forget” feedback if the agent is doing something suboptimal.
We intentionally do not use “block-at-write” hooks (e.g., on Edit or Write). Blocking an agent mid-plan confuses or even “frustrates” it. It’s far more effective to let it finish its work and then check the final, completed result at the commit stage.
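Here's a minimal sketch of the block-at-submit script (the filename and gate file are ours/hypothetical; it assumes the documented hook contract where a PreToolUse hook receives the tool call as JSON on stdin and a non-zero "blocking" exit code feeds stderr back to the agent):

```bash
#!/usr/bin/env bash
# check-commit-gate.sh (hypothetical): registered as a PreToolUse hook that
# matches Bash tool calls. Blocks `git commit` unless the test script has already
# created the pass file.
input="$(cat)"   # the pending tool call arrives as JSON on stdin
if grep -q '"command"[^}]*git commit' <<< "$input"; then
  if [[ ! -f /tmp/agent-pre-commit-pass ]]; then
    echo "Tests have not passed yet. Run the test script until it writes /tmp/agent-pre-commit-pass, then retry the commit." >&2
    exit 2   # exit code 2 blocks the tool call and surfaces stderr to Claude
  fi
fi
exit 0       # everything else passes through
```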
The Takeaway: Use hooks to enforce state validation at commit time (block-at-submit). Avoid blocking at write time—let the agent finish its plan, then check the final result.
Planning is essential for any “large” feature change with an AI IDE.
For my hobby projects, I exclusively use the built-in planning mode. It’s a way to align with Claude before it starts, defining both how to build something and the “inspection checkpoints” where it needs to stop and show me its work. Using this regularly builds a strong intuition for what minimal context is needed to get a good plan without Claude botching the implementation.
In our work monorepo, we've started rolling out a custom planning tool built on the Claude Code SDK. It's similar to native plan mode but heavily prompted to align its outputs with our existing technical design format. It also enforces our internal best practices—from code structure to data privacy and security—out of the box. This lets our engineers "vibe plan" a new feature as if they were a senior architect (or at least that's the pitch).
The Takeaway: Always use the built-in planning mode for complex changes to align on a plan before the agent starts working.
I agree with Simon Willison's take: Skills are (maybe) a bigger deal than MCP.
If you’ve been following my posts, you’ll know I’ve drifted away from MCP for most dev workflows, preferring to build simple CLIs instead (as I argued in “AI Can’t Read Your Docs”). My mental model for agent autonomy has evolved into three stages:
Single Prompt: Giving the agent all context in one massive prompt. (Brittle, doesn’t scale).
Tool Calling: The “classic” agent model. We hand-craft tools and abstract away reality for the agent. (Better, but creates new abstractions and context bottlenecks).
Scripting: We give the agent access to the raw environment—binaries, scripts, and docs—and it writes code on the fly to interact with them.
With this model in mind, Agent Skills are the obvious next feature. They are the formal productization of the “Scripting” layer.
If, like me, you’ve already been favoring CLIs over MCP, you’ve been implicitly getting the benefit of Skills all along. The SKILL.md file is just a more organized, shareable, and discoverable way to document these CLIs and scripts and expose them to the agent.
The Takeaway: Skills are the right abstraction. They formalize the “scripting”-based agent model, which is more robust and flexible than the rigid, API-like model that MCP represents.
Skills don’t mean MCP is dead (see also “Everything Wrong with MCP”). Previously, many built awful, context-heavy MCPs with dozens of tools that just mirrored a REST API (read_thing_a(), read_thing_b(), update_thing_c()).
The “Scripting” model (now formalized by Skills) is better, but it needs a secure way to access the environment. This to me is the new, more focused role for MCP.
Instead of a bloated API, an MCP should be a simple, secure gateway that provides a few powerful, high-level tools:
download_raw_data(filters…)
take_sensitive_gated_action(args…)
execute_code_in_environment_with_state(code…)
In this model, MCP’s job isn’t to abstract reality for the agent; its job is to manage the auth, networking, and security boundaries and then get out of the way. It provides the entry point for the agent, which then uses its scripting and markdown context to do the actual work.
The only MCP I still use is for Playwright, which makes sense—it’s a complex, stateful environment. All my stateless tools (like Jira, AWS, GitHub) have been migrated to simple CLIs.
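For reference, registering that one MCP server looks something like the following (syntax from memory; check `claude mcp --help` and the Playwright MCP README for the current form):

```bash
# Register the Playwright MCP server for this project (stdio transport).
claude mcp add playwright -- npx @playwright/mcp@latest
```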
The Takeaway: Use MCPs that act as data gateways. Give the agent one or two high-level tools (like a raw data dump API) that it can then script against.
Claude Code isn’t just an interactive CLI; it’s also a powerful SDK for building entirely new agents—for both coding and non-coding tasks. I’ve started using it as my default agent framework over tools like LangChain/CrewAI for most new hobby projects.
I use it in three main ways:
Massive Parallel Scripting: For large-scale refactors, bug fixes, or migrations, I don't use the interactive chat. I write simple bash scripts that call claude -p "in /pathA change all refs from foo to bar" in parallel (see the sketch after this list). This is far more scalable and controllable than trying to get the main agent to manage dozens of subagent tasks.
Building Internal Chat Tools: The SDK is perfect for wrapping complex processes in a simple chat interface for non-technical users. Like an installer that, on error, falls back to the Claude Code SDK to just fix the problem for the user. Or an in-house “v0-at-home” tool that lets our design team vibe-code mock frontends in our in-house UI framework, ensuring their ideas are high-fidelity and the code is more directly usable in frontend production code.
Rapid Agent Prototyping: This is my most common use. It’s not just for coding. If I have an idea for any agentic task (e.g., a “threat investigation agent” that uses custom CLIs or MCPs), I use the Claude Code SDK to quickly build and test the prototype before committing to a full, deployed scaffolding.
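A sketch of that "massive parallel scripting" pattern (directories, prompt, and log paths are illustrative; permission flags omitted):

```bash
#!/usr/bin/env bash
# Fan out headless `claude -p` runs across independent packages, then wait.
set -euo pipefail
for d in services/auth services/billing services/search; do
  (
    cd "$d"
    claude -p "change all refs from foo to bar in this package and fix any test failures" \
      > "/tmp/claude-$(basename "$d").log" 2>&1
  ) &
done
wait   # review the logs and diffs once every run has finished
```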
The Takeaway: The Claude Code SDK is a powerful, general-purpose agent framework. Use it for batch-processing code, building internal tools, and rapidly prototyping new agents before you reach for more complex frameworks.
The Claude Code GitHub Action (GHA) is probably one of my favorite and most slept-on features. It's a simple concept: just run Claude Code in a GHA. But this simplicity is what makes it so powerful.
It’s similar to Cursor’s background agents or the Codex managed web UI but is far more customizable. You control the entire container and environment, giving you more access to data and, crucially, much stronger sandboxing and audit controls than any other product provides. Plus, it supports all the advanced features like Hooks and MCP.
We’ve used it to build custom “PR-from-anywhere” tooling. Users can trigger a PR from Slack, Jira, or even a CloudWatch alert, and the GHA will fix the bug or add the feature and return a fully tested PR.¹
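For orientation, a stripped-down trigger workflow might look roughly like this (a sketch under assumptions: the anthropics/claude-code-action and its anthropic_api_key input are what I recall; check the action's README for its current inputs and triggers):

```yaml
# .github/workflows/claude.yml (illustrative)
name: claude
on:
  issue_comment:
    types: [created]
jobs:
  claude:
    if: contains(github.event.comment.body, '@claude')
    runs-on: ubuntu-latest
    permissions:
      contents: write
      pull-requests: write
      issues: read
    steps:
      - uses: actions/checkout@v4
      - uses: anthropics/claude-code-action@v1
        with:
          anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
```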
Since the GHA logs are the full agent logs, we have an ops process to regularly review these logs at a company level for common mistakes, bash errors, or unaligned engineering practices. This creates a data-driven flywheel: Bugs -> Improved CLAUDE.md / CLIs -> Better Agent.
$ query-claude-gha-logs --since 5d | claude -p "see what the other claudes were getting stuck on and fix it, then put up a PR"
The Takeaway: The GHA is the ultimate way to operationalize Claude Code. It turns it from a personal tool into a core, auditable, and self-improving part of your engineering system.
Finally, I have a few specific settings.json configurations that I’ve found essential for both hobby and professional work.
HTTPS_PROXY/HTTP_PROXY: This is great for debugging. I’ll use it to inspect the raw traffic to see exactly what prompts Claude is sending. For background agents, it’s also a powerful tool for fine-grained network sandboxing.
MCP_TOOL_TIMEOUT/BASH_MAX_TIMEOUT_MS: I bump these. I like running long, complex commands, and the default timeouts are often too conservative. I’m honestly not sure if this is still needed now that bash background tasks are a thing, but I keep it just in case.
ANTHROPIC_API_KEY: At work, we use our enterprise API keys (via apiKeyHelper). It shifts us from a “per-seat” license to “usage-based” pricing, which is a much better model for how we work.
It accounts for the massive variance in developer usage (We’ve seen 1:100x differences between engineers).
It lets engineers tinker with non-Claude-Code LLM scripts, all under our single enterprise account.
“permissions”: I’ll occasionally self-audit the list of commands I’ve allowed Claude to auto-run.
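Putting those together, a minimal sketch of what the file can look like (values are placeholders; the env, apiKeyHelper, and permissions keys are the documented settings.json fields, but double-check against the current docs):

```json
{
  "env": {
    "HTTPS_PROXY": "http://127.0.0.1:8080",
    "MCP_TOOL_TIMEOUT": "600000",
    "BASH_MAX_TIMEOUT_MS": "600000"
  },
  "apiKeyHelper": "~/.claude/get-enterprise-api-key.sh",
  "permissions": {
    "allow": ["Bash(git status)", "Bash(npm run test:*)"]
  }
}
```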
The Takeaway: Your settings.json is a powerful place for advanced customization.
That was a lot, but hopefully, you find it useful. If you’re not already using a CLI-based agent like Claude Code or Codex CLI, you probably should be. There are rarely good guides for these advanced features, so the only way to learn is to dive in.
To me, a fairly interesting philosophical question is how many reviewers should a PR get that was generated directly from a customer request (no internal human prompter)? We’ve settled on 2 human approvals for any AI-initiated PR for now, but it is kind of a weird paradigm shift (for me at least) when it’s no longer a human making something for another human to review.
You’ll also end up dealing with merge conflicts if you haven’t carefully split the work or modularized the code.
I think of it literally as a collection of .md files and scripts to help perform some set of actions. I'm excited for it not really as a "new thing" (as mentioned in the post) but as effectively an endorsement for this pattern of agent-data interaction.
It wasn't possible before for me to do any of this at this kind of scale. Before, getting stuck on a bug could mean hours, days, or maybe even weeks of debugging. I never made the kind of progress I wanted before.
Many of the things I want, do already exist, but are often older, not as efficient or flexible as they could be, or just plain _look_ dated.
But now I can pump out react/shadcn frontends easily, generate apis, and get going relatively quickly. It's still not pure magic. I'm still hitting issues and such, but they are not these demotivating, project-ending, roadblocks anymore.
I can now move at a speed that matches the ideas I have.
I am giving up something to achieve that, by allowing AI to take control so much, but it's a trade that seems worth it.
So if you're building your own agent, this would be a directory of markdown documents with headers that you tell the agent to scan so that it's aware of them, and then if it thinks they could be useful it can choose to read all the instructions into its context? Is it any more than that?
I guess I don't understand how this isn't just RAG with an index you make the agent aware of?
So now you need to get CC to understand _how_ to do that for various tools in a way that's context efficient, because otherwise you're relying on either potentially outdated knowledge that Claude has built in (leading to errors b/c CC doesn't know about recent versions) or chucking the entirety of a man page into your default context (inefficient).
What the Skill files do is then separate the when from the how.
Consider the git cli.
The skill file has a couple of sentences on when to use the git cli and then a much longer section on how it's supposed to be used, and the "how" section isn't loaded until you actually need it.
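To make that split concrete, here's a hypothetical SKILL.md for the git example (the frontmatter format with name and description follows Anthropic's skills spec; the body content is illustrative):

```md
---
name: git-workflow
description: Use when committing, branching, rebasing, or resolving conflicts with the git CLI in this repo.
---
# Git workflow (loaded only when the skill is invoked)

- Branch from `main`; name branches `feat/<ticket>` or `fix/<ticket>`.
- Commit with `git commit -m "<type>: <summary>"`; never pass `--no-verify`.
- Before pushing, rebase on `main` and resolve conflicts locally.
- For anything destructive (`git reset --hard`, force pushes), stop and ask.
```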
I've got skills for stuff like invoking the native screenshot CLI tool on the Mac, for calling a custom shell script that uses the GitHub API to download and pull in screenshots from issues (b/c the CLI doesn't know how to do this), for accessing separate APIs for data, etc.
Our auth, log diving, infra state, etc, is all usable via cli, and it feels pretty good when pointing Claude at it.
Part of it is the snappy, more minimal UX, but also just pure efficacy seems consistently better. Claude does its best work in CC. I'm sure the same is true of Codex.
Codex writes higher quality code, but is slower and less feature rich. I imagine this will change within months. The jury is still out. Exciting times!
It's much easier to review larger changes when you've aligned on a Claude generated plan up front.
I use AI for code, but I never use it for any writing that is for human eyes.
The skills that I use all direct a next action and how to do it. Most of them instruct to use Tasks to isolate context. Some of them provide abstraction specific context (when working with framework code, find all consumers before making changes. add integration tests for the desired state if it’s missing, then run tests to see…) and others just inject only the correct company specific approach to solving only this problem into Task context.
They are composable and you can build the logic table of when an instance is “skilled” enough. I found them worse than hooks with subagents when I started, but now I see them as the coolest thing in Claude code.
The last benefit is nobody on your team even had to know they exist. You can just have them as part of onboarding and everyone can take advantage of what you’ve learned even when working on greenfield projects that don’t have a CLAUDE.md.
You can do anything you want via a CLI but MCP still exists as a standard that folks and platforms might want to adopt as a common interface.
It is why I am a bit puzzled by the people who use an LLM to generate code in anything other than a "tightly scoped" fashion (boilerplate, throwaway code, standalone script, single file, or at the function level). I'm not sure how that makes your job later on any easier if you have an even worse mental model of the code because you didn't even write it. And debugging is almost always more tedious than writing code, so you've traded off the fun/easy part for a more difficult one. Seems like a Faustian deal.
A sibling comment on hooks mentions some approaches. You could also try leveraging the UserPromptSubmit hook to do some prompt analysis and force relevant skill activation.
Skills don't totally deprecate documenting things in CLAUDE.md, but I agree that a lot of these can be defined as skills instead.
Skill frontmatter also still sits in the global context so it's not really a token optimization either.
This is basically a "thinking tax".
If you don't want to think and offload it to an LLM, it burns through a lot of tokens to implement, in an inefficient way, something you could often do in 10 lines if you thought about it for a few minutes.
The agentic part of the equation is improving on both sides all the time.
For example, if you're writing a command line tool in Python, it doesn't really matter what model you use since they're all really great at Python (LOL). However, if you're writing a complicated SPA that uses say, Vue 3 with Vite (and some fancy CSS framework) and Python w/FastAPI... You want the "smartest" model that knows about all these things at once (and regularly gets updated knowledge of the latest versions of things). For me, that means Claude Code.
I am cheap though and only pay Anthropic $20/month. This means I run out of Claude Credits every damned week (haha). To work around this problem, I used to use OpenAI's pay-per-use API with gpt5-mini with VS Code's Copilot extension, switching to GPT5-codex (medium) with the Codex extension for more complicated tasks.
Now that I've got more experience, I've figured out that GPT5-codex costs way too much (in API credits) for what you get in nearly all situations. Seriously: Why TF does it use that much "usage". Anyway...
I've tried them all with my very, very complicated collaborative editor (CRDTs), specifically to learn how to better use AI coding assistants. So here's what I do now:
* Ollama cloud for gpt-oss:120b (it's so fast!)
* Claude Code for everything else.
I cannot overstate how impressed I am with gpt-oss:120b... It's like 10x faster than gpt5-mini and yet seems to perform just as well. Maybe better, actually, because it forces you to narrow your prompts (due to the smaller context window). But because it's just so damned fast, that doesn't matter.

With Claude Code, it's like magic: You give it a really f'ing complicated thing to troubleshoot or implement and it just goes—and keeps going until it finishes or you run out of tokens! It's a "the future is now!" experience for sure.
With gpt-oss:120b it's more like having an actual conversation, where the only time you stop typing is when you're reviewing what it did (which you have to do for all the models... Some more than others).
FYI: The worst is Gemini 2.5. I wouldn't even bother! It's such trash, I can't even fathom how Google is trying to pass it off as anything more than a toy. When it decides to actually run (as opposed to responding with, "Failed... Try again"), it'll either hallucinate things that have absolutely nothing to do with your prompt or it'll behave like some petulant middle school kid that pretends to spend a lot of time thinking about something but ultimately does nothing at all.
If there's enough interest, I might replicate some examples in an open source project.
At the moment, though, I also code on and off with an agent. I'm not ready or willing to only vibe code my projects. For one, I've had tons of examples where the agent gaslighted me, only to turn around at the last stage. And in some cases the code output was too result-focused and didn't consider the broader general usage. And sure, that's in part because I hold it wrong. Don't specify 10 million markdown files, etc. But it's a feedback-loop system. If I don't trust the results, I don't jump in deeper. And I feel a lot of developers have no issue with jumping ever deeper. Write MCPs, now CLIs, and describe projects with custom markdown files. But I think we really need both camps. Otherwise we don't move forward.
If you are using literally any of Claude Code's features the experience isn't close, and regardless of model preference (Claude is my least favorite model by far) you should probably use Claude Code. It's just a much more extensible product for teams.
Before LLMs we simply wouldn't implement many of those features since they were not exactly critical and required a lot of time, but now that the required development time is cut significantly, they suddenly make sense to implement.
> the author clearly read through, organized, and edited the output.
Also worth noting, I've read plenty of human written stuff that has errors in it, so I read everything skeptically anyway.
There's even an official Anthropic VS Code extension to run CC in VS Code. The biggest advantage is being able to use VS Code's diff views, which I like more than in the terminal. But the VS Code CC extension doesn't support all the latest features of the terminal CC, so I'm usually still in the terminal.
The recommended approach has the advantage of separating information specific to Claude Code, but I think that in the long run, Anthropic will have to adopt the AGENTS.md format
Also, when using separate files, memories will be written to CLAUDE.md, and periodic triaging will be required: deciding what to leave there and what to move to AGENTS.md
I suggest everyone who can to try the voice mode. https://getvoicemode.com/
IMO the best advice in life is try not to be fearful of things that happen to everyone and you can't change.
Good news! What you are afraid of will happen, but it'll happen to everyone all at once, and nothing you can do can change it.
So you no longer need to feel fear. You can skip right on over to resignation. (We have cookies, for we are cooked)
Losing access to GPT 5 Pro is also a big hit… it is by far the best for reading full files/repos and creating plans (though it also by far has the worst out of the box tooling)
Whereas I tried Kilo Code and CoPilot and JetBrain's agent and others direct against Sonnet 4 and the output was ... not good ... in comparison.
I have my criticisms of Claude but still find it very impressive.
If you tell me I didn't really need an LLM to be able to do all that in a week, and that just some thought and 10 lines of code would do, I suspect you are not really familiar with the latest developments in AI and vastly underestimate their capability to do tricky stuff.
But I can’t speak to it working across OS.
Maybe CC users haven’t figured out how to parallelize their work because it’s fast enough to just wait or be distracted, and so the Codex waiting seems unbearable.
Anthropic say "put @AGENTS.md in your CLAUDE.md file", and my own experiments confirmed that this dumps the content into the system prompt in the same way as if you had copied it to CLAUDE.md manually, so I'm happy with that solution - at least until Anthropic give in and support AGENTS.md directly.
To see if it is easy to digest, with no repeated code etc., or if it's just slop that should be consumed by another agent and never by a human.
Who gives a shit?
If you can’t stand AI writing and you made it pretty far along before getting upset, who are you mad at, the author or yourself? Would you be happier if you found out this was written without AI and that you were just bad at detecting AI writing?
Only sane (guaranteed portable) option is for it to be a relative symlink to another file within the same repo, of course. i.e. CLAUDE.md would be -> 'AGENTS.md', not '/home/simonw/projects/pelicans-on-bicycles/AGENTS.md' or whatever.
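Concretely, from the repo root (git stores the symlink itself rather than a copy, provided symlinks are enabled on the platform):

```bash
ln -s AGENTS.md CLAUDE.md   # relative target, so it survives clones and moves of the repo
git add CLAUDE.md           # committed as a symlink (mode 120000), not a copy of the file
```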
In my current project I have a top-level chat, then one chat in each of the four component subdirectories.
I have a second terminal with QA-feature.
So 10 tabs total. Plus I have one to run occasional commands real quick (like docker ps).
I’m using qwen.
That's why it took a week with the LLM. And for you it makes sense, as this is new tech.
But if someone already knows those technologies, it would still take a week with the LLM and like 2 days without.
GPT5-codex (medium) is such a token hog for some reason
I thought git by default treats symlinks simply as file copies when cloning fresh.
I.e., git may not be aware of the symlink.
Code is no different! You can tell an AI model to write something for you and that's fine! Except you have to review it! If the code is bad quality just take a moment to tell the AI to fix it!
Like, how hard is it to tell the AI that the code it just wrote is too terse and hard to read? Come on, folks! Take that extra moment! I mean, I'm pretty lazy when working on my hobby projects but even I'm going to get irritated if the code is a gigantic mess.
Just tell it, "this code is a gigantic mess. Refactor it into concise, human-readable files using a logical structure and make sure to add plenty of explanatory comments where anything might be non-obvious. Make sure that the code passes all the tests when you're done."
I think we'll be dealing with slop issues for quite some time, but I also have hopes that AI will raise the bar of code in general.