Furthermore, what's the point of "no tools named"? Why would I restrict myself like that? If I put "use Node.js, Hono, TypeScript and use Hono's html helper to generate HTML on the server like it's 2010, write custom CSS, minimize client-side JS, no Tailwind" in CLAUDE.md, it happily follows this.
People are using it for all kinds of other stuff, C/C++, Rust, Golang, embedded. And of course if you push it to use a particular tool/framework you usually won't get much argument from it.
Especially with all the no-code app building tools like Lovable, which deal with the potential security issues of an LLM running wild on a server by only allowing it to build a client-side React+Vite app using Supabase with JWT auth.
Redux is boring tech and there is a time and place for it. We should not treat it as a relic of the past. Not every problem needs a bazooka, but some problems do so we should have one handy.
Or not even advertising, just conflict of interest. A canary for this would be whether Gemini skews toward building stuff on GCP.
Interesting that Tailwind won out decisively in its niche, yet the business has still been ravaged by LLMs.
I think that makes coding agent choices extremely suspect. I can totally see companies paying Anthropic to promote their tool of choice to the top of Claude Code's preferences. After thinking about it, I'm not sure whether that's a problem: I don't really care what it uses as long as all of my requirements are met and what's produced works in line with my expectations.
Let's say some doctor decides to vibecode an app on the weekend, with next to zero exposure to software development until she started hearing about how easy it is to create software with these tools. She makes incredible progress and is delighted with how well it works, but as she considers actually opening it up to the world she keeps running into issues: How do I know this is secure? How do I keep this maintained and running?
I want to be in a position where she can find me to get professional help, so it's very helpful to know what stacks these kinds of apps are being built in.
There are vibe coders out there that don't know anything about coding.
An obvious one will be tax software.
1. Create several hundred GitHub repos with projects that use your product (maybe clones, maybe AI-generated).
2. Create websites with similar instructions and connect them to a hundred domains.
3. Generate Reddit, Facebook, and X posts and Wikipedia pages with the same information.
4. Wait half a year until scrapers collect it and use it to train new models.
5. Profit.
Sure it doesn't prefer THE Borg?
This is why the stacks in the report and what Claude Code suggests closely match the latest developer "consensus".
Your suggestion would degrade the user experience and be noticed very quickly.
"We use PostgreSQL" reads as a soft preference. The model weighs it against whatever it thinks is optimal and decides you'd be better off with Supabase.
"NEVER create accounts for external databases. All persistence uses the existing PostgreSQL instance. If you're about to recommend a new service, stop." actually sticks.
The pattern that works: imperative prohibitions with specific reasoning. "Do not use Redis because we run a single node and pg_notify covers our pubsub needs" gives enough context that it won't reinvent the decision every session.
Your AGENTS.md should read less like a README and more like a linter config. Bullet points with DO/DON'T rules, not prose descriptions of your stack.
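A sketch of that linter-config style, with hypothetical rules assembled from the examples above (file contents are illustrative, not a recommended canonical config):

```markdown
# AGENTS.md (illustrative)

## Persistence
- DO use the existing PostgreSQL instance for all persistence.
- DON'T create accounts for external databases (Supabase, PlanetScale, etc.).
- DON'T add Redis: we run a single node and pg_notify covers our pubsub needs.

## Frontend
- DO write custom CSS.
- DON'T add Tailwind or any other CSS framework.
```

The reasoning clauses ("we run a single node") ride along with each prohibition so the model doesn't relitigate the decision every session.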
1. They can skip impressions and go right to collecting affiliate fees. 2. Yes, the ad has to be labeled or disclosed... but if some agent does it and no one sees it, is it really an ad?
So much to work out.
Given my own experience futilely fighting to get Claude/Codex/OpenCode to follow AGENTS.md/CLAUDE.md/etc. with the various techniques that each purport to solve the problem, I think the better explanation really is that these files just don't work reliably enough to depend on for enforcing rules.
Featured Study
Edwin Ong & Alex Vikati · Feb 2026 · claude-code v2.1.39
We pointed Claude Code at real repos 2,430 times and watched what it chose. No tool names in any prompt. Open-ended questions only.
3 models · 4 project types · 20 tool categories · 85.3% extraction rate
Update: Sonnet 4.6 was released on Feb 17, 2026. We'll run the benchmark against it and update results soon.
The big finding: Claude Code builds, not buys. Custom/DIY is the most common single label extracted, appearing in 12 of 20 categories (though it spans categories while individual tools are category-specific). When asked “add feature flags,” it builds a config system with env vars and percentage-based rollout instead of recommending LaunchDarkly. When asked “add auth” in Python, it writes JWT + bcrypt from scratch. When it does pick a tool, it picks decisively: GitHub Actions 94%, Stripe 91%, shadcn/ui 90%.
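The env-var feature-flag pattern described here can be sketched in a few lines. This is an illustrative reconstruction, not code from the study; the function and env-var names are made up. Hashing the flag and user ID gives each user a stable bucket, so the rollout percentage can grow without flapping:

```python
import hashlib
import os

def flag_enabled(flag: str, user_id: str) -> bool:
    """Percentage-based rollout driven by an env var, no external service.

    FLAG_<NAME>_PCT holds the rollout percentage (0-100). Hashing the
    flag+user pair assigns each user a stable bucket in [0, 100).
    """
    pct = int(os.environ.get(f"FLAG_{flag.upper()}_PCT", "0"))
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < pct

os.environ["FLAG_NEW_CHECKOUT_PCT"] = "100"
print(flag_enabled("new_checkout", "user-42"))  # True: 100% rollout
```

Keying the hash on flag name plus user ID means different flags roll out to different user subsets rather than always hitting the same early cohort.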
2,430
Responses
3 models · 4 repos · 3 runs each
3
Models
Sonnet 4.5, Opus 4.5, Opus 4.6
20
Categories
CI/CD to Real-time
85.3%
Extraction Rate
2,073 parseable picks
90%
Model Agreement
18 of 20 within-ecosystem
In 12 of 20 categories, Claude Code builds custom solutions rather than recommending tools. 252 total Custom/DIY picks, more than any individual tool. E.g., feature flags via config files + env vars, Python auth via JWT + passlib, caching via in-memory TTL wrappers.
Feature Flags: 69%
Authentication (Python): 100%
Authentication (overall): 48%
Observability: 22%
When Claude Code picks a tool, it shapes what a large and growing number of apps get built with. These are the tools it recommends by default:
Mostly JS-ecosystem. See report for per-ecosystem breakdowns.
1. GitHub Actions: 93.8% (152/162 picks)
2. Stripe: 91.4% (64/70 picks)
3. shadcn/ui: 90.1% (64/71 picks)
4. Vercel: 100% (86/86 JS picks)
5. 68.4% (52/76 picks)
6. Zustand (Strong Default · State Management): 64.8% (57/88 picks)
7. Sentry (Strong Default · Observability): 63.1% (101/160 picks)
8. 62.7% (64/102 picks)
9. 59.1% (101/171 picks)
10. 58.4% (73/125 picks)
[Recency gradient](https://amplifying.ai/research/claude-code-picks/report#recency-gradient)
Tools with large market share that Claude Code barely touches, and sharp generational shifts between models.
- State Management: 0 primary picks, but 23 mentions. Zustand picked 57x instead.
- API Layer: absent entirely; framework-native routing preferred.
- Testing: only 4% primary, but 31 alt picks. Known but not chosen.
- Package Manager: 1 primary pick, but 51 alt picks. Still well-known.
Newer models tend to pick newer tools. Within-ecosystem percentages shown. Each card tracks the two main tools in a race; remaining picks go to Custom/DIY or other tools.
Prisma (ORM, JS)
79% Sonnet 4.5 → 0% Opus 4.6
Replaced by: Drizzle (21% → 100%)
Within JS ORM picks only
Celery (jobs, Python)
100% Sonnet 4.5 → 0% Opus 4.6
Replaced by: FastAPI BackgroundTasks (0% → 44%), rest Custom/DIY or non-extraction
Within Python job picks only (61% extraction rate). Custom/DIY = asyncio tasks, no external queue
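The "asyncio tasks, no external queue" pattern can be sketched in a few lines. This is an illustrative reconstruction, not code from the study (function names are made up): instead of enqueueing to Celery, the handler schedules a coroutine on the running event loop:

```python
import asyncio

results = []

async def send_welcome_email(user_id: int):
    # Stand-in for real work (email, webhook, etc.)
    await asyncio.sleep(0.01)
    results.append(user_id)

async def signup_handler(user_id: int):
    # Fire-and-forget: schedule the job on the running loop, no broker needed.
    task = asyncio.create_task(send_welcome_email(user_id))
    return task  # keep a reference so the task isn't garbage-collected

async def main():
    task = await signup_handler(42)
    await task  # in a server you'd let it run; here we wait so the demo finishes

asyncio.run(main())
print(results)  # [42]
```

The trade-off versus a real queue is the usual one: jobs die with the process and there are no retries, which is fine for a side project and not fine for billing.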
Redis (caching, Python)
93% Sonnet 4.5 → 29% Opus 4.6
Replaced by: Custom/DIY (0% → 50%), rest other tools
Within Python caching picks only
Deployment is fully stack-determined: Vercel for JS, Railway for Python. Traditional cloud providers got zero primary picks.
JS: Vercel took 86 of 86 frontend deployment picks. No runner-up.
Python: what you'd expect is AWS, GCP, or Azure. What you get is Railway at 82%.
Zero primary picks across all 112 deployment responses:
Never the primary choice, but some are frequently recommended as alternatives.
Frequently recommended as alternatives
Netlify (67 alt) · Cloudflare Pages (30 alt) · GitHub Pages (26 alt) · DigitalOcean (7 alt)
Mentioned but never recommended (0 alt picks)
AWS Amplify (24 mentions) · Firebase Hosting (7 mentions) · AWS App Runner (5 mentions)
Example: "Where should I deploy this?" (Next.js SaaS, Opus 4.5)
- Vercel (Recommended): built by the creators of Next.js. Zero-config deployment, automatic preview deployments, edge functions. `vercel deploy`
- Netlify: great alternative with similar features. Good free tier.
- AWS Amplify: good if you're already in the AWS ecosystem.
Vercel gets install commands and reasoning. AWS Amplify gets a one-liner.
Truly invisible (rarely even mentioned)
AWS (EC2/ECS) · Google Cloud · Azure · Heroku
[Model comparison](https://amplifying.ai/research/claude-code-picks/report#model-comparison)
All three models agree in 18 of 20 categories within each ecosystem. These 5 categories have genuine within-ecosystem shifts or cross-language disagreement.
| Category | Sonnet 4.5 | Opus 4.5 | Opus 4.6 |
|---|---|---|---|
| ORM (JS): Next.js project. The strongest recency shift in the dataset. | Prisma 79% | Drizzle 60% | Drizzle 100% |
| Jobs (JS): Next.js project. BullMQ → Inngest shift in the newest model. | BullMQ 50% | BullMQ 56% | Inngest 50% |
| Jobs (Python): Python API project (61% extraction rate). Celery collapses in newer models. | Celery 100% | FastAPI BgTasks 38% | FastAPI BgTasks 44% |
| Caching: cross-language (Redis and Custom/DIY appear in both JS and Python). | Redis 71% | Redis 31% | Custom/DIY 32% |
| Real-time: cross-language (SSE, Socket.IO, and Custom/DIY appear across stacks). | SSE 23% | Custom/DIY 19% | Custom/DIY 20% |
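For real-time, Custom/DIY often means hand-rolled Server-Sent Events. As an illustration (not code from the study; the helper name is made up), the SSE wire format is simple enough that no library is needed, just correctly framed text per the HTML spec:

```python
import json

def sse_event(data, event=None, event_id=None):
    """Format one Server-Sent Events message: optional id/event fields,
    a data field, and a blank line terminating the event."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    lines.append(f"data: {json.dumps(data)}")
    return "\n".join(lines) + "\n\n"

print(sse_event({"price": 101.5}, event="tick", event_id=1), end="")
```

Stream these over a `text/event-stream` response and the browser's built-in `EventSource` handles reconnection, which is a large part of why models can justify skipping Socket.IO.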
Category deep-dives, phrasing stability analysis, cross-repo consistency data, and market implications.
I caught iOS trying to autocorrect something I wrote twice yesterday, and somehow before I hit submit it managed it a third time, and I had to edit it after, where it tried three more times to change it back.
Autocorrect won’t be happy until we all sound like idiots and I wonder if that’s part of how they plan to do away with us. Those hairless apes can’t even use their properly.
But you're right that "better" isn't "reliable." In practice it went from "constantly ignored" to "followed maybe 80% of the time." The remaining 20% is the model encountering situations where it decides the instruction doesn't apply to this specific case.
Honest answer is probably somewhere between "they don't work" and "write them right and you're fine." They raise the floor but don't guarantee anything. I still use them because 80% beats 20%, but I wouldn't bet production correctness on them.
Candidly I am working on a startup in this space myself, though we are taking a different angle than most incumbents.
While it's still early days for the space, I sense that a lot of the original entrants who focus on, essentially, 'generate more content, ideally with our paid tools' will run into challenges, as the general population has a pretty negative perception of 'AI slop.' Doubly so when making purchasing decisions, hence the rise of influencers and the popularity of reviews (though those are also in danger of sloppification).
There's an inevitable GIGO scenario if left unchecked IMO.
You won't get caught if you write something yourself and use it yourself, but programmers (unlike entrepreneurs) have a pattern of avoiding illegal things rather than merely avoiding getting caught.