Sonnet 4 has definitely been the best model for our product's use case, but I'd be interested in trying Haiku 4 (or 4.1?) just due to the cost savings.
I'm surprised Anthropic hasn't mentioned anything about Haiku 4 yet since they released the other models.
I uploaded a web design of mine (jpeg) and asked Claude to create the html/css. Asked GPT to do the same. GPT's code looked the closest to the design I created and uploaded. Just five to ten small tweaks and I was done, versus Claude, where it would have taken me almost triple the steps.
I actually subscribed to both today (resubscribed to GPT) and I'm going to keep testing which one is the better front-end developer (I am, but I've got to embrace AI).
What's the point of these?
Kind of interesting that we live in an era of super-advanced AI, but still make basic UI/UX mistakes. The tagline of this blog post shouldn't be "1 min read".
It's not even accurate. I timed myself, not reading fast but not slow, and it took me 3 min 30 s. Maybe the images need to be OCRed to make the estimate more accurate.
It's making really stupid errors and I have to work three times as much to get the same results as last week.
Economics is important. Best bang for the buck seems to be OpenAI ChatGPT 4.1 mini[6]. Does a decent job, doesn't flood my context window with useless tokens like Claude does, API works every time. Gets me out of bad spots. Can get confused, but I've been able to muddle through with it.
1: https://openrouter.ai/anthropic/claude-opus-4.1
2: https://openrouter.ai/anthropic/claude-sonnet-4
3: https://block.github.io/goose/
4: https://openrouter.ai/anthropic/claude-3.5-sonnet
(He had been stuck in the Team Rocket hideout (I believe) for weeks)
At least Sonnet 4 is still usable, but I'll be honest, it's been producing worse and worse slop all day.
I've basically wasted the morning on Claude Code when I should've just been doing it all myself.
> We plan to release substantially larger improvements to our models in the coming weeks.
Let's see: we have Claude Code vs. Claude the API vs. Claude the website, and they're totally different from each other? One is command line, one integrates into your IDE (which IDE?) and one is just browser based, I guess. Then you have the different pricing plans, Free, Pro, and Max? But then there's also Claude Team and Claude Enterprise? These are monthly plans that only work with Claude the Website, but Claude Code is per-request? Or is it Claude API that's per-request? I have no idea. Then you have the models: Claude Opus and Claude Sonnet, with various version numbers for each?? Then there's Cline and Cursor and GOOD GRIEF! I just want to putz around with something in VSCode for a few hours!
I've used Aider for a while, and I kind of liked it, but it felt like it needed way more manual work, and I also want to use different models, probably locally hosted. I haven't used Aider in 2 or 3 months, so I don't know if it has already evolved in that direction...
edit: on the other hand, the automatic feedback loop means it sometimes goes very crazy and the API costs skyrocket easily. But maybe that's another reason to run it locally.
This makes them (Anthropic) worse than OpenAI in terms of openness.
Since in this case as we all know. [0]
"What will permanently change everything is open source and transparent AI models that are smaller and more powerful than GPT-3 or even GPT-4."
I don't believe anyone saying Sonnet yields better results than Opus though, as my experience has been exactly the opposite. But trade-off wise, I can definitely see it being a better experience when used interactively because of its speed and lower cost.
Fwiw I have a Claude pro plan and have no interest in using other offerings so I'm not sure if they're super simple (one model, one interface, one pricing plan)?
Claude Code is currently best-in-class, so no point in starting elsewhere, but you do need to read the documentation.
Anthropic has this useful quick start guide: https://docs.anthropic.com/en/docs/claude-code/quickstart
But I would recommend just starting with Claude in the browser: talk through an idea for a project you have and ask it to build it for you. Go ahead and have a brainstorming session before you actually ask it to code - it'll help make sure the model has all of the context. Don't be afraid to overload it with requirements - it's generally pretty good at putting together a coherent plan. If the project is small/fits in a single file - say a one-page web app or a complicated data schema + SQL queries - then it can usually do a pretty good job in one go. Then just copy+paste the code and run it out of the browser.
This workflow works well for exploring and understanding new topics and technologies.
Cursor is nice because it's an AI integrated IDE (smoother than the VSCode experience above) where you can select which models to use. IMO it seems better at tracking project context than Gemini+VSCode.
Hope this helps!
[1] https://platform.openai.com/docs/guides/flex-processing?api-...
I'm talking multiple tries of Claude 4 Opus, Gemini 2.5 Pro, o3, etc., resulting in sometimes hundreds of lines of code.
Versus o3-pro (very slowly) analyzing and then fixing something that seemed completely unrelated in a one or two line change and truly fixing the root cause.
o3-pro-level LLMs at reduced cost and increased speed will already be amazing.
There's also claude-code-proxy to make Claude Code use other models.
These benchmark gains aren't that high, so I doubt it is that obvious.
One obvious explanation is that pricing is strongly tied to their own costs, and that their only incentive is for people to use an expensive model if they really need it.
I forget which one of the GPT models was better, faster, and cheaper than the previous model. The incentive there is obviously, "If you want to use the old model for whatever reason, fine, but we really want you to use the new one because it costs us less to run."
Claude code is actually one of the most straightforward products I've used as far as onboarding goes. You download the tool, and follow the instructions. You can use one of the 3 plans, and everything else is automatic. You can figure out token usage and what models and versions to use and how to use MCP servers and all of that -- there's a lot of power -- but you don't need to do ANY of that to get started trying it out.
You're not being:
> That critic who doesn't try the stuff he criticizes
You're being:
> That critic who is trying to confirm their biases
It is actually one of my most useful use cases of this tech. Nice to have a way to ask in private so you don’t get snarky answers like: it’s just like buying shoes!
E.g. if you need a self-contained script to do some data processing, Opus can often do that in one shot. A 500-line Python script would cost around $1, and as long as it's not tricky it just works - you don't need back-and-forth.
I don't think it's possible to employ any human to write a 500-line Python script for $1 (unless it's a free intern or a student), let alone do it in one minute.
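A rough back-of-envelope for that ~$1 figure, assuming about 10 tokens per line of code and Opus pricing on the order of $15 per million input tokens and $75 per million output tokens (those rates and overheads are my assumptions, not exact numbers):

```python
# Back-of-envelope cost for a one-shot 500-line Python script from Opus.
# Assumed numbers: ~10 tokens per line of code, ~$15/Mtok input, ~$75/Mtok
# output, a modest prompt, and ~2x output overhead for explanation/reasoning.

LINES = 500
TOKENS_PER_LINE = 10                         # rough average for Python
INPUT_TOKENS = 2_000                         # prompt + instructions, assumed
OUTPUT_TOKENS = LINES * TOKENS_PER_LINE * 2  # code plus commentary overhead

INPUT_PRICE = 15 / 1_000_000   # $/token, assumed input price
OUTPUT_PRICE = 75 / 1_000_000  # $/token, assumed output price

cost = INPUT_TOKENS * INPUT_PRICE + OUTPUT_TOKENS * OUTPUT_PRICE
print(f"~${cost:.2f}")         # ~$0.78, i.e. in the ballpark of $1
```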
Of course, if you use LLM interactively, for many small tasks, Opus might be too expensive, and you probably want a faster model anyway. Really depends on how you use it.
(You can do quite a lot in file-at-once mode. E.g. Gemini 2.5 Flash could write 35 KB of code for a full ML experiment in Python - self-contained with data loading, model setup, training, and evaluation, all in one file, pretty much on the first try.)
Given that there’s nothing close to scientific analysis going on, I find it hard to tell how big the “Sonnet is overall better, not just sometimes” crowd is. I think part of the problem is that “The bigger model is better” feels obvious to say, so why say it? Whereas “the smaller model is better actually” feels both like unobvious advice and also the kind of thing that feels smart to say, both of which would lead to more people who believe it saying it, possibly creating the illusion of consensus.
I was trying to dig into this yesterday, but every time I come across a new thread the things people are saying and the proportions saying what are different.
I suppose one useful takeaway is this: If you’re using Claude Max and get downgraded from Opus to Sonnet for a few hours, you don’t have to worry too much about it being a harsh downgrade in quality.
When can we replace doctors with it?
I do agree it hit the token limit a lot quicker than before, when I could chat for hours without worrying about it.
Either way, I still have one last yak to shave for this project, so we'll see how efficient it is with that. If it accomplishes the task before burning through all the tokens, then it's a win-win, I suppose.
Welcome to the machine
It's almost as if companies sell more than one product.
Why is this the top comment on so many threads about tech products?
This is a well-known and documented phenomenon - the paradox of choice.
I've been working in machine learning and AI for nearly 20 years and the number of options out there is overwhelming.
I've found many of the tools out there do some things I want, but not others, so even finding the model or platform that does exactly what I want or does it the best is a time-consuming process.
Actually, to try it out, prepaid token billing is fine. You are not required to have a subscription for claude code cli. Even just $5 gave me enough breathing room to get a feeling for its potential, personally. I do not touch code often these days so I was relieved not to have to subscribe and cancel again just to play around a little and have it write some basic scripts for me.
My use case so far is usually requesting mechanical work I would rather describe than write myself, like certain test suites, and sometimes discovery on messy code bases.
Which it does a lot...
Small models are for querying the context
Opus is cheap if you use it for its niche
It's still pretty much impossible to have any LLM one-shot a complex implementation. There are just too many details to figure out and too much to explain for it to get correct. Often there's uncertainty and ambiguity, and I only understand the correct answer (or rather, the less bad answer) after I've spent time deep in the code. Having Opus spit out a possibly correct solution just isn't useful to me. I need to understand _why_ we got to that solution and _why_ it's a correct solution for the context I'm working in.
For me, this means that I largely have an iteratively driven implementation approach where any particular task just isn't that complex. Therefore, Sonnet is completely sufficient for my day-to-day needs.
I stick with Sonnet for most things because it's generally good enough and I hit my token limits with it far less often.
> Windsurf reports Opus 4.1 delivers a one standard deviation improvement over Opus 4 on their junior developer benchmark, showing roughly the same performance leap as the jump from Sonnet 3.7 to Sonnet 4.
LLMs are non-deterministic, so I think benchmarks should report averages over N runs rather than single-shot experiments.
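A minimal sketch of what that would look like: report a mean and spread over N runs instead of a single score (the per-run scores below are made up for illustration):

```python
import statistics

# Hypothetical pass rates from N independent runs of the same benchmark
# (made-up numbers - the point is to report spread, not a single shot).
runs = [0.72, 0.69, 0.75, 0.71, 0.74, 0.68, 0.73, 0.70]

mean = statistics.mean(runs)
stdev = statistics.stdev(runs)      # sample standard deviation
stderr = stdev / len(runs) ** 0.5   # standard error of the mean

print(f"score: {mean:.3f} +/- {1.96 * stderr:.3f} (approx. 95% CI over {len(runs)} runs)")
```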
Because I've found it to work pretty amazingly for things that don't need to be exact (like data modeling) or don't have any security implications (public apps). But for everything else I end up having to find all the little bugs by reading the code line by line, which is much slower than just writing the code in the first place.
They might not fit your personal definition of "openness", but they do fit many other equally valid interpretations of that concept.
If you look at the past, whenever Google announces something major, OpenAI almost always releases something as well.
People forget that OpenAI was started to compete with Google on AI.
In my experience it takes weeks if not months to coordinate a release, from testing to documentation to drafting press releases in multiple languages to benchmarks and website updates.
I’m old and I’ve been in this industry most of my life. I have never once seen or heard of all of that work being done and the company just waiting on competitors before pulling the trigger.
Create a new directory in your terminal
Open that directory, then type `claude` to run Claude Code
Press Shift + Tab to go into planning mode
Tell Claude what you want to build - recommend something simple to start with. Specify the languages, environment, frameworks you want, etc.
Claude will come up with a plan. Modify the plan or break it into smaller chunks if necessary
Once the plan is approved, ask it to start coding. It will ask you for permissions and give you the finished code
It really is something when you actually watch it go.
There are additional storage costs with Google caching, around $3.75 for 5 minutes/Mtok, and Claude Opus is $3.75 for 5-minute cache writes/Mtok.
For cached reads Gemini Pro is 5X cheaper than Opus and like $0.01 more than Sonnet.
> Small models are for querying the context
I respectfully disagree.
My experience is that large models are capable of understanding large contexts much better. Of course they are more expensive and slower, too. But in terms of accuracy, large models are always better at querying the context.
You can look at the primal to check the mean, or at the dual to get out of local minima.
In all cases, the model, tokenizer, etc. are just different enough that it will generally pay off in those spaces quickly.
Opus gives you a bit more rope to hang yourself with, imo. Yes, it "thinks" slightly better, but still not well enough for me. But it can be good enough to convince you that it can do the job... so I dunno, I almost dislike it in this regard. I find Sonnet just easier to predict.
Could I use Opus like I do Sonnet? Yes, definitely, and generally I do. But then I don't really see much difference, since I'm hand-holding so much.
I use Opus exclusively and don't hit limits. ccusage reports I'm using the API equivalent of $2000/mo.
I get that it's not an easy problem to solve, but how is Anthropic supposed to solve the actual alignment problem if they can't even stop their production LLMs from glazing the user all the time? And OpenAI is somehow even worse.
I expect to be completely blown away by GPT-5 in the first few days and then over time I will figure out the limitations of the model. Then I will be less impressed because you don't know what it can't do at first.
Claude Max is tens of hours of Opus a month, or you can pay per token and have unlimited.
Or did you mean “I wish it was cheaper”?
I'm outputting a PR every 6 minutes. The reviewers are using Claude to review everything. It used to take a day to add 100 lines to the codebase... now I can add 100 lines in one prompt.
If I want even more productivity (at risk of making the rest of my team look slow) I can tell Claude to output double the lines and ship it off for review. My performance metrics are incredible
My current bottleneck is having to review the huge amounts of code that these models spit out. I do TDD, use auto-linting and type-checking.... but the model makes insidious changes that are only visible on deep inspection.
I find that the token/credit restrictions make Opus near useless, even when using Claude Code. I only ever switch to it to get another model's take on the issue. Five minutes of use and I have hit the limit.
Same context length and throughput limits?
Anecdotally, I found gpt-4.1 (and mini) were pretty good at those agentic programming tasks, but the lack of token caching made the costs blow up with long context.
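A rough sketch of why: an agent loop re-sends the same long context every turn, and without a cache discount that re-sent input dominates the bill (all prices and sizes below are assumptions for illustration, not actual vendor pricing):

```python
# Why long-context agent loops get expensive without prompt caching.
# All numbers below are assumptions for illustration, not real pricing.

CONTEXT_TOKENS = 100_000        # long prompt (repo context, history) re-sent each turn
TURNS = 20                      # agentic loop iterations
INPUT_PRICE = 3 / 1_000_000     # $/token for fresh input (assumed)
CACHED_PRICE = 0.3 / 1_000_000  # $/token for cache hits (assumed ~10x cheaper)

no_cache = TURNS * CONTEXT_TOKENS * INPUT_PRICE
with_cache = CONTEXT_TOKENS * INPUT_PRICE + (TURNS - 1) * CONTEXT_TOKENS * CACHED_PRICE

print(f"without caching: ${no_cache:.2f}")    # $6.00
print(f"with caching:    ${with_cache:.2f}")  # $0.87
```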
Maybe I'm out of touch, but I'm not handing out my phone number to sign up for random SaaS tools.
We have the $200 plans for work and despite only using Opus, we rarely hit the limits. CCUsage suggests the same via API would have been ~$2000 over the last month (we work 5 hours a day, 4 days a week, almost always with Claude).
It uses way fewer tokens, or uses them much more effectively, when running locally.
Also, there's a CLI argument that lets you specify the model; try `claude --help`.
It's very, very helpful. However, there are still a lot of problems I only discover/figure out after I've been working in the code.
Maybe Opus just is better
A major part of software engineering is identifying and resolving issues during implementation. Plans are a good outline of what needs to be done, but they're always incomplete and inaccurate.
Subagents seem pretty similar to using zen mcp w/ OpenRouter but maybe better or at least more turnkey? I'll be checking them out.
Instead, ideally they’d run the benchmark tests many times, and share all of the results so we could make statistical determinations.
But for someone who hasn't been immersed in the "LLM scene", it's hard to understand why you might want to use one particular model over another. It's hard to understand why you might want to do per-request API pricing vs. a bucketed usage plan. This is a new technology, and the landscape is changing weekly.
I think maybe it might be nice if folks around here were a bit more charitable and empathetic about this stuff. There's no reason to get all gatekeep-y about this kind of knowledge, and complaining about these questions just sounds condescending and doesn't do anyone any good.
Contrast to something like OpenAI. They've got gpt4.1, 4o, and o4. Which of these are newer than one another? How do people remember which of o4 and 4o are which?
I absolutely loathe this timeline we're stuck in.
If you like an IDE, for example VS Code, you can have the terminal open at the bottom and run Claude Code in that. You can put your instructions there and any edits it makes are visible in the IDE immediately.
Personally I just keep a separate terminal open and have the terminal and VSCode open on two monitors - seems to work OK for me.
It's not 10x, but those guys do seem like they've hit somewhere around 2x improvement overall.
Unfortunately there's no easy tool to inspect usage. I started a project to parse the Claude logs using Claude and generate a Chrome trace with it. It's promising but it was taking my tokens away from my core project.
Example: you need to review some code to see if it has proper test coverage.
If you use the "main" context, it'll waste tokens on reading the codebase and running tests to see coverage results.
But if you launch an agent (a subprocess pretty much), it can use a "disposable" context to do that and only return with the relevant data - which bits of the code need more tests.
Now you can either use the main context to implement the tests or if you're feeling really fancy launch another sub-agent to do it.
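Roughly, in SDK terms, the idea looks like this - just a sketch of the "disposable context" pattern using the Anthropic Python SDK, not how Claude Code actually wires up its sub-agents; the model ID, prompts, and file name are placeholders:

```python
# Sketch of the "disposable context" idea with the Anthropic Python SDK.
# The sub-task gets its own fresh message history, and only a short
# summary comes back to the main conversation.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def run_subagent(task: str, context: str) -> str:
    """Run a one-off task in a throwaway context and return only the summary."""
    response = client.messages.create(
        model="claude-sonnet-4-20250514",  # assumed model ID; a cheaper model is fine here
        max_tokens=1024,
        messages=[{"role": "user", "content": f"{task}\n\n{context}"}],
    )
    return response.content[0].text  # the throwaway context is then discarded

# The main context only ever sees the short answer, not all the code
# the sub-agent had to read to produce it.
summary = run_subagent(
    "Review this module and list which functions lack test coverage. Reply briefly.",
    open("mymodule.py").read(),  # hypothetical file
)
print(summary)
```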
Interestingly I found that prompting it to ask the o3 submodel (which they call The Oracle) to check Sonnet's working on a debugging solution was helpful. Extra interesting to me was the fact that Sonnet appeared to do a better job once I'd prompted that (like chain of thought prompting, perhaps asking it to put forward an explanation to be checked actually triggered more effective thinking).
But I totally agree there's no way it lasts. I'm mostly only using this for side projects and I'm sitting there interacting with it, not YOLO'ing, I do sometimes have two sessions going at the same time but I'm not firing off swarms or anything crazy. Just have it set to Opus and I chat with it.
And of course you could be doing it right but the people saying it works great could themselves be wrong about how good it is.
On top of that it costs both money and time/effort investment to figure out if you're doing it wrong. It's understandable to want some clarity. I think it's pretty different from buying shoes.
Because you overestimate how well the representative person understands the difference.
A more accurate analogy is that Nike sells green-blue shoes and Nike sells blue-green shoes, but the blue-green shoes add 3 feet to your jump and the green-blue shoes add 20 mph to your 100-yard dash.
You know you need one of them for tomorrow's hurdles race but have no idea which is meaningful for your need.
With all this LLM cruft all you get is essentially the same old chat interface that's like the year 2000 called and wants its on-line chat websites back. The only thing other than a text box that you usually get is a model selector dropdown squirreled away in a corner somewhere. And that dropdown doesn't really explain the differences between the cryptic sounding options (GPT-something, Claude Whatever...). Of course this confuses people!
I do know the answer to OP's question but that's because I pickle my brain in this stuff. It is legitimately confusing.
The analogy to different SKUs strikes me as inaccurate too. This isn't the difference between shoes, shirts, and shorts - it's more as if a company sells three t-shirts but you can't really tell what's different about them.
It's Claude, Claude, and Claude. Which ones code for you? Well, actually, all of them (Code, web/desktop Claude, and the API can all do this)
Which ones do you ask about daily sundry queries? Well, two of them (web/desktop Claude, but also the API, but not Code). Well, except if your sundry query is about a programming topic, in which case Code can also do that!
Ok, if I do want to use this to write code, which one should I use? Honestly, any of them, and the company does a poor job of explaining why you would use each option.
"Which of these very similar-seeming t-shirts should I get?" "You knob. How are posts like this even being posted." is just an extremely poor way to approach other people, IMO.
At least with those you can buy whatever you think is coolest. Which Claude model and interface should the average programmer use?
I haven't tried it myself, but I've heard from people that Opus can be slow when using it for coding tasks. I've only been using Sonnet, and it's performed well enough for my purposes.
We're all bottlenecked on reviewing now. That's a good thing.
What you're looking for, are the landing pages of the B2B API products underlying these B2C experiences. That would be https://www.anthropic.com/claude, https://openai.com/api/, etc. (In general, search "[AI company] API".)
From those B2B landing pages, you can usually click through to pages with details about each of their models.
Here's the model page corresponding to this news announcement, for example: https://www.anthropic.com/claude/opus
(Also, note how these B2B pages are on the AI companies' own corporate domains; whereas their B2C products have their own dedicated domains. From their perspective, their B2C offerings are essentially treated as separate companies that happen to consume their APIs — a "reference use-case" — rather than as a part of what the B2B company sells.)
I prefer configuring it to use Sonnet for things that don't require much reasoning/intelligence, with Opus as the coordinator.
What's 100x productivity multiplied by 100 instances of Claude? 10,000x productivity
Now, to be fair and a bit more realistic, it's not actually 10,000x because it takes longer to push the PR when the file sizes are so big. Let's call it 9,800x. That's still a sizable improvement.
It's not always a literal 10x difference in time for task A w/ AI vs. task A w/o AI...
Lapses of judgement and syntax errors happen, but they're easier to spot because you know exactly what you're looking at. When code is written by a model, I have to review it 3 times.
1st to understand the code. 2nd to identify lapses in suspicious areas. 3rd to confirm my suspicions through interactive tests, because the model can use patterns I'm unfamiliar with, and it takes me some googling to confirm whether certain patterns used by the model are outright bugs or not. The biggest time sink is fixing an identified bug, because now you're doing it in someone else's (the model's) legacy code rather than a greenfield feature implementation.
It's a big productivity bump. But if reviewing is the bottleneck, then that upper-bounds the productivity gains at ~4x for me. Still incredible technology, but not the death of software engineering that it is claimed to be.
I wouldn't be surprised if asking for a phone number lowers the fraud rate enough to compensate for the added friction.
[0] Incidentally, this is also why many AI API providers ask for your money upfront (buy credits) unless you're big enough and/or have existing relationship with them.
I started adding an instruction file along the lines of "Always tell me your plan to solve the issue first with short example code, never edit files without explicit confirmation of your plan" at the start and it is like a day and night difference in how useful it becomes. It also starts to feel like programming again where you can read through various files and instead of thinking in your head, you write out your thoughts. You end up getting confirmation or push back on errors that you can clean up.
Reading through a sort-of-wrong, sort-of-right implementation spread across various files after every prompt just really sucked.
I'm not one shotting massive amounts of files, but I am enjoying the lack of grunt work.
Shoe shopping is pretty complex, more so than trialing an AI model in my opinion.
Are you a construction worker, a banker, a cashier or a driver? Are you walking 5 miles everyday or mostly sedentary? Do you require steel toed shoes? How long are you expecting them to last and what are you willing to pay? Are you going to wear them on long runs or take them river kayaking? Do they need to be water resistant, waterproof or highly breathable? Do you want glued, welted, or stitch down construction? What about flat feet or arch support? Does shoe weight matter? What clothing are you going to wear them with? Are you going to be dancing with them? Do the shoes need a break in period or are they ready to wear? Does the available style match your preferences? What about availability, are you ok having them made to order or do you require something in stock now?
By comparison I can try 10 different AI services without even needing to stand up for a break while I can't buy good dress shoes in the same physical store as a pair of football cleats.
Thanks for articulating the confusion better than I could! I feel it's a similar branding problem as other tech companies have: I'm watching Apple TV+ on my Apple TV software running on my Apple TV connected to my Google TV that isn't actually manufactured by Google. But that Google TV also has an Apple TV app that can play Apple TV+.
It's not like running a tool in your IDE or CLI where the only difference is the interface. It would be like if gcc run from your IDE had faster compile times, but gcc run from the CLI gave better optimizations.
The fact that no one is recommending any baseline to start with proves the point that it's confusing. And we haven't even touched on Sonnet v Opus
Maybe the problem is I don't take shoes seriously enough? Something to work on...
That's a silly claim to me; we're talking about a completely new environment where you prompt an AI to develop code, and therefore an "average programmer" is unlikely to have any meaningful experience or intuition with this flow. That is exactly what GP is talking about - where does he plug in the AI? What tradeoffs are there between the different options?
The other day I had someone judge me for asking this question by dismissively saying "don't say you've still been using ChatGPT and copy/paste", which made me laugh - I don't use AI at all, so who was he looking down on?
Oh c'mon, now you're just being disingenuous, trying to make an argument for argument's sake.
No, shoe shopping is not more complicated than trialing an LLM. For all of those questions about shoes you are posing, either a) a purchaser won't care and won't need to ask them, or b) they already know they have specific requirements and will know what to ask.
With an LLM, a newbie doesn't even know what they're getting into, let alone what to ask or where to start.
> By comparison I can try 10 different AI services without even needing to stand up for a break
I can't. I have no idea how to do that. It sounds like you've been following the space for a while, and you're letting your knowledge blind you to the idea that many (most?) people don't have your experience.
I.e. it seems we don't get much more than new training run levels of improvement anymore. Which is better than nothing, but a shame compared to the early scaling.
I'm not sure if you ever got a good rundown, but the tl;dr is that the 3 products ("Desktop", Code, and API) all expose the same underlying models, but are given different prompts, tools, and context management techniques that make them behave fairly differently and affect how you interact with them.
- The API is the bare model itself. It has some coding ability because that's inherent to the model - you can ask it to generate code and copy and paste it, for example (a minimal sketch follows after this list). You normally wouldn't use this except if you're using some Copilot-type IDE integration where the IDE is doing the work of talking to the model for you and integrating it into your developer experience. In that case you provide an API key and the IDE does the heavy lifting.
- The desktop app is actually a half-decent coder. It's capable of producing specific artifacts, distinguishing between multiple "files" it's writing for you, and revisiting previously-written code. "Oh, actually rewrite this in Go." is for example a thing it can totally do. I find it useful for diagnosing issues interactively.
- "Claude Code" is a CLI-only wrapper around the model. Think of it like Anthropic's first-party IDE integration, except there's not an IDE, just the CLI. In this case the integration gives the tool broad powers to actually navigate your filesystem, read specific files, write to specific files, run shell commands like builds and tests, etc. These are all functions that an IDE integration would also give you, but this is done in a Claude-y way.
My personal take is: try Claude Code, since as long as you're halfway comfortable with a CLI it's pretty usable. If you really want a direct IDE integration you can go with the IDE+API key route, though keep in mind that you might end up paying more (Claude Code is all-you-can-eat-with-rate-limits, where API keys will... just keep going).
If you allow yourself to be a novice and a learner with AI and LLMs and don't expect to start out as a "shoe expert" where you never even think about this in your life and it's not even an annoyance, you'll find that it's the exact same journey.
Here's a quick guide to get you started with AI coding assistants:
## Quick Start Options (Easiest)
*1. Web-based (Nothing to Download)*
- *Claude.ai* - You're here! I can help with code, debug, explain concepts
- *ChatGPT* - Similar capabilities, different model
- *GitHub Copilot Chat* - Web interface if you have GitHub account

*2. IDE Extensions (Most Popular)*
- *Cursor* - Full VS Code replacement with AI built-in. Download from cursor.com, works out of the box
- *GitHub Copilot* - Install as VS Code/JetBrains extension ($10/month), autocompletes as you type
- *Continue* - Free, open-source VS Code extension, lets you use multiple models

*3. Command Line*
- *Claude Code* - Anthropic's terminal tool for autonomous coding tasks. Install via `npm install -g @anthropic-ai/claude-code`
- *Aider* - Open-source CLI tool that edits files directly
## What They Do
- *Autocomplete tools* (Copilot, Cursor) - Suggest code as you type, finish functions
- *Chat tools* (Claude, ChatGPT) - Explain, debug, design systems, write full programs
- *Autonomous tools* (Claude Code, Aider) - Actually edit your files, make changes across codebases
## My Recommendation to Start
1. Try *Cursor* first - download it, paste in some code, and ask it questions. It's the most beginner-friendly
2. Or just start here in Claude - paste your code and I can help debug, explain, or write new features
3. Once comfortable, try GitHub Copilot for in-line suggestions while coding
The key is just picking one and trying it - you don't need to understand everything upfront!
Maybe there's a need to try ten different ones but I just stuck with one and can now convince it to do what I want it to do pretty successfully.
Do you mostly use opus?
And it seems the story you shared sort of proves the point: the web interface worked fine for you and you didn't need to question it until someone was needlessly rude about it.
And to some extent it is like the PC race. Imagine going to work and writing software for whatever devices your company writes software for, in whatever toolchain your company uses. Then, 2-3 years after the PC race began heating up, asking "Hey, I only really write code for whatever devices my employer gives me access to. Now I want to buy one of these new PCs, but I don't really understand why I'd choose an Intel over a Motorola chipset, or why I'd prioritize more ROM or more RAM, and I keep hearing about this thing called RISC that's way better than CISC, and some of these chips claim to have different addressing modes that are better?"
In what way is this analogous? Running scripts is vastly different than AI codemod. I could easily answer how when and why a build system would be plugged in, and linting and formatting are long-established pathways.
On the flipside there are barely even established practices, let alone best ones, for using AI. The point being offered is that AI companies offer shockingly little guidance on how to use their apparently amazing tool.
I personally have never used AI to author code, so I don't really know how the story I provided proves anything to you. I like it to answer questions about why something isn't working to help give me some leads, and it is good at telling you how to use a new framework quickly, but that's a pretty different practice than it authoring code. Seems like you're kinda dodging the question too.