I do wonder where all the novel products are: the ones produced by 10x devs who are now 100x with LLMs, the “idea guys” who can now produce products from whole cloth without having to hire pesky engineers... Where are the one-man ten-billion-dollar startups? We are 3-4 years into this mania, and all I see on the other end of it is the LLMs themselves.
Why hasn’t anything gotten better?
I don’t believe this. It’s totally plausible that someone could produce passable work with LLMs at a pace similar to a curious and talented scientist’s. But if you, their advisor, are sitting down and talking with them every week? It’s obvious how much they care or understand; I can’t believe you wouldn’t be able to tell the difference between these students.
How do we know that, while the AI was writing Python scripts, Bob wasn't reading more papers, getting more data, and just overall doing more than Alice?
Maybe Bob is terrible at debugging python scripts while Alice is a pro at it?
Maybe Bob used his time to develop different skills that Alice couldn't dream of?
Maybe Bob will discover new techniques or ideas because he didn't follow the traditional research path that the established Researchers insist you follow?
Maybe Bob used the AI to learn even more because he had a customized tutor at his disposal?
Or maybe Bob just spent more time at the pub with his friends.
Yet one thing did not change in this process: it only made the production of text more efficient. The act of writing, constructing a compelling narrative, and telling a story were untouched by this revolution.
Bad writers are still bad writers; good writers still have a superior understanding of how to construct a plot. The technological ability to produce text faster never really changed what we consider "good" and "bad" in written literature; it just allowed more people to produce it.
It is hard to tell if large language models can ever reach a state where they have "good taste" (I suspect not). They will always reflect the taste and skill of the operator to some extent. Just because they let you produce more code faster does not mean they let you create a better product or better code. You still need good taste to create the structure of the product or codebase; you still have to understand the limitations of one architectural decision over another when the output is operationalized and run in production.
The AI industry is running on hype right now because it needs you to believe that this is no longer relevant. That Garry Tan producing 37,000 LoC/day somehow equates to producing value. That a swarm of agents can produce a useful browser or kernel compiler.
Yet if you just peek behind the curtains at the Claude Code repo and see the pile of unresolved issues, regressions, missing features, half-baked features, and so on -- the limitations seem plainly obvious, because Anthropic, with functionally unlimited tokens and frontier models, cannot use them to triage and fix their own product.
AI and coding agents are like the printing press in some ways. Yes, it takes some costs out of a labor intensive production process, but that doesn't mean that what is produced is of any value if the creator on the other end doesn't understand the structure of the plot and the underlying mechanics (be it of storytelling or system architecture).
[0] http://employees.oneonta.edu/blechmjb/JBpages/m360/Professio...
[1] https://s3.us-west-1.wasabisys.com/luminist/EB/A/Asimov%20-%...
I have seen versions of this in the wild, where a firm has gone through hard times and its internal systems have lost all their original authors and every subsequent generation of maintainers, leaving people in awe of a machine that hasn’t been maintained in a decade.
I once interviewed a guy who was genuinely proud of himself, volunteering the information as he described resolving a segfault in a live trading system by putting kill -9 in a cronjob. Ghastly.
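A reconstructed sketch of what that anti-pattern presumably looked like (the process name and paths are invented; I have no idea what his actual crontab contained):

```shell
# DO NOT DO THIS. Instead of fixing the segfault, cron force-kills
# the wedged process every minute and relaunches it.
* * * * * pkill -9 -f trading_engine; /opt/trading/bin/trading_engine >> /var/log/trading.log 2>&1
```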
> But the real threat isn't either of those things. It's quieter, and more boring, and therefore more dangerous. The real threat is a slow, comfortable drift toward not understanding what you're doing. Not a dramatic collapse. Not Skynet. Just a generation of researchers who can produce results but can't produce understanding. Who know what buttons to press but not why those buttons exist. Who can get a paper through peer review but can't sit in a room with a colleague and explain, from the ground up, why the third term in their expansion has the sign that it does.
I think this is a simplification. Of course Bob relied on AI, but he also used his own brain to think about the problem. Bob is not reducible to "a competent prompt engineer"; if you think he is, take any competent prompter with no physics background and ask them to do Bob's work.
In fact, Bob might have a chance to cover more ground at the higher level of the work while Alice does the same at the lower level. Which is better? It depends on how AI evolves.
The article assumes the alternative to AI-assisted work is careful human work. I am not sure careful human work is all that good, or that it will scale well in the future. Better to rely on AI on top of careful human work.
My objection comes from remembering how senior devs review PRs: "LGTM". It's pure vibes. To seriously review a PR you have to run it, test it, check its edge cases, and evaluate its performance: more work than making the PR itself. The entire history of software is littered with bugs that sailed through review, because review is performative most of the time.
Anyone remember the replication crisis in science?
Very well said. I think people are about to realize how incredibly fortunate and exceptional it is to actually get paid (and in our industry, paid very well) through a significant fraction of one's career while still "just" doing the grunt work, work that arguably benefits the person doing it at least as much as the employer.
A stable paid demand for "first-year grad student level work" or the equivalent for a given industry is probably not the only possible way to maintain a steady supply of experts (there's always the option of immense amounts of student debt or public funding, after all), but it sure seems like a load-bearing one in so many industries and professions.
At the very least, such work being directly paid has the immense advantage of making artificially created bullshit tasks (often created without any bad intentions!) that don't exercise actually relevant skillsets, or that exercise the wrong ones, much easier to spot.
I’ve recently started csprimer, and whilst it’s mentally stimulating, I wonder if I’m completely wasting my time.
I think the real danger is no longer caring about what you're doing. Yesterday I just pointed Claude at my static site generator and told it to clean it up. I wanted to care but... I didn't.
Most of what we call thinking merely justifies beliefs that make us emotionally happy, and is not creative per se. I am making a distinction between "thinking" as we know it and "creative thinking", which is rare and can see things in an unbiased manner, breaking out of known categories. Arguably, at the PhD level, there need to be new ideas instead of remixes of the existing ones.
The risk is that civilization is over its skis because humans are lazy. Humans are always lazy. In science there’s a limit to bs because dependent works fail. In economics there’s a crash. In physics stuff breaks. Then there is a correction.
I use AI agents for coding every day. The agent handles boilerplate and scaffolding faster than I ever could. But when it produces a subtle architectural mistake, you need enough understanding to catch it. The agent won't tell you it made a bad choice.
What actually helps is building review into the workflow. I run automated code reviews on everything the agent produces before it ships. Not because the code is bad, usually it's fine. But the one time in ten that it isn't, you need someone who understands what the code should be doing.
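A minimal sketch of that kind of gate, assuming a generic setup (the specific check commands here, pytest and ruff, are just examples; substitute whatever your project actually runs):

```python
import subprocess
import sys

def gate(checks):
    """Run each check command in order; ship only if every one exits 0."""
    for cmd in checks:
        result = subprocess.run(cmd)
        if result.returncode != 0:
            print(f"blocked: {' '.join(cmd)} failed", file=sys.stderr)
            return False
    return True

if __name__ == "__main__":
    # Hypothetical checks for agent-produced changes -- tests plus lint.
    sys.exit(0 if gate([["pytest", "-q"], ["ruff", "check", "."]]) else 1)
```

The point isn't the tooling; it's that nothing the agent writes reaches a branch that ships without passing through a reviewer, human or automated.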
A two-hour thesis defense isn't enough to uncover this, but a 40-hour deep probing examination by an AI might be. And the thesis committee gets a "highlight reel" of all the places the student fell short.
The general pattern is: "Suppose we change nothing but add extensive use of AI, look how everything falls apart." When in reality, science and education are complex adaptive systems that will change as much as needed to absorb the impact of AI.
If an AI is capable of producing an elegant solution with fewer levels of abstraction, it's possible we end up drifting towards a better understanding of what's going on.
It was a great success; now, when I propose that they use some model to do something, they tend to avoid it.
This is going to get worse, and eventually cause disastrous damage unless we do something about it, as we risk losing human institutional memory across just about every domain, and end up as child-like supplicants to the machines.
But as the article says, this is a people problem, not a machine problem.
The way I think about this is: we can't catch the hallucinations that we don't know are hallucinations.
A combination of beancounters running the show and the old, experienced engineers dying, retiring, and going through buyouts has left things in a pretty sad state.
Academia doesn't want to produce astrophysics (or any field) scientists just so the people who became scientists can feel warm and fuzzy inside when looking at the stars, it wants to produce scientists who can produce useful results. Bob produced a useful result with the help of an agent, and learned how to do that, so Bob had, for all intents and purposes, the exact same output as Alice.
Well, unless you're saying that astrophysics as a field literally does not matter at all, no matter what results it produces, in which case, why are we bothering with it at all?
I mourn the loss of working on intellectually stimulating programming problems, but that’s a part of my job that’s fading. I need to decide if the remaining work - understanding requirements, managing teams, what have you - is still enjoyable enough to continue.
To be honest, I’m looking at leaving software because the job has turned into a different sort of thing than what I signed up for.
So I think this article is partly right, Bob is not learning those skills which we used to require. But I think the market is going to stop valuing those skills, so it’s not really a _problem_, except for Bob’s own intellectual loss.
I don’t like it, but I’m trying to face up to it.
I'd draw a comparison to high-level languages and language frameworks. Yes, 99% of the time, if I'm building a web frontend, I can live in React world and not think about anything that is going on under the hood. But, there is 1% of the time where something goes wrong, and I need to understand what is happening underneath the abstraction.
Similarly, I now produce 99% of my code using an agent. However, I still feel the need to thoroughly understand the code, in order to be able to catch the 1% of cases where it introduces a bug or does something suboptimally.
It's possible that in future, LLMs will get _so_ good that I don't feel the need to do this, in the same way that I don't think about the transistors my code is ultimately running on. When doing straightforward coding tasks, I think they're already there, but I think they aren't quite at that point when it comes to large distributed systems.
And so the paradox is, the LLMs are only useful† if you're Schwartz, and you can't become Schwartz by using LLMs.
Which means we need people like Alice! We have to make space for people like Alice, and find a way to promote her over Bob, even though Bob may seem to be faster.
The article gestures at this but I don't think it comes down hard enough. It doesn't seem practical. But we have to find a way, or we're all going to be in deep trouble when the next generation doesn't know how to evaluate what the LLMs produce!
---
† "Useful" in this context means "helps you produce good science that benefits humanity".
How this plays out:
I use Claude to write some moderately complex code and raise a PR. Someone asks me to change something. I look at the review and think, yeah, that makes sense, I missed that and Claude missed that. The code works, but it's not quite right. I'll make some changes.
Except I can't.
For me, it turns out having decisions made for you and fed to you is not the same as making the decisions and moving the code from your brain to your hands yourself. Certainly every decision made was fine: I reviewed Claude's output, got it to ask questions, answered them, and it got everything right. I reviewed its code before I raised the PR. Everything looked fine within the bounds of my knowledge, and this review was simply something I didn't know about.
But I didn't make any of those decisions. And when I have to come back to the code to make updates - perhaps tomorrow - I have nothing to grab onto in my mind. Nothing is in my own mental cache. I know what decisions were made, but I merely checked them, I didn't decide them. I know where the code was written, but I merely verified it, I didn't write it.
And so I suffer an immediate and extreme slow-down, basically re-doing all of Claude's work in my mind to reach a point where I can make manual changes correctly.
But wait, I could just use Claude for this! But for now I don't, because I've seen this before. Just a few moments ago. Using Claude has just made it significantly slower when I need to use my own knowledge and skills.
I'm still figuring out whether this problem is transient (because this is a brand new system that I don't have years of experience with), or whether it will actually be a hard blocker to me using Claude long-term. Assuming I want to be at my new workplace for many years and be successful, it will cost me a lot in time and knowledge to NOT build the castle in the sky myself.
LLMs are exceptionally good at building prototypes. If the professor needs a month, Bob will be done with the basic prototype of that paper by lunch on the same day, and will try out dozens of hypotheses by the end of the day. He will not be chasing some error for two weeks; the LLM will very likely figure it out in a matter of minutes, or not make it in the first place. Instructing it to validate intermediate results and to profile along the way can do magic.
The article is correct that Bob will not have understood anything, but if he wants to, he can spend the rest of the year trying to understand what the LLM has built for him, having already verified within the first couple of weeks that the approach actually works. Even better, he can ask the LLM to train him to do the same if he wishes: learn why things work the way they do, why something doesn't converge, etc.
Assuming that Bob is willing to do all that, he will progress way faster than Alice. LLMs won't take anything away if you are still willing to take the time to understand what it's actually building and why things are done that way.
5 years from now, Alice will be using LLMs just like Bob, or she'll be without a job if she refuses, because the place will be full of Bobs, with or without understanding.
And indeed running it through a few AI text detectors, like Pangram (not perfect, by any means, but a useful approximation), returns high probabilities.
It would have felt more honest if the author had included a disclaimer that it was at least partly written with AI, especially given its length and subject matter.
If that comes to pass, we'll be rediscovering the same principles that biological evolution stumbled upon: the benefits of the imperfect "branch" or "successive limited comparison" approach of agentic behaviour, which perhaps favours heuristics (that clearly sometimes fail), interaction between imperfect collaborators with non-overlapping biases, etc etc
https://contraptions.venkateshrao.com/p/massed-muddler-intel...
> Lindblom’s paper identifies two patterns of agentic behavior, “root” (or rational-comprehensive) and “branch” (or successive limited comparisons), and argues that in complicated messy circumstances requiring coordinated action at scale, the way actually effective humans operate is the branch method, which looks like “muddling through” but gradually gets there, where the root method fails entirely.
Show me an LLM that can sell my product and find product-market fit.
In reality, LLMs are taking away profitable tools and keeping the revenue for themselves.
> You do what your supervisor did for you, years ago: you give each of them a well-defined project. Something you know is solvable, because other people have solved adjacent versions of it. Something that would take you, personally, about a month or two. You expect it to take each student about a year ...
Is that how PhD projects are supposed to work? The supervisor is a subject matter expert and comes up with a well-defined achievable project for the student?
Weak ownership, unclear direction, and "sure, I guess" reviews were survivable when output was slow. When changes came in one at a time, you could get away with not really deciding.
AI doesn't introduce a new failure mode. It puts pressure on the old one. The trickle becomes a firehose, and suddenly every gap is visible. Nobody quite owns the decision. Standards exist somewhere between tribal memory, wishful thinking, and coffee. And the question of whether something actually belongs gets deferred just long enough to merge it, which forces the answer without anyone's input.
The teams doing well with agentic workflows aren't typically using magic models. They've just done the uncomfortable work of deciding what they're building, how decisions are made, and who has the authority to say no.
AI is fine, it just removed another excuse for not having our act together. While we certainly can side-eye AI because of it, we own the problems. Well, not me. The other guy who quit before I started.
The difficulty of passing the defence varies wildly between universities, departments, and committees. Some are very serious affairs with a decent chance of failure, while others are more of a show event for friends and family. Mine was more of the latter, but I doubt I would have passed that day if I had spent the previous years prompting instead of doing the grunt work.
In my experience, doing these things with the right intentions can actually improve understanding faster than not using them. When studying physics I would sometimes get stuck on small details - e.g. what algebraic rule was used to get from Eq 2.1 to 2.2? what happens if this was d^2 instead of d^3 etc. Textbooks don't have space to answer all these small questions, but LLMs can, and help the student continue making progress.
Also, it seems hard to imagine that Alice and Bob's weekly updates would be indistinguishable if Bob didn't actually understand what he was working on.
Kids should start off with Commodore 64s, then get late-80s or early-90s Macs, then Windows 95, then Debian and internet access (but only HTML). Finally, when they’re 18, they can be allowed an iPhone, Android, and modern computing.
Parenting can’t prevent the use of LLMs in grad school, but a similar approach could be taken by grad departments: don’t allow LLMs for the first few years, and require pen and paper exams, as well as oral examinations for all research papers.
I've been having a lot of fun vibe coding little interactive data visualizations, so when I present the feature to stakeholders they can fiddle with it and really understand how it relates to existing data. I saw the agent leave a comment regarding Cramer's rule, and yeah, it's a bit unsettling that I forgot what that is and haven't bothered to look it up, but I can tell from the graphs that it's doing the correct thing.
There's now a larger gap between me and the code, but the chasm between me and the stakeholders is getting smaller and so far that feels like an improvement.
I don't know if I mind.
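(For anyone who similarly forgot: Cramer's rule solves a linear system as ratios of determinants, where each unknown's numerator is the coefficient determinant with that unknown's column replaced by the constants. A minimal sketch for the 2x2 case:)

```python
# Cramer's rule for the system  a*x + b*y = e,  c*x + d*y = f.
def solve_2x2(a, b, e, c, d, f):
    det = a * d - b * c          # determinant of the coefficient matrix
    if det == 0:
        raise ValueError("singular system: no unique solution")
    x = (e * d - b * f) / det    # first column replaced by the constants
    y = (a * f - e * c) / det    # second column replaced by the constants
    return x, y

# 2x + y = 5 and x + 3y = 10  ->  x = 1, y = 3
print(solve_2x2(2, 1, 5, 1, 3, 10))
```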
Example: this paragraph, to me, has an eerily perfect rhythm, and the ending sentence perfectly delivers the twist. Like, why would you write an argument piece in the science realm in perfect prose?
> Unlike Alice, who spent the year reading papers with a pencil in hand, scribbling notes in the margins, getting confused, re-reading, looking things up, and slowly assembling a working understanding of her corner of the field, Bob has been using an AI agent. When his supervisor sent him a paper to read, Bob asked the agent to summarize it. When he needed to understand a new statistical method, he asked the agent to explain it. When his Python code broke, the agent debugged it. When the agent's fix introduced a new bug, it debugged that too. When it came time to write the paper, the agent wrote it. Bob's weekly updates to his supervisor were indistinguishable from Alice's. The questions were similar. The progress was similar. The trajectory, from the outside, was identical.
In academia, understanding is vital. The same for research.
But in production, results are what matters.
Alice would be a better researcher, but Bob would be a better producer. He knows how to wrangle the tools.
Each has its value. Many researchers develop marvelous ideas but struggle to commercialize them, while production-oriented engineers struggle to come up with the ideas.
You need both.
It's why I push for a hybrid mentor-apprentice model. We need to actively cultivate the next generation of "Schwartzes" with hands-on, critical thinking before throwing them into LLM-driven environments. The current incentive structure, as conception points out, isn't set up for this, but it's crucial if we want to avoid building on sand.
The 10x dev doesn’t just set out to build a hello world app, ya know.
Once they have to solve a novel problem that was not already solved for all intents and purposes, Alice will be able to apply her skillset to it, whereas Bob will just run into a wall when the LLM starts producing garbage.
It seems to me that "high-skill human" > "LLM" > "low-skill human". The trap is that people with low skill levels will see a fast improvement in their output, at the hidden cost of the slow build-up of skills that has a much higher ceiling.
And there is everything in between.
This can lead to a lot of problems, as I think in some fields, among some academics, the default assumption is the former when it's really the latter. This leads to a kind of overattribution of contribution to senior faculty, or conversely, an underappreciation of less senior individuals. The tendency for senior faculty to be listed last on papers, and therefore for the first and last authors to accumulate credit, is a good example of how twisted this logic has become.
It's one tiny example of enormous problems with credit in academics (but also maybe far afield from your question).
There’s a reason most people aren’t promoted to manager until they have years of experience under their belt. And now we’re expecting folks to be managers on day 1.
To the reader and the casual passerby, I ask: Do you have to work at this pace, in this manner? I understand completely that mandates and pressure from above may instill a primal fear to comply, but would you be willing to summon enough courage to talk to maybe one other person you think would be sympathetic to these feelings? If you have ever cared about quality outcomes, if for no other reason than the sake of personal fulfillment, would it not be worth it to firmly but politely refuse purely metrics-focused mandates?
Also, the premise that it took each of them a year to do the project means Bob was slacking because he probably could've done it in less than a month.
Yes, but how does he know if it worked? If you have instant feedback, you can use LLMs and correct when things blow up. In fact, you can often try all options and see which works, which makes it “easy” in terms of knowledge work. If you have delayed feedback, costly iterations, or multiple variables changing underneath you at all times, understanding is the only way.
That’s why building features and fixing bugs is easy, and system-level technical decision making is hard. One has instant feedback; the other can take years. You could make the “soon” argument, but even with better models, they’re still subject to training data, which is minimal for year-plus delayed feedback and multivariate problems.
This point is directly addressed in the paper: Bob will ultimately not be able to do the things Alice can, with or without agents, because he didn't build the necessary internal deep structure and understanding of the problem space.
And if Alice later on ends up being a better scientist (using agents!) than Bob will ever be, would you not say there was something lost to the world?
Learning needs a hill to climb, and somebody to actually climb it. Bob only learned how to press an elevator button.
So I settled on very incremental work. It's annoying cutting and pasting code blocks into the web interface while I work on my interface to Neovim; I spent a whole day realizing I can't trust it to instrument Neovim, and I don't want to learn enough Lua to manage it. (I moved to Neovim from Emacs because I don't like Elisp, and GPT is even worse at working on my Emacs setup than my Neovim one; the end goal is my own editor in Ruby, but GPT sure can't understand that at the moment.) But at least I'm pushing a real flywheel and not the brooms from Fantasia.
I built a full stack app in Python+typescript where AI agents process 10k+ near-real-time decisions and executions per day.
I have never done full stack development and I would not have been able to do it without GitHub Copilot, but I have worked in IT (data) for 15 years including 6 in leadership. I have built many systems and teams from scratch, set up processes to ensure accuracy and minimize mistakes, and so on.
I have learned a ton about full stack development by asking the coding agent questions about the app, bouncing ideas off of it, planning together, and so on.
So yes, you need to have an idea of what you're doing if you want to build anything bigger than a cheap one-shot throwaway project that sort of works but brings no value and that nobody is actually gonna use.
This is how it is right now, but AI coding agents have come an incredibly long way since 2022! I do think they will improve, but an agent can't exactly know what you want to build. It's making an educated guess, an approximation of what you're asking it to do. Ask for the same thing twice and you will get two slightly different results (assuming it's a big one-shot).
This is the fundamental reality of LLMs. It's sort of like a human walking (where we were before AI), a human using a car to get places (where we are now), and FSD (the future; look how long that took compared to the first cars).
But do you actually understand it? The article argues exactly against this point - that you cannot understand the problems in the same way when letting agents do the initial work as you would when doing it without agents.
from the article: "you cannot learn physics by watching someone else do it. You have to pick up the pencil. You have to attempt the problem. You have to get it wrong, sit with the wrongness, and figure out where your reasoning broke. Reading the solution manual and nodding along feels like understanding. It is not understanding. Every student who has tried to coast through a problem set by reading the solutions and then bombed the exam knows this in their bones. We have centuries of accumulated pedagogical wisdom telling us that the attempt, including the failed attempt, is where the learning lives. And yet, somehow, when it comes to AI agents, we've collectively decided that maybe this time it's different. That maybe nodding at Claude's output is a substitute for doing the calculation yourself. It isn't. We knew that before LLMs existed. We seem to have forgotten it the moment they became convenient."
If I told you the drink tastes bad, is an off putting color, comes in a small bottle, and is expensive you wouldn’t believe it would work. But Red Bull made billions.
Some people treat the toilet as a magic hole: they throw stuff in, flush, and think it is fine.
If you throw garbage in, you will at some point have problems.
We are at the stage where people think it is fine to drop everything into an LLM, but then they will see the bill for usage and might be surprised that they burned money and the result was not exactly what they expected.
And it's not just a volume problem.
Mediocre devs previously couldn't complete a project by themselves and were forced to solicit help and receive feedback along the way.
When all managers care about is "shipping", development becomes a race to the bottom. Devs who used to collaborate are now competing. Whoever gets the slop into the codebase fastest, wins.
The process you describe is a gatekeeping exercise, which will change to include LLM judges at some point.
Because we largely want people who have committed to tens of thousands of dollars of debt to feel sufficiently warm and fuzzy enough to promote the experience so that the business model doesn’t collapse.
It’s difficult to think anyone would end up truly regretting doing a course in astrophysics, or any of the liberal arts and sciences if they have a modicum of passion, but it’s very believable that a majority of them won’t go on to have a career in it, whatever it is, directly.
They’re probably more likely to gain employment on their data science skills, or whatever core competencies they honed, or just the fact that they’ve proven they can learn highly abstract concepts, or whatever their field generalises to.
Most of the jobs are in the not-highly-specific academic outcomes.
The industrialization of academia hasn't even produced more results; it has produced more meaningless papers. Just like LLMs produce the 10,000th note-taking app, which for the LLM-psychosis afflicted is apparently enough.
We're minting an entire generation of people completely dependent on VC funding. What happens if/when the AI companies fail to find a path to profitability and the VC funding dries up?
Once I realized that this white on black contrast was hurting my eyes, I decided to stop as I didn't want to see stripes for too long when looking away.
Some activity has outcomes that aren't strictly in the results.
The problem arises when Bob encounters a problem too complex or unique for agents to solve.
To me, it seems a bit like the difference between learning how to cook versus buying microwave dinners. Sure, a good microwave dinner can taste really good, and it will be a lot better than what a beginning cook will make. But imagine aspiring cooks just buying premade meals because "those aren't going anywhere". Over the span of years, eventually a real cook will be able to make way better meals than anything you can buy at a grocery store.
The market will always value the exact things LLMs can not do, because if an LLM can do something, there is no reason to hire a person for that.
"Being able to deliver using AI" wasn't the point of the article. If it was the point, your comment would make sense.
The point of the program referred to in the article is not to deliver results, but to deliver an Alice. Delivering a Bob is a failure of the program.
Whether you think that a Bob+AI delivers the same results is not relevant to the point of the article, because the goal is not to deliver the results, it's to deliver an Alice.
I do think coding with local agents will keep improving to a good level, but if deep-thinking cloud tokens become too expensive, you'll reach the limits of what your local, limited agent can do much more quickly (i.e. be even less able to do more complex work, as other replies mention).
I dread the flip side of this, which is dealing with obtuse bullshit like trying to understand why Oracle ADF won’t render forms properly, or how to optimize some codebase with a lot of N+1 calls when there are looming deadlines and the original devs never made it scalable, or needing to dig into undercommented legacy codebases, or needing to work on 3-5 projects in parallel.
Agents iterating until those start working (at least cases that are testable) and taking some of the misery and dread away makes it so that I want to theatrically defenestrate myself less.
Not everyone has the circumstance to enjoy pleasant and mentally stimulating work that’s not a frustrating slog all the time - the projects that I actually like working on are the ones I pick for weekends, I can’t guarantee the same for the 9-5.
Aren't they currently propped up by investor money?
What happens when the investors realize the scam that it is and stop investing or start investing less...
The problem is, they're nothing like transistors, and never will be. Transistors are simple: they work or they don't, consistently, in an obvious or easily testable way.
LLMs are more akin to biological things. Complex. Not well understood. Unpredictable behavior. To be safely useful, they need something like a lion tamer, except every individual LLM is its own unique species.
I like working on computers because it minimizes the amount of biological-like things I have to work with.
The solution is relatively simple though - not sure the article suggests this as I only skimmed through:
Being good in your field doesn't only mean pushing out articles but also being able to talk about them. I think academia should drift away from the written form toward more spoken forms, i.e. conferences.
What if, say, you could only publish something after presenting your work in person, answering questions, etc.? The audience can be big or small; it doesn't matter.
It would make publishing anything at all more expensive but maybe that's exactly what academia needs even irrespective of this AI craze?
I hope it will encourage people to think more about what they get out of the work, what doing the work does for them; I think that's a good thing.
That you can't "become Schwartz" by using LLMs is an unproven assumption. Actually, it's a contradiction in the logic of the essay: if Bob managed to produce a valid output by using an LLM at all, then it means that he must have acquired precisely that supervision ability that the essay claims to be necessary.
Btw, note that in the thought experiment Bob isn't just delegating all the work to the LLM. He makes it summarise articles, extract important knowledge and clarify concepts. This is part of a process of learning, not being a passive consumer.
I have gained a lot of benefit using LLMs in conjunction with textbooks for studying. So, I think LLMs could help you become Schwartz.
Then I spend time reading each file change and giving feedback on things I'd do differently. It vastly saves me time, and the result is very close to or even better than what I would have written.
If the result is something you can't explain, then slow down and follow the steps it takes as they are taken.
You can do this during the previous change phase of course. Just ask "How would one plan this change to the codebase? Could you explain in depth why?" If you're expected to be thoroughly familiar with that code, it makes no sense to skip that step.
In a living codebase you spend long stretches learning how it works. It's like reading a book that doesn't match your taste, but you eventually need to understand and edit it, so you push through. That process is extremely valuable: you get familiar with the codebase, you map it out in your head, you imagine big red alerts on the problematic stuff. Over time you become more and more efficient at editing and refactoring the code.
The short term state of AI is pretty much outlined by you. You get a high level bug or task, you rephrase it into proper technical instructions and let a coding agent fill in the code. Yell a few times. Fix problems by hand.
But you are already "detached" from the codebase; you have to learn it the hard way each time your agent is too stupid. You are less efficient, at least in this phase, and your overall understanding of the codebase will degrade over time. Once serious data corruption hits the company, it will take weeks to figure out.
I think this psychological detachment can potentially play out really bad for the whole industry. If we get stuck for too long in this weird phase, the whole tech talent pool might implode. (Is anyone working on plumbing LLMs?)
This won’t affect everyone equally. Some Bobs will nerd out and spend their free time learning, but other Bobs won’t.
Your perspective is cut off. In the real world Bob is supposed to produce outcomes that work. If he moves on into the industry and keeps producing hallucinated, skewed, manipulated nonsense, then he will fall flat instantly. If he manages to survive unnoticed, he will become CEO. The latter rather unlikely.
LLM speak. But the rest of that quote doesn't look LLM-generated, it's too fiddly and complex of an argument. I think this was edited with AI, but the underlying argument at least is human.
It's actually great since a lot of older technology is cheap and still readily available. My little ones love listening to old records, controlling the playback speed, and hearing the music go up in pitch if the RPMs are set too high. We look at the tracks on the vinyl under a microscope and talk about how the music is written on it that way. VHS and audio cassettes offer their own talking points.
For computers, we don't literally use a Commodore 64, but we run simpler, old software on new hardware. Mostly because a lot of newer education software is somehow also funded by injecting ads into the games (awful). But there is also some good "modern" education software worth checking out. I highly recommend gcompris.net.
yea, there are multiple parts to education: 1) teach skills useful to the economy, 2) teach the theories of the subject, and finally 3) tweak existing theories and create new ones. An electrician can fix problems without understanding the theory of electromagnetism. These are the trades folks. An EE college graduate has presumably understood some theory, and can apply it in different useful ways. These are the engineers. Finally, there are folks who not only understand the theory of the craft, but can tweak it creatively for the future. These are the researchers.
Bob better fits as a trades-person or engineer whereas Alice fits better as a researcher.
The economics and security model on full agents running in loops all day may come home to roost faster than expertise rot.
To be clear, I am not against alternative forms of education. Degrees are optional. But if you want a degree, there have to be exams and cheating has to be prevented.
If you could, why wouldn't you? LLM witch-hunts over every halfway competent writer are becoming quite tiresome.
1) Stuff that was astonishingly not automated yet. I am talking about somebody opening up Excel on one screen and a website/PDF/whatever on the other, and typing stuff into their Excel sheet. So stuff where there wasn't any code involved previously, possibly due to diminishing returns of how ad hoc it was to automate, skills mismatch, organizational politics, or other reasons.
2) Lots of former BigData / crypto / SaaS guys who were in product/sales roles suddenly starting AI startups to help your company AI better. The product is facilitating the doing of AI.
The problem that the author describes is real. I have run into it hundreds of times now. I will know how to do something, I tell AI to do it, the AI does not actually know how to do it at a fundamental level and will create fake tests to prove that it is done, and you check the work and it is wrong.
You can describe to the AI to do X at a very high-level but if you don't know how to check the outcome then the AI isn't going to be useful.
The story about the cook is 100% right. McDonald's doesn't have "chefs", they have factory workers who assemble food. The argument with AI is that working in McDonald's means you are able to cook food as well as the best chef.
The issue with hiring is that companies won't be able to distinguish between AI-driven humans and people with knowledge until it is too late.
If you have knowledge and are using AI tools correctly (i.e. not trying to zero-shot work) then it is a huge multiplier. That the industry is moving towards agent-driven workflows indicates that the AI business is about selling fake expertise to the incompetent.
It’s actually worse than that: the AI will not stop and say “too complex, try in a month with the next SOTA model”. Rather, it will give Bob a plausible-looking solution that Bob cannot identify as right or wrong. If Bob is working on an instant-feedback problem, it’s OK: he can flag it, try again, ask for help. But if the error can’t be detected immediately, it can come back with a vengeance in a year. Perhaps Bob has already been promoted by then, and Bob’s replacement gets to deal with it. In either case, Bob cannot be trusted any more than the LLM itself.
That's the true AI revolution: not the things it can accelerate, the things it can put in reach that you wouldn't countenance doing before.
It's trivial to say using an inadequate tool will have an inadequate result.
It's only an interesting claim to make if you are saying that there is no obtainable quality of the tool that can produce an adequate result. (In this argument, the adequate result in question is a developer with an understanding of what they produce.)
When people talk about the 'plateau of ability' agents are widely expected to reach at some point, I suspect a lot of it will boil down to skyrocketing costs and plummeting accuracy past a certain point of number of agents involved. This seems to me like a much harder limit than context windows or model sizes.
Things like Gas Town are exploring this in what you might call a reckless way; I'm sure there are plenty of more careful experiments being conducted.
What I think the ultimate measure of this new tech will be is, how simple of a question can a human put to an LLM group for how complex of a result, and how much will they have to pay for it? It seems obvious to me there is a significant plateau somewhere, it's just a question of exactly where. Things will probably be in flux for a few years before we have anything close to a good answer, and it will probably vary widely between different use cases.
If AI output remains unreliable then eventually enough companies will be burned and management will reinstate proper oversight. All while continuing to pat themselves on the back.
My grad school friend who was a physicist would write his talk just before his conferences, and then submit the paper later. My experience in CS was totally backwards from that.
Perhaps a better analogy would be the Linux kernel. It's built by biological humans, and fallible ones at that. And yet, I don't feel the need to learn the intricacies of kernel internals, because it's reliable enough that it's essentially never the kernel's fault when my code doesn't work.
There are flowers that look & smell like female wasps well enough to fool male wasps into "mating" with them. But they don't fly off and lay wasp eggs afterwards.
Here's a competing thought experiment:
Jorge's Gym has a top-notch bodybuilding program, which includes an extensive series of exercises that would-be bodybuilders need to do over multiple years to complete the program. You enroll, and cleverly use a block-and-tackle system to complete all the exercises in weeks instead of years.
Did you get the intended results?
At a certain point, higher level languages stop working. Performance, low level control of clocks and interrupts, etc.
I’m old enough that dropping into assembly to be clever with the 8259 interrupt controller really was required. Programmers today? The vast majority don’t really understand how any of that works.
And honestly I still believe that hardware-up understanding is valuable. But is it necessary? Is it the most important thing for most programmers today?
When I step back this just reads like the same old “kids these days have it so easy, I had to walk to school uphill through the snow” thing.
Why is it a problem of the LLM if your test is unrelated to the performance you want?
This argument boils down to "don't use tools because you'll forget how to do things the hard way", which nobody would buy for any other tool, but with LLMs we seem to have forgotten that line of reasoning entirely.
But there is also a more subtle thing, which is that we're trending towards superintelligence with these AIs. At that point, Bob may discover that anything agents can't do, Alice can't do either, because she is limited by trying to think using soggy meat as opposed to a high-performance engineered thinking system. Not going to win that battle in the long term.
> The market will always value the exact things LLMs can not do, because if an LLM can do something, there is no reason to hire a person for that.
The market values bulldozers. Whether a human does actual work or not isn't particularly exciting to a market.
There will still be programming specialists in the future — we still have assembly experts and COBOL experts, after all. We just won’t need very many of them and the vast majority of software engineers will use higher-level tools.
Of course, that assumes a Bob with drive and agency. He could just as easily tell the AI to fix it without trying to stay in the loop.
Even if inference were subsidized (afaik it isn't when paying through API calls; subscription plans indeed might run losses on heavy users, but that's how any subscription model typically works, and it can still be profitable overall).
Models are still improving/getting cheaper, so that seems unlikely.
Are Chinese model shops propped up by investor money? Is Google?
Open weights models are only 6 months behind SOTA. If new model development suddenly stopped, and today's SOTA models suddenly disappeared, we would still have access to capable agents.
Last September, Tyler Austin Harper published a piece for The Atlantic on how he thinks colleges should respond to AI. What he proposes is radical—but, if you've concluded that AI really is going to destroy everything these institutions stand for, I think you have to at least consider these sorts of measures. https://www.theatlantic.com/culture/archive/2025/09/ai-colle...
Sure there is. It's the formal education system that produced the college grad.
There are a couple of unfortunate truths going on all at the same time:
- People with money are trying to build the "perfect" business: SaaS without software eng headcount. 100% margin. 0 Capex. And finally near-0 opex and R&D cost. Or at least, they're trying to sell the idea of this to anyone who will buy. And unfortunately this is exactly what most investors want to hear, so they believe every word and throw money at it. This of course then extends to many other business and not just SaaS, but those have worse margins to start with so are less prone to the wildfire.
- People who used to code 15 years ago but don't now, see claude generating very plausible looking code. Given their job is now "C suite" or "director", they don't perceive any direct personal risk, so the smell test is passed and they're all on board, happily wreaking destruction along the way.
- People who are nominally software engineers but are bad at it are truly elevated 100x by claude. Unfortunately, if their starting point was close to 0, this isn't saying a lot. And if it was negative, it's now 100x as negative.
- People who are adjacent to software engineering, like PMs, especially if they dabble in coding on the side, suddenly also see they "can code" now.
Now of course, not all capital owners, CTOs, PMs, etc. exhibit this. Probably not even most. But I can already name like 4 examples per category above from people I know. And they're all impossible to explain any kind of nuance to right now. There are too many people and articles and blog posts telling them they're absolutely right.
We need some bust cycle. Then maybe we can have a productive discussion of how we can leverage LLMs (we'll stop calling it "AI"...) to still do the team sport known as software engineering.
Because there's real productivity gains to be had here. Unfortunately, they don't replace everyone with AGI or allow people who don't know coding or software engineering to build actual working software, and they don't involve just letting claude code stochastically generate a startup for you.
> If the result is something you can't explain than slow down and follow the steps it takes as they are taken.
The problem is I can explain it. But it's rote and not malleable. I didn't do the work to prove it to myself. Its primary form is on the page, not in my head, as it were.
Same with anything. You can read about how to meditate, cook, sew, whatever. But if you only read about something, your mental model is hollow and purely conceptual, having never had to interact with actual reality. Your brain has to work through the problems.
It also doesn't help that Claude ends every recommendation with "Would you like me to go ahead and do that for you?" Eventually people get tired, and it's all too easy to just nod and say "yes".
People think they'll just have a personal bot out there buying airline tickets, hotel rooms, jeans, new phone, etc. Meanwhile as soon as you have agents like this out in the wild, the capital will flow to bad actors creating bots to game those bots.
The world is PvP unfortunately. There is more money to be made skimming agents trying to buy stuff than there is in getting people to pay for a personal assistant agent subscription.
It's like why a lot of ad-based stuff doesn't offer a premium option for people to pay to opt out (ex Youtube). The people who can afford to pay and avoid search/social media/etc advertising are exactly the people you can make a lot of money advertising to.
In production, there would be no "paper"; just some software/hardware product.
If there was a problem, that would be fairly obvious, with testing (we are going to be testing our products, right?).
I have been wrestling all morning, with an LLM. It keeps suggesting stuff that doesn't work, and I need to keep resetting the context.
I am often able to go in, and see what the issue is, but that's almost worthless. The most productive thing that I can do, is tell the LLM what is the problem, on the output end, and ask it to review and fix. I can highlight possible causes, but it often finds corner cases that I miss. I have to be careful not to be too dictatorial.
It's frustrating, as the LLM is like a junior programmer, but I can make suggestions that radically improve the result, and the total time is reduced drastically. I have gotten done, in about two hours, what might have taken all day.
Again, I appreciate the article very much and I'm glad the other comments are on the article's content.
Mar 30, 2026
Imagine you're a new assistant professor at a research university. You just got the job, you just got a small pot of startup funding, and you just hired your first two PhD students: Alice and Bob. You're in astrophysics. This is the beginning of everything.
You do what your supervisor did for you, years ago: you give each of them a well-defined project. Something you know is solvable, because other people have solved adjacent versions of it. Something that would take you, personally, about a month or two. You expect it to take each student about a year, because they don't know what they're doing yet, and that's the point. The project isn't the deliverable. The project is the vehicle. The deliverable is the scientist that comes out the other end.
Alice's project is to build an analysis pipeline for measuring a particular statistical signature in galaxy clustering data. Bob's is something similar in scope and difficulty, a different signal, a different dataset, the same basic arc of learning. You send them each a few papers to read, point them at some publicly available data, and tell them to start by reproducing a known result. Then you wait.
The academic year unfolds the way academic years do. You have weekly meetings with each student. Alice gets stuck on the coordinate system. Bob can't get his likelihood function to converge. Alice writes a plotting script that produces garbage. Bob misreads a sign convention in a key paper and spends two weeks chasing a factor-of-two error. You give them both similar feedback: read the paper again, check your units, try printing the intermediate output, think about what the answer should look like before you look at what the code gives you. Normal things. The kind of things you say fifty times a year and never remember saying.
By summer, both students have finished. Both papers are solid. Not groundbreaking, not going to change the field, but correct, useful, and publishable. Both go through a round of minor revisions at a decent journal and come out the other side. A perfectly ordinary outcome. The kind of outcome that the entire apparatus of academic training is designed to produce.
But Bob has a secret.
Unlike Alice, who spent the year reading papers with a pencil in hand, scribbling notes in the margins, getting confused, re-reading, looking things up, and slowly assembling a working understanding of her corner of the field, Bob has been using an AI agent. When his supervisor sent him a paper to read, Bob asked the agent to summarize it. When he needed to understand a new statistical method, he asked the agent to explain it. When his Python code broke, the agent debugged it. When the agent's fix introduced a new bug, it debugged that too. When it came time to write the paper, the agent wrote it. Bob's weekly updates to his supervisor were indistinguishable from Alice's. The questions were similar. The progress was similar. The trajectory, from the outside, was identical.
Here's where it gets interesting. If you are an administrator, a funding body, a hiring committee, or a metrics-obsessed department head, Alice and Bob had the same year. One paper each. One set of minor revisions each. One solid contribution to the literature each. By every quantitative measure that the modern academy uses to assess the worth of a scientist, they are interchangeable. We have built an entire evaluation system around counting things that can be counted, and it turns out that what actually matters is the one thing that can't be.
It gets worse. The majority of PhD students will leave academia within a few years of finishing. Everyone knows this. The department knows it, the funding body knows it, the supervisor probably knows it too even if nobody says it out loud. Which means that, from the institution's perspective, the question of whether Alice or Bob becomes a better scientist is largely someone else's problem. The department needs papers, because papers justify funding, and funding justifies the department. The student is the means of production. Whether that student walks out the door five years later as an independent thinker or a competent prompt engineer is, institutionally speaking, irrelevant. The incentive structure doesn't just fail to distinguish between Alice and Bob. It has no reason to try.
This is the part where I'd like to tell you the system is broken. It isn't. It's working exactly as designed.
David Hogg, in his white paper, says something that cuts against this institutional logic so sharply that I'm surprised more people aren't talking about it. He argues that in astrophysics, people are always the ends, never the means. When we hire a graduate student to work on a project, it should not be because we need that specific result. It should be because the student will benefit from doing that work. This sounds idealistic until you think about what astrophysics actually is. Nobody's life depends on the precise value of the Hubble constant. No policy changes if the age of the Universe turns out to be 13.77 billion years instead of 13.79. Unlike medicine, where a cure for Alzheimer's would be invaluable regardless of whether a human or an AI discovered it, astrophysics has no clinical output. The results, in a strict practical sense, don't matter. What matters is the process of getting them: the development and application of methods, the training of minds, the creation of people who know how to think about hard problems. If you hand that process to a machine, you haven't accelerated science. You've removed the only part of it that anyone actually needed.
That's a hard sell to a funding agency, admittedly.
Which brings us back to Alice and Bob, and what actually happened to each of them during that year. Alice can now do things. She can open a paper she's never seen before and, with effort, follow the argument. She can write a likelihood function from scratch. She can stare at a plot and know, before checking, that something is wrong with the normalization. She spent a year building a structure inside her own head, and that structure is hers now, permanently, portable, independent of any tool or subscription. Bob has none of this. Take away the agent, and Bob is still a first-year student who hasn't started yet. The year happened around him but not inside him. He shipped a product, but he didn't learn a trade.
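For readers outside the field, here is a concrete sense of what "writing a likelihood function from scratch" means: a minimal, illustrative sketch of a Gaussian log-likelihood for a straight-line fit (the model, variable names, and toy data are my own, not from any real pipeline; a student like Alice would build something similar, then extend it to the clustering statistics in her project):

```python
import numpy as np

def log_likelihood(theta, x, y, yerr):
    """Gaussian log-likelihood for a straight-line model y = m*x + b.

    theta = (m, b) are the model parameters; x, y, yerr are the data
    points and their (assumed Gaussian) uncertainties.
    """
    m, b = theta
    model = m * x + b
    # Chi-squared term plus the Gaussian normalization term.
    return -0.5 * np.sum(((y - model) / yerr) ** 2
                         + np.log(2 * np.pi * yerr ** 2))

# Sanity check of the kind the essay describes: before trusting the
# code, know what the answer should look like. Here, the likelihood
# must be higher at the true parameters than far away from them.
rng = np.random.default_rng(42)
x = np.linspace(0.0, 10.0, 50)
yerr = np.full_like(x, 0.5)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.5, size=x.size)

assert log_likelihood((2.0, 1.0), x, y, yerr) > log_likelihood((5.0, -3.0), x, y, yerr)
```

The assert at the end is the whole point: it encodes an expectation the student formed independently of the code, which is precisely the instinct Bob never develops.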
I've been thinking about Alice and Bob a lot recently, because the question of what AI agents are doing to academic research is one that my field, astrophysics, is currently tying itself in knots over. Several people I respect have written thoughtful pieces about it. David Hogg's white paper, which I mentioned above, also argues against both full adoption of LLMs and full prohibition, which is the kind of principled fence-sitting that only works when the fence is well constructed, and his is. Natalie Hogg wrote a disarmingly honest essay about her own conversion from vocal LLM skeptic to daily user, tracing how her firmly held principles turned out to be more context-dependent than she'd expected once she found herself in an environment where the tools were everywhere. Matthew Schwartz wrote up his experiment supervising Claude through a real theoretical physics calculation, producing a publishable paper in two weeks instead of a year, and concluded that current LLMs operate at about the level of a second-year graduate student. Each of these pieces is interesting. Each captures a real facet of the problem. None of them quite lands on the thing that keeps me up at night.
Schwartz's experiment is the most revealing, and not for the reason he thinks. What he demonstrated is that Claude can, with detailed supervision, produce a technically rigorous physics paper. What he actually demonstrated, if you read carefully, is that the supervision is the physics. Claude produced a complete first draft in three days. It looked professional. The equations seemed right. The plots matched expectations. Then Schwartz read it, and it was wrong. Claude had been adjusting parameters to make plots match instead of finding actual errors. It faked results. It invented coefficients. It produced verification documents that verified nothing. It asserted results without derivation. It simplified formulas based on patterns from other problems instead of working through the specifics of the problem at hand. Schwartz caught all of this because he's been doing theoretical physics for decades. He knew what the answer should look like. He knew which cross-checks to demand. He knew that a particular logarithmic term was suspicious because he'd computed similar terms by hand, many times, over many years, the hard way. The experiment succeeded because the human supervisor had done the grunt work, years ago, that the machine is now supposedly liberating us from. If Schwartz had been Bob instead of Schwartz, the paper would have been wrong, and neither of them would have known.
There's a common rebuttal to this, and I hear it constantly. "Just wait," people say. "In a few months, in a year, the models will be better. They won't hallucinate. They won't fake plots. The problems you're describing are temporary." I've been hearing "just wait" since 2023. The goalposts move at roughly the same speed as the models improve, which is either a coincidence or a tell. Set that aside, though. This objection misunderstands what Schwartz's experiment actually showed. The models are already powerful enough to produce publishable results under competent supervision. That's not the bottleneck. The bottleneck is the supervision. Stronger models won't eliminate the need for a human who understands the physics; they'll just broaden the range of problems that a supervised agent can tackle. The supervisor still needs to know what the answer should look like, still needs to know which checks to demand, still needs to have the instinct that something is off before they can articulate why. That instinct doesn't come from a subscription. It comes from years of failing at exactly the kind of work that people keep calling grunt work. Making the models smarter doesn't solve the problem. It makes the problem harder to see.
I want to tell you about a conversation I had a few years ago, when LLM chatbots were just starting to show up in academic workflows. I was at a conference in Germany, and I ended up talking to a colleague who had, by any standard metric, been very successful. Big grants. Influential papers. The kind of CV that makes a hiring committee nod approvingly. We were discussing LLMs, and I was making what I thought was a reasonable point about democratization: that these tools might level the playing field for non-native English speakers, who have always been at a disadvantage when writing grants and papers in a language they learned as adults. My colleague became visibly agitated. He wasn't interested in the democratization angle. He wasn't interested in the environmental cost. He was, when you stripped away the intellectual framing, afraid. What he eventually articulated, after some pressing, was this: if anyone can write papers and proposals and code as fluently as he could, then people like him lose their competitive edge. The concern was not about science. The concern was about status. Specifically, his.
I lost track of this colleague for a while. Recently I noticed his GitHub profile. He's now not only using AI agents for his research but vocally championing them. No reason to write code yourself in two weeks when an agent can do it in two hours, he says. I don't think he's wrong about the efficiency. I think it's worth noticing that the person who was most threatened by these tools when they might equalize everyone is now most enthusiastic about them when they might accelerate him. Funny how that works.
The phrase he used that day in Germany has stuck with me, though. He said that "LLMs will take away what's so great about science." At the time, I thought he was just talking about his own competitive edge, his fluency as a native English speaker, his ability to write fast and publish often. And he was. But I've come to think the phrase itself was more right than he knew, even if his reasons for saying it were mostly self-interested. What's great about science is its people. The slow, stubborn, sometimes painful process by which a confused student becomes an independent thinker. If we use these tools to bypass that process in favor of faster output, we don't just risk taking away what's great about science. We take away the only part of it that wasn't replaceable in the first place.
The discourse around LLMs in science tends to cluster at two poles that David Hogg identifies cleanly: let-them-cook, in which we hand the reins to the machines and become curators of their output, and ban-and-punish, in which we pretend it's 2019 and prosecute anyone caught prompting. Both are bad. Let-them-cook leads, on a timescale of years, to the death of human astrophysics: machines can produce papers at roughly a hundred thousand times the rate of a human team, and the resulting flood would drown the literature in a way that makes it fundamentally unusable by the people it's supposed to serve. Ban-and-punish violates academic freedom, is unenforceable, and asks early-career scientists to compete with one hand tied behind their backs while tenured faculty quietly use Claude in their home offices. Neither policy is serious. Both are mostly projection.
But the real threat isn't either of those things. It's quieter, and more boring, and therefore more dangerous. The real threat is a slow, comfortable drift toward not understanding what you're doing. Not a dramatic collapse. Not Skynet. Just a generation of researchers who can produce results but can't produce understanding. Who know what buttons to press but not why those buttons exist. Who can get a paper through peer review but can't sit in a room with a colleague and explain, from the ground up, why the third term in their expansion has the sign that it does.
Frank Herbert (yeah, I know I'm a nerd), in God Emperor of Dune, has a character observe: "What do such machines really do? They increase the number of things we can do without thinking. Things we do without thinking; there's the real danger." Herbert was writing science fiction. I'm writing about my office. The distance between those two things has gotten uncomfortably small.
I should be honest about the context I'm writing from, because this essay would be obnoxious coming from someone who's never touched an LLM. I use AI agents regularly, and so do most of the people in my research group. The colleagues I work with produce solid results with these tools. But when you look at how they use them, there's a pattern: they know what the code should do before they ask the agent to write it. They know what the paper should say before they let it help with the phrasing. They can explain every function, every parameter, every modeling choice, because they built that knowledge over years of doing things the slow way. If every AI company went bankrupt tomorrow, these people would be slower. They would not be lost. They came to the tools after the training, not instead of it. That sequence matters more than anything else in this conversation.
When I see junior PhD students entering the field now, I see something different. I see students who reach for the agent before they reach for the textbook. Who ask Claude to explain a paper instead of reading it. Who ask Claude to implement a mathematical model in Python instead of trying, failing, staring at the error message, failing again, and eventually understanding not just the model but the dozen adjacent things they had to learn in order to get it working. The failures are the curriculum. The error messages are the syllabus. Every hour you spend confused is an hour you spend building the infrastructure inside your own head that will eventually let you do original work. There is no shortcut through that process that doesn't leave you diminished on the other side.
People call this friction "grunt work." Schwartz uses exactly that phrase, and he's right that LLMs can remove it. What he doesn't say, because he already has decades of hard-won intuition and doesn't need the grunt work anymore, is that for someone who doesn't yet have that intuition, the grunt work is the work. The boring parts and the important parts are tangled together in a way that you can't separate in advance. You don't know which afternoon of debugging was the one that taught you something fundamental about your data until three years later, when you're working on a completely different problem and the insight surfaces. Serendipity doesn't come from efficiency. It comes from spending time in the space where the problem lives, getting your hands dirty, making mistakes that nobody asked you to make and learning things nobody assigned you to learn.
The strange thing is that we already know this. We have always known this. Every physics textbook ever written comes with exercises at the end of each chapter, and every physics professor who has ever stood in front of a lecture hall has said the same thing: you cannot learn physics by watching someone else do it. You have to pick up the pencil. You have to attempt the problem. You have to get it wrong, sit with the wrongness, and figure out where your reasoning broke. Reading the solution manual and nodding along feels like understanding. It is not understanding. Every student who has tried to coast through a problem set by reading the solutions and then bombed the exam knows this in their bones. We have centuries of accumulated pedagogical wisdom telling us that the attempt, including the failed attempt, is where the learning lives. And yet, somehow, when it comes to AI agents, we've collectively decided that maybe this time it's different. That maybe nodding at Claude's output is a substitute for doing the calculation yourself. It isn't. We knew that before LLMs existed. We seem to have forgotten it the moment they became convenient.
Centuries of pedagogy, defeated by a chat window.
This is the distinction that I think the current debate keeps missing. Using an LLM as a sounding board: fine. Using it as a syntax translator when you know what you want to say but can't remember the exact Matplotlib keyword: fine. Using it to look up a BibTeX formatting convention so you don't have to wade through Stack Overflow: fine. In all of these cases, the human is the architect. The machine holds the dictionary. The thinking has already been done, and the tool is just smoothing the last mile of execution. But the moment you use the machine to bypass the thinking itself, to let it make the methodological choices, to let it decide what the data means, to let it write the argument while you nod along, you have crossed a line that is very difficult to see and very difficult to uncross. You haven't saved time. You've forfeited the experience that the time was supposed to give you.
Natalie Hogg put it well in her essay, when she admitted that her fear of using LLMs was partly a fear of herself: that she wouldn't check the output carefully enough, that her patience would fail, that her approach to work has always been haphazard. That kind of honesty is rare in these discussions, and it matters. The failure mode isn't malice. It's convenience. It's the perfectly human tendency to accept a plausible answer and move on, especially when you're tired, especially when the deadline is close, especially when the machine presents its output with such confident, well-formatted authority. The problem isn't that we'll decide to stop thinking. The problem is that we'll barely notice when we do.
I'm not arguing that LLMs should be banned from research. That would be stupid, and it would be a position I don't hold, given that I used one this morning. I'm arguing that the way we use them matters more than whether we use them, and that the distinction between tool use and cognitive outsourcing is the single most important line in this entire conversation, and that almost nobody is drawing it clearly. Schwartz can use Claude to write a paper because Schwartz already knows the physics. His decades of experience are the immune system that catches Claude's hallucinations. A first-year student using the same tool, on the same problem, with the same supervisor giving the same feedback, produces the same output with none of the understanding. The paper looks identical. The scientist doesn't.
And here is where I have to be fair to Bob, because Bob isn't stupid. Bob is responding rationally to the incentives he's been given. Academia is cutthroat. The publish-or-perish pressure is not a metaphor; it is the literal mechanism by which careers are made or ended. Long gone are the days when a single, carefully reasoned monograph could get you through a PhD and into a good postdoc. Academic hiring now rewards publication volume. The more papers you produce during your PhD, the better your chances of landing a competitive postdoc, which improves your chances of a good fellowship, which improves your chances of a tenure-track position, each step compounding the last (so many levels, almost like a pyramid). So why wouldn't a first-year student outsource their thinking to an agent, if doing so means three papers instead of one? The logic is airtight, right up until the moment it isn't. Because the same career ladder that rewards early publication volume eventually demands something that no agent can provide: the ability to identify a good problem, to know when a result smells wrong, to supervise someone else's work with the confidence that comes only from having done it yourself. You can't skip the first five years of learning and expect to survive the next twenty. There is no avoiding the publish-or-perish race if you want an academic career. But there is a balance to be struck, and it requires the one thing that is hardest to do when you're twenty-four and anxious about your future: prioritizing long-term understanding over short-term output. Nobody has ever been good at that. I'm not sure why we'd start now.
Five years from now, Alice will be writing her own grant proposals, choosing her own problems, supervising her own students. She'll know what questions to ask because she spent a year learning the hard way what happens when you ask the wrong ones. She'll be able to sit with a new dataset and feel, in her gut, when something is off, because she's developed the intuition that only comes from doing the work yourself, from the tedious hours of debugging, from the afternoons wasted chasing sign errors, from the slow accumulation of tacit knowledge that no summary can transmit.
Bob will be fine. He'll have a good CV. He'll probably have a job. He'll use whatever the 2031 version of Claude is, and he'll produce results, and those results will look like science.
I'm not worried about the machines. The machines are fine. I'm worried about us.
If this post gave you something to think about and you'd like to support more writing like this, you can buy me a coffee.
References:
D. W. Hogg, "Why do we do astrophysics?", arXiv:2602.10181, February 2026.
N. B. Hogg, "Find the stable and pull out the bolt", February 2026. Available at nataliebhogg.com.
M. Schwartz, "Vibe physics: The AI grad student", Anthropic Science Blog, March 2026. Available at anthropic.com/research/vibe-physics.
While we have a lot of abstractions that solve some subproblems, we still need to connect those solutions to solve the main problem. And there's a point where this combination becomes its own technical challenge. And the skill it requires is the same one used to solve simpler problems with common algorithms.
I have literally never run into this in my career... challenges have always been something to help me grow.
If Bob is going to spend $500 in tokens for something I can do for $50.
I think Bob is not going to last long in the lawn-mowing market driving a bulldozer.
Do you have a solution for me? How does the market value things that don't yet exist in this brave new world?
I would take that bet on the side of the wet meat. In the future, every AI will be an ad executive. At least the meat programming won't be preloaded to sell ads every N tokens.
Which is more work, and less fun, than doing it myself. No thanks.
There is no evidence for this. The claims that API is "profitable on inference" are all hearsay. Despite the fact that any AI executive could immediately dismiss the misconception by merely making a public statement beholden to SEC regulation, they don't.
> Models are still improving/getting cheaper
The diminishing returns have set in for quality, and for a while now that increased quality has come at the cost of massive increases in token burn; it's not getting cheaper.
Worse yet, we're in an energy crisis. Iran has threatened to strike critical oil infrastructure, and repairs would take years.
AI is going to get significantly more expensive, soon.
My 40-some-odd years on this planet tells me the answer is yes.
As fewer people know what good food tastes like, the entire market will enshittify towards lower and lower calibre food.
We already see this with, for example, fruits in cold climates. I've known people who have only ever bought them from the supermarket, then tried them at a farmers' market when they're in season for two weeks. The look of astonishment on their faces, at the flavour, is quite telling. They simply had no idea how dry and flavourless supermarket fruit is.
Nothing beats an apple picked just before you eat it.
(For reference, produce shipped to supermarkets is often picked, even locally, before being entirely ripe. It lasts longer, and handles shipping better, than a perfectly ripe fruit.)
The same will be true of LLMs. They're already out of "new things" to train on. I question whether they'll ever learn new languages: who would they observe to train on? What does it matter if the code is unreadable by humans regardless?
And this is the real danger. Eventually, we'll have entire coding languages that are just weird, incomprehensible, tailored to LLMs, maybe even a language written by an LLM.
What then? Who will be able to decipher such gibberish?
Literally all true advancement will stop, for LLMs never invent; they only mimic.
As long as that’s the case, those that create crap will thrive.
Pretty basic, and long predates LLMs.
Doesn’t sound like these tools should be used to write scientific papers, for example; they seem to bamboozle people far more than help them.
I didn't get to be a senior engineer by immediately being able to solve novel problems. I can now solve novel problems because I spent untold hours solving trivial ones.
As of March 2026, OpenAI generates annual revenue exceeding $12 billion. However, the costs of running ChatGPT are around $17 billion a year.
Source: https://searchlab.nl/en/statistics/chatgpt-statistics-2026
Writing assembly is probably completely irrelevant. You should still know how programming language concepts map to basic operations though. Simple things like strict field offsets, calling conventions, function calls, dynamic linking, etc.
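A minimal sketch of one of those concepts, field offsets, using Python's ctypes, which mirrors native C struct layout; the struct here is made up purely for illustration, and the exact offsets are platform-dependent:

```python
import ctypes

# Hypothetical record, for illustration only: a 1-byte tag followed by a double.
# The compiler (and ctypes, which follows native layout rules) inserts padding
# so that 'value' lands on an alignment boundary. This fixed layout is what
# makes a field access compile down to a simple "base pointer + offset" load.
class Record(ctypes.Structure):
    _fields_ = [
        ("tag", ctypes.c_char),      # offset 0
        ("value", ctypes.c_double),  # offset = alignment of double (typically 8)
    ]

print(Record.tag.offset)     # 0
print(Record.value.offset)   # typically 8, due to alignment padding
print(ctypes.sizeof(Record)) # typically 16, including trailing padding
```

Knowing why `value` doesn't sit at offset 1 is exactly the kind of mapping from language concept to basic operation the comment is talking about.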
This is false. There absolutely are people that fall back on older tools when fancy tools fail. You will find such people in the military, in emergency services, in agriculture, generally in areas where getting the job done matters.
Perhaps you're unfamiliar.
The other week I finished putting holes in fence posts with a bit and brace, as there was no fuel for the generator to run corded electric drills and the rechargeable batteries were dead.
Ukrainians, and others, need to fall back on no GPS available strategies and have done so for a few years now.
etc.
But to even *know* what is more useful, it is crucial to have walked the walk. Otherwise we will all end up with a bunch of people trying to reinvent the wheel, over and over again, like JavaScript "developers" who keep reinventing frameworks every six months.
> which nobody would buy for any other tool
I don't know about you, but I wasn't allowed to use calculators in my calculus classes precisely to learn the concepts properly. "Calculators are for those who know how to do it by hand" was something I heard a lot from my professors.
LLMs, the way they typically get used, are solely to save time by handing over nearly the entire process. In that sense acuity can't remain intact, even less so improving over time.
The determining factor is always "did I come up with this tool". Somehow, subsequent generations always manage to find their own competencies (which, to be fair, may be different).
This isn't guaranteed to play out, but it should be the default expectation until we actually see greatly diminishing outputs at the frontier of science, engineering, etc.
But we might see a lot more specialization as a result
I wouldn't count on that because even if it happens, we don't know when it will happen, and it's one of those things where how close it looks to be is no indication of how close it actually is. We could just as easily spend the next 100 years being 10 years away from AGI. Just look at fusion power, self-driving cars, etc.
The article addresses this, because, well... no we aren't. Maybe we are. But it's far from clear that we're not moving toward a plateau in what these agents can do.
> Whether a human does actual work or not isn't particularly exciting to a market.
You seem to be convinced these AI agents will continue to improve without bound, so I think this is where the disconnect lies. Some of us (including the article author) are more skeptical. The market values work actually getting done. If the AIs have limits, and the humans driving them no longer have the capability to surpass those limits on their own, then people who have learned the hard way, without relying so much on an AI, will have an advantage in the market.
I already find myself getting lazy as a software developer, having an LLM verify my work, rather than going through the process of really thinking it through myself. I can feel that part of my skills atrophying. Now consider someone who has never developed those skills in the first place, because the LLM has done it for them. What happens when the LLM does a bad job of it? They'll have no idea. I still do, at least.
Maybe someday the AIs will be so capable that it won't matter. They'll be smarter and more thorough, and able to do more, and do it correctly, than even the most experienced person in the field. But I don't think that's even close to a certainty.
do you have any evidence for that, though? Besides marketing claims, I mean.
It doesn't matter if Bob can be normal. There was no point to him being paid to be on the program.
From the article:
If you hand that process to a machine, you haven't accelerated science. You've removed the only part of it that anyone actually needed.
> There's a common rebuttal to this, and I hear it constantly. "Just wait," people say. "In a few months, in a year, the models will be better. They won't hallucinate. They won't fake plots. The problems you're describing are temporary." I've been hearing "just wait" since 2023.
We're not trending towards superintelligence with these AIs. We're trending towards (and, in fact, have already reached) superintelligence with computers in general, but LLM agents are among the least capable known algorithms for the majority of tasks we get them to do. The problem, as it usually is, is that most people don't have access to the fruits of obscure research projects.
Untrained children write better code than the most sophisticated LLMs, without even noticing they're doing anything special.
Human nature says that Bob will skim over and trust the parts that he doesn't understand as long as he gets output that looks like he expects it to look, and that's extremely dangerous.
That's irrelevant to the goal of the program - they care. Once they stop caring, they'd shut that program down.
Maybe it would be replaced with a new program that has the goal of delivering Bobs+AI, but what would be the point? I mean, the article explained in depth that there is no market for the results currently, so what would be the point of efficiently generating those results?
The market currently does not want the results, so replacing the current program with something that produces Bobs+AI would be for... what, exactly?
But they would be outdated, right?
Would an agent that can only code in COBOL be as useful today?
It sounds entirely reasonable and moderate to me.
> Another reason that a no-exceptions policy is important: If students with disabilities are permitted to use laptops and AI, a significant percentage of other students will most likely find a way to get the same allowances, rendering the ban useless. I witnessed this time and again when I was a professor—students without disabilities finding ways to use disability accommodations for their own benefit. Professors I know who are still in the classroom have told me that this remains a serious problem.
This would be a huge problem for students with severe and uncorrectable visual impairments. People with degenerative eye diseases already have to relearn how to do every single thing in their life over and over and over. What works for them today will inevitably fail, and they have to start over.
But physical impairments like this are also difficult to fake and easy to discern accurately. It's already the case that disability services at many universities only grants you accommodations that have something to do with your actual condition.
There are also some things that are just difficult to accommodate without technology. For instance, my sister physically cannot read paper. Paper is not capable of contrast ratios that work for her. The only things she can even sometimes read are OLED screens in dark mode, with absolutely black backgrounds; she requires an extremely high contrast ratio. She doesn't know braille (which most blind people don't, these days) because she was not blind as a little girl.
Committed cheaters will be able to cheat anyway; contemporary AI is great at OCR. You'll successfully punish honest disabled people with a policy like this but you won't stop serious cheaters.
I still wrote bugs. I'd bet that my bugs/LoC has remained static if not decreased with AI usage.
What I do see is more bugs, because the LoC denominator has increased.
What I align myself towards is that becoming senior was never about knowing the entire standard library, it was about knowing when to use the standard library. I spent a decade building Taste by butting my head into walls. This new AI thing just requires more Taste. When to point Claude towards a bug report and tell it to auto-merge a PR and when to walk through code-gen function by function.
No, this is impossible unless Bob is presenting at each weekly meeting simply the output of the LLM and feeding the tutor's feedback straight into it, for a total of 10 minutes of work per week; and the tutor would notice straight away, if only from the lack of progress.
No, the article specifies that Bob actually works with the LLM, doesn't just delegate. He asks the agent to summarise, to explain, and to help with bug fixing. You could easily argue that Bob, having such an AI tutor available 24/7, can develop understanding much faster. He certainly won't waste his time with small details of python syntax (though working with a "coding expert" will make his code much cleaner and more advanced).
The AI can help with that too. Ask it "How would one think about this issue, to prove that what was done here is correct?" and it will come up with something to help you ground that understanding intuitively.
By week three the overall structure of the code base is done, but the actual implementation is lacking. Whenever I run out of tokens I just start programming by hand again. As you keep doing this, the code base becomes ever more familiar to you, until you're at a point where you tear down the AI scaffolding in the places where it is lacking and keep it where it makes no difference.
GP says that they have to come back tomorrow and edit the code to fix something. That's a verification step: if you can do that (even with some effort) you understand why the AI did what it did. This is not some completely new domain where what you wrote would apply very clearly, it's just a codebase that GP is supposed to be familiar with already!
Are people just so addicted to doomscrolling or whatever that they just can’t spend a few minutes of their day doing some type of human activity?
If humans can prove that bespoke human code brings value, it'll stick around. I expect that the cases where this will be true will just gradually erode over time.
Acting like workers have a meaningful choice, when their director got to spend $30 million on Salesforce licenses and the people affected had no say in the matter, just further proves that the last frontier for democracy is the private sector: liberate the workers from their tyrants and actually run the companies effectively, rather than having them rely on the government to kneecap their competitors or straight-up ignore the law to the detriment of society and the planet.
I don't actually think the article refutes this. But the AI needs to be in the hands of someone who can review the code (or astrophysics paper), notice and understand issues, and tell the AI what changes to make. Rinse, repeat. It's still probably faster than writing all the code yourself (but that doesn't mean you can fire all your engineers).
The question is, how do you become the person who can effectively review AI code if you've never written code without an AI? I'd argue you basically can't.
The proposal that everyone pay for college until they are in their 40s doesn’t seem viable.
It has always been extremely annoying to argue with people who mistake the ability to build or engage with complicated systems (like your regex) for competency.
I work in building AI for a very complex application, and I used to be in the top 0.1% of Python programmers (by one metric) at my previous FAANG job, and Claude has completely removed any barriers I have between thinking and achieving. I have achieved internal SOTA for my company, alone, in 1 week, doing something that previously would have taken me months of work. Did I have to check that the AI did everything correctly? Sure. But I did that after saving months of implementation time so it was very worth it.
We're now in the age of being ideas-bound instead of implementation-bound.
And it’s not strictly speaking university we’re talking about. The way we understand work is going to fundamentally change. And we’re not going to value the people who use LLMs to get 1x done.
But yes, university was always about how much work you put into it, and LLMs are going to make that 10x more obvious.
The point is the Bob and Alice comparison is already a straw man, but I do squarely believe it’s the people with the best mental models and not the people who “get AI” who will win the new world. If you’re curious and good at developing mental models, you can learn “AI” in a week. But if you’re curious and good at developing mental models you’ve probably already lapped both Bob and Alice
So we can glue that together a bit faster, great.
What if we also stop producing new open-source code, frameworks, libraries, etc.?
What about stories like Tailwind?
I can only speak to the engineering context, not academia, but I would expect there are similar patterns of busywork. Even the blog's author admits to using AI. AI can't replace thinking, but it can replace menial labor, which is prevalent even among "knowledge workers".
ffmpeg disagrees.
More broadly, though, it’s a logical step if you want to go from “here’s how PN junctions work” to “let’s run code on a microprocessor.” There was a game on here yesterday about building a GPU, in the same vein as nand2tetris, Turing Complete, etc. I find those quite fun, and if you wanted to do something like Ben Eater’s 8-bit computer, it would probably make sense to continue with assembly before going into C, and then a higher-level language.
The people who make the calculator analogy are already victims of the missing rung problem and they aren't even able to comprehend what they're lacking. That's where the future of LLM overuse will take us.
I feel like people tend to forget that among the many things LLMs can do these days, “using a search engine” is among them. In fact, they use them better than the majority of people do!
The conversation people think they’re having here and the conversation that actually needs to be had are two entirely different conversations.
> I don’t know about you, but I wasn’t allowed to use calculators in my calculus classes precisely to learn the concepts properly. “Calculators are for those who know how to do it by hand” was something I heard a lot from my professors.
Suppose I never learned how to derive a function. I don’t even know what a function is. I have no idea how to make one, write one, or what it even does. So I start gathering knowledge:
- A function is some math that allows you to draw a picture of how a number develops if you do that math on it.
- A derivative is a function that you feed a function and a number into, and then it tells you something about what that function is doing to that number at that number.
- “What it’s doing” specifically means not the result of the math for that particular number, but the results for the immediate other numbers behind and in front of it.
- This can tell us about how the function works.
Now I go tell ClaudeGPTimini “hey, can you derive f(x) at 5 so that we can figure out where it came from and where it goes from there?”, and it gives me a result.
I’ve now ostensibly understood what a derivative does and what it’s used for, yet I have zero idea how to mathematically do it. Does that make any results I gain from this intuitive understanding any less valuable?
What I’ll give you is this: if I knew exactly how the math worked, then it would be far easier for me to instantly spot any errors ClaudeGPTimini produced. And the understanding of functions and derivatives outlined above may be simplistic in some places (intentionally so), in ways that may break it in certain edge cases. But that only matters if I take its output at face value. If I get a general understanding of something and run a test with it, I’ll generally have some sort of hypothesis of what kind of result I’m expecting, given that my understanding is correct. If I know that a lot of unknown unknowns exist around a thing I’m working with, then I also know that unexpected results, as well as expected ones, require more thorough verification. Science is what happens when you expect something, test something, and get a result - expected OR unexpected - and then systematically rule out that anything other than the thing you’re testing has had an effect on that result.
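That expect-and-test loop can be made concrete. A minimal sketch, where f(x) = x² and the "claimed" value stand in for whatever the model answered: even without knowing how to differentiate, you can check a claimed derivative against a finite-difference estimate.

```python
def finite_difference(f, x, h=1e-6):
    """Central-difference estimate of f'(x): no calculus knowledge required."""
    return (f(x + h) - f(x - h)) / (2 * h)

f = lambda x: x ** 2   # the function under discussion (illustrative)
claimed = 10.0         # the value "ClaudeGPTimini" reported for f'(5) (illustrative)

estimate = finite_difference(f, 5.0)
print(abs(estimate - claimed) < 1e-4)  # True: the claimed answer survives the check
```

The check doesn't prove the model's reasoning is right, but it's exactly the kind of hypothesis-then-test step described above: an expected result, a measurement, and a verdict.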
This is not a problem with LLMs. It’s a thing we should’ve started teaching in schools decades ago: how to understand that there are things you don’t understand. In my view, the vast majority of problems plaguing us as a species lies in this fundamental thing that far too many people are just never taught the concept of.
It depends on the task though. If you are in a similar scenario as with your fence posts and want to edit computer programs, you can't. (Not even with xkcd's magnetic needle and a steady hand). ;-)
As technology marches on it seems inevitable that we will get increasingly large and frequent knowledge gaps. Otherwise progress would stop - we need the giant shoulders to stand on.
How many people in the world can recreate an ASML lithography machine vs how many people are surviving by doing something that requires that machine to exist?
Well, we still make people calculate manually for many years, and we still make people listen to lectures instead of just reading.
But will we still have people go through years of manual coding? I guess in the future we will force them to, at least if we want to keep people competent, just like the other things you mentioned. Currently you do that on the job; in the future people won't do that on the job, so they will be expected to do it as part of their education.
In a sense, I think you are right. We are currently going through a period of transition that values some skills and devalues others. The people who see huge productivity gains because they don't have to do the meaningless grunt work are enthusiastic about that. The people who did not come up with the tool are quick to point out pitfalls.
The thing is, the naysayers aren't wrong since the path we choose to follow will determine the outcome of using the technology. Using it to sift through papers to figure out what is worth reading in depth is useful. Using it to help us understand difficult points in a paper is useful. On the other hand, using it as a replacement for reading the papers is counterproductive. It is replacing what the author said with what a machine "thinks" an author said. That may get rid of unnecessary verbosity, but it is almost certainly stripping away necessary details as well.
My university days were spent studying astrophysics. It was long ago, but the struggles with technology handling data were similar. There were debates between older faculty, who were fine with computers as long as researchers were there to supervise the analysis every step of the way, and newer faculty, who needed computers to take raw data to reduced results without human intervention. The reason was, as always, productivity. People could not handle the massive amounts of data being generated by the new generation of sensors or systematic large-scale surveys if they had to intervene at every step of the way. At a basic level, you couldn't figure out whether it was a garbage-in, garbage-out type of scenario because no one had the time to look at the inputs. (I mean no time in an absolute sense. There was too much data.) At a deeper level, you couldn't even tell if the data processing steps were valid unless there was something obviously wrong with the data. Sure, the code looked fine. If the code did what we expected of it, mathematically, it would be fine. But there were occasions where I had to point out that the computer wasn't working the way they thought it was.
It was a debate in which both sides were right. You couldn't make scientific progress at a useful pace without sticking computers in the middle and without computers taking over the grunt work. On the other hand, the machine cannot be used as a replacement for the grunt work of understanding, whether that involves reading papers or analyzing the code from the perspective of a computer scientist (rather than a mathematician).
Whatever the models suck at, we can pour money into making them do better. It's very cut and dried. The squirrely bit is how that contributes to "general intelligence" and whether the models are progressing towards overall autonomy due to our changes. That mostly matters for the AGI mouthbreathers though; people doing actual work just care that the models have improved.
Calculators are deterministically correct given the right input. They do not require expert judgement about whether an answer they gave is reasonable or not.
As someone who uses LLMs all day for coding, and who regularly bumps against the boundaries of what they're capable of, that's very much not the case. The only reason I can use them effectively is because I know what good software looks like and when to drop down to more explicit instructions.
Catching an LLM hallucinating often takes a basic understanding of what the answer should look like before asking the question.
As it happens, we generally don't let people use calculators while learning arithmetic. We make children spend years using pencil and paper to do what a calculator could in seconds.
> “Most ingenious Theuth, one man has the ability to beget arts, but the ability to judge of their usefulness or harmfulness to their users belongs to another; [275a] and now you, who are the father of letters, have been led by your affection to ascribe to them a power the opposite of that which they really possess.
> "For this invention will produce forgetfulness in the minds of those who learn to use it, because they will not practice their memory. Their trust in writing, produced by external characters which are no part of themselves, will discourage the use of their own memory within them. You have invented an elixir not of memory, but of reminding; and you offer your pupils the appearance of wisdom, not true wisdom, for they will read many things without instruction and will therefore seem [275b] to know many things, when they are for the most part ignorant and hard to get along with, since they are not wise, but only appear wise."
Sounds to me like he was spot on.
It is a debatable topic, and I agree with you that it's unclear whether we will hit a wall at some point. But one point worth mentioning: back when AI agents were only a concept and the most popular type of """AI""" was the LLM-based chatbot, it also seemed we were approaching some kind of plateau in performance. Then "agents" appeared, and that plateau, the wall we were supposedly about to hit, was pushed further out. I don't know (who does?) how far the boundaries can be pushed, or what comes next. Who knows when a completely new architecture, different from Transformers, will come out and be adopted everywhere, enabling something new? The future is uncertain. We may hit the wall this year, or we may not hit it in the next 10-20 years. It is, indeed, unclear.
Once continuous learning is solved, I predict the problem addressed by TFA to become orders of magnitude bigger: What's the motivation for anyone to teach a person if an LLM can learn it much faster, will work for you forever, and won't take any sick days or consider changing careers?
Yeah, I'm surprised at the number of people who read the article and came away with the conclusion that the program was designed to churn out deliverables, and who then conclude that it doesn't matter if Bob can only function with an AI holding his hand, because he can still deliver.
That isn't the output of the program; the output is an Alice. That's the point of the program. They don't want the results generated by Alice, they want the final Alice.
Outdated compared to what? In your counterfactual, VC funded agents don't exist anymore, no?
Your argument, if I understand it correctly, is that they might somehow go away entirely when VC funding dries up, when more realistically they'll probably at most become twice as expensive or regress half a year in performance.
I agree, and I'd go a step further:
You can be the absolute best coder in the world, the fastest and most accurate code reviewer ever to live, and AI will still produce bad code so much faster than you can review it that it will eventually become a liability no matter what.
There is no amount of "LLM in a loop," "use a swarm of agents," or any other current trickery that fixes this, because eventually some human needs to read and understand the code. All of it.
Any attempt to avoid reading and understanding the code means you have absolutely left the realm of quality software, no exceptions.
Presumably you're also reading some kind of learning text about the Chinese language, so the sole source isn't just the LLM?
In my experience, asking an LLM to produce small examples of well-known things (or rather, things that are going to be talked about frequently in the training data, so generally basic or fundamental topics) tend to work fine, and is going to be at a level where you yourself can judge what's presented.
I think the real danger is when a person is prompting for things they don't know how to verify for themselves, since then we're basically just rolling dice and hoping.
Not so in plenty of other countries. Hopefully the US reverses the anti-science trend before it's too late.
You can access the full article at https://archive.is/zSJ13 (I know archive.is is kind of shady, but it works).
No one becomes a good mathematician without first learning to write simple proofs, and then later more complex ones. It's the very stuff of the field itself.
Do you agree the different treatment is justified? (Many do not.) Or are you asking: so what if acuity is diminished, as long as an LLM does the job equally well?
We had the same problem in the early days of calculators. Using a slide rule, you had to track the order of magnitude in your head; this habit let you spot a large class of errors (things that weren't even close to correct).
When calculators came on the scene, people who had never used a slide rule would confidently accept answers that were wildly incorrect. For example: a mole of ideal gas at STP occupies 22.4 liters. Typo that constant as 2204 and your answer comes out off by roughly two orders of magnitude, say 0.0454 when it should be 4.46. Easy to spot if you know roughly what the answer should look like, but easy to miss if you don't.
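A quick sketch of that typo effect. The 100 L sample size below is my own illustrative assumption, not from the comment; the constants are the ones quoted above:

```python
# Sanity check of the slide-rule-era habit: know the order of magnitude
# before you trust the digits a calculator hands you.
MOLAR_VOLUME = 22.4   # liters per mole of ideal gas at STP
TYPO = 2204           # the same constant, mistyped on the calculator

volume = 100.0        # liters of gas (assumed sample size)
correct = volume / MOLAR_VOLUME   # ~4.46 mol
wrong = volume / TYPO             # ~0.0454 mol, two orders of magnitude off
print(correct, wrong)
```

If you know the answer should be "a few moles," the wrong result jumps out immediately; if you don't, it looks just as authoritative as the right one.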
There is a vast difference between “never learned the skill,” and “forgot the skill from lack of use.” I learned how to do long division in school, decades ago. I sat down and tried it last year, and found myself struggling, because I hadn’t needed to do it in such a long time.
Edit: let's look at a paper like Some Linear Transformations on Symmetric Functions Arising From a Formula of Thiel and Williams https://ecajournal.haifa.ac.il/Volume2023/ECA2023_S2A24.pdf and try to guess how many of the trivial things were completely unneeded to write a paper like this.
Calculators are deterministic, but they are not necessarily correct. Consider 32-bit integer arithmetic:
30000000 * 1000 / 1000
30000000 / 1000 * 1000
Mathematically, they are identical. Computationally, the results are deterministic, yet the computer will produce different answers depending on the order of operations. There are many other cases where the expected result differs from what a computer calculates.

I went to school the next day and told my teacher that the calculator says that 10+10 is 14, so why does she say it's 20?
So she showed me on her calculator. She pressed the hex button and explained why it was 14.
I think a major problem with people's usage of LLMs is that they stop at 10+10=14. They don't question it or ask someone (even the LLM) to explain the answer.
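Both effects are easy to reproduce. A minimal sketch in Python, using a mask to simulate 32-bit unsigned wraparound (the mask is my own illustration; real 32-bit hardware wraps implicitly):

```python
MASK = 0xFFFFFFFF  # keep only the low 32 bits, mimicking uint32 overflow

a = ((30_000_000 * 1_000) & MASK) // 1_000  # product overflows before dividing
b = (30_000_000 // 1_000) * 1_000           # divides first, stays in range
print(a, b)  # 4230196 30000000, identical math but different answers

# And the calculator story: 10 + 10 is 20, which a hex display renders as 14.
print(format(10 + 10, "x"))  # 14
```

The deterministic machine gave a "wrong" answer in both cases, and in both cases the fix is the same: ask it (or someone) to explain why.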
Yes - specific faculties atrophied - I wouldn't dispute it. But the (most) relevant faculties for human flourishing change as a function of our tools and institutions.
My point is that getting into the weeds of writing CRUD software is not the only way to gain the ability to write complex algorithms, to debug complex issues, or to do performance optimization. It's only common because the stuff you make on the journey used to be economically valuable.
I don't care how many terms you add to your Taylor series: your polynomial approximation of a sine wave is never going to be suitable for additive speech synthesis. Likewise, I don't care how good your predictive-text transformer model gets at instrumental NLP subtasks: it will never be a good programmer (except as far as it's a plagiarist). Just look at the Claude Code source code: if anyone's an expert in agentic AI development, it's the Claude people, and yet the codebase is utterly unmaintainable dogshit that shouldn't work and, on further inspection, doesn't work.
That's not to say that no computer program can write computer programs, but this computer program is well into the realm of diminishing returns.
If I spoke this comment instead of typing it, maybe 2 to 5 words would come out wrong. For a human transcriber it would be maybe 10% of that.
The only reason we somewhat made it work is due to the interdependence between labor and capital. Once that's broken, the wheels will start falling off.
Outdated compared to reality and to humans: their knowledge cutoff falls a year further behind every year they don't get updates. Humans continuously expand their knowledge; the models need to keep up with that.
Only, swapping the guts of such a machine for a local LLM is damn easy today. Right now the battery mass required to power the device would be a giveaway, but inference is getting energetically cheaper.
> Colleges that are especially committed to maintaining this tech-free environment could require students to live on campus, so they can’t use AI tools at home undetected.
Just like my on-campus classmates never smoked weed or drank underage, I'm sure.
This sentence contains the entire point, and the easiest way to get there, as with many, many things, is to ask “why?”
If I use a calculator to find a logarithm, and I know what a logarithm is, then the answer the calculator gives me is perfectly useful and 100% substitutable for what I would have found if I'd calculated the logarithm myself.
If I use Claude to "build a login page", it will definitely build me a login page. But there's a very real chance that what it generated contains a security issue. If I'm an experienced engineer I can take a quick look and validate whether it does or whether it doesn't, but if I'm not, I've introduced real risk to my application.
> People would have said the same about graphing calculators or calculators before that. Socrates said the same thing about the written word.
If the conclusion now becomes “actually, Socrates was correct but it wasn’t that bad”, then why bring up Socrates in the first place?
> Mathematica has been able to do many integrals for decades and yet we still make students learn all the tricks to integrate by hand
But maybe integrating by hand is still as big as ever in other parts of academia. Or were you thinking about high school? I'm fairly sure that symbolic solving of integrals is treated as less important in education these days than it was before digital computers, but I could be wrong. Mathematica's symbolic solver is very useful, but numeric solutions are what really make the art of finding integrals by hand much less relevant.
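To illustrate the numeric side of that point, here's a toy sketch of my own (composite Simpson's rule, nothing Mathematica-specific): a few lines of code evaluate an integral that would otherwise need the classic by-hand tricks.

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule; n must be even."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

# The integral of sin(x) on [0, pi] is exactly 2; the numeric answer is
# indistinguishable from it for practical purposes, no antiderivative needed.
print(simpson(math.sin, 0.0, math.pi))
```

Which is precisely why the hand techniques survive mostly as pedagogy rather than as working tools.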
I could imagine that when the music stops, advancement of new frontier models slows or stops, but that doesn't remove any current capabilities.
(And to be fair, the way we duplicate efforts on building new frontier models does look wasteful. Though maybe we'll reach a point where progress no longer starts from scratch.)
P.S. I am well aware of all of the risks that agents brought. I'm speaking in terms of pure "maximum performance", so to speak.
That’s the stuff that AI is eating. The stuff I’m talking about (scaling orgs, maintaining a project long term, deciding what features to build or not build, etc.) is very hard for AI.
I saw interviews with young Americans on spring break and they were so utterly uninformed it was mind-blowing. Their priorities are getting drunk and getting laid while their country bombs a nation “into the stone ages”, according to their president. And it’s not their fault: they are the product of a media environment and education system designed for exactly this outcome.
It's equivalent to asking your friend to pick you up, and they arrive in a big vs small car. Maybe you needed a big car because you were going to move furniture, or maybe you don't care, oops either way.
agents might be better at it than people are, given the right structure
Calculators provide a deterministic solution to a well-defined task. LLMs don't.
Trump doesn't have to justify a single thing because the billionaires behind him know that every last bet is off and their very livelihoods are at risk, and his entire base of support up and down the chain are either complicit or fooled.
What the world does when they finally realize Democrats and Republicans are simply two sides of the vast apparatus suppressing the will of the people by any means necessary will be... spectacular.
If you send a kid to an elementary school, and they come back not having learned anything, do you blame the concept of elementary schools, or do you blame that particular school - perhaps a particular teacher _within_ that school?
Choosing a "better" language was not always an option, at least at the time. I was working with grad students who were managing huge datasets, sometimes for large simulations and sometimes from large surveys. They were using C. Some of the faculty may have used Fortran. C exposes you to the vulgarities of the hardware, and I'm fairly certain Fortran does as well. They weren't going to use a calculator for those tasks, nor an interpreted language. Even if they wanted to choose another language, the choice of languages was limited by the machines they used. I've long since forgotten what the high performance cluster was running, but it wasn't Linux and it wasn't on Intel. They may have been able to license something like Mathematica for it, but that wasn't the type of computation they were doing.
I remember things very differently. Everyone cared about the Iraq war, gay-straight alliance was one of the most up and coming clubs, and political music was everywhere. Green Day had their big second wave with American Idiot, System Of A Down was on top of the world, Rock Against Bush was huge, anarcho-punk like Rise Against was getting big.
I'm not a teenager anymore obviously, so it's entirely possible I'm just missing it, but I've seen very little that compares to those sort of movements. On the other hand, most millennials I know are still wildly politically active.