I do wonder where all the novel products are: the ones produced by 10x devs who are now 100x with LLMs, the “idea guys” who can now produce products from whole cloth without having to hire pesky engineers... Where are the one-man ten-billion-dollar startups? We are 3-4 years into this mania, and all I see on the other end of it is the LLMs themselves.
Why hasn’t anything gotten better?
I don’t believe this. It’s totally plausible that someone could produce passable work with LLMs at a pace similar to a curious and talented scientist’s. But if you, their advisor, are sitting down and talking with them every week? It’s obvious how much they care or understand; I can’t believe you wouldn’t be able to tell the difference between these students.
How do we know that, while the AI was writing Python scripts, Bob wasn't reading more papers, getting more data, and just overall doing more than Alice?
Maybe Bob is terrible at debugging python scripts while Alice is a pro at it?
Maybe Bob used his time to develop different skills that Alice couldn't dream of?
Maybe Bob will discover new techniques or ideas because he didn't follow the traditional research path that the established Researchers insist you follow?
Maybe Bob used the AI to learn even more because he had a customized tutor at his disposal?
Or maybe Bob just spent more time at the pub with his friends.
Yet one thing did not change in this process: it only made the production of text more efficient. The act of writing, constructing a compelling narrative, and telling a story were untouched by this revolution.
Bad writers are still bad writers; good writers still have a superior understanding of how to construct a plot. The technological ability to produce text faster never really changed what we consider "good" and "bad" in written literature; it just allowed more people to produce it.
It is hard to tell if large language models can ever reach a state where they have "good taste" (I suspect not). They will always reflect the taste and skill of the operator to some extent. Just because they let you produce more code faster does not mean they let you create a better product or better code. You still need good taste to create the structure of the product or codebase; you still have to understand the limitations of one architectural decision over another when the output is operationalized and run in production.
The AI industry is running on hype right now because it needs you to believe that this is no longer relevant. That Garry Tan producing 37,000 LoC/day somehow equates to producing value. That a swarm of agents can produce a useful browser or kernel compiler.
Yet if you just peek behind the curtains at the Claude Code repo and see the pile of unresolved issues, regressions, missing features, half-baked features, and so on -- the limitations seem plainly obvious, because Anthropic, with functionally unlimited tokens and frontier models, cannot use them to triage and fix their own product.
AI and coding agents are like the printing press in some ways. Yes, it takes some costs out of a labor intensive production process, but that doesn't mean that what is produced is of any value if the creator on the other end doesn't understand the structure of the plot and the underlying mechanics (be it of storytelling or system architecture).
[0] http://employees.oneonta.edu/blechmjb/JBpages/m360/Professio...
[1] https://s3.us-west-1.wasabisys.com/luminist/EB/A/Asimov%20-%...
I have seen versions of this in the wild, where a firm has gone through hard times and its internal systems have lost all their original authors and every subsequent generation of maintainers, leaving people in awe of a machine that hasn’t been maintained in a decade.
I once interviewed a guy who was genuinely proud of himself, volunteering the information as he described resolving a segfault in a live trading system by putting kill -9 in a cronjob. Ghastly.
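A reconstructed sketch of what that anti-pattern presumably looked like (the process name and paths are invented; I have no idea what his actual crontab contained):

```shell
# DO NOT DO THIS. Instead of fixing the segfault, cron force-kills
# the wedged process every minute and relaunches it.
* * * * * pkill -9 -f trading_engine; /opt/trading/bin/trading_engine >> /var/log/trading.log 2>&1
```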
> But the real threat isn't either of those things. It's quieter, and more boring, and therefore more dangerous. The real threat is a slow, comfortable drift toward not understanding what you're doing. Not a dramatic collapse. Not Skynet. Just a generation of researchers who can produce results but can't produce understanding. Who know what buttons to press but not why those buttons exist. Who can get a paper through peer review but can't sit in a room with a colleague and explain, from the ground up, why the third term in their expansion has the sign that it does.
I think this is a simplification. Of course Bob relied on AI, but he also used his own brain to think about the problem. Bob is not reducible to "a competent prompt engineer"; if you think he is, take any competent prompter with no physics background and ask them to do Bob's work.
In fact, Bob might have a chance to cover more ground at the higher level of the work while Alice does the same at the lower level. Which is better? It depends on how AI evolves.
The article assumes the alternative to AI-assisted work is careful human work. I am not sure careful human work is all that good, or that it will scale well in the future. Better to rely on AI on top of careful human work.
My objection comes from remembering how senior devs review PRs: "LGTM". It's pure vibes. To seriously review a PR you have to run it, test it, check its edge cases, and evaluate its performance: more work than making the PR itself. The entire history of software is littered with bugs that sailed through review, because review is performative most of the time.
Anyone remember the replication crisis in science?
Very well said. I think people are about to realize how incredibly fortunate and exceptional it is to actually get paid (and in our industry, paid very well) through a significant fraction of one's career while still "just" doing the grunt work, work that arguably benefits the person doing it at least as much as the employer.
A stable paid demand for "first-year grad student level work" or the equivalent for a given industry is probably not the only possible way to maintain a steady supply of experts (there's always the option of immense amounts of student debt or public funding, after all), but it sure seems like a load-bearing one in so many industries and professions.
At the very least, such work being directly paid has the immense advantage of making artificially created bullshit tasks (often created without any bad intentions!) that don't exercise actually relevant skillsets, or that exercise the wrong ones, much easier to spot.
I’ve recently started csprimer, and whilst it’s mentally stimulating, I wonder if I’m completely wasting my time.
I think the real danger is no longer caring about what you're doing. Yesterday I just pointed Claude at my static site generator and told it to clean it up. I wanted to care but... I didn't.
Most of what we call thinking merely justifies beliefs that make us emotionally happy, and is not creative per se. I am making a distinction between "thinking" as we know it and "creative thinking", which is rare and can see things in an unbiased manner, breaking out of known categories. Arguably, at the PhD level, there need to be new ideas instead of remixes of the existing ones.
The risk is that civilization is over its skis because humans are lazy. Humans are always lazy. In science there’s a limit to bs because dependent works fail. In economics there’s a crash. In physics stuff breaks. Then there is a correction.
I use AI agents for coding every day. The agent handles boilerplate and scaffolding faster than I ever could. But when it produces a subtle architectural mistake, you need enough understanding to catch it. The agent won't tell you it made a bad choice.
What actually helps is building review into the workflow. I run automated code reviews on everything the agent produces before it ships. Not because the code is bad, usually it's fine. But the one time in ten that it isn't, you need someone who understands what the code should be doing.
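A minimal sketch of that kind of gate, assuming a generic setup (the specific check commands here, pytest and ruff, are just examples; substitute whatever your project actually runs):

```python
import subprocess
import sys

def gate(checks):
    """Run each check command in order; ship only if every one exits 0."""
    for cmd in checks:
        result = subprocess.run(cmd)
        if result.returncode != 0:
            print(f"blocked: {' '.join(cmd)} failed", file=sys.stderr)
            return False
    return True

if __name__ == "__main__":
    # Hypothetical checks for agent-produced changes -- tests plus lint.
    sys.exit(0 if gate([["pytest", "-q"], ["ruff", "check", "."]]) else 1)
```

The point isn't the tooling; it's that nothing the agent writes reaches a branch that ships without passing through a reviewer, human or automated.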
A two-hour thesis defense isn't enough to uncover this, but a 40-hour deep probing examination by an AI might be. And the thesis committee gets a "highlight reel" of all the places the student fell short.
The general pattern is: "Suppose we change nothing but add extensive use of AI, look how everything falls apart." When in reality, science and education are complex adaptive systems that will change as much as needed to absorb the impact of AI.
If an AI is capable of producing an elegant solution with fewer levels of abstraction, it's possible we end up drifting towards a better understanding of what's going on.
It was a great success; now, when I propose that they use some model to do something, they tend to avoid it.
This is going to get worse, and eventually cause disastrous damage unless we do something about it, as we risk losing human institutional memory across just about every domain, and end up as child-like supplicants to the machines.
But as the article says, this is a people problem, not a machine problem.
The way I think about this is: we can't catch the hallucinations that we don't know are hallucinations.
A combination of beancounters running the show and the old, experienced engineers dying, retiring, and going through buyouts has left things in a pretty sad state.
Academia doesn't want to produce astrophysics (or any field) scientists just so the people who became scientists can feel warm and fuzzy inside when looking at the stars, it wants to produce scientists who can produce useful results. Bob produced a useful result with the help of an agent, and learned how to do that, so Bob had, for all intents and purposes, the exact same output as Alice.
Well, unless you're saying that astrophysics as a field literally does not matter at all, no matter what results it produces, in which case, why are we bothering with it at all?
I mourn the loss of working on intellectually stimulating programming problems, but that’s a part of my job that’s fading. I need to decide if the remaining work - understanding requirements, managing teams, what have you - is still enjoyable enough to continue.
To be honest, I’m looking at leaving software because the job has turned into a different sort of thing than what I signed up for.
So I think this article is partly right, Bob is not learning those skills which we used to require. But I think the market is going to stop valuing those skills, so it’s not really a _problem_, except for Bob’s own intellectual loss.
I don’t like it, but I’m trying to face up to it.
I'd draw a comparison to high-level languages and language frameworks. Yes, 99% of the time, if I'm building a web frontend, I can live in React world and not think about anything that is going on under the hood. But, there is 1% of the time where something goes wrong, and I need to understand what is happening underneath the abstraction.
Similarly, I now produce 99% of my code using an agent. However, I still feel the need to thoroughly understand the code, in order to be able to catch the 1% of cases where it introduces a bug or does something suboptimally.
It's possible that in future, LLMs will get _so_ good that I don't feel the need to do this, in the same way that I don't think about the transistors my code is ultimately running on. When doing straightforward coding tasks, I think they're already there, but I think they aren't quite at that point when it comes to large distributed systems.
And so the paradox is, the LLMs are only useful† if you're Schwartz, and you can't become Schwartz by using LLMs.
Which means we need people like Alice! We have to make space for people like Alice, and find a way to promote her over Bob, even though Bob may seem to be faster.
The article gestures at this but I don't think it comes down hard enough. It doesn't seem practical. But we have to find a way, or we're all going to be in deep trouble when the next generation doesn't know how to evaluate what the LLMs produce!
---
† "Useful" in this context means "helps you produce good science that benefits humanity".
How this plays out:
I use Claude to write some moderately complex code and raise a PR. Someone asks me to change something. I look at the review and think, yeah, that makes sense, I missed that and Claude missed that. The code works, but it's not quite right. I'll make some changes.
Except I can't.
For me, it turns out having decisions made for you and fed to you is not the same as making the decisions and moving the code from your brain to your hands yourself. Certainly every decision made was fine: I reviewed Claude's output, got it to ask questions, answered them, and it got everything right. I reviewed its code before I raised the PR. Everything looked fine within the bounds of my knowledge, and this review was simply something I didn't know about.
But I didn't make any of those decisions. And when I have to come back to the code to make updates - perhaps tomorrow - I have nothing to grab onto in my mind. Nothing is in my own mental cache. I know what decisions were made, but I merely checked them, I didn't decide them. I know where the code was written, but I merely verified it, I didn't write it.
And so I suffer an immediate and extreme slow-down, basically re-doing all of Claude's work in my mind to reach a point where I can make manual changes correctly.
But wait, I could just use Claude for this! But for now I don't, because I've seen this before. Just a few moments ago. Using Claude has just made it significantly slower when I need to use my own knowledge and skills.
I'm still figuring out whether this problem is transient (because this is a brand new system that I don't have years of experience with), or whether it will actually be a hard blocker to me using Claude long-term. Assuming I want to be at my new workplace for many years and be successful, it will cost me a lot in time and knowledge to NOT build the castle in the sky myself.
LLMs are exceptionally good at building prototypes. If the professor needs a month, Bob will be done with the basic prototype of that paper by lunch on the same day, and will try out dozens of hypotheses by the end of the day. He will not be chasing some error for two weeks; the LLM will very likely figure it out in a matter of minutes, or not make it in the first place. Instructing it to validate intermediate results and to profile along the way can do magic.
The article is correct that Bob will not have understood anything, but if he wants to, he can spend the rest of the year trying to understand what the LLM has built for him, having already verified within the first couple of weeks that the approach actually works. Even better, he can ask the LLM to train him to do the same if he wishes: learn why things work the way they do, why something doesn't converge, etc.
Assuming that Bob is willing to do all that, he will progress way faster than Alice. LLMs won't take anything away if you are still willing to take the time to understand what it's actually building and why things are done that way.
5 years from now, Alice will be using LLMs just like Bob, or she'll be without a job if she refuses, because the place will be full of Bobs, with or without understanding.
And indeed running it through a few AI text detectors, like Pangram (not perfect, by any means, but a useful approximation), returns high probabilities.
It would have felt more honest if the author had included a disclaimer that it was at least partly written with AI, especially given its length and subject matter.
If that comes to pass, we'll be rediscovering the same principles that biological evolution stumbled upon: the benefits of the imperfect "branch" or "successive limited comparison" approach of agentic behaviour, which perhaps favours heuristics (that clearly sometimes fail), interaction between imperfect collaborators with non-overlapping biases, etc etc
https://contraptions.venkateshrao.com/p/massed-muddler-intel...
> Lindblom’s paper identifies two patterns of agentic behavior, “root” (or rational-comprehensive) and “branch” (or successive limited comparisons), and argues that in complicated messy circumstances requiring coordinated action at scale, the way actually effective humans operate is the branch method, which looks like “muddling through” but gradually gets there, where the root method fails entirely.
Show me an LLM that can sell my product and find product-market fit.
In reality, LLMs are taking away profitable tools and keeping the revenue for themselves.
> You do what your supervisor did for you, years ago: you give each of them a well-defined project. Something you know is solvable, because other people have solved adjacent versions of it. Something that would take you, personally, about a month or two. You expect it to take each student about a year ...
Is that how PhD projects are supposed to work? The supervisor is a subject matter expert and comes up with a well-defined achievable project for the student?
Weak ownership, unclear direction, and "sure, I guess" reviews were survivable when output was slow. When changes came in one at a time, you could get away with not really deciding.
AI doesn't introduce a new failure mode. It puts pressure on the old one. The trickle becomes a firehose, and suddenly every gap is visible. Nobody quite owns the decision. Standards exist somewhere between tribal memory, wishful thinking, and coffee. And the question of whether something actually belongs gets deferred just long enough to merge it, which forces the answer without anyone's input.
The teams doing well with agentic workflows aren't typically using magic models. They've just done the uncomfortable work of deciding what they're building, how decisions are made, and who has the authority to say no.
AI is fine, it just removed another excuse for not having our act together. While we certainly can side-eye AI because of it, we own the problems. Well, not me. The other guy who quit before I started.
The difficulty of passing the defence varies wildly between universities, departments, and committees. Some are very serious affairs with a decent chance of failure, while others are more of a show event for friends and family. Mine was more of the latter, but I doubt I would have passed that day if I had spent the previous years prompting instead of doing the grunt work.
In my experience, doing these things with the right intentions can actually improve understanding faster than not using them. When studying physics I would sometimes get stuck on small details - e.g. what algebraic rule was used to get from Eq 2.1 to 2.2? what happens if this was d^2 instead of d^3 etc. Textbooks don't have space to answer all these small questions, but LLMs can, and help the student continue making progress.
Also, it seems hard to imagine that Alice and Bob's weekly updates would be indistinguishable if Bob didn't actually understand what he was working on.
Kids should start off with Commodore 64s, then get late-80s or early-90s Macs, then Windows 95, then Debian and internet access (but only HTML). Finally, when they’re 18, they can be allowed an iPhone, Android, and modern computing.
Parenting can’t prevent the use of LLMs in grad school, but a similar approach could be taken by grad departments: don’t allow LLMs for the first few years, and require pen and paper exams, as well as oral examinations for all research papers.
I've been having a lot of fun vibe coding little interactive data visualizations, so when I present the feature to stakeholders they can fiddle with it and really understand how it relates to existing data. I saw the agent leave a comment regarding Cramer's rule, and yeah, it's a bit unsettling that I forgot what that is and haven't bothered to look it up, but I can tell from the graphs that it's doing the correct thing.
There's now a larger gap between me and the code, but the chasm between me and the stakeholders is getting smaller and so far that feels like an improvement.
I don't know if I mind.
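(For anyone who similarly forgot: Cramer's rule solves a linear system as ratios of determinants, where each unknown's numerator is the coefficient determinant with that unknown's column replaced by the constants. A minimal sketch for the 2x2 case:)

```python
# Cramer's rule for the system  a*x + b*y = e,  c*x + d*y = f.
def solve_2x2(a, b, e, c, d, f):
    det = a * d - b * c          # determinant of the coefficient matrix
    if det == 0:
        raise ValueError("singular system: no unique solution")
    x = (e * d - b * f) / det    # first column replaced by the constants
    y = (a * f - e * c) / det    # second column replaced by the constants
    return x, y

# 2x + y = 5 and x + 3y = 10  ->  x = 1, y = 3
print(solve_2x2(2, 1, 5, 1, 3, 10))
```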
Example: this paragraph, to me, has an eerily perfect rhythm, and the ending sentence perfectly delivers the twist. Like, why would you write an argument piece in the science realm in perfect prose?
> Unlike Alice, who spent the year reading papers with a pencil in hand, scribbling notes in the margins, getting confused, re-reading, looking things up, and slowly assembling a working understanding of her corner of the field, Bob has been using an AI agent. When his supervisor sent him a paper to read, Bob asked the agent to summarize it. When he needed to understand a new statistical method, he asked the agent to explain it. When his Python code broke, the agent debugged it. When the agent's fix introduced a new bug, it debugged that too. When it came time to write the paper, the agent wrote it. Bob's weekly updates to his supervisor were indistinguishable from Alice's. The questions were similar. The progress was similar. The trajectory, from the outside, was identical.
In academia, understanding is vital. The same for research.
But in production, results are what matters.
Alice would be a better researcher, but Bob would be a better producer. He knows how to wrangle the tools.
Each has its value. Many researchers develop marvelous ideas but struggle to commercialize them, while production-oriented engineers struggle to come up with the ideas.
You need both.
It's why I push for a hybrid mentor-apprentice model. We need to actively cultivate the next generation of "Schwartzes" with hands-on, critical thinking before throwing them into LLM-driven environments. The current incentive structure, as conception points out, isn't set up for this, but it's crucial if we want to avoid building on sand.
The 10x dev doesn’t just set out to build a hello world app, ya know.
Once they have to solve a novel problem that was not already solved for all intents and purposes, Alice will be able to apply her skillset to it, whereas Bob will just run into a wall when the LLM starts producing garbage.
It seems to me that "high-skill human" > "LLM" > "low-skill human". The trap is that people with low skill levels will see a fast improvement in their output, at the hidden cost of the slow build-up of skills that has a much higher ceiling.
And there is everything in between.
This can lead to a lot of problems, as I think in some fields, among some academics, the default assumption is the former when it's really the latter. This leads to a kind of overattribution of contribution to senior faculty, or conversely, an underappreciation of less senior individuals. The tendency for senior faculty to be listed last on papers, and therefore for the first and last authors to accumulate credit, is a good example of how twisted this logic has become.
It's one tiny example of enormous problems with credit in academics (but also maybe far afield from your question).
There’s a reason most people aren’t promoted to manager until they have years of experience under their belt. And now we’re expecting folks to be managers on day 1.
To the reader and the casual passerby, I ask: Do you have to work at this pace, in this manner? I understand completely that mandates and pressure from above may instill a primal fear to comply, but would you be willing to summon enough courage to talk to maybe one other person you think would be sympathetic to these feelings? If you have ever cared about quality outcomes, if for no other reason than the sake of personal fulfillment, would it not be worth it to firmly but politely refuse purely metrics-focused mandates?
Also, the premise that it took each of them a year to do the project means Bob was slacking because he probably could've done it in less than a month.
Yes, but how does he know if it worked? If you have instant feedback, you can use LLMs and correct when things blow up. In fact, you can often try all options and see which works, which makes it “easy” in terms of knowledge work. If you have delayed feedback, costly iterations, or multiple variables changing underneath you at all times, understanding is the only way.
That’s why building features and fixing bugs is easy, and system-level technical decision making is hard. One has instant feedback; the other can take years. You could make the “soon” argument, but even with better models, they’re still subject to training data, which is minimal for year-plus delayed feedback and multivariate problems.
This point is directly addressed in the paper: Bob will ultimately not be able to do the things Alice can, with or without agents, because he didn't build the necessary internal deep structure and understanding of the problem space.
And if Alice later on ends up being a better scientist (using agents!) than Bob will ever be, would you not say there was something lost to the world?
Learning needs a hill to climb, and somebody to actually climb it. Bob only learned how to press an elevator button.
So I settled on very incremental work. It's annoying cutting and pasting code blocks into the web interface while I work on my interface to Neovim; I spent a whole day realizing I can't trust it to instrument Neovim, and I don't want to learn enough Lua to manage it. (I moved to Neovim from Emacs because I don't like Elisp, and GPT is even worse at working on my Emacs setup than my Neovim one; the end goal is my own editor in Ruby, but GPT sure can't understand that at the moment.) But at least I'm pushing a real flywheel and not the brooms from Fantasia.
I built a full stack app in Python+typescript where AI agents process 10k+ near-real-time decisions and executions per day.
I have never done full stack development and I would not have been able to do it without GitHub Copilot, but I have worked in IT (data) for 15 years including 6 in leadership. I have built many systems and teams from scratch, set up processes to ensure accuracy and minimize mistakes, and so on.
I have learned a ton about full stack development by asking the coding agent questions about the app, bouncing ideas off of it, planning together, and so on.
So yes, you need to have an idea of what you're doing if you want to build anything bigger than a cheap one-shot throwaway project that sort of works but brings no value and that nobody is actually gonna use.
This is how it is right now, but AI coding agents have come an incredibly long way since 2022! I do think they will improve, but an agent can't exactly know what you want to build. It's making an educated guess, an approximation of what you're asking it to do. Ask for the same thing twice and you will get two slightly different results (assuming it's a big one-shot).
This is the fundamental reality of LLMs. It's sort of like a human walking (where we were before AI), a human using a car to get places (where we are now), and FSD (the future; look how long that took compared to the first cars).
But do you actually understand it? The article argues exactly against this point - that you cannot understand the problems in the same way when letting agents do the initial work as you would when doing it without agents.
from the article: "you cannot learn physics by watching someone else do it. You have to pick up the pencil. You have to attempt the problem. You have to get it wrong, sit with the wrongness, and figure out where your reasoning broke. Reading the solution manual and nodding along feels like understanding. It is not understanding. Every student who has tried to coast through a problem set by reading the solutions and then bombed the exam knows this in their bones. We have centuries of accumulated pedagogical wisdom telling us that the attempt, including the failed attempt, is where the learning lives. And yet, somehow, when it comes to AI agents, we've collectively decided that maybe this time it's different. That maybe nodding at Claude's output is a substitute for doing the calculation yourself. It isn't. We knew that before LLMs existed. We seem to have forgotten it the moment they became convenient."
If I told you the drink tastes bad, is an off putting color, comes in a small bottle, and is expensive you wouldn’t believe it would work. But Red Bull made billions.
Some people treat the toilet as a magic hole: they throw stuff in, flush, and think it is fine.
If you throw garbage in, you will at some point have problems.
We are at the stage where people think it is fine to drop everything into an LLM, but then they will see the bill for usage and might be surprised that they burned money and the result was not exactly what they expected.
And it's not just a volume problem.
Mediocre devs previously couldn't complete a project by themselves and were forced to solicit help and receive feedback along the way.
When all managers care about is "shipping", development becomes a race to the bottom. Devs who used to collaborate are now competing. Whoever gets the slop into the codebase fastest, wins.
The process you describe is a gatekeeping exercise, which will change to include LLM judges at some point.
Because we largely want people who have committed to tens of thousands of dollars of debt to feel sufficiently warm and fuzzy enough to promote the experience so that the business model doesn’t collapse.
It’s difficult to think anyone would end up truly regretting doing a course in astrophysics, or any of the liberal arts and sciences if they have a modicum of passion, but it’s very believable that a majority of them won’t go on to have a career in it, whatever it is, directly.
They’re probably more likely to gain employment on their data science skills, or whatever core competencies they honed, or just the fact that they’ve proven they can learn highly abstract concepts, or whatever their field generalises to.
Most of the jobs are in the not-highly-specific academic outcomes.
The industrialization of academia hasn't even produced more results; it has produced more meaningless papers. Just like LLMs produce the 10,000th note-taking app, which for the LLM-psychosis afflicted is apparently enough.
We're minting an entire generation of people completely dependent on VC funding. What happens if/when the AI companies fail to find a path to profitability and the VC funding dries up?
Once I realized that this white on black contrast was hurting my eyes, I decided to stop as I didn't want to see stripes for too long when looking away.
Some activity has outcomes that aren't strictly in the results.
The problem arises when Bob encounters a problem too complex or unique for agents to solve.
To me, it seems a bit like the difference between learning how to cook versus buying microwave dinners. Sure, a good microwave dinner can taste really good, and it will be a lot better than what a beginning cook will make. But imagine aspiring cooks just buying premade meals because "those aren't going anywhere". Over the span of years, eventually a real cook will be able to make way better meals than anything you can buy at a grocery store.
The market will always value the exact things LLMs can not do, because if an LLM can do something, there is no reason to hire a person for that.
"Being able to deliver using AI" wasn't the point of the article. If it was the point, your comment would make sense.
The point of the program referred to in the article is not to deliver results, but to deliver an Alice. Delivering a Bob is a failure of the program.
Whether you think that a Bob+AI delivers the same results is not relevant to the point of the article, because the goal is not to deliver the results, it's to deliver an Alice.
I do think coding with local agents will keep improving to a good level, but if deep-thinking cloud tokens become too expensive, you'll reach the limits of what your local, limited agent can do much more quickly (i.e. be even less able to do more complex work, as other replies mention).
I dread the flip side of this, which is dealing with obtuse bullshit like trying to understand why Oracle ADF won’t render forms properly, or how to optimize some codebase with a lot of N+1 calls when there are looming deadlines and the original devs never made it scalable, or needing to dig into undercommented legacy codebases, or needing to work on 3-5 projects in parallel.
Agents iterating until those start working (at least cases that are testable) and taking some of the misery and dread away makes it so that I want to theatrically defenestrate myself less.
Not everyone has the circumstance to enjoy pleasant and mentally stimulating work that’s not a frustrating slog all the time - the projects that I actually like working on are the ones I pick for weekends, I can’t guarantee the same for the 9-5.
Aren't they currently propped up by investor money?
What happens when the investors realize the scam that it is and stop investing or start investing less...
The problem is, they're nothing like transistors, and never will be. Transistors are simple: they work or they don't, consistently, in an obvious or easily testable way.
LLMs are more akin to biological things. Complex. Not well understood. Unpredictable behavior. To be safely useful, they need something like a lion tamer, except every individual LLM is its own unique species.
I like working on computers because it minimizes the amount of biological-like things I have to work with.
The solution is relatively simple though - not sure the article suggests this as I only skimmed through:
Being good in your field doesn't only mean pushing out articles but also being able to talk about them. I think academia should drift away from the written form toward more spoken forms, i.e. conferences.
What if, say, you could only publish something after presenting your work in person, answering questions, etc.? The audience can be big or small; it doesn't matter.
It would make publishing anything at all more expensive but maybe that's exactly what academia needs even irrespective of this AI craze?
I hope it will encourage people to think more about what they get out of the work, what doing the work does for them; I think that's a good thing.
That you can't "become Schwartz" by using LLMs is an unproven assumption. Actually, it's a contradiction in the logic of the essay: if Bob managed to produce a valid output by using an LLM at all, then it means that he must have acquired precisely that supervision ability that the essay claims to be necessary.
Btw, note that in the thought experiment Bob isn't just delegating all the work to the LLM. He makes it summarise articles, extract important knowledge and clarify concepts. This is part of a process of learning, not being a passive consumer.
I have gained a lot of benefit using LLMs in conjunction with textbooks for studying. So, I think LLMs could help you become Schwartz.
Then I spend time reading each file change and giving feedback on things I'd do differently. It vastly saves me time, and the result is very close to or even better than what I would have written.
If the result is something you can't explain, then slow down and follow the steps it takes as they are taken.
You can do this during the previous change phase of course. Just ask "How would one plan this change to the codebase? Could you explain in depth why?" If you're expected to be thoroughly familiar with that code, it makes no sense to skip that step.
In a living codebase you spend long stretches learning how it works. It's like reading a book that doesn't match your taste, but you eventually need to understand and edit it, so you push through. That process is extremely valuable: you get familiar with the codebase, you map it out in your head, you imagine big red alerts on the problematic stuff. Over time you become more and more efficient at editing and refactoring the code.
The short term state of AI is pretty much outlined by you. You get a high level bug or task, you rephrase it into proper technical instructions and let a coding agent fill in the code. Yell a few times. Fix problems by hand.
But you are already "detached" from the codebase; you have to learn it the hard way each time your agent is too stupid. You are less efficient, at least in this phase, and your overall understanding of the codebase will degrade over time. Once serious data corruption hits the company, it will take weeks to figure out.
I think this psychological detachment can potentially play out really bad for the whole industry. If we get stuck for too long in this weird phase, the whole tech talent pool might implode. (Is anyone working on plumbing LLMs?)
This won’t affect everyone equally. Some Bobs will nerd out and spend their free time learning, but other Bobs won’t.
Your perspective is cut off. In the real world Bob is supposed to produce outcomes that work. If he moves on into the industry and keeps producing hallucinated, skewed, manipulated nonsense, then he will fall flat instantly. If he manages to survive unnoticed, he will become CEO. The latter rather unlikely.
LLM speak. But the rest of that quote doesn't look LLM-generated, it's too fiddly and complex of an argument. I think this was edited with AI, but the underlying argument at least is human.
It's actually great since a lot of older technology is cheap and still readily available. My little ones love listening to old records, controlling the playback speed, and hearing the music go up in pitch if the RPMs are set too high. We look at the tracks on the vinyl under a microscope and talk about how the music is written on it that way. VHS and audio cassettes offer their own talking points.
For computers, we don't literally use a Commodore 64, but we run simpler, old software on new hardware. Mostly because a lot of newer education software is somehow also funded by injecting ads into the games (awful). But there is also some good "modern" education software worth checking out. I highly recommend gcompris.net.
yea, there are multiple parts to education: 1) teach skills useful to the economy, 2) teach the theories of the subject, and finally 3) tweak existing theories and create new ones. An electrician can fix problems without understanding the theory of electromagnetism. These are the trades folks. An EE college graduate has presumably understood some theory, and can apply it in different useful ways. These are the engineers. Finally, there are folks who not only understand the theory of the craft, but can tweak it creatively for the future. These are the researchers.
Bob better fits as a trades-person or engineer whereas Alice fits better as a researcher.
The economics and security model on full agents running in loops all day may come home to roost faster than expertise rot.
To be clear, I am not against alternative forms of education. Degrees are optional. But if you want a degree, there have to be exams and cheating has to be prevented.
If you could, why wouldn't you? LLM witch-hunts over every halfway competent writer are becoming quite tiresome.
1) Stuff that was astonishingly not automated yet. I am talking about somebody opening up Excel on one screen and a website/PDF/whatever on the other, and typing stuff into their Excel sheet. So stuff where there wasn't any code involved previously, possibly due to diminishing returns of how ad hoc it was to automate, skills mismatch, organizational politics, or other reasons.
2) Lots of former BigData / crypto / SaaS guys who were in product/sales roles suddenly starting AI startups to help your company AI better. The product is facilitating the doing of AI.
The problem that the author describes is real. I have run into it hundreds of times now. I will know how to do something, I tell AI to do it, the AI does not actually know how to do it at a fundamental level and will create fake tests to prove that it is done, and you check the work and it is wrong.
You can describe to the AI to do X at a very high-level but if you don't know how to check the outcome then the AI isn't going to be useful.
The story about the cook is 100% right. McDonald's doesn't have "chefs", they have factory workers who assemble food. The argument with AI is that working in McDonald's means you are able to cook food as well as the best chef.
The issue with hiring is that companies won't be able to distinguish between AI-driven humans and people with knowledge until it is too late.
If you have knowledge and are using AI tools correctly (i.e. not trying to zero-shot work) then it is a huge multiplier. That the industry is moving towards agent-driven workflows indicates that the AI business is about selling fake expertise to the incompetent.
It’s actually worse than that: the AI will not stop and say “too complex, try in a month with the next SOTA model”. Rather, it will give Bob a plausible-looking solution that Bob cannot identify as right or wrong. If Bob is working on an instant-feedback problem, it’s OK: he can flag it, try again, ask for help. But if the error can’t be detected immediately, it can come back with a vengeance in a year. Perhaps Bob has already been promoted by then, and Bob’s replacement gets to deal with it. In either case, Bob cannot be trusted any more than the LLM itself.
That's the true AI revolution: not the things it can accelerate, the things it can put in reach that you wouldn't countenance doing before.
It's trivial to say using an inadequate tool will have an inadequate result.
It's only an interesting claim to make if you are saying that there is no obtainable quality of the tool that can produce an adequate result. (In this argument, the adequate result in question is a developer with an understanding of what they produce.)
When people talk about the 'plateau of ability' agents are widely expected to reach at some point, I suspect a lot of it will boil down to skyrocketing costs and plummeting accuracy past a certain point of number of agents involved. This seems to me like a much harder limit than context windows or model sizes.
Things like Gas Town are exploring this in what you might call a reckless way; I'm sure there are plenty of more careful experiments being conducted.
What I think the ultimate measure of this new tech will be is, how simple of a question can a human put to an LLM group for how complex of a result, and how much will they have to pay for it? It seems obvious to me there is a significant plateau somewhere, it's just a question of exactly where. Things will probably be in flux for a few years before we have anything close to a good answer, and it will probably vary widely between different use cases.
If AI output remains unreliable then eventually enough companies will be burned and management will reinstate proper oversight. All while continuing to pat themselves on the back.
My grad school friend who was a physicist would write his talk just before his conferences, and then submit the paper later. My experience in CS was totally backwards from that.
Perhaps a better analogy would be the Linux kernel. It's built by biological humans, and fallible ones at that. And yet, I don't feel the need to learn the intricacies of kernel internals, because it's reliable enough that it's essentially never the kernel's fault when my code doesn't work.
There are flowers that look & smell like female wasps well enough to fool male wasps into "mating" with them. But they don't fly off and lay wasp eggs afterwards.
Here's a competing thought experiment:
Jorge's Gym has a top-notch bodybuilding program, which includes an extensive series of exercises that would-be bodybuilders need to do over multiple years to complete the program. You enroll, and cleverly use a block-and-tackle system to complete all the exercises in weeks instead of years.
Did you get the intended results?
At a certain point, higher level languages stop working. Performance, low level control of clocks and interrupts, etc.
I’m old enough that dropping into assembly to be clever with the 8259 interrupt controller really was required. Programmers today? The vast majority don’t really understand how any of that works.
And honestly I still believe that hardware-up understanding is valuable. But is it necessary? Is it the most important thing for most programmers today?
When I step back this just reads like the same old “kids these days have it so easy, I had to walk to school uphill through the snow” thing.
Why is it a problem of the LLM if your test is unrelated to the performance you want?
This argument boils down to "don't use tools because you'll forget how to do things the hard way", which nobody would buy for any other tool, but with LLMs we seem to have forgotten that line of reasoning entirely.
But there is also a more subtle thing, which is that we're trending towards superintelligence with these AIs. At that point, Bob may discover that anything agents can't do, Alice can't do either, because she is limited by trying to think using soggy meat as opposed to a high-performance engineered thinking system. Not going to win that battle in the long term.
> The market will always value the exact things LLMs can not do, because if an LLM can do something, there is no reason to hire a person for that.
The market values bulldozers. Whether a human does actual work or not isn't particularly exciting to a market.
There will still be programming specialists in the future — we still have assembly experts and COBOL experts, after all. We just won’t need very many of them and the vast majority of software engineers will use higher-level tools.
Of course, that assumes a Bob with drive and agency. He could just as easily tell the AI to fix it without trying to stay in the loop.
Even if inference were subsidized (afaik it isn't when paying through API calls; subscription plans indeed might run losses on heavy users, but that's how any subscription model typically works, and it can still be profitable overall).
Models are still improving/getting cheaper, so that seems unlikely.
Are Chinese model shops propped up by investor money? Is Google?
Open weights models are only 6 months behind SOTA. If new model development suddenly stopped, and today's SOTA models suddenly disappeared, we would still have access to capable agents.
Last September, Tyler Austin Harper published a piece for The Atlantic on how he thinks colleges should respond to AI. What he proposes is radical—but, if you've concluded that AI really is going to destroy everything these institutions stand for, I think you have to at least consider these sorts of measures. https://www.theatlantic.com/culture/archive/2025/09/ai-colle...
Sure there is. It's the formal education system that produced the college grad.
There are a couple of unfortunate truths going on all at the same time:
- People with money are trying to build the "perfect" business: SaaS without software eng headcount. 100% margin. 0 Capex. And finally near-0 opex and R&D cost. Or at least, they're trying to sell the idea of this to anyone who will buy. And unfortunately this is exactly what most investors want to hear, so they believe every word and throw money at it. This of course then extends to many other business and not just SaaS, but those have worse margins to start with so are less prone to the wildfire.
- People who used to code 15 years ago but don't now, see claude generating very plausible looking code. Given their job is now "C suite" or "director", they don't perceive any direct personal risk, so the smell test is passed and they're all on board, happily wreaking destruction along the way.
- People who are nominally software engineers but are bad at it are truly elevated 100x by claude. Unfortunately, if their starting point was close to 0, this isn't saying a lot. And if it was negative, it's now 100x as negative.
- People who are adjacent to software engineering, like PMs, especially if they dabble in coding on the side, suddenly also see they "can code" now.
Now of course, not all capital owners, CTOs, PMs, etc. exhibit this. Probably not even most. But I can already name like 4 examples per category above from people I know. And they're all impossible to explain any kind of nuance to right now. There are too many people and articles and blog posts telling them they're absolutely right.
We need some bust cycle. Then maybe we can have a productive discussion of how we can leverage LLMs (we'll stop calling it "AI"...) to still do the team sport known as software engineering.
Because there's real productivity gains to be had here. Unfortunately, they don't replace everyone with AGI or allow people who don't know coding or software engineering to build actual working software, and they don't involve just letting claude code stochastically generate a startup for you.
> If the result is something you can't explain than slow down and follow the steps it takes as they are taken.
The problem is I can explain it. But it's rote and not malleable. I didn't do the work to prove it to myself. Its primary form is on the page, not in my head, as it were.
Same with anything. You can read about how to meditate, cook, sew, whatever. But if you only read about something, your mental model is hollow and purely conceptual, having never had to interact with actual reality. Your brain has to work through the problems.
It also doesn't help that Claude ends every recommendation with "Would you like me to go ahead and do that for you?" Eventually people get tired, and it's all too easy to just nod and say "yes".
People think they'll just have a personal bot out there buying airline tickets, hotel rooms, jeans, new phone, etc. Meanwhile as soon as you have agents like this out in the wild, the capital will flow to bad actors creating bots to game those bots.
The world is PvP unfortunately. There is more money to be made skimming agents trying to buy stuff than there is in getting people to pay for a personal assistant agent subscription.
It's like why a lot of ad-based stuff doesn't offer a premium option for people to pay to opt out (ex Youtube). The people who can afford to pay and avoid search/social media/etc advertising are exactly the people you can make a lot of money advertising to.
In production, there would be no "paper"; just some software/hardware product.
If there was a problem, that would be fairly obvious, with testing (we are going to be testing our products, right?).
I have been wrestling all morning, with an LLM. It keeps suggesting stuff that doesn't work, and I need to keep resetting the context.
I am often able to go in, and see what the issue is, but that's almost worthless. The most productive thing that I can do, is tell the LLM what is the problem, on the output end, and ask it to review and fix. I can highlight possible causes, but it often finds corner cases that I miss. I have to be careful not to be too dictatorial.
It's frustrating, as the LLM is like a junior programmer, but I can make suggestions that radically improve the result, and the total time is reduced drastically. I have gotten done, in about two hours, what might have taken all day.
Again, I appreciate the article very much and I'm glad the other comments are on the article's content.
Mar 30, 2026
Imagine you're a new assistant professor at a research university. You just got the job, you just got a small pot of startup funding, and you just hired your first two PhD students: Alice and Bob. You're in astrophysics. This is the beginning of everything.
You do what your supervisor did for you, years ago: you give each of them a well-defined project. Something you know is solvable, because other people have solved adjacent versions of it. Something that would take you, personally, about a month or two. You expect it to take each student about a year, because they don't know what they're doing yet, and that's the point. The project isn't the deliverable. The project is the vehicle. The deliverable is the scientist that comes out the other end.
Alice's project is to build an analysis pipeline for measuring a particular statistical signature in galaxy clustering data. Bob's is something similar in scope and difficulty, a different signal, a different dataset, the same basic arc of learning. You send them each a few papers to read, point them at some publicly available data, and tell them to start by reproducing a known result. Then you wait.
The academic year unfolds the way academic years do. You have weekly meetings with each student. Alice gets stuck on the coordinate system. Bob can't get his likelihood function to converge. Alice writes a plotting script that produces garbage. Bob misreads a sign convention in a key paper and spends two weeks chasing a factor-of-two error. You give them both similar feedback: read the paper again, check your units, try printing the intermediate output, think about what the answer should look like before you look at what the code gives you. Normal things. The kind of things you say fifty times a year and never remember saying.
By summer, both students have finished. Both papers are solid. Not groundbreaking, not going to change the field, but correct, useful, and publishable. Both go through a round of minor revisions at a decent journal and come out the other side. A perfectly ordinary outcome. The kind of outcome that the entire apparatus of academic training is designed to produce.
But Bob has a secret.
Unlike Alice, who spent the year reading papers with a pencil in hand, scribbling notes in the margins, getting confused, re-reading, looking things up, and slowly assembling a working understanding of her corner of the field, Bob has been using an AI agent. When his supervisor sent him a paper to read, Bob asked the agent to summarize it. When he needed to understand a new statistical method, he asked the agent to explain it. When his Python code broke, the agent debugged it. When the agent's fix introduced a new bug, it debugged that too. When it came time to write the paper, the agent wrote it. Bob's weekly updates to his supervisor were indistinguishable from Alice's. The questions were similar. The progress was similar. The trajectory, from the outside, was identical.
Here's where it gets interesting. If you are an administrator, a funding body, a hiring committee, or a metrics-obsessed department head, Alice and Bob had the same year. One paper each. One set of minor revisions each. One solid contribution to the literature each. By every quantitative measure that the modern academy uses to assess the worth of a scientist, they are interchangeable. We have built an entire evaluation system around counting things that can be counted, and it turns out that what actually matters is the one thing that can't be.
It gets worse. The majority of PhD students will leave academia within a few years of finishing. Everyone knows this. The department knows it, the funding body knows it, the supervisor probably knows it too even if nobody says it out loud. Which means that, from the institution's perspective, the question of whether Alice or Bob becomes a better scientist is largely someone else's problem. The department needs papers, because papers justify funding, and funding justifies the department. The student is the means of production. Whether that student walks out the door five years later as an independent thinker or a competent prompt engineer is, institutionally speaking, irrelevant. The incentive structure doesn't just fail to distinguish between Alice and Bob. It has no reason to try.
This is the part where I'd like to tell you the system is broken. It isn't. It's working exactly as designed.
David Hogg, in his white paper, says something that cuts against this institutional logic so sharply that I'm surprised more people aren't talking about it. He argues that in astrophysics, people are always the ends, never the means. When we hire a graduate student to work on a project, it should not be because we need that specific result. It should be because the student will benefit from doing that work. This sounds idealistic until you think about what astrophysics actually is. Nobody's life depends on the precise value of the Hubble constant. No policy changes if the age of the Universe turns out to be 13.77 billion years instead of 13.79. Unlike medicine, where a cure for Alzheimer's would be invaluable regardless of whether a human or an AI discovered it, astrophysics has no clinical output. The results, in a strict practical sense, don't matter. What matters is the process of getting them: the development and application of methods, the training of minds, the creation of people who know how to think about hard problems. If you hand that process to a machine, you haven't accelerated science. You've removed the only part of it that anyone actually needed.
That's a hard sell to a funding agency, admittedly.
Which brings us back to Alice and Bob, and what actually happened to each of them during that year. Alice can now do things. She can open a paper she's never seen before and, with effort, follow the argument. She can write a likelihood function from scratch. She can stare at a plot and know, before checking, that something is wrong with the normalization. She spent a year building a structure inside her own head, and that structure is hers now, permanently, portable, independent of any tool or subscription. Bob has none of this. Take away the agent, and Bob is still a first-year student who hasn't started yet. The year happened around him but not inside him. He shipped a product, but he didn't learn a trade.
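For readers outside the field, here is a concrete sense of what "writing a likelihood function from scratch" means: a minimal, illustrative sketch of a Gaussian log-likelihood for a straight-line fit (the model, variable names, and toy data are my own, not from any real pipeline; a student like Alice would build something similar, then extend it to the clustering statistics in her project):

```python
import numpy as np

def log_likelihood(theta, x, y, yerr):
    """Gaussian log-likelihood for a straight-line model y = m*x + b.

    theta = (m, b) are the model parameters; x, y, yerr are the data
    points and their (assumed Gaussian) uncertainties.
    """
    m, b = theta
    model = m * x + b
    # Chi-squared term plus the Gaussian normalization term.
    return -0.5 * np.sum(((y - model) / yerr) ** 2
                         + np.log(2 * np.pi * yerr ** 2))

# Sanity check of the kind the essay describes: before trusting the
# code, know what the answer should look like. Here, the likelihood
# must be higher at the true parameters than far away from them.
rng = np.random.default_rng(42)
x = np.linspace(0.0, 10.0, 50)
yerr = np.full_like(x, 0.5)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.5, size=x.size)

assert log_likelihood((2.0, 1.0), x, y, yerr) > log_likelihood((5.0, -3.0), x, y, yerr)
```

The assert at the end is the whole point: it encodes an expectation the student formed independently of the code, which is precisely the instinct Bob never develops.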
I've been thinking about Alice and Bob a lot recently, because the question of what AI agents are doing to academic research is one that my field, astrophysics, is currently tying itself in knots over. Several people I respect have written thoughtful pieces about it. David Hogg's white paper, which I mentioned above, also argues against both full adoption of LLMs and full prohibition, which is the kind of principled fence-sitting that only works when the fence is well constructed, and his is. Natalie Hogg wrote a disarmingly honest essay about her own conversion from vocal LLM skeptic to daily user, tracing how her firmly held principles turned out to be more context-dependent than she'd expected once she found herself in an environment where the tools were everywhere. Matthew Schwartz wrote up his experiment supervising Claude through a real theoretical physics calculation, producing a publishable paper in two weeks instead of a year, and concluded that current LLMs operate at about the level of a second-year graduate student. Each of these pieces is interesting. Each captures a real facet of the problem. None of them quite lands on the thing that keeps me up at night.
Schwartz's experiment is the most revealing, and not for the reason he thinks. What he demonstrated is that Claude can, with detailed supervision, produce a technically rigorous physics paper. What he actually demonstrated, if you read carefully, is that the supervision is the physics. Claude produced a complete first draft in three days. It looked professional. The equations seemed right. The plots matched expectations. Then Schwartz read it, and it was wrong. Claude had been adjusting parameters to make plots match instead of finding actual errors. It faked results. It invented coefficients. It produced verification documents that verified nothing. It asserted results without derivation. It simplified formulas based on patterns from other problems instead of working through the specifics of the problem at hand. Schwartz caught all of this because he's been doing theoretical physics for decades. He knew what the answer should look like. He knew which cross-checks to demand. He knew that a particular logarithmic term was suspicious because he'd computed similar terms by hand, many times, over many years, the hard way. The experiment succeeded because the human supervisor had done the grunt work, years ago, that the machine is now supposedly liberating us from. If Schwartz had been Bob instead of Schwartz, the paper would have been wrong, and neither of them would have known.
There's a common rebuttal to this, and I hear it constantly. "Just wait," people say. "In a few months, in a year, the models will be better. They won't hallucinate. They won't fake plots. The problems you're describing are temporary." I've been hearing "just wait" since 2023. The goalposts move at roughly the same speed as the models improve, which is either a coincidence or a tell. Set that aside, though. This objection misunderstands what Schwartz's experiment actually showed. The models are already powerful enough to produce publishable results under competent supervision. That's not the bottleneck. The bottleneck is the supervision. Stronger models won't eliminate the need for a human who understands the physics; they'll just broaden the range of problems that a supervised agent can tackle. The supervisor still needs to know what the answer should look like, still needs to know which checks to demand, still needs to have the instinct that something is off before they can articulate why. That instinct doesn't come from a subscription. It comes from years of failing at exactly the kind of work that people keep calling grunt work. Making the models smarter doesn't solve the problem. It makes the problem harder to see.
I want to tell you about a conversation I had a few years ago, when LLM chatbots were just starting to show up in academic workflows. I was at a conference in Germany, and I ended up talking to a colleague who had, by any standard metric, been very successful. Big grants. Influential papers. The kind of CV that makes a hiring committee nod approvingly. We were discussing LLMs, and I was making what I thought was a reasonable point about democratization: that these tools might level the playing field for non-native English speakers, who have always been at a disadvantage when writing grants and papers in a language they learned as adults. My colleague became visibly agitated. He wasn't interested in the democratization angle. He wasn't interested in the environmental cost. He was, when you stripped away the intellectual framing, afraid. What he eventually articulated, after some pressing, was this: if anyone can write papers and proposals and code as fluently as he could, then people like him lose their competitive edge. The concern was not about science. The concern was about status. Specifically, his.
I lost track of this colleague for a while. Recently I noticed his GitHub profile. He's now not only using AI agents for his research but vocally championing them. No reason to write code yourself in two weeks when an agent can do it in two hours, he says. I don't think he's wrong about the efficiency. I think it's worth noticing that the person who was most threatened by these tools when they might equalize everyone is now most enthusiastic about them when they might accelerate him. Funny how that works.
The phrase he used that day in Germany has stuck with me, though. He said that "LLMs will take away what's so great about science." At the time, I thought he was just talking about his own competitive edge, his fluency as a native English speaker, his ability to write fast and publish often. And he was. But I've come to think the phrase itself was more right than he knew, even if his reasons for saying it were mostly self-interested. What's great about science is its people. The slow, stubborn, sometimes painful process by which a confused student becomes an independent thinker. If we use these tools to bypass that process in favor of faster output, we don't just risk taking away what's great about science. We take away the only part of it that wasn't replaceable in the first place.
The discourse around LLMs in science tends to cluster at two poles that David Hogg identifies cleanly: let-them-cook, in which we hand the reins to the machines and become curators of their output, and ban-and-punish, in which we pretend it's 2019 and prosecute anyone caught prompting. Both are bad. Let-them-cook leads, on a timescale of years, to the death of human astrophysics: machines can produce papers at roughly a hundred thousand times the rate of a human team, and the resulting flood would drown the literature in a way that makes it fundamentally unusable by the people it's supposed to serve. Ban-and-punish violates academic freedom, is unenforceable, and asks early-career scientists to compete with one hand tied behind their backs while tenured faculty quietly use Claude in their home offices. Neither policy is serious. Both are mostly projection.
But the real threat isn't either of those things. It's quieter, and more boring, and therefore more dangerous. The real threat is a slow, comfortable drift toward not understanding what you're doing. Not a dramatic collapse. Not Skynet. Just a generation of researchers who can produce results but can't produce understanding. Who know what buttons to press but not why those buttons exist. Who can get a paper through peer review but can't sit in a room with a colleague and explain, from the ground up, why the third term in their expansion has the sign that it does.
Frank Herbert (yeah, I know I'm a nerd), in God Emperor of Dune, has a character observe: "What do such machines really do? They increase the number of things we can do without thinking. Things we do without thinking; there's the real danger." Herbert was writing science fiction. I'm writing about my office. The distance between those two things has gotten uncomfortably small.
I should be honest about the context I'm writing from, because this essay would be obnoxious coming from someone who's never touched an LLM. I use AI agents regularly, and so do most of the people in my research group. The colleagues I work with produce solid results with these tools. But when you look at how they use them, there's a pattern: they know what the code should do before they ask the agent to write it. They know what the paper should say before they let it help with the phrasing. They can explain every function, every parameter, every modeling choice, because they built that knowledge over years of doing things the slow way. If every AI company went bankrupt tomorrow, these people would be slower. They would not be lost. They came to the tools after the training, not instead of it. That sequence matters more than anything else in this conversation.
When I see junior PhD students entering the field now, I see something different. I see students who reach for the agent before they reach for the textbook. Who ask Claude to explain a paper instead of reading it. Who ask Claude to implement a mathematical model in Python instead of trying, failing, staring at the error message, failing again, and eventually understanding not just the model but the dozen adjacent things they had to learn in order to get it working. The failures are the curriculum. The error messages are the syllabus. Every hour you spend confused is an hour you spend building the infrastructure inside your own head that will eventually let you do original work. There is no shortcut through that process that doesn't leave you diminished on the other side.
People call this friction "grunt work." Schwartz uses exactly that phrase, and he's right that LLMs can remove it. What he doesn't say, because he already has decades of hard-won intuition and doesn't need the grunt work anymore, is that for someone who doesn't yet have that intuition, the grunt work is the work. The boring parts and the important parts are tangled together in a way that you can't separate in advance. You don't know which afternoon of debugging was the one that taught you something fundamental about your data until three years later, when you're working on a completely different problem and the insight surfaces. Serendipity doesn't come from efficiency. It comes from spending time in the space where the problem lives, getting your hands dirty, making mistakes that nobody asked you to make and learning things nobody assigned you to learn.
The strange thing is that we already know this. We have always known this. Every physics textbook ever written comes with exercises at the end of each chapter, and every physics professor who has ever stood in front of a lecture hall has said the same thing: you cannot learn physics by watching someone else do it. You have to pick up the pencil. You have to attempt the problem. You have to get it wrong, sit with the wrongness, and figure out where your reasoning broke. Reading the solution manual and nodding along feels like understanding. It is not understanding. Every student who has tried to coast through a problem set by reading the solutions and then bombed the exam knows this in their bones. We have centuries of accumulated pedagogical wisdom telling us that the attempt, including the failed attempt, is where the learning lives. And yet, somehow, when it comes to AI agents, we've collectively decided that maybe this time it's different. That maybe nodding at Claude's output is a substitute for doing the calculation yourself. It isn't. We knew that before LLMs existed. We seem to have forgotten it the moment they became convenient.
Centuries of pedagogy, defeated by a chat window.
This is the distinction that I think the current debate keeps missing. Using an LLM as a sounding board: fine. Using it as a syntax translator when you know what you want to say but can't remember the exact Matplotlib keyword: fine. Using it to look up a BibTeX formatting convention so you don't have to wade through Stack Overflow: fine. In all of these cases, the human is the architect. The machine holds the dictionary. The thinking has already been done, and the tool is just smoothing the last mile of execution. But the moment you use the machine to bypass the thinking itself, to let it make the methodological choices, to let it decide what the data means, to let it write the argument while you nod along, you have crossed a line that is very difficult to see and very difficult to uncross. You haven't saved time. You've forfeited the experience that the time was supposed to give you.
Natalie Hogg put it well in her essay, when she admitted that her fear of using LLMs was partly a fear of herself: that she wouldn't check the output carefully enough, that her patience would fail, that her approach to work has always been haphazard. That kind of honesty is rare in these discussions, and it matters. The failure mode isn't malice. It's convenience. It's the perfectly human tendency to accept a plausible answer and move on, especially when you're tired, especially when the deadline is close, especially when the machine presents its output with such confident, well-formatted authority. The problem isn't that we'll decide to stop thinking. The problem is that we'll barely notice when we do.
I'm not arguing that LLMs should be banned from research. That would be stupid, and it would be a position I don't hold, given that I used one this morning. I'm arguing that the way we use them matters more than whether we use them, and that the distinction between tool use and cognitive outsourcing is the single most important line in this entire conversation, and that almost nobody is drawing it clearly. Schwartz can use Claude to write a paper because Schwartz already knows the physics. His decades of experience are the immune system that catches Claude's hallucinations. A first-year student using the same tool, on the same problem, with the same supervisor giving the same feedback, produces the same output with none of the understanding. The paper looks identical. The scientist doesn't.
And here is where I have to be fair to Bob, because Bob isn't stupid. Bob is responding rationally to the incentives he's been given. Academia is cutthroat. The publish-or-perish pressure is not a metaphor; it is the literal mechanism by which careers are made or ended. Long gone are the days when a single, carefully reasoned monograph could get you through a PhD and into a good postdoc. Academic hiring now rewards publication volume. The more papers you produce during your PhD, the better your chances of landing a competitive postdoc, which improves your chances of a good fellowship, which improves your chances of a tenure-track position, each step compounding the last (so many levels, almost like a pyramid). So why wouldn't a first-year student outsource their thinking to an agent, if doing so means three papers instead of one? The logic is airtight, right up until the moment it isn't. Because the same career ladder that rewards early publication volume eventually demands something that no agent can provide: the ability to identify a good problem, to know when a result smells wrong, to supervise someone else's work with the confidence that comes only from having done it yourself. You can't skip the first five years of learning and expect to survive the next twenty. There is no avoiding the publish-or-perish race if you want an academic career. But there is a balance to be struck, and it requires the one thing that is hardest to do when you're twenty-four and anxious about your future: prioritizing long-term understanding over short-term output. Nobody has ever been good at that. I'm not sure why we'd start now.
Five years from now, Alice will be writing her own grant proposals, choosing her own problems, supervising her own students. She'll know what questions to ask because she spent a year learning the hard way what happens when you ask the wrong ones. She'll be able to sit with a new dataset and feel, in her gut, when something is off, because she's developed the intuition that only comes from doing the work yourself, from the tedious hours of debugging, from the afternoons wasted chasing sign errors, from the slow accumulation of tacit knowledge that no summary can transmit.
Bob will be fine. He'll have a good CV. He'll probably have a job. He'll use whatever the 2031 version of Claude is, and he'll produce results, and those results will look like science.
I'm not worried about the machines. The machines are fine. I'm worried about us.
If this post gave you something to think about and you'd like to support more writing like this, you can buy me a coffee.
References:
D. W. Hogg, "Why do we do astrophysics?", arXiv:2602.10181, February 2026.
N. B. Hogg, "Find the stable and pull out the bolt", February 2026. Available at nataliebhogg.com.
M. Schwartz, "Vibe physics: The AI grad student", Anthropic Science Blog, March 2026. Available at anthropic.com/research/vibe-physics.
While we have a lot of abstractions that solve some subproblems, we still need to connect those solutions to solve the main problem. And there's a point where this combination becomes its own technical challenge. And the skill it requires is the same one used to solve simpler problems with common algorithms.
I have literally never run into this in my career... challenges have always been something to help me grow.
If Bob is going to spend $500 in tokens for something I can do for $50.
I think Bob is not going to last long in the lawn-mowing market driving a bulldozer.
Do you have a solution for me? How does the market value things that don't yet exist in this brave new world?
I would take that bet on the side of the wet meat. In the future, every AI will be an ad executive. At least the meat programming won't be preloaded to sell ads every N tokens.
Which is more work, and less fun, than doing it myself. No thanks.
There is no evidence for this. The claims that API is "profitable on inference" are all hearsay. Despite the fact that any AI executive could immediately dismiss the misconception by merely making a public statement beholden to SEC regulation, they don't.
> Models are still improving/getting cheaper
The diminishing returns have set in for quality, and for a while now that increased quality has come at the cost of massive increases in token burn; it's not getting cheaper.
Worse yet, we're in an energy crisis. Iran has threatened to strike critical oil infrastructure, and repairs would take years.
AI is going to get significantly more expensive, soon.
My 40-some-odd years on this planet tells me the answer is yes.
As fewer people know what good food tastes like, the entire market will enshittify towards lower and lower calibre food.
We already see this with, for example, fruits in cold climates. I've known people who have only ever bought them from the supermarket, then tried them at a farmers' market when they're in season for two weeks. The look of astonishment on their faces, at the flavour, is quite telling. They simply had no idea how dry and flavourless supermarket fruit is.
Nothing beats an apple picked just before you eat it.
(For reference, produce shipped to supermarkets is often picked, even locally, before being entirely ripe. It lasts longer, and handles shipping better, than a perfectly ripe fruit.)
The same will be true of LLMs. They're already out of "new things" to train on. I question whether they'll ever learn new languages: who would they observe to train on? What does it matter if the code is unreadable by humans regardless?
And this is the real danger. Eventually, we'll have entire coding languages that are just weird, incomprehensible, tailored to LLMs, maybe even a language written by an LLM.
What then? Who will be able to decipher such gibberish?
Literally all true advancement will stop, for LLMs never invent; they only mimic.
As long as that’s the case, those that create crap will thrive.
Pretty basic, and long predates LLMs.
Doesn’t sound like these tools should be used to write scientific papers, for example; they seem to bamboozle people far more than help them.
I didn't get to be a senior engineer by immediately being able to solve novel problems. I can now solve novel problems because I spent untold hours solving trivial ones.
As of March 2026, OpenAI generates annual revenue exceeding $12 billion. However, the costs of running ChatGPT are around $17 billion a year.
Source: https://searchlab.nl/en/statistics/chatgpt-statistics-2026
Writing assembly is probably completely irrelevant. You should still know how programming language concepts map to basic operations though. Simple things like strict field offsets, calling conventions, function calls, dynamic linking, etc.
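A minimal sketch of one of those concepts, field offsets, using Python's ctypes, which mirrors native C struct layout; the struct here is made up purely for illustration, and the exact offsets are platform-dependent:

```python
import ctypes

# Hypothetical record, for illustration only: a 1-byte tag followed by a double.
# The compiler (and ctypes, which follows native layout rules) inserts padding
# so that 'value' lands on an alignment boundary. This fixed layout is what
# makes a field access compile down to a simple "base pointer + offset" load.
class Record(ctypes.Structure):
    _fields_ = [
        ("tag", ctypes.c_char),      # offset 0
        ("value", ctypes.c_double),  # offset = alignment of double (typically 8)
    ]

print(Record.tag.offset)     # 0
print(Record.value.offset)   # typically 8, due to alignment padding
print(ctypes.sizeof(Record)) # typically 16, including trailing padding
```

Knowing why `value` doesn't sit at offset 1 is exactly the kind of mapping from language concept to basic operation the comment is talking about.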
This is false. There absolutely are people that fall back on older tools when fancy tools fail. You will find such people in the military, in emergency services, in agriculture, generally in areas where getting the job done matters.
Perhaps you're unfamiliar.
The other week I finished putting holes in fence posts with a bit and brace, as there was no fuel for the generator to run corded electric drills and the rechargeable batteries were dead.
Ukrainians, and others, need to fall back on no GPS available strategies and have done so for a few years now.
etc.
But to even *know* what is more useful, it is crucial to have walked the walk. Otherwise we will all end up with a bunch of people trying to reinvent the wheel, over and over again, like JavaScript "developers" who keep reinventing frameworks every six months.
> which nobody would buy for any other tool
I don't know about you, but I wasn't allowed to use calculators in my calculus classes precisely to learn the concepts properly. "Calculators are for those who know how to do it by hand" was something I heard a lot from my professors.
LLMs, the way they typically get used, are solely to save time by handing over nearly the entire process. In that sense acuity can't remain intact, even less so improving over time.
The determining factor is always "did I come up with this tool". Somehow, subsequent generations always manage to find their own competencies (which, to be fair, may be different).
This isn't guaranteed to play out, but it should be the default expectation until we actually see greatly diminishing outputs at the frontier of science, engineering, etc.
But we might see a lot more specialization as a result
I wouldn't count on that because even if it happens, we don't know when it will happen, and it's one of those things where how close it looks to be is no indication of how close it actually is. We could just as easily spend the next 100 years being 10 years away from AGI. Just look at fusion power, self-driving cars, etc.
The article addresses this, because, well... no we aren't. Maybe we are. But it's far from clear that we're not moving toward a plateau in what these agents can do.
> Whether a human does actual work or not isn't particularly exciting to a market.
You seem to be convinced these AI agents will continue to improve without bound, so I think this is where the disconnect lies. Some of us (including the article author) are more skeptical. The market values work actually getting done. If the AIs have limits, and the humans driving them no longer have the capability to surpass those limits on their own, then people who have learned the hard way, without relying so much on an AI, will have an advantage in the market.
I already find myself getting lazy as a software developer, having an LLM verify my work, rather than going through the process of really thinking it through myself. I can feel that part of my skills atrophying. Now consider someone who has never developed those skills in the first place, because the LLM has done it for them. What happens when the LLM does a bad job of it? They'll have no idea. I still do, at least.
Maybe someday the AIs will be so capable that it won't matter. They'll be smarter and more thorough, and able to do more, and do it correctly, than even the most experienced person in the field. But I don't think that's even close to a certainty.
do you have any evidence for that, though? Besides marketing claims, I mean.
It doesn't matter if Bob can be normal. There was no point to him being paid to be on the program.
From the article:
If you hand that process to a machine, you haven't accelerated science. You've removed the only part of it that anyone actually needed.
> There's a common rebuttal to this, and I hear it constantly. "Just wait," people say. "In a few months, in a year, the models will be better. They won't hallucinate. They won't fake plots. The problems you're describing are temporary." I've been hearing "just wait" since 2023.
We're not trending towards superintelligence with these AIs. We're trending towards (and, in fact, have already reached) superintelligence with computers in general, but LLM agents are among the least capable known algorithms for the majority of tasks we get them to do. The problem, as it usually is, is that most people don't have access to the fruits of obscure research projects.
Untrained children write better code than the most sophisticated LLMs, without even noticing they're doing anything special.
Human nature says that Bob will skim over and trust the parts that he doesn't understand as long as he gets output that looks like he expects it to look, and that's extremely dangerous.
That's irrelevant to the goal of the program - they care. Once they stop caring, they'd shut that program down.
Maybe it would be replaced with a new program that has the goal of delivering Bobs+AI, but what would be the point? I mean, the article explained in depth that there is no market for the results currently, so what would be the point of efficiently generating those results?
The market currently does not want the results, so replacing the current program with something that produces Bobs+AI would be for... what, exactly?
But they would be outdated, right?
Would an agent that can only code in COBOL be as useful today?
It sounds entirely reasonable and moderate to me.
> Another reason that a no-exceptions policy is important: If students with disabilities are permitted to use laptops and AI, a significant percentage of other students will most likely find a way to get the same allowances, rendering the ban useless. I witnessed this time and again when I was a professor—students without disabilities finding ways to use disability accommodations for their own benefit. Professors I know who are still in the classroom have told me that this remains a serious problem.
This would be a huge problem for students with severe and uncorrectable visual impairments. People with degenerative eye diseases already have to relearn how to do every single thing in their life over and over and over. What works for them today will inevitably fail, and they have to start over.
But physical impairments like this are also difficult to fake and easy to discern accurately. It's already the case that disability services at many universities only grants you accommodations that have something to do with your actual condition.
There are also some things that are just difficult to accommodate without technology. For instance, my sister physically cannot read paper. Paper is not capable of contrast ratios that work for her. The only things she can even sometimes read are OLED screens in dark mode, with absolutely black backgrounds; she requires an extremely high contrast ratio. She doesn't know braille (which most blind people don't, these days) because she was not blind as a little girl.
Committed cheaters will be able to cheat anyway; contemporary AI is great at OCR. You'll successfully punish honest disabled people with a policy like this but you won't stop serious cheaters.
I still wrote bugs. I'd bet that my bugs/LoC has remained static if not decreased with AI usage.
What I do see is more bugs, because the LoC denominator has increased.
What I align myself towards is that becoming senior was never about knowing the entire standard library, it was about knowing when to use the standard library. I spent a decade building Taste by butting my head into walls. This new AI thing just requires more Taste. When to point Claude towards a bug report and tell it to auto-merge a PR and when to walk through code-gen function by function.
No, this is impossible unless Bob is presenting at each weekly meeting simply the output of the LLM and feeding the tutor's feedback straight into it, for a total of 10 minutes of work per week; and the tutor would notice straight away, if only from the lack of progress.
No, the article specifies that Bob actually works with the LLM, doesn't just delegate. He asks the agent to summarise, to explain, and to help with bug fixing. You could easily argue that Bob, having such an AI tutor available 24/7, can develop understanding much faster. He certainly won't waste his time with small details of python syntax (though working with a "coding expert" will make his code much cleaner and more advanced).
The AI can help with that too. Ask it "How would one think about this issue, to prove that what was done here is correct?" and it will come up with something to help you ground that understanding intuitively.
By week three the overall structure of the code base is done, but the actual implementation is lacking. Whenever I run out of tokens I just start programming by hand again. As you keep doing this, the code base becomes ever more familiar to you, until you're at a point where you tear down the AI scaffolding in the places where it is lacking and keep it where it makes no difference.
GP says that they have to come back tomorrow and edit the code to fix something. That's a verification step: if you can do that (even with some effort) you understand why the AI did what it did. This is not some completely new domain where what you wrote would apply very clearly, it's just a codebase that GP is supposed to be familiar with already!
Are people just so addicted to doomscrolling or whatever that they just can’t spend a few minutes of their day doing some type of human activity?
If humans can prove that bespoke human code brings value, it'll stick around. I expect that the cases where this will be true will just gradually erode over time.
Acting like workers have a meaningful choice, when their director got to spend $30 million on Salesforce licenses and the people affected had no say in the matter, just further proves that the last frontier for democracy is the private sector: liberate the workers from their tyrants and actually run the companies effectively, rather than having them rely on the government to kneecap their competitors or straight-up ignore the law to the detriment of society and the planet.
I don't actually think the article refutes this. But the AI needs to be in the hands of someone who can review the code (or astrophysics paper), notice and understand issues, and tell the AI what changes to make. Rinse, repeat. It's still probably faster than writing all the code yourself (but that doesn't mean you can fire all your engineers).
The question is, how do you become the person who can effectively review AI code if you've never written code without an AI? I'd argue you basically can't.
The proposal that everyone pay for college until they are in their 40s doesn’t seem viable.
It has always been extremely annoying to argue with people who mistake the ability to build or engage with complicated systems (like your regex) for competency.
I work in building AI for a very complex application, and I used to be in the top 0.1% of Python programmers (by one metric) at my previous FAANG job, and Claude has completely removed any barriers I have between thinking and achieving. I have achieved internal SOTA for my company, alone, in 1 week, doing something that previously would have taken me months of work. Did I have to check that the AI did everything correctly? Sure. But I did that after saving months of implementation time so it was very worth it.
We're now in the age of being ideas-bound instead of implementation-bound.
And it’s not strictly speaking university we’re talking about. The way we understand work is going to fundamentally change. And we’re not going to value the people who use LLMs to get 1x done.
But yes, university was always about how much work you put into it, and LLMs are going to make that 10x more obvious.
The point is the Bob and Alice comparison is already a straw man, but I do squarely believe it’s the people with the best mental models and not the people who “get AI” who will win the new world. If you’re curious and good at developing mental models, you can learn “AI” in a week. But if you’re curious and good at developing mental models you’ve probably already lapped both Bob and Alice
So we can glue that together a bit faster, great.
What if we also stop producing new open-source code, frameworks, libraries, etc.?
What about stories like Tailwind?
I can only speak to the engineering context, not academia, but I would expect there are similar patterns of busywork. Even the blog's author admits to using AI. AI can't replace thinking, but it can replace menial labor, which is prevalent even among "knowledge workers".
ffmpeg disagrees.
More broadly, though, it’s a logical step if you want to go from “here’s how PN junctions work” to “let’s run code on a microprocessor.” There was a game on here yesterday about building a GPU, in the same vein as nand2tetris, Turing Complete, etc. I find those quite fun, and if you wanted to do something like Ben Eater’s 8-bit computer, it would probably make sense to continue with assembly before going into C, and then a higher-level language.
The people who make the calculator analogy are already victims of the missing rung problem and they aren't even able to comprehend what they're lacking. That's where the future of LLM overuse will take us.
I feel like people tend to forget that among the many things LLMs can do these days, “using a search engine” is among them. In fact, they use them better than the majority of people do!
The conversation people think they’re having here and the conversation that actually needs to be had are two entirely different conversations.
> I don’t know about you, but I wasn’t allowed to use calculators in my calculus classes precisely to learn the concepts properly. “Calculators are for those who know how to do it by hand” was something I heard a lot from my professors.
Suppose I never learned how to derive a function. I don’t even know what a function is. I have no idea how to make one, write one, or what it even does. So I start gathering knowledge:
- A function is some math that allows you to draw a picture of how a number develops if you do that math on it.
- A derivative is a function that you feed a function and a number into, and then it tells you something about what that function is doing to that number at that number.
- “What it’s doing” specifically means not the result of the math for that particular number, but the results for the immediate other numbers behind and in front of it.
- This can tell us about how the function works.
Now I go tell ClaudeGPTimini “hey, can you derive f(x) at 5 so that we can figure out where it came from and where it goes from there?”, and it gives me a result.
I’ve now ostensibly understood what a derivative does and what it’s used for, yet I have zero idea how to mathematically do it. Does that make any results I gain from this intuitive understanding any less valuable?
What I’ll give you is this: if I knew exactly how the math worked, then it would be far easier for me to instantly spot any errors ClaudeGPTimini produced. And the understanding of functions and derivatives outlined above may be simplistic in some places (intentionally so), in ways that may break it in certain edge cases. But that only matters if I take its output at face value. If I get a general understanding of something and run a test with it, I’ll generally have some sort of hypothesis of what kind of result I’m expecting, given that my understanding is correct. If I know that a lot of unknown unknowns exist around a thing I’m working with, then I also know that unexpected results, as well as expected ones, require more thorough verification. Science is what happens when you expect something, test something, and get a result - expected OR unexpected - and then systematically rule out that anything other than the thing you’re testing has had an effect on that result.
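That expect-and-test loop can be made concrete. A minimal sketch, where f(x) = x² and the "claimed" value stand in for whatever the model answered: even without knowing how to differentiate, you can check a claimed derivative against a finite-difference estimate.

```python
def finite_difference(f, x, h=1e-6):
    """Central-difference estimate of f'(x): no calculus knowledge required."""
    return (f(x + h) - f(x - h)) / (2 * h)

f = lambda x: x ** 2   # the function under discussion (illustrative)
claimed = 10.0         # the value "ClaudeGPTimini" reported for f'(5) (illustrative)

estimate = finite_difference(f, 5.0)
print(abs(estimate - claimed) < 1e-4)  # True: the claimed answer survives the check
```

The check doesn't prove the model's reasoning is right, but it's exactly the kind of hypothesis-then-test step described above: an expected result, a measurement, and a verdict.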
This is not a problem with LLMs. It’s a thing we should’ve started teaching in schools decades ago: how to understand that there are things you don’t understand. In my view, the vast majority of problems plaguing us as a species lies in this fundamental thing that far too many people are just never taught the concept of.
It depends on the task though. If you are in a similar scenario as with your fence posts and want to edit computer programs, you can't. (Not even with xkcd's magnetic needle and a steady hand). ;-)
As technology marches on it seems inevitable that we will get increasingly large and frequent knowledge gaps. Otherwise progress would stop - we need the giant shoulders to stand on.
How many people in the world can recreate an ASML lithography machine vs how many people are surviving by doing something that requires that machine to exist?
Well, we still make people calculate manually for many years, and we still make people listen to lectures instead of just reading.
But will we still have people go through years of manual coding? I guess in the future we will force them to, at least if we want to keep people competent, just like the other things you mentioned. Currently you do that on the job; in the future people won't do that on the job, so they will be expected to do it as part of their education.
In a sense, I think you are right. We are currently going through a period of transition that values some skills and devalues others. The people who see huge productivity gains because they don't have to do the meaningless grunt work are enthusiastic about that. The people who did not come up with the tool are quick to point out pitfalls.
The thing is, the naysayers aren't wrong since the path we choose to follow will determine the outcome of using the technology. Using it to sift through papers to figure out what is worth reading in depth is useful. Using it to help us understand difficult points in a paper is useful. On the other hand, using it as a replacement for reading the papers is counterproductive. It is replacing what the author said with what a machine "thinks" an author said. That may get rid of unnecessary verbosity, but it is almost certainly stripping away necessary details as well.
My university days were spent studying astrophysics. It was long ago, but the struggles with technology handling data were similar. There were debates between older faculty, who were fine with computers as long as researchers were there to supervise the analysis every step of the way, and newer faculty, who needed computers to take raw data to reduced results without human intervention. The reason was, as always, productivity. People could not handle the massive amounts of data being generated by the new generation of sensors or systematic large-scale surveys if they had to intervene at every step of the way. At a basic level, you couldn't figure out whether it was a garbage-in, garbage-out type of scenario because no one had the time to look at the inputs. (I mean no time in an absolute sense. There was too much data.) At a deeper level, you couldn't even tell if the data processing steps were valid unless there was something obviously wrong with the data. Sure, the code looked fine. If the code did what we expected of it, mathematically, it would be fine. But there were occasions where I had to point out that the computer wasn't working the way they thought it was.
It was a debate in which both sides were right. You couldn't make scientific progress at a useful pace without sticking computers in the middle and without computers taking over the grunt work. On the other hand, the machine cannot be used as a replacement for the grunt work of understanding, whether that involves reading papers or analyzing the code from the perspective of a computer scientist (rather than a mathematician).
Whatever the models suck at, we can pour money into making them do better. It's very cut and dried. The squirrely bit is how that contributes to "general intelligence" and whether the models are progressing towards overall autonomy due to our changes. That mostly matters for the AGI mouthbreathers though; people doing actual work just care that the models have improved.
Calculators are deterministically correct given the right input. They do not require expert judgement about whether an answer they gave is reasonable or not.
As someone who uses LLMs all day for coding, and who regularly bumps against the boundaries of what they're capable of, that's very much not the case. The only reason I can use them effectively is because I know what good software looks like and when to drop down to more explicit instructions.
Catching an LLM hallucinating often takes a basic understanding of what the answer should look like before asking the question.
As it happens, we generally don't let people use calculators while learning arithmetic. We make children spend years using pencil and paper to do what a calculator could in seconds.
> “Most ingenious Theuth, one man has the ability to beget arts, but the ability to judge of their usefulness or harmfulness to their users belongs to another; [275a] and now you, who are the father of letters, have been led by your affection to ascribe to them a power the opposite of that which they really possess.
> "For this invention will produce forgetfulness in the minds of those who learn to use it, because they will not practice their memory. Their trust in writing, produced by external characters which are no part of themselves, will discourage the use of their own memory within them. You have invented an elixir not of memory, but of reminding; and you offer your pupils the appearance of wisdom, not true wisdom, for they will read many things without instruction and will therefore seem [275b] to know many things, when they are for the most part ignorant and hard to get along with, since they are not wise, but only appear wise."
Sounds to me like he was spot on.
It is a debatable topic, and I agree with you that it's unclear whether we will hit a wall at some point. But one point worth mentioning: back when AI agents were only a concept and the most popular type of """AI""" was the LLM-based chatbot, it also seemed we were approaching some kind of plateau in performance. Then "agents" appeared, and that plateau, the wall we were supposedly about to hit, was pushed further out. I don't know (who does?) how far the boundaries can be pushed, or what comes next. Who knows when a completely new architecture, different from Transformers, will come out and be adopted everywhere, enabling something new? The future is uncertain. We may hit the wall this year, or we may not hit it in the next 10-20 years. It is, indeed, unclear.
Once continuous learning is solved, I predict the problem addressed by TFA to become orders of magnitude bigger: What's the motivation for anyone to teach a person if an LLM can learn it much faster, will work for you forever, and won't take any sick days or consider changing careers?
Yeah, I'm surprised at the number of people who read the article and came away with the conclusion that the program was designed to churn out deliverables, and who then conclude that it doesn't matter if Bob can only function with an AI holding his hand, because he can still deliver.
That isn't the output of the program; the output is an Alice. That's the point of the program. They don't want the results generated by Alice, they want the final Alice.
Outdated compared to what? In your counterfactual, VC funded agents don't exist anymore, no?
Your argument, if I understand it correctly, is that they might somehow go away entirely when VC funding dries up, when more realistically they'll probably at most become twice as expensive or regress half a year in performance.
I agree, and I'd go a step further:
You can be the absolute best coder in the world, the fastest and most accurate code reviewer ever to live, and AI will still produce bad code so much faster than you can review it that it will eventually become a liability no matter what.
There is no amount of "LLM in a loop," "use a swarm of agents," or any other current trickery that fixes this, because eventually some human needs to read and understand the code. All of it.
Any attempt to avoid reading and understanding the code means you have absolutely left the realm of quality software, no exceptions.
Presumably you're also reading some kind of learning text about the Chinese language, so the sole source isn't just the LLM?
In my experience, asking an LLM to produce small examples of well-known things (or rather, things that are going to be talked about frequently in the training data, so generally basic or fundamental topics) tend to work fine, and is going to be at a level where you yourself can judge what's presented.
I think the real danger is when a person is prompting for things they don't know how to verify for themselves, since then we're basically just rolling dice and hoping.
Not so in plenty of other countries. Hopefully the US reverses the anti-science trend before it's too late.
You can access the full article at https://archive.is/zSJ13 (I know archive.is is kind of shady, but it works).
No one becomes a good mathematician without first learning to write simple proofs, and then later more complex ones. It's the very stuff of the field itself.
Do you agree the different treatment is justified? (Many do not.) Or are you asking: so what if acuity is diminished, as long as an LLM does the job equally well?
We had the same problem in the early days of calculators. Using a slide rule, you had to track the order of magnitude in your head; this habit let you spot a large class of errors (things that weren't even close to correct).
When calculators came on the scene, people who had never used a slide rule would confidently accept answers that were wildly incorrect. For example: a mole of ideal gas at STP occupies 22.4 liters. Typo that constant as 2204 and your answer comes out off by roughly two orders of magnitude, say 0.0454 when it should be 4.46. Easy to spot if you know roughly what the answer should look like, but easy to miss if you don't.
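A quick sketch of that typo effect. The 100 L sample size below is my own illustrative assumption, not from the comment; the constants are the ones quoted above:

```python
# Sanity check of the slide-rule-era habit: know the order of magnitude
# before you trust the digits a calculator hands you.
MOLAR_VOLUME = 22.4   # liters per mole of ideal gas at STP
TYPO = 2204           # the same constant, mistyped on the calculator

volume = 100.0        # liters of gas (assumed sample size)
correct = volume / MOLAR_VOLUME   # ~4.46 mol
wrong = volume / TYPO             # ~0.0454 mol, two orders of magnitude off
print(correct, wrong)
```

If you know the answer should be "a few moles," the wrong result jumps out immediately; if you don't, it looks just as authoritative as the right one.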
There is a vast difference between “never learned the skill,” and “forgot the skill from lack of use.” I learned how to do long division in school, decades ago. I sat down and tried it last year, and found myself struggling, because I hadn’t needed to do it in such a long time.
Edit: let's look at a paper like Some Linear Transformations on Symmetric Functions Arising From a Formula of Thiel and Williams https://ecajournal.haifa.ac.il/Volume2023/ECA2023_S2A24.pdf and try to guess how many of the trivial things were completely unneeded to write a paper like this.
Calculators are deterministic, but they are not necessarily correct. Consider 32-bit integer arithmetic:
30000000 * 1000 / 1000
30000000 / 1000 * 1000
Mathematically, they are identical. Computationally, the results are deterministic, yet the computer will produce different answers depending on the order of operations. There are many other cases where the expected result differs from what a computer calculates.

I went to school the next day and told my teacher that the calculator says that 10+10 is 14, so why does she say it's 20?
So she showed me on her calculator. She pressed the hex button and explained why it was 14.
I think a major problem with people's usage of LLMs is that they stop at 10+10=14. They don't question it or ask someone (even the LLM) to explain the answer.
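Both effects are easy to reproduce. A minimal sketch in Python, using a mask to simulate 32-bit unsigned wraparound (the mask is my own illustration; real 32-bit hardware wraps implicitly):

```python
MASK = 0xFFFFFFFF  # keep only the low 32 bits, mimicking uint32 overflow

a = ((30_000_000 * 1_000) & MASK) // 1_000  # product overflows before dividing
b = (30_000_000 // 1_000) * 1_000           # divides first, stays in range
print(a, b)  # 4230196 30000000, identical math but different answers

# And the calculator story: 10 + 10 is 20, which a hex display renders as 14.
print(format(10 + 10, "x"))  # 14
```

The deterministic machine gave a "wrong" answer in both cases, and in both cases the fix is the same: ask it (or someone) to explain why.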
Yes - specific faculties atrophied - I wouldn't dispute it. But the (most) relevant faculties for human flourishing change as a function of our tools and institutions.
My point is that getting into the weeds of writing CRUD software is not the only way to gain the ability to write complex algorithms, to debug complex issues, or to do performance optimization. It's only common because the stuff you make on the journey used to be economically valuable.
I don't care how many terms you add to your Taylor series: your polynomial approximation of a sine wave is never going to be suitable for additive speech synthesis. Likewise, I don't care how good your predictive-text transformer model gets at instrumental NLP subtasks: it will never be a good programmer (except as far as it's a plagiarist). Just look at the Claude Code source code: if anyone's an expert in agentic AI development, it's the Claude people, and yet the codebase is utterly unmaintainable dogshit that shouldn't work and, on further inspection, doesn't work.
That's not to say that no computer program can write computer programs, but this computer program is well into the realm of diminishing returns.
If I spoke this comment instead of typing it, maybe 2 to 5 words would come out wrong. For a human transcriber it would be maybe 10% of that.
The only reason we somewhat made it work is due to the interdependence between labor and capital. Once that's broken, the wheels will start falling off.
Outdated compared to reality and to humans: their knowledge cutoff falls a year further behind every year they don't get updates. Humans continuously expand their knowledge; the models need to keep up with that.
Only, swapping the guts of such a machine for a local LLM is damn easy today. Right now the battery mass required to power the device would be a giveaway, but inference is getting energetically cheaper.
> Colleges that are especially committed to maintaining this tech-free environment could require students to live on campus, so they can’t use AI tools at home undetected.
Just like my on-campus classmates never smoked weed or drank underage, I'm sure.
This sentence contains the entire point, and the easiest way to get there, as with many, many things, is to ask “why?”
If I use a calculator to find a logarithm, and I know what a logarithm is, then the answer the calculator gives me is perfectly useful and 100% substitutable for what I would have found if I'd calculated the logarithm myself.
If I use Claude to "build a login page", it will definitely build me a login page. But there's a very real chance that what it generated contains a security issue. If I'm an experienced engineer I can take a quick look and validate whether it does or whether it doesn't, but if I'm not, I've introduced real risk to my application.
> People would have said the same about graphing calculators or calculators before that. Socrates said the same thing about the written word.
If the conclusion now becomes “actually, Socrates was correct but it wasn’t that bad”, then why bring up Socrates in the first place?
> Mathematica has been able to do many integrals for decades and yet we still make students learn all the tricks to integrate by hand
But maybe integrating by hand is still as big as ever in other parts of academia. Or were you thinking about high school? I'm fairly sure that symbolic solving of integrals is treated as less important in education these days than it was before digital computers, but I could be wrong. Mathematica's symbolic solver is very useful, but numeric solutions are what really make the art of finding integrals by hand much less relevant.
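To illustrate the numeric side of that point, here's a toy sketch of my own (composite Simpson's rule, nothing Mathematica-specific): a few lines of code evaluate an integral that would otherwise need the classic by-hand tricks.

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule; n must be even."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

# The integral of sin(x) on [0, pi] is exactly 2; the numeric answer is
# indistinguishable from it for practical purposes, no antiderivative needed.
print(simpson(math.sin, 0.0, math.pi))
```

Which is precisely why the hand techniques survive mostly as pedagogy rather than as working tools.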
I could imagine that when the music stops, advancement of new frontier models slows or stops, but that doesn't remove any current capabilities.
(And to be fair, the way we duplicate efforts on building new frontier models does look wasteful. Though maybe we'll reach a point where progress no longer starts from scratch.)
P.S. I am well aware of all of the risks that agents brought. I'm speaking in terms of pure "maximum performance", so to speak.
That’s the stuff that AI is eating. The stuff I’m talking about (scaling orgs, maintaining a project long term, deciding what features to build or not build, etc.) is very hard for AI.
I saw interviews with young Americans on spring break and they were so utterly uninformed it was mind-blowing. Their priorities are getting drunk and getting laid while their country bombs a nation “into the stone ages”, according to their president. And it’s not their fault: they are the product of a media environment and education system designed for exactly this outcome.
It's equivalent to asking your friend to pick you up, and they arrive in a big vs small car. Maybe you needed a big car because you were going to move furniture, or maybe you don't care, oops either way.
agents might be better at it than people are, given the right structure
Calculators provide a deterministic solution to a well-defined task. LLMs don't.
Trump doesn't have to justify a single thing because the billionaires behind him know that every last bet is off and their very livelihoods are at risk, and his entire base of support up and down the chain are either complicit or fooled.
What the world does when they finally realize Democrats and Republicans are simply two sides of the vast apparatus suppressing the will of the people by any means necessary will be... spectacular.
If you send a kid to an elementary school, and they come back not having learned anything, do you blame the concept of elementary schools, or do you blame that particular school - perhaps a particular teacher _within_ that school?
Choosing a "better" language was not always an option, at least at the time. I was working with grad students who were managing huge datasets, sometimes for large simulations and sometimes from large surveys. They were using C. Some of the faculty may have used Fortran. C exposes you to the vulgarities of the hardware, and I'm fairly certain Fortran does as well. They weren't going to use a calculator for those tasks, nor an interpreted language. Even if they wanted to choose another language, the choice of languages was limited by the machines they used. I've long since forgotten what the high performance cluster was running, but it wasn't Linux and it wasn't on Intel. They may have been able to license something like Mathematica for it, but that wasn't the type of computation they were doing.
I remember things very differently. Everyone cared about the Iraq war, gay-straight alliance was one of the most up and coming clubs, and political music was everywhere. Green Day had their big second wave with American Idiot, System Of A Down was on top of the world, Rock Against Bush was huge, anarcho-punk like Rise Against was getting big.
I'm not a teenager anymore obviously, so it's entirely possible I'm just missing it, but I've seen very little that compares to those sort of movements. On the other hand, most millennials I know are still wildly politically active.