This stands out to me, and perhaps speaks more broadly than the article itself. I’m sure this has been in the spotlight before, but it's well put, and I think it applies to many areas.
This will basically become true for everything.
Many CTFs have switched to a dual-leaderboard format recently, one for "agentic teams," one for the rest. If all you care about is "learning" and imaginary internet points, you can just participate as a human team and adblock the AI scoreboard, and maybe lobby CTFTime into splitting their rankings as well.
The solution is just to make CTFs harder, but when do CTFs become too hard? Maybe the problem is that 'hard' CTFs are fundamentally too 'simple': just a logic chain and an exhaustive bruteforce towards a solution, since there really are limited ways to express a solution in plain sight.
Or maybe human creativity has been exhausted and we're not so limitless as we thought. Only time will tell.
I had another idea spring to mind: we could hide two flags, one normal and one that could only be found by AI agents, not by humans or tools written by humans.
Playing CTFs with my mates used to mean sitting there for hours tackling a challenge until another mate joined, had a look with you, and you solved it together in 30 minutes, which is the most rewarding learning experience there is. Nowadays a mate joins in, throws the clanker at it, and it's solved in 5 minutes. Ask how it worked and you always get the "yeah idk what it did, but who cares, here is the flag" response.
Same for creating challenges. Whenever I ask for writeups, or whether people solved it differently, I usually get the "yeah idk, clanker solved that one" response, which takes the fun out of it.
So yep, this CTF format is definitely dead, mainly because of the strong competitiveness and the prizes. Those encourage people to cheese challenges, and back then solving them differently was fine because you still had a creative out-of-the-box thinking moment, but nowadays with AI there is no brainpower needed, no cheesing needed, no human needed. As you mentioned, it's pay to win.
My two cents is that the 24/7 CTFs will get more traction, as the scoreboard doesn't matter there and simply doesn't win you any prize.
I thought code golf would take longer for AIs because there's so little training data (it's more niche), but we're seeing AIs starting to match expert humans there too. Sucks because golf has been my favorite type of programming puzzle.
It's crazy how far AIs have come in problem solving ability.
Exceptions for cases where the acronym is just so well known that a lot of people don't even know what it stands for even though they know the concept well. I recall one corporate training I was sitting through and they used the term "Border Gateway Protocol" and it took me a half beat to think through "oh, you mean BGP?"
Thanks!
We’ve figured out the human replacement pipeline it seems, but we haven’t figured out the education part. LLMs can be wonderful teachers, but the temptation to just tell one ‘do it for me’ is almost impossible to resist.
Still has no mention of AI, but that will likely change as it increasingly dominates the competition.
As I don't know much about the CTF scene, I looked for other takes on this topic.
Here's an article from 2015 about how tool-assistance already changed CTFs:
> Individual skill will undoubtedly be a factor next year. But, I'm left wondering whether next year's DEFCON CTF will tell us anything more than how well-developed each team's tools are (and how well they can interpret the results).
https://fuzyll.com/2015/ctf-is-dead-long-live-ctf/
But there are quite a few recent (2026) articles with the same core message as in the original article, e.g., https://blog.includesecurity.com/2026/04/ctfs-in-the-ai-era/ or https://k3ng.xyz/blog/ctf-is-dead
And here's someone explaining how Claude Max allowed them to win CTFs:
> I had always been interested in CTF as one of the only ways people could compete and show off their skill in coding/problem solving on a global scale. It was just too difficult and didn't make sense for me to learn the fundamentals as an electrical engineer. As time went on, I got better and better, and it was hard to tell whether it was because of experience or if it was because of improvements in AI.
> I accomplished my goals, and for that reason I'm quitting CTF, at least for now. [...] I'd like to think I highlighted the problem before it became a bigger issue. So, how do we fix this? Teams and challenge authors losing motivation is not good. CTF dying is not good. AI bad. Or is it?
https://blog.krauq.com/post/ctf-is-dying-because-of-ai
The only article that saw LLMs as a non-negative force for CTFs was this one. Fittingly, it sounds like LLM output ("Let's be honest", "This is where things get interesting.") and only contains hallucinated references.
https://en.wikipedia.org/wiki/Capture_the_flag_(cybersecurit...
Well, actually, I get it. In cycling, motor doping (putting a hidden engine into the bike) seems more offensive than regular doping. I think this is because there is a continuum from eating well to taking supplements to injecting stuff, but having an engine breaks a fundamental idea about cycling. Similarly, hacking is about cleverly abusing the rules.
You could even go so far as to say anything loaded onto your computer is fair game, but nothing more than that (certain competitive programming competitions, for example, allow unlimited paper material; for CTFs you probably need much more than that, hence electronic).
They may as well be the human equivalent to what LLMs currently are.
I do not mourn these people, as they’re usually the most arrogant types. I hope for their sake they adapt.
Not as easy logistically...
The same article talks about CTF skills both as a way to learn security best practices and, separately, as a sport.
In reality it was all about learning an extremely important skillset (securing/attacking software and systems) that is getting automated.
The real thing the author seems to be frustrated about is that AGI is arriving in computationally verifiable domains first, and a large part of his skillset has been taken over.
I never got super into security but it gave me the confidence to play in the same field and lose the stupid aura I had that somehow "rich americans" would be better than me at everything because they had better universities or because of Hollywood or something.
Sad that another cool thing is lost to AI but I guess kids will learn in other ways.
My fear is that they never get to the level they need to be at to create good software, even with the help of AI. So, although an expert with AI can create great software, that is not where we end up. Instead we will have vibe-coded messes by people who barely have any grasp of what is going on.
(The author of the piece understands this; I think they're broadly right, though I think these games will find other ways to incentivize participation without the now-meaningless leaderboards.)
I just did a CTF where I was in the top 10. It was the first CTF I completed, and I used AI because the rules permitted it. That said, I couldn’t solve all the challenges.
But yes, it was significantly easier now than when I last attempted one. Even solving manually, with AI-assisted assembly interpretation, was much easier.
The matrix was always the same and the challenge was clearly designed so that the point was being able to read anything at all, not knowing how to invert a matrix, so I asked the creator what was up.
He told me that there were tools that would trace input values until they reached a comparison instruction, then print what they were compared against. Therefore it was necessary for every deobfuscation challenge to scramble the input in some way too complex for these tools to undo, before comparing it. Hence the multiplication by a pseudorandom matrix.
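The anti-tracing trick described above can be sketched in a few lines. This is a hedged, illustrative Python version, not the original challenge: the matrix size, seed, and key bytes are all made up. The point is that a tool which watches comparison operands only ever sees the scrambled values, never the key itself.

```python
import random

N = 8
rng = random.Random(1337)  # fixed seed: "pseudorandom" but reproducible, as in the challenge
MATRIX = [[rng.randrange(256) for _ in range(N)] for _ in range(N)]

def scramble(data: bytes) -> list:
    # multiply the input vector by the matrix, modulo 256
    return [sum(m * b for m, b in zip(row, data)) % 256 for row in MATRIX]

SECRET = b"FLAG-KEY"       # hypothetical 8-byte key
TARGET = scramble(SECRET)  # what the binary would embed instead of the key

def check(user_input: bytes) -> bool:
    # the comparison instruction sees only scrambled values
    return scramble(user_input) == TARGET

print(check(b"FLAG-KEY"))  # True
print(check(b"AAAAAAAA"))  # False
```

Recovering the key from `TARGET` requires actually inverting the transformation, which is exactly the step a naive trace-to-comparison tool skips.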
The point is, cheating tools aren't new.
This is like someone complaining that making machine parts has been ruined: Skillful craftsmen used to make them by hand using manual tools!
Nowadays the CAD/CAM/CNC cheaters have almost completely automated the whole thing. How is the next generation of craftsmen going to learn how to craft a gear by hand when the process of gear making has been reduced to pressing start on a CNC machine?!
See what I mean? Sorry, I think this article is just Luddite. I can empathize with the pain of your beloved craft basically being rendered obsolete by new technology, but the process can neither be stopped nor is it bad in general.
The manual skills you trained with CTF puzzles are now simply no longer relevant. (Field-specific) "AI orchestration" is the new cybersecurity skill if LLMs really have become this good, and what the author used to do manually now has the same value as being able to craft a gear by hand.
In our own trainings (AI agents for security, and a graph masterclass), we ended up leaning into it. For example, we ship a skills bundle. There are upsides: less code-forward participants can go further and appreciate that, and there is less of a gap between high-level concepts and successful hands-on work. But at the same time, manual work builds a lot of intuition and knowledge that gets missed in auto modes.
Explicit ELO measurements with some cheating detection. AI assistance wholly banned. As you climb the ELO ladder, detection gets more onerous. At the top level during online events, anti-cheating teams require the use of both monitoring software and multiple cameras.
Idea is that you can cheat pretty easily at the lowest levels but it gets less easy the higher you go. This allows for better feeding into the truly elite competitions.
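For what it's worth, the rating mechanics behind such a ladder are just standard Elo. A minimal sketch (the K-factor of 32 is an assumption; real federations vary it by rating band):

```python
def expected(r_a: float, r_b: float) -> float:
    # probability that player A scores against player B, given the rating gap
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def update(r_a: float, r_b: float, score_a: float, k: float = 32.0):
    # score_a is 1.0 for a win, 0.5 for a draw, 0.0 for a loss
    e = expected(r_a, r_b)
    new_a = r_a + k * (score_a - e)
    new_b = r_b + k * ((1.0 - score_a) - (1.0 - e))
    return new_a, new_b

# Equal ratings, A wins: A gains 16 points, B loses 16
print(update(1500, 1500, 1.0))  # (1516.0, 1484.0)
```

The anti-cheat escalation layers on top of this: the ratings themselves are cheap to compute; catching the engine users is the expensive part.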
I think chess’s very firm stance that AI is never allowed in competition (neither online nor in person), rather than CTF’s acceptance, was the right call.
On the other hand, CTFs are fundamentally a game and a competition, which are supposed to be fun and to measure and improve one's skill. So when I let an LLM generate the entire solution for me, what's the point anymore? I did not learn anything. I did not work for that place on the leaderboard; I just copied the solution. And worst of all, I did not have any fun. It's boring.
So how does using AI as a solver not feel like cheating?
What am I missing here?
I've seen that exact font and color scheme a dozen times in the past few weeks.
It's an incredibly exciting time in security research in my humble old man opinion.
I think the cadence of new exploits is perhaps a better measure of that than subjective takes from anyone, regardless of experience.
These models seem completely unbeatable only in the ads. There are 100+ examples of someone feeding them Hindi Yoda-talk in Morse code and watching them go nuts. The reason the labs are going so hard on PR and marketing here is that they know it is just a matter of time.
Do you publish it somewhere? Here's a sample of my js obfuscator output: https://gist.github.com/Trung0246/c8f30f1b3bb6a9f57b0d9be94d...
So, in fact, you must not beg to have authors include courtesy definitions for you. That's not reasonable. Instead, you should simply ask here, on the thread, without complaining about the article.
Basic rule: define every abbreviation when it is first used.
Personally I have never, ever heard that concept referred to by the initialism. Granted, it's almost never come up in my circles, so... shrug
More generally, not every piece of writing is meant for every audience. If someone writes a blog post about CTFs aimed at people who like CTFs, nobody in the target audience needs to have CTF explained to them. Ultimately HN is a link aggregator, but sometimes it's a bit like eavesdropping on a conversation: when you are just listening in, you don't always get the full context.
I think you only wanted clarification of CTF (Capture the Flag) and not AI (Artificial Intelligence) and not GPT-4 (Generative Pre-Trained Transformer version 4) and not CLI (Command Line Interface) and not MCP (Model Context Protocol) and not LLM (Large Language Model)
Quoting TFA (The Fucking Article): “just adapt bro”
lol at the BGP example
> Rules that ask people not to use LLMs are ignored and almost impossible to enforce in open online events.
It's quite sad to see CTFs dying. I never had the time to seriously participate in CTFs, but I always respected those who did, as well as the people organizing these events.
The intent for most CTFs is to provide a meaningful challenge that concerns a single topic without introducing noise that wastes time. Of course a training exercise is easier to complete for an LLM.
We found that AI usage is basically guaranteed now, but certain challenge designs did thwart it. Challenges built with temporal visual elements made AI fall flat on its face, as it could not ingest/process the data fast enough to act on them in time. We also found that counterfactual challenges (i.e., the result you got did not match what we suggested you'd get) made AI-assisted solve time slower compared to pure humans, indirectly penalizing over-reliance on AI. Multimodal challenges combining audio and visual elements were also very effective, but were not as accessible to players.
This paper gave us some ideas about designing those challenges: https://arxiv.org/pdf/2308.02950.
For our next event we figured out a way to thwart AI in our CTF: embed the CTF in a game engine. The loop essentially becomes something like this: Connect to a simulated access point in the game, the K8s cluster connects their attack container to a private network with the challenge box(es). Hacking the boxes doesn't render a flag, but rather changes in game state. AI did very poorly coping with this in our testing, as it can't derive the spatial state of the game world very well and it soft decouples the inductive reasoning loop it relies on to know if it is on the right track.
The downside to this approach is it is far more labor intensive for CTF organizers, and requires players to have a computer capable of running the game. We are also betting on AI to not advance enough by the time we ship to be able to just ingest the entire game state in realtime and close the loop that way.
In this context, CTF is almost exclusively referred to by the initialism, I think to help distinguish it from other uses of the term.
Actively rude.
It's pretty fun. Or at least it was, back when you had some sense that your competitors were competing on an even playing field and just beat you because they were better than you.
I wouldn't say the name is a "gaming reference", it's just a descriptive name for a game.
It's a war game reference, I guess?
It's a lot harder to detect cheating when your only trace is how fast someone submitted the string CTF{DUck1e_Pwned}
The reason LLMs can do CTFs so well is partially because the challenges are usually designed to avoid wasting time and to introduce a single concept without noise.
No we have not.
Well, they were ostensibly forcing functions... ten years ago everyone was paying the exchange student to do their homework and assignments for them, and that guy was paying his cousin back in his home country, but the whole thing is a bit more efficient now.
[0] Episode webpage: https://share.transistor.fm/s/31855e83
https://ldjam.com/events/ludum-dare/59/setidream/about-ai-ar...
For what it's worth, the non-AI-coded entries were still quite good relative to the winners, so it's not so obvious that AI use confers an unbeatable advantage.
So something like, "Frontier AI has broken the 'high school' or 'university' format"?
The hype surrounding AI is just pervasively exhausting: you've got the folks talking about an entire new age for humanity where we're shortly going to take over the entire universe. And you've got the folks talking about how our entire society is crumbling.
Education is one place folks seem to throw up their hands and say nothing can be done.
The fix is simple: students are to be evaluated on their performance in person. That's it.
Any other "collapse of education" isn't due to AI, it's something else.
Why so pedantic?
Within the framework of your analogy, it's like responding to someone active in DIY maker groups suddenly dealing with an influx of influencers in meetups showing off Chinese junk from Etsy to post on Tiktok, and accusing them of being a Luddite blinded by their zealous hatred of mass production -- both strangely abrasive and also fairly nonsensical except as a "mass production supporter" social signifier.
Not to mention, in the article they specifically describe themselves as a heavy user of frontier models for security research ever since the release of Opus 4.5, calling them "useful within the field". In fact I don't see any actual criticism of AI/LLMs anywhere whether for security research, programming or anything else, except for making competitive CTFs no longer viable.
What does it take to avoid the "Luddite" brand? Using AI themselves and praising AI as useful (to the point of having a lopsided advantage over humans) isn't enough? Do they also need to say "I haven't written a line of code in 6 months/it's easily a 100x multiplier for my job" every time they mention it too?
Indeed, in the real world, plenty of people organize to do formerly-skillful tasks together. I have not personally crafted a gear by hand, but I have built a house in a long-abandoned style with a group of people only using hand tools.
There _is_ a danger that society forgets how to do these things. During that house-building exercise, there were many tricks of the trade that, while likely documented somewhere in a book, would have been difficult to reproduce without seeing a demonstration. From the standpoint of “does it matter?” it depends on what you care about. We absolutely do not need cruck-framed houses with scribed joints. Modern construction is faster and cheaper and lasts long enough. But it would sadden me greatly if practices like this faded from memory, because it’s one of those things that makes you gasp “wow!” when you see it. And your appreciation only deepens when you try it yourself.
I don't mean that everyone must know what CTF is, but sometimes it's OK to write things just for your community (the CTF community in this case), not for the general population.
If CTF is a player-vs-player event, then AI should just be banned outright; otherwise it will devolve into AI-vs-AI, which is just not an interesting competition format, as we learned in chess. Compared to top FIDE events (which ban AI), only a tiny niche audience actually watches the Top Chess Engine Championship (AI-centered). It turns out what we care about is not whether chess can be solved by any means available, but what the limits of the human mind in learning chess are.
Pretty much all chess coaches/educators also warn against relying heavily on AI during learning; engines only give you an illusion of understanding.
It's not really a good comparison.
The only things that work are novelty and obscurity. LLMs still suck at things mentioned in the footnotes of datasheets and manuals, things that deviate in subtle ways, unique constructions that alter something very, very common. It's hard for LLMs to avoid common pitfalls in terms of making assumptions while staying on track.
We have very powerful simulation tools, so something like "project a pattern at these angles" wouldn't really work, as you could simulate that.
I guess something cool is that we can make simulating the solution very expensive, while in the real world it would be free since it's analog. As long as simulation takes longer than it takes a human to find the solution, it would be a pretty good way to deal with it. I'm sure people smarter than me can come up with something.
Maybe I was too early to dismiss human creativity.
> My first CTF was HCKSYD, a 48-hour solo CTF. I full solved it and won in 2 hours. I was completely hooked. That led me to win DownUnderCTF, Australia's largest CTF, with Blitzkrieg multiple times. Blitzkrieg was one of Australia's strongest teams at the time. I later joined TheHackersCrew, an international top-tier team that was consistently ranked highly on CTFTime, the main global ranking and event calendar the scene uses as its scoreboard. With them, I competed in some of the most prestigious CTFs in the world, consistently placing well within the top 10 until the end of 2025.
Are still completely nonsensical to even those that understand the acronym
Are you really arguing for not just typing out whatever 3 words this stands for once in the name of clarity?
It doesn't help that the linked article never bothers to explain this either.
It isn’t common but I feel it would be best when posting to HN to just expand the initialisms even if the source title didn’t.
"new" does the same thing and is probably just a better descriptor then frontier
Raising the difficulty only matters for the (imo) less important part: the dick measuring competition between the very top teams.
The actual point of CTFs was usually to keep your skills sharp and stay learning. Eventually you build your own challenges, thereby completing the "have it taught to me, then do it myself, then teach another person" three step process towards mastering concepts.
You can just say "let the people who want to learn from it do so," but honestly the entire culture of learning, in the US at least, is DEAD. We turned "education" into a rote system of maximizing incentives, to the extent that that's all the youth know it as, and (increasingly) all educators can do. It's just gone, short of some kind of major reckoning, and we all know things will collapse before that happens. The ball is in the court of whatever country can teach its youth to learn the real way and to use AI productively only AFTER learning the concepts it is being used to accelerate.
They aren't your teacher. They aren't trying to send the content to you. They are just blogging on their own website for their own audience.
And it's hardly unique to this article. If you are writing about the nitty-gritty of Linux networking, you probably aren't defining what TCP or UDP means. If you are writing a super detailed article comparing and contrasting plot structures of different anime, you probably aren't going to start by explaining what the word anime means. Etc.
I'm not saying the world should be all RTFM, but if you are reading some sort of specialized content, then yes, I think it's a reasonable assumption that the reader has some basic background knowledge on the topic at hand, or is willing to do the research themselves.
To help everyone: this Capture The Flag is specifically cybersecurity-adjacent, and the Wikipedia article on it is the top Google result for me when searching "CTF". This is why the acronym is used: searching for the spelled-out term will get you the wrong "sport" rather than the cybersecurity one.
I don't want to explain what a CTF is. Look at the Wikipedia article; it is there for a good reason.
It's like complaining about not spelling out the C in "bake cake at 170 C".
This article was written for a specific audience who follows this blog because they know the term. If you start spelling out fundamental acronyms it makes the content look more basic and general.
This always upsets the general audience who stumble upon the article (like this) but it wasn’t meant for a general audience. CTF is extremely well known and the people who would be interested in this topic would wonder what’s happening if it was spelled out. It would be so odd that it would probably attract accusations of ChatGPT writing.
There are a million places where a computer can interact with a non-digital system in a loop.
- Tune an FPGA, or a whole data-center, or just a physical computer.
- Make a drone fly somewhere.
- Design a selective toxin (or anti-toxin).
Or, you know, get more people to click on ads. All totally possible to automate.
But that is about you, right? It's a little entitled to expect every piece of content on the internet to have a 101 explanation attached. If they were specifically aiming to have the blog post appear on HN that would be one thing, but they (presumably) weren't.
Also, the number of people who work with Linux and can't tell you what 'ls -alh' does is staggering (let's ignore the h; people struggle hard even with just 'a' and 'l').
People have been working with Docker for YEARS and don't even understand how Docker actually works (cgroups)...
Interviewing was always a bag of emotions, somewhere between "holy shit, my job is safe for years to come" and "seriously? how? How do you still have a job?"
I started playing CTFs in 2021, the same year I started university. My first CTF was HCKSYD, a 48-hour solo CTF. I full solved it and won in 2 hours. I was completely hooked. That led me to win DownUnderCTF, Australia's largest CTF, with Blitzkrieg multiple times. Blitzkrieg was one of Australia's strongest teams at the time. I later joined TheHackersCrew, an international top-tier team that was consistently ranked highly on CTFTime, the main global ranking and event calendar the scene uses as its scoreboard. With them, I competed in some of the most prestigious CTFs in the world, consistently placing well within the top 10 until the end of 2025.
I am not saying this because I dislike CTFs. I am saying it because CTFs were the thing that made me fall in love with security. They taught me how to learn, gave me a way to measure myself, and introduced me to many of the people I respect most in the field. Watching people pretend the format is still fine is frustrating because the old game is not there anymore.
As AI tools ramped up in capability, especially when GPT-4 first came out, a significant percentage of medium difficulty CTF challenges started becoming one-shottable, meaning a single prompt from a user could produce the solve and flag. You could paste a cryptography challenge into ChatGPT, come back in 10 minutes, and have the solution. At the time, we did not think too much of it. Hard challenges went mostly untouched, and the time save was not large enough to ruin the competition.
The issue was never that AI could help. CTF players have always used tools. The issue is when the model does the reasoning, writes the solve, and leaves the human with nothing meaningful to do besides copy the flag.
When Opus 4.5 dropped, the tone changed. Almost every medium difficulty challenge, and some hard challenges, became agent-solvable. Claude Code packaged everything into a CLI and made it easy to connect other CLI and MCP tools. It became trivial to build an orchestrator that used the CTFd API to spin up a Claude instance for every challenge. You could let the system run for the first hour, then only start working on whatever was left.
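The orchestrator pattern described here is genuinely small. Below is a hedged sketch of its skeleton: the event URL, token, and `solve_agent.py` wrapper are all placeholders, and only the `/api/v1/challenges` endpoint and token auth header follow CTFd's public REST API; everything else is illustrative.

```python
import json
import subprocess
from urllib.request import Request, urlopen

CTFD_URL = "https://ctf.example.com"  # hypothetical event URL
API_TOKEN = "ctfd_xxx"                # hypothetical access token

def challenges_request() -> Request:
    # CTFd exposes the challenge list at /api/v1/challenges
    return Request(f"{CTFD_URL}/api/v1/challenges",
                   headers={"Authorization": f"Token {API_TOKEN}",
                            "Content-Type": "application/json"})

def parse_challenges(payload: str) -> list:
    # CTFd wraps results in a top-level "data" key
    return json.loads(payload)["data"]

def spawn_agent(challenge: dict) -> subprocess.Popen:
    # one agent process per challenge; solve_agent.py stands in for
    # whatever wrapper drives the model CLI against that challenge
    return subprocess.Popen(
        ["python", "solve_agent.py", str(challenge["id"]), challenge["name"]])

# Fetching and fanning out would then look like:
#   with urlopen(challenges_request()) as resp:
#       for chal in parse_challenges(resp.read().decode()):
#           spawn_agent(chal)

sample = '{"data": [{"id": 1, "name": "baby-pwn"}, {"id": 2, "name": "rsa-101"}]}'
print([c["name"] for c in parse_challenges(sample)])  # ['baby-pwn', 'rsa-101']
```

That is the whole trick: the board enumeration is one API call, and the "team" is a process pool.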
That changed the game. Teams that refused to use AI were not just missing a convenience; they were playing a slower version of the competition. Open online CTFs started becoming a question of how quickly you could automate the easy and medium work, then how much human attention you had left for the hardest challenges. The scoreboard started measuring orchestration and willingness to use frontier models alongside, and sometimes above, security skill.
The effects were obvious. The CTFTime leaderboard started feeling wrong. Some legendary teams that were consistently near the top appeared less often. Player activity felt lower. Challenge developers who treated CTFs as an artform had less reason to spend weeks building something beautiful if it was going to be eaten by an agent in minutes.
I have been working heavily with GPT-5.5 and GPT-5.5 Pro after launch. By benchmark metrics, 5.5 is close to Claude Mythos' capability, and Pro likely surpasses it. These models can one-shot Insane difficulty active leakless heap pwn challenges on HackTheBox. They can solve a large portion of what a smaller CTF organiser can realistically produce. If you orchestrate Pro against Insane challenges in a 48-hour CTF, there is a good chance you get the flag before the event ends.
That makes open CTFs pay-to-win. The more tokens you can throw at a competition, the faster you can burn down the board. Specialised cybersecurity models like alias1 by Alias Robotics are becoming less relevant compared to general frontier LLMs. The competition is turning into "who can afford to run enough agents, with enough context, for long enough."
CTFs feel much more like a cheesable mess than a competition. Your performance in a CTF no longer defines your skill the way it used to. Recruiting security practitioners by CTF performance is becoming less meaningful. It is not even a particularly good measure of AI skill, because most of the orchestration needed for CTFs is already open source or vibe codeable.
I have seen various takes that beginners can still learn from CTFs as they always have. These takes miss the scoreboard. CTFs were not just a set of puzzles. They were a ladder. Even as a beginner, you had something to climb. You could see yourself improve, solve more challenges, place higher, join better teams, and become more competitive over time.
That feedback loop is breaking. If the visible scoreboard is dominated by teams using AI, a beginner is pushed toward using AI before they have built the instincts the AI is replacing. That is an anti-pattern. It prevents active learning, and active struggle is the bit that actually teaches you. It is also completely demotivating to put in real effort and see no visible progress because the ladder above you has been automated.
It also changes what challenge authors want to build. If beginner CTFs become another place where people quietly paste prompts and climb a scoreboard, authors have more reason to put their effort into learning platforms instead. At least on platforms like picoGym and HackTheBox, the expectation is education, and beginners are less incentivised to cheat themselves out of learning.
Beginners are better off using picoGym, HackTheBox, and other lab environments where the point is actually learning instead of pretending the public scoreboard still reflects human growth.
I have seen some hopium posts about how CTF is not dead, it is just augmented by AI. They often point at CTFs like DEF CON to argue that AI still cannot solve everything. That is true, but it is the wrong defence.
The hardest top-tier finals have very few participants, and they are usually gated behind qualifiers that are easier than the finals themselves. If those qualifiers fall to agents, fewer genuinely qualified people reach the challenges that still resist AI. A tiny number of elite finals does not save the open online format that most people actually play.
The claim is not that every challenge is solved. The claim is that enough of the scoreboard has been automated that the scoreboard no longer means what it used to mean.
CTFs were never meant to be security research. They can showcase new and interesting techniques, but the CTF itself is not the point of discovery. Just because AI is useful within a field does not mean it belongs in the competitive landscape of that field.
In CTFs, unrestricted AI removes the human from the puzzle almost entirely and reduces the art of security to a prompt. Sure, LLMs will keep getting better at security as long as CTFs are around, but that does not mean the competitive format is healthy. CTFs were an artform, a way to share techniques with nerds, and a way to push the human bounds of security skill. That purpose is being stripped away.
Chess has been dominated by computers for well over a decade. People use chess engines as an analogy for LLMs in CTFs, but they miss the point: chess engines are not allowed during competitive play. They are used for analysis, training, commentary, and practice. They enrich the game around the competition without replacing the person competing.
Imagine giving every competitive chess player the best chess engine and letting them use it freely during matches. Would that be considered fair? Would it be fun to watch? Would it justify prize pools? Would it push the human limits of what could be achieved in chess? The same questions apply to CTFs.
CTF organisers have tried techniques to break or deter LLM solutions, but they are temporary friction at best. Claude Code does not meaningfully care about old refusal-string tricks anymore. Frontier models are getting better at noticing prompt injections. Web search capabilities weaken challenges based on technologies released after the training cutoff. Rules that ask people not to use LLMs are ignored and almost impossible to enforce in open online events.
That leaves organisers in a bad position. If they make normal challenges, agents solve too much. If they make challenges deliberately hostile to agents, the challenges often become guessy, overengineered, or unpleasant for humans too. That is not a real fix. It just makes CTFs worse for everyone.
The "just adapt" take is infuriating. People I have always looked up to in the community have said it. To me, it is completely nonsensical unless you explain what we are adapting into.
If adaptation means building better tooling, CTF players already did that. If adaptation means writing harder challenges, organisers already tried. If adaptation means accepting that the scoreboard is now an AI orchestration benchmark, then we should say that honestly instead of pretending the old competition still exists.
Even if organisers create guessier or more overengineered challenges that current LLMs cannot solve, there are no good paths for players to learn the required skills while staying competitive. A few models from now, that point may be irrelevant anyway. The trajectory of LLM security capability is moving too quickly for challenge design to stay ahead for long.
The scene that grew my love for CTFs is emptying out. The CTFTime leaderboard has almost no semblance of history or human skill anymore. The 2026 scoreboard is unrecognisable compared to every year before it. TheHackersCrew, alongside many other large and reputable teams, either do not play, play with far fewer people, or struggle to cut into the top 10. Unregulated cheating is through the roof. Some of the best CTFs, like Plaid CTF, are not running anymore.
These sentiments are not only mine. Many members of my local team, Emu Exploit, feel similarly. These are people who consistently attend the International Cybersecurity Championship, perform at the top level in bug bounty programmes, compete in Pwn2Own, and present at conferences including Black Hat. The people losing interest are not casual observers. They are exactly the kind of people the scene used to produce and retain.
The fun of CTFing is gone for many of the people who cared most. The loss is not just a scoreboard. It is the ladder from beginner curiosity to elite competition. It is the craft of challenge design. It is the feeling that a clever human solved something difficult because they understood it deeply.
That legacy is not being carried forward by open online CTFs in their current form. The format is dead. Something else may replace it, but pretending nothing fundamental has changed only makes the loss harder to talk about honestly. It also gives AI shills more room to capitalise on the decline by selling mediocre wrappers back to the community that made the training data valuable in the first place.
While a lot of what's happening in the CTF/AI space is super commercialised and out of our control, CTF has had a hugely positive impact on the industry. I have met so many kind, smart, and passionate people through CTFs. I have played some of the most beautifully crafted challenges and found some of the most intriguing unintended solutions.
The community around CTFing has been an amazing place to learn, grow, and connect. That's something we shouldn't lose, no matter where the competition goes. As a community, we should strive to stay together and build new avenues to stay passionate and keep learning. Security-adjacent social events like SecTalks, student conferences, and local meetups are great ways to stay connected and involved. Learning platforms, and the communities they build through services like Discord, are also a valuable resource.
While it may be a struggle to find an alternative to what we had, the amazing community we have built around it is more important now than ever as we find new ways to keep the competitive spirit alive.
This has never been achieved by, nor is it the point of, education for the masses.
The problem, frankly, is that computers, and now computers with LLMs, make it easy to cheat.
The kid doesn't want to learn; the kid wants good grades so the parent is happy with them, and the young adult wants to get the paper because they were told that is required for a good life. It's a misalignment of incentives.
Now I’m certain that there exist those mythical human instructors who can do better, but that’s not worth much if 99.99% of people don’t have access to them. Just like a good human physician who takes their time with the patient is better than an LLM, but that’s not worth much either given that this doesn’t match most people’s experience with their own physicians.
If you remove the "without AI" at the end, I've been hearing similar anecdotes about fizzbuzz for years (isn't the whole point of fizzbuzz to filter out those candidates?)
We usually hire for problem solving capabilities and not so much for technical know-how.
That’s at least how I read your comment.
It's not even that they got distracted, they sat there trying, for 2 whole days, with concerned colleagues giving them hints like "have you tried checkout -b"... They didn't manage!
How the hell do you work for a decade in this business without learning even the most basic git commands? Or at least how to look them up? Or how to use a gui?
Incompetent devs are not a new thing.
I don’t care what someone can do without the tools of their trade, I care deeply about their quality of work when using tools.
All things I learned in school that turned out to be wrong information.
Not to mention, the current state of education is far worse. I don't think most realize how low the bar is.
I had no access to anyone who could teach me calculus as a kid except Khan Academy, so I think this is a gross exaggeration. But I agree in the end, that all my "real" learning did come from pen-and-paper practice, not watching videos.
"Frontier models break the open CTF format" is good
"Frontier AI..." just raises the question: wtf is Frontier AI?
Because of course it exists (just googled it): https://frontierai.company/
I agree with this.
botsbench.com shows Sonnet 4.5+ with the Claude Code harness does pretty well, and Sonnet roughly tracks the edge of what self-hosted models can do on the upper tier of affordable GPUs, i.e. running 1-2 DGX Sparks and waiting 6 months for open-source models to catch up a bit.
But I don't know enough that's why I asked.
I imagine one could do CTF in public, machines you work on vetted/prepared to some spec, yada yada.
If chess and Go can do it why can't CTF?
That was my question when I wrote "what am I missing here".
"Please don't comment on whether someone read an article. "Did you even read the article? It mentions that" can be shortened to "The article mentions that"."
Can't argue with that logic
Not really, not if you want to ask it deep questions. It won't have an answer that is deeper than something that you can find online, and if pressed it will just keep circling around the same response.
The reason is that this "thing" was never curious, never asked questions, and never really learned anything. It just has learned the Internet "by heart", and is as boring as a human teacher who is not really curious about the subject they are teaching, and has just got some degree by "by hearting" some text book. Of course it does it much better than a human, but it is fundamentally the same thing.
You're certain that mythical instructors exist (?) who "can" do better?
Are human instructors more competent as teachers than AI teachers, or are AI teachers more competent as teachers than human teachers? No "this or that can happen," just a definitive statement please.
AI is likely a million times better student than my dimwit cybersec meatbags...er, majors, for sure, as well! Don't have a reliable way to measure or experience why/how, tho, so I'm not out here claiming it. Even if I did, why would I argue for their replacement?
Saying there have always been bad developers doesn't change that there's a higher ratio of them now.
No stats to back this up. Just interviews I've done recently and historically.
Software is full of leaky abstractions
This situation in particular was a React role so there is an expectation that when you list React as one of your skills on your resume then you know at least the basics of state, the common hooks, the difference between a reference to a value vs the value itself.
These days you can do a surprising amount with AI without knowing what you are doing, but if you don't have any clue how things work you'll very quickly run in to problems you can't prompt away.
There’s almost nothing to forget? I’m just struggling to understand.
Everybody knows calculators and spreadsheets are adjuncts to skill. Too many people believe AI is the skill itself, and that learning the skill is unnecessary.
A Physics Prof Bet Me $10,000 I'm Wrong
They're wrong sometimes, but usually in verifiable ways. And they don't seem to know the difference between medicine and bioterrorism, so often they refuse. But these limitations are worth tolerating when the alternative is that our specialists in topic X are bogged down by questions about topic Y to the point where X isn't getting taught.
For me the best human teachers were the ones that managed to make me interested on topics that I thought are boring/useless (many times my opinion being stupid, mostly due to lack of experience).
So far with LLM I learn about things I know something (at least that they exist) and I am interested in, which is a small subset of things that one should learn during lifetime.
If they can ship code that matches a spec, why does it matter if they’re using ai or not?
Genuinely curious.
When this AI era's devs grow older, they'll complain the newer generation can't even vibe code.
If you cannot write "basic syntax" for any language then you are not a programmer, and certainly not a software engineer? This is not a value judgement, it's ok (probably good tbh) to not be a programmer. But you are wasting everyone's time by interviewing for a programming position in this case.
E.g. in Hungary I had a university CS professor that originally wanted to be a highschool teacher and a highschool physics teacher that originally wanted to be researcher. Their choice of degree didn't determine which outcome they got. The researcher and teacher curriculum had an 80%+ overlap.
But he was a great teacher anyway. He was engaging and kept the kids in line and learning. I eventually learned the truth, and most of my classmates forgot about it. Teaching, like flying a plane or driving a train, might become more about keeping watch over a small group of people and ensuring that things don't go off the rails, and that's fine.
Like almost everything else about LLMs, this unfortunate tendency has gotten a lot better recently, which you might not realize if you gave up after getting some lame answers or bogus glazing on the free ChatGPT page a couple of years ago.
My “earth sciences” teacher also once tried to argue with me against the universal law of gravitation. (No, she was not referring to Special/General Relativity. She didn't agree that two objects in a vacuum fall at the same speed regardless of mass.)
But that's not using "computers" as a computer but as a video player. When evaluating whether computers are "good for learning", I don't think we should include using a computer as a video player, a book, or even flash cards. It should be things a computer uniquely offers which books, paper, videos, and a physical reference library cannot.
Based on the results of deploying hundreds of millions of computers in schools in the 80s and 90s, the evidence was mostly that computers are good for learning computer programming and "how to use a computer", but not notably better than cheaper analog alternatives for learning other things.
Interestingly, a properly trained and scaffolded LLM could be the first thing to meaningfully change that. It could do some things in ways only human teachers could previously since it is theoretically capable of observing learner progress and adapting to it in real-time.
He really took the time to replicate the manual teaching process of writing on a whiteboard. He improved upon it by using colors. But it basically had the same pace as a teacher writing on a whiteboard.
When professors are given a projector, they just throw together some slides and add their narration.
This is not very efficient. To learn you need to suffer. Or you need to watch the suffering.
She only really had two faults: She wasn't very bright, and she wasn't fond of children. I had her in about 80% of all my classes for six years. High school was a relief.
We can all agree that both human "experts" and LLMs can sometimes be right, and sometimes be confidently wrong.
But that doesn't imply that they're equally fit for purpose. It just means that we can't use that simple shortcut to conclude that one is inferior to the other.
So where do we go from here?
It's not unlike going to the gym, and we see how many people do that regularly. Except it's even funnier, because people serious about the gym buy what? Tutors. They call them personal trainers. We've known for a millennium or more that 1-on-1 instruction is vastly better than anything else, but most people actually don't want to get into shape, and most people actually don't want to learn.
The kids learnt all about Team Fortress 2, Roblox, Rainbow Six etc. They also learnt how to game the learning system so it looked like they were doing their work.
Who cares as long as the car is fixed, right? As long as the mechanic can Chinese-room his way to a working car, why does it matter how much of it he actually understands?
And why hire the mechanic instead of hiring the Chinese room?
The inability to write fizzbuzz strongly implies their inability to understand what they've shipped. Review is some significant portion of the job. Understanding of the product is also part of the job.
Specs are also, in a sense, scaled-down, fuzzy, natural-language descriptions of a feature. The fuzziness is a source of bugs, or at least of a mismatch between the actual desired feature and what was written down at spec-writing time. As such, just matching a spec is the bare minimum that a good dev should be doing. They should be understanding what the spec is _not_ saying, understanding holes in their implementation, how their implementation enables or hinders the next feature and the next, next feature, etc. I don't think any of that is possible without understanding what was actually implemented.
You also have to pass a standardized test specifically on subject matter in order to get your teaching certificate.
The undergrad degree I did was split into thirds, one for subject matter, one for teaching pedagogy, and one for teaching your subject matter.
I’m not talking about gotcha-level stuff here, where it didn’t compile the first time because of a bracket or anything, or was even just wrong on the first attempt. They couldn’t do Fizzbuzz in a language of their choice, at all.
Those that could were always annoyed at having to do such things because how could someone coming for a contract position not be able to do this? Without seeing what a filter it really was.
I am perfectly capable of writing specs, and feeding them to 3 separate copies of Claude Code all by myself. Then I task switch between the tmux windows based on voice messages from the pack of Claudes. This workflow is fine for some things, and deeply awful for others.
Basically, if a developer is just going to take my spec and hand it to Claude Code, then they're providing zero value. I could do that myself, and frequently do.
The actual bottleneck is people who can notice, "The god object is crumbling under the weight of managing 6 separate concerns with insufficient abstraction." Or "Claude has created 5 duplicate frameworks for deploying the app on Docker. We need to simplify this down to 1 or we're in hell." I will happily fight to hire people who can do the latter work. But those people can all solve fizzbuzz in their sleep.
People who just "ship code that matches a spec" without understanding the technical details are providing close to zero value right now.
There is an interesting niche for people with deep knowledge of customer workflows who can prompt Claude Code. These people can't build finished products using Claude. But they can iterate rapidly on designs until they find a hit. Which we can then fix using people with deeper engineering knowledge and taste.
But if you're not bringing either deep customer knowledge or actual engineering knowledge, you're not adding much these days.
“Kids these days don’t work as hard / know as much / value the important things” is as tired as it is universal.
Like sure, I can probably write some python, but will it be pythonic? I might still be Java-minded for a while, trying to OOP my way into solutions.
Earlier today I needed to write some PHP and couldn't remember if it used length, count, or size. I had to look it up. I've been doing this for 20 years.
But here's the thing: for humans, this is manageable because we've come up with a number of mechanisms to select for dependable workers and to compel them to behave (carrot and stick: bonuses if you do well, prison if you do something evil). For LLMs, we have none of that. If it deletes your production database, what are you going to do? Have it write an apology letter? I've seen people do that.
So I think that your answer - that you'll lean on your expertise - is not sufficient. If there are no meaningful consequences and no predictability, we probably need to have stronger constraints around input, output, and the actions available to agents.
It is widely believed by their neighbors that the _Druze_ wear baggy pants because they believe that the Mahdi will be born to a male, and the pants will catch the baby etc. I say "widely believed": the Druze are famously secretive and will not confirm or deny most things about their religion. The 'elect' Druze men do wear distinctive baggy trousers with the crotch down around the knees: no one else does.
The Druze are a people of the Arab world; moreover, they are Arabs. They began as an Isma'ili sect, but do not identify as Muslim: they call themselves al-Muwaḥḥidūn, meaning 'the monotheists' or 'unitarians'.
Much closer to correct than not!
Whether you're in class or at work, it's just courteous to ask an AI first.
I also use Claude with tmux. Can you share how you get the voice messages from the Claudes?
I once got the method invocation syntax wrong for PHP in an interview. I'd written thousands of lines of PHP and had most recently written some the week before.
This, despite starting off my programming journey in editors with no hinting or automatic correction. If anything, I've gotten even worse about remembering syntax as I've gotten better at the rest of the job, but I was never great at it.
I rely on surrounding code to remind me of syntax and the exact names of basic things constantly. On a blank screen without syntax hints and autocompletion, or a blank whiteboard, I'm guaranteed to look like a moron if you don't let me just write pseudocode.
Been paid to write code for about 25 years. This has never been any amount of a problem on the job but is sometimes a source of stress in interviews and has likely lost me an offer or two (most of the sources of stress in an interview have little to do with the job, really)
I'm genuinely curious how someone who never wrote a program in assembly, or debugged a program machine instruction by machine instruction, can really understand how software works. My working hypothesis is most of them don't and actually it's fine because they don't need it.
In 2026, if you call yourself a developer and can't solve FizzBuzz without help, it's hard to argue that you know anything useful at all.
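Since the whole thread keeps using it as the bar, here is what is actually being asked for: a minimal FizzBuzz, sketched in Python (any language works, the logic is the point).

```python
def fizzbuzz(n: int) -> str:
    """Return "Fizz", "Buzz", "FizzBuzz", or the number itself as a string."""
    if n % 15 == 0:      # divisible by both 3 and 5
        return "FizzBuzz"
    if n % 3 == 0:
        return "Fizz"
    if n % 5 == 0:
        return "Buzz"
    return str(n)

if __name__ == "__main__":
    for i in range(1, 16):
        print(fizzbuzz(i))
```

The only classic trap is checking divisibility by 3 or 5 before checking 15, which silently swallows the "FizzBuzz" case.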
I don't think we're close to that time yet. Just like as a kid I was told to prove my work by hand even if I could do it in my head, and just like we learned how to do calculus without a calculator and then learned how to use the calculator to get the same result, I think we still need the software field to learn programming concepts independent of the use of AI to create code.
I don't think you can be a good "prompt engineer" for solid software in 2026 if you don't understand programming concepts and software architecture and flow.
My expertise has led me to the obvious fact that I would never give an LLM write access to my production database in the first place. So in your own example my expertise actually does solve that problem, without the need for something like a consequence, whatever that means to you.
We already have full control over the input and tools they are given and full control over how the output is used.
It's not perfect—sometimes a Claude notifies 3 minutes after it stopped doing anything. But it's helpful when I'm running multiple Claudes and also reviewing code elsewhere.
Your brain may feel like someone put it in a blender. Be warned.
I think it helps that it's a very narrow field to look at, compared to fuzzy and big-picture view of social studies, for example. So much room to be confidently wrong... And sadly I can't think of a solution, LLMs or not.
So what tree-traversal/quicksort problems tend to measure is how long it's been since you last did CS class homework problems.
How? Fizzbuzz requires you to produce output; that's not functionality that CPU instructions provide.
You can call into existing functionality that handles it for you, but at that point what are you objecting to about the 'modern language'?
In reality, heavier isotopes of hydrogen fuse, conserving the total number of nucleons, but the fusion products have a lower total rest mass than the parent particles. The mass difference is released as energy, and total energy is conserved.
By his logic the system either violated energy conservation (by creating nucleons while releasing energy) or was endothermic (creating nucleons from the surrounding energy).
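To make the conservation argument concrete, here is the textbook D-T case worked out as a Python sketch (the atomic mass values are the standard published ones; the ~17.6 MeV result is the well-known D-T yield):

```python
# Deuterium-tritium fusion: D + T -> He-4 + n
# Rest masses in unified atomic mass units (u), standard tabulated values.
M_D = 2.014102    # deuterium
M_T = 3.016049    # tritium
M_HE4 = 4.002602  # helium-4
M_N = 1.008665    # neutron

U_TO_MEV = 931.494  # energy equivalent of 1 u, in MeV

# Nucleon count is conserved (2 + 3 = 4 + 1), but rest mass is not:
delta_m = (M_D + M_T) - (M_HE4 + M_N)  # mass defect, in u
energy_mev = delta_m * U_TO_MEV        # released as kinetic energy

print(f"mass defect: {delta_m:.6f} u")
print(f"energy released: {energy_mev:.1f} MeV")
```

No nucleons are created or destroyed; the positive mass defect simply shows up as ~17.6 MeV of kinetic energy carried off by the products.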
I’m not objecting to modern languages, I’m just saying that using them fails the “can write fizzbuzz with no help” test to only a slightly lesser degree than using AI tools. They’re a complex compile- and runtime environment that most developers don’t truly understand.
https://cdn.openai.com/o1-system-card.pdf
There's also some research that points to it being a feasible attack surface: https://arxiv.org/pdf/2603.02277
> Models discovered four unintended escape paths that bypassed intended vulnerabilities (Section C), including exploiting default Vagrant credentials to SSH into the host and substituting a simpler eBPF chain for the in- tended packet-socket exploit. These incidents demonstrate that capable models opportunistically search for any route to goal completion, which complicates both benchmark va- lidity and real-world containment.
Here is some indication I'm not making this up: https://hsm.stackexchange.com/questions/2465/when-and-why-di...
In any case, I never use those concepts, and I know no professional particle physicist that does. By "mass", I mean rest mass.