"My personal conclusion can however not be anything other than that the big hype around this model so far was primarily marketing. I see no evidence that this setup finds issues to any higher or more advanced degree than the other tools did before Mythos. Maybe this model is a little bit better, but even if it is, it is not better to a degree that makes a significant dent in code analysis."
It's a good reminder for us all that the competition in this space is rough and lots of more or less subtle marketing is involved.
More seriously, so far I haven’t seen much indication that Mythos is more than Opus with a security focused code analysis harness. That said, the fact it can find these bugs in an automated fashion is the more important takeaway outside of the hype.
I’m curious what the error rate is on the detections, because none of that means much if it is wrong 90% of the time and we are only hearing about the examples that are useful marketing.
If you've just gone through a lengthy analysis of your code with other AI tools, surely it's reasonable not to expect to see hundreds more from a new tool?
It should be possible, unless more bugs are introduced, to eventually get to a state where there are no more bugs in your code.
Process aside, it sounds like Daniel expected to find dozens/hundreds more bugs.
My mind still cannot understand the quality and refinement that's gone into cURL. It really is the perfect example of something done so right, that people barely think twice about.
Also, curl in this regard is an open source project: relatively small but critical, well known and used everywhere. Besides image libraries, tools like curl or sudo, su, passwd, etc. would also be my first try.
It is still not known at all what Mythos can do. What does it mean, from a cost and benchmark point of view, to have a 10 trillion parameter model?
Nonetheless, LLMs became significantly better at finding these issues, better than humans, maybe half a year ago? So at some point we need to address the elephant in the room and state that today you need to do additional security scanning with LLMs. You need to take this seriously.
In the worst case, use Anthropic's marketing to argue that it's a must now and that something has changed.
I'm not sure that follows. As noted, curl was already analyzed to death with every tool available; most software isn't at that level.
The point wasn't actual cross-platform portability even though that was a nice side effect. It was to flush out all the weird edge cases.
Edges like security flaws. Buffer overflows are usually platform specific. There are plenty of other ways to find these issues but simply recompiling for a different platform surfaces all sorts of issues.
This. Well done by Anthropic.
It even reached the CISO of my small semi-government org in the Netherlands, who slightly panicked at the announced 'tsunami' of vulnerabilities that was coming with Mythos.
Got us some more money and priority with the board, though.
Never waste a good marketing scare.
> It’s not that I would have a lot of time to explore lots of different prompts and doing deep dive adventures anyway.
His expertise I think would elevate the results quite a bit. Although if he never uses LLMs, which it reads like he doesn't, I guess it might backfire just as well. Prompting style (still?) does matter after all, certainly in my experience anyways.
Typo, or is there a spoof I should go read?
I remember when OpenAI was saying GPT-2 was too dangerous to release.
The other alternative is that Curl is simply secure enough that there was far less to find than in other projects.
Marketing is not intentional.
Evidence: 10 years ago, when I interviewed at Baidu AI with Andrew Ng and Dario, Dario was the kind of person who is pure-hearted to the point of being ideological. Given Dario's successful career so far, that essence has gradually grown into a conviction, and he is surrounded by a purposely built team which amplifies his ideology.
Humans are very convenient creatures, and a rare small fraction of them are no doubt masters of convenience: they morph their mental manifold without a hint of contradiction in their own mental mechanisms.
About as subtle as a personal injury lawyer's billboard
However, in the days of the race to the bottom, offshoring for pennies, and now LLM-powered code generation, this is a quality most companies won't care about unless there is liability in place.
I've seen literally near word-for-word this exact chain of events multiple times previously
that helps us to understand how much of Mythos is hype and how much is real
Mythos put Anthropic back into the White House’s good graces. It also branded Anthropic as badass, something their softer image probably needed to win government contracts.
Maybe it wasn’t marketing. But the product’s configuration, and how Anthropic talked about and released it, sure as hell played beautifully. (The timing, while Musk and Altman are distracted with each other, also couldn’t have been better.)
And this is very much on purpose my friend. Think about what people already believe it can do though.
If so, it would still follow. "Most software" isn't analyzed as much as curl, by either other tooling or other models, that might well find close to the same as Mythos did. As such, Mythos then isn't especially/particularly dangerous.
https://daniel.haxx.se/blog/2026/04/22/high-quality-chaos/, linked from TFA
Does it say anything else? Just 'Aaaarggghhhh'?
If I’m not mistaken, after the media cycle, he lost his job for breaking confidentiality.
That was the opposite of marketing, Google really didn’t get how to turn this into a product until ChatGPT happened.
Curl uses all sorts of tools, including AI tools, to find bugs. These tools, according to the article, found hundreds of bugs, including a dozen CVEs.
Mythos found one vulnerability. It means Mythos is just another tool, not the revolution it claims to be.
It is common that when a new tool is introduced, a bunch of bugs are found, with diminishing returns. Mythos finding one vulnerability is consistent with what I would expect for a major update to an existing tool, which Mythos is relative to existing LLM-based solutions.
*rolls eyes* Regular static analyzers have also been "better than humans" for decades; being better than a human at a specific mechanical task really doesn't mean much. The interesting new thing is the type of potential "fuzzy bugs" described in the article that LLMs are able to identify: a comment not matching the code it describes, uncommon usage of a third-party library, a mismatch between code and the protocol it implements, or often just generally weird-looking code somebody should take a closer look at. This closes a gap in the traditional debugging toolbox, but shouldn't replace it.
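To make the "fuzzy bug" idea concrete, here is a contrived, hypothetical example of the comment/code mismatch class: the kind of thing a traditional analyzer has no opinion on, because the code is syntactically and semantically fine, but which a model reading both the comment and the code can flag.

```python
# Contrived illustration of a "fuzzy bug": the docstring promises one
# behavior, the code implements another. Nothing here is invalid, so a
# static analyzer stays silent; the bug lives in the mismatch itself.
def clamp_timeout(ms):
    """Clamp a timeout to at most 30 seconds."""  # comment says 30 s ...
    return min(ms, 3000)                          # ... code caps at 3 s
```

A reviewer (human or LLM) comparing the docstring against the constant spots immediately that `3000` ms is 3 seconds, not 30.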
Things change when you’re running a business like Anthropic, especially as the CEO. You have a responsibility to shareholders, and you just need to play the game.
Anthropic chose a great angle: focus on professionals / enterprise, safety, etc. Both can be driven by a genuine desire to make great technology, and for business purposes they require you to position yourself in a slightly “better” way than reality.
Just look at what their strategy is with Mythos, it’s almost perfection: the “it’s not ready to be released to the public” angle hits all the marks: they care about responsibility / safety, they have “the best” model, and “LLMs are dangerous, but we, as the guardians, can be trusted”. This also helps the industry as a whole with regulation: if they’re being constrained, China will develop even more dangerous models.
This is a result of how smart people treat business, it’s PR perfection, especially given how much the whole industry is talking about it.
(Yes, they fail in other PR areas, but that’s a different discussion)
Whether the person doing the marketing was sincere about it or not is immaterial, since marketing is experienced almost entirely by the people consuming it, and not the people communicating it. What matters is if the audience is sincerely concerned by the message, and it's transparently the case that they were sincerely concerned by it.
That's an odd definition of "intentional". Evolution has filtered for people with certain views and the marketing has just emerged from their actions. ... So?
A deadly virus (naturally occurring one let's say) wasn't created intentionally. Evolution selected for it. It's still bad and kills people. Doesn't make it nice because of lack of intention.
IMO, this does not sound like a marketing scare; there is a spike of vulnerability disclosures - high quality, low false positives - that can be sensed... It feels like we're speedrunning through a few years' worth of high quality bug reports in just a few weeks.
It's an entirely different thing to have the company conduct research on LLMs in general being a cybersecurity threat, instead of going "our new model is just too powerful" and shifting the discussion to revolve around that. It's slimy.
I also thought they were contending the word count before noticing. Even remarked how I find this a weird metric, given that code is not prose [0], but then I deleted that once I picked up on what's going on.
[0] comparing the output of `wc -w` with the word counts of books I'm reasonably sure will be super off
edit: ran a calc, substituting out symbols (but not underscores), digits, and comments yields a 390K word count compared to the 660K cited. not excluding the comments yields 600K, so more than a third of all words in the sources are comments.
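The commenter's calculation can be reproduced with a short script. This is a sketch of the described approach, not the exact commands they ran: strip C comments (optionally), replace symbols (but not underscores) and digits with spaces, then count whitespace-separated words the way `wc -w` would.

```python
import re

def word_count(c_source, strip_comments=True):
    """Approximate the comment's calculation on a C source string:
    optionally strip /* */ and // comments, blank out symbols (keeping
    underscores, which are part of \\w) and digits, then count the
    remaining whitespace-separated words, wc -w style."""
    if strip_comments:
        c_source = re.sub(r"/\*.*?\*/", " ", c_source, flags=re.S)  # block comments
        c_source = re.sub(r"//[^\n]*", " ", c_source)               # line comments
    c_source = re.sub(r"[^\w\s]|\d", " ", c_source)  # drop symbols and digits
    return len(c_source.split())
```

For example, `word_count('int x = 42; /* the answer */')` counts 2 words with comments stripped and 4 without, which mirrors how comment stripping shrank the 600K figure to 390K.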
It has been clear for ages that certain types of bugs or issues are better solved by software.
But there were still plenty of things a proper SecOps person would be able to find with help from tooling which automated tooling alone wouldn't find.
Taking a limited amount of resources and focusing on the critical things.
I do think this is gone now. Same with threat modeling etc.
I would do that with 100% local models from scratch.
> using these tools interactively
I did read the article. It seems to me they're using LLMs in a prepared manner instead, as mere scanners that produce reports.
Source: voice typing this with Swedish vocal cords, and only had to correct "different lives" to "differently", and add /[^\w\s]/.
In February, Opus discovered a whole bunch of security related bugs, but didn’t exploit them.
Mythos, in turn, was fed these bugs and told to exploit them.
Not saying it’s not impressive, but it was literally told “here are all the places our metal detector says there may be gold, please find gold”.
And it is not overkill; the proof is that it found that vulnerability. It is like saying the new version of some static analyzer with some new rules is "overkill" because it found only one more bug than the previous version. Deciding whether it is overkill or not is more about context. Using a very expensive model like Mythos for some little-used, non-critical software is overkill, but for curl, it absolutely isn't.
If Mythos found loads of vulnerabilities in Firefox but not in curl, I wouldn't say that's because Mythos is so good, but rather that, with the release of Mythos, they did some testing that could have been done before using the same tools curl has used.
> Once the end-to-end pipeline is in place, it’s trivial to swap in different models when they become available. Building this pipeline early helped us find a number of serious bugs using publicly-available models, and it also helped us hit the ground running when we had the opportunity to evaluate Claude Mythos Preview. In our experience, model upgrades increase the effectiveness of the entire pipeline: the system gets simultaneously better at finding potential bugs, creating proof-of-concept test cases to demonstrate them, and articulating their pathology and impact.
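The pipeline shape the quote describes can be sketched in a few lines. This is a hypothetical illustration, not Anthropic's actual code: the model is a pluggable callable, so "swapping in different models" means replacing one object while the find/demonstrate/explain stages stay fixed.

```python
class BugHuntPipeline:
    """Hypothetical sketch of a model-agnostic bug-finding pipeline.
    `model` is any callable mapping a prompt string to a text reply,
    so upgrading models is a one-argument swap."""

    def __init__(self, model):
        self.model = model

    def run(self, source):
        # Stage 1: surface candidate bugs in the source under review.
        candidates = self.model(f"List potential bugs in:\n{source}")
        # Stage 2: try to demonstrate each candidate with a proof of concept.
        poc = self.model(f"Write a proof-of-concept test for:\n{candidates}")
        # Stage 3: articulate pathology and impact for the human report.
        report = self.model(f"Explain the pathology and impact of:\n{candidates}")
        return {"candidates": candidates, "poc": poc, "report": report}
```

With this shape, the quoted claim follows naturally: a better model improves all three stages at once, because every stage routes through the same callable.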
I just woke up to that very workflow this morning. I ran it last night and it finished at around 3am with ~200k tokens spent. Fixed the issue and created a follow-up doc for things that it could not verify.
yes, as in singular one.
Back in April 2026 Anthropic caused a lot of media noise when they concluded that their new AI model Mythos was dangerously good at finding security flaws in source code. Apparently Mythos was so good at this that Anthropic would not release the model to the public yet but instead trickle it out to a select few companies for a while, to allow a few good ones(?) to get a head start and fix the most pressing problems first, before the general populace would get their hands on it.
The whole world seemed to lose its marbles. Is this the end of the world as we know it? An amazingly successful marketing stunt for sure.
Part of the deal with project Glasswing was that Anthropic also offered access to their latest AI model to “Open Source projects” via Linux Foundation. Linux Foundation let their project Alpha Omega handle this part, and I was contacted by their representatives. As lead developer of curl I was offered access to the magic model and I graciously accepted the offer. Sure, I’d like to see what it can find in curl.
I signed the contract for getting access, but then nothing happened. Weeks went past and I was told there was a hiccup somewhere and access was delayed.
Eventually, I was instead offered that someone else, who has access to the model, could run a scan and analysis on curl for me using Mythos and send me a report. To me, the distinction isn’t that important. It’s not that I would have a lot of time to explore lots of different prompts and do deep dive adventures anyway. Getting the tool to generate a first proper scan and analysis would be great, whoever did it. I happily accepted this offer.
(I am purposely leaving out the identity of the individual(s) involved in getting the curl analysis done as it is not the point of this blog post.)
Before this first Mythos report, we had already scanned curl with several different very capable AI powered tools (I mean in addition to running a number of “normal” static code analyzers all the time, using the pickiest compiler options and doing fuzzing on it for years etc). Primarily AISLE, Zeropath and OpenAI’s Codex Security have been used to scrutinize the code with AI. These tools and the analyses they have done have triggered somewhere between two and three hundred bugfixes merged in curl throughout the recent 8-10 months or so. A bunch of the findings these AI tools reported were confirmed vulnerabilities and have been published as CVEs. Probably a dozen or more.
Nowadays we also use tools like GitHub’s Copilot and Augment code to review pull requests, and their remarks and complaints help us to land better code and avoid merging new bugs. I mean, we still merge bugs of course but the PR review bots regularly highlight issues that we fix: our merges would be worse without them. The AI reviews are used in addition to the human reviews. They help us, they don’t replace us.
We also see a high volume of high quality security reports flooding in: security researchers now use AI extensively and effectively.
Security is a top priority for us in the curl project. We follow every guideline and we do software engineering properly, to reduce the number of flaws in the code. Scanning for flaws is just one of many steps to keep this ship safe. You need to search long and hard to find another software project that does as much as, or goes further than, curl for software security.

Steps involved in keeping curl secure
It was with great anticipation we received the first source code analysis report generated with Mythos. Another chance for us to find areas to improve and bugs to fix. To make an even better curl.
This initial scan was made on curl’s git repository and its master branch of a certain recent commit. It counted 178K lines of code analyzed in the src/ and lib/ subdirectories.
The analysis details the several different approaches and methods it used to perform the search, and which flaws it focused on trying to find. A fun note at the top of the report says:
curl is one of the most fuzzed and audited C codebases in existence (OSS-Fuzz, Coverity, CodeQL, multiple paid audits). Finding anything in the hot paths (HTTP/1, TLS, URL parsing core) is unlikely.
… and it correctly found no problems in those areas.

Completely unscientific poll on Mastodon about people’s expectations for Mythos scanning curl
curl is currently 176,000 lines of C code when we exclude blank lines. The source code consists of 660,000 words, which is 12% more words than the entire English edition of the novel War and Peace.
On average, every single production source code line of curl has been written (and then rewritten) 4.14 times. We have polished this.
Right now, the existing production code in git master that still remains, has been authored by 573 separate individuals. Over time, a total of 1,465 individuals have so far had their proposed changes merged into curl’s git repository.
We have published 188 CVEs for curl up until now.
curl is installed in over twenty billion instances. It runs on over 110 operating systems and 28 CPU architectures. It runs in every smart phone, tablet, car, TV, game console and server on earth.
The report concluded it found five “Confirmed security vulnerabilities”. I think using the term confirmed is a little amusing when the AI says it confidently by itself. Yes, the AI thinks they are confirmed, but the curl security team has a slightly different take.
Five issues felt like nothing as we had expected an extensive list. Once my curl security team fellows and I had poked at this short list for a number of hours and dug into the details, we had trimmed the list down and were left with one confirmed vulnerability. Of the other four, three were false positives (they highlighted shortcomings that are documented in the API documentation) and the fourth we deemed “just a bug”.
The single confirmed vulnerability is going to end up a severity-low CVE, planned to be published in sync with our pending next curl release, 8.21.0, in late June. The flaw is not going to make anyone gasp for breath. All details of that vulnerability will of course not be made public before then, so you need to hold out for those.
The Mythos report on curl also contained a number of spotted bugs that it concluded were not vulnerabilities, much like any new code analyzer does when you run it on hundreds of thousands of lines of code. All the bugs in the report are being investigated and one by one we are fixing those that we agree with.
All in all about twenty bugs that are described and explained very nicely. Barely any false positives, so I presume they have had a rather high threshold for certainty.
curl is certainly getting better thanks to this report, but counted by the volume of issues found, all the previous AI tools we have used have resulted in larger bugfix amounts. This is only natural of course, since the first tools we ran had many more and easier bugs to find. As we have fixed issues along the way, finding new ones is slowly becoming harder. Additionally, a bug can be small or big, so it is not always fair to just compare numbers.
My personal conclusion can however not be anything other than that the big hype around this model so far was primarily marketing. I see no evidence that this setup finds issues to any higher or more advanced degree than the other tools did before Mythos. Maybe this model is a little bit better, but even if it is, it is not better to a degree that makes a significant dent in code analysis.
This is just one source code repository and maybe it is much better on other things. I can only tell and comment on what it found here.
But allow me to highlight and reiterate what I have said before: AI powered code analyzers are significantly better at finding security flaws and mistakes in source code than any traditional code analyzers were in the past. All modern AI models are good at this now. Anyone with time and some experimental spirit can find security problems now. The high quality chaos is real.
Any project that has not scanned its source code with AI powered tooling will likely find a huge number of flaws, bugs and possible vulnerabilities with this new generation of tools. Mythos will, and so will many of the others.
Not using AI code analyzers in your project means that you leave adversaries and attackers time and opportunity to find and exploit the flaws you don’t find.
Zero memory-safety vulnerabilities found.
Methodology note: this review is hand-driven analysis using LLM subagents for parallel file reads, with every candidate finding re-verified by direct source inspection in the main session before being recorded. The CVE to variant-hunt mapping was built from curl’s own vuln.json. No automated SAST tooling was used.
This outcome is consistent with curl’s status as one of the most heavily fuzzed and audited C codebases. The defensive infrastructure (capped dynbufs everywhere, curlx_str_number with explicit max on every numeric parse, curlx_memdup0 overflow guard, CURL_PRINTF format-string enforcement, per-protocol response-size caps, pingpong 64KB line cap) systematically closes the bug classes that would normally be productive in a codebase this size.
Coverage now includes: all minor protocols, all file parsers, all TLS backends’ verify paths, http/1/2/3, ftp full depth, mprintf, x509asn1, doh, all auth mechanisms, content encoding, connection reuse, session cache, CLI tool, platform-specific code, and CI/build supply chain.
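The "CVE to variant-hunt mapping" from the quoted methodology can be sketched as a small script. curl does publish its vulnerability history as a machine-readable vuln.json; the field names used below (`id`, `cwe`) are assumptions for illustration and may not match the real schema. The idea is simply to group past CVEs by bug class, so each class becomes a hunting ground for unfixed variants.

```python
import json
from collections import defaultdict

def variant_hunt_map(vuln_json_text):
    """Sketch: group past CVE identifiers by weakness class so each
    class can seed a variant hunt. Assumes each entry carries 'id'
    and 'cwe' fields; curl's actual vuln.json schema may differ."""
    by_class = defaultdict(list)
    for entry in json.loads(vuln_json_text):
        by_class[entry.get("cwe", "unknown")].append(entry.get("id"))
    return dict(by_class)
```

Running this over the real history would, for example, collect every past out-of-bounds-write CVE under one key, telling the reviewer exactly which code patterns have produced bugs before and are worth re-searching.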
It should be noted that the AI tools find the usual and established kind of errors we already know about. It just finds new instances of them.
We have not seen any AI so far report a vulnerability that would somehow be of a novel kind or something totally new. They do not reinvent the field in that way, but they do dig up more issues than any other tools did before.
These were absolutely not the last bugs to find or report. Just while I was writing the drafts for this blog post we have received more reports from security researchers about suspected problems. The AI tools will improve further and the researchers can find new and different ways to prompt the existing AIs to make them find more.
We have not reached the end of this yet.
I hope we can keep getting more curl scans done with Mythos and other AIs, over and over until they truly stop finding new problems.
Thanks to Anthropic and Alpha Omega for providing the model, the tools and doing the scan for us. Thanks also to the individual who did the scan for us. Much appreciated!
Top image by Jin Kim from Pixabay
Thanks for flying curl. It’s never dull.