- Valkey/Redis port here https://github.com/ianm199/valdr (passes ~99% of single node test suite, real prod features like replication/clustering/HA early or not implemented)
- Further along port of Lua 5.1-5.5 https://github.com/ianm199/lua-rs-port/tree/main
- I have a less developed nginx version that would be the north star
- These projects are very alpha at the moment
If anyone is interested in getting involved in this or has done similar experiments I'd love to collaborate! There is so much variation in how you can run these large scale agent fleets I don't think anyone has a perfect system yet.
The only trend Mythos continues is Anthropic’s trend of warning that disaster is always 6 to 12 months away.
It is in all respects foreign code in a language I may or may not be familiar with, and worse yet, if I were to take over, I'd be responsible for maintaining the whole black box forever more?
Thank you but no thanks.
Will likely give them time to expand capacity as well. And make them harder to dislodge in these orgs.
No comparison to human teams, and I’m sure that $1 million in tokens was used by humans, in a team. So like most AI, they’ve developed a tool that capable people can use to be better, but unlike most tools, they’re claiming this to be outright magic. The magic is the hype train.
All that to say I think these automated ports are interesting experiments. However if you want to build something people can trust, the people need to be able to trust that you fully understand what is built, and why it's built the way it is.
Step 2: offer to test it, but only for the biggest companies in the world
Step 3: onboard those big players on your tooling and product
Step 4: profit
This is genius.
- They still claim 10000 issues, but they found only one in curl.
- They did not find rsync issues; rather, Claude introduced rsync issues.
- Facebook is a member of this cult program but Mythos did not find the account takeover flaw.
- Mythos did not find the issues in Anthropic's own Bun rewrite.
They will not release Mythos because it would be exposed as a fraud before the IPO.
They’re using security concerns to mask their inability to deliver the model at scale, while still trying to maintain their lead over OpenAI. As a result, they’ve chosen to release it privately under the banner of an “ethical” rollout.
If society can't trust banks and other institutions to safely control their data, what follows?
Do we collectively switch off the internet?
I mean, most Nasdaq tech companies would be in 13+ countries. Why are they writing this like it's a big number? It's hilariously small.
I'm afraid that the usual mantra of "we just need more scale", which worked well for attracting investment, is not working anymore: bigger models provide marginal improvements while naturally getting much more expensive to run.
Is this why both Anthropic and OpenAI are rushing for IPOs this year?
No one wants Bun in Rust, no one wants the rsync vibe code additions. This is just the only pro-AI comment, so the AI people voted it to the top.
There might be a world where people soon just find unsafe C code exposed to the web (i.e. nginx) an untenable situation and I hope it can be a helpful resource.
Anyway, I see open source code as positive sum. Maybe in the end only a small community that cares about cross compilation finds this helpful, and that's a win!
Nonetheless, running many of the open weights models over a codebase, with an appropriate harness, can provide about the same vulnerability coverage (i.e. each of the open weights models would find a subset of what Mythos or GPT 5.5 could find, but the subsets are not the same).
Despite needing more runs and more time, this may be significantly cheaper, especially if the models are self hosted.
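A minimal sketch of the "different subsets" argument, assuming each model's findings can be reduced to a set of vulnerability IDs. The function name, the model names, and the IDs are all made up for illustration; no real scan results are implied.

```python
from typing import Dict, Set


def combined_coverage(found_by_model: Dict[str, Set[str]], reference: Set[str]) -> float:
    """Fraction of the reference findings reported by at least one model."""
    if not reference:
        return 0.0
    # Union of every model's findings; overlapping finds are counted once.
    union: Set[str] = set().union(*found_by_model.values())
    return len(union & reference) / len(reference)


# Hypothetical data: what a single stronger model found (the reference),
# and what three weaker models each found on their own.
reference = {"V1", "V2", "V3", "V4"}
open_models = {
    "model_a": {"V1", "V2"},
    "model_b": {"V2", "V3"},
    "model_c": {"V4"},
}

single = combined_coverage({"model_a": open_models["model_a"]}, reference)
ensemble = combined_coverage(open_models, reference)
```

In this toy case no single model exceeds half of the reference set, but the union covers all of it, which is the situation the comment describes. Whether real models overlap this favorably is an empirical question.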
Based on what Anthropic said about Mythos, they also use a quite elaborate harness for finding bugs and vulnerabilities, i.e. not a simple prompt like "find the bugs".
They run Mythos repeatedly on each file of the codebase, many times. They start with more generic prompts, used to determine whether a more thorough analysis of that file is worthwhile. Then they use more specific prompts, to detect various classes of bugs. After it becomes probable that a certain bug exists, they do a final run where the prompt requests a confirmation of the already known bug, perhaps together with a proposed patch or a PoC exploit.
Therefore the efficiency of finding vulnerabilities depends a lot on the harness, not only on the LLM. Also, searching for vulnerabilities in a big codebase when paying per token is very expensive, because it requires many runs of the LLM.
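The staged process described above could look roughly like the sketch below. This is not Anthropic's harness, which is not public: `ask_model`, the prompt prefixes, the bug classes, and the "yes"/"no" answer format are all assumptions invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical bug classes; a real harness would use far more detailed prompts.
BUG_CLASSES = ["memory safety", "injection", "auth bypass"]


@dataclass
class Finding:
    path: str
    bug_class: str
    confirmed: bool


def _says_yes(answer: str) -> bool:
    return answer.strip().lower().startswith("yes")


def scan_codebase(
    files: Dict[str, str],
    ask_model: Callable[[str], str],
    triage_runs: int = 3,
) -> List[Finding]:
    findings: List[Finding] = []
    for path, source in files.items():
        # Stage 1: generic triage, repeated several times per file.
        votes = [ask_model(f"TRIAGE\n{source}") for _ in range(triage_runs)]
        if not any(_says_yes(v) for v in votes):
            continue
        # Stage 2: one targeted prompt per bug class.
        for bug_class in BUG_CLASSES:
            if not _says_yes(ask_model(f"CHECK {bug_class}\n{source}")):
                continue
            # Stage 3: confirmation run for the suspected bug.
            confirmed = _says_yes(ask_model(f"CONFIRM {bug_class}\n{source}"))
            findings.append(Finding(path, bug_class, confirmed))
    return findings


def fake_model(prompt: str) -> str:
    """Toy stand-in for an LLM: flags any file that calls strcpy."""
    head, _, body = prompt.partition("\n")
    if "strcpy" not in body:
        return "no"
    if head == "TRIAGE":
        return "yes"
    return "yes" if "memory safety" in head else "no"


results = scan_codebase(
    {"a.c": "strcpy(dst, src);", "b.c": "return 0;"},
    fake_model,
)
```

Even in this toy version every file costs at least `triage_runs` model calls, and each flagged file costs several more, which is why per-token pricing adds up quickly on a large codebase.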
I also don’t care if it’s written by humans or LLMs or robot overlords from Alpha Centauri. Again, if it works, it works.
The operative word here is ‘works’. Code is now cheap, QA still isn’t. Since people don’t really like doing the same thing twice, specs for working code have never been written. Nowadays there is no reason to not create a spec detailed enough for robots to make no mistakes (pun intended) when filling in the gaps when converting from spec space to code space. As long as this remains true, I don’t care who or what does the boring parts.
And I'd disagree on no one wants - Lua is quite helpful since it is easily used in WASM. There has been some interest from people in the Bevy community - a game engine in Rust - since you can't have Lua scripting in browser games easily with the C version.
But anyway, whether people want it or not, memory safety might become much more important, so I think it is a good area to explore. Some people think large C codebases are inherently unsecurable https://alexgaynor.net/2020/may/27/science-on-memory-unsafet...
People and organizations can have mixed motivations. It’s often not “just” one thing.
Err... wait... that was already the hard part... hmm
But I think that downplays the importance of having a good product. If the product didn’t work, this would be a good way to lose trust with a lot of organizations in a hurry.
Chinese labs will force their hands; until then, let’s hope the maximum number of projects get patched at a reasonable pace.
But the idea that we'll squash all of the critical vulns is simply nonsense, despite the weird Firefox blog posts that indicate otherwise.
Yes, Anthropic is compute constrained, even after the SpaceX Colossus deal.
But supply constraints are the normal operating mode of any market. Anthropic could choose to serve whatever models it pleases at whatever price points it chooses and let the market decide where the value is.
If Mythos at $X overwhelms their capacity, they could just charge $X+1. If still overwhelmed, there are larger prices as well.
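As a toy illustration of that pricing argument: keep raising the price until demand fits within capacity. The linear demand curve, the capacity figure, and the function name below are invented for the example and say nothing about real demand for any model.

```python
from typing import Callable


def clearing_price(
    demand: Callable[[float], float],
    capacity: float,
    start_price: float,
    step: float = 1.0,
    max_price: float = 1_000_000.0,
) -> float:
    """Raise the price in fixed steps until demand no longer exceeds capacity."""
    price = start_price
    while demand(price) > capacity and price < max_price:
        price += step
    return price


def toy_demand(price: float) -> float:
    # Hypothetical linear demand: fewer requests as the price goes up.
    return max(0.0, 1000.0 - 10.0 * price)


price = clearing_price(toy_demand, capacity=400.0, start_price=20.0)
```

With this made-up curve, demand at the starting price is double the capacity, and the loop settles at the first price where the two match.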
So they have a whole lot more compute now than they did last month.
I don't think they're trying to flex this as a large number. They don't want to give an exact number, as that may change etc / is fuzzy, but also want to give you an idea of the scale.
They say "In the future, we intend to expand our geographical reach much further". I imagine this commentary is somewhat related to the concerns that AI will create an even worse "global underclass". AI developments are first accessible to Americans, then allies, and then later the whole world.
It means that even if the value you offer is similar to your competitors', you are the one conquering the market.
That's the only way to avoid becoming a commodity.
Don't you understand, if they really did do the <ai magic> they don't need to hire anyone, IT SELLS ITSELF
It's super interesting to hear this refrain on HN, it is alarmingly common. Anthropic released benchmark numbers on Mythos, as they have for all of their models. Once models become public, people evaluate them in a myriad of ways. We have had reliable scaling laws for years and they still hold. Epoch capability index continues to grow exactly as expected. Where does this idea come from?
As for cost, the cost per token at a given level of performance drops up to 40x per year.
At this phase no company would risk their brand by calling the product ineffective. The big players are in it together and small ones have no option but to play along.
Nevertheless, collecting the historical wisdom and running it at machine scale does have a lot of benefits for sure. The only question is the signal-to-noise ratio: the machine is doing what humans did, just at a multiplied speed and with a lot more context than a normal human can hold.
They seem pretty close, in both average and "best run" scores. And, in a highly verifiable domain, "best run" or pass@n is what you're looking for.
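For readers unfamiliar with pass@n: a common way to estimate it is the unbiased pass@k estimator, which takes `n` sampled runs of which `c` succeeded and computes the probability that at least one of `k` randomly chosen runs succeeds. The sketch below uses that formula; the run counts are made-up numbers, not results for any model discussed here.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k runs, drawn from n total runs
    with c successes, is a success: 1 - C(n - c, k) / C(n, k)."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k runs failed, giving 1.0.
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical example: 10 runs, 3 of which found a working exploit.
single_run = pass_at_k(n=10, c=3, k=1)   # average-run success rate
best_of_5 = pass_at_k(n=10, c=3, k=5)    # chance that 5 runs include a success
```

The gap between `single_run` and `best_of_5` is why "best run" scores matter in a verifiable domain: a failed run costs only compute, while one verified success is enough.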
I think most people at Anthropic are true believers, based on my interactions with them, so anecdotally I don’t believe this theory. The simplest explanation is that it really is taking a while to gain confidence that the models won’t be used for a spree of bad cyber attacks. Knowing how long it takes institutions to fix security issues when filed by humans, I would be more surprised if this wasn’t the case.
But I would forgive anyone who did think it was deliberately sandbagged; given the staggering sums at play, true believers might believe the ends justify the means to a little “marketing” like this.
They want the plebs, they want the mass.
Security wise, it's about being able to find and chain multiple vulnerabilities to actually create viable exploits.
So I would imagine that if you were using it for regular software development you may not feel that it's that different unless used in a particular way?
Marketing move doesn't mean scam. It describes the ability to sell people on a narrative and surpass your competitors in market share. And that's exactly what is happening.
My post is a "tribute" to the efficiency of Anthropic's communication. I never complained about anything, nor called it a scam, nor said they should have released Mythos to the public instead of rolling it out to a selected cohort.
You tried to expand my words to make me say something I didn't, because my post wasn't giving you a clear conclusion of my opinion regarding their private release.
As an ordinary developer who relies on a $20–$200/month subscription, I feel disappointed by the release of a paper describing a model that I can’t actually use.
They did produce great value, claude code and opus 4.5 are a singularity in software engineering.
The job we practiced for decades simply doesn't exist anymore.
For all they know they'll find a new optimization that lets them serve Opus class models for half the computing cost next month. Or someone will invent the next OpenClaw and demand will 10x overnight.
To a lot of us it’s not clear that’s what’s happening. It’s speculation and one possibility.
It may also be a secondary consideration and not the primary gating factor.
Anthropic has had their missteps but it’s still plausible to take what they say at face value.
Two things of note: 5.5-Cyber is likely to be substantially cheaper than Mythos, given it is priced around Opus. Additionally: AISI has never tested OpenAI’s best public model and actual Mythos competitor: 5.5-Pro.
Neither publicly nor in my internal reasoning. I rarely conclude without proof or very intense and clear intuition.
From a strategic PoV it makes sense to check if their model is dangerous, I wouldn't want to have my brand name associated with "NK hacker team find zero day in all linux servers of the web and ..."
GPT-2: https://slate.com/technology/2019/02/openai-gpt2-text-genera...
GPT-3: https://www.itpro.com/technology/artificial-intelligence-ai/...
Yet they’re still the predominant search engine. Sadly, the concerns of the few don’t interest monopolistic profit seekers without forced regulation. Think of how airlines are legally required to give refunds for delayed flights; there’s a reason that required legislation.
You don’t tie it to “your device”.
You tie it to your security key.
Which is treated like a credit card.
and your extended family, friends, or volunteers can act as social proof to allow you back into your accounts,
if your key burns up, it breaks and you were too cool to provision a backup, etc.
> it breaks and you were too cool to provision a backup
If we're relying on the average person to back things up properly, this idea is doomed from the start.
The average person is relying on the average person, for everything, and I agree, they are doomed from the start.
Tech-related items inclusive.
Just new banks.
Same as people being unafraid of their car key being cloned - because they don’t hand it around the general public.
Project Glasswing is our collaborative effort to secure the world’s most important software. In early April, we announced that roughly 50 initial partners had access to Claude Mythos Preview, and since then, they’ve been deploying the model to scan their codebases for vulnerabilities. We recently described how these partners have so far found more than 10,000 high- or critical-severity security flaws.
We’re now expanding Project Glasswing. Following several weeks of close collaboration with our Project Glasswing partners, the security industry, open-source software maintainers, and the US government, we’re extending the partnership to approximately 150 new organizations. Each one will need to meet our security requirements before they gain access.
The organizations in this new group are based in more than 15 countries, and most provide critical infrastructure to many more. (In the future, we intend to expand our geographical reach much further.) The group covers several industries that weren’t well represented in our initial cohort, such as power, water, healthcare, communications, and hardware. And many of the new partners are vendors—companies or nonprofits that maintain codebases that are relied upon by lots of other organizations around the world, including governments.
What each partner has in common is that a successful attack on their codebase could be catastrophic. For most partners, we estimate that a major attack could affect more than 100 million people, with important ramifications for both global and national security.
This expansion is the next step toward our long-term goals: for AI to make all software more secure, and for us to help the industry adjust to how AI could change many of the core assumptions of cybersecurity.
Project Glasswing and the capabilities of Claude Mythos Preview have sparked broad conversations—both within the software industry and with governments—about how AI is changing cybersecurity. These conversations have informed how we’ve expanded the program. They’ve also shaped our thinking about the very purpose of Project Glasswing.
Cheap, fast AI models with powerful cyber capabilities are around the corner. We want Project Glasswing to spur institutions toward operating norms that reflect this reality.
Mythos Preview continues a long-term trend that we’ve been warning about for some time: within 6 to 12 months, we expect that many other AI companies will have Mythos-class models, and they could release them without safeguards that prevent misuse. In that world, cyberattacks could occur much more often, and in much more unpredictable forms. It’s imperative that cyberdefenders adapt to maintain pace.
We see our role as twofold. First, to help the software industry adapt by safely providing wide access to better models, tools, and common infrastructure. Second, to steadily shift the support we provide, from finding vulnerabilities to disclosing, fixing, and deploying patched software. We’ll now discuss each of these in turn.
So far, companies, nonprofits, maintainers, and researchers have acted quickly. Within the first weeks of Project Glasswing, each member began using Mythos Preview at large scale, sharing information and best practices with other partners, and working with third parties to triage the model’s findings. These organizations’ methods for adapting to new tools can, and should, be replicated widely across the millions of organizations and developers who are vulnerable to cyberattacks.
To support this, we recently released Claude Security, a product that uses our latest public frontier models, like Claude Opus 4.8, to scan codebases and suggest patches. We're also releasing—on request, to trusted security teams—the tools we developed to help Project Glasswing’s partners find vulnerabilities more quickly.
We intend to go much further: our longer-term aim is to support the industry in creating new initiatives, standards, and infrastructure for the era of powerful cyber models.
As we’ve previously discussed, the bottleneck in cybersecurity is now verifying, disclosing, and patching the large numbers of vulnerabilities that Mythos-class models can surface.
Mythos Preview itself can help. Many of Project Glasswing’s partners now use the model to write patches, as well as for pre-release checks that prevent vulnerabilities from appearing in the first place. Models like Mythos Preview can also be used for penetration testing (simulating a cyberattack to identify how vulnerabilities might be exploited), automating threat detection and response, and rebuilding legacy codebases in memory-safe languages, among many other defensive tasks.
We’re in discussions with third parties about how we might substantially scale up the reviewing and patching of vulnerabilities in open-source software. We’re also working on sharing ideas and best practices for disclosing vulnerabilities to open-source maintainers, with the intent of making these reports easier to triage and to act upon.
To address the scale of this coming challenge, hundreds of thousands of organizations, researchers, and maintainers will likely need access to the most advanced cyber capabilities and tools available.
We’re working as quickly as we can to safely release Mythos-level capabilities in general access. To do so, we’ll need highly robust safeguards that prevent the model’s cyber capabilities from being misused—safeguards that we (and, to our knowledge, all other AI developers) have yet to develop. Because cybersecurity has both helpful and destructive uses, making safeguards that are both strong and precise enough is a major challenge.
In the meantime, we plan to expand Project Glasswing even further—prioritizing additional essential infrastructure providers, maintainers of critical open-source software, and safety testers. We intend for future expansions to cover organizations in the US and overseas, just as this one does. We also intend to scale up our Cyber Verification Program, which would grant Mythos-class capabilities to many more organizations for specific cyberdefense tasks.
In the future, frontier model releases will become increasingly high-stakes. Capabilities will continue to improve across all domains, including many that—like cybersecurity—can empower attackers and defenders alike. This will not be the last time we need to confront a challenge like this one. But Project Glasswing has taught us a great deal about how to respond when models cross important capability thresholds. If we’re successful, we hope to enable a permanent advantage for defenders.