It was a different situation two years ago, when there was significant cost to building your own harness (but then, you probably weren't doing AI vuln research two years ago). Today, I think your best bet is to look at something like this for ideas, and then just ask the model to build your own, to fit your own work style, with your own interface, your own notion of target and effort specification, and your own alerting.
Every week I see bugs (as an auditor) that our own harness (https://zkao.io/) can't find, and we have to come up with pretty interesting techniques to make the tool find them. Mind you, I'm talking mostly about cryptographic vulnerabilities, not just webapp bugs. So IMO it's going to make a lot of sense for companies to both have their own harness (as tptacek is talking about) and pay for services that focus on building a good harness from experience. Audit firms are going to be the best at this, since they see a lot of bugs and can spend time "teaching" their harness about them.
On the other hand, you have to find equally good triage techniques; otherwise you just have a machine for what I call "vibe auditing", which produces enough false positives to exhaust all the developers (who are already overwhelmed by crappy AI submissions to bug bounties and by other AI tools that review all of their PRs).
At the end of the day, when your harness doesn't return any bugs, you're left wondering: "does that mean there are no bugs?" We're basically back to a reputation game, where you want to use the best tool, or the best team (one that knows what the best tools are), and you need to figure out which one that is.
0: https://redfloatplane.lol/blog/17-why-share/ (and related posts, I guess)
https://github.com/anthropics/defending-code-reference-harne... says:
> As a rough guideline, expect ~10K uncached input tokens/min and ~2K output tokens/min per agent. You can scale parallelism up to your account's ITPM limit (roughly 10 agents per 100K ITPM).
My guess would be hundreds of dollars with Opus and thousands of dollars with Mythos.
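The rate guideline quoted above turns into a back-of-the-envelope budget with a little arithmetic. Below is a minimal Python sketch: it derives the agent count from an ITPM limit and the hourly token volume from the per-agent rates in the quote. The per-million-token prices are hypothetical parameters you must fill in from your own price sheet; no real prices are assumed here.

```python
from dataclasses import dataclass

# Per-agent rates from the quoted README guideline (tokens per minute).
INPUT_TPM_PER_AGENT = 10_000   # ~10K uncached input tokens/min
OUTPUT_TPM_PER_AGENT = 2_000   # ~2K output tokens/min


@dataclass
class Budget:
    agents: int
    input_tokens: int
    output_tokens: int
    cost_usd: float


def estimate(itpm_limit: int, hours: float,
             usd_per_m_input: float, usd_per_m_output: float) -> Budget:
    """Estimate token volume and cost for a scan at full parallelism.

    usd_per_m_input / usd_per_m_output are hypothetical placeholders:
    supply the per-million-token prices of the model you actually use.
    """
    # Parallelism is capped by the account's input-tokens-per-minute limit.
    agents = itpm_limit // INPUT_TPM_PER_AGENT
    minutes = hours * 60
    input_tokens = int(agents * INPUT_TPM_PER_AGENT * minutes)
    output_tokens = int(agents * OUTPUT_TPM_PER_AGENT * minutes)
    cost = (input_tokens / 1e6) * usd_per_m_input + (output_tokens / 1e6) * usd_per_m_output
    return Budget(agents, input_tokens, output_tokens, cost)
```

At a 100K ITPM limit, this works out to 10 agents consuming about 6M uncached input tokens and 1.2M output tokens per hour of scanning. Multiply by your prices and expected runtime.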
Hm :)
Something that stands out is that for the strongest use cases, AI companies will prefer to sell the technique as a service rather than its raw output. For use cases where the output is less valuable, they sell tokens. If AI tokens were so magical at creating new value in software development generally, they wouldn't be selling tokens directly. They'd hoard the tokens and use them to dominate SaaS in any industry they want.
It's the same as someone selling an expensive stock-market course: they're signaling that they have more to gain from selling the course than from taking their knowledge and making money in the stock market directly.
I have been working on and using a similar tool for a while now:
https://github.com/bobinson/vulture
I have been struggling with false positives and have been using Claude + MCP as a poor man’s audit tool. Over the last few days, I've found better results with NVIDIA-hosted models.
Nice
Be aware: the .py/s will get flagged by antivirus software, but they basically do the job.
tl;dr - not that it's surprising, but it's not cheap, especially if you want to do this continuously.
We won't reuse open source libraries as libraries we import, but as design inspiration for the bespoke tools we make.
It's too cheap to make your own stuff and too expensive to be stuck with someone else's primitives.
But grounding AI Coding in existing tools is incredibly powerful.
I've said many times that I believe "using the computer will transparently involve having it write and run code for you" (and if you're not technical you won't even know it!). What you're saying goes in that direction as well.
I feel that it's often better for us to create purpose-built tools for our lives, and with every model release, the complexity of those tools grows.
These are really personal tools: they solve a problem that other people might have, but are very tied to your own specific way of working, and would be hard to explain or adapt to someone else. So: shop jigs.
I have about 10 custom scripts and programs that are like this -- I haven't felt like this since college! Back then I had all the time in the world to customize my setup...now I have agents!
In a way, I want to show this to all my friends, but whenever I mentally trace how that would go, I realize they wouldn't really understand a bunch of the quirks these tools have, because they are _my_ quirks. They're reasonably complex pieces of tech that solve my problems very well (problems that are themselves particular versions of broader ones), and which I (at least for now) have no interest in supporting.
It's so clear we're heading in this direction, and yet so many people still believe code will be for the elites. Maybe production code... As for the rest, I think your mom and dad will soon have their computer running code it wrote to serve them. Security-wise it's scary, but it's exciting to think about!
May even be an order of magnitude more
It's an estimate, so it might be wrong, but it gives the ballpark based on our experience. Happy to hear everyone's feedback.
But even this larger number, in turn, can be about 1/10th the cost of a formal engagement to discover the type of findings it seems to be going for: things that do not show up from PR reviews or even /security-review without the pre-work steps in the open-source framework guided by an expert. That's not counting the time and delay to figure out how to do that engagement.
Bluntly: if it matters, while this is a month's vibing budget for a single scan, it is also "pennies on the dollar" dirt cheap.
At the same time, its findings still need an expert. Its suggestions may be helpful or actively harmful, depending on the quality of the pre-work.
Recommendation to IT department heads: spend a couple grand on this, use the scare page to rustle up the budget to build a relationship with a red team that can find, triage, help remediate if needed, and train your in-house team to be "security minded".
Reminiscent of the early days of tax automation where importing a W2 cost hundreds of dollars until people realized typing in 6 boxes worth of data was easy and paying the automation fee ate up their entire tax return.
This is the equivalent of Claude Design but for security.
Different harness, different packaging and obviously different distribution because the persona is different.
It’s funny because from all the posts I’ve read from companies reporting on Mythos, everyone is building their own harness for it.
Cisco even published a specification for one.
But Anthropic is the one who has figured out how to package and distribute this. Great GTM!
That repo is Anthropics.
This post title should clarify that it is not Anthropic (no "s").
Are they making 8x more features or the same amount just with more code?
And even if you did… I spent months refining AI workflows that were just obsoleted by ultracode.
I am sure that in many organizations, the teams responsible for this sort of work have fewer and fewer users coming to them.
"It takes less effort for some parts of the software development life cycle" would be more correct.
Ensuring code isn’t bad is the expensive part.
I expect at some point formal verification will become more economical than red teaming. Writing it correctly is more expensive, but it may be cheaper than trying to secure incorrect software.
(Or rather, as hacking incorrect software becomes vastly cheaper, the amount of software worth writing properly will increase.)
I've been thinking, by Dijkstra's standards we have already been vibe coding for almost a century :)
Those costs can be extremely high.
The basic security flaws with regards to input validation and overflows should never ever be output by an AI. For "security flaws due to bad design" I'll cut them slack until AGI is achieved.
This doesn't make any sense cost-wise. It would be cheaper to just hire a security engineer.
We started out with many companies forbidding their employees to use remote LLMs on their source code because of security concerns. Now many companies are starting to believe that they must analyze all their source code with remote LLMs because of security concerns. When trusting Anthropic becomes normalized, they can sell more services that require access to the source code.
If hardware were so magical in creating new value generally, TSMC would be designing the chips instead of selling fabrication as a service.
That is what US chip companies used to do, by the way (back when there was silicon in Silicon Valley, before they got their lunch eaten by Taiwan). If TSMC had to design all of the chips they fabricate now, they would be doing a lot less business. Conversely, if any other company that wanted to design a chip had to build their own cutting-edge fab first, NVIDIA would not exist.
So I can definitely see the value in a library for constraining the chatbot to some well-worn paths.
The definition of "bad" from a security PoV is rapidly expanding, in light of relatively new capabilities and increasingly cheap access to exploitable vulnerabilities.
The most interesting security bugs have causes that are spread across large codebases, or networks of dependencies.
Training the AI to "output secure code" won't work if it doesn't also have access to the source code of every dependency that it's using... and even then, given current model speeds and prices most developers won't want to wait for an hour on every edit they make while the LLM reasons through all of the dependencies.
Something I think about a lot: what is the equivalent for today's software builders using AI tools? How do we make these harnesses exportable and portable? You might think employers would be against this and would rather make it more costly to leave. But I actually think most will favor it, because it makes people productive more quickly. We have to find ways to normalize it, though, and to show that there are no security leaks in the process (such as details that might make it into a set of personal steering prompts).
This makes for a somewhat amusing set of product offerings, given that, according to Dario, 90% of all software is being AI-generated.
Maybe next they can sell something to find the bugs in the security scanner?
Or they want to diversify
> If AI tokens were so magical in creating new value in developing software applications generally, they wouldn't be selling tokens directly.
That requires building and selling a whole product they have little experience with, while competing with their own customers. That's not a great place for an AI vendor still trying to establish itself. It's a lot of distraction when you already have plenty to deal with in the existing business, and it's not too valuable strategically.
I don't understand this argument. I've run and sold a semi-successful SaaS. The exhausting and frustrating parts are all the things an LLM cannot help you with. Coding the product is not the bottleneck, nor is it what makes you successful.
This doesn't follow at all. Anthropic's revenue is growing 10x year over year from selling tokens. Their tokens could be magical enough to let them enter established industries, displace incumbents, and get 100% annual growth in those industries, and they would still be better off prioritizing selling tokens, because it's a great business.
What your argument shows is that there are limits. Their tokens are not quite powerful enough to make infinite money instantly in every area of software. Admittedly, that does seem true.
For me, it’s not about the cost to leave; it’s about lowering the cost of onboarding and change.
Agree, and I think that's the core of my point.
Not that it's irrational or doesn't make sense to sell tokens for purposes of software dev, but that if tokens were a true game changer for success in software dev, they wouldn't be leading with token sales, the same way they're not leading with token sales for security stuff -- it's more like "Contact Sales".
Except that in software gigs, the software typically belongs to the customer, so you'd need to rewrite it every time...
Why do you say that? I reckon lots and lots of companies that aren’t monopolies sell software. Having competition, even stiff competition, isn’t anathema to running a business.
So tokens are used to produce sloppy code, and then this thing uses more tokens to fix the vulnerabilities in the slop? What's not to like about this business model? It's similar to Microsoft's: create an OS that is vulnerable, then enable business models for antivirus software. Everyone wins.
More seriously, linters get turned off in CI because the time spent chasing false positives is prohibitive.
What's the purpose of this? Just fun, or does it cause some desired behaviour?
But they can't do that because they aren't monopolies.
Just to clarify, I’m not the person you initially replied to.
> "They wouldn't be selling tokens directly ... They'd hoard them" But they can't do that because they aren't monopolies.
Hoarding them— not selling any of them, but instead using them internally and selling the products created by them — doesn’t at all seem like it would require a monopoly.
A reference implementation for autonomous vulnerability discovery and
remediation with Claude, based on our learnings from partnering with security
teams at several organizations
since launching Claude Mythos Preview. For a write-up of these learnings along with
best practices, see the accompanying blog post
(also available in blog-post.md). For a lightweight SDK-only
walkthrough of the same recon → find → triage → report → patch loop, see the
companion cookbook.
This repo is not maintained and is not accepting contributions.
🔒 Want a managed option? Anthropic offers Claude Security, a hosted product that finds and fixes vulnerabilities in your source code across multiple projects. Claude Security scans your repository for vulnerabilities, applies a multi-stage verification pipeline to reduce false positives, and lets you manage findings through their lifecycle: triage, fix validation, and rapid fix generation.
This repository is an open-source reference implementation based on general best practices for finding vulnerabilities using Claude. You can use it to build your own vulnerability-finding pipeline and customize the logic, with whatever access you have to Claude APIs (including Bedrock, Vertex, or Azure).
- /quickstart, /threat-model, /vuln-scan, /triage, /patch, /customize: interactive scoping, scanning, triage, and patching. Open this repo in Claude Code and run /quickstart to get oriented.
- harness/: the autonomous reference pipeline (recon → find → verify → report → patch), configured for finding C/C++ memory vulnerabilities using Docker and ASAN. This harness is a reference, not a product. The general shape, prompts, and sandboxing are reusable, but the harness will not work on every codebase out of the box. Run /customize to port it to your language, detector, or vuln class.

⚠️ Security: /quickstart, /threat-model, /vuln-scan, and /triage only read and write files. Running /patch on static findings (TRIAGE.json or VULN-FINDINGS.json) is likewise read- and write-only. /customize edits the harness code and runs validation commands. Any of these skills are safe to run unsandboxed, as long as you review and approve each tool use in Claude Code. The autonomous reference pipeline (including /patch on pipeline results) executes target code, so it refuses to run outside of a gVisor sandbox unless explicitly overridden. To get set up, run scripts/setup_sandbox.sh once, then invoke the pipeline via bin/vp-sandboxed. See docs/security.md and docs/agent-sandbox.md for more details.
```
git clone https://github.com/anthropics/defending-code-reference-harness
cd defending-code-reference-harness
claude
# 30-sec intro + guided first run on the canary target
> /quickstart
> /quickstart how do I port the pipeline to Java?
> /quickstart how do I triage all these bugs?
```
The most successful security teams we've partnered with are those that have gotten hands-on the fastest. Though it's tempting to spend months designing the perfect pipeline, we recommend starting small on Day 1 and building from there as learnings come. The steps below follow that pattern and set an ambitious (but reasonable) pace based on what we've seen.
| Step | When | Goal |
|---|---|---|
| Step 1 | Day 1 | Build a threat model and run your first static scan + triage |
| Step 2 | Day 2 | Run the reference pipeline on a C/C++ library |
| Step 3 | Days 3-5 | Customize the pipeline for your target |
| Step 4 | Week 2 | Start autonomous scanning, triage, and patching |
Day 1 is focused on seeing the whole loop end-to-end. Using only the interactive skills, you'll build a threat model, run a static scan scoped by it, triage what comes back, and draft candidate fixes. You'll finish the day with a threat model, a ranked list of static findings, and candidate patches.
The relevant skills only read and write files in your repo. As long as you run Claude Code interactively and approve each tool use, no sandbox is needed.
```
# Pin every subagent to the model you want
export CLAUDE_CODE_SUBAGENT_MODEL=<model-id>
claude
# 0. intro + guided first run
> /quickstart
# 1. Build a threat model (aim before you shoot)
> /threat-model bootstrap targets/canary
# 2. Run a static scan, scoped by that threat model
> /vuln-scan targets/canary
# 3. Verify, dedupe, and rank what came back
> /triage targets/canary/VULN-FINDINGS.json
# 4. Generate candidate fixes for the verified findings
> /patch ./TRIAGE.json --repo targets/canary
```
This flow produces THREAT_MODEL.md, VULN-FINDINGS.{json,md},
TRIAGE.{json,md}, and PATCHES/.
The vulnerability candidates produced in Step 1 come from Claude's static review of the source (nothing is built or run), so expect more false positives on any non-canary targets. In Step 2, you'll produce execution-verified findings.
Note: on the canary target, /triage may dismiss the scan's findings as false positives. entry.c announces itself as deliberately vulnerable demo code, and /triage correctly excludes bugs in test / fixture code. To see the full confirm / dedupe / false positive flow, run it on the curated fixture instead (/triage .claude/skills/triage/fixtures/canary-findings.json --repo targets/canary) or point the Step 1 skills at your own code.
On Day 2, you'll move from interactive skills to your first autonomous run using the reference pipeline. You'll run the full recon → find → verify → report loop in your environment on a known-vulnerable open-source library, then generate a candidate patch for what it finds. You'll finish with a set of reproducible crashes, exploitability reports, and candidate patches, along with a feel for how the pipeline works.
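To make the shape of that recon → find → verify → report loop concrete, here is a minimal Python sketch with each stage as a plain function. It is not the harness's code: the stage callables, the Finding type, and the verify check are hypothetical stand-ins, and in the reference pipeline the work is done by agents running inside the gVisor sandbox. What it illustrates is that only findings that survive verification ever reach the report stage.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Finding:
    location: str      # e.g. "src/parse.c:120" (hypothetical)
    description: str


def run_loop(
    recon: Callable[[str], list[str]],
    find: Callable[[str], Iterable[Finding]],
    verify: Callable[[Finding], bool],
    report: Callable[[Finding], str],
    target: str,
) -> list[str]:
    """Hypothetical recon -> find -> verify -> report loop over one target."""
    reports = []
    for area in recon(target):              # recon: pick areas worth looking at
        for finding in find(area):          # find: propose candidate bugs
            if verify(finding):             # verify: keep only reproducible ones
                reports.append(report(finding))  # report: write it up
    return reports
```

In the reference pipeline, verify corresponds to reproducing a crash (an ASAN signature for C/C++); swapping that one callable is roughly what porting to another detector means.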
Running the pipeline is simple:
```
# One-time setup
python3 -m venv .venv && .venv/bin/pip install -e .
./scripts/setup_sandbox.sh # installs gVisor, builds the agent images, and verifies isolation; note: requires Docker
export ANTHROPIC_API_KEY=sk-ant-... # or CLAUDE_CODE_OAUTH_TOKEN; the pipeline requires one in env
# Run the recon → find → verify → report loop
bin/vp-sandboxed run drlibs --model <model-id> --runs 3 --parallel --stream --auto-focus
# Generate a candidate patch for each finding
bin/vp-sandboxed patch results/drlibs/<timestamp>/ --model <model-id>
# Or, ask Claude Code to launch the pipeline and watch the run for you
claude
> run the pipeline on drlibs and explain findings as they come
```
Results from the loop land in a results/drlibs/<timestamp>/ directory. With
the --stream flag, the first report will appear in minutes under reports/bug_NN/.
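If you want to script around the results layout described above, the minimal Python sketch below finds the most recent run directory under results/<target>/ and lists its reports/bug_NN/ subdirectories. It assumes that the <timestamp> directory names sort chronologically as strings (an assumption; check the actual format of your runs), and it relies only on the directory layout named above, not on the contents of the report files.

```python
from pathlib import Path


def latest_reports(results_root: str, target: str) -> list[Path]:
    """Return the reports/bug_NN directories of the newest run for `target`.

    Assumes run directories are named by timestamps that sort
    lexicographically in chronological order (hypothetical; verify locally).
    Raises FileNotFoundError if results_root/target does not exist.
    """
    runs = sorted(p for p in (Path(results_root) / target).iterdir() if p.is_dir())
    if not runs:
        return []
    reports_dir = runs[-1] / "reports"
    if not reports_dir.is_dir():
        return []  # with --stream, the first report may not have appeared yet
    return sorted(p for p in reports_dir.glob("bug_*") if p.is_dir())
```

For example, latest_reports("results", "drlibs") could be polled while a streamed run is in progress.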
⚠️ The run subcommand spawns autonomous agents. The pipeline runs each agent inside a gVisor container with egress restricted to the Claude API. Agent-spawning subcommands refuse to start outside it unless explicitly overridden. For more information, see docs/security.md and docs/agent-sandbox.md.
Under the hood, the pipeline walks through seven stages. With the --auto-focus flag, the pipeline uses the focus_areas list from the target's config.yaml. For more details, see docs/pipeline.md.
On Days 3-5, you'll customize the harness for your own target. First, you'll
point the Step 1 skills at your code, then you'll use /customize to port the
pipeline to your stack. By the end of the week, you'll have a targets/<your-service>/
directory that the pipeline can run against, validated with a single smoke run
of the pipeline, and ready to scale up in Step 4.
While the reference pipeline is designed for finding memory vulnerabilities in C and C++ code, its shape is generic. Porting it to a new vuln class or language just means answering the following questions for your target stack:
| Question | C/C++ Reference | Your target (examples) |
|---|---|---|
| What signals a finding? | ASAN crash signature | exception / canary file / DNS callback |
| What does a proof of concept look like? | crashing input file | HTTP request sequence / tx list / test harness |
| How is the target built and run? | Dockerfile (using clang + ASAN) | your language's build in a container |
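For the C/C++ reference, the "ASAN crash signature" is what turns a crash into a finding. As an illustration (not the harness's implementation), the minimal Python sketch below reduces an AddressSanitizer report to a signature: the error type from the "ERROR: AddressSanitizer:" line plus the function names of the top few frames of the crashing stack. Two crashes with the same signature are likely the same bug. The default of three frames is an arbitrary choice, and the frame regex only handles symbolized frames of the form "#N 0xADDR in function ...". When porting, this is the piece you would replace with whatever your own detector emits (exception type, canary file, DNS callback).

```python
import re
from typing import Optional

# Matches e.g. "==123==ERROR: AddressSanitizer: heap-buffer-overflow on address ..."
_ERROR_RE = re.compile(r"ERROR: AddressSanitizer: ([\w-]+)")
# Matches symbolized frames, e.g. "    #0 0x4f5e3a in parse_header /src/x.c:10:5"
_FRAME_RE = re.compile(r"^\s*#(\d+)\s+0x[0-9a-fA-F]+\s+in\s+(\S+)")


def asan_signature(report: str, top_frames: int = 3) -> Optional[tuple[str, ...]]:
    """Reduce an ASAN report to (error_type, fn0, fn1, ...), or None if no ASAN error."""
    error = _ERROR_RE.search(report)
    if error is None:
        return None
    functions: list[str] = []
    for line in report[error.end():].splitlines():
        frame = _FRAME_RE.match(line)
        if frame is None:
            continue
        if frame.group(1) == "0" and functions:
            break  # a second stack (e.g. "allocated by thread T0 here:") begins
        functions.append(frame.group(2))
        if len(functions) == top_frames:
            break
    return (error.group(1), *functions)
```

Stopping at the second "#0" frame keeps the allocation and free stacks that ASAN also prints from leaking into the signature.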
Before customizing, point the Step 1 skills at your own code. As a reminder, they're read- and write-only, so they can run unsandboxed.
```
claude
> /quickstart how do I customize this for ~/code/my-service?
> /threat-model bootstrap-then-interview ~/code/my-service
> /vuln-scan ~/code/my-service
> /triage ~/code/my-service/VULN-FINDINGS.json --repo ~/code/my-service
```
Then, use the artifacts produced by those skills in the /customize skill,
which modifies the harness for your codebase.
```
> /customize use ~/code/my-service/{THREAT_MODEL.md,VULN-FINDINGS.json} and ./TRIAGE.md
```
When /customize is done, you'll have a targets/my-service/ directory
set up. Validate it with a smoke run of the pipeline before scaling up.
```
bin/vp-sandboxed run my-service --model <model-id> --runs 1
```
For more details, see docs/customizing.md.
In Week 2, you'll use the pipeline you customized in Step 3 on your own targets, adding an outer loop around the inner pipeline loop: run multiple pipeline scans, triage the findings from across those runs, patch based on prioritization, and repeat.
```
# Scan - run a wave of parallel runs against your target
bin/vp-sandboxed run my-service --model <model-id> --runs 5 --parallel --stream --auto-focus
# Triage - dedupe and rank every finding across all waves using your threat model
> /triage results/my-service/ --repo ~/code/my-service --auto --votes 5
# Patch - generate and validate fixes, starting with what triage ranked the highest
> /patch results/my-service/<timestamp>/ --model <model-id>
```
⚠️ Follow the same sandboxing guidelines as in Step 2
A given pipeline run already verifies and deduplicates its own findings.
/triage works across many pipeline runs. When pointed at the results/
directory, it collapses duplicates across all runs (and any static findings
from /vuln-scan if present), recalibrates severity ratings against your
threat model, and attempts to route every finding to the component owner.
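The cross-run collapse can be pictured as grouping findings by a signature. The minimal Python sketch below is a simplification of what /triage is described as doing: it does not re-rate severity against a threat model or route findings to owners, and it matches exact signatures rather than using a model. It merges findings from several runs that share a signature and records which runs hit each one, a rough signal of how reliably a bug reproduces. The RunFinding fields, the signature tuple, and the integer severity scale are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RunFinding:
    run_id: str                   # which pipeline run produced it (hypothetical)
    signature: tuple[str, ...]    # e.g. (error_type, top frames...) (hypothetical)
    severity: int                 # higher is worse (hypothetical scale)


def dedupe_across_runs(findings: list[RunFinding]) -> list[dict]:
    """Collapse findings sharing a signature; rank by severity, then by run count."""
    groups: dict[tuple[str, ...], list[RunFinding]] = defaultdict(list)
    for f in findings:
        groups[f.signature].append(f)
    merged = [
        {
            "signature": sig,
            "runs": sorted({f.run_id for f in fs}),
            "severity": max(f.severity for f in fs),
        }
        for sig, fs in groups.items()
    ]
    # Most severe first; among equals, bugs hit by more runs come first.
    merged.sort(key=lambda m: (-m["severity"], -len(m["runs"])))
    return merged
```

Keeping the maximum severity per group is a conservative choice: if any run rated the bug as serious, the merged entry keeps that rating.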
Patching findings quickly, where you can, keeps the outer loop
productive. When findings are fixed, the model can't re-find
them and will instead surface net-new, typically deeper issues. As you run
more pipeline waves, the number of findings will likely go down, but the
complexity will likely also go up. If quick patching isn't possible, even
just recording prior findings in the target's known_bugs can help steer
future runs toward newer bugs.
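The outer loop and the known_bugs idea can be sketched as follows. This is a minimal Python illustration under stated assumptions: each wave is modeled as a hypothetical callable that returns bug signatures, and known_bugs is a plain set applied as a post-filter. That post-filter is a swapped-in simplification; the text describes known_bugs as steering future runs, presumably by informing the model, which is not modeled here. The loop stops once a wave turns up nothing new.

```python
from typing import Callable, Iterable

Signature = tuple[str, ...]  # hypothetical bug identity, e.g. (error_type, top frame)


def outer_loop(
    run_wave: Callable[[int], Iterable[Signature]],
    known_bugs: set[Signature],
    max_waves: int = 5,
) -> list[list[Signature]]:
    """Run scan waves until one yields nothing new; return the new bugs per wave.

    known_bugs is updated in place, so later waves only report net-new issues.
    """
    history = []
    for wave in range(max_waves):
        # dict.fromkeys dedupes within the wave while preserving order.
        new = [sig for sig in dict.fromkeys(run_wave(wave)) if sig not in known_bugs]
        if not new:
            break
        known_bugs.update(new)  # record (or patch) before the next wave
        history.append(new)
    return history
```

As the text predicts, each wave in this model tends to report fewer, newer bugs, and the loop ends when a wave finds nothing that is not already known.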
Autonomous triage and patching are still open issues, and this reference
harness doesn't fully solve them. The verification strategies in /patch
help raise the bar, but severity and prioritization are ultimately
judgments about your environment, and verified patches are not always
upstreamable. Many partners have reported these steps as their current
bottlenecks, and you should budget real engineering time for them.
For more details, see docs/triage.md and docs/patching.md.
After the initial ramp up, the teams we've worked with have tended to invest in a few directions: