I think you're spot on that so far it's been either all or nothing. You either give an agent a lot of access, which makes it really powerful but proportionally dangerous, or you lock it down so much that it's no longer useful.
I like a lot of the ideas you show here, but I also worry that LLM-as-a-judge is fundamentally a probabilistic guardrail and therefore inherently limited. How do you see this? It feels dangerous to rely on a security system that's based on probabilities rather than hard limits.
not adding LLM layers to stuff to make them inherently less secure.
This will be a neat concept for the types of tools that come after the present iteration of LLMs.
Unless I’m sorely mistaken.
If both are Claude, you have shared-vulnerability risk. Prompt-injection patterns that work against one often work against the other. Basic defense in depth says they should at least be different providers, ideally different architectures.
Secondary issue: the judge only sees what's in the HTTP body. Someone who can shape the request (via agent input) can shape the judge's context window too. That's a different failure mode than "judge gets tricked by clever prompting." It's "judge is starved of the signals it would need to spot the trick."
The problem is, 99% secure is a failing grade.
// The policy is embedded as a JSON-escaped value inside a structured JSON object.
// This prevents prompt injection via policy content — any special characters,
// delimiters, or instruction-like text in the policy are safely escaped by
// json.Marshal rather than concatenated as raw text.
The question edf13 pointed at but didn’t develop: where does a transport-layer judge earn its place at all? Not as the enforcement layer but as the audit layer on top of one. Kernel-level controls tell you what the agent did. A proxy tells you what the agent tried to exfiltrate and where to.
Structured-JSON escaping and header caps are good tools for the detection job. They’re the wrong tools for the prevention job. Different layers, different questions.
I have an issue with security layers that are inherently nondeterministic. You can't really reason strongly about what this tool provides as part of a security model.
But also, it's in an area where real security seems extremely hard. I think at some point everyone will have a situation where they wanna give an agent some private information and access to the web. You just can't do that in a way that's deterministically safe. But if there are use cases where making it probabilistically safer is enough to tip the balance, well, fine.