CrabTrap: An LLM-as-a-judge HTTP proxy to secure agents in production

Comments like this don't fill me with confidence: https://github.com/brexhq/CrabTrap/blob/4fbbda9ca00055c1554a...

  // The policy is embedded as a JSON-escaped value inside a structured JSON object.
  // This prevents prompt injection via policy content — any special characters,
  // delimiters, or instruction-like text in the policy are safely escaped by
  // json.Marshal rather than concatenated as raw text.

Really cool! I'm also building something in this space but taking a slightly different approach. I'm glad to see more focus on security for production agentic workflows though, as I think we don't talk about it enough when it comes to claws and other autonomous agents.

I think you're spot on that so far it's been all or nothing. You either give an agent a lot of access, and it's really powerful but proportionally dangerous, or you lock it down so much that it's no longer useful.

I like a lot of the ideas you show here, but I also worry that LLM-as-a-judge is fundamentally a probabilistic guardrail and therefore inherently limited. How do you see this? It feels dangerous to rely on a security system that's based on probabilities rather than hard limits.

It's all fine until OpenClaw decides to start prompt injecting the judge

> pointing it at a few days of real traffic produced policies that matched human judgment on the vast majority of held-out requests.

The problem is, 99% secure is a failing grade.

Needs to be deterministic. ACLs
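A rough sketch of what a deterministic check could look like, in Go. The `rule` type, the rule contents, and the example hosts are all hypothetical. The point is only that the same request always gets the same answer, and anything not explicitly listed is denied.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
	"strings"
)

// rule is a hypothetical ACL entry: exact method, exact host, path prefix.
type rule struct {
	Method     string
	Host       string
	PathPrefix string
}

// allowed reports whether a request matches any rule. Requests that fail to
// parse, are not HTTPS, or match nothing are denied.
func allowed(rules []rule, method, rawURL string) bool {
	u, err := url.Parse(rawURL)
	if err != nil || u.Scheme != "https" {
		return false
	}
	// Normalize "." and ".." segments so a prefix match cannot be escaped.
	p := path.Clean(u.Path)
	for _, r := range rules {
		if method == r.Method && u.Hostname() == r.Host && strings.HasPrefix(p, r.PathPrefix) {
			return true
		}
	}
	return false
}

func main() {
	rules := []rule{{Method: "GET", Host: "api.example.com", PathPrefix: "/v1/invoices/"}}
	fmt.Println(allowed(rules, "GET", "https://api.example.com/v1/invoices/42"))  // true
	fmt.Println(allowed(rules, "POST", "https://api.example.com/v1/invoices/42")) // false
	fmt.Println(allowed(rules, "GET", "https://evil.example.net/v1/invoices/42")) // false
}
```

The trade-off is the one the thread keeps circling: this is easy to reason about, but someone has to write and maintain the rules.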

The thread has converged on “LLM-as-judge is the wrong security primitive,” which is right as far as it goes. The prompt-injection chain ends at the outbound POST. By the time the judge sees the request, the credential has already been read.

The question edf13 pointed at but didn’t develop: where does a transport-layer judge earn its place at all? Not as the enforcement layer but as the audit layer on top of one. Kernel-level controls tell you what the agent did. A proxy tells you what the agent tried to exfiltrate and where to.

Structured-JSON escaping and header caps are good tools for the detection job. They’re the wrong tools for the prevention job. Different layers, different questions.

The debate here is missing a practical question: is the judge from the same model family as the agent it's judging?

If both are Claude, you have shared-vulnerability risk. Prompt-injection patterns that work against one often work against the other. Basic defense in depth says they should at least be different providers, ideally different architectures.

Secondary issue: the judge only sees what's in the HTTP body. Someone who can shape the request (via agent input) can shape the judge's context window too. That's a different failure mode than "judge gets tricked by clever prompting." It's "judge is starved of the signals it would need to spot the trick."

Non-deterministic business rules engine.

So cool! I'm building something very close to this but from another perspective. Making this open source is giving me many ideas!

Blatant “astroturfing” in these comments

We’re supposed to be fixing LLM security by adding a non-LLM layer to it, not adding LLM layers to stuff to make them inherently less secure.

This will be a neat concept for the types of tools that come after the present iteration of LLMs.

Unless I’m sorely mistaken.

Defense in depth. Layers don't inherently make something less secure. Often, they make it more secure.

> pointing it at a few days of real traffic produced policies that matched human judgment on the vast majority of held-out requests.

The problem is, 99% secure is a failing grade.

99% is usually the best you can do. So all you can do is layer multiple defences together; this makes sense as one layer to me.

I have an issue with security layers that are inherently nondeterministic. You can't really reason strongly about what this tool provides as part of a security model.

But also, it's in an area where real security seems extremely hard. I think at some point everyone will have a situation where they wanna give an agent some private information and access to the web. You just can't do that in a way that's deterministically safe. But if there are use cases where making it probabilistically safer is enough to tip the balance, well, fine.

Needs to be deterministic. ACLs

Yes, full stop. They say they cap the body to 16k and give the LLM a warning, lol. And this is coming from a credit card company.

It's all fine until OpenClaw decides to start prompt injecting the judge

Exactly; it would probably be safer with a purely algorithmic decision-making system.

Calling it now. Show HN: Pincer - A small highly optimized local model to detect prompt injection attempts against other models.

Sounds like a good idea. Please send me the GitHub link once done and I'll have my OpenClaw take a look and form my opinion of it.

Sounds like a good idea. Please send me your GitHub now and I'll have my big claw crush your open claw.

Hacker Times
