If it's up to the AI platform to issue limited tokens to users, and it's also the AI platform making the web requests, I'm not understanding the purpose of the cryptography/tokens. Couldn't the platform already limit a user to 100 web requests per hour just with an internal counter?
Cloudflare is helping to develop & eager to implement an open protocol called ARC (Anonymous Rate-Limited Credentials)
What is ARC? You can read the proposal here: https://www.ietf.org/archive/id/draft-yun-cfrg-arc-01.html#n...
But my summary is:
1. You convince a server that you deserve to have 100 tokens (probably by presenting some non-anonymous credentials)
2. You handshake with the server and walk away with 100 untraceable tokens
3. At any time, you can present the server with a token. The server only knows:
a. The token is valid
b. The token has not been previously used
Other details (disclaimer, I am not a cryptographer):
- The server has a private + public key pair for ARC, which is how it knows that it was the one to issue the tokens. It's also how you know that your tokens are in the same pool as everyone else's tokens.
- It seems like there's an option for your 100 tokens to all be 'branded' with some public information. I assume this would be information like "Expires June 2026" or "Token Value: 1 USD", not "User ID 209385"
- The client actually ends up with a key which will generate the 100 tokens in sequence.
- Obviously the number 100 is configurable.
- It seems like there were already schemes to do this, but they provide only one token at a time (RFC 9497, RFC 9474); I'm not sure how popular those were.
If a computer (or “agent” in modern terms) wants to order you a pizza it can technically already do so.
The reason computers currently can’t order us pizza or book us flights isn’t a technical limitation; it’s that the pizza place doesn’t want to just sell you a pizza and the airline doesn’t want to just sell you a flight. Instead they have an entire payroll of people whose salaries are derived from wasting human time, more commonly known as “engagement”. In fact, those people get paid regardless of whether you actually buy anything, so their incentive is often to waste more of your time, even if it means trading off an actual purchase.
The “malicious” uses of AI that this very article refers to are mostly just that - computers/AI agents acting on behalf of humans to sidestep the “wasting human time” issue. The fact that agents may issue more requests than a human user is because information is intentionally not being presented to them in a concise, structured manner. If Domino's or Pizza Hut wanted to sell just pizzas tomorrow, they could trivially publish an OpenAPI spec for agents to consume, or even collaborate on an HPOP protocol (Hypertext Pizza Ordering Protocol) to which HPOP clients could connect (no LLMs needed, even). But they don’t, because wasting human time is the whole point.
So why would any of these companies suddenly opt into this system? Companies that are after actual money and don’t profit from wasting human time are already ready and don’t have to do anything (if an AI agent is already throwing Bitcoin or valid credit card details at you to buy your pizzas, you are fine), and those that do have zero incentive to opt in since they’d be trading off “engagement” for old-school, boring money (who needs that nowadays right?).
I have a credit card, and an agent. I want a pizza.
These credentials do what, exactly? Prevent the pizza place from taking my money? Allow me to order anonymously so they don’t know where to deliver it?
Also, they are security professionals, so when they say anonymous, they don’t mean pseudonymous, so my agent can produce an unlimited number of identities, right? How do they keep the website from correlating time and IP addresses to link my anonymous requests to a pseudonym?
My cynical take is that the pizzeria has to pay cloudflare a few pennies to process the transaction. What am I missing?
I'll explain my understanding.
Consider what problem CAPTCHA aims to solve (abuse) and how that's ineffective in an age of AI agents: it cannot distinguish "bot that is trying to buy a pizza" vs "bot that is trying to spider my site".
I don't understand Cloudflare's solution enough to explain that part.
I'm glad to see research here, because if we don't have innovative solutions, we might end up with microtransactions for browsing.
Then you can go and spend them freely. The credit card company (and maybe even third parties?) can verify that the tokens are valid, but they can't associate them with a user. Assuming that the credit card company keeps a log, they can also verify that a token has never been used before.
In some sense, it's a lightweight, anonymous blockchain.
Similar logic to SMS verification, but actually private.
They have the nickname Crimeflare for a reason. They allow hundreds of thousands of criminals to use their services maliciously, and it's a huge hassle to report them only to be met with their stance of "we are only routing traffic, not hosting it", and they won't remove the most blatant phishing and malicious pages.
Are you confusing their comments about (paraphrased) "horrible but legal" (up to a point) sites like dailystormer, 8chan, and kiwifarms, with actual blatant phishing sites?
I find it very difficult to believe they won't remove sites involved in clear phishing or malware delivery campaigns, if they can verify it themselves or in cooperation with a security team at a company they trust. That's different from sites that are morally repugnant and whose members spew vitriol, but aren't making any particular threats (and even in cases where there are clear and present threats, CF usually seems to prefer to notify law enforcement, and then follow court orders, rather than inject themselves as a 3rd party judge into the proceedings).
They effectively use credentials and cryptography to link the two together in a zero-knowledge type of way. Real issue, although no one is clearly dying for this yet.
Real solution too, but it's equally naive to think blind credentials and Chaumian signing address the root issue. Something like Apple will step in to cast a liability shield over all parties and just continue to trap users in the Apple data ecosystem.
The right way to do this is to give the user sovereignty over their identity and usage such that platforms cater to users rather than the middle-men in-between. Harder than what Cloudflare probably wants to truly solve for.
Still, cool article even if a bit lengthy.
Services need the ability to obtain an identifier that:
- Belongs to exactly one real person.
- That a person cannot own more than one of.
- That is unique per-service.
- That cannot be tied to a real-world identity.
- That can be used by the person to optionally disclose attributes like whether they are an adult or not.
Services generally don’t care about knowing your exact identity; what they care about is being able to ban a person and not have them simply register a new account. Being able to stop people from registering thousands of accounts would go a long way towards wiping out inauthentic and abusive behaviour.
The ability to “reset” your identity is the underlying hole that enables a vast amount of abuse. It’s possible to have persistent, pseudonymous access to the Internet without disclosing real-world identity. Being able to permanently ban abusers from a service would have a hugely positive effect on the Internet.
1. It is clearly not written with a desire to actually convey information in a concise, helpful way.
2. It is riddled with advertisements for Cloudflare services which bear absolutely no relevance to the topic at hand
3. The actual point of the article (anonymous rate limiting tokens) is pointlessly obscured by an irrelevant use case (AI agents for some reason)
Of course, the second two points seem to be heavily related to the first.
This is barely any better -- in terms of respect for the reader's intelligence and savviness -- than those "Apple just gave ten million users a reason to THROW AWAY THEIR IPHONES" trash articles. Just slop meant to get you to click on links to Cloudflare services and vaguely associate Cloudflare with the "Agentic AI future", with no actual intention whatsoever of creating a quality article.
The interface the user wants is “I pay for and obtain pizza”. The interface the pizzeria wants is “I obtain payment via credit card, and send a pizza to some physical location”.
It doesn’t matter who the agent that orders the pizza is acting on behalf of, or whether there is an agent at all, or whether some third party indexed the pizzeria's menu and then some anarcho-crypto syndicate based in the White House decided to run an auction and buy this particular pizza for this particular person.
Oh, and it also turns out that if the data you share is easily collected, it can be analyzed and tracked to prove your crimes, like price gouging, IP infringement, and other unlawful acts - that's not good for business either!
Part of this is the friction required to implement a client for a bespoke API that only one vendor offers, and the even bigger friction of building a standard.
AI and MCP servers might be able to fix this. In turn, companies will have a motivation to offer AI-compatible interfaces because if the only way to order a pizza is through an engagement farm, the AI agent is just going to order the pizza somewhere else.
I know that phrasing it like "large company cloudflare wants to increase internet accountability" will make many people uncomfortable. I think caution is good here. However, I also think that the internet has a real accountability problem that deserves attention. I think that the accountability problem is so bad, that some solution is going to end up getting implemented. That might mean that the most pro-freedom approach is to help design the solution, rather than avoiding the conversation.
Bad ideas:
You're getting lots of bot requests, so you start demanding clients login to view your blog. It's anti-user, anti-privacy, very annoying, readership drops, everyone is sad.
Instead, what if your browser included your government id in every request automatically? Anti-user, anti-privacy, no browser would implement it.
This idea:
But ARC is a middle ground. Subsets of the internet band together (in this case, via cloudflare) and strike a compromise with users. Individual users need to register with cloudflare, and then cloudflare gives you a million tokens per month to request websites. Or some scheme like this. I assume that it would be sufficiently pro-social that the IETF and browsers all agree to it and it's transparent & completely privacy-respecting to normal users.
We already sort of have some accountability: it's "proof of bandwidth" and "proof of multiple unique ip addresses", but that's not well tuned. In fact, IP addresses destroy privacy for most people, while doing very little to stop bot-nets.
Even if Pizza Hut wanted people to order pizza as efficiently as possible, with no time wasted, it would still want that to happen on its own platform.
Because if people went to all-pizzas.com for their pizza needs, then each restaurant and chain would depend on that middleman not to screw them over.
If they end up as just a pizza API, they have no moat, are trivially replaced by another API and bakery, and will make less money.
This isn’t true about Daily S. They have been actively working towards and expressly proposing a new holocaust for decades now. In what way are they not an existential threat to Jews, or to LGBTQ people?
Think SMS verification but with cryptographic sorcery to make it private.
Depending on the level of hassle the service may even use SMS verification at setup. SMS verification is typically easy to acquire for as little as a few cents, but if the goal is to prevent millions of rate limited requests a few cents can add up.
This seems like it would just cause the tokens to become a commodity.
The premise is that you're giving out enough for the usage of the large majority of people, but how many do you give out? If you give out enough for the 95th percentile of usage then 5% of people -- i.e. hundreds of millions of people in the world -- won't have enough for their normal usage. Which is the first problem.
Meanwhile 95% of people would then have more tokens than they need, and the tokens would be scarce, so then they would sell the ones they're not using. Which is the second problem. The people who are the most strapped for cash sell all their tokens for a couple bucks but then get locked out of the internet.
The third problem is that the AI companies would be the ones buying them, and since the large majority of people would have more than they need, they wouldn't be that expensive, and then that wouldn't prevent scraping. Unless you turn the scarcity way up and make the first and second problems really bad.
I wonder how long it will take for sellers to realize the war against agents cannot be won and that their compute resources are better spent giving agents a fast path to task completion.
People were searching AOL keywords for things, and will again.
Only now: by asking OpenAI, Anthropic, or a competitor’s agent.
This is precisely what makes food delivery ordering services (GrubHub, UberEats, Deliveroo, etc.) so challenging to operate and maintain. Practically every restaurant accepts orders in a different way, and maintaining custom mechanisms for each one is costly. Restaurant front-of-house technology companies like Toast are helping make them operate alike, but adoption is slow and there are many, many restaurants to tackle.
Exactly one seems hard to implement (some kind of global registry?). I think relaxing this requirement slightly, such that a user could for instance get a small number of different identities by going to different attestors, would be easier to implement while also making for a better balance. That is, I don't want users to be able to trivially make thousands of accounts, but I also don't want websites to be able to entirely prevent privacy throwaway accounts, for a false ban from Google's services to be bound to your soul for life, to be permanently locked out using anything digital because your identifier was compromised by malware and can't be "reset", or so on.
This is generally considered an unsolvable problem when trying to fulfill all of these requirements (cf. sibling post). Most subsets are easy, but not the full list.
Wait I thought web 2.0 was DHTML / client-side scripting and XmlHttpRequest?
Really, they could each do their own bespoke thing as long as they didn't go out of their way to shut out other implementers.
Instant messaging used to work like this until everyone wanted to own their customer bases and lock them in, for the time-wasting aspect
> - That a person cannot own more than one of.
These are mutually exclusive. Especially if you add 'cannot be tied to a real-world identity'.
Another issue is that people will hire (or enslave) others to effectively lend their identifiers, and it's very hard to distinguish between someone "lending" their identifier vs using it for themselves.
I've been thinking about hierarchical management. Roughly, your identifier is managed by your town, which has its own identifier managed by your state, which has its own identifier managed by your government, which has its own identifier managed by a bloc of governments, which has its own identifier managed by an international organization. When you interact with a foreign website and it requests your identity, you forward the request to your town with your personal identifier, your town forwards the request to your state with the town's identifier, and so on. Town "management" means that towns generate, assign, and revoke stolen personal identifiers, and authenticate requests; state "management" means that states generate, assign, and revoke town identifiers, and authenticate requests (not knowing who in the town sent the request); etc.
The idea is to prevent a much more powerful organization, like a state, from persecuting a much less powerful one, like an individual. In the hierarchical system, your town can persecute you: they can refuse to give you an identifier, give yours to someone else, track what sites you visit, etc. But then, especially if you can convince other town members (which ideally happens if you're unjustly persecuted), it's easier for you to confront the town and convince them to change, than it is to confront and convince a large government. Likewise, states can persecute entire towns, but an entire town is better at resisting than an individual, especially if that town allies with other towns. And governments can persecute entire states, and blocs can persecute entire governments, and the international organization can persecute entire blocs, but not the layer below.
In practice, the hierarchy probably needs many more layers; today's "towns" are sometimes big cities, states are much larger than towns, governments are much more powerful than states, etc., so there must be layers in-between for the layer below to effectively challenge the layer above. Assigning layers may be particularly hard because it requires balance, to enable most justified persecutions, e.g. a bloc punishing a government for not taking care of its scam centers, while preventing most unjustified persecutions. And there will inevitably be towns, states, governments, etc. where the majority of citizens are "unjust", and the layer above can only punish them entirely. So yes, hierarchical management still has many flaws, but is there a better alternative?
Another contradiction at play here is that of innovation vs standardisation. Indeed, you could argue that Domino's website is also a place where they can innovate (bring your own recipes! delivery by drone! pay with tokens! whatever!), whereas a pizza protocol would slow down or prevent some innovation. And LLMs are used to circumvent and therefore standardize the process of ordering a pizza (like you once had user-maintained APIs to query various incompatible bank websites; these days they probably use LLMs as well).
The big national pizza chains don't offer good prices on pizza. They offer bad prices on pizza, and then offer 'deals' that bring prices back down. These deals, generally, exist to steer customers towards buying more or buying higher-margin items (bread sticks, soda, etc).
If you could order pizza through an API, they wouldn't get the chance to upsell you. If it worked for multiple pizza places, it would advantage places who offer better value with their list prices.
I first thought this was just a crypto play with 1 wallet per real person (wasn't a huge fan), but with the proliferation of AI, it makes sense we'll eventually need safeguards to ensure a user's humanity, ideally without any other identifiers needed.
With AI browsers, all they have to do initially is not break them, and long term, each of them can individually choose to offer their API - no coordination required - and gain a slight advantage.
The flak should be because it's from Sam Altman. A billionaire tech bro giving us both the disease and the cure, and profiting massively along the way, is what's truly dystopian.
I don't see how you can prevent multiple people sharing access to one HSM. Also, if the key is the same in hundreds of HSMs, this isn't fulfilled to begin with? Is this assuming the HSM holds multiple keys?
btw: "usually". Can you cite an implementation?
u2f has it: https://security.stackexchange.com/questions/224692/how-does...
>I don't see how you can prevent multiple people sharing access to one HSM.
Obviously that's out of scope unless the HSM has a retina scanner or whatever, but even then there's nothing preventing someone from consensually using their cousin's government-issued id (i.e. HSM) to access an 18+ site.
> Also, if the key is the same in hundreds of HSMs, this isn't fulfilled to begin with? Is this assuming the HSM holds multiple keys?
The idea is that the HSM will sign arbitrary proofs to give to relying parties. The relying parties can validate the key used to sign the proof is valid through some sort of certificate chain that is ultimately rooted at some government CA. However because the key is shared among hundreds/thousands/tens of thousands of HSMs/ids, it's impossible to tie that to a specific person/id/HSM.
> Is this assuming the HSM holds multiple keys?
Yeah, you'd need a separate device-specific key to sign/generate an identifier that's unique per-service. To summarize:
each HSM contains two keys:
1. K1: device-specific key, specific to the given HSM
2. K2: shared across some large number of HSMs
Both keys are resistant to extraction from the HSM, and the HSM will only use them for signing.
To authenticate to a website (relying party):
1. HSM generates id, using something like hmac(site domain name, K1)
2. HSM generates a blob containing the above id, whatever additional attributes the user wants to disclose (eg. their name or whether they're 18+), and a timestamp/anti-replay token (or similar), signs it with K2, and returns it to the site. The HSM also returns a certificate certifying that K2 is issued by some national government.
The site can verify the response comes from a genuine HSM because the certificate chains to some national government's CA. The site can also be sure that users can't create multiple accounts, because each HSM will generate the same id given the same site. However two sites can't correlate identities because the id changes depending on the site, and the signing key/certificate is shared among a large number of users. Governments can still theoretically deanonymize users if they retain K1 and work with site operators.
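Roughly, in code (toy stand-ins for the HSM's key material; everything below is made up to illustrate the two-key idea, not a real attestation format):

import { createHmac, createSign, generateKeyPairSync, randomBytes, randomUUID } from "node:crypto";

// Toy stand-ins for what a real HSM would hold:
// K1 is the device-specific secret; K2 is the signing key shared across many devices,
// whose certificate would chain up to a government CA.
const K1 = randomBytes(32);
const K2 = generateKeyPairSync("ec", { namedCurve: "P-256" });

function authenticate(siteDomain: string, attrs: { over18?: boolean }) {
  // Per-service identifier: the same site always gets the same id,
  // but two different sites cannot correlate their ids.
  const id = createHmac("sha256", K1).update(siteDomain).digest("hex");

  // Signed blob: id + disclosed attributes + anti-replay nonce, signed with the shared K2.
  const blob = JSON.stringify({ id, ...attrs, nonce: randomUUID(), issuedAt: Date.now() });
  const signature = createSign("sha256").update(blob).sign(K2.privateKey, "base64");

  // The site verifies the signature against K2's certificate chain, learning only
  // that *some* genuine HSM vouches for this per-site id and these attributes.
  return { blob, signature };
}

console.log(authenticate("pizza.example", { over18: true }));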
The way we interact with the Internet is changing. Not long ago, ordering a pizza meant visiting a website, clicking through menus, and entering your payment details. Soon, you might just ask your phone to order a pizza that matches your preferences. A program on your device or on a remote server, which we call an AI agent, would visit the website and orchestrate the necessary steps on your behalf.
Of course, agents can do much more than order pizza. Soon we might use them to buy concert tickets, plan vacations, or even write, review, and merge pull requests. While some of these tasks will eventually run locally, for now, most are powered by massive AI models running in the biggest datacenters in the world. As agentic AI increases in popularity, we expect to see a large increase in traffic from these AI platforms and a corresponding drop in traffic from more conventional sources (like your phone).
This shift in traffic patterns has prompted us to assess how to keep our customers online and secure in the AI era. On one hand, the nature of requests is changing: Websites optimized for human visitors will have to cope with faster, and potentially greedier, agents. On the other hand, AI platforms may soon become a significant source of attacks, originating from malicious users of the platforms themselves.
Unfortunately, existing tools for managing such (mis)behavior are likely too coarse-grained to manage this transition. For example, when Cloudflare detects that a request is part of a known attack pattern, the best course of action often is to block all subsequent requests from the same source. When the source is an AI agent platform, this could mean inadvertently blocking all users of the same platform, even honest ones who just want to order pizza. We started addressing this problem earlier this year. But as agentic AI grows in popularity, we think the Internet will need more fine-grained mechanisms of managing agents without impacting honest users.
At the same time, we firmly believe that any such security mechanism must be designed with user privacy at its core. In this post, we'll describe how to use anonymous credentials (AC) to build these tools. Anonymous credentials help website operators to enforce a wide range of security policies, like rate-limiting users or blocking a specific malicious user, without ever having to identify any user or track them across requests.
Anonymous credentials are under development at the IETF in order to provide a standard that can work across websites, browsers, and platforms. The work is still in its early stages, but we believe it will play a critical role in keeping the Internet secure and private in the AI era. We will be contributing to this process as we work towards real-world deployment. If you work in this space, we hope you will follow along and contribute as well.
To help us discuss how AI agents are affecting web servers, let’s build an agent ourselves. Our goal is to have an agent that can order a pizza from a nearby pizzeria. Without an agent, you would open your browser, figure out which pizzeria is nearby, view the menu and make selections, add any extras (double pepperoni), and proceed to checkout with your credit card. With an agent, it’s the same flow —except the agent is opening and orchestrating the browser on your behalf.
In the traditional flow, there’s a human all along the way, and each step has a clear intent: list all pizzerias within 3 km of my current location; pick a pizza from the menu; enter my credit card; and so on. An agent, on the other hand, has to infer each of these actions from the prompt "order me a pizza."
In this section, we’ll build a simple program that takes a prompt and can make outgoing requests. Here’s an example of a simple Worker that takes a specific prompt and generates an answer accordingly. You can find the code on GitHub:
export default {
  async fetch(request: Request, env: Env, ctx: ExecutionContext): Promise<Response> {
    const out = await env.AI.run("@cf/meta/llama-3.1-8b-instruct-fp8", {
      prompt: `I'd like to order a pepperoni pizza with extra cheese.
Please deliver it to Cloudflare Austin office.
Price should not be more than $20.`,
    });
    return new Response(out.response);
  },
} satisfies ExportedHandler<Env>;
In this context, the LLM provides its best answer. It gives us a plan and instruction, but does not perform the action on our behalf. You and I are able to take a list of instructions and act upon it because we have agency and can affect the world. To allow our agent to interact with more of the world, we’re going to give it control over a web browser.
Cloudflare offers a Browser Rendering service that can bind directly into our Worker. Let’s do that. The following code uses Stagehand, an automation framework that makes it simple to control the browser. We pass it an instance of Cloudflare remote browser, as well as a client for Workers AI.
import { Stagehand } from "@browserbasehq/stagehand";
import { endpointURLString } from "@cloudflare/playwright";
import { WorkersAIClient } from "./workersAIClient"; // wrapper to convert cloudflare AI

export default {
  async fetch(request: Request, env: Env, ctx: ExecutionContext): Promise<Response> {
    const stagehand = new Stagehand({
      env: "LOCAL",
      localBrowserLaunchOptions: { cdpUrl: endpointURLString(env.BROWSER) },
      llmClient: new WorkersAIClient(env.AI),
      verbose: 1,
    });
    await stagehand.init();

    const page = stagehand.page;
    await page.goto("https://mini-ai-agent.cloudflareresearch.com/llm");
    const { extraction } = await page.extract("what are the pizza available on the menu?");
    return new Response(extraction);
  },
} satisfies ExportedHandler<Env>;
You can access that code for yourself on https://mini-ai-agent.cloudflareresearch.com/llm. Here’s the response we got on October 10, 2025:
Margherita Classic: $12.99
Pepperoni Supreme: $14.99
Veggie Garden: $13.99
Meat Lovers: $16.99
Hawaiian Paradise: $15.49
Using the screenshot API of browser rendering, we can also inspect what the agent is doing. Here's how the browser renders the page in the example above:

Stagehand allows us to identify components on the page, such as page.act("Click on pepperoni pizza") and page.act("Click on Pay now"). This eases interaction between the developer and the browser.
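For example, the body of the fetch() handler above could be rewritten as explicit steps (the menu-item wording here is an assumption about the demo page, not taken from it):

// Hypothetical step-by-step variant of the handler body above, reusing stagehand.page:
const page = stagehand.page;
await page.goto("https://mini-ai-agent.cloudflareresearch.com/llm");
await page.act("Click on pepperoni pizza"); // select a menu item
await page.act("Click on Pay now");         // proceed to checkout
const { extraction } = await page.extract("what is the order confirmation message?");
return new Response(extraction);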
To go further, and instruct the agent to perform the whole flow autonomously, we have to use the appropriately named agent mode of Stagehand. This feature is not yet supported by Cloudflare Workers, but is provided below for completeness.
import { Stagehand } from "@browserbasehq/stagehand";
import { endpointURLString } from "@cloudflare/playwright";
import { WorkersAIClient } from "./workersAIClient";

export default {
  async fetch(request: Request, env: Env, ctx: ExecutionContext): Promise<Response> {
    const stagehand = new Stagehand({
      env: "LOCAL",
      localBrowserLaunchOptions: { cdpUrl: endpointURLString(env.BROWSER) },
      llmClient: new WorkersAIClient(env.AI),
      verbose: 1,
    });
    await stagehand.init();

    const agent = stagehand.agent();
    const result = await agent.execute(`I'd like to order a pepperoni pizza with extra cheese.
Please deliver it to Cloudflare Austin office.
Price should not be more than $20.`);
    return new Response(result.message);
  },
} satisfies ExportedHandler<Env>;
We can see that instead of adding step-by-step instructions, the agent is provided control. To actually pay, it would need access to a payment method such as a virtual credit card.
The prompt had some subtlety in that we’ve scoped the delivery location to Cloudflare’s Austin office. This is because, while the agent responds to us, it needs to understand our context. In this case, the agent operates out of Cloudflare's edge, a location remote from us, so we're unlikely to pick up a pizza delivered to that data center.
The more capabilities we provide to the agent, the more ability it has to cause disruption. Instead of someone having to make 5 clicks at a slow rate of 1 request per 10 seconds, they’d have a program running in a data center possibly making all 5 requests in a second.
This agent is simple, but now imagine many thousands of these — some benign, some not — running at datacenter speeds. This is the challenge origins will face.
For humans to interact with the online world, they need a web browser and some peripherals with which to direct the behavior of that browser. Agents are another way of directing a browser, so it may be tempting to think that not much is actually changing from the origin's point of view. Indeed, the most obvious change from the origin's point of view is merely where traffic comes from:

The reason this change is significant has to do with the tools the server has to manage traffic. Websites generally try to be as permissive as possible, but they also need to manage finite resources (bandwidth, CPU, memory, storage, and so on). There are a few basic ways to do this:
Global security policy: A server may opt to slow down, CAPTCHA, or even temporarily block requests from all users. This policy may be applied to an entire site, a specific resource, or to requests classified as being part of a known or likely attack pattern. Such mechanisms may be deployed in reaction to an observed spike in traffic, as in a DDoS attack, or in anticipation of a spike in legitimate traffic, as in Waiting Room.
Incentives: Servers sometimes try to incentivize users to use the site when more resources are available. For instance, a server's prices may be lower depending on the location or request time. This could be implemented with a Cloudflare Snippet.
While both tools can be effective, they also sometimes cause significant collateral damage. For example, while rate limiting a website's login endpoint can help prevent credential stuffing attacks, it also degrades the user experience for non-attackers. Before resorting to such measures, servers will first try to apply the security policy (whether a rate limit, a CAPTCHA, or an outright block) to individual users or groups of users.
However, in order to apply a security policy to individuals, the server needs some way of identifying them. Historically, this has been done via some combination of IP addresses, User-Agent, an account tied to the user identity (if available), and other fingerprints. Like most cloud service providers, Cloudflare has a dedicated offering for per-user rate limits based on such heuristics.
Fingerprinting works for the most part. However, it's inequitably distributed: on mobile, users have an especially difficult time solving CAPTCHAs; when using a VPN, they're more likely to be blocked; and when using reading mode, they can mess up their fingerprint, which may prevent the page from rendering.
Likewise, agentic AI only exacerbates the limitations of fingerprinting. Not only will more traffic be concentrated on a smaller source IP range, but the agents themselves will run on the same software and hardware platforms, making it harder to distinguish honest users from malicious ones.
Something that could help is Web Bot Auth, which would allow agents to identify to the origin which platform they're operated by. However, we wouldn't want to extend this mechanism — intended for identifying the platform itself — to identifying individual users of the platforms, as this would create an unacceptable privacy risk for these users.
We need some way of implementing security controls for individual users without identifying them. But how? The Privacy Pass protocol provides a partial solution.
Today, one of the most prominent use cases for Privacy Pass is to rate limit requests from a user to an origin, as we have discussed before. The protocol works roughly as follows. The client is issued a number of tokens. Each time it wants to make a request, it redeems one of its tokens to the origin; the origin allows the request through only if the token is fresh, i.e., has never been observed before by the origin.
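On the origin side, the redemption check is conceptually very simple. A minimal sketch (verifyToken and the in-memory set are placeholders; a real deployment uses the Privacy Pass redemption protocol and a shared, expiring store):

// Accept a request only if its token verifies and has never been seen before.
const seenTokens = new Set<string>();

async function handleRequest(req: Request): Promise<Response> {
  const token = req.headers.get("Authorization") ?? "";
  const valid = await verifyToken(token); // placeholder for the issuer-signature check
  if (!valid || seenTokens.has(token)) {
    return new Response("invalid or already-spent token", { status: 401 });
  }
  seenTokens.add(token); // the token is now spent
  return fetch(req);     // forward to the origin application
}

declare function verifyToken(token: string): Promise<boolean>;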
In order to use Privacy Pass for per-user rate limiting, some party needs to limit the number of tokens issued to each user (e.g., 100 tokens per user per hour). To rate limit an AI agent, this role would be fulfilled by the AI platform. To obtain tokens, the user would log in to the platform, and the platform would allow the user to get tokens from the issuer. In Privacy Pass parlance, the AI platform fulfills the attester role: the attester is the party that guarantees the per-user property of the rate limit. The AI platform, as an attester, is incentivized to enforce this token distribution because it stakes its reputation: should it allow too many tokens to be issued, the issuer could distrust it.
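As a sketch, the attester's job amounts to bookkeeping like the following (hypothetical names and limits; a real attester would persist this state and authenticate the user first):

// Cap how many tokens each logged-in user may obtain per hour.
const HOURLY_LIMIT = 100;
const issuedThisHour = new Map<string, number>(); // reset every hour

function authorizeIssuance(userId: string, requested: number): number {
  const used = issuedThisHour.get(userId) ?? 0;
  const granted = Math.max(0, Math.min(requested, HOURLY_LIMIT - used));
  issuedThisHour.set(userId, used + granted);
  return granted; // the issuer will only issue this many (blinded) tokens
}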

The issuance and redemption protocols are designed to have two properties:
Tokens are unforgeable: only the issuer can issue valid tokens.
Tokens are unlinkable: no party, including the issuer, attester, or origin, can tell which user a token was issued to.
These properties can be achieved using a cryptographic primitive called a blind signature scheme. In a conventional signature scheme, the signer uses its private key to produce a signature for a message. Later on, a verifier can use the signer’s public key to verify the signature. Blind signature schemes work in the same way, except that the message to be signed is blinded such that the signer doesn't know the message it's signing. The client “blinds” the message to be signed and sends it to the server, which then computes a blinded signature over the blinded message. The client obtains the final signature by unblinding the signature.
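To make the blinding step concrete, here is a toy numeric walk-through of RSA-style blind signing (tiny parameters, no padding or hashing; purely illustrative, not the RFC 9474 protocol):

// Toy blind-signature walk-through with BigInt arithmetic (not secure, illustration only).

function modPow(base: bigint, exp: bigint, mod: bigint): bigint {
  let result = 1n;
  base %= mod;
  while (exp > 0n) {
    if (exp & 1n) result = (result * base) % mod;
    base = (base * base) % mod;
    exp >>= 1n;
  }
  return result;
}

function modInverse(a: bigint, m: bigint): bigint {
  // Extended Euclidean algorithm.
  let [oldR, r] = [a % m, m];
  let [oldS, s] = [1n, 0n];
  while (r !== 0n) {
    const q = oldR / r;
    [oldR, r] = [r, oldR - q * r];
    [oldS, s] = [s, oldS - q * s];
  }
  return ((oldS % m) + m) % m;
}

// Toy RSA key pair held by the issuer (p = 61, q = 53).
const n = 3233n, e = 17n, d = 2753n;

// Client: blind the message m with a random factor r before sending it.
const m = 42n;                               // the (hashed) token nullifier
const r = 7n;                                // random, coprime to n
const blinded = (m * modPow(r, e, n)) % n;   // the issuer never sees m itself

// Issuer: sign the blinded message with its private key.
const blindedSig = modPow(blinded, d, n);

// Client: unblind to obtain a signature on m that the issuer cannot link back.
const sig = (blindedSig * modInverse(r, n)) % n;

// Anyone with the public key can verify: sig^e mod n == m.
console.log(modPow(sig, e, n) === m); // true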
This is exactly how the standardised Privacy Pass issuance protocols are defined by RFC 9578:
Blind signatures are simple, cheap, and perfectly suited for many applications. However, they have some limitations that make them unsuitable for our use case.
First, the communication cost of the issuance protocol is too high. For each token issued, the user sends a 256-byte blinded nullifier and the issuer replies with a 256-byte blind signature (assuming RSA-2048 is used). That's 0.5 KB of additional communication per request, or 500 KB for every 1,000 requests. This is manageable, as we’ve seen in a previous experiment for Privacy Pass, but not ideal. Ideally, the bandwidth would be sublinear in the rate limit we want to enforce. An alternative to blind signatures with lower compute time is the Verifiable Oblivious Pseudorandom Function (VOPRF), but its bandwidth is still asymptotically linear. We’ve discussed VOPRFs in the past, as they served as the basis for early deployments of Privacy Pass.
Second, blind signatures can't be used to rate-limit on a per-origin basis. Ideally, when issuing $N$ tokens to the client, the client would be able to redeem at most $N$ tokens at any origin server that can verify the token's validity. However, the client can't safely redeem the same token at more than one server because it would be possible for the servers to link those redemptions to the same client. What's needed is some mechanism for what we'll call late origin-binding: transforming a token for redemption at a particular origin in a way that's unlinkable to other redemptions of the same token.
Third, once a token is issued, it can't be revoked: it remains valid as long as the issuer's public key is valid. This makes it impossible for an origin to block a specific user if it detects an attack, or if its tokens are compromised. The origin can block the offending request, but the user can continue to make requests using its remaining token budget.
As noted by Chaum in 1985, an anonymous credential system allows users to obtain a credential from an issuer, and later prove possession of this credential, in an unlinkable way, without revealing any additional information. Also, it is possible to demonstrate that some attributes are attached to the credential.
One way to think of an anonymous credential is as a kind of blind signature with some additional capabilities: late-binding (link a token to an origin after issuance), multi-show (generate multiple tokens from a single issuer response), and expiration distinct from key rotation (token validity decoupled of the issuer cryptographic key validity). In the redemption flow for Privacy Pass, the client presents the unblinded message and signature to the server. To accept the redemption, the server needs to verify the signature. In an AC system, the client only presents a part of the message. In order for the server to accept the request, the client needs to prove to the server that it knows a valid signature for the entire message without revealing the whole thing.
The flow we described above would therefore include this additional presentation step.

Note that tokens generated through blind signatures or VOPRFs can only be used once, so they can be regarded as single-use tokens. However, there exists a type of anonymous credential that allows tokens to be used multiple times. For this to work, the issuer grants a credential to the user, who can later derive at most $N$ single-use tokens for redemption. The user can therefore send multiple requests at the cost of a single issuance session.
The table below describes how blind signatures and anonymous credentials provide features of interest to rate limiting.
| Feature | Blind Signature | Anonymous Credential |
|---|---|---|
| Issuing cost | Linear complexity: issuing 10 signatures is 10x as expensive as issuing one signature | Sublinear complexity: signing 10 attributes is cheaper than 10 individual signatures |
| Proof capability | Only prove that a message has been signed | Allow efficient proving of partial statements (i.e., attributes) |
| State management | Stateless | Stateful |
| Attributes | No attributes | Public (e.g., expiry time) and private state |
Let's see how a simple anonymous credential scheme works. The client's message consists of the pair $(k, C)$, where $k$ is a nullifier and $C$ is a counter representing the remaining number of times the client can access a resource. The value of the counter is controlled by the server: when the client redeems its credential, it presents both the nullifier and the counter. In response, the server checks that the signature of the message is valid and that the nullifier is fresh, as before. Additionally, the server also:
1. checks that the counter is greater than zero; and
2. decrements the counter, issuing a new credential for the updated counter and a fresh nullifier.
A blind signature could be used to provide this functionality. However, whereas the nullifier can be blinded as before, the counter would need to be handled in plaintext so that the server can check that it is valid (Step 1) and update it (Step 2). This creates an obvious privacy risk: the server, which is in control of the counter, can use it to link multiple presentations by the same client. For example, when you reach out to buy a pepperoni pizza, the origin could assign you a special counter value, making it easier to fingerprint you when you present it a second time. Fortunately, there exist anonymous credentials designed to close this kind of privacy gap.
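To illustrate why the plaintext counter is a problem, here is what the naive server-side loop would look like (placeholder helpers; in the real scheme described next, the counter stays hidden from the server):

// Naive stateful-credential redemption loop, with the counter in plaintext.
// This shows the mechanics and the privacy gap described above.
interface Credential { nullifier: string; counter: number; tag: string }

const spentNullifiers = new Set<string>();

function redeem(cred: Credential): Credential | null {
  if (!verifyTag(cred)) return null;                    // placeholder signature/MAC check
  if (spentNullifiers.has(cred.nullifier)) return null; // the nullifier must be fresh
  if (cred.counter <= 0) return null;                   // out of budget
  spentNullifiers.add(cred.nullifier);
  // Re-issue with a decremented counter and a fresh nullifier. Because the server
  // sees the counter value, it could use it to link this client's future requests.
  return issue({ nullifier: crypto.randomUUID(), counter: cred.counter - 1 });
}

declare function verifyTag(cred: Credential): boolean;
declare function issue(msg: { nullifier: string; counter: number }): Credential;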
The scheme above is a simplified version of Anonymous Credit Tokens (ACT), one of the anonymous credential schemes being considered for adoption by the Privacy Pass working group at IETF. The key feature of ACT is its statefulness: upon successful redemption, the server re-issues a new credential with updated nullifier and counter values. This creates a feedback loop between the client and server that can be used to express a variety of security policies.
By design, it's not possible to present ACT credentials multiple times simultaneously: the first presentation must be completed so that the re-issued credential can be presented in the next request. Parallelism is the key feature of Anonymous Rate-limited Credential (ARC), another scheme under discussion at the Privacy Pass working group. ARCs can be presented across multiple requests in parallel up to the presentation limit determined during issuance.
Another important feature of ARC is its support for late origin-binding: when a client is issued an ARC with presentation limit $N$, it can safely use its credential to present up to $N$ times to any origin that can verify the credential.
These are just examples of relevant features of some anonymous credentials. Some applications may benefit from a subset of them; others may need additional features. Fortunately, both ACT and ARC can be constructed from a small set of cryptographic primitives that can be easily adapted for other purposes.
ARC and ACT share two primitives in common: algebraic MACs, which provide for limited computations on the blinded message; and zero-knowledge proofs (ZKP) for proving validity of the part of the message not revealed to the server. Let's take a closer look at each.
A Message Authentication Code (MAC) is a cryptographic tag used to verify a message's authenticity (that it comes from the claimed sender) and integrity (that it has not been altered). Algebraic MACs are built from mathematical structures like group actions. This algebraic structure gives them some additional functionality, one piece of which is a homomorphism that lets us blind the MAC easily to conceal its actual value: adding a random value to an algebraic MAC blinds it.
Unlike blind signatures, both ACT and ARC are only privately verifiable, meaning the issuer and the origin must both have the issuer's private key. Taking Cloudflare as an example, this means that a credential issued by Cloudflare can only be redeemed by an origin behind Cloudflare. Publicly verifiable variants of both are possible, but at an additional cost.
Zero knowledge proofs (ZKP) allow us to prove a statement is true without revealing the exact value that makes the statement true. The ZKP is constructed by a prover in such a way that it can only be generated by someone who actually possesses the secret. The verifier can then run a quick mathematical check on this proof. If the check passes, the verifier is convinced that the prover's initial statement is valid. The crucial property is that the proof itself is just data that confirms the statement; it contains no other information that could be used to reconstruct the original secret.
For ARC and ACT, we want to prove linear relations of secrets. In ARC, a user needs to prove that different tokens are linked to the same original secret credential. For example, a user can generate a proof showing that a request token was derived from a valid issued credential. The system can verify this proof to confirm the tokens are legitimately connected, all without ever learning the underlying secret credential that ties them together. This allows the system to validate user actions while guaranteeing their privacy.
Proving simple linear relations can be extended to prove a number of powerful statements, for example that a number lies in a range. This is useful, for instance, to prove that you have a positive balance on your account. To prove your balance is positive, you prove that you can encode your balance in binary. Let’s say you can have at most 1024 credits in your account. To prove your balance is non-zero when it is, for example, 12, you prove two things simultaneously: first, that you have a set of binary bits, in this case $12 = (1100)_2$; and second, that a linear equation using these bits ($8 \cdot 1 + 4 \cdot 1 + 2 \cdot 0 + 1 \cdot 0$) correctly adds up to your total committed balance. This convinces the verifier that the number is validly constructed without them learning the exact value. This works for powers of two, but it can easily be extended to arbitrary ranges.
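The arithmetic being committed to is just the following (a sanity check of the decomposition itself, not the zero-knowledge proof):

// Decompose a balance into bits and recombine: the range proof commits to each bit
// and proves this linear relation without revealing the balance itself.
const maxBalance = 1024; // so 10 bits suffice
const balance = 12;
const bits = Array.from({ length: Math.log2(maxBalance) }, (_, i) => (balance >> i) & 1);
const recombined = bits.reduce((sum, bit, i) => sum + bit * 2 ** i, 0);
console.log(bits.slice().reverse().join(""), recombined === balance); // "0000001100" true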
The mathematical structure of algebraic MACs allows easy blinding and evaluation. The structure also allows for an easy proof that a MAC has been evaluated with the private key without revealing the MAC. In addition, ARC could use ZKPs to prove that a nonce has not been spent before. In contrast, ACT uses ZKPs to prove we have enough of a balance left on our token. The balance is subtracted homomorphically using more group structure.
Anonymous credentials allow for more flexibility, and have the potential to reduce the communication cost, compared to blind signatures in certain applications. To identify such applications, we need to measure the concrete communication cost of these new protocols. In addition, we need to understand how their CPU usage compares to blind signatures and oblivious pseudorandom functions.
We measure the time that each participant spends at each stage of some AC schemes. We also report the size of messages transmitted across the network. For ARC, ACT, and VOPRF, we'll use ristretto255 as the prime group and SHAKE128 for hashing. For Blind RSA, we'll use a 2048-bit modulus and SHA-384 for hashing.
Each algorithm was implemented in Go, on top of the CIRCL library. We plan to open source the code once the specifications of ARC and ACT begin to stabilize.
Let’s take a look at the most widely used deployment in Privacy Pass: Blind RSA. Redemption time is low, and most of the cost lies with the server at issuance time. Communication cost is mostly constant and in the order of 256 bytes.
Blind RSA (RFC 9474, RSA-2048 + SHA-384), 1 token:

| Stage | Party | Time |
|---|---|---|
| Issuance | Client (Blind) | |
| Issuance | Server (Evaluate) | 2.69 ms |
| Issuance | Client (Finalize) | 37 µs |
| Redemption | Client | |
| Redemption | Server | 37 µs |
When looking at VOPRF, verification time on the server is slightly higher than for Blind RSA, but communication cost is lower and issuance is much faster. Evaluation time on the server is 10x faster for 1 token, and more than 25x faster when using amortized token issuance. Communication cost per token is also more appealing, with a message size at least 3x lower.
VOPRF (RFC 9497, Ristretto255 + SHA-512), 1 token:

| Stage | Party | Time | Message Size |
|---|---|---|---|
| Issuance | Client (Blind) | 54 µs | |
| Issuance | Server (Evaluate) | 260 µs | 96 B |
| Issuance | Client (Finalize) | 376 µs | 64 B |
| Redemption | Client | – | |
| Redemption | Server | 57 µs | – |
This makes VOPRF tokens appealing for applications requiring a lot of tokens that can accept a slightly higher redemption cost, and that don’t need public verifiability.
Now, let’s take a look at the figures for ARC and ACT anonymous credential schemes. For both schemes we measure the time to issue a credential that can be presented at most $N=1000$ times.
Issuance:

| Credential Generation | ARC: Time | ARC: Message Size |
|---|---|---|
| Client (Request) | 323 µs | 224 B |
| Server (Response) | 1349 µs | 448 B |
| Client (Finalize) | 1293 µs | 128 B |

Redemption:

| Credential Presentation | ARC: Time | ARC: Message Size |
|---|---|---|
| Client (Present) | 735 µs | 288 B |
| Server (Verify/Refund) | 740 µs | – |
| Client (Update) | – | – |
As we would hope, the communication cost and the server’s runtime are much lower than for a batched issuance with either Blind RSA or VOPRF. For example, a VOPRF issuance of 1000 tokens takes 99 ms (99 µs per token) vs 1.35 ms for issuing one ARC credential that allows for 1000 presentations. This is about 70x faster. The trade-off is that presentation is more expensive, both for the client and the server.
How about ACT? Like ARC, we would expect the communication cost of issuance to grow much more slowly with respect to the credits issued. Our implementation bears this out. However, there are some interesting performance differences between ARC and ACT: issuance is much cheaper for ACT than it is for ARC, but redemption is the opposite.
What's going on? The answer has largely to do with what each party needs to prove with ZKPs at each step. For example, during ACT redemption, the client proves to the server (in zero-knowledge) that its counter $C$ is in the desired range, i.e., $0 \leq C \leq N$. The proof size is on the order of $\log_{2} N$, which accounts for the larger message size. In the current version, ARC redemption does not involve range proofs, but a range proof may be added in a future version. Meanwhile, the statements the client and server need to prove during ARC issuance are a bit more complicated than for ARC presentation, which accounts for the difference in runtime there.
The advantage of anonymous credentials, as discussed in the previous sections, is that issuance only has to be performed once. When a server evaluates its cost, it takes into account the cost of all issuances and the cost of all verifications. At present, accounting only for credential costs, it’s cheaper for a server to issue and verify tokens than to verify an anonymous credential presentation.
The advantage of multi-use anonymous credentials is that instead of the issuer generating $N$ tokens, the bulk of the computation is offloaded to the clients, keeping the issuer's work more tightly scoped. Late origin-binding allows them to work across multiple origins/namespaces, range proofs decouple expiration from key rotation, and refunds provide a dynamic rate limit. Their current applications are dictated more by the limitations of single-use token schemes than by the added efficiency they provide. This seems to be an exciting area to explore, to see whether closing the gap is possible.
Managing agents will likely require features from both ARC and ACT.
ARC already has much of the functionality we need: it supports rate limiting, is communication-efficient, and it supports late origin-binding. Its main downside is that, once an ARC credential is issued, it can't be revoked. A malicious user can always make up to N requests to any origin it wants.
We can allow for a limited form of revocation by pairing ARC with blind signatures (or VOPRF). Each presentation of the ARC credential is accompanied by a Privacy Pass token: upon successful presentation, the client is issued another Privacy Pass token it can use during the next presentation. To revoke a credential, the server would simply not re-issue the token:

This scheme is already quite useful. However, it has some important limitations:
Parallel presentation across origins is not possible: the client must wait for the request to one origin to succeed before it can initiate a request to a second origin.
Revocation is global rather than per-origin, meaning the credential is not only revoked for the origin to which it was presented, but for every origin it can be presented to. We suspect this will be undesirable in some cases. For example, an origin may want to revoke if a request violates its robots.txt policy; but the same request may have been accepted by other origins.
A more fundamental limitation of this design is that the decision to revoke can only be made on the basis of a single request — the one in which the credential was presented. It may be risky to decide to block a user on the basis of a single request; in practice, attack patterns may only emerge across many requests. ACT's statefulness enables at least a rudimentary form of this kind of defense. Consider the following scheme:
Issuance: The client is issued an ARC with presentation limit $N=1$.
Presentation:
When the client presents its ARC credential to an origin for the first time, the server issues an ACT credential with a valid initial state.
When the client presents an ACT with valid state (e.g., credit counter greater than 0), the origin either:
refuses to issue a new ACT, thereby revoking the credential. It would only do so if it had high confidence that the request was part of an attack; or
issues a new ACT with state updated to reduce the ACT credit by the amount of resources consumed while processing the request.
Benign requests wouldn't change the state by much (if at all), but suspicious requests might impact the state in a way that gets the user closer to their rate limit much faster.
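A rough sketch of what such an origin-side policy could look like (all names and thresholds below are hypothetical; the actual pricing of requests is left to the origin):

// Origin-side policy: charge each request against the client's ACT balance,
// pricing suspicious requests higher, and refuse re-issuance to revoke outright.
interface ActPresentation { nullifier: string; proofOfPositiveBalance: unknown }

function handlePresentation(p: ActPresentation, riskScore: number) {
  if (!verifyPresentation(p)) return { status: 401 };  // invalid or exhausted credential
  if (riskScore > 0.99) return { status: 403 };        // confident attack: revoke by not re-issuing
  const cost = riskScore > 0.5 ? 10 : 1;                // suspicious requests burn credits faster
  const reissued = reissueWithDebit(p, cost);           // new credential, balance reduced by `cost`
  return { status: 200, credential: reissued };
}

declare function verifyPresentation(p: ActPresentation): boolean;
declare function reissueWithDebit(p: ActPresentation, cost: number): unknown;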
To see how this idea works in practice, let's look at a working example that uses the Model Context Protocol. The demo below is built using MCP Tools. Tools are extensions the AI agent can call to extend its capabilities. They don't need to be integrated at release time within the MCP client. This provides a nice and easy prototyping avenue for anonymous credentials.
Tools are offered by the server via an MCP compatible interface. You can see details on how to build such MCP servers in a previous blog.
In our pizza context, this could look like a pizzeria that offers you a voucher. Each voucher gets you 3 pizza slices. As a mock design, an integration within a chat application could look as follows:


The first panel presents all tools exposed by the MCP server. The second one showcases an interaction performed by the agent calling these tools.
To look into how such a flow would be implemented, let’s write the MCP tools, offer them in an MCP server, and manually orchestrate the calls with the MCP Inspector.
The MCP server should provide two tools:
act-issue, which issues an ACT credential valid for 3 requests. The code used here implements an earlier version of the IETF draft, which has some limitations.
act-redeem, which presents the local credential and fetches our pizza menu.
First, we run act-issue. At this stage, we could ask the agent to run an OAuth flow, fetch an internal authentication endpoint, or to compute a proof of work.

This gives us 3 credits to spend against an origin. Then, we run act-redeem.

Et voilà. If we run act-redeem once more, we see we have one fewer credit.

You can test it yourself; the source code is available. The MCP server is written in Rust to integrate with the ACT Rust library. The browser-based client works similarly; check it out.
In this post, we’ve presented a concrete approach to rate limiting agent traffic. It keeps the client in full control, and is built to protect the user's privacy. It uses emerging standards for anonymous credentials, integrates with MCP, and can be readily deployed on Cloudflare Workers.
We're on the right track, but questions remain. As we touched on before, a notable limitation of both ARC and ACT is that they are only privately verifiable. This means that the issuer and origin need to share a private key, for issuing and verifying the credential respectively. There are likely to be deployment scenarios for which this isn't possible. Fortunately, there may be a path forward for these cases using pairing-based cryptography, as in the BBS signature specification making its way through the IETF. We’re also exploring post-quantum implications in a concurrent post.
If you are an agent platform, an agent developer, or a browser vendor, all our code is available on GitHub for you to experiment with. Cloudflare is actively working on vetting this approach for real-world use cases.
The specification and discussion are happening within the IETF and W3C. This ensures the protocols are built in the open and receive participation from experts. Improvements are still needed to settle on the right performance-to-privacy tradeoff, and to work out the story for deployment on the open web.
If you’d like to help us, we’re hiring 1,111 interns over the course of next year, and have open positions.