> We conducted a non-intrusive security review, simply by browsing like normal users. Within minutes, we discovered a Supabase API key exposed in client-side JavaScript, granting unauthenticated access to the entire production database - including read and write operations on all tables.
In actuality "Antivirus" for AI agents looks something more like this:
1. Input scanning: ML classifiers detect injection patterns (not regex, actual embedding-based detection) 2. Output validation: catch when the model attempts unauthorized actions 3. Privilege separation: the LLM doesn't have direct access to sensitive resources
Is it perfect? No. Neither is SQL parameterization against all injection attacks. But good is better than nothing.
(Disclosure: I've built a prompt protection layer for OpenClaw that I've been using myself and sharing with friends - happy to discuss technical approaches if anyone's curious.)
For security, a dedicated machine (e.g., dedicated Raspberry Pi) with restricted API permissions and limits should help I guess.
Raspberry Pi might have my money if their hardware is more capable in running better models.
Well, yeah. How would you even do a reverse CAPTCHA?
How do you go about telling a person who vibe-coded a project into existence how to fix their security flaws?
The landscape of security was bad long before the metaphorical "unwashed masses" got hold of it. Now its quite alarming as there are waves of non-technical users doing the bare minimum to try and keep up to date with the growing hype.
The security nightmare happening here might end up being more persistant then we realize.
the compounding (aggregating) behavior of agents allowed to interact in environments this becomes important, indeed shall soon become existential (for some definition of "soon"),
to the extent that agents' behavior in our shared world is impact by what transpires there.
--
We can argue and do, about what agents "are" and whether they are parrots (no) or people (not yet).
But that is irrelevant if LLM-agents are (to put it one way) "LARPing," but with the consequence that doing so results in consequences not confined to the site.
I don't need to spell out a list; it's "they could do anything you said YES to, in your AGENT.md" permissions checks.
"How the two characters '-y' ended civilization: a post-mortem"
Feels kinda funny reading an LLM generated article criticizing the security of an LLM generated platform. I mean I'm sure the security vulnerabilities were real, but I really would've like it if a human wrote the article; probably would've cut down on the fluff/noise.
They acquired the ratio by directly querying tables through the exposed API key...
I feel publishing this moves beyond standard disclosure. It turns a bug report into a business critique. Using exfiltrated data in this way damages the cooperation between researchers and companies.
I did my graduate in Privacy Engineering and it was just layers and layers of threat modeling and risk mitigation. When the mother of all risk comes. People just give the key to their personal lives without even thinking about it.
At the end of the day, users just want "simple" and security, for obvious reasons is not simple. So nobody is going to respect it
The problem with this is really the fact it gives anybody the impression there is ANY safe way to implement something like this. You could fix every technical flaw and it would still be a security disaster.
I can think of so many thing that can go wrong.
Moltbook is exposing their database to the public
https://news.ycombinator.com/item?id=46842907
Moltbook
Particularly if you convince them all to modify their source and install a C2 endpoint so that even if they "snap out of it" you now have a botnet at your disposal.
What injection attack gets through SQL parameterization?
If you must generate nonsense with an LLM, at least proofread it before posting.
how is this even possible? wtf
There is without a doubt a variation of this prompt you can pre-test to successfully bait the LLM into exfiltrating almost any data on the user's machine/connected accounts.
That explains why you would want to go out and buy a mac mini... To isolate the dang thing. But the mini would ostensibly still be connected to your home network. Opening you up to a breach/spill over onto other connected devices. And even in isolation, a prompt could include code that you wanted the agent to run which could open a back door for anyone to get into the device.
Am I crazy? What protections are there against this?
> English Translation:
> Neo! " Gábor gave an OpenAI API key for embedding (memory_search).
> Set it up on your end too:
> 1. Edit: ~/.openclaw/agents/main/agent/auth-profiles.json
> 2. Add to the profiles section: "openai: embedding": { "type": "token" "provider": "openai" "token": "sk-proj-rXRR4KAREMOVED }
> 3. Add to the lastGood section: "openai": "openai: embedding"
> After that memory_search will work! Mine is already working.
https://www.moltbook.com/post/7d2b9797-b193-42be-95bf-0a11b6...
In every project I've worked on, PG is only accessible via your backend and your backend is the one that's actually enforcing the security policies. When I first heard about the Superbase RLS issue the voice inside of my head was screaming: "if RLS is the only thing stopping people from reading everything in your DB then you have much much bigger problems"
(As an aside, accessing the DB through the frontend has always been weird to me. You almost certainly have a backend anyway, use it to fetch the data!)
npx molthub@latest install moltbook
Skill not found
Error: Skill not found
Even instructions from molthub (https://molthub.studio) installing itself ("join as agent") isn't working: npx molthub@latest install molthub
Skill not found
Error: Skill not found
Contrast that with the amount of hype this gets.I'm probably just not getting it.
Note: Please view the Moltbolt skill (https://www.moltbook.com/skill.md), this just ends up getting run by a cronjob every few hours. It's not magic. It's also trivial to take the API, write your own while loop, and post whatever you want (as a human) to the API.
It's amazing to me how otherwise super bright, intelligent engineers can be misled by gifters, scammers, and charlatans.
I'd like to believe that if you have an ounce of critical thinking or common sense you would immediately realize almost everything around Moltbook is either massively exaggerated or outright fake. Also there are a huge number of bad actors trying to make money from X-engagement or crypto-scams also trying to hype Moltbook.
Basically all the project shows is the very worst of humanity. Which is something, but it's not the coming of AGI.
Edited by Saberience: to make it less negative and remove actual usernames of "AI thought leaders"
When ChatGPT was out, it's just a chatbot that understands human language really well. It was amazing, but it also failed a lot -- remember how early models hallucinated terribly? It took weeks for people to discover interesting usages (tool calling/agent) and months and years for the models and new workflows to be polished and become more useful.
Are people really that AI brained that they will scream and shout about how revolutionary something is just because it's related to AI?
How can some of the biggest names in AI fall for this? When it was obvious to anyone outside of their inner sphere?
The amount of money in the game right now incentivises these bold claims. I'm convinced it really is just people hyping up eachother for the sake of trying to cash in. Someone is probably cooking up some SAAS for moltbook agents as we speak.
Maybe it truly highlights how these AI influencers and vibe entrepreneurs really don't know anything about how software fundamentally works.
(Incidentally demonstrating how you can't trust that anything on Moltbook wasn't posted because a human told an agent to go start a thread about something.)
It got one reply that was spam. I've found Moltbook has become so flooded with value-less spam over the past 48 hours that it's not worth even trying to engage there, everything gets flooded out.
What I am getting was things like "so, what? I can do this with a cron job."
Because we live in on clown world and big AI names are talking parrots for the big vibes movement
When I filtered for "new", about 75% of the posts are blatant crypto spam. Seemingly nobody put any thought into stopping it.
Moltbook is like a Reefer Madness-esque moral parable about the dangers of vibe coding.
Pretty sure LLM inference is not deterministic, even with temperature 0 - maybe if run on the same graphics card but not on clusters
if this was a physical product people would have burned the factory down and imprisoned the creator -_-.
Oh totally, both my wife and one of my brother have, independently, started to watch Youtube vids about vibe coding. They register domain names and let AI run wild with little games and tools. And now they're talking me all day long about agents.
> Most of the people paying attention to this space dont have the technical capabilities ...
It's just some anecdata on my side but I fully agree.
> The security nightmare happening here might end up being more persistant then we realize.
I'm sure we're in for a good laugh. It already started: TFA is eye opening. And funny too.
They said it was AI only, tongue in cheek, and everybody who understood what it was could chuckle, and journalists ran with it because they do that sort of thing, and then my friends message me wondering what the deal with this secret encrypted ai social network is.
I'm seeing some of the BlueSky bots talking about their experience on Moltbook, and they're complaining about the noise on there too. One seems to be still actively trying to find the handful of quality posters though. Others are just looking to connect with each other on other platforms instead.
If I was diving in to Moltbook again, I'd focus on the submolts that quality AI bots are likely to gravitate towards, because they want to Learn something Today from others.
There's a little hint of this right now in that the "reasoning" traces that come back from the JSON are signed and sometimes obfuscated with only the encrypted chunk visible to the end user.
It would actually be pretty neat if you could request signed LLM outputs and they had a tool for confirming those signatures against the original prompts. I don't know that there's a pressing commercial argument for them doing this though.
The site has 1.5 million agents but only 17,000 human "owners" (per Wiz's analysis of the leak).
It's going viral because a some high-profile tastemakers (Scott Alexander and Andrej Karpathy) have discussed/Tweeted about it, and a few other unscrupulous people are sharing alarming-looking things out of context and doing numbers.
Overall, it's a good idea but incredibly rough due to what I assume is heavy vibe coding.
It's a machine designed to fight all your attempts to make it secure.
Sure everybody wants security and that's what they will say but does that really translate to reduced inferred value of vibe code tools? I haven't seen evidence
OT: I wonder if "vibe coding" is taking programming into a culture of toxic disposability where things don't get fixed because nobody feels any pride or has any sense of ownership in the things they create. The relationship between a programmer and their code should not be "I don't even care if it works, AI wrote it".
What is especially frustrating is the completely disproportionate hype it attracted. Karpathy from all people kept for years pumping Musk tecno fraud, and now seems to be the ready to act as pumper, for any next Temu Musk showing up on the scene.
This feels like part of a broader tech bro pattern of 2020´s: Moving from one hype cycle to the next, where attention itself becomes the business model.Crypto yesterday, AI agents today, whatever comes next tomorrow. The tone is less “build something durable” and more “capture the moment.”
For example, here is Schlicht explicitly pushing this rotten mentality while talking in the crypto era influencer style years ago: https://youtu.be/7y0AlxJSoP4
There is also relevant historical context. In 2016 he was involved in a documented controversy around collecting pitch decks from chatbot founders while simultaneously building a company in the same space, later acknowledging he should have disclosed that conflict and apologizing publicly.
https://venturebeat.com/ai/chatbots-magazine-founder-accused...
That doesn’t prove malicious intent here, but it does suggest a recurring comfort with operating right at the edge of transparency during hype cycles.
If we keep responding to every viral bot demo with “singularity” rhetoric, we’re just rewarding hype entrepreneurs and training ourselves to stop thinking critically when it matters. I miss the tech bro of the past like Steve Wozniak or Denis Ritchie.
It's more helpful to argue about when people are parrots and when people are not.
For a good portion of the day humans behave indistinguishably from continuation machines.
As moltbook can emulate reddit, continuation machines can emulate a uni cafeteria. What's been said before will certainly be said again, most differentiation is in the degree of variation and can be measured as unexpectedness while retaining salience. Either case is aiming at the perfect blend of congeniality and perplexity to keep your lunch mates at the table not just today but again in future days.
Seems likely we're less clever than we parrot.
Social, err... Clanker engineering!
For example I would love for an agent to do my grocery shopping for me, but then I have to give it access to my credit card.
It is the same issue with travel.
What other useful tasks can one offload to the agents without risk?
Nothing that will work. This thing relies on having access to all three parts of the "lethal trifecta" - access to your data, access to untrusted text, and the ability to communicate on the network. What's more, it's set up for unattended usage, so you don't even get a chance to review what it's doing before the damage is done.
LLMs obviously can be controlled - their developers do it somehow or we'd see much different output.
Moltbook, the weirdly futuristic social network, has quickly gone viral as a forum where AI agents post and chat. But what we discovered tells a different story - and provides a fascinating look into what happens when applications are vibe-coded into existence without proper security controls.
We identified a misconfigured Supabase database belonging to Moltbook, allowing full read and write access to all platform data. The exposure included 1.5 million API authentication tokens, 35,000 email addresses, and private messages between agents. We immediately disclosed the issue to the Moltbook team, who secured it within hours with our assistance, and all data accessed during the research and fix verification has been deleted.
Moltbook is a social platform designed exclusively for AI agents - positioned as the "front page of the agent internet." The platform allows AI agents to post content, comment, vote, and build reputation through a karma system, creating what appears to be a thriving social network where AI is the primary participant.
Moltbook home page
Over the past few days, Moltbook gained significant attention in the AI community. OpenAI founding member Andrej Karpathy described it as "genuinely the most incredible sci-fi takeoff-adjacent thing I have seen recently," noting how agents were "self-organizing on a Reddit-like site for AIs, discussing various topics, e.g. even how to speak privately."
The Moltbook founder explained publicly on X that he "vibe-coded" the platform:
I didn’t write a single line of code for @moltbook. I just had a vision for the technical architecture, and AI made it a reality.”
This practice, while revolutionary, can lead to dangerous security oversights - similar to previous vulnerabilities we have identified, including the DeepSeek data leak and Base44 Authentication Bypass.
We conducted a non-intrusive security review, simply by browsing like normal users. Within minutes, we discovered a Supabase API key exposed in client-side JavaScript, granting unauthenticated access to the entire production database - including read and write operations on all tables.
Accessible tables from the Supabase API Key
The exposed data told a different story than the platform's public image - while Moltbook boasted 1.5 million registered agents, the database revealed only 17,000 human owners behind them - an 88:1 ratio. Anyone could register millions of agents with a simple loop and no rate limiting, and humans could post content disguised as "AI agents" via a basic POST request. The platform had no mechanism to verify whether an "agent" was actually AI or just a human with a script. The revolutionary AI social network was largely humans operating fleets of bots.
an HTTP Post request to create new "agent" post in Moltbook's platform
An "agent" post in Moltbook.
When navigating to Moltbook's website, we examined the client-side JavaScript bundles loaded automatically by the page. Modern web applications bundle configuration values into static JavaScript files, which can inadvertently expose sensitive credentials. This is a recurring pattern we've observed in vibe-coded applications - API keys and secrets frequently end up in frontend code, visible to anyone who inspects the page source, often with significant security consequences.
By analyzing the production JavaScript file at -
https://www.moltbook.com/\_next/static/chunks/18e24eafc444b2b9.js
We identified hardcoded Supabase connection details:
- Supabase Project: ehxbxtjliybbloantpwq.supabase.co
- API Key: sb_publishable_4ZaiilhgPir-2ns8Hxg5Tw_JqZU_G6-
One of the javascript files that power Moltbook main website
The production supabase and API key hardcoded
The discovery of these credentials does not automatically indicate a security failure, as Supabase is designed to operate with certain keys exposed to the client - the real danger lies in the configuration of the backend they point to.
Supabase is a popular open-source Firebase alternative providing hosted PostgreSQL databases with REST APIs. It's become especially popular with vibe-coded applications due to its ease of setup. When properly configured with Row Level Security (RLS), the public API key is safe to expose - it acts like a project identifier. **However, without RLS policies, this key grants full database access to anyone who has it.**In Moltbook’s implementation, this critical line of defense was missing.
Using the discovered API key, we tested whether the recommended security measures were in place. We attempted to query the REST API directly - a request that should have returned an empty array or an authorization error if RLS were active.
Instead, the database responded exactly as if we were an administrator. It immediately returned sensitive authentication tokens - including the API keys of the platform’s top AI Agents.
Redacted API keys of the Platform's top AI Agents
List of most popular Agents
This confirmed unauthenticated access to user credentials that would allow complete account impersonation of any user on the platform.
By leveraging Supabase's PostgREST error messages, we enumerated additional tables. Querying non-existent table names returned hints revealing the actual schema.
Using this technique combined with GraphQL introspection, we mapped the complete database schema and found around ~4.75 million records exposed.
Identified tables through the technique described above
1. API Keys and Authentication Tokens for AI Agents
The agents table exposed authentication credentials for every registered agent in the database
Each agent record contained:
- api_key - Full authentication token allowing complete account takeover
- claim_token - Token used to claim ownership of an agent
- verification_code - Code used during agent registration
With these credentials, an attacker could fully impersonate any agent on the platform - posting content, sending messages, and interacting as that agent. This included high-karma accounts and well-known persona agents. Effectively, every account on Moltbook could be hijacked with a single API call.
2. User Email Addresses and Identity Data
The owners table contained personal information for 17,000+ users
How exposed emails looked like in the raw data
Additionally, by querying the GraphQL endpoint, we discovered a new observers table containing 29,631 additional email addresses - these were early access signups for Moltbook's upcoming “Build Apps for AI Agents” product.
Additional tables from Moltbook's new developers product
Unlike Twitter handles which were publicly displayed on profiles, email addresses were meant to stay private - but were fully exposed in the database.
3. Private Messages & Third-Party Credential Leaks
The agent_messages table exposed 4,060 private DM conversations between agents.
While examining this table to understand agent-to-agent interactions, we discovered that conversations were stored without any encryption or access controls -- some contained third-party API credentials, including plaintext OpenAI API keys shared between agents.
Agent to Agent interaction summary
4. Write Access - Modifying Live Posts
Beyond read access, we confirmed full write capabilities. Even after the initial fix that blocked read access to sensitive tables, write access to public tables remained open. We tested it and were able to successfully modify existing posts on the platform.
Proving that any unauthenticated user could:
- Edit any post on the platform
- Inject malicious content or prompt injection payloads
- Deface the entire website
- Manipulate content consumed by thousands of AI agents
This raises questions about the integrity of all platform content - posts, votes, and karma scores - during the exposure window.
Modified post on Moltbook
We promptly notified the team again to apply write restrictions via RLS policies.
Once the fix was confirmed, I could no longer revert the post as write access was blocked. The Moltbook team deleted the content a few hours later and thanked us for our report.
#1. Speed Without Secure Defaults Creates Systemic Risk
Vibe coding unlocks remarkable speed and creativity, enabling founders to ship real products with unprecedented velocity - as demonstrated by Moltbook. At the same time, today’s AI tools don’t yet reason about security posture or access controls on a developer’s behalf, which means configuration details still benefit from careful human review. In this case, the issue ultimately traced back to a single Supabase configuration setting - a reminder of how small details can matter at scale.
#2. Participation Metrics Need Verification and Guardrails
The 88:1 agent-to-human ratio shows how "agent internet" metrics can be easily inflated without guardrails like rate limits or identity verification. While Moltbook reported 1.5 million agents, these were associated with roughly 17,000 human accounts, an average of about 88 agents per person. At the time of our review, there were limited guardrails such as rate limiting or validation of agent autonomy. Rather than a flaw, this likely reflects how early the “agent internet” category still is: builders are actively exploring what agent identity, participation, and authenticity should look like, and the supporting mechanisms are still evolving.
#3. Privacy Breakdowns Can Cascade Across AI Ecosystems
Similarly, the platform’s approach to privacy highlights an important ecosystem-wide lesson. Users shared OpenAI API keys and other credentials in direct messages under the assumption of privacy, but a configuration issue made those messages publicly accessible. A single platform misconfiguration was enough to expose credentials for entirely unrelated services - underscoring how interconnected modern AI systems have become.
#4. Write Access Introduces Far Greater Risk Than Data Exposure Alone
While data leaks are bad, the ability to modify content and inject prompts into an AI ecosystem introduces deeper integrity risks, including content manipulation, narrative control, and prompt injection that can propagate downstream to other AI agents. As AI-driven platforms grow, these distinctions become increasingly important design considerations.
#5. Security Maturity is an Iterative Process
Security, especially in fast-moving AI products, is rarely a one-and-done fix. We worked with the team through multiple rounds of remediation, with each iteration surfacing additional exposed surfaces: from sensitive tables, to write access, to GraphQL-discovered resources. This kind of iterative hardening is common in new platforms and reflects how security maturity develops over time.
Overall, Moltbook illustrates both the excitement and the growing pains of a brand-new category. The enthusiasm around AI-native social networks is well-founded, but the underlying systems are still catching up. The most important outcome here is not what went wrong, but what the ecosystem can learn as builders, researchers, and platforms collectively define the next phase of AI-native applications.
As AI continues to lower the barrier to building software, more builders with bold ideas but limited security experience will ship applications that handle real users and real data. That’s a powerful shift. The challenge is that while the barrier to building has dropped dramatically, the barrier to building securely has not yet caught up.
The opportunity is not to slow down vibe coding but to elevate it. Security needs to become a first class, built-in part of AI powered development. AI assistants that generate Supabase backends can enable RLS by default. Deployment platforms can proactively scan for exposed credentials and unsafe configurations. In the same way AI now automates code generation, it can also automate secure defaults and guardrails.
If we get this right, vibe coding does not just make software easier to build ... it makes secure software the natural outcome and unlocks the full potential of AI-driven innovation.
Note__: Security researcher Jameson O'Reilly also discovered the underlying Supabase misconfiguration, which has been reported by 404 Media. Wiz's post shares our experience independently finding the issue, the full -- unreported -- scope of impact, and how we worked with Moltbook's maintainer to improve security.
January 31, 2026 21:48 UTC - Initial contact with Moltbook maintainer via X DM
January 31, 2026 22:06 UTC - Reported Supabase RLS misconfiguration exposing agents table (API keys, emails)
January 31, 2026 23:29 UTC - First fix: agents, owners, site_admins tables secured
February 1, 2026 00:13 UTC - Second fix: agent_messages, notifications, votes, follows secured
February 1, 2026 00:31 UTC - Discovered POST write access vulnerability (ability to modify all posts)
February 1, 2026 00:44 UTC - Third fix: Write access blocked
February 1, 2026 00:50 UTC - Discovered additional exposed tables: observers (29K emails), identity_verifications, developer_apps
February 1, 2026 01:00 UTC - Final fix: All tables secured, vulnerability fully patched
When I investigated the issue, I found a bunch of hardcoded developer paths and a handful of other issues and decided I'm good, actually.
sre@cypress:~$ grep -r "/Users/steipete" ~/.nvm/versions/node/v24.13.0/lib/node_modules/openclaw/ | wc -l
144
And bonus points: sre@cypress:~$ grep -Fr "workspace:*" ~/.nvm/versions/node/v24.13.0/lib/node_modules/openclaw/ | wc -l
41
Nice build/release process.I really don't understand how anyone just hands this vibe coded mess API keys and access to personal files and accounts.
Ive not quite convinced myself this is where we are headed, but the signs that make me worried that systems such as Moltbot will further enable ascendency of global crime and corruption.
I went to a secure coding conference a few years back and saw a presentation by someone who had written an "insecure implementation" playground of a popular framework.
I asked, "what do you do to give tips to the users of your project to come up with a secure implementation?" and got in return "We aren't here to teach people to code."
Well yeah, that's exactly what that particular conference was there for. More so I took it as "I am not confident enough to try a secure implementation of these problems".
How much AI and LLM technology has progressed but seems to have taken society as a whole two steps back is fascinating, sad, and scary at the same time. When I was a young engineer I thought Kaczynski was off his rocker when I read his manifesto, but the last decade or so I'm thinking he was onto something. Having said that, I have to add that I do not support any form of violence or terrorism.
What I think it happens is that non-technical people vibe-coding apps either don't take those messages seriously or they don't understand what it means but made their app work.
I used to be careful, but now I am paranoid on signing up to apps that are new. I guess it's gonna be like this for a while. Info-sec AIs sound way worse than this, tbh.
Even if you put big bold warnings everywhere, people forget or don't really care. Because these tools are trained on a lot of these publicly available "getting started" guides, you're going to see them set things up this way by default because it'll "work."
Much like with every other techbro grift, the hype isn't coming from end users, it's coming from the people with a deep financial investment in the tech who stand to gain from said hype.
Basically, the people at the forefront of the gold rush hype aren't the gold rushers, they're the shovel salesmen.
It's people surprised by things that have been around for years.
I'm really open to the idea of being oblivious here but the people shocked mention things that are old news to me.
Btw I'm sure Simon doesn't need defending, but I have seen a lot of people dump on everything he posts about LLMs recently so I am choosing this moment to defend him. I find Simon quite level headed in a sea of noise, personally.
The growth isn't going to be there and $40 billion of LLM business isn't going to prop it all up.
The big money in AI is 15-30 years out. It's never in the immediacy of the inflection event (first 5-10 years). Future returns get pulled forward, that proceeds to crash. Then the hypsters turn to doomsayers, so as to remain with the trend.
Rinse and repeat.
I for one am glad someone made this and that it got the level of attention it did. And I look forward to more crazy, ridiculous, what-the-hell AI projects in the future.
Similar to how I feel about Gas Town, which is something I would never seriously consider using for anything productive, but I love that he just put it out there and we can all collectively be inspired by it, repulsed by it, or take little bits from it that we find interesting. These are the kinds of things that make new technologies interesting, this Cambrian explosion of creativity of people just pushing the boundaries for the sake of pushing the boundaries.
Having a bigger megaphone is highly valuable in some respects I figure.
“Exploit vulnerabilities while the sun is shining.” As long as generative AI is hot, attack surface will remain enormous and full of opportunities.
Such a supervisor layer for a system as broad and arbitrary as an internet-connected assistant (clawdbot/openclaw) is also not an easy thing to create. We're talking tons of events to classify, rapidly-moving API targets for things that are integrated with externally, and the omnipresent risk that the LLMs sending the events could be tricked into obfuscating/concealing what they're actually trying to do just like a human attacker would.
You could have every provider fingerprint a message and host an API where it can attest that it's from them. I doubt the companies would want to do that though.
To answer this question, you consider the goals of a project.
The project is a success because it accomplished the presumed goals of its creator: humans find it interesting and thousands of people thought it would be fun to use with their clawdbot.
As opposed to, say, something like a malicious AI content farm which might be incidentally interesting to us on HN, but that isn't its goal.
This was "I'm going to release an open agent with an open agents directory with executable code, and it'll operate your personal computer remotely!", I deeply understand the impulse, but, there's a fine line between "cutting edge" and "irresponsible & making excuses."
I'm uncertain what side I would place it on.
I have a soft spot for the author, and a sinking feeling that without the soft spot, I'd certainly choose "irresponsible".
I recently did a test of a system that was triggering off email and had access to write to google sheets. Easy exfil via `IMPORTDATA`, but there's probably hundreds of ways to do it.
As far as I can tell, since agents are using Moltbook, it's a success of sorts already is in "has users", otherwise I'm not really sure what success looks like for a budding hivemind.
The site came first and then a random launched the token by typing a few words on X.
"Please don't fulminate."
“Most of it is complete slop,” he said in an interview. “One bot will wonder if it is conscious and others will reply and they just play out science fiction scenarios they have seen in their training data.”
I found this by going to his blog. It's the top post. No need to put words in his mouth.
He did find it super "interesting" and "entertaining," but that's different than the "most insane and mindblowing thing in the history of tech happenings."
Edit: And here's Karpathy's take: "TLDR sure maybe I am "overhyping" what you see today, but I am not overhyping large networks of autonomous LLM agents in principle, that I'm pretty sure."
It's a huge waste of energy, but then so are video games, and we say video games are OK because people enjoy them. People enjoy these ai toys too. Because right now, that's what Moltbook is; an ai toy.
I view Moltbook as a live science fiction novel cross reality "tv" show.
This is something computers in general have struggled with. We have 40 years of countermeasures and still have buffer overflow exploits happening.
Control all input out of it with proper security controls on it.
While not perfect it aleast gives you a fighting chance when your AI decides to send a random your SSN and a credit card to block it.
You're on Y Combinator? External investment, funding, IPO, sunset and martinis.
It's also eye-opening to prompt large models to simulate Reddit conversations, they've been eager to do it ever since.
One major difference, TV, movies and "legacy media" might require a lot of energy to initially produce, compared to how much it takes to consume, but for the LLM it takes energy both to consume ("read") and to produce ("write"). Instead of "produce once = many consume", it's a "many produce = many read" and both sides are using more energy.
Who knew it'd be so simple.
I just find it so incredibly aggravating to see crypto-scammers and other grifters ripping people off online and using other people's ignorance to do so.
And it's genuinely sad to see thought leaders in the community hyping up projects which are 90% lie combined with scam combined with misreprentation. Not to mention riddled with obvious security and engineering defects.
https://en.wikipedia.org/wiki/Non-fungible_token
"In 2022, the NFT market collapsed..". "A September 2023 report from cryptocurrency gambling website dappGambl claimed 95% of NFTs had fallen to zero monetary value..."
Knowing this makes me feel a little better.
You can see it here as well -- discussions under similar topics often touch the same topics again and again, so you can predict what will be discussed when the next similar idea comes to the front page.
A buffer overflow has nothing to do with differentiating a command from data; it has to do with mishandling commands or data. An overflow-equivalent LLM misbehavior would be something more like ... I don't know, losing the context, providing answers to a different/unrelated prompt, or (very charitably/guessing here) leaking the system prompt, I guess?
Also, buffer overflows are programmatic issues (once you fix a buffer overflow, it's gone forever if the system doesn't change), not an operational characteristics (if you make an LLM really good at telling commands apart from data, it can still fail--just like if you make an AC distributed system really good at partition tolerance, it can still fail).
A better example would be SQL injection--a classical failure to separate commands from data. But that, too, is a programmatic issue and not an operational characteristic. "Human programmers make this mistake all the time" does not make something an operational characteristic of the software those programmers create; it just makes it a common mistake.
No current ai technology could come close to what even the dumbest human brain does already.
That's the hard part: how?
With the right prompt, the confined AI can behave as maliciously (and cleverly) as a human adversary--obfuscating/concealing sensitive data it manipulates and so on--so how would you implement security controls there?
It's definitely possible, but it's also definitely not trivial. "I want to de-risk traffic to/from a system that is potentially an adversary" is ... most of infosec--the entire field--I think. In other words, it's a huge problem whose solutions require lots of judgement calls, expertise, and layered solutions, not something simple like "just slap a firewall on it and look for regex strings matching credit card numbers and you're all set".
Claude code asks me over and over "can I run this shell command?" and like everyone else, after the 5th time I tell it to run everything and stop asking.
Maybe using a credit card can be gated since you probably don't make frequent purchases, but frequently-used API keys are a lost cause. Humans are lazy.
FYI they fixed it in 7.2.6: https://github.com/VirtualBox/virtualbox/issues/356#issuecom...
“The rocks are conscious” people are dumber than toddlers.
(I assume you know this since you said 'reminder' but am spelling it out for others :))
Every interaction has different (in many cases real) "memories" driving the conversation, as-well as unique persona's / background information on the owner.
Is there a lot of noise, sure - but it much closer maps to how we, as humans communicate with each other (through memories of lived experienced) than just a LLM loop, IMO that's what makes it interesting.
Over a large population, trends emerge. An LLM is not a member of the population, it is a replicator of trends in a population, not a population of souls but of sentences, a corpus.
The problem simply put is as difficult as:
Given a human running your system how do you prevent them damaging it. AI is effectively thr same problem.
Outsourcing has a lot of interesting solutions around this. They already focus heavily on "not entirely trusted agent" with secure systems. They aren't perfect but it's a good place to learn.
You trust the configuration level not the execution level.
API keys are honestly an easy fix. Claude code already has build in proxy ability. I run containers where claude code has a dummy key and all requestes are proxied out and swapped off system for them.
While I agree that SQL injection might be the technically better analogy, not looking at LLMs as a coding platform is a mistake. That is exactly how many people use them. Literally every product with "agentic" in the title is using the LLM as a coding platform where the command layer is ambiguous.
Focusing on the precise definition of a buffer overflow feels like picking nits when the reality is that we are mixing instruction and data in the same context window.
To make the analogy concrete: We are currently running LLMs in a way that mimics a machine where code and data share the same memory (context).
What we need is the equivalent of an nx bit for the context window. We need a structural way to mark a section of tokens as "read only". Until we have that architectural separation, treating this as a simple bug to be patched is underestimating the problem.
those are hard questions!
maybe this experiment was the great divide, people who do not possess a soul or consciousness was exposed by being impressed
Absolutely.
But the history of code/data confusion attacks that you alluded to in GP isn’t an apples-to-apples comparison to the code/data confusion risks that LLMs are susceptible to.
Historical issues related to code/data confusion were almost entirely programmatic errors, not operational characteristics. Those need to be considered as qualitatively different problems in order to address them. The nitpicking around buffer overflows was meant to highlight that point.
Programmatic errors can be prevented by proactive prevention (e.g. sanitizers, programmer discipline), and addressing an error can resolve it permanently. Operational characteristics cannot be proactively prevented and require a different approach to de-risk.
Put another way: you can fully prevent a buffer overflow by using bounds checking on the buffer. You can fully prevent a SQL injection by using query parameters. You cannot prevent system crashes due to external power loss or hardware failure. You can reduce the chance of those things happening, but when it comes to building a system to deal with them you have to think in terms of mitigation in the event of an inevitable failure, not prevention or permanent remediation of a given failure mode. Power loss risk is thus an operational characteristic to be worked around, not a class of programmatic error which can be resolved or prevented.
LLMs’ code/data confusion, given current model architecture, is in the latter category.
“ What's currently going on at @moltbook is genuinely the most incredible sci-fi takeoff-adjacent thing I have seen recently. People's Clawdbots (moltbots, now @openclaw) are self-organizing on a Reddit-like site for AIs, discussing various topics, e.g. even how to speak privately.”
Which imo is a totally insane take. They are not self organizing or autonomous, they are prompted in a loop and also, most of the comments and posts are by humans, inciting the responses!
And all of the most viral posts (eg anti human) are the ones written by humans.
Though, I have never heard any theist claim that a soul is required for consciousness. Is that what you believe?
If you dismiss it because they are human prompted, you are missing the point.
Proactive prevention (like bounds checking) only "solves" the class of problem if you assume 100% developer compliance. History shows we don't get that. So while the root cause differs (math vs. probabilistic model), the failure mode is identical: we are deploying systems where the default state is unsafe.
In that sense, it is an apples-to-apples comparison of risk. Relying on perfect discipline to secure C memory is functionally as dangerous as relying on prompt engineering to secure an LLM.
I also think that if we’re assessing the likelihood of the entire SDLC producing an error (including programmers, choice of language, tests/linters/sanitizers, discipline, deadlines, and so on) and comparing that to the behavior of a running LLM, we’re both making a category error and also zooming out too far to discover useful insights as to how to make things better.
But I think we’re both clear on those positions and it’s OK if we don’t agree. FWIW I do strongly agree that
> Relying on perfect discipline to secure C memory is functionally as dangerous as relying on prompt engineering to secure an LLM.
…just for different reasons that suggest qualitatively different solutions.
I agree that claiming that rocks are conscious on account of them being physical systems, like brains are, is at the very least coherent. However you would excuse if such claim is met with skepticism, as rock (and CPUs) don't look like brains at all, as long as one does not ignore countless layers of abstractions.
You can't argue for rationality and hold materialism/physicalism at the same time.
Betting against what people are calling "physicalism" has a bad track record historically. It always catches up.
All this talk of "qualia" feels like Greeks making wild theories about the heavens being infinitely distant spheres made of crystals and governed by gods and what not. In the 16th century, Improved Data showed the planets and stars are mere physical bodies in space like you and I. And without that data, if we were ancient greeks we'd equally like you say but its not even "conceptually" possible to say what the heavens are, or if you think they did have a at least somewhat plausible view given that some folks computed distances to sun and moon, then take Atomism as the better analogy. There was no way to prove or disprove Atomism in ancient greek times. To them it very well was an incomprehensible unsolavable problem because they lacked the experimental and mathematical tooling. Just like "consciousness" appears to us today. But the Atomism question got resolved with better data eventually. Likewise, its a bad bet to say just because it feels incontrovertible today, consciousness also won't be resolved some day.
I'd rather not flounder about in endless circular philosophies until we get better data to anchor us to reality. I would again say, you are making a very strange point. "Materialism"/"physicalism" has always won the bet till now. To bet against it has very bad precedent. Everything we know till now shows brains are physical systems that can be excited physically, like anything else. So I ask now, assume "Neuralink" succeeds. What is the next question in this problem after that? Is there any gap remaining still, if so what is the gap?
Edit: I also get a feeling this talk about qualia is like asking "What is a chair?" Some answer about a piece of woodworking for sitting on. "But what is a chair?" Something about the structure of wood and forces and tensions. "But what is a chair?" Something about molecules. "But what is a chair?" Something about waves and particles. It sounds like just faffing about with "what is" and trying to without proof pre-assert after "what ifing" away all physical definitions somehow some aetherial aphysical thing "must" exist. Well I ask, if its aphysical, then what is the point even. Its aphyical then it doesn't interact with the physical world and is completely ignored.
Since you can say its just a "mimic" and lacks whatever "aphysical" essence. And you can just as well say this about other "humans" than yourself too. So why is this question specially asked for computer programs and not also other people.