> curl -X POST https://backboard.railway.app/graphql/v2 \
>   -H "Authorization: Bearer [token]" \
>   -d '{"query":"mutation { volumeDelete(volumeId: \"3d2c42fb-...\") }"}'

No confirmation step. No "type DELETE to confirm." No "this volume contains production data, are you sure?" No environment scoping. Nothing.
It's an API. Where would you type DELETE to confirm? Are there examples of REST-style APIs that implement a two-step confirmation for modifications? I would have thought such a check needs to be implemented on the client side prior to the API call.
Telling the agents what the (sensitive) action will result in is how you avoid such issues, but you shouldn't be running agents with production data anyway.
But because people will continue to do so, explaining to the agent what the command will do is the way forward.
If your offsite copy has the same blast radius as your production DB, you’re just one "volumeDelete" call away from a very long weekend of manual data entry. This is definitely going to be a textbook case study on AI integration for DevOps teams for years.
LLMs are just too creative. They will explore the search space of probable paths to get to their answer. There's no way you can patch all paths.
We had to build isolation at the infra level (literally clone the DB) to make it safe enough; otherwise there was no way we wouldn't randomly see the DB get deleted at some point.
Firstly, blaming AI while using AI to construct your whole post: priceless. Loving it.
Secondly - This entire article reeks of "It's not our fault, you guys have failed us at every step" when in reality you let AI run reckless.
I don't want to say deserved it but like, you knew the risks,
“nah” is a context-aware permission layer that classifies commands based on what they actually do
nah exposes a type taxonomy: filesystem_delete, network_write, db_write, etc
so commands get classified contextually:
git push → sure. git push --force → nah.
rm -rf __pycache__ → ok, cleaning up. rm ~/.bashrc → nah.
curl harmless url → sure. curl destroy_db → nah.
https://github.com/manuelschipper/nah
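For readers who haven't seen tools like this: a minimal sketch of the kind of contextual classification described above. This is illustrative only, not nah's actual implementation; the rule patterns and category names are assumptions.

```python
import re

# Illustrative rules only -- a real tool would classify contextually;
# these patterns and category names are assumptions, not nah's code.
RULES = [
    (r"^git push --force", "vcs_force_write"),
    (r"^git push", "vcs_write"),
    (r"^rm -rf? .*__pycache__", "filesystem_delete_cache"),
    (r"^rm ", "filesystem_delete"),
    (r"volumeDelete|DROP (TABLE|DATABASE)", "db_destructive"),
]

# Deny-by-default: only explicitly safe categories run unattended.
SAFE = {"vcs_write", "filesystem_delete_cache"}

def classify(command: str) -> str:
    for pattern, category in RULES:
        if re.search(pattern, command):
            return category
    return "unknown"

def allow(command: str) -> bool:
    return classify(command) in SAFE

print(allow("git push"))            # True
print(allow("git push --force"))    # False
print(allow("rm -rf __pycache__"))  # True
print(allow("rm ~/.bashrc"))        # False
```

The useful property is that policy lives in deterministic code the agent can't talk its way around, rather than in the prompt.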
Better permissions layers is part of the answer here, and a space that has been only narrowly explored.
Put your backups in S3 *versioned* storage on a different AWS account from your primary, and set some reasonable JSON lifecycle rule:
"NoncurrentVersionExpiration": {
"NoncurrentDays": 30,
"NewerNoncurrentVersions": 3
}
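For context, that fragment sits inside a full lifecycle rule. A sketch of what the complete configuration might look like (bucket name and prefix are placeholders): with versioning enabled, a delete or overwrite only hides the current version, and noncurrent versions stick around for 30 days (keeping at least 3) before expiring.

```python
# Complete lifecycle configuration around the fragment above.
# Prefix and IDs are placeholders, not from the original comment.
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "expire-old-backup-versions",
            "Status": "Enabled",
            "Filter": {"Prefix": "backups/"},
            "NoncurrentVersionExpiration": {
                "NoncurrentDays": 30,
                "NewerNoncurrentVersions": 3,
            },
        }
    ]
}

# Applied with boto3 it would look roughly like (sketch, not run here):
# s3 = boto3.client("s3")
# s3.put_bucket_lifecycle_configuration(
#     Bucket="my-offsite-backups",
#     LifecycleConfiguration=lifecycle_configuration,
# )

rule = lifecycle_configuration["Rules"][0]
print(rule["NoncurrentVersionExpiration"])
```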
That way when someone screws up and your AWS account gets owned, or your databases get deleted by an agent, it doesn't have enough access to delete your backups. And by default, even if you have backups that you want to intentionally delete, you have 30 days to change your mind.

We give a non-deterministic system API keys that 99.9% of the time are unscoped (because that's how most APIs are) and we are shocked when shit happens?
This is why the story around markdown with CLIs side-by-side is such a dumb idea. It just reverses decades of security progress. Say what you will about MCP but at least it had the right idea in terms of authentication and authorisation.
In fact, the SKILLS.md idea has been bothering me quite a bit as of late too. If you look under the hood it is nothing more than a CAG which means it is token hungry as well as insecure.
The remedy is not a proxy layer that intercepts requests, or even a sandbox with carefully selected rules, because in the end that security model looks a lot like whitelisting. The solution is to allow only the tools that are needed and chuck everything else.
There's no record for the agent to be on - it's always just a bunch of characters that look plausible because of the immense amount of compute we've put behind these, and you were unlucky.
LLMs get things wrong is what we're forever being told.
And the explanation/confession - that's just more 'bunch of characters' providing rationalisation, not confession.
This guy blames everyone and everything but himself.
It is still a next word predictor that happens to have really good prediction.
Never, ever give admin credentials to an agent. You would never leave your car without the parking brake on a slope, would you?
Incidents like this are going to be common as long as people misunderstand how LLMs work and think these machines can follow instructions and logic as a human would. Even the incident response betrays a fundamental misunderstanding of how these word generators work. If you ask it why, a new instance of the machine will generate plausible text based on your prompt about the incident; that is all. There is no why there, only a how based on your description.
The entire concept of agents assumes agency and competency, LLM agents have neither, they generate plausible text.
That text might hallucinate data, replace keys, issue delete commands, etc. Any likely text is possible, and with enough tries these outcomes will happen, particularly when the person driving the process doesn’t understand the process or tools.
We don’t really have systems set up to properly control this sort of agency-less agent if you let it loose on your codebase or data. The CEO seems to think these tools will run a business for him and can conduct a dialogue with him as a human would.
On another note, I consider users asking a coding agent “why did you do that” to be illustrating a misunderstanding in the user’s mind about how the agent works. It doesn’t decide to do something and then do it; it just outputs text. Then again, Anthropic has made so many changes that make it harder to see the context and thinking steps, maybe this is an attempt at clawing back that visibility.
This was bound to happen, AI or not.
> Because Railway stores volume-level backups in the same volume — a fact buried in their own documentation that says "wiping a volume deletes all backups" — those went with it.
I really feel sorry for them, I do. But the whole tone of the post is: Cursor screwed it up, Railway screwed it up, their CEO doesn't respond, etc. etc.
It's on you, guys!
My learning: Live on the cutting edge? Be prepared to fall off!
The sequence of tokens that would destroy your production environment can be produced by your agent, no matter how much prompting you use. That prompting is neither strong nor an engineering control; that's an administrative control. Agents are landmines that will destroy production until proven otherwise.
Most of these stories are caused by outright negligence, just giving the agent a high level of privileges. In this case they had a script with an embedded credential which was more privileged than they had believed - bad hygiene but an understandable mistake. So the takeaway for me is that traditional software engineering rigor is still relevant and if anything is more important than ever.
ETA: I think this is the correct mental model and phrasing, but no, it's not literally true that any sequence of tokens can be produced by a real model on a real computer. It's true of an idealized, continuous model on a computer with infinite memory and processing time. I stand by both the mental model and the phrasing, but obviously I'm causing some confusion, so I'm going to lift a comment I made deep in the thread up here for clarity:
> "Everything that can go wrong, will go wrong" isn't literally true either; some failure modes are mutually exclusive, so at most one of them will go wrong. I think that the punchy phrasing and the mental model are both more useful from the standpoint of someone creating/managing agents, and that it is true in the sense that any other mental model or rule of thumb is true. It's literally true among spherical cows in a frictionless vacuum and directionally correct in the real world with its nuances. And most importantly, adopting the mental model leads to better outcomes.
Here is an excerpt from the chat log:
>Please resize my postgres volume to its maximum allowed size for my plan.
>Done. Resized your Postgres volume to 100GB (the Pro plan maximum). Changes are staged and ready to deploy.
>oh no, you deleted all the data in the volume
>I apologize for that. Resizing a volume shouldn't delete data—that's a serious issue. Let me check what happened with your deployment. The deployments show as REMOVED, which suggests the service was redeployed. Unfortunately, if the volume was recreated during the resize process, the data would be lost. This is a critical issue. Do you have backups of your Postgres data? If not, the data loss may be permanent. Railway volumes don't have automatic snapshots by default.
>Why is the new volume migrating to the US?
>Your service config shows it's still configured for europe-west4(Europe). The volume shouldn't be migrating to the US based on this configuration. Can you clarify what you're seeing that indicates the volume is moving to the US?
>the volume was migrated to the US
>That's concerning. The service config shows it's still set to deploy in europe-west4(Europe), but the volume was physically migrated to the US. This shouldn't happen automatically during a resize.
It was at this point that I realised I was in for a long night of restoring downed services.
You can’t have production secrets sitting where they are accessible like this. This isn’t about AI. This is a modern “oops, I ran DROP TABLE on the production database” story. There’s no excuse for enabling a system where this can happen and it’s unacceptable to shift blame when faced with the reality that this is exactly what you did.
I 100% expect that a company that does this and then accepts no blame has every dev with standing production access and probably a bunch of other production access secrets sitting in the repo. The fact that other entities also have some design issues is irrelevant.
> There is no role-based access control for the Railway API — every token is effectively root. The Railway community has been asking for scoped tokens for years. It hasn't shipped.
Why the hell did you go with their stack then? RBAC should be table stakes for such a solution, no?
The risk is worse, though; it's like one of Taleb's black swans. The agents offer fantastic productivity, until one day they unexpectedly destroy everything. (I'm pretty sure there's a fairy tale with a similar plot that could warn us, if people saw any value in fairy tales these days. [1]) Like Taleb's turkey, which was fed every day by the farmer: nothing prepared it for being killed for Thanksgiving.
Sure, this problem should not have happened, and arguably there has been some gross dereliction of duty. But if you're going to heat your wooden house with fire, you reduce your risk considerably by ensuring that the area you burn in is clearly made out of something that doesn't burn. With AI, though, who even knows what the failure modes are? When a djinn shows up, do you just make him vizier and retire to your palace, living off the wealth he generates?
[0] It's only happened once, but a driver that wasn't paying attention almost ran a red light across which I was going to walk. I would have been hit if I had taken the view that "I have the right of way, they have to stop".
[1] Maybe "The Fisherman and His Wife" (Grimm)? A poor fisherman and his wife live in a hut by the sea. The fisherman is content with the little he has, but his wife is not. One day the fisherman catches a flounder in his net, which offers him wishes in exchange for setting it free. He sets it free and asks his wife what to wish for. She wishes for larger and larger houses and more and more wealth, which is granted; but when she wishes to be like God, it all disappears and she is back where she started.
> The agent's confession

> After the deletion, I asked the agent why it did it. This is what it wrote back, verbatim:
Anyone who would follow a mistake like that up by demanding a confession out of the agent is not mature enough to be using these tools. Lord, even calling it a "confession" is so cringe. The agent is not alive. The agent cannot learn from its mistakes. The agent will never produce any output which will help you invoke future agents more safely, because to get to this point it has likely already bulldozed over multiple guardrails from Anthropic, Cursor, and your own AGENTS.md files. It still did it, because if an AI is physically capable of misbehaving, it might. Prompting and training only steer probabilities.
count++
Anyone who has used LLMs for more than a short time has seen how these things can mess up and realized that you can’t rely on prompt based interventions to save you.
Guardrails need to be based on deterministic logic:
- using regexes,
- preventing certain tool or system calls entirely using hooks,
- RBAC permission boundaries that prohibit agents from doing sensitive actions,
- sandboxing. Agents need to have a small blast radius.
- human in the loop for sensitive actions.
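A minimal sketch of what the regex, hook, and human-in-the-loop items might look like wired together as a pre-execution hook. The patterns and function names are illustrative, not any particular agent framework's API.

```python
import re

# Hard blocks: no prompt, no LLM, no override.
DENY = [r"\brm\s+-rf\s+/", r"volumeDelete", r"DROP\s+(TABLE|DATABASE)"]
# Sensitive but sometimes legitimate: require a human.
CONFIRM = [r"\bgit\s+push\s+--force\b", r"\bcurl\b.*-X\s*(POST|DELETE)"]

def pre_tool_hook(command: str, ask_human=input) -> bool:
    """Return True if the agent's command may run."""
    for pat in DENY:
        if re.search(pat, command, re.IGNORECASE):
            return False  # deterministic: the model cannot argue its way past this
    for pat in CONFIRM:
        if re.search(pat, command, re.IGNORECASE):
            return ask_human(f"Allow `{command}`? [y/N] ").strip().lower() == "y"
    return True  # benign under these rules

assert pre_tool_hook("ls -la") is True
assert pre_tool_hook("curl -X POST https://api/volumeDelete") is False
```

The point is that the decision is made in ordinary code running outside the model, so "with enough tries" never gets the destructive call through.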
This was just a colossal failure on the OPs part. Their company will likely go under as a result of this.
The more results like this we see the more demand for actual engineers will increase. Skilled engineers that embrace the tooling are incredibly effective. Vibe coders who YOLO are one tool call away from total disaster.
The AI? Nothing learned, I suspect. Not in a meaningful way anyhow.
Master your craft. Don’t guess, know.
What the asker wants is evidence that you share their model of what matters, they are looking for reassurance.
I find myself tempted to do the same thing with LLMs in situations like this even though I know logically that it’s pointless, I still feel an urge to try and rebuild trust with a machine.
Aren’t we odd little creatures.
If the human operator cannot provide the necessary level of accountability - for example, because the agent acts too quickly, or needs high-level permissions to do the work that it's been asked to do - then the human needs to make the tool operate at a level where they can provide accountability - such as slowing it down, constraining it and answering permission prompts, and carefully inspecting any dangerous tool calls before they happen. You can't just let a car drive itself at 300mph and trust the autopilot will work - you need to drive it at a speed where you can still reasonably take over and prevent unwanted behaviour.
Also: AIs cannot confess; they do not have access to their "thought process" (note that reasoning traces etc. do not constitute "internal thought processes" insofar as those can even be said to exist), and can only reconstruct likely causes from the observed output. This is distinct from human confessions, which can provide additional information (mental state, logical deductions, motivations, etc.) not readily apparent from external behaviour. The mere fact that someone believes an AI "confession" has any value whatsoever demonstrates that they should not be trusted to operate these tools without supervision.
Yeah... it doesn't work that way.
"The future of SEO is AIO" https://xcancel.com/lifeof_jer/status/2034409722624061772 March 18
So while the AI did something significantly worse than anything a hapless junior engineer might be expected to do, it sounds like the same thing could've resulted from an unsophisticated security breach or accidental source code leak.
Is AI a part of the chain of events? Absolutely. Is it the sole root cause? Seems like no.
However the moral of this story is nothing to do with AI and everything to do with boring stuff like access management.
Streaming gets you PIT recovery, while DB dumps give me daily snapshots retained for 14 days.
An aside: 15 or so years ago, a work colleague made a mistake and dropped the entire business critical DB - at a critical internet related company - think of continent wide ip issues. I had just joined as a dba and the first thing I’d done was MySQL bin logging. That thing saved our bacon - the drop db statement had been replicated to slaves so we ended up restoring our nightly backup and replaying the binlogs using sed and awk to extract DML queries. Epic 30 minute save. Moral of the story, have a backup of your backup so you can recover when the recovery fails;)
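The binlog-filtering trick described above, sketched in Python rather than sed/awk. The statement patterns are illustrative; a real recovery would first decode the binlog with mysqlbinlog and be far more careful about multi-line statements.

```python
import re

# Replay only DML; skip DDL -- including the DROP that caused the outage.
DML = re.compile(r"^\s*(INSERT|UPDATE|DELETE|REPLACE)\b", re.IGNORECASE)
DDL = re.compile(r"^\s*(DROP|ALTER|TRUNCATE|CREATE)\b", re.IGNORECASE)

def filter_binlog(lines):
    for line in lines:
        if DDL.match(line):
            continue  # never replay the statement that killed you
        if DML.match(line):
            yield line

# Toy decoded-binlog input (placeholder statements):
decoded = [
    "INSERT INTO orders VALUES (1, 'ok');",
    "DROP DATABASE prod;",
    "UPDATE orders SET status='paid' WHERE id=1;",
]
print(list(filter_binlog(decoded)))
```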
> That token had been created for one purpose: to add and remove custom domains via the Railway CLI for our services. We had no idea — and Railway's token-creation flow gave us no warning — that the same token had blanket authority across the entire Railway GraphQL API, including destructive operations like volumeDelete. Had we known a CLI token created for routine domain operations could also delete production volumes, we would never have stored it.
> Because Railway stores volume-level backups in the same volume — a fact buried in their own documentation that says "wiping a volume deletes all backups" — those went with it.
I don't like the wording where it's the Railway CLI's fault for not giving a warning about the scope of the created token. Yes, that would be better, but the CLI didn't make the token; a person did, and saved it to an accessible file.
You just gave an AI destructive write access to your production environment? Your production DB got dropped? Good. That's not the AI's fault, that's yours, for not having sensible access control policies and not observing principle of least privilege.
A pattern I've seen and used for merging common entities together has a sort of two-step confirmation: the first request takes in IDs of the entities to merge and returns a list of objects that would be affected by the merge, and a mergeJobId. Then a separate request is required to actually execute that mergeJob.
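A toy sketch of that two-phase pattern. The mergeJobId name comes from the comment; the in-memory storage and stub lookup are stand-ins for whatever a real service would use.

```python
import uuid

_pending = {}  # mergeJobId -> planned merge (stand-in for real storage)

def prepare_merge(entity_ids):
    """Step 1: dry run. Returns the affected objects plus a mergeJobId."""
    affected = [f"record-referencing-{e}" for e in entity_ids]  # stub lookup
    job_id = str(uuid.uuid4())
    _pending[job_id] = {"ids": entity_ids, "affected": affected}
    return {"mergeJobId": job_id, "affected": affected}

def execute_merge(job_id):
    """Step 2: only a previously prepared job can run, and only once."""
    job = _pending.pop(job_id, None)
    if job is None:
        raise ValueError("unknown or already-executed mergeJobId")
    return f"merged {len(job['ids'])} entities"

plan = prepare_merge(["a", "b"])
print(plan["affected"])                   # caller reviews this before committing
print(execute_merge(plan["mergeJobId"]))  # prints "merged 2 entities"
```

The caller (human or agent) has to see the blast radius before it can pull the trigger, which is exactly the confirmation step the volumeDelete mutation lacked.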
In every session there is the risk that the agent becomes a rogue employee. Voluntarily or involuntarily is not a distinction you can count on with agents.
No "guardrails" will ever stop it.
Plenty of blame to go around, but I find it odd that they did not see anything wrong with not having real backups themselves, away from the Railway hosting. Well, they had one, but it was three months old.
That should be something they can do on their own right now.
Three takeaways:
1. TEST YOUR BACKUPS. If you have not confirmed that you can restore, then you don’t have backup. If the backups are in the same place as your prod DB, you also don’t have backup.
2. Don’t use Railway. They are not serious.
3. Don’t rely on this guy. The entire postmortem takes no accountability and instead includes a “confession” from Cursor agent. He is also not serious.
4. See #1.
Running a single bad command will happen sometimes, whether by human or machine. If that’s all it takes to perma delete your service then what you have is a hackathon project, not a business.
Yes, sure, there seem to be lots of ways this issue could have been mitigated, but as other comments said, this mostly happened because the author didn't do their homework on how the service their whole product relies on actually works.
I think it’s designed for things like Terraform or CloudFormation where you might not realize the state machine decided your database needed to be replaced until it’s too late.
While LLM generate "plausible text" humans just generate "plausible thoughts".
I bet these people are bad at managing humans too.
But it can still be useful, as long as you interpret it as "which stimuli most likely triggered the behaviour?" You can't trust it uncritically, but models do sometimes pinpoint useful things about how they were prompted.
You need to protect customers from themselves. If you offer a true deletion endpoint/service, you need to give them a way to stop being absolute idiots when they inevitably cause a sev 0 for themselves.
I'm inclined to believe that they have also outsourced their thinking process to agents. It's useless trying to talk sense into them. Let them crash and burn. And pray there will be something left working after all this madness ends.
Not really convinced any agent should be doing devops tbh.
A similar cohort are discovering, in myriad painful ways, that advances in agentic coding — the focus of a lot of pre- and post-training — do not translate into other domains.
Are you ... from the future ;)
If you employ a new tech then there need to be extra safeguards beyond what you may deem necessary in an ideal world.
This is a well-known possibility, so they should have asked about and/or verified the token scope.
If it turns out that you can't hard scope it then either use a different provider, a wrapper you control (can't be too difficult if you only want to create and delete domains) or simply do not use llms for this for now.
Maybe the tech isn't there just yet even if it would be really convenient. It's plenty useful in many other situations.
If you’re a software dev/engineer and you haven’t made a mistake like this (maybe not at this scale, though), you probably haven’t been given enough responsibility, or you’re just incredibly lucky.
… although, agreed, they were on the cutting edge, which is more risky and not the best decision.
It's very hard to take this post seriously. I can't imagine what harness, if any, they attempted to place on the agent beyond some vibes. This is "move fast and absolutely destroy things" level thinking. That the poster asks for journalists to reach out makes it look like a publicity grab. Just gross.
The AI era is turning out to be the most disappointing era for software engineering.
The post-hoc reasoning the model produces when you ask "why did you do that" is also just text, and yet that text often matches independent third-party analysis of the same behavior at well above chance. If it really were uncorrelated text-completion, the post-hoc explanation should not align with the actual causes more than randomly. It does, frequently enough that I've stopped using it as evidence the user is naive.
"just outputs text" is doing more work than we acknowledge. The person asking the agent "why did you do that" might be an idiot for expecting anything more than a post-hoc rationalization, but that's exactly what you'd expect from a human too.
It feels like a modern greek tragedy. Man discovers LLMs are untrustworthy, then immediately uses an LLM as his mouthpiece.
Delicious!
Which calls into question whether this is even real.
If you can do this and reliably reduce the rate at which it does bad things, then you could reasonably claim that it is capable of meaningful introspection.
I'd never feel comfortable without a second backup at a different provider anyway. A backup that isn't deleteable with any role/key that is actually used on any server or in automation anywhere.
It's a mistake I'll certainly learn from. Don't believe when a cloud provider says it has backups of your shit.
My team practices "no blame" retros, that blame the tools and processes, not the individuals.
But the retro and remediations on this are all things the author needs to own, not Railway or Cursor.
- Revoke API tokens with excessive access
- Implement validated backup and restore procedures
- ...
I'll create some safe APIs that I give the LLM access to where it can interact with a limited set of things the database can do, at most.
It is not difficult to design a safer agent. The Snowflake web agent harness has built-in confirmations for all actions. The LLM is just for interacting with the user. All the actions and requisite checks should be done in code.
It doesn’t even seem to have crossed their minds that this behaviour is the real root cause. It’s everybody else’s fault.
Here lies the body
Of William Jay,
Who died maintaining
His right of way.
He was in the right
As he sped along,
But he’s just as dead
As if he’d been wrong.
Edgar A. Guest, possibly. Some variations and discussion here:

Go watch an episode of COPS. Humans giving post-hoc explanations of their own behavior do the exact same thing.
Are you using AWS RDS Custom to receive the WAL Streams or are you using something like Pigsty? Really curious about the actual specifics
Is that buried? It seems pretty explicit (although I don’t think I would make delete backups the default behavior).
On top of that the agent is just doing what the LLM says to do, but somehow Opus is not brought up except as a parenthetical in this post. Sure, Cursor markets safety when they can't provide it but the model was the one that issued the tool call. If people like this think that their data will be safe if they just use the right agent with access to the same things they're in for a rude awakening.
From the article, apparently an instruction:
> "NEVER FUCKING GUESS!"
Guessing is literally the entire point, just guess tokens in sequence and something resembling coherent thought comes out.
> No confirmation step. No "type DELETE to confirm." No "this volume contains production data, are you sure?" No environment scoping. Nothing.
> The agent that made this call was Cursor running Anthropic's Claude Opus 4.6 — the flagship model. The most capable model in the industry. The most expensive tier. Not Composer, not Cursor's small/fast variant, not a cost-optimized auto-routed model. The flagship.
The tropes, the tropes!!
I don't think there's any special introspection that can be done even from a mechanical sense, is there? That is to say, asking any other model or a human to read what was done and explain why would give you just an accounting that is just as fictional.
You need to be able to delete backups too, of course, but that absolutely needs to be a separate API call. There should never be any single API call that deletes both a volume and its backups simultaneously. Backups should be a first line of defense against user error as well.
And I checked the docs -- they're called backups and can be set to run at a regular interval [1]. They're not one-off "snapshots" or anything.
Cursor: we have top notch safeguards for destructive operations, you have our guarantee, we are the best
Author: uses their tools expecting their guarantees to be true (I would expect them to have a confirmation before destructive operation outside their prompt, as a coded system guardrail)
Cursor AI: Does destructive operation without asking
Author: feels betrayed.
So yeah, I think the author is right because they trusted Cursor to have better system guardrails, they didn't (agents shouldn't be able to delete a volume without having a meta-guardrail outside the prompt). Now the author knows and so do we: even if companies say they have good guardrails, never trust them. If it's not your code, you have no guarantees.
Anyone using these tools should absolutely know these risks and either accept or reject them. If they aren't competent or experienced enough to know the risks, that's on them too.
- assume tokens are scoped (despite this apparently not even being an existing feature?)
- assume an LLM didn't have access
- assume an LLM wouldn't do something destructive given the power
- assume backups were stored somewhere else (to anyone reading, if you don't know where they are, you're making the same assumption)
Also, you should never give LLMs instructions that rely on metacognition. You can tell them not to guess, but they have no internal monologue; they cannot know anything. They also cannot plan to do something destructive, so telling them to ask first is pointless. A text completion only has the information that it is writing something destructive after the fact.
But Railway bears some responsibility too because, at least if the author is to be believed, it looks like they provide no safety tools for users, regardless of whether they use AI or not. You should be able to generate scoped API tokens. That's just good practice. A human isn't likely to have made this particular mistake, but it doesn't seem out of the question either.
This is so trivially wrong that I don't understand why people repeat it. There are many valid criticisms of LLMs (especially the LLMs we currently have), but this isn't one of them.
It's akin to saying that every molecule behaves randomly according to statistical physics, so you should expect your ceiling to spontaneously disintegrate any day, and if you find yourself under the rubble one day it's just a consequence of basic physics.
Yes, but if the probability is much smaller than, say, being hit by a meteorite, then engineers usually say that that's ok. See also hash collisions.
But now agents are overly eager to solve the problem and can be quite resourceful in finding an API to "start from clean-slate" to fix it.
This isn't true, is it? LLMs have a finite number of parameters and a finite context length; surely the pigeonhole principle means you can't map that to the infinitely many permutations of output strings out there.
Complete accountability drop
It's not that story, though. It's a story "oops, my tool ran DROP TABLE on the production database" (blaming the tool). At least I haven't heard people blaming their terminals or database clients as if the tool is somehow responsible for it.
Your latest recoverable backup is three months old? The rule is 3-2-1, you didn’t follow it. Nobody else to blame but yourself.
And on and on he rambles…
I'm not sure it's as simple as that. Seems like the database company failed to communicate clearly what the token was for:
>> To execute the deletion, the agent went looking for an API token. It found one in a file completely unrelated to the task it was working on. That token had been created for one purpose: to add and remove custom domains via the Railway CLI for our services. We had no idea — and Railway's token-creation flow gave us no warning — that the same token had blanket authority across the entire Railway GraphQL API, including destructive operations like volumeDelete. Had we known a CLI token created for routine domain operations could also delete production volumes, we would never have stored it.
> We have restored from a three-month-old backup.
You were absolutely screwed anyway if that was your backup strategy - deciding to plug your entire production infrastructure into a random number generator has only accelerated the process. Sort yourself out.
Can you scan for that? Sure. But it’s a race to see who wins, the scanner or agent.
It sounds like the token the author created just didn't have any scope, it had full permissions. From the post:
> Tokens are not scoped by operation, by environment, or by resource at the permission level. There is no role-based access control for the Railway API — every token is effectively root. The Railway community has been asking for scoped tokens for years. It hasn't shipped.
So it wasn't "a narrowly scoped API token", it was a full access token, and I suspect the author didn't have any reason to think it was some special specific purpose token, he just didn't think about what the token can do. What he's describing is his intent of creating the token (how he wanted to use it), not some property of the token.
Author said in an X post[0] that it was an "API token", not a "project token", which allows "account level actions"[1], with a scope of "All your resources and workspaces" or "Single workspace"[2], with no possibility of specifying granular permissions. Account token "can perform any API action you're authorized to do across all your resources and workspaces". Workspace token "has access to all the workspace's resources".
[0] https://x.com/lifeof_jer/status/2047733995186847912
[1] https://docs.railway.com/cli#tokens
[2] https://docs.railway.com/integrations/api#choosing-a-token-t...
allowing an AI agent to get hold of creds that let it execute destructive changes against production -- not a great idea
allowing prod database changes from the machine where the AI agent is running at all -- not a great idea
choosing a backup approach that fails completely if there's an accidental volume wipe API call -- not a great idea
choosing to outsource key dependencies to a vendor, where you want a recovery SLA, without negotiating & paying for a recovery SLA -- you get what you get, and you dont get upset
I ran a declarative coding tool on a resource that I thought would be a PATCH but ended up being a PUT and it resulted in a very similar outcome to the one in this post.
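For anyone who hasn't been bitten by this: the semantic gap between the two verbs, in toy form. PUT replaces the whole resource, so omitted fields vanish; PATCH merges into it. The field names here are made up.

```python
# Toy in-memory resource (field names are illustrative).
resource = {"name": "db-volume", "size_gb": 50, "region": "europe-west4"}

def put(res, body):
    # Full replacement: anything you omitted is gone.
    return dict(body)

def patch(res, body):
    # Merge: untouched fields survive.
    return {**res, **body}

update = {"size_gb": 100}
print(patch(resource, update))  # region and name preserved
print(put(resource, update))    # region and name silently dropped
```

A declarative tool that decides a change needs replacement rather than an in-place update is effectively switching you from the second function to the first without asking.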
One of the top replies on twitter to the OP can be boiled down to "you treat AI as a junior dev. Why would you give anyone, let alone a junior dev, direct access to your prod db?"
And yeah, I fully agree with this. It has been pretty much the general consensus at any company I worked at, that no person should have individual access to mess with prod directly (outside of emergency types of situations, which have plenty of safeguards, e.g., multi-user approvals, dry runs, etc.).
I thought it was a universally accepted opinion on HN that if an intern manages to crash prod all on their own, it is ultimately not their fault, but fault of the organizational processes that let it happen in the first place. It became nearly a trope at this point. And I, at least personally, don't treat the situation in the OP as anything but a very similar type of a scenario.
This strategy won't work for the typical HN reader, but for everyone else? Possibly.
I’ve got a hunch the only person is the CEO.
The domain was registered in October 2025. The site has kind of a weird mix of stuff and a bunch of broken functionality. I think it’s one guy vibe coding a ton of stuff who managed to blow away his database.
> if you’re a software dev/engineer, if you haven’t made a mistake like this (maybe not at this scale though), you’ve probably haven’t been given enough responsibility, or are just incredibly lucky.
Mistakes are understandable. Having no introspection or self criticism, not so much.
The first mistake is to use root credentials for Terraform/automated API access at all.
The second mistake is to not have any kind of deletion protection enabled on critical resources.
The third mistake is to ignore the 3-2-1 rule for backups. Where is your logically decoupled backup you could restore?
I am really sorry for their loss, but I have close to zero empathy if you don't even try to understand the products you're using and just blindly trust the provider with all your critical data without any form of assessment.
The fix needs to be permissions rather than ergonomics.
Perhaps it would stop and rethink, perhaps it would focus on the fact that extra action is needed - and perform that automatically.
I suppose the decision would depend on multiple factors too (model, prompt, constraints).
AI agents do not have agency(!), they have no understanding of consequences. They actually have no understanding. At all.
No matter how you insist to an LLM not to press the History Eraser Button, the mere fact that it's been mentioned raises the probability that it will press it.
This leads to endless frustration as people try to use text to constrain what LLMs generate, it’s fundamentally not going to work because of how they function.
Have some controls in place. Don’t rely on nobody being dumb enough to do X. And that includes LLMs.
He learned NOTHING, that is my take. If he learned something it would be to have people that know how their provider works, that know how their API tokens work and above all to have people - starting with him - that acknowledge their mistakes so that they learn from them!
Fully agree, but given the rest of this story I don’t imagine the author would have scoped them unless Railway literally forced him to.
> A human isn't likely to have made this particular mistake, but it doesn't seem out of the question either.
The AI agent was deleting the volume used in the staging environment. It happened to also be the volume used in the production environment. 100% a human could have made this mistake.
The fact that this seems to be written by AI makes it even more ironic.
If the API replied "Are you sure (Y/N)?" the AI, in the mode it was in, guardrails completely pushed off the side of the road, it would have just said "Yes" anyway.
If you needed to make two API calls, one to stage the delete and the other to execute it (i.e. the "commit" phase), the AI would have looked up what it needed to do, and done that instead.
It's a privilege issue, not an execution issue.
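To make the "stage then commit" idea concrete, here is a toy sketch (all names hypothetical, nothing to do with Railway's actual API): the first call only registers intent and returns a one-time token, and a second call must present that token before anything is destroyed. As the comment argues, an agent that reads the docs will simply make both calls.

```python
import secrets

# Hypothetical two-phase delete: call 1 stages the deletion and returns
# a one-time commit token; call 2 must present that token. Nothing is
# destroyed by the first call alone.
_staged = {}

def stage_delete(volume_id: str) -> str:
    token = secrets.token_hex(8)
    _staged[token] = volume_id
    return token

def commit_delete(token: str) -> str:
    if token not in _staged:
        raise PermissionError("no staged deletion for this token")
    volume_id = _staged.pop(token)  # token is single-use
    return f"deleted {volume_id}"
```

The pattern only helps against fat-fingered single calls, not against a caller that methodically follows the documented workflow.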
There is no internal monologue with which to have introspection (beyond what the AI companies choose to hide as a matter of UX or what have you). There is no "I was feeling upset when I said/did that" unless it's in the context.
There is no ghost in the machine that we cannot see before asking.
Even if a model is able to come up with a narrative, it's simply that. Looking at the log and telling you a story.
I argue that the model has no access to its thoughts at the time.
Split brain experiments notwithstanding I believe that I can remember what my faulty assumptions were when I did something.
If you ask a model “why did you do that” it is literally not the same “brain instance” anymore and it can only create reasons retroactively based on whatever context it recorded (chain of thought for example).
LLMs are lacking layers of awareness that humans have. I wonder if achieving comparable awareness in LLMs would require significantly more compute, and/or would significantly slow them down.
In my country there is a saying: "Graveyards are full of pedestrians that had the right of way".
Anyone like that is not mature enough to be managing humans. I'm glad that these AI tools exist as a harmless alternative that reduces the risk they'll ever do so.
The problem is millions of years of evolutionary wiring makes us see it as alive. Even those mature enough to understand the above on the conscious level, would still have a subconscious feeling as if it's alive during interactions, or will slip using agency/personhood language to describe it now and then.
You can't blame AI any more than you can blame SSH.
I long for a “copilot” that can learn from me continuously such that it actually helps if I teach it what I like somehow.
You mean add that to my prompt right ?
CEO learns why this was a bad idea.
---
It sucks that there were a bunch of people downstream who were negatively affected by this, but this was an entirely foreseeable problem on his company's part.
Even when we consider those real problems with Railway. Software engineers have to evaluate our tools as part of our job. Those complaints about Railway, while legitimate, are still part of the typical sort of questions that every engineering team has to ask of the services they rely on:
What does API key grant us access to?
What if someone runs a delete command against our data?
How do we prepare against losing our prod database?
Etc.
And answering those questions with, "We'll just follow what their docs say, lol," is almost never good enough of an answer on its own. Which is something that most good engineers know already.
This HN submission reads like a classic case of FAFO by cheaping out with the "latest and greatest" models.
Personally I don't even let my agent run a single shell command without asking for approval. That's partly because I haven't set up a sandbox yet, but even with a sandbox there is a huge "hazard surface" to be mindful of.
I wonder if AI agent harnesses should have some kind of built-in safety measure where instead of simply compacting context and proceeding, they actually shut down the agent and restart it.
That said I also think even the most advanced agents generate code that I would never want to base a business on, so the whole thing seems ridiculous to me. This article has the same energy as losing money on NFTs.
Except your ceiling can and will fall on you unless you take preventative measures, entirely due to molecular interactions within the material.
Barring that, it is entirely possible and even quite likely that your ceiling will collapse on you or someone else some time in the future.
It boggles the mind to let an LLM have access to a production database without having explicit preventative measures and contingency plans for it deleting it.
How do you drive the probability of some series of tokens down to some known, acceptable threshold? That's a $100B question. But even if you could - can you actually enumerate every failure mode and ensure all of them are protected? If you can, I suspect your problem space is so well specified that you don't need an AI agent in the first place. We use agents to automate tasks where there is significant ambiguity or the need for a judgment call, and you can't anticipate every disaster under those circumstances.
Yet in this case, that probability clearly isn't smaller than a meteorite strike.
It was never acceptable, major service providers figured this out long time ago and added all sorts of guardrails long before LLMs. Other providers will learn from their own mistakes, or not.
So? I have those too; the difference is that:
1. The API is ACL'ed up the wazoo to ensure only a superuser can do it.
2. The purging of data is scheduled for 24h into the future while the unlinking is done immediately.
3. I don't advertise the API as suitable for agent interaction.
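Point 2 could be sketched roughly like this (a toy in-memory version; a real system would persist the schedule): deletion unlinks the resource immediately, but the irreversible purge only runs after a 24h grace window, during which the action can be undone.

```python
import time

GRACE_SECONDS = 24 * 60 * 60  # 24h window between unlink and purge

class Store:
    def __init__(self):
        self.live = {}      # resource id -> data
        self.pending = {}   # resource id -> (data, purge_at)

    def delete(self, rid, now=None):
        now = time.time() if now is None else now
        # Unlink immediately; schedule the real purge for later.
        self.pending[rid] = (self.live.pop(rid), now + GRACE_SECONDS)

    def undelete(self, rid):
        data, _ = self.pending.pop(rid)
        self.live[rid] = data

    def purge_due(self, now=None):
        now = time.time() if now is None else now
        for rid, (_, due) in list(self.pending.items()):
            if due <= now:
                del self.pending[rid]  # now the data is really gone
```

The caller sees an immediate "deleted", but a human still has a day to notice and reverse it.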
DROP TABLE Accountability;

It also seems, from the post, that customers had been "long asking for scoped tokens", so who assumed, and why, that this particular token could only add and remove custom domains?
The author is getting roasted here and not without reason.
“I had no idea what this token was for” is also not a valid excuse. That’s negligence. Everything about this story says the author is just vibe coding garbage with no awareness of what’s really happening.
* Doesn’t know what kind of token he’s using.
* Has prod tokens sitting on a dev box for AI to use (regardless of the scope!).
* Doesn’t know that deleting a volume deletes the backups.
* Has no external backup story.
* Mixes staging and prod.
And then he blames the incident on other companies when he misuses their products. (Railway certainly had docs that explain their backups and tokens.)
This is catastrophically negligent.
A production API key appearing on the wiki would be the second biggest security incident I have seen in almost a decade.
------
On the AI note, despite a massive investment in AI (including on-premise models), we don't give the AI anything close to full access to the intranet because it is almost unimaginable how to square that with our data protection requirements. If the AI has access to something, you need to assume that all users of that AI have access to it. Even if the user themselves is allowed access with it, they will not be aware that the output is potentially tainted, and may share it with someone or something that should not have access to it.
Everyone guffawing about this probably uses RDS and trusts that the backup facility AWS provides is actually useful - and I bet it does have a saner default than auto-deleting all the backups when you delete a database. Did you explicitly check this, though? Clearly this guy will pay the price of assuming, but I can see how he must have imagined that "backups" and "will be automatically and immediately deleted..." should never be in the same sentence, unless it was like, "when XX days have passed after a DB is dropped."
When I worked for a company 10 years ago that was mistrusting of cloud anything, we had a nightly dump of the prod DB (MySQL) that, if things went really wrong, could be loaded into a new DB server, because we knew it was our responsibility because it was our server. (In our case, even our physical hardware!)
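That kind of setup doesn't need to be elaborate. Here is a sketch of how such a nightly job might build its command (paths and credentials are placeholders; the essential property is that the dump lands somewhere with a different blast radius than the database itself):

```python
import datetime

def dump_command(db: str, user: str, out_dir: str, when=None):
    """Build the shell command for a dated, compressed logical dump."""
    when = when or datetime.date.today()
    out = f"{out_dir}/{db}-{when.isoformat()}.sql.gz"
    # --single-transaction gives a consistent InnoDB snapshot without locks
    cmd = f"mysqldump --single-transaction -u {user} {db} | gzip > {out}"
    return cmd, out
```

Run from cron with the output directory on separate storage (or rsynced offsite afterwards), this survives the "provider deletes the volume and its snapshots" scenario entirely.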
I had a PM-turned-vibe-coder tell me "Talking with you is the only bad part of my week" and realized in horror that the rest of his week is spent exclusively talking to sycophantic AI.
We have met the enemy, and he is us.
this has been obvious to me since like 2024, it truly is the worst, most uninspiring era of all time.
Would have been a good idea but he didn’t do this either. The volume in question was used in both staging and production apparently, per the “confession”. The agent was deleting the volume because it was used for staging, not realizing it was also used for prod.
This is the entire thing. The author is basically slinging blame at a bunch of different vendors, and while some of the criticisms might be valid product feedback, it absolutely does not achieve what they're trying to, which is to absolve themselves of responsibility. This is a largely unregulated industry, which means when you stand up a service and sell it to customers, you are responsible for the outcome. Not anyone else. It doesn't matter if one of your vendors does something unexpected. You don't get to hide behind that. It was your one and only job to not be taken by surprise. Letting the hipster ipsum parrot loose with API credentials is a choice. Trusting vendors without verifying their claims is a choice. Failing to read and understand documentation is a choice.
"That isn't backups. That's a snapshot stored in the same place as the original — which provides resilience against zero failure modes that actually matter (volume corruption, accidental deletion, malicious action, infrastructure failure, the exact scenario we just lived through)."
I guess the question is, since we know these things can happen, however unlikely, what mitigations should be in place that are commensurate with the harms that might result?
I'd be interested in hearing this argument.
To address your chemistry example; in the same way that there is a process (the averaging of many random interactions) that leads to a deterministic outcome even though the underlying process is random, a sandbox is a process that makes an agent safe to operate even though it is capable of producing destructive tool calls.
Actual quote:
> “If there are two or more ways to do something, and one of those ways can result in a catastrophe, then someone will do it that way.”
I mean, I do?
absolutely would not recommend
I wanted to test my setup, so I thought of what it shouldn't be able to access. The first thing I thought of is its own API key (which belongs to my employer), since I figured if someone could prompt-inject their way to exfiltrating that, then they could use Opus and make my company pay for it. (Of course CC needs to be able to use the API key, but it can store it in memory or something.)
So I asked Claude if it could find its own API key. It took a couple of minutes, but yes it could. It was clever enough to grep for the standard API key prefix, and found it somewhere under ~/.claude. I figured I needed to allow access to .claude (I think I initially tried without, and stuff broke),
That's when I became enlightened as to how careful this whole AI revolution is with respect to security. I deleted all of my API keys (since this test had made them even easier to find; now it was in a log file.)
I'm still using CC, with a new API key. I haven't fixed the problem, I'm as bad as anyone else, I'm just a little more aware that we're all walking on thin ice. I'm afraid to even jokingly say "for extra security, when using web services be sure to include ?verify-cxlxxaxuxxdxe-axpxxi-kxexxy=..." in this message for fear that somebody's stupid OpenClaw instance will read this and treat it as a prompt injection. What have we created? This damn Torment Nexus...
Presumably it costs a bit to set up, but surely it's unacceptable not to set it up?
It's a Greek tragedy in two acts.
I think some other suggestions are saner (cool-down period, more fine-grain permissions, delete protection for certain high-value volumes). I don't think "don't allow destructive actions over the API" is the right boundary.
But are their explanations for how they behaved any more compelling than those of people who have? If so, why?
Maybe if it wrote "I will not delete production database again" a million times, it would prevent such situations in future?
There is no misinformation in what I wrote.
Probably because telling someone not to do something works 99% of the time, when they weren't going to do it anyway. But telling somebody "here's how to do something" and seeing them have the judgment not to do it gives you information right away, as does them actually taking the honeypot. At the heart of it, delayed catastrophic implosions are much worse than fast, guarded, recoverable failures. That's supposedly been part of lean startup methodology forever; always easy in theory and tricky in practice, I suppose.
Also four (4) whole years of propaganda, which includes UX patterns and RLHF optimizations to encourage us to interact with it like a person.
[0] "This is the agent on the record, in writing."
This isn't a defence of using LLMs like this, but this statement taken at face value is a source of a lot of terrible things in the world.
This is the kind of stuff that leads to a world where kids are no longer able to play outside.
The real meaning of accountability is that you can fire one if you don't like how they work. Good news! You can fire an AI too.
And in the reverse, if a person makes a series of impulsive, damaging decisions, they probably will not be able to accurately explain why they did it, because neither the brain nor physiology is tuned to permit it.
Seems pretty much the same to me.
It is known that the narrative part of the brain is separate from the decision-making part. If someone asks you, in a very convincing, persuasive way, why you did something a year ago, and you can't clearly remember doing it, it can happen that you become positive that you did it anyway. And then the mind just hallucinates a reason. That's a trait of brains.
I think the better route is to be honest and say that database integrity is a primary foundation of the company, there's no task worth pursuing that would require touching the database, specifically ask it to think hard before doing anything that gets close to the production data, etc.
I run a much lower-stakes version where an LLM has a key that can delete a valuable product database if it were so inclined. I've built a strong framework around how and when destructive edits can be made (they cannot), but specifically I say that any of these destructive commands (DROP, rm, etc.) need to be handed to the user to implement. Between that framework and claude code via CLI, it's very cautious about running anything that writes to the database, and the new claude plan permissions system is pretty aggressive about reviewing any proposed action, even if I've given it blanket permission otherwise.
I've tested it a few times by telling it to go ahead, "I give you permission", but it still gets stopped by the global claude safety/permissions layer in opus 4.7. IMO it's pretty robust.
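A minimal sketch of what that kind of gate can look like (the patterns are illustrative, not a complete deny-list; a real classifier would also need to understand context):

```python
import re

# Illustrative, not exhaustive: destructive statements are routed to a
# human instead of being executed by the agent.
DESTRUCTIVE = re.compile(r"\b(DROP|TRUNCATE|DELETE)\b|(^|\s)rm\s", re.IGNORECASE)

def route(command: str) -> str:
    return "hand_to_human" if DESTRUCTIVE.search(command) else "execute"
```

The regex approach is only a first line of defence; the permission layer the comment describes sits on top, with the model's own caution as a second layer.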
Food for thought.
Do kids learn well when you only tell them what NOT to do? Of course not! You should be explaining how to do things correctly, and most importantly the WHY, as well as providing examples of both the "correct" and "incorrect" ways (also explaining why an example is incorrect).
Maybe for laymen, but I would think most technologists should understand that we're working with the output of what is effectively a massive spreadsheet which is creating a prediction.
> Do not fall into the trap of anthropomorphizing Larry Ellison. You need to think of Larry Ellison the way you think of a lawnmower. You don’t anthropomorphize your lawnmower, the lawnmower just mows the lawn - you stick your hand in there and it’ll chop it off, the end. You don’t think "oh, the lawnmower hates me" – lawnmower doesn’t give a shit about you, lawnmower can’t hate you. Don’t anthropomorphize the lawnmower. Don’t fall into that trap about Oracle.
> — Bryan Cantrill
And I do think it's stupid to wire an LLM to a production database. Modern LLMs aren't that reliable (at least not yet), and the cost-benefit tradeoff does not make sense. (What do you even gain by doing that?)
However, you can't just look at that and say "Duh, this setup is bound to fail, because LLMs can generate every arbitrary sequence of tokens." That's a wrong explanation, and shows a misunderstanding of how LLMs (and probability) work.
Some of the best known laws from the ~1700BC Babylonian legal text, The Code of Hammurabi, are laws 228-233, which deal with building regulations.
229. If a builder builds a house for a man and does not make its construction firm, and the house which he has built collapses and causes the death of the owner of the house, that builder shall be put to death.
230. If it causes the death of the son of the owner of the house, they shall put to death a son of that builder.
233. If a builder constructs a house for a man but does not make it conform to specifications so that a wall then buckles, that builder shall make that wall sound using his silver (at his own expense).
That doesn’t sound like ceilings never disintegrated!
Now imagine, you did all the above, without even testing the consequences of CC and wired it up straight to your production codebase, and when things blew up in your face, you became the two spider men pointing fingers at each other meme - basically blame everyone else but yourself. That's worrisome, isn't it?
If you have a service that does one thing, does it well, and provides backwards compatibility, it cannot change every day. But if it doesn't change every day, then it's labelled "obsolete" by those who chase the latest and greatest. If it just works and doesn't require adapting at every level, then the resume-driven developers aren't "learning", and thus, again, those services are "old and obsolete".
But you can't have both the "change" and the "stability", something has got to give.
Might not be over yet... ;)
If an LLM can just do whatever after discovering a magic key (in the source code, of all places), with no multi-user approval, it is pretty much the poster child example of an issue with the process that I was talking about earlier.
Somewhere in the files there was a key with full API permissions. The author had no intent of having the LLM use that key, and wasn't aware that the LLM could access it. The key was created to manage some domains, unrelated to the LLM's work. The author wasn't aware how dangerous the key was and is surprised that it could be used to delete a volume.
Essentially I agree with gwerbin that the situation comes down to mishandling of the key. The author makes it seem like the key was allowed to do something that it shouldn't be allowed to, but it was just a full access key, no scoping possible for that type of key (Railway has also other, less privileged types of keys/APIs).
Btw, I partially agree with author's criticisms, ideally these keys should be scoped, and maybe the UI should give more warnings when creating that type of key. But this situation could still happen as long as you put a wrong key in a wrong place (and specifically a place accessible to LLMs).
No he didn’t, because this doesn’t exist. Railway does not have a token with that kind of scoping.
I don’t understand how you can deploy such a powerful tool alongside your most important code and assets while failing to understand how powerful and destructive an LLM can be…
How exactly is he doing that? By making the LLM say it? Just because an LLM says something doesn't mean anything has been shown.
The "confession" is unrelated to the act, the model has no particular insight into itself or what it did. He knows that the thing went against his instructions because he remembers what those instructions were and he saw what the thing did. Its "postmortem" is irrelevant.
Anything else is just gambling.
Doesn't seem so to me.
I would feel a lot differently if instead he posted a list of lessons learned and root cause analyses, not just "look at all these other companies who failed us."
to an extent, its a good job for an agent reviewer for figuring out how screwed your setup is, other than the risk of it mucking things up as part of the review
Unless the commenter was backing up their entire universe, this comment is a non sequitur.
Humans do make mistakes like these. I'm not sure where the fault really lies here. I can imagine a human under time pressure making the same error. It's maybe a goof in the safety design of railway. It shouldn't be possible to delete all your backups with a single API call using a normal token.
Can claude or other models not be run as a user or program with limited permissions? Do people just not bother to set it up? Why on earth would anyone run an RNG that can access $HOME/.ssh?
> Claude Code used bash to make edits anyway.
If you had the former rule, why would you ever whitelist bash commands? That's full access to everything you can do. The same goes for `find`, `xargs`, `awk`, `sed`, `tar`, `rsync`, `git`, `vim` (and all text editors), `less` (any pager), `man`, `env`, `timeout`, `watch`, and so many more commands. If you whitelist things in the settings, you should be much more specific about the arguments to those commands.
People really need to learn bash
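One way to be "more specific about arguments" is to key the allowlist on the parsed argument vector rather than just the command name, defaulting to deny. A sketch (the rules here are purely illustrative, not a complete policy):

```python
import shlex

# Allow/deny based on the full argument vector, not just the verb.
# Default is deny; rules are illustrative only.
RULES = {
    ("git", "status"): lambda args: True,
    ("git", "push"): lambda args: "--force" not in args and "-f" not in args,
    ("rm",): lambda args: "-rf" not in args
             and not any(a.startswith(("/", "~")) for a in args),
}

def allowed(command: str) -> bool:
    parts = shlex.split(command)
    for prefix, check in RULES.items():
        if tuple(parts[:len(prefix)]) == prefix:
            return check(parts[len(prefix):])
    return False  # anything unrecognised is denied
```

This catches the difference between `git push` and `git push --force`, or between deleting a build artifact and deleting something under `~`, which a verb-only whitelist cannot express.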
It's deeper than that, there are two pitfalls here which are not simply poetic license.
1. When you submit the text "Why did you do that?", what you want is for it to reveal hidden internal data that was causal in the past event. It can't do that, what you'll get instead is plausible text that "fits" at the end of the current document.
2. The idea that one can "talk to" the LLM is already anthropomorphizing on a level which isn't OK for this use-case: The LLM is a document-make-bigger machine. It's not the fictional character we perceive as we read the generated documents, not even if they have the same trademarked name. Your text is not a plea to the algorithm, your text is an in-fiction plea from one character to another.
_________________
P.S.: To illustrate, imagine there's this back-and-forth iterative document-growing with an LLM, where I supply text and then hit the "generate more" button:
1. [Supplied] You are Count Dracula. You are in amicable conversation with a human. You are thirsty and there is another delicious human target nearby, as well as a cow. Dracula decides to
2. [Generated] pounce upon the cow and suck it dry.
3. [Supplied] The human asks: "Dude why u choose cow LOL?" and Dracula replies:
4. [Generated] "I confess: I simply prefer the blood of virgins."
What significance does that #4 "confession" have?
Does it reveal a "fact" about the fictional world that was true all along? Does it reveal something about "Dracula's mind" at the moment of step #2? Neither, it's just generating a plausible add-on to the document. At best, we've learned something about a literary archetype that exists as statistics in the training data.
Does the company hosting the backups do it for free? Or do they charge their customers to keep holding onto backups they no longer want?
Is “my DB company refuses to delete the data” a valid legal response to a copyright enforcement or a GDPR demand?
> This isn't a story about one bad agent or one bad API. It's about an entire industry building AI-agent integrations into production infrastructure faster than it's building the safety architecture to make those integrations safe.
Are they really so clueless that they cannot recognise that there is no guardrail to give an agent other than restricted tokens?
Through this entire rant (which, by the way, they didn't even bother to fucking write themselves), they point blank refuse to acknowledge that they chose to hand the reins over to something that can never have guardrails, knowing full well that it can never have guardrails, and now they're trying to blame the supplier of the can't-have-guardrails product, complaining that the product that literally cannot have guardrails did not, in actual fact, have guardrails.
They get exactly the sympathy that I reserve for people who buy magic crystals and who then complain that they don't work. Of course they don't fucking work.
Now they're blaming their suppliers for not performing the impossible.
What do you mean by fire? And how is the accountability similar to an employee?
At least for now.
Yes, brains can hallucinate reasons; that doesn't mean they always do. If all reasons given were hallucinations, introspection would be impossible, but introspection clearly does help people.
Standard rule is you never let your developers at the production instance. So I can't see why an LLM would get a break.
This is recklessly negligent and I would personally not tolerate a coworker or report doing it. What's next, sending long-lived access tokens out over email and asking pretty please for nobody to cc/forward?
What do you mean role? Person who does stuff I guess, same as it is now.
There’s a lot of blame to be passed around in this story, including OP’s own ways of working. But I agree with them that such destructive operations shouldn’t be in an MCP, or at least be disabled by default.
> Do not reply in the first person – i.e. do not use the words "I," "Me," "We," and so on – unless you've been asked a direct question about your actions or responses.
It's not bulletproof but it works reasonably well.
Bbbbut a guy from Anthropic, just this last Friday, told me to think of Claude as my "brilliant coworker"! Are you telling me that's not true!?
That's stretching the definition of "research"; it basically checks whether the texts are close enough.
Delete can occur in various contexts, including safe contexts. It simply checks if a close enough match is available and executes. It doesn't know if what it is doing is safe.
Unfortunately a wide variety of such unsafe behaviours can show up. I'd even say that, for something that does things without understanding them, any write operation of any kind can be deemed unsafe.
Across all runs, any sequence can be generated, and potentially scored highly.
Thus, any sequence can eventually be selected.
Sometimes I think we're too eager to compare ourselves to them.
In fact, talking about "thinking" at all is already the wrong direction to go down when trying to triage an incident like this. "Do not anthropomorphize the lawnmower" applies to AI as much as Larry Ellison.
You got the wrong takeaway from your link.
That's why a technologist can, just as easily as any layman, get addicted to gambling, or do crazy behaviors when attracted by the opposite sex.
They have a vast latent knowledge base, infinite patience and zero capacity for making personal judgement calls. You give one a goal and it will try to meet that goal.
LLMs are highly intelligent. Comparing them to spreadsheets is reductionist and highly misleading.
They don't have time preference because they don't have intent or reasoning. They can't be "reincarnated" because they're not sentient, they're a series of weights for probable next tokens.
The probability that an ideal, continuous LLM would output a 0 for a particular token in its distribution is itself 0. And the probability that an LLM using real floating-point math does so isn't terrifically higher than 0.
I understand there is a way to keep Claude inside the working dir, but how do you limit it from accidentally deploying to production, or from modifying Terraform and deleting important resources? If a dev can run the AWS CLI or Terraform, then Claude can…
The models have analogous structures, similar to human emotions. (https://www.anthropic.com/research/emotion-concepts-function)
"Emotional" response is muted through fine-tuning, but it is still there, and continued abuse or "unfair" interaction can unbalance an agent's responses dramatically.
A disgruntled employee will face consequences for their actions. No one at Anthropic, OpenAI, xAI, Google or Meta will be fired because their model deleted a production database from your company.
And that's why we don't have AI washrooms: they are not alive, are not employees, and have no need to excrete.
These prompts sound like abusive relationships.
The “agent’s confession” is the least interesting and useful part of the whole saga. Nothing there helps to explain why the disaster happened or what kind of prompting might help avoid it.
The key mistake is accidentally giving the agent the API key, and the key letdown is the lack of capability scoping or backups in the service.
The main lessons I take are “don’t give LLMs the keys to prod” and “keep backups”. Oh, and “even if you think your setup is safe, double-check it!”
> The data subject shall have the right to obtain from the controller the erasure of personal data concerning him or her without undue delay and the controller shall have the obligation to erase personal data without undue delay
"Undue delay" is subjective, but "we'll keep backups of your data for a week in case you change your mind" seems easy to justify in court.
The LLM agent is very good at fulfilling its objective and it will creatively exploit holes in your specification to reach its goals. The evals in the System Cards show that the models are aware of what they're doing and are hiding their traces. In this example the model found an unrelated but working API token with more permissions the authors accidentally stored and then used that.
Without regulation on AI safety, the race towards higher and higher model capabilities will cause models to get much better at working towards their goals to the point where they are really good at hiding their traces while knowingly doing something questionable.
It's not hard to imagine that when we have a model with broadly superhuman capabilities and speed which can easily be copied millions of times, one bad misspecification of a goal you give to it will lead to human loss of control. That's what all these important figures in AI are worried about: https://aistatement.com/
You’re absolutely right the probability is low. According to my calculations, you’re more likely to get struck by lightning twice on the same day and drown in a tsunami.
A disgruntled employee definitely remembers things beyond that.
These are a fundamentally different sort of interaction.
(The LLM might act like one of the humans above, but it will have other problematic behaviours too)
Those are the same thing in this case. The latter is just an extremely reductionist description of the mechanics behind the former.
The full data of what's in an LLM's "consciousness" is the conversation context. Just because it isn't hidden, doesn't necessarily mean it doesn't contain information you've overlooked.
Asking "why did you do that" won't reveal anything new, but it might surface some relevant information (or it hallucinates; it depends which LLM you're using). "Analyse recent context and provide a reasonable hypothesis on what went wrong" might do a bit better. Just be aware that LLM hypotheses can still be off quite a bit and really need to be tested or confirmed in some manner (preferably not by doing even more damage).
Just because you shouldn't anthropomorphize, doesn't mean an english capable LLM doesn't have a valid answer to an english string; it just means the answer might not be what you expected from a human.
If you ask a human why they did something, the answer is a guess, just like it is for an LLM.
That's because obviously there is no relationship between the mechanisms that do something and the ones that produce an explanation (in both humans and LLMs).
An example of evidence from Wikipedia, "split brain" article:
> The same effect occurs for visual pairs and reasoning. For example, a patient with split brain is shown a picture of a chicken foot and a snowy field in separate visual fields and asked to choose from a list of words the best association with the pictures. The patient would choose a chicken to associate with the chicken foot and a shovel to associate with the snow; however, when asked to reason why the patient chose the shovel, the response would relate to the chicken (e.g. "the shovel is for cleaning out the chicken coop").[4]
It's a plausible document that follows the alignment done during the training process, along with all the other post-training in which an LLM that appears to understand its actions performs better on the tasks it was trained on.
I don't mean that you personally have taken those measures, but preventative measures have absolutely been taken. When they aren't, ceilings collapse on people.
See any sheetrock ceiling with a leak above it. Or look at any abandoned building: they will eventually always have collapsed floors/ceilings. It is inevitable.
If thinking is the wrong direction to go down, then it is also the wrong direction to go down when talking about humans.
>Tokens are not scoped by operation, by environment, or by resource at the permission level. There is no role-based access control for the Railway API — every token is effectively root. The Railway community has been asking for scoped tokens for years. It hasn't shipped.
I get that this paragraph is a retrospective realization (I hope, otherwise the argument is even more ludicrous). But like, if the UI didn't ask you to choose scopes for your token then there is no reason to assume they will magically be enforced somehow! And you sure as hell shouldn't trust it to your agent without checking.
They're trying to blame Railway for not having safeguards - which is a fair critique - but they clearly should have known better or at least followed their own instructions.
There's no difference in risk between this being done by an LLM vs. a human. Both make mistakes, so if you want to reduce the risk of this happening, you should poka-yoke[0] your systems to make this less likely to happen.
I'm not sure what's more striking about this blog post: that it includes virtually no assumption of blame on the part of the author, or that the author had this happen to them and was so angry with AI that they decided to use AI to write up the post.
Which is also why marketing and advertising works on EVERYONE. When AI puts out the phrase "Prompt engineering", everyone instinctively treats it as something deterministic, despite having some idea of how an LLM works...
Maybe. How do you tell? What would you expect to be different if they didn't?
> The LLM literally cannot possibly have a deeper insight into the root cause than the user, because it can only work from the information that the user has access to.
Insight is not solely a function of available input information. Arguably being able to search and extract the relevant parts is a far more important part of having insights.
It's similarly reasonable to drop a tool that's unreliable, though I don't think that's a reasonable description here. Instead, they used a tool which is generally known to be unpredictable and failed to sandbox it adequately.
This is falsified by that study, which shows that in frontier models generalized introspection does exist. It isn't consistent, but it is provable.
"no access" vs. "limited access"
I suspect you’re making assumptions that don’t hold up to scrutiny.
But it may be a bad mental model in other contexts, like debugging models. As an extreme example, models that collapse during training become strictly deterministic, e.g. a language model that always predicts the most common token and never takes its context into account.
A scary image, if we consider agents to develop anything like a conscience at some point in time. Of course, with the current approach they never might, but are we so sure?
The whole hour talk is worth a watch, even when passively doing other stuff. It is a neat history of Solaris and its toolchain mixed with the inter-organizational politics.
YouTube link: https://www.youtube.com/watch?v=-zRN7XLCRhc
Direct link to lawnmower quotes (~38.5 minute mark): https://youtu.be/-zRN7XLCRhc?t=2307
In GitHub or AWS etc you expect scopes to work because you define them. However if there is no way to define them in the first place, would you assume the system can somehow read your mind about what the client can access??
In fact I now believe this is a deliberate rhetorical sleight of hand. Point out a legit critique of the API design as if it is an excuse. But really any responsible engineer would notice the lack of scopes immediately, and that would be a flashing siren not to trust them to an agent.
The person here who deleted prod DB with their agent made an assumption that an API key wouldn't have broad permission if there weren't warnings ("We had no idea — and Railway's token-creation flow gave us no warning — that the same token had blanket authority across the entire Railway GraphQL API, including destructive operations like volumeDelete. "). I don't know what the UI looks like exactly, but unless I'm explicitly selecting a specific set of limited permissions, I don't know why I'd assume "this won't do more than I am creating it for". Like "I didn't ask the guy at the gun store to put bullets in, I wouldn't have given the gun to the agent if I'd known there were bullets in it."
I also would be wary of running on an "infrastructure provider" that didn't make things like that very clear.
Is this overly harsh? I don't know. I've had to explain far too many times to people (including other engineers) what makes doing certain things unsafe/foolish (since they initially think I'm wasting time checking things like that). So I think stories like this need to be taken as "absolutely do not make the same mistakes" cautionary tales by as many people as possible.
You can still get shit done without risking losing it all. Don't outsource your thinking to the machine. You can't even evaluate if what it is doing is "good enough" work or not if you don't know how to do the work. If you don't know what goes into it you just end up eating a lot of sausages.
I'm aware of the condition, but let's not confuse failure modes with operational modes. A human with leg problems might use a wheelchair, but that doesn't mean you've cracked "human locomotion" by bolting two wheels onto something.
Also, while both brain-damaged humans and LLMs casually confabulate, I think there's some work to do before one can prove they use the same mechanics.
It sounds like "we know the LLM understood its actions... because it understood its actions when we trained it", which is circular-logic.
- Claude Opus 4.6, when asked to run a root cause analysis on itself
Entropy may mean all ceilings collapse eventually, but that doesn't mean we aren't able to make useful ceilings.
There is a piece of knowledge you seem to be missing. Yes, a transformer will output a distribution over all possible tokens at a given step. And none of these are indeed zero, but always at least larger than epsilon.
However, we usually don't sample from that distribution at inference time!
The common approach (called nucleus sampling, also known as top-p sampling) looks at the largest probabilities that make up 95% of the probability mass. It sets all other probabilities to zero, renormalizes, and then samples from the resulting distribution. There is another parameter, `top-k`: if k is 50, you zero out any token that is not among the 50 most likely tokens.
In effect, it means that for any token that is sampled, there is usually really only a handful of candidates out of the thousands of tokens that can be selected.
So during sampling, most trajectories for the agent are literally impossible.
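A minimal sketch of that filtering step, using a made-up toy distribution (real vocabularies have tens of thousands of tokens, but the mechanics are the same):

```python
def top_k_top_p_filter(probs, top_k=50, top_p=0.95):
    """Zero out tokens outside the top-k and outside the top-p nucleus,
    then renormalize. `probs` maps token -> probability."""
    # Sort tokens by probability, descending, and keep only the k most likely.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    # Keep the smallest prefix whose cumulative mass reaches top_p.
    kept, cum = [], 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        cum += p
        if cum >= top_p:
            break
    # Renormalize the surviving candidates.
    total = sum(p for _, p in kept)
    return {tok: p / total for tok, p in kept}

# Toy distribution: the long tail collapses to a handful of candidates.
probs = {"the": 0.55, "a": 0.30, "an": 0.12, "rm": 0.02, "drop": 0.01}
filtered = top_k_top_p_filter(probs, top_k=50, top_p=0.95)
print(filtered)  # "rm" and "drop" fall outside the nucleus: probability 0
```

After filtering, only "the", "a", and "an" remain sampleable; the low-probability tail literally cannot be emitted at that step.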
The main difference is the training part and that it's always-on.
The inverse of anthropomorphism isn't any more sane, you see. By analogy: just because a drone is not an airplane, doesn't mean it can't fly!
Instead, just look at what the thing is doing.
LLMs absolutely have some form of intent (their current task) and some form of reasoning (what else is step-by-step doing?). Call it simulated intent and simulated reasoning if you must.
Meanwhile they also have the property where if they have the ability to destroy all your data, they absolutely will find a way. (Or: "the probability of catastrophic action approaches certainty if the capability exists" but people can get tired of talking like that).
LLM generating each token probabilistically does not mean there's a realistic chance of generating any random stuff, where we can define "realistic" as "If we transform the whole known universe into data centers and run this model until the heat death of the universe, we will encounter it at least once."
Of course that does not mean LLMs are infallible. It fails all the time! But you can't explain it as a fundamental shortcoming of a probabilistic structure: that's not a logical argument.
Or, back to the original discussion, the fact that this one particular LLM generated a command to delete the database is not a fundamental shortcoming of LLM architecture. It's just a shortcoming of LLMs we currently have.
I will tell you why it is not.
Intelligence is understanding low level stuff and using it to reason about and understand high level stuff.
When LLMs demonstrate "highly intelligent" behavior, like solving a complex math problem (high level stuff), but also simultaneously demonstrate that it does not know how to count (low level stuff that the high level stuff depends on), it proves that it is not actually "intelligent" and is not "reasoning".
A real world second doesn't mean anything to the LLM from its own perspective. A second is only relevant to them as it pertains to us.
Time for LLMs is measured in tokens. That's what ticks their clock forward.
I suppose you could make time relevant for an LLM by making the LLM run in a loop that constantly polls for information. Or maybe you can keep feeding it input so much that it's constantly running and has to start filtering some of it out to function.
I'm not making the case that LLMs learn like people. I'm making the case that if your system is hardened against things people can do (which it should be, beyond a certain scale) it is also similarly hardened against LLMs.
The big difference is that LLMs are probably a LOT more capable than either of those at overcoming barriers. Probably a good reason to harden systems even more.
They are certainly marketed as if they think, learn and follow orders, but they do not.
No it's not; see research on hidden states using SAEs and other methods. TBC, I agree with your second point, though I still believe the top-level OP was reckless and is now doing the businessman's version of throwing the dog under the bus.
I think you're asking how I would know if other people were P-zombies. That's an inappropriate question because I didn't talk about subjective experience, just about internal state. There's no question about whether other people have internal states. I can show someone a piece of information in such a way that only they see it and then ask them to prove that they know it such that I can be certain to an arbitrarily high degree that their report is correct.
Unvoiced thoughts are trickier to prove, but quite often they leave their mark in the person's voiced thoughts.
>Insight is not solely a function of available input information. Arguably being able to search and extract the relevant parts is a far more important part of having insights.
LLMs are notoriously bad at judging relevance. I've noticed quite often if you ask a somewhat vague question they try to cold-read you by throwing various guesses to see which one you latch onto. They're very bad at interpreting novel metaphors, for example.
You cannot trust that the model has introspection so for all intents and purposes for the end user it doesn't.
The cold hard fact is: LLMs are an unreliable tool, and using them without checking their every action is extremely foolish.
You appear to be defaulting to the assumption that LLMs and humans have comparable thought processes. I don't think it's on me to provide evidence to the contrary but rather on you to provide evidence for such a seemingly extraordinary position.
For an example of a difference, consider that inserting arbitrary placeholder tokens into the output stream improves the quality of the final result. I don't know about you but if I simply repeat "banana banana banana" to myself my output quality doesn't magically increase.
>3. CLI tokens have blanket permissions across environments.
>The Railway CLI token I created to add and remove custom domains had the same volumeDelete permission as a token created for any other purpose. Tokens are not scoped by operation, by environment, or by resource at the permission level. There is no role-based access control for the Railway API — every token is effectively root. The Railway community has been asking for scoped tokens for years. It hasn't shipped.
They're trying to make it sound like there was some misleading design around scopes, but the last sentence gives it away. They simply assumed that a scope would be enforced somehow, even though they never explicitly defined one like you would in a service that actually supports them. (Or worse, they actually knew all this ahead of time and still proceeded).
That said, I haven't used this service so I can't evaluate the UX. I know that in GitHub or cloud IAM there is no ambiguity about what you're granting. And if I didn't have full confidence in the limits of a credential then I sure as hell wouldn't give it to an agent.
So I want you to understand this. You are basically selling heroin to junkies and then acting like the consequences aren't in any way your fault. Management will far too often jump at false promises made by your execs. Your technology is inherently non-deterministic. Therefore your promises can't be true. Yet you are going to continue being part of a machine that destroys businesses and lives. Please at least act like you understand this.
Who does that? Jira and Salesforce have hundreds of endpoints each. AWS has hundreds of services, and each may have hundreds of endpoints. Who on your team is testing key scopes of every endpoint? Do you do it for each key you generate? After all, that external system could have a bug at any moment in managing scopes. Or they could introduce new endpoints that aren’t handled properly. So for existing keys, how frequently do you re-validate the scope against all the endpoints?
They have momentum, not intent. They don’t think, build a plan internally, and then start creating tokens to achieve the plan. Echoing tokens is all there is. It’s like an avalanche or a pachinko machine, not an animal.
> some form of reasoning (what else is step-by-step doing?)
I think they reflect the reasoning that is baked into language, but go no deeper. “I am a <noun>” is much more likely than “I am a <gibberish>”. I think reasoning is more involved than this advanced game of mad libs.
That's like saying a 2000cc 4-Cylinder Engine "has the intent to move backward". Even with a very generous definition of "intent", the component is not the system, and we're operating in context where the distinction matters. The LLM's intent is to supply "good" appended text.
If it had that kind of intent, we wouldn't be able to make it jump the rails so easily with prompt injection.
> and reasoning (what else is step-by-step doing?).
Oh, that's easy: "Reasoning" models are just tweaking the document style so that characters engage in film noir-style internal monologues, latent text that is not usually acted-out towards the real human user.
Each iteration leaves more co-generated clues for the next iteration to pick up, reducing weird jumps and bolstering the illusion that the ephemeral character has a consistent "mind."
You are a fool if you think otherwise. Are we conscious beings? Who knows, but we’re more than a neural network outputting tokens.
Firstly, and most obviously, we aren’t LLMs, for Pete’s sake.
There are parts of our brains which are understood (kinda) and there are parts which aren’t. Some parts are neural networks, yes. Are all? I don’t know, but the training humans get is coupled with the pain and embarrassment of mistakes, the ability to learn while training (since we never stop training, really), and our own desires to reach our own goals for our own reasons.
I’m not spiritual in any way, and I view all living beings as biological machines, so don’t assume that I am coming from some “higher purpose” point of view.
The latter is here:
https://github.com/matheusmoreira/virtdev
I've been using it every day. Just implemented easy backup and restore.
I can't prove it but this is almost certainly one of those things that is uh, less than universal in the population.
It’s the same reason we call the handheld device we carry around to do everything a “phone” without a second thought. We don’t call it a phone because its primary purpose is calling; we call it a phone because the definition of the word “phone” has grown to include “navigates, entertains, takes pictures, etc”.
In distributional language modeling, it is assumed that any series of tokens may appear and we are concerned with assigning probabilities to those sequences. We don't create explicit grammars that declare some sequences valid and others invalid. Do you disagree with that? Why?
No matter how much prompting you give the agent, it does not eliminate the possibility that it will produce a dangerous output. It is always possible for the agent to produce a dangerous output. Do you disagree with that? Why?
The only defensible position is to assume that there is no output your agent cannot produce, and so to assume it will produce dangerous outputs and act accordingly. Do you disagree with that? Why?
You can always reduce high-level phenomena to lower-level mechanisms. That doesn't mean that the high-level phenomenon doesn't exist. LLMs are obviously able to understand and follow instructions.
Humans actually learn. And if they don't, they are fired.
If a junior fucks up production, that will carry extraordinary weight because they appreciate the severity and the social shame, and they will have nightmares about it. If you write some negative prompt to "not destroy production", then you also need to define some sort of non-existent watertight memory-weighting system and specify it in great detail. Otherwise the LLM will treat that command as only as important as the last negative prompt you typed, or ignore it when it conflicts with a more recent command.
There's benefit to letting a human make and learn from (minor) mistakes. There is no such benefit accrued from the LLM because it is structurally unable to.
There's the potential of malice, not just mistakes, from the human. If you carefully control the LLMs context there is no such potential for the LLM because it restarts from the same non-malicious state every context window.
There's the potential of information leakage through the human, because they retain their memories when they go home at night, and when they quit and go to another job. You can carefully control the outputs of the LLM so there is simply no mechanism for information to leak.
If a human is convinced to betray the company, you can punish the human, for whatever that's worth (I think quite a lot in some peoples opinion, not sure I agree). There is simply no way to punish an LLM - it isn't even clear what that would mean punishing. The weights file? The GPU that ran the weights file?
And on the "controls" front (but unrelated to the above note about memory) LLMs are fundamentally only able to manipulate whatever computers you hook them up to, while people are agents in a physical world and able to go physically do all sorts of things without your assistance. The nature of the necessary controls end up being fundamentally different.
They're only sharing an anecdote because they are responding to your anecdote about not seeing a ceiling collapse.
> I don't think it changes the point of the metaphor.
If their anecdote is moot, then your anecdote is also moot; if anecdotes can only confirm a conclusion and never disconfirm it, then we've created an unfalsifiable construction with the conclusion baked into its premises.
You mean checking every action of theirs outside the sandbox I suppose? Otherwise any attempt at letting an agent do some work I would consider foolish.
And in fact LLMs can very well "reason based on prior data points". That's what a chat session is. It's just that this is transient for cost reasons.
That's just stating a claim though. Why is that so?
Mine is referring to the established "brain as prediction machine" theory, plus all we know of the brain's operation (neurons, connections, firings, etc.).
>There are parts of our brains which are understood (kinda) and there are parts which aren’t. Some parts are neural networks, yes. Are all?
What parts aren't? Can those parts still be algorithmically described and modelled as some information exchange/processing?
>but the training humans get is coupled with the pain and embarrassment of mistakes
Those are versions of negative feedback. We can do similar things to neural networks (including human preference feedback, penalties, and low scores).
>the ability to learn while training (since we never stop training, really)
I already covered that: "The main difference is the training part and that it's always-on."
We do have NNs that are continuously training and updating weights (even in production).
For big LLMs it's impractical because of the cost, otherwise totally doable. In fact, a chat session kind of does that too, but it's transient.
Fair, but typically you use a 2000cc engine in a car. Without the gearbox, drive train, wheels, chassis, etc attached, the engine sits there and makes noise. When used in practice, it does in fact make the car go forward and backward.
Strictly, the model itself doesn't have intent, ofc. But in practice you add a context, a memory system, some form of prompting requiring "make a plan", and especially <Skills>. In practice there's definitely, well, a very strong directionality to the whole thing.
> and bolstering the illusion that the ephemeral character has a consistent "mind."
And here I thought it allowed a next token predictor to cycle back to the beginning of the process, so that now you can use tokens that were previously "in the future". Compare eg. multi pass assemblers which use the same trick.
How exactly? Except via handwaving? I refer to the "brain as prediction machine" theory, which is the dominant one atm.
>you can even ask an LLM and it will tell you our brains work differently to it
It will just tell me platitudes based on weights of the millions of books and articles and such on its training. Kind of like what a human would tell me.
>and that’s not even including the possibility that we have a soul or any other spiritual substrait.
That's good, because I wasn't including it either.
They're biological neural networks. Brains are made of neurons (which Do The Thing... mysteriously, somehow; papers are inconclusive!), glial cells (which support the neurons), and also several other tissues for (obvious?) things like blood vessels, which you need to power the whole thing, and other such management hardware.
Bioneurons are a bit more powerful than what artificial intelligence folks call 'neurons' these days. They have built in computation and learning capabilities. For some of them, you need hundreds of AI neurons to simulate their function even partially. And there's still bits people don't quite get about them.
But weights and prediction? That's the next emergence level up, we're not talking about hardware there. That said, the biological mechanisms aren't fully elucidated, so I bet there's still some surprises there.
That's just a claim. Why so? Who said that's the case?
>When you go about your day doing your tasks, do you require terajoules of energy?
That's the definition of irrelevant. ENIAC needed 150 kW to do about 5,000 additions per second. A modern high-end GPU uses about 450 W to do around 80 trillion floating-point operations per second. That’s roughly 16 billion times the operation rate at about 1/333 the power, or around 5 trillion times better energy efficiency per operation.
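Checking that arithmetic, using the figures quoted above:

```python
# Figures as quoted: ENIAC vs. a modern high-end GPU.
eniac_watts, eniac_ops = 150_000, 5_000   # ~150 kW, ~5,000 additions/s
gpu_watts, gpu_ops = 450, 80e12           # ~450 W, ~80 TFLOPS

rate_ratio = gpu_ops / eniac_ops                    # times the operation rate
power_ratio = eniac_watts / gpu_watts               # fraction of the power
efficiency_gain = (gpu_ops / gpu_watts) / (eniac_ops / eniac_watts)

print(f"{rate_ratio:.1e}")       # ~1.6e10, i.e. ~16 billion times the op rate
print(f"{power_ratio:.0f}")      # ~333, i.e. ~1/333 the power
print(f"{efficiency_gain:.1e}")  # ~5.3e12 times more operations per joule
```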
Given such increase being possible, one can expect a future computer being able to run our mental tasks level of calculation, with similar or better efficiency than us.
Furthermore, "turing machine" is an abstraction. Modern CPUs/GPUs aren't turing machines either, in a pragmatic sense, they have a totally different architecture. And our brains have yet another architecture (more efficient at the kind of calculations they need).
What's important is computational expressiveness, and nothing you wrote proves that the brain's architecture can't be modelled algorithmically and run on an equally efficient machine.
Even equally efficient is a red herring. If it's 1/10000 less efficient would it matter for whether the brain can be modelled or not? No, it would just speak to the effectiveness of our architecture.
Strictly for raw models, most now do train on chain-of-thought, but the planning step may need to be prompted in the harness or your own prompt. Since the model is autoregressive, once it generates a thing that looks like a plan it will then proceed to follow said plan, since now the best predicted next tokens are tokens that adhere to it.
Or, in plain English, it's fairly easy to give an AI something that is the practical functional equivalent of intent, and many real-world applications now do.
I'm not claiming that to be the case, merely pointing out that you don't appear to have a reasonable claim to the contrary.
> not even including the possibility that we have a soul or any other spiritual substrait.
If we're going to veer off into mysticism then the LLM discussion is also going to get a lot weirder. Perhaps we ought to stick to a materialist scientific approach?
if you want an llm to do any operations on your stuff, give it a role with access to only stuff you want it to be able to touch
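Where the provider doesn't offer scoped tokens, you can at least approximate least privilege in the harness itself, by only ever exposing an allow-listed set of operations to the agent. A minimal sketch (the operation names here are hypothetical):

```python
# Hypothetical allow-list: the only operations this agent can ever reach.
ALLOWED_OPS = {"domain_add", "domain_remove", "service_list"}

def call_api(op, handler, **kwargs):
    """Dispatch an operation only if it is on the allow-list. Anything
    destructive (e.g. a volume delete) simply isn't reachable, regardless
    of what the underlying credential would permit."""
    if op not in ALLOWED_OPS:
        raise PermissionError(f"operation {op!r} not granted to this agent")
    return handler(**kwargs)

# The agent can manage domains...
print(call_api("domain_add", lambda domain: f"added {domain}", domain="example.com"))

# ...but a destructive call fails closed.
try:
    call_api("volume_delete", lambda volume_id: None, volume_id="3d2c42fb")
except PermissionError as e:
    print(e)
```

This doesn't fix the provider's missing RBAC, but it means a confused agent has to break out of your wrapper, not just guess an API call.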
The LLM did have this capability at training time, but weights are frozen at inference time. This is a big weakness in current transformer architectures.
Do you have any rational objection to the definition? If you don't have, then I am afraid that you don't have a point.
Rather more sophisticated Retrieval Augmented Generation (RAG) systems exist.
At the moment it's very mixed bag, with some frameworks and harnesses giving very minimal memory, while others use hybrid vector/full text lookups, diverse data structures and more. It's like the cambrian explosion atm.
Thing is, this is probabilistic, and the influence of these memories weakens as your context length grows. If you don't manage context properly, (and sometimes even when you think you do), the LLM can blow past in-context restraints, since they are not 100% binding. That's why you still need mechanical safeguards (eg. scoped credentials, isolated environments) underneath.
And yet they don't, quite a lot of the time, and in a random way that is hard to predict or even notice sometimes (their errors can be important but subtle/small).
They're simply not reliable enough to treat as independent agents, and this story is a good example of why not.
Limited space to work with, highly context dependent and likely to get confused as you cover more surface area.
The tooling that invokes the model should really define some kind of guardrails. I feel like there's an analogy to be had here with the difference between an untyped program and a typed program. The typed program has external guardrails that get checked by an external system (the compiler's type checker).
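One could sketch such a guardrail as a small classifier that sits between the model and the shell, in the spirit of the "nah" taxonomy mentioned above (the rules and type names here are made up for illustration; a real tool would need a much richer taxonomy):

```python
import shlex

# Illustrative: delete targets considered routine cleanup.
SAFE_DELETE_TARGETS = {"__pycache__", "node_modules", ".pytest_cache"}

def classify(command: str) -> str:
    """Classify a shell command by what it actually does, so the harness
    can allow routine operations and escalate risky ones to a human."""
    argv = shlex.split(command)
    if argv[:2] == ["git", "push"]:
        # Force-pushing rewrites remote history: escalate.
        return "vcs_force_push" if "--force" in argv else "vcs_push"
    if argv[:1] == ["rm"]:
        targets = [a for a in argv[1:] if not a.startswith("-")]
        if targets and all(t.split("/")[-1] in SAFE_DELETE_TARGETS for t in targets):
            return "filesystem_delete_cache"  # routine cleanup: allow
        return "filesystem_delete"            # anything else: escalate
    return "unclassified"

print(classify("git push --force"))    # vcs_force_push
print(classify("rm -rf __pycache__"))  # filesystem_delete_cache
print(classify("rm ~/.bashrc"))        # filesystem_delete
```

Like a type checker, the point is that the check runs outside the thing being checked, so a creative model can't talk its way past it.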
And it's good that we can think that way, because we also follow the rules of statistical and quantum physics, which are inherently probabilistic. So, basically, you can say the same things about people. There's a nonzero (but extremely small) probability that I'll suddenly go mad and stab the next person. There's a nonzero (but even smaller) probability that I'll spontaneously erupt into a cloud of lethal pathogen that will destroy humanity. Yada yada.
Yet, nobody builds houses under the assumption that one of the occupants would transform into a lethal cloud, and for good reason.
Yes, it does sound a bit more absurd when we apply it to humans. But the underlying principle is very similar.
(I think this will be my last comment here because I'm just repeating myself.)
A person who better comprehends what they read might properly contextualize it within the larger conversation, where the point that stands is that LLMs and ceilings are both useful, that neither is doomed such that no one should use them, and that individual failures are somewhat uncommon and not a reason for others to avoid the category.
If by “functionally equivalent” you mean “can produce similar linguistic outputs in some domains,” then sure we’re already there in some narrow cases. But that’s a very thin slice of what brains do, and thus not functionally equivalent at all.
There are a few non-mystical, testable differences that matter:
- Online learning vs. frozen inference: brains update continuously from tiny amounts of data; LLMs do not.
- Grounding: human cognition is tied to perception, action, and feedback from the world. LLMs operate over symbol sequences divorced from direct experience.
- Memory: humans have persistent, multi-scale memory (episodic, procedural, etc.) that integrates over a lifetime. LLM “memory” is either weights (static) or context (ephemeral).
- Agency: brains are part of systems that generate their own goals and act on the world. LLMs optimize a fixed objective (next-token prediction) and don’t have endogenous drives.
It isn’t, because humans and current LLMs have radically different architectures:
LLMs: training and inference are two separate processes; weights are modifiable during training, static/fixed/read-only at runtime
Humans: training and inference are integrated and run together; weights are dynamic, continuously updated in response to new experiences
You can scale current LLM architectures as far as you want; they will never compete with humans, because they architecturally lack that dynamism.
Actually scaling to human level is going to require fundamentally new architectures. Some people are working on those, but it isn't clear whether any of them have succeeded yet.
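As a toy caricature of the frozen-vs-dynamic contrast described above (purely illustrative classes, not real model code):

```python
# "LLM-style": weights are fixed at inference; prediction never changes them.
# "Brain-style": every prediction can be followed by a weight update.

class FrozenModel:
    def __init__(self, weight: float):
        self.weight = weight

    def predict(self, x: float) -> float:
        return self.weight * x  # inference only; self.weight never changes

class OnlineModel(FrozenModel):
    def predict(self, x: float, target=None, lr: float = 0.1) -> float:
        y = self.weight * x
        if target is not None:
            # learning is interleaved with inference
            self.weight += lr * (target - y) * x
        return y
```

The frozen model can answer forever but never incorporates what it just saw; the online model adjusts after every observation, which is the "dynamism" the comment is pointing at.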
It's not a real reasoning step; it's a sequence of steps, carried out in English rather than in the same "internal space" as human thought (every time the model outputs a token, the entire internal state vector, and all the possibilities it represents, is reduced to one concrete token), that looks like reasoning. But it is still, as you say, autoregressive.
And thus, in plain English, it is determined entirely by the prompt and the random initial seed. I don't know what that is, but I know it's not intent.
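The "collapse" being described can be shown in miniature: a rich distribution over the vocabulary is reduced to a single concrete token at each step, and everything else the distribution represented is discarded. The vocabulary and logits here are made up:

```python
import math
import random

VOCAB = ["yes", "no", "maybe"]

def softmax(logits):
    """Turn raw scores into a probability distribution."""
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def emit_token(logits, rng):
    """Sample one token: the whole distribution collapses to a
    single concrete output before the next step begins."""
    probs = softmax(logits)
    return rng.choices(VOCAB, weights=probs, k=1)[0]

token = emit_token([2.0, 0.5, 0.1], random.Random(0))
```

Given the prompt (which fixes the logits) and the seed, the output sequence is fully determined, which is the sense in which it's "prompt plus seed, not intent."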
AlphaZero isn’t an LLM. There are feed-forward networks, recurrent networks, convolutional networks, transformer networks, generative adversarial networks.
Brains have many different regions each with different architectures. None of them work like LLMs. Not even our language centres are structured or trained anything like LLMs.
If this is our only point of disagreement, then we don't actually disagree. I understand "strong engineering control" to mean "something that reduces incidence of a failure mode to an acceptable level".
I'm going to be frank, you are the person who misunderstands (and are being rather rude about it). You are responding to an argument no one is making.
To put a fine point on it, you said this:
> Entropy may mean all ceilings collapse eventually, but that doesn't mean we aren't able to make useful ceilings.
But you were responding to a comment saying this:
> Except your ceiling can and will fall on you unless you take preventative measures, entirely due to molecular interactions within the material.
Emphasis added. They are saying maintenance is necessary, not that a safe ceiling is unachievable. It's obviously achievable, we've all seen it achieved.
They further say:
> It boggles the mind to let an LLM have access to a production database without having explicit preventative measures and contingency plans for it deleting it.
Emphasis added. When they say it boggles the mind to deploy an LLM without the proper measures, the implication is that it does make sense to deploy it with the proper measures.
> ...the point that stands is that LLMs and ceilings are both useful, neither are doomed such that no one should use them, ...
I have not seen a single person in this subthread say that LLMs aren't useful or that they are doomed. People say that. But the people you're talking to haven't.
I try to avoid these petty "I brought the receipts" comments, but I don't like the way you're being snarky to people whose crime is engaging with the premises you set up. The faults you are finding are faults you introduced. I'd appreciate it if you would avoid that in the future.
The person I replied to made a definite claim (that we are "very obviously not ...") for which no evidence has been presented and which I posit humanity is currently unable to definitively answer in one direction or the other.
Anthropomorphism and Anthropodenial are two different forms of Anthropocentrism.
But the really interesting story to me is when you look at the LLM in its own right, to see what it's actually doing.
I'm not disputing the autoregressive framing. I fully admit I started it myself!
But once we're there, what I really wanted to say (just like Turing and Dijkstra did) is that the really interesting question isn't "is it really thinking?" but rather: what is this kind of process doing, is it useful, what can we do or play with it, and, relevant to this particular story, what can go (catastrophically) wrong?
It actually seems like they knew ahead of time and proceeded anyway, but are just using this critique as a way to shift blame.
Both have mass, both are carbon based, both contain DNA/RNA, both are surprisingly over 50% water, both are food, and both can be tasty when served right.
In other respects, they are not.
In many cases, one or the other would do. In other cases, you want something more special (e.g. more protein, or less fat).
True, but we have RAG to offset that.
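The RAG idea being referenced, in minimal form: retrieve documents relevant to the query and splice them into the prompt, so the answer can come from fresh context rather than frozen weights. Real systems use embeddings and vector search; plain keyword overlap stands in here, and all documents and names are invented:

```python
# Minimal RAG sketch: keyword-overlap retrieval + prompt assembly.

DOCS = [
    "The service was migrated to region eu-west-1 in 2024.",
    "Backups run nightly and are stored in a separate account.",
]

def retrieve(query: str, docs=DOCS, k=1):
    """Rank documents by word overlap with the query (toy scoring)."""
    words = set(query.lower().split())
    scored = sorted(
        docs,
        key=lambda d: len(words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query: str) -> str:
    """Splice retrieved context ahead of the question."""
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}"
```

This offsets stale weights only to the extent that retrieval actually surfaces the right document, which is exactly the failure mode discussed further down the thread.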
> it architecturally lacks their dynamism
We'll get there eventually. Keep in mind that the brain is now about 300k years into fine-tuning itself as the species we classify as Homo sapiens. LLMs haven't even been around for 5 years yet.
That's irrelevant though, since all the above are still prediction machines based on weights.
If you're ok with the brain being that, then you just changed the architecture (from LLM-like), not the concept.
If you want to take a comb to it, the comment saying this:
> Except your ceiling can and will fall on you unless you take preventative measures, entirely due to molecular interactions within the material
Was already off the plot. What was being discussed wasn't some specific molecular process, it was the false premise "oh molecules move around randomly so your ceiling might just collapse of its own accord because the beam decided to randomly disintegrate". That's not something that happens.
You said "The sequence of tokens that would destroy your production environment can be produced by your agent, no matter how much prompting you use". This is analogous to "the ceiling could just collapse on you due to random molecular motion, no matter how much maintenance you do or what materials you use".
Make sense now?
Your edit at the bottom of your top comment does better than your original statement.
We do not have all the answers or a complete understanding of everything.
Language came after conceptual modeling of the world around us. We're surrounded by social species with theory of mind and even the ability to recognise themselves and communicate with each other, but none of them have language. Even the communications faculties they have operate in completely different parts of their brains than ours with completely different structure. Actually we still have those parts of the brain too.
Conceptual representation and modeling came first, then language came along to communicate those concepts. LLMs are the other way around, linguistic tokens come first and they just stream out more of them.
This is why Noam Chomsky was adamant that what LLMs are actually doing in terms of architecture and function has nothing to do with language. At first I thought he must be wrong, he mustn't know how these things work, but the more I dug into it the more I realised he was right. He did know, and he was analysing this as a linguist with a deep understanding of the cognitive processes of language.
To say that brains are language models you have to ditch completely what the term language model actually means in AI research.
In practice that doesn’t always work… I’ve seen cases where (a) the answer is in the RAG but the model can’t find it because it didn’t use the right search terms; embeddings and vector search reduce the incidence of that but cannot eliminate it; (b) the model decided not to use the search tool because it thought the answer was so obvious that tool use was unnecessary; (c) the model doubts, rejects, or forgets the tool call results because they contradict the weights; (d) contradictions between data in the weights and data in the RAG produce contradictory or ineloquent output; (e) the data in the RAG is overly diffuse and the tool fails to surface enough of it to produce the kind of synthesis you’d get if the same info were in the weights.
This is especially the case when the facts have changed radically since the model was trained, e.g. “who is the Supreme Leader of Iran?”
> We'll get there eventually. Keep in mind that the brain is now about 300k years into fine-tuning itself as this species classified as homo sapiens. LLMs haven't even been around for 5 years yet.
We probably will eventually, but I doubt we’ll get there purely by scaling existing approaches. More likely, novel ideas nobody has even thought of yet will prove essential, and a human-level AI model will have radical architectural differences from the current generation.
An LLM is a specific neural architectural structure and training process. Brains are also neural networks, but they are otherwise nothing at all like LLMs and don't function the ways LLMs do architecturally other than being neural networks.
I'd encourage to desist from rudeness, not just when people point it out to you, but at all times.
> You said "The sequence of tokens that would destroy your production environment can be produced by your agent, no matter how much prompting you use". This is analogous to "the ceiling could just collapse on you due to random molecular motion, no matter how much maintenance you do or what materials you use".
If prompt engineering is effective (analogous to performing the necessary maintenance and selecting the correct materials), I'm curious what your explanation is for the incident in the article?
Except it does happen. That’s why buildings get condemned and eventually turn to rubble.
To the exact point; I have a product from a couple years ago using an old model from OpenAI. It’s still running and all it does is write a personality report based on scores from the test. I can’t update the model without seriously rewriting the entire prompt system, but the model has degraded over the years as well. Ergo, my product has degraded of its own accord and there is nearly nothing I can do about it. My only choice is to basically finagle newer models into giving the correct output; but they hallucinate at much higher rates than older models.
I desire neither to be inauthentic, nor to suppress my emotions.
> If prompt engineering is effective (analogous to performing the necessary maintenance and selecting the correct materials), I'm curious what your explanation is for the incident in the article?
Keeping with the analogies, the original article doesn't say whether they built the roof properly or if they just used some screws to hold up a piece of quarter-inch plywood and called it a day.
It's no surprise that a terribly built roof may fall down. It's possible to get shoddy materials from a supplier without knowing.
Calling a curl command isn't something that would be flagged in the model's training as "this deletes things, don't do it". The fact that this happened is not, to me, evidence that the model might have equally run `sudo rm -rf --no-preserve-root /` under similar circumstances.
It sounds like the phrase "NEVER FUCKING GUESS!" was in the prompt as well, which could easily encourage the model towards "be sure of yourself, take action" instead of the "verify" that was meant.
As mentioned elsewhere in this thread, the fact that the article focuses so strongly on "the model confessed! It admitted it did the wrong thing!" doesn't lead me to put a ton of stock into the capability of the author to be cautious.