Postgres transactions are a distributed systems superpower

So my understanding is that they're aligning the workflow progression unit and the database commit unit on a one-to-one basis. In other words, each step in the workflow becomes a database commit unit. That's why the outbox pattern gets simplified. But in exchange, the database itself becomes tightly coupled to the workflow, which will make it architecturally difficult to separate later on. Although, to be fair, I almost never actually need to separate the database anyway.

In most services, I often swap out the message broker or the workflow engine, but the database almost always stays the same.

I'm not sure if I've understood this correctly.

Congratulations, you discovered a mutex.

Is it really a distributed system or just a bunch of services with a central database?

Can you use postgres as a state store for a distributed application?

It seems this article is trending toward that view: If you can maintain transactional consistency along with application workflow state, then would this generalize to maintaining distributed application state in general?

The follow-up would be: Would this be preferable to Valkey/Redis?

I don't understand the last point about the UDF. Either you need the state to be updated atomically across different systems or you don't. But writing a row in one system in order to update the second one at some arbitrary time in the future isn't really much different from enqueuing a job in a queue.

> Is it really a distributed system or just a bunch of services with a central database?

I don't think it's true that distributed and decentralized mean the same thing. A hub-and-spoke rail system is centralized, but it's still a distributed system if it has multiple trains running concurrently.* A distributed system has to coordinate somehow, and a single central DB is one way of doing it.

*: edit, maybe a better example here: a rail system with a single central dispatcher is centralized but may still be distributed

Yes, the core design is building a workflow system on a database--essentially, replacing the central orchestrator most workflow systems use with a Postgres database. This previous blog post goes into more detail: https://www.dbos.dev/blog/postgres-is-all-you-need-for-durab... (HN discussion: https://news.ycombinator.com/item?id=48313530)

> then would this generalize to maintaining distributed application state in general?

Yes, in the sense of 'too good to be true'

Yes you can - usually I think it's advisable to wrap postgres in a shim application to provide a consistently defined surface you can control but postgres can absolutely serve as the authority node on data correctness.

As to which technical solution would be optimal there are a bunch of factors to consider and I think preferences around features could lead you to a variety of options. Postgres is excellent as long as you're minimizing the amount of data piping directly through it or operating at a reasonable scale.

Your intuition sounds right to me.

This sounds a lot like reinventing a message queue. Someone trying this in the future might learn painful lessons about ordering, commits, partitioning, dead-letter-queues, replayability, don't-call-me-I'll-call-you, and anything else a Kafka-like comes with out of the box.

The key is that the UDF's enqueue is transactional with the database update. Let's say the database update is inserting a new order. This provides the guarantee that if a new order is inserted, a job to process the order is also enqueued. It's impossible for a new order to be inserted without its processing job also being enqueued. Then the durable workflow/queue system is responsible for making sure the processing job, once enqueued, actually executes.

In fact - if you're building a very large distributed system the goal is usually to shrink that centralized component to the smallest and most robust surface you can. If the system is well designed it is amazing just how much consistency power you can get from a tiny component of centralization.

There are always tradeoffs of course, but building a truly decentralized system requires some really difficult compromises on correctness. The Two Generals' Problem is a great piece of reading on this topic: distribution always requires compromises in general, but fully removing an authority on truth gets quite tricky.

Exactly! It's a distributed system, with many processes performing work in parallel, with a central database as a coordination point, used as little as possible. A mutex wouldn't get quite the same performance :)

And if that job never runs? Or if that job runs and then fails to commit that it ran in postgres?

A more modern term is that your system is a single "architectural quantum".

Neal Ford calls this a distributed monolith because a change to a database schema can break every single service at once, but there are very valid uses of this method.

There are decades of books on the footguns, as we used this approach even back in the client-server days.

One suggestion I have is to research where the first version of SOA failed, especially as these systems tend to erode into Enterprise Service Buses.

Products like Apache Airflow tend to have value not because of the persistence layer, but because they force workflows into DAGs, which is an enforceable structural constraint, while SQL, being declarative, can sometimes force you into trying to enforce governance through observing behavior.

The former is not subject to Rice’s theorem, while the latter is.

If you actively control for these it will greatly increase the lifetime of this system before (or if) you reach the point you have to replace the system.

> The Two Generals' Problem is a great piece of reading on this topic

It is!

And the solution is to add an extra general on the left side. Let's call him Outus Boxus. The two generals on the left side can communicate in perfect lockstep. Then if you need the general on the right to find out about something, you can send a few workers to tell him or something...

More seriously though, you can have a DS for two reasons: tech or political.

Tech means scaling or reliability. So clients can be serviced by any of the nodes.

Political means different actors don't have a central authority. You can't stick two banks into one db.

This technique doesn't seem to address either aspect.

I think DuckLake[1] is a terrific example of this. They said "look, let's build a lakehouse over S3, but for the bit that needs strong consistency (the manifest of which S3 blobs are in play), let's use Postgres". Postgres as a metadata catalog or control plane is brilliant for this, since you get strong consistency and the scaling story around a metadata catalog is far different from that of the volume of data you need to store. Use S3 for volume, Postgres for consistent metadata.

A similar pattern has spilled out of projects like WarpStream[2], which I suspect is using Postgres behind the scenes of their control plane.

[1]: https://ducklake.select

[2]: https://www.warpstream.com/

I have built and maintain a system that uses a very similar approach: we register artifacts with UUIDs in S3 under a strict write-once, never-edit, never-remove policy and then store those UUIDs in Postgres. We simply juggle the connections between other model objects and those UUIDs as needed, which gives us safe guarantees without burdening the centralized system with the massive volume (these artifacts are often 50MB+ PDFs). I am quite fond of this approach, but it's good to be aware that introducing levels of abstraction like this does necessarily widen some failure points on the storage side: if your service uses multiple persistence stores, each additional store exposes yet another point where inconsistency could be introduced and/or a message could be lost. Still, fragmenting your data over multiple stores that are particularly well suited to their specialized uses can be huge for performance and cost.

A few weeks ago, we wrote that you should “just use Postgres” for durable workflows.

That post generated a lot of discussion, but also a misunderstanding. We didn't just mean you should use a workflow engine that stores state in Postgres. We meant your workflow system can, and often should, live inside the same Postgres database as your application.

At first glance, this doesn’t sound like a good idea. Shouldn’t those concerns be separated? Shouldn’t workflow state live in one database and application data in another?

Maybe not.

In distributed systems, co-location is a superpower. When workflow metadata and application data live in the same Postgres database, they can be updated in the same database transaction. That means a partial failure, in which one is updated but the other is not, is no longer possible, making it far easier to build workflows that correctly handle all edge cases.

In this post, we'll explain why that's possible, and how transactions can simplify tough problems like idempotency and atomicity.

Idempotency with Transactional Steps

One fundamental challenge in distributed systems is idempotency, especially for operations that modify database state.

Durable workflows achieve fault tolerance by checkpointing the result of each step after it completes. If a workflow is interrupted, it resumes from its last checkpointed step instead of starting from the beginning. However, a workflow may be interrupted after completing a step but before recording its checkpoint. When it recovers, it has no record that the step already ran and will execute it again.

As a result, durable workflows alone do not solve the idempotency problem. Workflow engines typically require steps to be idempotent so they can safely be retried without duplicate side effects. For example, consider a step that credits (adds money to) a bank account. This is not an idempotent operation: if a step adds $100 to an account, fails, reruns, and adds $100 again, then a total of $200 is added to the account, which is not correct.
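The double-credit failure described above can be reproduced in a few lines. This is only a minimal sketch: it uses Python's built-in SQLite as a stand-in for Postgres so that it runs anywhere, an in-memory set as a stand-in for an external checkpoint store, and hypothetical names (`credit`, `run_step`, the `accounts` table) that do not come from the post.

```python
import sqlite3

def credit(conn, account_id, amount):
    """Non-idempotent step: every call adds `amount` again."""
    with conn:  # commits on success, rolls back on exception
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?",
            (amount, account_id),
        )

def run_step(conn, checkpoints, step_id, account_id, amount, crash=False):
    """Run the step, then checkpoint it as a separate, later action."""
    if step_id in checkpoints:
        return  # already recorded as done, so skip it
    credit(conn, account_id, amount)  # the database update commits here
    if crash:
        # Simulated interruption: the update is durable, the checkpoint is not.
        raise RuntimeError("crashed after the update, before the checkpoint")
    checkpoints.add(step_id)

conn = sqlite3.connect(":memory:")  # assumption: SQLite stands in for Postgres
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.execute("INSERT INTO accounts VALUES ('alice', 0)")
conn.commit()

checkpoints = set()  # assumption: stand-in for a checkpoint store outside the database
try:
    run_step(conn, checkpoints, "credit-1", "alice", 100, crash=True)
except RuntimeError:
    pass
# On recovery there is no checkpoint, so the step runs a second time.
run_step(conn, checkpoints, "credit-1", "alice", 100)

balance = conn.execute("SELECT balance FROM accounts WHERE id = 'alice'").fetchone()[0]
print(balance)  # 200: the account was credited twice
```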

The most common solution is to add application-level bookkeeping to guard against this. For example, you can add an additional applied_payments table to keep track of which payments have been applied, update it transactionally, and check against it to make sure you never credit an account twice:

[Code image in the original post: application-level bookkeeping code example]

When workflow state and application data are co-located in the same Postgres database, we can eliminate much of this complexity. Instead of checkpointing a step after its database transaction commits, a co-located workflow engine can write the step checkpoint and perform the database update in the same transaction.

To do this, the workflow executes the step using a database transaction provided by the workflow engine. The step performs its database updates, the workflow engine records the checkpoint, and the whole transaction commits atomically:

[Code image in the original post: workflow idempotency code example]
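As a rough illustration of the idea (not the DBOS API, which is not shown in this text), the sketch below writes the step's database update and its checkpoint in one transaction. It assumes SQLite as a stand-in for Postgres, and `run_transactional_step`, `step_checkpoints`, and the step functions are hypothetical names.

```python
import sqlite3

def run_transactional_step(conn, workflow_id, step_id, step_fn):
    """Run `step_fn` and record its checkpoint in the same transaction.

    Returns True if the step ran, False if it was already checkpointed.
    """
    with conn:  # commits both writes together, or rolls both back
        done = conn.execute(
            "SELECT 1 FROM step_checkpoints WHERE workflow_id = ? AND step_id = ?",
            (workflow_id, step_id),
        ).fetchone()
        if done:
            return False
        step_fn(conn)  # the application update uses the engine's transaction
        conn.execute(
            "INSERT INTO step_checkpoints (workflow_id, step_id) VALUES (?, ?)",
            (workflow_id, step_id),
        )
        return True

def credit_alice(conn):
    conn.execute("UPDATE accounts SET balance = balance + 100 WHERE id = 'alice'")

def credit_then_crash(conn):
    credit_alice(conn)
    raise RuntimeError("simulated crash before commit")

def get_balance(conn):
    return conn.execute("SELECT balance FROM accounts WHERE id = 'alice'").fetchone()[0]

conn = sqlite3.connect(":memory:")  # assumption: SQLite stands in for Postgres
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.execute(
    "CREATE TABLE step_checkpoints ("
    "workflow_id TEXT, step_id TEXT, PRIMARY KEY (workflow_id, step_id))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 0)")
conn.commit()

try:
    run_transactional_step(conn, "wf-1", "credit", credit_then_crash)
except RuntimeError:
    pass
after_crash = get_balance(conn)  # the update was rolled back with the checkpoint

ran = run_transactional_step(conn, "wf-1", "credit", credit_alice)     # recovery
ran_again = run_transactional_step(conn, "wf-1", "credit", credit_alice)  # replay
final = get_balance(conn)
print(after_crash, ran, ran_again, final)  # 0 True False 100
```

Note that the step function contains no idempotency logic of its own. The guarantee comes entirely from the update and the checkpoint sharing a commit.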

By making the database update and checkpoint write part of the same transaction, the workflow engine can provide exactly-once execution semantics for transactional steps:

If the transaction commits, both the database update and the checkpoint are durably recorded, guaranteeing the step will never run again.
If any failure occurs before commit, the entire transaction is rolled back, including both the database update and the checkpoint. When the workflow recovers, it safely re-executes the step from the beginning.

This eliminates the window in which a database update can succeed without a corresponding checkpoint. As a result, transactional steps no longer need application-level idempotency logic or bookkeeping tables. The database operation either happens exactly once and is checkpointed, or it does not happen at all.

Atomicity with a Transactional Workflow Outbox

Another classic challenge in distributed systems is reliably performing updates in multiple systems, for example, updating a database record and sending a notification to another system. This is trickier than it sounds because the operations need to be atomic: either both happen or neither does, even if there are failures (such as process crashes or network glitches) while performing them.

For example, whenever a customer submits a new order, we may also want to start a workflow that sends the order to a warehouse for fulfillment. Without atomicity, the database and the downstream system may become inconsistent. The order might be submitted without a warehouse being notified, or a warehouse might be notified about an order that was never committed.

The most common solution to this problem is the transactional outbox. The idea is to add a new “outbox” table to the database. When we need to perform an atomic update, we run a single database transaction that both:

Updates the database record
Writes a message to the “outbox” table

A separate background process then polls the outbox table and delivers the messages in it to the target system.

Here’s an example of what that might look like:

[Code image in the original post: transactional outbox pattern SQL example]
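Since the original SQL is not reproduced here, the following is a minimal sketch of the pattern under stated assumptions: SQLite stands in for Postgres, and the `orders` and `outbox` schemas and the `submit_order` and `deliver_pending` helpers are hypothetical. It shows both halves: the single transaction that writes the record and the message, and a poller that delivers pending messages.

```python
import json
import sqlite3

def submit_order(conn, order_id, item):
    """Insert the order and its outbox message in one transaction."""
    with conn:  # both inserts commit together or not at all
        conn.execute("INSERT INTO orders (id, item) VALUES (?, ?)", (order_id, item))
        conn.execute(
            "INSERT INTO outbox (payload) VALUES (?)",
            (json.dumps({"order_id": order_id, "item": item}),),
        )

def deliver_pending(conn, send):
    """One polling pass: deliver undelivered messages, oldest first.

    If `send` raises, the row stays pending and is retried on the next pass.
    If the process dies after `send` but before the UPDATE commits, the message
    is sent again later, so delivery is at-least-once and the receiver must
    tolerate duplicates.
    """
    rows = conn.execute(
        "SELECT id, payload FROM outbox WHERE delivered = 0 ORDER BY id"
    ).fetchall()
    for msg_id, payload in rows:
        send(json.loads(payload))
        with conn:
            conn.execute("UPDATE outbox SET delivered = 1 WHERE id = ?", (msg_id,))
    return len(rows)

conn = sqlite3.connect(":memory:")  # assumption: SQLite stands in for Postgres
conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, item TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE outbox ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "payload TEXT NOT NULL, "
    "delivered INTEGER NOT NULL DEFAULT 0)"
)
conn.commit()

submit_order(conn, "order-1", "widget")

sent = []  # stand-in for the warehouse system
first_pass = deliver_pending(conn, sent.append)
second_pass = deliver_pending(conn, sent.append)  # nothing left to deliver
print(sent, first_pass, second_pass)
```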

Performing the database record update and writing the message to the “outbox” table in one transaction guarantees atomicity: either both records are updated or neither is. Once a message is written to the outbox, it can be delivered asynchronously, even if failures occur after the transaction commits.

The transactional outbox is widely used, but it introduces additional operational complexity. You need infrastructure to poll the outbox, deliver messages, handle retries, and monitor failures. If the workflow engine is a separate system, it can drift out of sync with the database. In practice, resolving discrepancies requires additional infrastructure such as reconciliation jobs to detect database records that were updated without sending notifications to downstream systems.

By leveraging database-backed workflows and co-locating workflow state with application data, we can simplify this pattern. Instead of manually maintaining an outbox table and a separate polling process, we use a Postgres user-defined function (UDF) to enqueue a workflow in the same database transaction as the application update:

[Code image in the original post: DBOS transactional outbox pattern atomicity example]
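The original example is not reproduced here, and the real `enqueue_workflow` is a Postgres UDF whose signature is not shown in this text. The sketch below therefore only imitates the idea: a hypothetical Python helper of the same name inserts the workflow row through the caller's open transaction, with SQLite standing in for Postgres. The table layout, status values, and `dequeue_and_run` worker are all assumptions.

```python
import json
import sqlite3

def enqueue_workflow(conn, name, queue, workflow_input):
    """Hypothetical stand-in for the UDF: add a workflow row inside the
    caller's open transaction. It deliberately does not commit."""
    conn.execute(
        "INSERT INTO workflow_queue (name, queue, input) VALUES (?, ?, ?)",
        (name, queue, json.dumps(workflow_input)),
    )

def submit_order(conn, order_id, item, fail_before_commit=False):
    with conn:  # the order and the workflow row commit together
        conn.execute("INSERT INTO orders (id, item) VALUES (?, ?)", (order_id, item))
        enqueue_workflow(conn, "fulfill_order", "warehouse", {"order_id": order_id})
        if fail_before_commit:  # demo knob to simulate a crash
            raise RuntimeError("simulated crash before commit")

def dequeue_and_run(conn, handlers):
    """Worker pass: run the oldest enqueued workflow and mark it done."""
    row = conn.execute(
        "SELECT id, name, input FROM workflow_queue "
        "WHERE status = 'ENQUEUED' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    wf_id, name, raw_input = row
    result = handlers[name](json.loads(raw_input))
    with conn:
        conn.execute("UPDATE workflow_queue SET status = 'SUCCESS' WHERE id = ?", (wf_id,))
    return result

def count(conn, table):
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

conn = sqlite3.connect(":memory:")  # assumption: SQLite stands in for Postgres
conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, item TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE workflow_queue ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT NOT NULL, queue TEXT NOT NULL, "
    "input TEXT NOT NULL, status TEXT NOT NULL DEFAULT 'ENQUEUED')"
)
conn.commit()

try:
    submit_order(conn, "order-1", "widget", fail_before_commit=True)
except RuntimeError:
    pass  # neither the order nor the workflow row survives the rollback
submit_order(conn, "order-2", "gadget")

orders, queued = count(conn, "orders"), count(conn, "workflow_queue")
handlers = {"fulfill_order": lambda inp: f"shipped {inp['order_id']}"}
result = dequeue_and_run(conn, handlers)
idle = dequeue_and_run(conn, handlers)
print(orders, queued, result, idle)  # 1 1 shipped order-2 None
```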

This follows the same principles as the transactional outbox. The workflow is represented by a database row containing its name, queue, and input. The enqueue_workflow UDF creates this row in the same transaction as the user database update, guaranteeing atomicity: either the update completes and the workflow is enqueued, or neither happens. Then, a worker dequeues and executes the workflow asynchronously, reliably performing the required operations.

Learn More

If you like building scalable, reliable systems, we’d love to hear from you. At DBOS, our goal is to make Postgres-backed durable execution as simple and performant as possible. Check it out:

Quickstart: https://docs.dbos.dev/quickstart
GitHub: https://github.com/dbos-inc
Discord community: https://discord.gg/eMUHrvbu67