It would have been nice if the article compared yjs with automerge and others. Jsonjoy, in particular, appears very impressive. https://jsonjoy.com/
That's why I created prosemirror-collab-commit.
I’ve spent 3+ years fighting the same problems while building DocNode and DocSync, two libraries that do exactly what you describe.
DocSync is a client-server library that synchronizes documents of any type (Yjs, Loro, Automerge, DocNode) while guaranteeing that all clients apply operations in the same order. It’s a lot more than 40 lines because it handles many things beyond what’s described here. For example:
It’s local-first, which means you have to handle race conditions.
Multi-tab synchronization works via BroadcastChannel even offline, which is another source of race conditions that needs to be controlled.
DocNode is an alternative to Yjs, but with all the simplicity that comes from assuming a central server. No tombstones, no metadata, no vector clock diffing, supports move operations, etc.
I think you might find them interesting. Take a look at https://docukit.dev and let me know what you think.
It is very true that there are nuances you have to deal with when using CRDT toolkits like Yjs and Automerge - the merged state is "correct" as a structure, but may not match your schema. You have to handle that in your application (ProseMirror does this for you, if you want it, and can live with the invalid nodes being removed)
You can't have your cake and eat it with CRDTs, just as you can't with OT. Both come with compromises and complexities. Your job as a developer is to weigh them for the use case you are designing for.
One area in particular that I feel CRDTs may really shine is in agentic systems. The ability to fork+merge at will is incredibly important for async long running tasks. You can validate the state after an agent has worked, and then decide to merge to main or not. Long running forks are more complex to achieve with OT.
There is some good content in this post, but it's leaning a little too far towards drama creation for my taste.
EDIT: I live in Seattle and it is 12:34, so I must go to bed soon. But I will wake up and respond to comments first thing in the morning!
There seems to be a conflict of interest with describing Yjs's performance, which (along with Automerge) basically does the same thing.
const result = step.apply(this.doc);
if (result.failed) return false;
I suspect this doesn't work. In theory you can write better bindings yourself. In practice, if the official path falls over under normal editing, telling people to just do more integration work sounds a lot like moving the goalposts.
I think it's defensible to say that this point in particular is not indicting CRDTs in general because I do say the authors are trying to fix it, and then I link to the (unpublicized) first PR in that chain of work (which very few people know about!), and I specifically spend a whole paragraph saying I hope that I am forced to write an article in a year about how they figured it all out! If I was trying to be disingenuous, why do any of that?
In general, the client implementation of collab is pretty simple. Nearly all of the subtlety lies in the server. But it, too, is generally not a lot of code, see for example the author's implementation: https://github.com/ProseMirror/website/tree/master/src/colla...
But there's another issue that the author hasn't even considered, and possibly it's the root cause why ProseMirror (which I'd never heard of before, btw) does the thing the author thinks is broken... Say you have a document like "请来 means 'please go'" and independently both the Chinese and English collaborators look at that and realise it's wrong. One changes it to "请走 means 'please go'" and the other changes it to "请来 means 'please come'". Those changes are in different spans, and so a merge would blindly accept both, resulting in "请走 means 'please come'", which is entirely different from the original, but just as incorrect. Depending on how much other interaction the authors have, this could end up in a back and forth of both repeatedly changing it, so the merged document always ended up incorrect, even though individually both authors had made valid corrections.
That example seems a bit hypothetical, but I've experienced the same thing in software development where two BAs had created slightly incompatible documents stating how some functionality should work. One QA guy kept raising bugs saying "the spec says it should do X", the dev would check the cited spec and change the code to match the spec. Weeks later, a different QA guy with a different spec would raise a bug saying "why is this doing X? The spec says it should do Y", a different dev read the cited spec, and changed the code. In this case, the functionality flip-flopped about 10 times over the course of a year and it was only a random conversation one day where one of them complained about a bug they'd fixed many times and the other guy said "hey, that bug sounds familiar" and they realised they were the two who'd been changing the code back and forth.
This whole topic is interesting to me, because I'm essentially solving the same problem in a different context. I've used CRDT so far, but only for somewhat limited state where conflicts can be resolved. I'm now moving to a note-editing section of the app, and while there is only one primary author, their state might be on multiple devices and because offline is important to me, they might not always be in sync. I think I'm probably going to end up highlighting conflicts, I'm not sure. I might end up just re-implementing something akin to Quill's system of inserts / deletes.
And for real "action" there should be a delay/pause button to simulate conflicts like the ones described in the blog
BUT, since you mention it, I'll say a bit here. It sounds like you have your own experience, and we'd love to hear about that. But OUR experience was: (1) we found (contrary to popular belief) that OT actually does not require a centralized server, (2) we found it to be harder to implement OT exactly right vs CRDTs, and (3) we found many (though not all) of the problems that CRDTs have, are also problems in practice for OT—although in fairness to OT, we think the problems CRDTs have in general are vastly worse to the end-user experience.
If there's interest I'm happy to write a similar article entirely dedicated to OT. But, for (3), as intuition, we found a lot of the problems that both CRDTs and OT have seem to arise from a fundamental impedance mismatch between the in-memory representation of the state of a modern editor, and the representation that is actually synchronized. That is, when you apply an op (CRDT) or a transform (OT), you have to transform the change into a (to use ProseMirror as an example) valid `Transaction` on an `EditorState`. This is not always easy in either case, and to do it right you might have to think very hard about things like "how to preserve position mappings," and other parts of editor state that are crucial to (say) plugins that manage locations of comment marks or presence cursors.
With all of that said, OT is definitely much closer to what modern editors need, in my opinion at least. The less-well-known algorithm we ended up recommending here (which I will call "Marijn Collab", after its author) is essentially a very lightweight OT, without the "transformation" step.
That is one hot take!
The feedback about the delay/pause button is also good, thanks!
[1]: https://github.com/gritzko/librdx/tree/master/rdx
[2]: https://github.com/gritzko/librdx/tree/master/json
Try to understand 3.1-3.4 in this paper, and you'll find that the correctness proof doesn't prove anything.
In particular, when they define <_c, they do this in terms of rule1, rule2, and rule3, but these are defined in terms of <_c, so this is just a circular definition, and therefore actually not a definition at all, but just wishful thinking. They then prove that <_c is a total order, but that proof doesn't matter, because <_c does not exist with the given properties in the first place.
I also tried out the behaviour of their example. Slowing the sync time down to 3 seconds, and then typing "Why not" and then waiting for it to sync before adding " do this?" on client A and " joke?" on client B. The result was "Why not do this? joke?" when I'd have hoped that this would have been flagged as a conflict. Similarly, starting with "Why not?" and adding both " do this" and " joke" in the different clients produced "Why not do this joke?" even though to me, that should have been a conflict - both were inserting different content between "t" and "?".
Finally, changing "do" to "say" in client A and THEN changing "do" to "read" in client B before it updated, actually resulted in a conflict in the log window and the resultant merge was "Why not rayead this joke?" Clearly this merge strategy isn't that great here, as it doesn't seem to be renumbering the version numbers based on the losing side (or I've misunderstood what they're actually doing).
But the product seems much more narrow than an actual tool to run the whole business in markdown. I was hoping to see Logseq on steroids, and it feels like a tool builder primarily. I love the tool building aspect, but the fundamentals of simply organizing docs (docs, presentations, assets etc, the basics of a business) are either not part of the core offering or not presented well at all.
I love the idea of building custom tools on top of MD and it's part of my wishlist, but I feel a little deceived by your tagline so I wanted to share that :)
The logic that makes sense to me is that you are using your own framing (Moment.dev will later be paid and people will be customers) to interpret Yjs.
Moreover, the 'social proof' posted later in the thread by 'auggierose' and 'skeptrune': - https://news.ycombinator.com/item?id=47396154 - https://news.ycombinator.com/item?id=47396139
appears, to me, to be manufactured. The degree of consolidation I've noticed in this 'SF/Bay Area tech cult' (though I am unsure if others are aware of it), which tries to help other members at the expense of quality and grows network wealth through favoritism rather than adherence to quality, runs counter to the interests of users who want high-quality software without capture.
While you may not like me describing this, it is not in your own interest to do this because it catabolizes the base layer that would sustain you. Social media catabolizes actual social networks, as AI catabolizes those who write information online. Behavior like this ruins the public commons over time.
EDIT: I will say I'm not against AI writing tools or anything like that. But, for better or worse, that's just not what happened here.
DocSync, the sync engine (mainly designed with DocNode in mind), is, I would say, a bit less mature.
I’d love it if you could take a look and see if there’s anything that doesn’t convince you. I’ll be happy to answer any questions.
If you are using a centralized server and ProseMirror, there are several OT and pseudo-OT implementations. Most popularly, there is prosemirror-collab[4], which is basically "OT without the stuff you don't need with an authoritative source for documents." Practically speaking that means "OT without T", but because it does not transform the ops to be order-independent, it has an extra step on conflict where the user has to rebase changes and re-submit. This can cause minor edit starvation of less-connected clients. prosemirror-collab-commit[5] fixes this by performing the rebasing on the server... so it's still "OT without the T", but also with an authoritative conflict resolution pseudo-T at the end. I personally recommend prosemirror-collab-commit, it's what we use, and it's extremely fast and predictable.
If you just want something pedagogically helpful, the blessed upstream collaborative editing solution for CodeMirror is OT. See the author's blog post[1], the @codemirror/collab package[2], and the live demo[3]. In general this implementation is quite good and worth reading if you are interested in this kind of thing. ShareJS and OTTypes are both very readable and very good, although we found them very challenging to adopt in a real-world ProseMirror-based editor.
[1]: https://marijnhaverbeke.nl/blog/collaborative-editing-cm.htm...
[2]: https://codemirror.net/docs/ref/#collab
[3]: https://codemirror.net/examples/collab/
[4]: https://github.com/ProseMirror/prosemirror-collab
[5]: https://github.com/stepwisehq/prosemirror-collab-commit
https://svn.apache.org/repos/asf/incubator/wave/whitepapers/...
> This is not always easy in either case, and to do it right you might have to think very hard about things like "how to preserve position mappings," and other parts of editor state that are crucial to (say) plugins that manage locations of comment marks or presence cursors.
Maintaining text editor state is normal. Yes you do need to convert the OT messages into whatever diff format your editor requires (and back), but that's the standard glue code.
The nice thing about OT is that you can just feed the positions of marks into the OT algorithm to get the new positional value. Worst case, you just have the server send the server side position when sending the OT event and the client just displays the server side position.
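As a minimal sketch of that idea (the op shape { type, at, length } here is hypothetical, not any particular OT library's format):

```javascript
// Illustrative sketch: map an editor position (e.g., a comment mark) through
// a single operation. The op shape { type, at, length } is hypothetical,
// not taken from any particular OT library.
function mapPosition(pos, op) {
  if (op.type === "insert") {
    // Text inserted at or before the mark pushes it right.
    return op.at <= pos ? pos + op.length : pos;
  }
  // op.type === "delete"
  if (op.at + op.length <= pos) return pos - op.length; // deletion entirely before the mark
  if (op.at >= pos) return pos;                         // deletion entirely after the mark
  return op.at;                                         // mark was inside the deleted span
}
```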
All that said, I feel like the documentation has improved since the last time I looked, and I suspect a lot of the finer details come with community and experience.
It's hard to tell, but I think you also might be saying that criticizing the FOSS foundations of our product actually hurts the ecosystem. I actually am very open to that, and it's why we took so much time writing it since part 1 came out. But the Yjs-alternative technology we use is all also F/OSS, and we also do directly support it, with actual money from our actual bank account. All I'm recommending here is that others do the same. Sorry if that was not clear.
The rest of your reply, I'm not sure I grok. I think you might be suggesting that we are sock-puppeting `auggierose` or `skeptrune`, and that we are part of some (as you put it) "cult" of the Bay Area! Let me be clear that neither of these things is true. I don't know anyone at Mintlify personally, and in any event we are from Seattle, not the Bay!
If you are open to it, I'd love the opportunity to hear more. Here or email (alex@moment.dev) or our Discord (bottom right of our website) or Twitter/X... or whatever you prefer.
In part 1 of this series, we found that users generally view the most popular collaborative text editing algorithms (including the most popular library, Yjs) as silently corrupting their documents when the algorithms resolve direct editing conflicts. We argued that, while this is potentially ok for live collaborative editing (since presence cursors help users to avoid direct editing conflicts), this property makes them generally wholly inappropriate for the offline case, as users will have no ability to avoid such conflicts.
This time, in part 2, we’re going to argue that these same popular algorithms—and Yjs in particular—are also currently inappropriate for the live-collab case. Mostly it comes down to two points:
We’ll describe several specific challenges we experienced as we tried to bring Yjs to our production text editor.
We recommend a less-well-known alternative to Yjs because it is uniformly better on every axis except truly-masterless peer-to-peer editing.
I have heard the argument more times than I can count: CRDTs are operationally complex, but you need them (need them!) for optimistic updates, edits during network blips (or extended disconnection), fine-grained provenance of edits, peer-to-peer reconciliation, and so on. I want to convince you that all of these things (except true master-less p2p architecture) are easily doable without CRDTs.
Yes, easily doable: 40 lines of code (291 if you insist on counting the React scaffolding).
Below, this code is running as a live demo. You can use the Pause button to simulate network disconnect. Edit the documents and unpause to see them synchronize, exactly like they would with a CRDT.
Note: offline reconciliation always produces odd results. We talked about this extensively in part 1. All offline-capable reconciliation algorithms (e.g., CRDTs, OT, and this one) choose resolutions at basically-random. The point is not that this algorithm does better, it’s that it does the same thing as CRDTs, but with vastly less complexity.
This algorithm uses the extremely simple and boring prosemirror-collab library. The author has written about how it works, but it is almost trivial, so I will explain it here too:
For each document, there is a single authority that holds the source of truth: the document, applied steps, and the current version.
A client submits some transactional steps and the lastSeenVersion.
If the lastSeenVersion does not match the server’s version, the client must fetch recent changes(lastSeenVersion), rebase its own changes on top, and re-submit.
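To make the protocol above concrete, here is a minimal model of the authority. Names like Authority and receiveSteps are illustrative; this is not the prosemirror-collab API, just a sketch of the protocol it implements:

```javascript
// Minimal model of the central-authority step protocol described above.
// Steps are treated as opaque values; in practice they would be
// ProseMirror Step objects.
class Authority {
  constructor() {
    this.steps = [];   // every step ever accepted, in order
    this.version = 0;  // equals steps.length: the authoritative version
  }

  // A client submits steps along with the last version it has seen.
  receiveSteps(lastSeenVersion, steps) {
    if (lastSeenVersion !== this.version) {
      // Client is behind: reject, and hand back the steps it missed
      // so it can rebase its own changes on top and re-submit.
      return { accepted: false, missing: this.steps.slice(lastSeenVersion) };
    }
    this.steps.push(...steps);
    this.version += steps.length;
    return { accepted: true, version: this.version };
  }

  // The changes(lastSeenVersion) call used to catch a client up.
  changes(lastSeenVersion) {
    return this.steps.slice(lastSeenVersion);
  }
}
```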
If the extra round-trip for rebasing changes is not good enough for you, prosemirror-collab-commit does pretty much the same thing, but it rebases the changes on the authority itself.
Note: “Authority” does not mean “centralized server that runs in AWS.” You can set up your laptop to be the authority as long as you are sharing with someone else. So this protocol is p2p-capable, but it’s not masterless, which is what CRDTs provide.
And that’s it. That’s all it takes. 40 lines of code is the baseline complexity for optimistic updates, editing even when the network is flakey (or gone for arbitrary amounts of time), fine-grained provenance, and so on.
The only thing that’s missing is truly-masterless peer-to-peer editing. If you need that, great! But if you don’t, what’s the cost?
This is the less fun part of the article to write, because there is no real way to talk about this without appearing to bag on Yjs in particular. I know people work hard on it. But… I also think that as an ecosystem it’s going to be impossible to progress if we do not acknowledge where we are right now. And, based on what I know, I believe that where we are right now is: a tight spot.
Ok, let’s get this over with. Yes, I know: everyone else is using Yjs, the most popular collaboration library of all time. So the problem must be us. Right?
I thought that too, for a while. I can also tell you the moment that the ghost of doubt left my body, and I knew in my bones that this was not true. It’s the moment I saw y-prosemirror issue #113, a seemingly-innocuous and still-currently-open bug report which inadvertently reveals that Yjs will completely destroy and re-create the entire document on every single keystroke.
Is this an accident? Sadly, no. In the discussion on the y-prosemirror announcement thread from 6 years ago, Kevin (the author of Yjs) reveals that this is by design.
There is some ensuing back-and-forth. Marijn (the author of ProseMirror) chimes in to explain that this choice breaks, like, a lot of stuff.
Kevin responds to suggest that, whatever the breakage, it is apparently good enough for Tiptap. Other users suggest that this really does break a lot of stuff—no, really, it does. Marijn responds to suggest that the perf justification for the replace-everything strategy might not be well-founded. There is a way to paper over some of these issues, kind of. And so on.
Aside: Yes, seriously, it breaks a lot of stuff. Performance is worse, because every keystroke causes you to re-create basically everything—every NodeView, every decoration, all the DOM elements for the entire document. It breaks every plugin that depends on position mappings, e.g., comments and collaborative presence indicators. Undo, cursor position, and selection management all become extremely odd. The state of all the little widgets in your document will continuously get totally wiped. Plugins that look at apply get really slow because they have to inspect the entire document rather than just what changed. Node identity becomes unstable (although Kevin says it is not?). And on and on and on.
I still sort of can’t believe what I’m writing. It could make sense to adopt this regimen if there were no other options—but we do have simpler, faster, better options with none of these problems. What I’m seeing here feels like a mistake that indicates a fundamental misunderstanding of what text editors need to behave well at all, in any situation. And it was completely heartbreaking to read.
Now, look. I understand that the maintainers are working on this issue right now. I sincerely hope they succeed and, in a year, I am forced to write another post about how all of this is now wrong. But that’s not where we seem to be, yet. And our experience was that it was hard to fight against the current architecture.
Our goal is for the editor to run at 60 fps. No matter how many collaborators, no matter how big the edit batches, no matter how complex the document: it is always the case that we have a maximum of ~16ms to do all our work and also a complete React render loop.
Like security, performance does not happen accidentally. It takes careful, targeted work, and constant vigilance for regressions. Below is an (incomplete) list of things that help us meet this target. All of them are harder or impossible with Yjs.
Transactions are very fast to apply. In our benchmarks, a modern machine supports x,000 ProseMirror Transaction applications per second, including the time it takes to update the DOM. Additionally, ProseMirror keeps position mappings across document versions, so a Transaction that appears over the network can be “rebased” and applied extremely quickly. As I mentioned before, Yjs does not have this ability at all: every collaborative keystroke deletes and recreates the whole document from scratch.
The server batches transactions into chunks of 20 steps or less. Clients can generally apply 20 Step objects in much less than 16ms. This means that the server can accumulate a large changeset (e.g., with many concurrent editors, a network blip, etc.) and it will still never hang the main thread by accident. This is completely trivial in ProseMirror. I don’t have any intuition for how we’d accomplish this with Yjs.
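The batching itself is trivial; a sketch (assuming steps arrive as an array, with 20 as the illustrative limit from above, not a universal constant):

```javascript
// Split a backlog of steps into chunks of at most 20, so that applying
// any one batch stays comfortably inside a 16ms frame budget.
const MAX_STEPS_PER_BATCH = 20; // illustrative; tune against your own benchmarks

function batchSteps(steps, maxPerBatch = MAX_STEPS_PER_BATCH) {
  const batches = [];
  for (let i = 0; i < steps.length; i += maxPerBatch) {
    batches.push(steps.slice(i, i + maxPerBatch));
  }
  return batches;
}
```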
Conflict resolution never happens on the main thread. We do conflict resolution on the server; it could also happen in a separate worker thread. My understanding is that it’s possible but challenging to run Yjs reconciliation routines in a worker thread.
We keep an eye on what in the EditorView causes latency. For example, right now, the most expensive part of our EditorView is calculating the positions of the remote presence carets. Yjs does not have a specific impact on this, aside from the fact that it’s obviously more expensive to reconstruct the entire EditorView from scratch on every keystroke.
Updates to the DOM objects in the document are incremental. This mostly comes from react-prosemirror, of course, but it again does not go without saying, since Yjs replaces the entire document on each collaborative keystroke.
Visualized, our remote Transaction application pipeline looks like this:
As simple as this approach is, we still regularly fail to meet the perf budget.
Unencouragingly—and even setting aside the delete/recreate issue—the Yjs pipeline is also just considerably more complicated. For one, CRDTs cannot represent rich text editing (it is a legitimately open research problem). Instead, Yjs represents ProseMirror documents using their XML facilities. Since this means they can’t directly use ProseMirror Transaction objects, writes have to convert Transaction to a Yjs XML update; clients likewise receive updates and need to somehow turn the Yjs XML update back into a Transaction and apply it to the ProseMirror doc.
All of these things cost something. Even if they were cheap, Yjs still insists on replacing the document each time. It makes me physically anxious to look at this pipeline.
Again, my understanding is that the Yjs maintainers are starting to make updates more fine-grained, and that the new world will look like the following.
This is definitely closer to what we want, but we will have to see how much it helps in practice. What I will say right now is, given how hard it is to get the simple thing to run at 60 fps, it is still intimidating to take this regression, especially if we don’t need a truly-masterless p2p topology.
Most people want a small, sane set of rules that govern the structure of a document, e.g., that blockquote nodes cannot be children of code_block nodes. Document schemas are the primary tool for accomplishing this. They determine whether a Transaction has produced a valid EditorState or not.
Document schemas are generally bundled statically as part of the application code. In a centralized setting, the server can reject proposed Transactions that are invalid, and the application can verify that all clients are on the same schema version. This is, e.g., what the Tiptap docs seem to suggest they do.
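A sketch of what that centralized check looks like. All the shapes here (acceptSteps, server.applySteps, server.commit) are hypothetical stand-ins, not the ProseMirror or Tiptap API:

```javascript
// Hypothetical server-side gate: check the client's schema version and the
// validity of the resulting document before accepting any steps.
function acceptSteps(server, request) {
  if (request.schemaVersion !== server.schemaVersion) {
    // Client is on an old schema: tell it to upgrade, rather than
    // silently dropping nodes it doesn't understand.
    return { ok: false, reason: "schema-mismatch" };
  }
  const next = server.applySteps(request.steps);
  if (!next.valid) {
    // The proposed Transaction would produce an invalid EditorState.
    return { ok: false, reason: "invalid-document" };
  }
  server.commit(next);
  return { ok: true };
}
```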
Yjs is designed for truly-masterless peer-to-peer topologies, and its defaults are quite a bit more dangerous. They have to be—there is no real authority on what the schema is supposed to be!
In general, from Yjs’s perspective, no running instance knows for sure whether a change is straightforwardly invalid, or it just hasn’t received the new schema yet.
Accordingly, in our testing of y-prosemirror v1.3.7, when schema.node() throws an exception (including because the schema is invalid), the node appears to be permanently deleted, and that deletion is propagated to all peers.
You can do better, but you have to know to set it up ahead of time, e.g., Tiptap at least detects schema mismatch and halts the editor and forces a reload.
If left alone, this is particularly disastrous during upgrades. If you’re disconnected for an extended period of time, an upgrade occurs, and people use the new feature, then when you connect again, you will silently destroy all the new data. Ouch!
This is not to say that you can’t get around this (e.g., Tiptap does). But, it takes extra work to not blow your entire leg off, in a way that is very hard to debug.
Real-world document editors will generally provide a variety of permissions that can be granted to other users. Obviously there is Editor, but it’s very common to also have Viewer, Commenter, and Suggesting, to name a few examples.
All of these features involve allowing some users selective permissions to edit parts of the document (e.g., adding a comment mark to the document). Normally, this is pretty simple to do: you look at the Transaction, see how it alters the document (e.g., does it just set a mark or does it also change text), and accept or reject based on the submitting user’s permissions.
It’s quite a bit more awkward in Yjs. Since Yjs maps Transaction to and from XML updates, you have to basically predict what the net effect will be when it is materialized as a Transaction, and accept or reject based on that prediction. It’s not impossible, but it’s a lot harder than it looks. Additionally, as with schemas, Yjs is built for an authority-less topology, so it has no native facilities built-in for permissions at all, at least as far as I can tell.
One of the things I constantly hear about Yjs is that it makes it easier to stay up when the server goes away. At this point, for most realistic apps, I am prepared to argue this is not true.
First off, modern text editors almost uniformly do many things other than just storing text. Stuff like:
Storing things in media servers, e.g., images you paste into a document
Checking permissions
Presence may or may not be a separate service
Durability (e.g., document might be stored in S3, operations in K/V storage, and so on)
Generally speaking, none of these services will be CRDTs, and if any of them go down (especially permissions), you are probably going to want to stop serving traffic.
This means that you are mainly using the CRDT as a networking protocol, rather than an availability strategy. You certainly can do that, but as I’ve said throughout this article, it is vastly less efficient and vastly more complicated than the alternative network protocol candidates.
The “simple” solution to collab editing stores all steps in durable storage. If you fall behind, you can retrieve them with a call to the API equivalent of changes(lastSeenVersion). For reasonable requests, this is a fast and efficient way to catch a client up, and the client can forget all the steps once it’s incorporated them into its EditorState.
Yjs, being designed for truly-masterless p2p topologies, generally can’t forget steps easily. In particular, if an item is deleted, it has to keep around a “tombstone”—a marker that records an item was deleted. This is because concurrent operations that reference a given item’s ID will reconcile incorrectly if the client can’t tell that the item was deleted.
The general solution is to garbage collect (GC) tombstones. But you can only really safely do this when all peers have deleted the item—and you can’t possibly know that, since we can’t distinguish between disconnected and slow clients. So you can either keep the tombstones around for longer (chewing up memory), or you can forget them after some arbitrary time, which will lose data. Yjs’s underlying protocol (called YATA) GCs the tombstones at ~30s.
If you’re not in a truly-masterless p2p topology, this is a completely pointless trade-off. changes(lastSeenVersion) completely solves this problem and has absolutely none of these downsides. All you need is a database to store the steps.
I have now implemented collaborative text editors in pretty much all the ways you can. OT, CRDTs, with prosemirror-collab, with prosemirror-collab-commit, with locking. What I will say is that even a minor bug in the “simple” implementation with prosemirror-collab will stretch you to the absolute edge of your sanity.
Like, as you are typing you will produce hundreds or thousands of updates a minute, and sometimes, your document will be missing a couple of characters that appear on the server.
But why? You won’t know. You won’t be logging the entire document every time (probably) so it will be very hard to track down the exact edit that corrupted the document. You’ll add logging, then go about your business, and when you run into it again you’ll spend 4 hours looking at the logs before you realize you’re missing another critical log line. One time this happened to me, and the problem was that I added an await, but made some decisions on data I’d read before the suspense point, which was sometimes outdated by the time the transaction completed.
In the “simple” solution, you have many tools at your disposal that help ferret out these bugs. You can:
Attach idempotency keys to every request.
Dispatch every write request twice to flush out races.
Aggressively test the server for races.
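For example, the idempotency-key idea can be sketched like this (a hypothetical server-side guard, not any specific library's API):

```javascript
// Hypothetical server-side guard: remember which request ids have already
// been applied, so a request dispatched twice only takes effect once. This
// is what makes the "dispatch every write twice" debugging trick safe.
class IdempotentApplier {
  constructor() {
    this.seen = new Set(); // request ids already applied
    this.applied = [];     // steps accepted so far, in order
  }

  apply(requestId, steps) {
    if (this.seen.has(requestId)) {
      return false; // duplicate dispatch (a retry, or the double-send test): no-op
    }
    this.seen.add(requestId);
    this.applied.push(...steps);
    return true;
  }
}
```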
But what if you’re using CRDTs? Well, all these problems are 100x harder, and none of these mitigations are available to you. Definitionally, the state is only guaranteed to converge. So how do you even know if something is transiently diverged, or simply incorrect? Of course, you can’t. Not really.
At the beginning of this article, I told you that my goal is to convince you that, unless you truly need a truly-masterless peer-to-peer topology, you are better off using the “simple” solution. At this point, I think additional writing is probably not going to help me convince you.
I want to leave you with one other thing, though. And I’m especially talking to the library authors here: when you design a library, you have to start with the end-user experience you want to enable, not an algorithm. For us, it was very simple. We wanted users to be able to collaborate, be tolerant of disconnects, and always run at 60 fps. We wanted users to be able to predict what happens to their data.
When we set out to build this stuff, we assumed everyone had these goals. At the other end of this evaluation, though, it’s hard to imagine that we really did all have this in mind. If we did, the technology landscape in this area would look very different.