It would have been nice if the article compared yjs with automerge and others. Jsonjoy, in particular, appears very impressive. https://jsonjoy.com/
That's why I created prosemirror-collab-commit.
I’ve spent 3+ years fighting the same problems while building DocNode and DocSync, two libraries that do exactly what you describe.
DocSync is a client-server library that synchronizes documents of any type (Yjs, Loro, Automerge, DocNode) while guaranteeing that all clients apply operations in the same order. It’s a lot more than 40 lines because it handles many things beyond what’s described here. For example:
It’s local-first, which means you have to handle race conditions.
Multi-tab synchronization works via BroadcastChannel even offline, which is another source of race conditions that needs to be controlled.
DocNode is an alternative to Yjs, but with all the simplicity that comes from assuming a central server. No tombstones, no metadata, no vector clock diffing, supports move operations, etc.
I think you might find them interesting. Take a look at https://docukit.dev and let me know what you think.
It is very true that there are nuances you have to deal with when using CRDT toolkits like Yjs and Automerge - the merged state is "correct" as a structure, but may not match your schema. You have to handle that in your application (ProseMirror does this for you, if you want it, and can live with the invalid nodes being removed)
You can't have your cake and eat it with CRDTs, just as you can't with OT. Both come with compromises and complexities. Your job as a developer is to weigh them for the use case you are designing for.
One area in particular that I feel CRDTs may really shine is in agentic systems. The ability to fork+merge at will is incredibly important for async long running tasks. You can validate the state after an agent has worked, and then decide to merge to main or not. Long running forks are more complex to achieve with OT.
There is some good content in this post, but it's leaning a little too far towards drama creation for my taste.
EDIT: I live in Seattle and it is 12:34, so I must go to bed soon. But I will wake up and respond to comments first thing in the morning!
There seems to be a conflict of interest with describing Yjs's performance, which (along with Automerge) basically does the same thing.
const result = step.apply(this.doc);
if (result.failed) return false;
I suspect this doesn't work. In theory you can write better bindings yourself. In practice, if the official path falls over under normal editing, telling people to just do more integration work sounds a lot like moving the goalposts.
I think it's defensible to say that this point in particular is not indicting CRDTs in general because I do say the authors are trying to fix it, and then I link to the (unpublicized) first PR in that chain of work (which very few people know about!), and I specifically spend a whole paragraph saying I hope that I am forced to write an article in a year about how they figured it all out! If I was trying to be disingenuous, why do any of that?
In general, the client implementation of collab is pretty simple. Nearly all of the subtlety lies in the server. But it, too, is generally not a lot of code, see for example the author's implementation: https://github.com/ProseMirror/website/tree/master/src/colla...
But there's another issue that the author hasn't even considered, and possibly it's the root cause why ProseMirror (which I'd never heard of before, btw) does the thing the author thinks is broken... Say you have a document like "请来 means 'please go'" and independently both the Chinese and English collaborators look at that and realise it's wrong. One changes it to "请走 means 'please go'" and the other changes it to "请来 means 'please come'". Those changes are in different spans, and so a merge would blindly accept both, resulting in "请走 means 'please come'", which is entirely different from the original, but just as incorrect. Depending on how much other interaction the authors have, this could end up in a back and forth of both repeatedly changing it, so the merged document always ended up incorrect, even though individually both authors had made valid corrections.
That example seems a bit hypothetical, but I've experienced the same thing in software development where two BAs had created slightly incompatible documents stating how some functionality should work. One QA guy kept raising bugs saying "the spec says it should do X", the dev would check the cited spec and change the code to match the spec. Weeks later, a different QA guy with a different spec would raise a bug saying "why is this doing X? The spec says it should do Y", a different dev read the cited spec, and changed the code. In this case, the functionality flip-flopped about 10 times over the course of a year and it was only a random conversation one day where one of them complained about a bug they'd fixed many times and the other guy said "hey, that bug sounds familiar" and they realised they were the two who'd been changing the code back and forth.
This whole topic is interesting to me, because I'm essentially solving the same problem in a different context. I've used CRDT so far, but only for somewhat limited state where conflicts can be resolved. I'm now moving to a note-editing section of the app, and while there is only one primary author, their state might be on multiple devices and because offline is important to me, they might not always be in sync. I think I'm probably going to end up highlighting conflicts, I'm not sure. I might end up just re-implementing something akin to Quill's system of inserts / deletes.
And for real "action" there should be a delay/pause button to simulate conflicts like the ones described in the blog
BUT, since you mention it, I'll say a bit here. It sounds like you have your own experience, and we'd love to hear about that. But OUR experience was: (1) we found (contrary to popular belief) that OT actually does not require a centralized server, (2) we found it to be harder to implement OT exactly right vs CRDTs, and (3) we found many (though not all) of the problems that CRDTs have, are also problems in practice for OT—although in fairness to OT, we think the problems CRDTs have in general are vastly worse to the end-user experience.
If there's interest I'm happy to write a similar article entirely dedicated to OT. But, for (3), as intuition, we found a lot of the problems that both CRDTs and OT have seem to arise from a fundamental impedance mismatch between the in-memory representation of the state of a modern editor, and the representation that is actually synchronized. That is, when you apply an op (CRDT) or a transform (OT), you have to transform the change into a (to use ProseMirror as an example) valid `Transaction` on an `EditorState`. This is not always easy in either case, and to do it right you might have to think very hard about things like "how to preserve position mappings," and other parts of editor state that are crucial to (say) plugins that manage locations of comment marks or presence cursors.
With all of that said, OT is definitely much closer to what modern editors need, in my opinion at least. The less-well-known algorithm we ended up recommending here (which I will call "Marijn Collab", after its author) is essentially a very lightweight OT, without the "transformation" step.
That is one hot take!
The feedback about the delay/pause button is also good, thanks!
[1]: https://github.com/gritzko/librdx/tree/master/rdx
[2]: https://github.com/gritzko/librdx/tree/master/json
Try to understand 3.1-3.4 in this paper, and you'll find that the correctness proof doesn't prove anything.
In particular, when they define <_c, they do this in terms of rule1, rule2, and rule3, but these are defined in terms of <_c, so this is just a circular definition, and therefore actually not a definition at all, but just wishful thinking. They then prove that <_c is a total order, but that proof doesn't matter, because <_c does not exist with the given properties in the first place.
I also tried out the behaviour of their example. Slowing the sync time down to 3 seconds, and then typing "Why not" and then waiting for it to sync before adding " do this?" on client A and " joke?" on client B. The result was "Why not do this? joke?" when I'd have hoped that this would have been flagged as a conflict. Similarly, starting with "Why not?" and adding both " do this" and " joke" in the different clients produced "Why not do this joke?" even though to me, that should have been a conflict - both were inserting different content between "t" and "?".
Finally, changing "do" to "say" in client A and THEN changing "do" to "read" in client B before it updated, actually resulted in a conflict in the log window and the resultant merge was "Why not rayead this joke?" Clearly this merge strategy isn't that great here, as it doesn't seem to be renumbering the version numbers based on the losing side (or I've misunderstood what they're actually doing).
But the product seems much more narrow than an actual tool to run the whole business in markdown. I was hoping to see Logseq on steroids, and it feels like a tool builder primarily. I love the tool building aspect, but the fundamentals of simply organizing docs (docs, presentations, assets etc, the basics of a business) are either not part of the core offering or not presented well at all.
I love the idea of building custom tools on top of MD and it's part of my wishlist, but I feel a little deceived by your tagline so I wanted to share that :)
The logic that makes sense to me is that you are using your own framing (Moment.dev will later be paid and people will be customers) to interpret Yjs.
Moreover, the 'social proof' posted later in the thread by 'auggierose' and 'skeptrune': - https://news.ycombinator.com/item?id=47396154 - https://news.ycombinator.com/item?id=47396139
appears, to me, to be manufactured. The degree of consolidation I've noticed in this 'SF/Bay Area tech cult' (though I am unsure if others are aware of it), which tries to help other members at the expense of quality and grows network wealth through favoritism rather than adherence to quality, runs counter to the interests of users who want high-quality software without capture.
While you may not like me describing this, it is not in your own interest to do this because it catabolizes the base layer that would sustain you. Social media catabolizes actual social networks, as AI catabolizes those who write information online. Behavior like this ruins the public commons over time.
EDIT: I will say I'm not against AI writing tools or anything like that. But, for better or worse, that's just not what happened here.
DocSync, the sync engine (mainly designed with DocNode in mind), is, I would say, a bit less mature.
I’d love it if you could take a look and see if there’s anything that doesn’t convince you. I’ll be happy to answer any questions.
If you are using a centralized server and ProseMirror, there are several OT and pseudo-OT implementations. Most popularly, there is prosemirror-collab[4], which is basically "OT without the stuff you don't need with an authoritative source for documents." Practically speaking that means "OT without T", but because it does not transform the ops to be order-independent, it has an extra step on conflict where the user has to rebase changes and re-submit. This can cause minor edit starvation of less-connected clients. prosemirror-collab-commit[5] fixes this by performing the rebasing on the server... so it's still "OT without the T", but also with an authoritative conflict resolution pseudo-T at the end. I personally recommend prosemirror-collab-commit, it's what we use, and it's extremely fast and predictable.
If you just want something pedagogically helpful, the blessed upstream collaborative editing solution for CodeMirror is OT. See the author's blog post[1], the @codemirror/collab package[2], and the live demo[3]. In general this implementation is quite good and worth reading if you are interested in this kind of thing. ShareJS and OTTypes are both very readable and very good, although we found them very challenging to adopt in a real-world ProseMirror-based editor.
[1]: https://marijnhaverbeke.nl/blog/collaborative-editing-cm.htm...
[2]: https://codemirror.net/docs/ref/#collab
[3]: https://codemirror.net/examples/collab/
[4]: https://github.com/ProseMirror/prosemirror-collab
[5]: https://github.com/stepwisehq/prosemirror-collab-commit
https://svn.apache.org/repos/asf/incubator/wave/whitepapers/...
> This is not always easy in either case, and to do it right you might have to think very hard about things like "how to preserve position mappings," and other parts of editor state that are crucial to (say) plugins that manage locations of comment marks or presence cursors.
Maintaining text editor state is normal. Yes you do need to convert the OT messages into whatever diff format your editor requires (and back), but that's the standard glue code.
The nice thing about OT is that you can just feed the positions of marks into the OT algorithm to get the new positional value. Worst case, you just have the server send the server side position when sending the OT event and the client just displays the server side position.
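As a minimal sketch of that idea (the op shape { type, at, length } here is hypothetical, not any particular OT library's format):

```javascript
// Illustrative sketch: map an editor position (e.g., a comment mark) through
// a single operation. The op shape { type, at, length } is hypothetical,
// not taken from any particular OT library.
function mapPosition(pos, op) {
  if (op.type === "insert") {
    // Text inserted at or before the mark pushes it right.
    return op.at <= pos ? pos + op.length : pos;
  }
  // op.type === "delete"
  if (op.at + op.length <= pos) return pos - op.length; // deletion entirely before the mark
  if (op.at >= pos) return pos;                         // deletion entirely after the mark
  return op.at;                                         // mark was inside the deleted span
}
```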
All that said, I feel like the documentation has improved since the last time I looked, and I suspect a lot of the finer details come with community and experience.
It's hard to tell, but I think you also might be saying that criticizing the FOSS foundations of our product actually hurts the ecosystem. I actually am very open to that, and it's why we took so much time writing it since part 1 came out. But the Yjs-alternative technology we use is all also F/OSS, and we also do directly support it, with actual money from our actual bank account. All I'm recommending here is that others do the same. Sorry if that was not clear.
The rest of your reply, I'm not sure I grok. I think you might be suggesting that we are sock-puppeting `auggierose` or `skeptrune`, and that we are part of some (as you put it) "cult" of the Bay Area! Let me be clear that neither of these things is true. I don't know anyone at Mintlify personally, and in any event we are from Seattle, not the Bay!
If you are open to it, I'd love the opportunity to hear more. Here or email (alex@moment.dev) or our Discord (bottom right of our website) or Twitter/X... or whatever you prefer.
In part 1 of this series, we found that users generally view the most popular collaborative text editing algorithms (including the most popular library, Yjs) as silently corrupting their documents when the algorithms resolve direct editing conflicts. We argued that, while this is potentially ok for live collaborative editing (since presence cursors help users to avoid direct editing conflicts), this property makes them generally wholly inappropriate for the offline case, as users will have no ability to avoid such conflicts.
This time, in part 2, we’re going to argue that these same popular algorithms—and Yjs in particular—are also currently inappropriate for the live-collab case. Mostly it comes down to two points:
We’ll describe several specific challenges we experienced as we tried to bring Yjs to our production text editor.
We recommend a less-well-known alternative to Yjs because it is uniformly better on every axis except truly-masterless peer-to-peer editing.
I have heard the argument more times than I can count: CRDTs are operationally complex, but you need them (need them!) for optimistic updates, edits during network blips (or extended disconnection), fine-grained provenance of edits, peer-to-peer reconciliation, and so on. I want to convince you that all of these things (except true master-less p2p architecture) are easily doable without CRDTs.
Yes, easily doable: 40 lines of code (291 if you insist on counting the React scaffolding).
Below, this code is running as a live demo. You can use the Pause button to simulate network disconnect. Edit the documents and unpause to see them synchronize, exactly like they would with a CRDT.
Note: offline reconciliation always produces odd results. We talked about this extensively in part 1. All offline-capable reconciliation algorithms (e.g., CRDTs, OT, and this one) choose resolutions at basically-random. The point is not that this algorithm does better, it’s that it does the same thing as CRDTs, but with vastly less complexity.
This algorithm uses the extremely simple and boring prosemirror-collab library. The author has written about how it works, but it is almost trivial, so I will explain it here too:
For each document, there is a single authority that holds the source of truth: the document, applied steps, and the current version.
A client submits some transactional steps and the lastSeenVersion.
If the lastSeenVersion does not match the server’s version, the client must fetch recent changes(lastSeenVersion), rebase its own changes on top, and re-submit.
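To make the protocol above concrete, here is a minimal model of the authority. Names like Authority and receiveSteps are illustrative; this is not the prosemirror-collab API, just a sketch of the protocol it implements:

```javascript
// Minimal model of the central-authority step protocol described above.
// Steps are treated as opaque values; in practice they would be
// ProseMirror Step objects.
class Authority {
  constructor() {
    this.steps = [];   // every step ever accepted, in order
    this.version = 0;  // equals steps.length: the authoritative version
  }

  // A client submits steps along with the last version it has seen.
  receiveSteps(lastSeenVersion, steps) {
    if (lastSeenVersion !== this.version) {
      // Client is behind: reject, and hand back the steps it missed
      // so it can rebase its own changes on top and re-submit.
      return { accepted: false, missing: this.steps.slice(lastSeenVersion) };
    }
    this.steps.push(...steps);
    this.version += steps.length;
    return { accepted: true, version: this.version };
  }

  // The changes(lastSeenVersion) call used to catch a client up.
  changes(lastSeenVersion) {
    return this.steps.slice(lastSeenVersion);
  }
}
```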
If the extra round-trip for rebasing changes is not good enough for you, prosemirror-collab-commit does pretty much the same thing, but it rebases the changes on the authority itself.
Note: “Authority” does not mean “centralized server that runs in AWS.” You can set up your laptop to be the authority as long as you are sharing with someone else. So this protocol is p2p-capable, but it’s not masterless, which is what CRDTs provide.
And that’s it. That’s all it takes. 40 lines of code is the baseline complexity for optimistic updates, editing even when the network is flakey (or gone for arbitrary amounts of time), fine-grained provenance, and so on.
The only thing that’s missing is truly-masterless peer-to-peer editing. If you need that, great! But if you don’t, what’s the cost?
This is the less fun part of the article to write, because there is no real way to talk about this without appearing to bag on Yjs in particular. I know people work hard on it. But… I also think that as an ecosystem it’s going to be impossible to progress if we do not acknowledge where we are right now. And, based on what I know, I believe that where we are right now is: a tight spot.
Ok, let’s get this over with. Yes, I know: everyone else is using Yjs, the most popular collaboration library of all time. So the problem must be us. Right?
I thought that too, for a while. I can also tell you the moment that the ghost of doubt left my body, and I knew in my bones that this was not true. It’s the moment I saw y-prosemirror issue #113, a seemingly-innocuous and still-currently-open bug report which inadvertently reveals that Yjs will completely destroy and re-create the entire document on every single keystroke.
Is this an accident? Sadly, no. In the discussion on the y-prosemirror announcement thread from 6 years ago, Kevin (the author of Yjs) reveals that this is by design.
There is some ensuing back-and-forth. Marijn (the author of ProseMirror) chimes in to explain that this choice breaks, like, a lot of stuff.
Kevin responds to suggest that, whatever the breakage, it is apparently good enough for Tiptap. Other users suggest that this really does break a lot of stuff—no, really, it does. Marijn responds to suggest that the perf justification for the replace-everything strategy might not be well-founded. There is a way to paper over some of these issues, kind of. And so on.
Aside: Yes, seriously, it breaks a lot of stuff. Performance is worse, because every keystroke causes you to re-create basically everything—every NodeView, every decoration, all the DOM elements for the entire document. It breaks every plugin that depends on position mappings, e.g., comments and collaborative presence indicators. Undo, cursor position, and selection management all become extremely odd. The state of all the little widgets in your document will continuously get totally wiped. Plugins that look at apply get really slow because they have to inspect the entire document rather than just what changed. Node identity becomes unstable (although Kevin says it is not?). And on and on and on.
I still sort of can’t believe what I’m writing. It could make sense to adopt this regimen if there were no other options—but we do have simpler, faster, better options with none of these problems. What I’m seeing here feels like a mistake that indicates a fundamental misunderstanding of what text editors need to behave well at all, in any situation. And it was completely heartbreaking to read.
Now, look. I understand that the maintainers are working on this issue right now. I sincerely hope they succeed and, in a year, I am forced to write another post about how all of this is now wrong. But that’s not where we seem to be, yet. And our experience was that it was hard to fight against the current architecture.
Our goal is for the editor to run at 60 fps. No matter how many collaborators, no matter how big the edit batches, no matter how complex the document: it is always the case that we have a maximum of ~16ms to do all our work and also a complete React render loop.
Like security, performance does not happen accidentally. It takes careful, targeted work, and constant vigilance for regressions. Below is an (incomplete) list of things that help us meet this target. All of them are harder or impossible with Yjs.
Transactions are very fast to apply. In our benchmarks, a modern machine supports x,000 ProseMirror Transaction applications per second, including the time it takes to update the DOM. Additionally, ProseMirror keeps position mappings across document versions, so a Transaction that appears over the network can be “rebased” and applied extremely quickly. As I mentioned before, Yjs does not have this ability at all: every collaborative keystroke deletes and recreates the whole document from scratch.
The server batches transactions into chunks of 20 steps or less. Clients can generally apply 20 Step objects in much less than 16ms. This means that the server can accumulate a large changeset (e.g., with many concurrent editors, a network blip, etc.) and it will still never hang the main thread by accident. This is completely trivial in ProseMirror. I don’t have any intuition for how we’d accomplish this with Yjs.
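The batching itself is trivial; a sketch (assuming steps arrive as an array, with 20 as the illustrative limit from above, not a universal constant):

```javascript
// Split a backlog of steps into chunks of at most 20, so that applying
// any one batch stays comfortably inside a 16ms frame budget.
const MAX_STEPS_PER_BATCH = 20; // illustrative; tune against your own benchmarks

function batchSteps(steps, maxPerBatch = MAX_STEPS_PER_BATCH) {
  const batches = [];
  for (let i = 0; i < steps.length; i += maxPerBatch) {
    batches.push(steps.slice(i, i + maxPerBatch));
  }
  return batches;
}
```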
Conflict resolution never happens on the main thread. We do conflict resolution on the server; it could also happen in a separate worker thread. My understanding is that it’s possible but challenging to run Yjs reconciliation routines in a worker thread.
We keep an eye on what in the EditorView causes latency. For example, right now, the most expensive part of our EditorView is calculating the positions of the remote presence carets. Yjs does not have a specific impact on this, aside from the fact that it’s obviously more expensive to reconstruct the entire EditorView from scratch on every keystroke.
Updates to the DOM objects in the document are incremental. This mostly comes from react-prosemirror, of course, but it again does not go without saying, since Yjs replaces the entire document on each collaborative keystroke.
Visualized, our remote Transaction application pipeline looks like this:
As simple as this approach is, we still regularly fail to meet the perf budget.
Unencouragingly—and even setting aside the delete/recreate issue—the Yjs pipeline is also just considerably more complicated. For one, CRDTs cannot represent rich text editing (it is a legitimately open research problem). Instead, Yjs represents ProseMirror documents using their XML facilities. Since this means they can’t directly use ProseMirror Transaction objects, writes have to convert Transaction to a Yjs XML update; clients likewise receive updates and need to somehow turn the Yjs XML update back into a Transaction and apply it to the ProseMirror doc.
All of these things cost something. Even if they were cheap, Yjs still insists on replacing the document each time. It makes me physically anxious to look at this pipeline.
Again, my understanding is that the Yjs maintainers are starting to make updates more fine-grained, and that the new world will look like the following.
This is definitely closer to what we want, but we will have to see how much it helps in practice. What I will say right now is, given how hard it is to get the simple thing to run at 60 fps, it is still intimidating to take this regression, especially if we don’t need a truly-masterless p2p topology.
Most people want a small, sane set of rules that govern the structure of a document, e.g., that blockquote nodes cannot be children of code_block nodes. Document schemas are the primary tool for accomplishing this. They determine whether a Transaction has produced a valid EditorState or not.
Document schemas are generally bundled statically as part of the application code. In a centralized setting, the server can reject proposed Transactions that are invalid, and the application can verify that all clients are on the same schema version. This is, e.g., what the Tiptap docs seem to suggest they do.
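A sketch of what that centralized check looks like. All the shapes here (acceptSteps, server.applySteps, server.commit) are hypothetical stand-ins, not the ProseMirror or Tiptap API:

```javascript
// Hypothetical server-side gate: check the client's schema version and the
// validity of the resulting document before accepting any steps.
function acceptSteps(server, request) {
  if (request.schemaVersion !== server.schemaVersion) {
    // Client is on an old schema: tell it to upgrade, rather than
    // silently dropping nodes it doesn't understand.
    return { ok: false, reason: "schema-mismatch" };
  }
  const next = server.applySteps(request.steps);
  if (!next.valid) {
    // The proposed Transaction would produce an invalid EditorState.
    return { ok: false, reason: "invalid-document" };
  }
  server.commit(next);
  return { ok: true };
}
```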
Yjs is designed for truly-masterless peer-to-peer topologies, and its defaults are quite a bit more dangerous. They have to be—there is no real authority on what the schema is supposed to be!
In general, from Yjs’s perspective, no running instance knows for sure whether a change is straightforwardly invalid, or it just hasn’t received the new schema yet.
Accordingly, in our testing of y-prosemirror v1.3.7, when schema.node() throws an exception (including because the schema is invalid), the node appears to be permanently deleted, and that deletion is propagated to all peers.
You can do better, but you have to know to set it up ahead of time, e.g., Tiptap at least detects schema mismatch and halts the editor and forces a reload.
If left alone, this is particularly disastrous during upgrades. If you’re disconnected for an extended period of time, an upgrade occurs, and people use the new feature, then when you connect again, you will silently destroy all the new data. Ouch!
This is not to say that you can’t get around this (e.g., Tiptap does). But, it takes extra work to not blow your entire leg off, in a way that is very hard to debug.
Real-world document editors will generally provide a variety of permissions that can be granted to other users. Obviously there is Editor, but it’s very common to also have Viewer, Commenter, and Suggesting, to name a few examples.
All of these features involve allowing some users selective permissions to edit parts of the document (e.g., adding a comment mark to the document). Normally, this is pretty simple to do: you look at the Transaction, see how it alters the document (e.g., does it just set a mark or does it also change text), and accept or reject based on the submitting user’s permissions.
It’s quite a bit more awkward in Yjs. Since Yjs maps Transaction to and from XML updates, you have to basically predict what the net effect will be when it is materialized as a Transaction, and accept or reject based on that prediction. It’s not impossible, but it’s a lot harder than it looks. Additionally, as with schemas, Yjs is built for an authority-less topology, so it has no native facilities built-in for permissions at all, at least as far as I can tell.
One of the things I constantly hear about Yjs is that it makes it easier to stay up when the server goes away. At this point, for most realistic apps, I am prepared to argue this is not true.
First off, modern text editors almost uniformly do many things other than just storing text. Stuff like:
Storing things in media servers, e.g., images you paste into a document
Checking permissions
Presence may or may not be a separate service
Durability (e.g., document might be stored in S3, operations in K/V storage, and so on)
Generally speaking, none of these services will be CRDTs, and if any of them go down (especially permissions), you are probably going to want to stop serving traffic.
This means that you are mainly using the CRDT as a networking protocol, rather than an availability strategy. You certainly can do that, but as I’ve said throughout this article, it is vastly less efficient and vastly more complicated than the alternative network protocol candidates.
The “simple” solution to collab editing stores all steps in durable storage. If you fall behind, you can retrieve them with a call to the API equivalent of changes(lastSeenVersion). For reasonable requests, this is a fast and efficient way to catch a client up, and the client can forget all the steps once it’s incorporated them into its EditorState.
Yjs, being designed for truly-masterless p2p topologies, generally can’t forget steps easily. In particular, if an item is deleted, it has to keep around a “tombstone”—a marker that records an item was deleted. This is because concurrent operations that reference a given item’s ID will reconcile incorrectly if the client can’t tell that the item was deleted.
The general solution is to garbage collect (GC) tombstones. But you can only really safely do this when all peers have deleted the item—and you can’t possibly know that, since we can’t distinguish between disconnected and slow clients. So you can either keep the tombstones around for longer (chewing up memory), or you can forget them after some arbitrary time, which will lose data. Yjs’s underlying protocol (called YATA) GCs the tombstones at ~30s.
If you’re not in a truly-masterless p2p topology, this is a completely pointless trade-off. changes(lastSeenVersion) completely solves this problem and has absolutely none of these downsides. All you need is a database to store the steps.
I have now implemented collaborative text editors in pretty much all the ways you can. OT, CRDTs, with prosemirror-collab, with prosemirror-collab-commit, with locking. What I will say is that even a minor bug in the “simple” implementation with prosemirror-collab will stretch you to the absolute edge of your sanity.
Like, as you are typing you will produce hundreds or thousands of updates a minute, and sometimes, your document will be missing a couple of characters that appear on the server.
But why? You won’t know. You won’t be logging the entire document every time (probably) so it will be very hard to track down the exact edit that corrupted the document. You’ll add logging, then go about your business, and when you run into it again you’ll spend 4 hours looking at the logs before you realize you’re missing another critical log line. One time this happened to me, and the problem was that I added an await, but made some decisions on data I’d read before the suspense point, which was sometimes outdated by the time the transaction completed.
In the “simple” solution, you have many tools at your disposal that help ferret out these bugs. You can:
Attach idempotency keys to every request.
Dispatch every write request twice to flush out races.
Aggressively test the server for races.
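For example, the idempotency-key idea can be sketched like this (a hypothetical server-side guard, not any specific library's API):

```javascript
// Hypothetical server-side guard: remember which request ids have already
// been applied, so a request dispatched twice only takes effect once. This
// is what makes the "dispatch every write twice" debugging trick safe.
class IdempotentApplier {
  constructor() {
    this.seen = new Set(); // request ids already applied
    this.applied = [];     // steps accepted so far, in order
  }

  apply(requestId, steps) {
    if (this.seen.has(requestId)) {
      return false; // duplicate dispatch (a retry, or the double-send test): no-op
    }
    this.seen.add(requestId);
    this.applied.push(...steps);
    return true;
  }
}
```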
But what if you’re using CRDTs? Well, all these problems are 100x harder, and none of these mitigations are available to you. Definitionally, the state is only guaranteed to converge. So how do you even know if something is transiently diverged, or simply incorrect? Of course, you can’t. Not really.
At the beginning of this article, I told you that my goal is to convince you that, unless you truly need a truly-masterless peer-to-peer topology, you are better off using the “simple” solution. At this point, I think additional writing is probably not going to help me convince you.
I want to leave you with one other thing, though. And I’m especially talking to the library authors here: when you design a library, you have to start with the end-user experience you want to enable, not an algorithm. For us, it was very simple. We wanted users to be able to collaborate, be tolerant of disconnects, and always run at 60 fps. We wanted users to be able to predict what happens to their data.
When we set out to build this stuff, we assumed everyone had these goals. At the other end of this evaluation, though, it’s hard to imagine that we really did all have this in mind. If we did, the technology landscape in this area would look very different.