Looking at Unity made me understand the point of C++ coroutines

You can roll stackful coroutines in C++ (or C) with 50-ish lines of assembly. It's a matter of saving a few registers and switching the stack pointer; minicoro [1] is a pretty good C library that does it. I like this model a lot more than C++20 coroutines:

1. C++20 coros are stackless; in the general case, every async "function call" heap allocates.

2. If you do your own stackful coroutines, every function can suspend/resume, you don't have to deal with colored functions.

3. (opinion) C++20 coros are very tasteless and "C++-design-committee pilled". They're very hard to understand and implement, they require the STL, they're very heavy in debug builds, and you'll end up in template hell to do something as simple as Promise.all.

[1] https://github.com/edubart/minicoro

Simon Tatham, author of PuTTY, has quite a detailed blog post [0] on using C++20's coroutine system. And yep, it's a lot to do on your own; C++26 really ought to give us some pre-built templates/patterns/scaffolds.

[0] https://web.archive.org/web/20260105235513/https://www.chiar...

Not an expert in game development, but I'd say the issue with C++ coroutines (and 'colored' async functions in general) is that the whole call stack must be written to support that. From a practical perspective, that must in turn be backed by a multithreaded event loop to be useful, which is very difficult to write performantly and correctly. Hence, most people end up using coroutines with something like boost::asio, but you can do that only if your repo allows a 'kitchen sink' library like Boost in the first place.

Coroutines generally imply some sort of magic to me.

I would just go straight to tbb and concurrent_unordered_map!

The challenge of parallelism does not come from how to make things parallel, but how you share memory:

How you avoid cache misses, make sure threads don't trample each other and design the higher level abstraction so that all layers can benefit from the performance without suffering turnaround problems.

My challenge right now is how do I make the JVM fast on native memory:

1) Rewrite my own JVM. 2) Use the buffer and offset structure Oracle still has but has deprecated and is encouraging people to not use.

We need Java/C# (already has it but is terrible to write native/VM code for?) with bottlenecks at native performance and one way or the other somebody is going to have to write it?

Looking at C++ made me understand the point of Rust.

Always jarring to see how Unity is stuck on an ancient version of C#. The use of IEnumerable as a "generator" mechanic is quite a good hack though.

As I mentioned on the Reddit thread,

This is quite understandable when you know the history behind how C++ coroutines came to be.

They were initially proposed by Microsoft, based on a C++/CX extension, that was inspired by .NET async/await implementation, as the WinRT runtime was designed to only support asynchronous code.

Thus, if one knows how the .NET compiler and runtime magic works, including custom awaitable types, there are some common bridges to how C++ coroutines ended up looking.

As the author lays out, the thing that made coroutines click for me was the isomorphism with state machine-driven control flow.

That’s similar to most of what makes C++ tick: There’s no deep magic, it’s “just” type-checked syntactic sugar for code patterns you could already implement in C.

(Occurs to me that the exceptions to this … like exceptions, overloads, and context-dependent lookup … are where C++ has struggled to manage its own complexity.)

More broadly the dimension of time is always a problem in gamedev, where you're partially inching everything forward each frame and having to keep it all coherent across them.

It can easily lead, and often does, to messy Rube Goldberg machines.

There was a game AI talk a while back, I forget the name unfortunately, but as I recall the guy was pointing out this friction and suggesting additions we could make at the programming language level to better support that kind of time spanning logic.

Coroutines are just a way to write continuations in an imperative style and with more overhead.

I never understood the value. Just use lambdas/callbacks.

In Haskell this technique has been called ‘reinversion of control’: http://blog.sigfpe.com/2011/10/quick-and-dirty-reinversion-o...

No serious dev even uses Unity coroutines. Terrible control flow and perf. Fine for small projects on PC.

C++ destructors and exception safety will likely wreak havoc with any "simple" assembly/longjmp-based solution, unless severely constraining what types you can use within the coroutines.

> You can roll stackful coroutines in C++ (or C) with 50-ish lines of Assembly

I'm not normally keen to "well actually" people with the C standard, but... if you're writing in assembly, you're not writing in C. And the obvious consequence is that it stops being portable. Minicoro only supports three architectures. Granted, those are the three most popular ones, but other architectures exist.

(just double checked and it doesn't do Windows/ARM, for example. Not that I'm expecting Microsoft to ship full conformance for C++23 any time soon, but they have at least some of it)

Hmm. I'm fairly certain that most of that assembly code for saving/restoring registers can be replaced with setjmp/longjmp, and only control transfer itself would require actual assembly. But maybe not.

That's the problem with register machines, I guess. Interestingly enough, BCPL, its main implementation being a p-code interpreter of sorts, has pretty trivially supported coroutines in its "standard" library since the late seventies — as you say, all you need to save is the current stack pointer and the code pointer.

> that must in turn be backed by a multithreaded event loop to be useful

Why? You can just as well execute all your coroutines on a single thread. Many networking applications are doing fine with just a single ASIO thread.

Another example: you could write game behavior in C++ coroutines and schedule them on the thread that handles the game logic. If you want to wait for N seconds inside the coroutine, just yield it as a number. When the scheduler resumes a coroutine, it receives the delta time and then reschedules the coroutine accordingly. This is also a common technique in music programming languages to implement musical sequencing (e.g. SuperCollider)

> From a practical perspective, that must in turn be backed by a multithreaded event loop to be useful

Multithreaded? Nope. You can do C++ coroutines just fine in a single-threaded context.

Event loop? Only if you're wanting to do IO in your coroutines and not block other coroutines while waiting for that IO to finish.

> most people end up using coroutines with something like boost::asio

Sure. But you don't have to. Asio is available without the kitchen sink: https://think-async.com/Asio/

Coroutines are actually really approachable. You don't need boost::asio, but it certainly makes it a lot easier.

I recommend watching Daniela Engert's 2022 presentation, Contemporary C++ in Action: https://www.youtube.com/watch?v=yUIFdL3D0Vk

Much of the original motivation for async was for single threaded event loops. Node and Python, for example. In C# it was partly motivated by the way Windows handles a "UI thread": if you're using the native Windows controls, you can only do so from one thread. There's quite a bit of machinery in there (ConfigureAwait) to control whether your async routine is run on the UI thread or on a different worker pool thread.

In a Unity context, the engine provides the main loop and the developer is writing behaviors for game entities.

ASIO is also available outside of boost! https://github.com/chriskohlhoff/asio

Thankfully they are actively working towards upgrading: Unity 6.8 (they're currently on 6.4) is supposed to move fully to CoreCLR and remove Mono. We'll then finally be able to move to C# 14 (from C# 9, which came out in 2020), as well as use newer .NET functionality.

https://discussions.unity.com/t/coreclr-scripting-and-ecs-st...

>The use of IEnumerable as a "generator" mechanic is quite a good hack though.

Is that a hack? Is that not just exactly what IEnumerable and IEnumerator were built to do?

Unity is currently on C# 9 and that IEnumerable trick is no longer needed in new codebases. async is properly supported.

Not that ancient, they just haven't bothered to update their coroutine mechanism to async/await. The Stride engine does it with their own scheduler, for example.

Edit: Nevermind, they eventually bothered.

IIRC generators and co-routines are equivalent in a sense that you can implement one with the other.

Not too different from C++'s iterator interface for generators, I guess.

If you need to implement an async state machine, couldn't that just as easily be done with std::future? How do coroutines make this cleaner/better?

This is more evident in games/simulations but the same problem arises more or less in any software: batch jobs and DAGs, distributed systems and transactions, etc.

This is what Rich Hickey (Clojure author) has termed “place oriented programming”, when the focus is mutating memory addresses and having to synchronize everything, but failing to model time as a first class concept.

I’m not aware of any general purpose programming language that successfully models time explicitly, Verilog might be the closest to that.

These timing additions to a language are also at the core of imperative synchronous programming languages like Esterel, Céu or Blech.

> There was a game AI talk a while back, I forget the name unfortunately, but as I recall the guy was pointing out this friction and suggesting additions we could make at the programming language level to better support that kind of time spanning logic.

Sounds interesting. If it's not too much of an effort, could you dig up a reference?

> Just use lambdas/callbacks

"Just" is doing a lot of work there. I've used callback-based async frameworks in C++ in the past, and it turns into pure hell very fast. Async programming is, basically, state machines all the way down, and doing it explicitly is not nice. And trying to debug the damn thing is a miserable experience.

Not necessarily. A coroutine encapsulates the entire state machine, which might be a PITA to implement otherwise. Say, if I have a stateful network connection that requires initialization and periodic encryption secret renewal, a coroutine implementation would be much slimmer than that of a state machine with explicit states.

> Just use lambdas/callbacks.

Lol, no thanks. People are using coroutines exactly to avoid callback hell. I have rewritten my own C++ ASIO networking code from callback to coroutines (asio::awaitable) and the difference is night and day!

Did you read the article? As the author says, it becomes a state machine hell very quickly beyond very simple examples.

The Unity editor does not let you examine the state hidden in your closures or coroutines. (And the Mono debugger is a steaming pile of shit.)

Just put your state in visible instance variables of your objects, and then you will actually be able to see and even edit what state your program is in. Stop doing things that make debugging difficult and frustratingly opaque.

In all of my years of professional game dev, I can verify that this is not even remotely true. They're used basically everywhere. They're very common when you need something to update for a set period of time but managing the state outside a very local context would just make the code a mess.

Unity's own documentation for changing scenes uses coroutines

Echoing the thoughts of the only current sibling comment: lots of "serious" developers (way to gatekeep here) definitely use coroutines, when they make sense. As mentioned, it's one of the best ways to have something update each frame for a short period of time, then neatly go away when it's not needed anymore. Very often, the tiny performance hit you take is completely outweighed by the maintainability/convenience.

Just out of interest, how many serious unity devs have you talked to?

You can do a lot of horrible things with setjmp and friends. I actually implemented some exception throw/catch macros using them (which did work) for a compiler that didn't support real C++ exceptions. Thank god we never used them in production code.

This would be about 32 years ago - I don't like thinking about that ...

> Hmm. I'm fairly certain that most of that assembly code for saving/restoring registers can be replaced with setjmp/longjmp, and only control transfer itself would require actual assembly.

Actually you don't even need setjmp/longjmp. I've used a library (embedded environment) called protothreads (plain C) that abused the preprocessor to implement stackful coroutines.

(Defined a macro that used the __LINE__ macro coupled with another macro that used a switch statement to ensure that calling the function again made it resume from where the last YIELD macro was encountered)
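
For anyone who hasn't seen the trick, it can be sketched roughly like this. The macro names (`CORO_BEGIN`, `CORO_YIELD`, `CORO_END`) and the `Counter` struct are hypothetical and are not the actual protothreads API. Locals do not survive a yield in this scheme, so anything that has to persist lives in the struct:

```cpp
// Hypothetical macros in the spirit of protothreads; not the library's real API.
#define CORO_BEGIN(c) switch ((c).line) { case 0:
// Store the current source line, return, and resume at that line's case label
// on the next call. Two yields on the same source line would collide.
#define CORO_YIELD(c, value) \
    do { (c).line = __LINE__; return (value); case __LINE__:; } while (0)
#define CORO_END(c) } (c).line = -1; return -1

struct Counter {
    int line = 0;  // resume point: 0 = start, -1 = finished
    int i = 0;     // loop variable kept here because locals are lost on return
};

// Yields 1, 2, 3 on successive calls, then -1 on every call after that.
int next_value(Counter& c) {
    CORO_BEGIN(c);
    for (c.i = 1; c.i <= 3; ++c.i) {
        CORO_YIELD(c, c.i);
    }
    CORO_END(c);
}
```

The whole "coroutine" is one `switch` that jumps back into the middle of the loop, which is why the function shares the caller's stack and cannot yield from a nested call.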

setjmp + longjmp + sigaltstack is indeed the old trick.

Not really. I've done it years ago. The one restriction for code inside the coroutine is that it mustn't catch (...). You solve destruction by distinguishing whether a coroutine is paused in the middle of execution or if it finished running. When the coroutine is about to be destructed you run it one last time and throw a special exception, triggering destruction of all RAII objects, which you catch at the coroutine entry point.

Passing uncaught exceptions from the coroutine up to the caller is also pretty easy, because it's all synchronous. You just need to wrap it so it can safely travel across the gap. You can restrict the exception types however you want. I chose to support only subclasses of std::exception and handle anything else as an unknown exception.
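
The "wrap it so it can travel across the gap" part can be sketched with `std::exception_ptr`. This is only an illustration under assumptions: there is no real stack switch here, and `CoroResult`, `run_entry_point` and `resume_and_propagate` are hypothetical names standing in for the coroutine entry point and the caller-side resume:

```cpp
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical result slot shared between the coroutine side and the caller side.
struct CoroResult {
    std::exception_ptr error;  // null if the body finished normally
};

// Stand-in for the coroutine entry point: run the body and, instead of letting
// an exception unwind across the stack switch, capture it in the result slot.
CoroResult run_entry_point(const std::function<void()>& body) {
    CoroResult result;
    try {
        body();
    } catch (...) {
        result.error = std::current_exception();
    }
    return result;
}

// Caller side: once control is back on the caller's stack, rethrow there.
void resume_and_propagate(const std::function<void()>& body) {
    CoroResult result = run_entry_point(body);
    if (result.error) std::rethrow_exception(result.error);
}
```

Restricting the supported types, as described above, would mean catching `const std::exception&` first and mapping everything else to a single "unknown exception" value.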

I think what they meant is that that's what it takes to add coroutine support to a C/C++ program. Adding it to, say, Java or C# is much more involved.

Boost has stackful coroutines. They also used to be in posix (makecontext).

For anyone wondering; this isn't a hack, that's the same library, just as good, just without boost dependencies.

For several years now, I wonder if it will ever happen.

One annoying piece of Unity's CoreCLR plan is there is no plan to upgrade IL2CPP (Unity's AOT compiler) to use a better garbage collector. It will continue to use Boehm GC, which is so much worse for games.

You can embed the state in your lambda context, it really isn't as difficult as what people claim.

The author just chose to write it as a state machine, but you don't have to. Write it in whatever style helps you reach correctness.
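
A minimal sketch of what "state in the lambda context" can look like, assuming a per-frame callback that returns true while it still wants to run. `Tick` and `make_fade` are hypothetical names for illustration:

```cpp
#include <functional>

// Hypothetical per-frame callback: returns true while it wants to keep running.
using Tick = std::function<bool(double dt)>;

// Fades `level` from 0 to 1 over `duration` seconds. The only state is the
// captured `elapsed`; `level` must outlive the returned callback.
Tick make_fade(double duration, double& level) {
    return [duration, &level, elapsed = 0.0](double dt) mutable {
        elapsed += dt;
        level = elapsed >= duration ? 1.0 : elapsed / duration;
        return elapsed < duration;
    };
}
```

Something still has to call the callback every frame and drop it once it returns false, so the dispatcher does not go away; it just moves to the caller.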

Wouldn't that be stackless (shared stack)?

> Passing uncaught exceptions from the coroutine up to the caller is also pretty easy, because it's all synchronous. You just need to wrap it so it can safely travel across the gap

This is also how dotnet handles it, and you can choose whether to rethrow at the caller site, inspect the exception manually, or run a continuation on exception.

Thanks, that's interesting.

Why wouldn't they use the GC that comes with the dotnet AOT runtime?

Unity has async too [1]. It's just that in a rare display of sanity they chose to not deprecate the IEnumerator stuff.

[1] https://docs.unity3d.com/6000.3/Documentation/ScriptReferenc...

It's ancient. The latest version of Unity only partially supports C# 9. We're up to C# 14 now. But that's just the language version. The Mono runtime is only equivalent to .NET Framework 4.8 so all of the standard library improvements since .NET (Core) are missing. Not directly related to age but its performance is also significantly worse than .NET. And Unity's garbage collector is worse than the default one in Mono.

Generators are a subset of coroutines that only yield data in one direction. Full coroutines can also receive more input from the caller at every yield point.

I just don’t agree that it always becomes a state machine hell. I even did this in C++03 code before lambdas. And honestly, because it was easy to write careless spaghetti code, it required a lot more upfront thought into code organization than just creating lambdas willy-nilly. The resulting code is verbose, but then again C++ itself is a fairly verbose language.

...and then crash when any object it was using gets deleted while it's still running, like when the game changes scenes, but it becomes a manual, error-prone process to track down and stop all the coroutines holding on to references, that costs much more effort than it saves.

I've been a serious Unity developer for 16 years, and I avoid coroutines like the plague, just like other architectural mistakes like stringly typed SendMessage, or UnityScript.

Unity coroutines are a huge pain in the ass, and a lazy undisciplined way to do things that are easy to do without them, using conventional portable programming techniques that make it possible to prevent edge conditions where things fall through the cracks and get forgotten, where references outlive the objects they depend on ("fire-and-forget" gatling foot-guns).

Coroutines are great -- right up until they aren’t.

They give you "nice linear code" by quietly turning control flow into a distributed state machine you no longer control. Then the object gets destroyed, the coroutine keeps running, and now you’re debugging a null ref 200 frames later in a different scene with an obfuscated call stack and no ownership.

"Just stop your coroutines" sounds good until you realize there’s no coherent ownership model. Who owns it? The MonoBehaviour? The caller? The scene? Every object it has a reference to? The thing it captured three yields ago? The cure is so much worse than the disease.

Meanwhile: No static guarantees about lifetime. No structured cancellation. Hidden allocation/GC from yield instructions. Execution split across frames with implicit state you can’t inspect.

Unity has a wonderful editor that lets you inspect and edit the state of the entire world: EXCEPT FOR COROUTINES! If you put your state into an object instead of local variables in a coroutine, you can actually see the state in the editor.

All of this to avoid writing a small explicit state machine or update loop -- Unity ALREADY has Update and FixedUpdate just for that: use those.

Coroutines aren’t "cleaner" -- they just defer the mess until it’s harder to reason about.

If you can't handle state machines, then you're even less equipped to handle coroutines.

I've talked to some non-serious unity devs, like Peter Molyneux...

https://news.ycombinator.com/item?id=47110605

>1h 48m 06s, with arms spread out like Jesus H Christ on a crucifix: "Because we can dynamically put on ANY surface of the cube ANY image we like. So THAT's how we're going to surprise the world, is by giving clues about what's in the middle later on."

Click. Click. Click. Click. Click. Click. Click. Click. Click. Click. Click. Click. Click. Click. Click. Click. Click. Click. Click. Click. Click. Moo!

You still need the state and the dispatcher, even if the former is a little more hidden in the implicit closure type.

Correct; stackless. I misspoke.

Probably because the AOT runtime doesn't run on game consoles, straight out of the box.

Capcom has their own fork of .NET for the Playstation, for example.

I don't know what kind of GC they implemented.

Oh I totally missed this, thanks! I was overly confident they wouldn't have bothered, given how long it was taking. The last time I used Unity was 2022.3, which was apparently the last version without Awaitable.

I use asio at work for coroutines. It's one of the most opaque libraries I've ever used. The docs are awful and impenetrable.

The most helpful resource about it is a guy on Stack Overflow (sehe). No idea how to get help once SO has closed.

> C# (already has it but is terrible to write native/VM code for?)

What do you mean here? Do you mean hand-writing MSIL or native interop (pinvoke) or something else?

> some sort of magic to me.

Your stack is on the heap and it contains an instruction pointer to jump to for resume.

Never had a crash from that. When the GameObject is destroyed, the coroutine is gone. If you're using a coroutine to manage something outside the scope of the GameObject yourself, that's a problem with your own design, not the coroutine itself.

It'd be like complaining about arrays being bad because if you pass a pointer to another object, nuke the original array, then try to access the data, it'll cause an error. That's kind of... your own fault? Got to manage your data better.

I dunno, I've worked on some pretty big projects that have used lots of coroutines, and it's pretty easy to avoid all of the footguns.

I'm not advocating for the ubiquitous use of coroutines (there's a time and place), but they're like anything else: if you don't know what you're doing, you'll misuse them and cause problems. If you RTFM and understand how they work, you won't have any issues.

> Who owns it? The MonoBehaviour? The caller? The thing it captured three yields ago?

The MonoBehaviour that invoked the routine owns it and is capable of cancelling it at typical lifecycle boundaries.

This is not a hill I would die on. There's a lot of other battles to fight when shipping a game.

And then you're bending over backwards and have made so much more busy work for yourself than you would have if you'd just done it the normal way, in which all your state would be explicitly visible and auditable in the editor.

The biggest reason for using Unity is its editor. Don't do things that make the editor useless, and are invisible to it.

The problem with coroutines is that they generate invisible errors you end up shipping and fighting long after you shipped your game, because they're so hard to track down and reproduce and diagnose.

Sure you can push out fixes and updates on Steam, but how about shipping games that don't crash mysteriously and unpredictably in the first place?

> I dunno, I've worked on some pretty big projects that have used lots of coroutines, and it's pretty easy to avoid all of the footguns.

They're a crutch for people who don't know what they're doing, so of course they invite a whole host of problems that are harder to solve than doing it right in the first place.

If you strictly require people to know exactly what they're doing and always RTFM and perfectly understand how everything works, then they already know well enough to avoid coroutines and SendMessage and UnityEvents and other footguns in the first place.

It's much easier and more efficient to avoid all of the footguns when you simply don't use any of the footguns.

Probably because the AOT runtime doesn't run on game consoles, straight out of the box.

Capcom has their own fork of .NET for the Playstation, for example.

I don't know what kind of GC they implemented.

This kind of timing addition to a language is also at the core of imperative synchronous programming languages like Esterel, Céu or Blech.

> some sort of magic to me.

Your stack is on the heap and it contains an instruction pointer to jump to for resume.

For anyone wondering; this isn't a hack, that's the same library, just as good, just without boost dependencies.

Thanks for pointing this out! This may not be obvious to everybody.

Also, this is not some random GitHub Repo, Chris Kohlhoff is the developer of ASIO :)

This is more evident in games/simulations but the same problem arises more or less in any software: batch jobs and DAGs, distributed systems and transactions, etc.

I’m not aware of any general purpose programming language that successfully models time explicitly, Verilog might be the closest to that.

> I’m not aware of any general purpose programming language that successfully models time explicitly

Step 1, solve "time" for general computing.

The difficulty here is that our periods are local out of both necessity and desire; we don't fail to model time as a first class concept, we bring time-as-first-class with us and then attempt to merge our perspectives with varying degrees of success.

We're trying to rectify the observations of Zeno, a professional turtle hunter, and a track coach with a stopwatch when each one has their own functional definition of time driven by intent.

Sounds interesting. If it's not too much of an effort, could you dig up a reference?

You're in luck - it's the first talk at this link, "The Polling Problem": https://www.gdcvault.com/play/1018040/Architecture-Tricks-Ma...

Mind you my memory may have distorted it a little beyond what it was, but it's loosely on the topic!

> C# (already has it but is terrible to write native/VM code for?)

What do you mean here? Do you mean hand-writing MSIL or native interop (pinvoke) or something else?

I use asio at work for coroutines. It's one of the most opaque libraries I've ever used. The docs are awful and impenetrable.

The most helpful resource about it is a guy on Stack Overflow (sehe). No idea how to get help once SO has closed.

No, I meant this, but for C# it is a whole lot more complex:

http://move.rupy.se/file/jvm.txt

Ask Claude Code to write a manual for it.

20 Mar 2026 on C++, Game development

I had seen many talks about coroutines but it never really clicked where I could use them outside of async IO. Until I looked at how Unity uses them in C#.

Coroutines have been around in C++ for 6 years now. And still I have yet to encounter any in production code. This is possibly due to the fact that they are by themselves a quite low-level feature. Or more precisely, they’re a high level feature that requires a bunch of complex (and bespoke) low-level code to plug into a project. But I suspect another, even bigger, issue with the coroutines rollout in C++ has been the lack of concrete examples. After all, how often do you need to compute Fibonacci in real life?

Recently, I have been looking at Unity, which mostly uses C# for client gameplay code (you can do C++ but it’s uncommon). And more specifically, I ran across their usage of coroutines for spawning effects and other ephemeral behaviours. Here’s an example from the manual I’ll reproduce here for the purpose of illustrating this article:

void Update()
{
    if (Input.GetKeyDown("f"))
    {
        StartCoroutine(Fade());
    }
}

IEnumerator Fade()
{
    Color c = renderer.material.color;
    for (float alpha = 1f; alpha >= 0; alpha -= 0.1f)
    {
        c.a = alpha;
        renderer.material.color = c;
        yield return null;
    }
}

C# and/or coroutines purists might take offense at this usage of yield. After all the semantics are all wrong here. We’re yielding nothing where we’re trying to express something akin to await NextFrame(). From what I could read this is an artifact inherited from a lack of await support when they were initially added to C# (they only supported generator style yield), which led Unity to use this hack which is still around today. I am not only mentioning it as a random piece of historical trivia, this will become relevant later.

Why coroutines?

This example is still a bit basic and might not make it immediately apparent why we would prefer to write our effects this way. After all, this could be made into a simple lambda with a mutable alpha variable that we would nudge each call. But let’s try with a slightly more complex effect:

IEnumerator TimeWarp()
{
    // It's just a jump to the left
    // (position is a struct-valued property, so we assign a whole Vector3)
    transform.position += new Vector3(-1f, 0f, 0f);
    yield return null;

    // Then a step to the right
    for (int i = 0; i < 4; ++i)
    {
        transform.position += new Vector3(0.2f, 0f, 0f);
        yield return null;
    }

    // Put your hands on your hips
    // ...

    // Let's do the time warp again!
    for (int i = 0; i < 4; ++i)
    {
        transform.Rotate(0f, 90f * i, 0f);
        yield return null;
    }
}

Now it would become actually painful to turn this into a regular functor or lambda. Writing it in C++ turns it into some sort of ugly state machine like this:

class TimeWarp
{
    enum class State
    {
        Jump,
        StepRight,
        HandsOnHips,
        // ...
        DoAgain
    };

    State _state = State::Jump;
    int _i = 0;
    Transform* _transform;

public:
    TimeWarp(Transform& transform) : _transform(&transform) {}

    bool operator()()
    {
        switch ( _state )
        {
            case State::Jump:
                _transform->position.x -= 1.f;
                _state = State::StepRight;
                break;

            case State::StepRight:
                _transform->position.x += 0.2f;
                if ( ++_i == 4 )
                {
                    _state = State::HandsOnHips;
                    _i = 0;
                }
                break;

            // ...

            case State::DoAgain:
                _transform->Rotate(0.f, 90.f * _i, 0.f);
                if ( ++_i == 4 )
                {
                    // Indicate we're done
                    return true;
                }
                break;
        }
        return false;
    }
};

Pretty ugly, isn’t it? Would you let it pass code review? What else would you suggest instead?

I guess I would perhaps recommend the author split TimeWarp into its component moves and handle state transitions by queueing the next effect as a continuation. But I probably wouldn’t be happy about it.
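To give an idea of what that suggestion could look like, here is a minimal sketch of a queue of steps where each step hands over to the next one once it reports completion. All the names in it (step, step_queue, repeat, make_time_warp, and the two-field Transform stand-in) are made up for illustration and are not part of any engine or library.

```cpp
#include <deque>
#include <functional>
#include <utility>

// Hypothetical stand-in for an engine transform
struct Transform { float x = 0.f; float yaw = 0.f; };

// A step is called once per frame and returns true once it has finished
using step = std::function<bool()>;

class step_queue
{
public:
    // Queue a step to run after all the previously queued ones
    void then( step s ) { _steps.push_back( std::move( s ) ); }

    // Run one frame of the current step; returns true when nothing is left
    bool tick()
    {
        if ( _steps.empty() ) return true;
        if ( _steps.front()() ) _steps.pop_front();
        return _steps.empty();
    }

private:
    std::deque<step> _steps;
};

// Build a step that calls body(i) once per frame, for i in [0, count)
step repeat( int count, std::function<void( int )> body )
{
    return [count, body = std::move( body ), i = 0]() mutable
    {
        body( i );
        return ++i == count;
    };
}

// The transform must outlive the returned queue (captured by reference)
step_queue make_time_warp( Transform& t )
{
    step_queue q;
    q.then( repeat( 1, [&t]( int ) { t.x -= 1.f; } ) );            // jump to the left
    q.then( repeat( 4, [&t]( int ) { t.x += 0.2f; } ) );           // step to the right
    q.then( repeat( 4, [&t]( int i ) { t.yaw += 90.f * i; } ) );   // time warp again
    return q;
}
```

It works, but the sequence is now spread over lambdas and a helper, and any state shared between two moves has to be captured or stored somewhere by hand.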

This, to me, is the kind of no-brainer case I’ve been dying to see to be sold on the value of coroutines. Wrapping one loop might not be worth the hassle of figuring out how to integrate coroutines in your codebase, but wrapping a sequence of operations with state definitely does. It’s all about turning a hard to read state machine into a very simple function.
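For reference, the "one loop" case really is small without coroutines. Below is a rough sketch of the earlier Fade effect as a mutable lambda; make_fade and the Color struct are hypothetical stand-ins written for this example, not Unity or engine APIs.

```cpp
#include <functional>

// Hypothetical stand-in for an engine color type
struct Color { float r = 1.f, g = 1.f, b = 1.f, a = 1.f; };

// Returns a callable to invoke once per frame.
// It returns true once the fade is over. The color must outlive the callable.
std::function<bool()> make_fade( Color& color )
{
    return [&color, alpha = 1.f]() mutable
    {
        if ( alpha < 0.f ) return true;  // nothing left to do
        color.a = alpha;                 // apply this frame's alpha
        alpha -= 0.1f;                   // nudge the state for the next call
        return false;
    };
}
```

A single captured variable is all the state there is, which is why this case alone does not justify the integration work.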

A C++23 implementation

So, let’s do the time warp again in C++ then.

std::generator<std::monostate> TimeWarp(GameObject& obj)
{
    // It's just a jump to the left
    obj.transform.position.x -= 1.f;
    co_yield {};

    // Then a step to the right
    for (int i = 0; i < 4; ++i)
    {
        obj.transform.position.x += 0.2f;
        co_yield {};
    }

    // Put your hands on your hips
    // ...

    // Let's do the time warp again!
    for (int i = 0; i < 4; ++i)
    {
        obj.transform.Rotate(0.f, 90.f * i, 0.f);
        co_yield {};
    }
}

My readers may object that this is a hack. In fact, this is the same hack Unity pulled a decade and change ago. And that’s precisely the point. For the exact same reasons.

See, the real reason we mostly see Fibonacci generators in slides is that using co_yield is (relatively) easy, especially since C++23 gave us <generator>. But making use of co_await is hard. Yielding from a coroutine is fairly straightforward and generic. The control flow is simple: we suspend and return to the caller, and they decide when we will be awakened next. On the other hand, handling co_await requires answering a lot of questions that don’t have an obvious answer. What are we going to wait on? How will they signal that they are ready to resume? Can we use signals/interrupts instead of polling? Who will check that they are ready to run again? Will they also awaken (run) the coroutine, or will they put it back in an execution queue? Which execution queue? A background thread? A thread pool? Using what implementation? The list goes on.

To misquote Kennedy, “we chose to focus coroutines on generator in C++23, not because it is hard, but because it is easy”.

C++26 should implement execution and give us a framework to be able to use co_await, but I expect it to be an uphill battle. After all, most projects should already have their own concurrency solution and given how little is in the standard besides low level constructs, it means a lot of divergence that will need to be plugged back into the execution model. I expect most projects have their own custom schedulers, thread pools and the like. Or use something like TBB to get one.

Perhaps your codebase already uses boost::asio in which case you already have support for coroutines. If not, you will either need to wait for C++26 and switch/integrate with execution, or implement your own promises and awaitables to fit your threading model.

Or you could use the Unity hack.

Unity-like coroutines runner in C++

It took me less than an hour to implement a simple Unity style coroutine executor in my toy game main thread. Here’s the whole thing:

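#include <generator> // std::generator (C++23)
#include <utility>   // std::move, std::declval
#include <variant>   // std::monostate
#include <vector>

class effects_manager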
{
public:
    void add( std::generator<std::monostate> effect )
    {
        _effects.push_back( std::move( effect ) );
        _iterators.push_back( _effects.back().begin() );
    }

    void run()
    {
        // Remove the ones that are done
        // (tweaked https://en.cppreference.com/w/cpp/algorithm/remove.html#Version_3)
        int first = 0;
        for ( ; first != _effects.size()
                 && _iterators[ first ] != _effects[ first ].end(); ++first );

        if ( first != _effects.size() )
        {
            for ( int i = first; ++i != _effects.size(); )
            {
                if ( _iterators[ i ] != _effects[ i ].end() )
                {
                    _effects[ first ] = std::move( _effects[ i ] );
                    _iterators[ first ] = std::move( _iterators[ i ] );
                    ++first;
                }
            }
            _effects.erase( begin( _effects ) + first, end( _effects ) );
            _iterators.erase( begin( _iterators ) + first, end( _iterators ) );
        }

        // Run the effects
        for ( int i = 0; i < _effects.size(); ++i )
        {
            ++_iterators[ i ];
        }
    }

private:
    std::vector<std::generator<std::monostate>> _effects;
    using effect_iterator = decltype( std::declval<std::generator<std::monostate>>().begin() );
    std::vector<effect_iterator> _iterators;
};

That’s it. The only hard part is the loop that removes the coroutines that have reached the end of their execution by hand-writing a std::remove_if variant that works with 2 zipped arrays. If you already have a utility for it, the whole thing will take less than 20 lines.
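As an illustration of what such a utility might look like, here is a small sketch of a helper that erases matching entries from two parallel vectors at once. The name erase_if_zipped and its signature are assumptions made for this example; it is not a standard library function, and it only requires the element types to be move-assignable.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Erases index i from both vectors whenever pred(a[i], b[i]) is true,
// keeping the survivors in their original order.
// Returns the number of erased entries.
template <typename A, typename B, typename Pred>
std::size_t erase_if_zipped( std::vector<A>& a, std::vector<B>& b, Pred pred )
{
    assert( a.size() == b.size() );  // the two vectors must stay in lockstep

    std::size_t kept = 0;
    for ( std::size_t i = 0; i < a.size(); ++i )
    {
        if ( pred( a[ i ], b[ i ] ) ) continue;  // drop this pair

        if ( kept != i )  // avoid self move-assignment
        {
            a[ kept ] = std::move( a[ i ] );
            b[ kept ] = std::move( b[ i ] );
        }
        ++kept;
    }

    const std::size_t removed = a.size() - kept;
    a.erase( a.begin() + static_cast<std::ptrdiff_t>( kept ), a.end() );
    b.erase( b.begin() + static_cast<std::ptrdiff_t>( kept ), b.end() );
    return removed;
}
```

With something like this, the removal part of run() could shrink to a single call whose predicate compares the iterator against its generator's end().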

Now we can fire effects by writing something like effects.add(TimeWarp(object)), and we just need to remember to call effects.run() in our main loop.

Doing it the “proper” way would require writing a custom next-frame awaiter that inserts our coroutine handle into a next-frame queue. While that’s doable, it requires a more in-depth understanding of coroutine internals to implement. And, to be honest, I kind of like the yield approach to mean “yield control until next frame”.

Bonus benefit

As I was writing this, I also realized, it wouldn’t take much to turn our current implementation into a proper generator rather than relying on our coroutine invoking side effects. Instead of monostate we could return a renderable object.

std::generator<Draw> TimeWarp(const Model& model)
{
    // It's just a jump to the left
    vec3 position{ -1.f, 0.f, 0.f };
    co_yield Draw{ .model = model, .transform{ .position = position } };

    // Then a step to the right
    for (int i = 0; i < 4; ++i)
    {
        position.x += 0.2f;
        co_yield Draw{ .model = model, .transform{ .position = position } };
    }

    // Put your hands on your hips
    // ...

    // Let's do the time warp again!
    for (int i = 0; i < 4; ++i)
    {
        co_yield Draw{ .model = model,
                       .transform{ .position = position,
                                   .rotation = Rotate(0.f, 90.f * i, 0.f) } };
    }
}

Now we change our run() method to populate a vector of draws:

std::vector<Draw> run()
{
    // Remove the ones that are done (same as before)
    // ...

    // Run the effects
    std::vector<Draw> draws;
    draws.reserve( _effects.size() );
    for ( int i = 0; i < _effects.size(); ++i )
    {
        draws.push_back( *_iterators[ i ] );
        ++_iterators[ i ];
    }
    return draws;
}

And while we’re at it, we could even make our loop run in parallel now since we removed the side effects:

// Run the effects
std::vector<Draw> draws( _effects.size() );
tbb::parallel_for( 0zu, _effects.size(), [this, &draws]( size_t i )
                   {
                       draws[ i ] = *_iterators[ i ];
                       ++_iterators[ i ];
                   } );
return draws;

There. A simple and relatively efficient effect system for our game that allows designers to implement all sorts of bespoke funky things as easy to read coroutines, and the entire system took us less than a hundred lines to write.

Now, wouldn’t you say this looks much more interesting to have than if I had shown you yet another Fibonacci generator?
