The gold standard of optimization: A look under the hood of RollerCoaster Tycoon

Warcraft 1 (1994), Warcraft 2 (1995), and StarCraft (1998) all use power-of-2 aligned map sizes (64 blocks, 128 blocks, and 256 blocks) so the shift-factor could be pre-computed to avoid division/multiplication, which was dang slow on those old 386/486 computers.

Each map block was 2x2 cells, and each cell was 8x8 pixels. That made rendering background cells and fog-of-war overlays very straightforward in assembly language.
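To illustrate the kind of arithmetic this layout allows, here is a minimal sketch in C. It is not Blizzard's code: the 128-block map size is just one of the sizes mentioned above, and every name (`MAP_SHIFT`, `pixel_to_cell`, `block_index`, and so on) is made up for the example.

```c
#include <stdint.h>

/* Hypothetical constants: 128x128-block map, 2x2 cells per block, 8x8 pixels per cell. */
enum {
    MAP_SHIFT   = 7,               /* log2(128): the pre-computed shift factor */
    MAP_SIZE    = 1 << MAP_SHIFT,  /* 128 blocks per side */
    CELL_SHIFT  = 3,               /* log2(8 pixels per cell) */
    BLOCK_SHIFT = CELL_SHIFT + 1   /* log2(16 pixels per block: 2 cells * 8 pixels) */
};

/* Pixel coordinate -> cell coordinate: a shift instead of a divide by 8. */
static inline uint32_t pixel_to_cell(uint32_t px) { return px >> CELL_SHIFT; }

/* Pixel coordinate -> block coordinate: a shift instead of a divide by 16. */
static inline uint32_t pixel_to_block(uint32_t px) { return px >> BLOCK_SHIFT; }

/* Row-major index (by * MAP_SIZE + bx) with the multiply replaced by a shift. */
static inline uint32_t block_index(uint32_t bx, uint32_t by)
{
    return (by << MAP_SHIFT) + bx;
}
```

Because every size is a power of two, none of these conversions needs a multiply or divide instruction.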

All of Warcraft/etc. had only a few thousand lines of assembly language to render maps/sprites/fonts/fog-of-war into the offscreen buffer, and to blit from the offscreen buffer to the screen.

The rest of the code didn't need to be in assembly, which is too time-consuming to write for code where the performance doesn't matter. Everything else was written in portable assembler, by which I mean C.

Edit:

By way of comparison, Blackthorne for Super Nintendo was all 65816 assembly. The Genesis version (Motorola 68000) and DOS version (Intel 80386) were manually transcribed into their respective assembly languages.

The PC version of Blackthorne also had a lot of custom assembler macros to generate 100K of rendering code to do pixel-scrollable chunky-planar VGA mode X (written by Bryan Waters - https://www.mobygames.com/person/5641/bryan-waters/).

At Blizzard we learned from working on those console app ports that writing assembly code takes too much programmer time.

Edit 2:

I recall that Comanche: Maximum Overkill (1992, a voxel-based helicopter simulator) was written in all assembly in DOS real mode. A huge technical feat, but so much work to port to protected mode that I think they switched to polygon-rendering for later versions.

> The same trick can also be used for the other direction to save a division:

> NewValue = OldValue >> 3;

> This is basically the same as

> NewValue = OldValue / 8;

> RCT does this trick all the time, and even in its OpenRCT2 version, this syntax hasn’t been changed, since compilers won’t do this optimization for you.

The author loses a lot of credibility by suggesting the compiler won't replace multiplying or dividing by a power of 2 with the equivalent bit shift. That's a trivial optimization that's always been done. I'm sure compilers were doing that in the 70s.

> Imagine a programmer asking a game designer if they could change their formula to use an 8 instead of a 9.5 because it is a number that the CPU prefers to calculate with. There is a very good argument to be made that a game designer should never have to worry about the runtime performance characteristics of binary arithmetic in their life, that’s a fate reserved for programmers

Numeric characteristics are absolutely still a consideration for game designers even in 2026, one that influences what numbers they use in their game designs. The good ones, anyways. There are, of course, also countless bad developers/designers who ignore these things these days, but not because it is free to do so; rather, because they don't know better, and in many cases it is one of many silent contributing factors to a noticeable decrease in the quality of their game.

This is a fun read. It's one of my favorite games growing up by far, with countless hours sunk into it. I didn't need this write-up to know that Chris Sawyer was a god among men and that the open source version is a huge labor of love, but it's a good reminder :) I will need to give OpenRCT a try some time. I've tried a little OpenTTD and really enjoy it, but RCT was always my jam.

For the lesson here, I think re-contextualizing the product design in order to ease development should be a core tenet of modern software engineering (or really any form of engineering). This is why we usually say that we need to shift left on problems: discussing the constraints up-front lets us tell designers how we might tweak a few designs early in order to save big time on the calendar. All of the projects that I loved being a part of in my career did this well; all of the slogs were ones that employed a leadership-driven approach that amounted to waterfall.

> The same trick can also be used for the other direction to save a division: NewValue = OldValue >> 3; This is basically the same as NewValue = OldValue / 8; RCT does this trick all the time, and even in its OpenRCT2 version, this syntax hasn’t been changed, since compilers won’t do this optimization for you.

(emphasis mine)

Not at all true. Assuming the types are such that >> is equivalent to /, modern compilers will implement division by a power of two as a shift every single time.

Fun read, thx! I'd also recommend more about RCT:

"Interview with RollerCoaster Tycoon's Creator, Chris Sawyer (2024)" https://news.ycombinator.com/item?id=46130335

"Rollercoaster Tycoon (Or, MicroProse's Last Hurrah)" https://news.ycombinator.com/item?id=44758842

"RollerCoaster Tycoon at 25: 'It's mind-blowing how it inspired me'" https://news.ycombinator.com/item?id=39792034

"RollerCoaster Tycoon was the last of its kind [video]" https://news.ycombinator.com/item?id=42346463

"The Story of RollerCoaster Tycoon" https://www.youtube.com/watch?v=ts4BD8AqD9g

Definitely made me feel old to see bit-shifting needed an explainer! I must admit that as I was reading I was thinking, "Why is he explaining this? It's obvious!"

"Since the number is stored in a binary system, every shift to the left means the number is doubled.

At first this sounds like a strange technical obscurity"

Do we not know binary in 2026? Why is this a surprise to the intended audience?

What language is this article talking about where compilers don't optimize multiplication and division by powers of two? Even for division of signed integers, current compilers emit inline code that handles positive and negative values separately, still avoiding the division instruction (unless optimizing for size, of course).
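A rough C sketch of what that inline code computes for a signed divide by 8. The function names are invented for illustration, the exact instruction sequence varies by compiler and target, and the signed version assumes that `>>` on a negative `int32_t` is an arithmetic shift (implementation-defined in C, though it is what mainstream compilers do).

```c
#include <stdint.h>

/* Unsigned: x / 8 and x >> 3 agree for every input. */
static inline uint32_t udiv8(uint32_t x) { return x >> 3; }

/* Signed: C's '/' truncates toward zero, while a bare arithmetic shift
   rounds toward negative infinity. Adding 7 to negative inputs first
   makes the shift truncate too, so no division instruction is needed.
   Assumption: '>>' on a negative value is an arithmetic shift. */
static inline int32_t sdiv8(int32_t x)
{
    int32_t bias = (x >> 31) & 7;  /* 7 if x is negative, otherwise 0 */
    return (x + bias) >> 3;
}
```

With that bias in place, `sdiv8(x)` matches `x / 8` for negative values as well, which is why writing the shift by hand usually buys nothing.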

> When reading through OpenRCT2’s source, there is a common syntax that you rarely see in modern code, lines like this:

> NewValue = OldValue << 2;

I disagree with the framing of this section. Bit shifts are used all the time in low-level code. They're not just some archaic optimisation, they're also a natural way of working with binary data (aka all data on a computer). Modern low-level code continues to use lots of bit shifts, bitwise operators, etc.

Low-level programming is absolutely crucial to performant games. Even if you're not doing low-level programming yourself, you're almost certainly using an engine or library that uses it extensively. I'm surprised an article about optimisation in gaming, of all things, would take the somewhat tired "in ye olde days" angle on low-level code.

Great write up. Thank you. Really great!

I was reminded of the Factorio blog. That game's such a huge optimization challenge even by today's standards and I believe works with the design.

One interesting thing I remember is that if you have a long conveyor belt of 10,000 copper coils, you can basically simplify it so that only the entry and exit tiles are actually active. All the others don't actually have to move because nothing changes... as long as the belts are fully or uniformly saturated. So you avoid mechanics which would stop that.

The pathfinding section reminded me that there's a YouTube streamer, Marcel Vos, who goes into a deep dive of how the pathfinding works.

https://youtu.be/twU1SsFP-bE

He has lots of videos that are deep dives into how RCT works and how things are implemented!

I had always heard about how RCT was built in Assembly, and thought it was very impressive.

The more I actually started digging into assembly, the more this task seems monumental and impossible.

I didn't know there was a fork and I'm excited to look into it

On huge games produced by large game studios, I wonder if the idea of using a real world technical challenge as a "feature" within the game is considered genius? Consider a coder and a game designer who are on different teams and don't attend the same meetings.

But if you look at creative writing, story arcs are all about obstacles. A boring story is made interesting by an obstacle. It is what our protagonist needs to overcome. A one-man-band game dev who simultaneously holds the story and the technical challenge in their head might spot the opportunity to use a glitch or limitation as, I dunno, a mini game that riffs on the glitch.

Also, I remember the excitement of a new game that looked different to others.

Somehow even as a child I just knew that it would be a whole new emergent gameplay experience.

Of course I didn't know what went into making RollerCoaster Tycoon, but I could tell just from a couple of screenshots that this was clearly a ground-up new game with new mechanics that would be extremely fun to play.

I don't get this feeling anymore, as I just assume everything is a clone of another game in the same engine, generally.

Unless it's been a decade in production like Breath of the Wild or GTA 5, I just don't expect much.

Biggest lesson from string matching work: layout beats instructions every time. Batch your comparisons into a contiguous buffer so the prefetcher can actually help, and you'll outperform hand-rolled SIMD on random-access data. Compilers handle the arithmetic tricks fine now — they can't fix your cache misses though.

Is there a place to find stories of recent game optimization? Something as ridiculous as the fast inverse square root. As someone who spent way too much time rendering in V-Ray in a prior life, I still can't believe we got real-time ray tracing.

The article refers several times to the benefits of the game designer and the coder being the same person. I've often felt that this is the only way to build anything impressive, and in fact I'm amazed that corporations with their hierarchical organisation model ever get anything built at all but I suppose you can brute force anything with enough employees.

It does make you wonder if the future of AI-assisted development will look more like the early days of coding, where one single mind can build and deliver a whole piece of software from beginning to end.

Compilers won't turn multiplication by a power of two into a bit shift for you? I remember reading in ~2000: the only thing writing a<<2 instead of a*4 will do is make your compiler yawn

> This part is especially fascinating to me, since it turns an optimization done out of technical necessity into a gameplay feature.

Reminds me of blood moons in Zelda https://www.polygon.com/legend-zelda-tears-kingdom/23834440/...

I have to wonder how much of the original assembly source looked a lot more succinct than whatever's in OpenRCT due to the use of macros. Looking up his gameography on MobyGames, Chris had been writing stuff since 1984 by the time RCT came out in 1999. It's hard to imagine he was still writing every single opcode out by hand, given that I had some macros in the assembler I was fooling around with on my C64 back in the eighties.

So this is what programming on hard mode looks like?

While it has been a while since playing RCT, one thing that was really nice about the game is that it runs flawlessly under Wine.

I really wish I could see the source code.

> The same trick can also be used for the other direction to save a division:

> NewValue = OldValue >> 3;

You need to be careful, because this doesn't work if the value is negative.

> it turns an optimization done out of technical necessity into a gameplay feature

And this folks is why an optimizing compiler can never beat sufficient quantities of human optimization.

The human can decide when the abstraction layers should be deliberately broken for performance reasons. A compiler cannot do that.

Fantastic write-up, that's exactly why I came to HN many years ago, to find such articles about mundane things or products, but the technical aspect is just fascinating.

Another great optimization is storing the year as two digits, because you only need the back half…

… oh wait, nvm. Don’t preoptimize!

The pathfinder algorithm is a great example of why constraints are so important for creativity and creative development.

If AI has any benefit to creative endeavors at all, it will be because the challenge of coaxing a machine designed to produce an average of a large corpus of work (producing inherently mediocre slop) provides novel limitations, not because it makes art any more "accessible".

So I don't think the Rust people are gonna be happy with non-memory-safe assembly

It's a shame that when a Redditor discovered the source code for the original StarCraft "gold master" on a CD, they sent it back to Blizzard in exchange for some fucking Blizzard merch [1]

EA a while back released the source code to (most) of the old Command & Conquer games [2] though interestingly left out Tiberian Sun and Red Alert 2, StarCraft's closest competitors at the time.

Would've been nice for historical preservation to be able to peek behind the curtain and see StarCraft's code in a similar fashion

[1] https://old.reddit.com/r/gamecollecting/comments/68xzxt/star...

[2] https://github.com/electronicarts

Were you at Blizzard when they lost their source code server and had no backups? I was there for a short time consulting around the time WC3 was released.

If you worked on Lost Vikings I'd like to thank you for the entertainment during my childhood. Given your background, did you ever get involved in the demo scene?

Maximum Overkill was an amazing game. I probably played hundreds and hundreds of hours.

> Numeric characteristics are absolutely still a consideration for game designers even in 2026, one that influences what numbers they use in their game designs. The good ones, anyways.

I used to think like this, not anymore.

What convinced me that these sorts of micro-optimizations just don't matter is reading up on the cycle counts of modern processors.

On a Zen 5, integer addition is a single cycle, multiplication 3, and division ~12. But that's not the full story. The CPU can have 5 multiplications in flight simultaneously. It can have about 3 divisions running simultaneously.

Back in the day of RCT, there was much less pipelining. For the original Pentium, a multiplication took 11 cycles, and division could take upwards of 46 cycles. These were CPUs with 100 MHz clock speeds. So not only did each operation take more cycles to finish and could not be pipelined, the CPUs were also operating at 1/30th to 1/50th the clock rate of common CPUs today.

And this isn't even touching on SIMD instructions.

Integer tricks and optimizations are pointless. Far more important than those in a modern game is memory layout. That's where the CPU is actually going to be burning most of its time. If you can create and do operations on an int[], you'll be MUCH faster than if you are doing operations against a Monster[]. A cache miss is going to mean anywhere from a 100 to 1000 cycle penalty. That blows out any sort of hit you take cutting your cycles from 3 to 1.
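A minimal sketch of the int[] versus Monster[] contrast, in C. The `Monster` layout and function names are invented for illustration, and the comments assume a 64-byte cache line, which is typical but not universal.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "array of structs": summing health drags every
   monster's cold fields (name, position, sprite) through the cache. */
typedef struct {
    char    name[48];
    float   x, y;
    int32_t health;
    int32_t sprite_id;
} Monster;  /* 64 bytes on typical ABIs, i.e. one assumed cache line each */

int64_t total_health_aos(const Monster *m, size_t n)
{
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += m[i].health;   /* one useful int per 64-byte struct */
    return sum;
}

/* "Struct of arrays": the hot field lives in its own contiguous array. */
int64_t total_health_soa(const int32_t *health, size_t n)
{
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += health[i];     /* sixteen useful ints per 64-byte line */
    return sum;
}
```

Both functions return the same sum. The difference is only in how many cache lines have to be fetched to compute it, which is the cost the comment above is pointing at.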

Yeah, I’m quite surprised at this comment. Commercial video games are mass-produced products, and as much as I dislike designers being bogged down in technical minutiae, having a sense of industrial design for the thing you’re making is an incredible boon.

Fumito Ueda was notably quite concerned with the technical/production feasibility of his designs for Shadow of the Colossus. [1] Doom was an exercise in both creativity and expertise.

[1] https://www.designroom.site/shadow-of-the-colossus-oral-hist...

Absolutely. I have written a small but growing CAD kernel which is seeing use in some games and realtime visualization tools ( https://github.com/timschmidt/csgrs ) and can say that computing with numbers isn't really even a solved problem yet.

All possible numerical representations come with inherent trade-offs around speed, accuracy, storage size, complexity, and even the kinds of questions one can ask (it's often not meaningful to ask if two floats equal each other without an epsilon to account for floating point error, for instance).

"Toward an API for the Real Numbers" ( https://dl.acm.org/doi/epdf/10.1145/3385412.3386037 ) is one of the better papers I've found detailing a sort of staged complexity technique for dealing with this, in which most calculations are fast and always return (arbitrary precision calculations can sometimes go on forever or until memory runs out), but one can still ask for more precise answers which require more compute if required. But there are also other options entirely like interval arithmetic, symbolic algebra engines, etc.

One must understand the trade-offs else be bitten by them.
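As a small illustration of the float-equality point, here is a sketch of a tolerance-based comparison in C. The helper names and the choice of tolerances are assumptions for the example; the right epsilon depends entirely on the magnitudes and error sources in the calculation at hand.

```c
#include <stdbool.h>

static double abs_d(double v) { return v < 0 ? -v : v; }
static double max_d(double a, double b) { return a > b ? a : b; }

/* Returns true when a and b differ by at most abs_eps, or by at most
   rel_eps relative to the larger magnitude. Exact '==' would reject
   values that differ only by accumulated rounding error. */
static bool approx_equal(double a, double b, double rel_eps, double abs_eps)
{
    double diff = abs_d(a - b);
    if (diff <= abs_eps)
        return true;
    return diff <= rel_eps * max_d(abs_d(a), abs_d(b));
}
```

For example, `approx_equal(0.1 + 0.2, 0.3, 1e-9, 1e-12)` accepts the two values as equal, while `approx_equal(1.0, 1.1, 1e-9, 1e-12)` does not.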

Related to that, for a consumer electronics product I worked on using an ARM Cortex-M4 series microcontroller, I actually ended up writing a custom pseudorandom number generation routine (well, modifying one off the shelf). I was able to take the magic mixing constants and change them to things that could be loaded as single immediates using the crazy Thumb-2 immediate instructions. It passed every randomness test I could throw at it.

By not having to pull in anything from the constant pools and thereby avoid memory stalls in the fast path, we got to use random numbers profligately and still run quickly and efficiently, and get to sleep quickly and efficiently. It was a fun little piece of engineering. I'm not sure how much it mattered, but I enjoyed writing it. (I think I did most of it after hours either way.)

Alas, I don't think it ever shipped because we eventually moved to an even smaller and cheaper Cortex-M0 processor which lacked those instructions. Also my successor on that project threw most of it out and rewrote it, for reasons both good and bad.

"and in many cases it is one of many silent contributing factors to a noticeable decrease in the quality of their game"

Game designers are not so constrained anymore by the limits of the hardware, unless they want to push boundaries. The quality of a game is not just the most efficient runtime performance; it is mainly a question of whether the game is fun to play. Do the mechanics work? Are there severe bugs? Is the story consistent and are the characters relatable? Is something breaking immersion? So ... frequent stuttering because of bad programming is definitely a sign of low quality, but if it runs smoothly on the target audience's hardware, improvements should rather be made elsewhere.

That makes no sense since multiplication has been fast for the last 30 years (since PS1) and floating point for the last 25 years (since PS2) and anyway numbers relevant for game design are usually used just a few times per frame so only program size matters, which has not been significantly constrained for the last 40 years (since NES)

I remember the older driving games. They'd progressively "build" the road as you progressed on it. Curves in the road were drawn as straight line segments.

Which wasn't a problem, but it clearly showed how the programmers improvised to make it perform.

Now that's what being a full stack programmer really means.

That's what I would have thought as well, but it looks like on x86 both clang and gcc use variations of LEA. If they're doing it this way, I'm pretty sure it must be faster, because even if you change the ×4 to a <<2, it will still generate a LEA.

https://godbolt.org/z/EKj58dx9T
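For anyone who wants to try this themselves, a minimal pair of functions to paste into a compiler explorer. The function names are made up, and unsigned types are used so that both spellings are well-defined for every input.

```c
#include <stdint.h>

/* Two spellings of the same operation. For unsigned operands they are
   semantically identical, so the optimizer is free to choose whichever
   instruction it considers best (a shift, an LEA, or something else). */
uint32_t times_four(uint32_t x) { return x * 4; }
uint32_t shift_two(uint32_t x)  { return x << 2; }
```

Comparing the generated assembly for the two functions at `-O2` is a quick way to check what a particular compiler does on a particular target.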

It was written in assembly so goes through an assembler instead of a compiler.

> When reading through OpenRCT2’s source, there is a common syntax that you rarely see in modern code, lines like this:

> NewValue = OldValue << 2;

I learned these low-level bit tricks by reading TempleOS' HolyC source code. I remember feeling like a genius when I worked out what this line does:

dc->color=c++&15;

Hint: it's from this "Lines" demo program, whose source is here: https://web.archive.org/web/20180906060723/https://templeos....

And this is what it looks like when it runs (ignore the fact it's running in Minecraft): https://youtu.be/pAN_Fza6Vy8?t=38
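Spelled out in plain C, that line does two things at once. The `DrawContext` type and `next_color` name below are stand-ins invented for the example, not the TempleOS definitions.

```c
/* Hypothetical stand-in for the TempleOS device context. */
typedef struct { int color; } DrawContext;

/* Step-by-step equivalent of:  dc->color = c++ & 15;  */
static void next_color(DrawContext *dc, int *c)
{
    dc->color = *c & 15;  /* keep the low 4 bits: 0..15, like *c % 16 for non-negative *c */
    *c += 1;              /* the post-increment takes effect after the old value was used */
}
```

Called in a loop, this cycles the color through 0 to 15 and wraps around, since the mask discards everything above the low four bits.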

I was pretty disappointed with how Factorio reworked how fluids worked in the expansion. The old system had its quirks and the new system is obviously more performant, but it throws realism out the window which is a bummer.

> The same trick can also be used for the other direction to save a division:

> NewValue = OldValue >> 3;

You need to be careful, because this doesn't work if the value is negative.

Most CPUs have signed and unsigned right shift instructions (left shift is the same), so yes, it works. (You can test this in C by casting a signed value to unsigned before shifting.)

The biggest caveat is that right shifting -1 still produces -1 instead of 0, but that's usually fine for much older game fixed-point maths since -1 is close enough to 0.

It works fine when the value is negative.

However, there is a quirk of the hardware of most CPUs that has been inherited by the C language and by other languages.

There are multiple ways of defining integer division when the dividend is not a multiple of the divisor, depending on the rounding rule used for the quotient.

The two most frequently used definitions are to have a positive remainder, which corresponds to rounding the quotient with the floor function, and to have a remainder with the same sign as the dividend, which corresponds to rounding the quotient by truncation.

In most CPUs, the hardware is designed such that for signed integers the division instruction uses the second definition, while the right shift uses the first definition.

This means that when the dividend is a multiple of the divisor, division and right shift are the same, but otherwise the quotient may differ by one unit due to different rounding rules.

Because of this, compilers will not automatically replace signed divisions with plain right shifts, because there are operands where the result is different.

Nevertheless, the programmer can always replace a division by a power of two with a right shift. In all the programs that I have ever seen, either the rounding rule for the quotient does not matter or the desired definition for the division is the one with positive remainder, i.e. the definition implemented by right shift.

In those cases when the rounding rule matters, the worrisome case is when you must use division, not when you can use right shift, so you must correct the result to correspond to rounding by floor, instead of the rounding by truncation provided by the hardware. For this, you must not use the "/" operator of the C language alone, but one of the "div" functions from "stdlib.h", or you may use "/" but divide the absolute values of the operands, after which you compute the correct signed results.
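A sketch of one way to get floor rounding from C's truncating division. Note that in C99 and later both `/` and the `div` functions truncate toward zero, so either one needs a correction step like this. The `floor_div` name is made up for the example.

```c
/* Floor division for any non-zero divisor, built on C's truncating '/'. */
static int floor_div(int a, int b)
{
    int q = a / b;   /* truncated toward zero */
    int r = a % b;   /* remainder has the sign of the dividend */
    /* If the division was inexact and the exact quotient is negative
       (remainder and divisor have opposite signs), truncation rounded
       up, so step down by one to get the floor. */
    if (r != 0 && ((r < 0) != (b < 0)))
        q -= 1;
    return q;
}
```

For a power-of-two divisor such as 8, `floor_div(x, 8)` gives the same result as `x >> 3` on compilers where right-shifting a negative value is an arithmetic shift: both yield -1 for x = -7, whereas `-7 / 8` is 0.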

> it turns an optimization done out of technical necessity into a gameplay feature

And this folks is why an optimizing compiler can never beat sufficient quantities of human optimization.

The human can decide when the abstraction layers should be deliberately broken for performance reasons. A compiler cannot do that.

The LEA-vs-shift thread here kind of proves the point. Compilers are insanely good at that stuff now. Where they completely fall short is data layout. I had a message parser using `std::map<int, std::string>` for field lookup and the fix was just... a flat array indexed by tag number. No compiler is ever going to suggest that. Same deal with allocation. I spent a while messing with SIMD scanning and consteval tricks chasing latency, and the single biggest win turned out to be boring. Switched from per-message heap allocs to a pre-allocated buffer with `std::span` views into the original data. ~12 allocations per message down to zero. Compiler will optimize the hell out of your allocator code, it just won't tell you to stop calling it.
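A rough sketch of both ideas in C rather than C++, to keep it short: a flat table indexed by tag number, holding non-owning views into the original message so that parsing allocates nothing. The `tag=value|` message format, `MAX_TAG`, and all type and function names are assumptions made up for the example, not the parser described above.

```c
#include <stddef.h>
#include <string.h>

enum { MAX_TAG = 1024 };  /* hypothetical upper bound on tag numbers */

/* Non-owning view into the original message: no copy, no allocation. */
typedef struct { const char *ptr; size_t len; } FieldView;

/* Flat table indexed directly by tag; ptr == NULL means "absent". */
typedef struct { FieldView fields[MAX_TAG]; } FieldTable;

/* Parses "tag=value|tag=value|..." and returns the number of fields parsed. */
static size_t parse_fields(const char *msg, size_t len, FieldTable *out)
{
    size_t count = 0, i = 0;
    *out = (FieldTable){0};
    while (i < len) {
        size_t tag = 0, digits = 0;
        while (i < len && msg[i] >= '0' && msg[i] <= '9') {
            if (tag < MAX_TAG)  /* stop growing once out of range, avoiding overflow */
                tag = tag * 10 + (size_t)(msg[i] - '0');
            i++;
            digits++;
        }
        if (digits == 0 || i >= len || msg[i] != '=')
            break;              /* malformed input: stop parsing */
        i++;                    /* skip '=' */
        size_t start = i;
        while (i < len && msg[i] != '|')
            i++;
        if (tag < MAX_TAG) {    /* out-of-range tags are skipped */
            out->fields[tag].ptr = msg + start;
            out->fields[tag].len = i - start;
            count++;
        }
        if (i < len)
            i++;                /* skip '|' */
    }
    return count;
}

/* Lookup is a bounds check plus one array index, with no tree walk. */
static int field_equals(const FieldTable *t, size_t tag, const char *s)
{
    const FieldView *f;
    if (tag >= MAX_TAG)
        return 0;
    f = &t->fields[tag];
    return f->ptr != NULL && f->len == strlen(s) && memcmp(f->ptr, s, f->len) == 0;
}
```

The views are only valid while the original message buffer is alive, which is the trade-off for dropping the per-field copies.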

Agreed. It really requires an understanding of not just the software and computer it's running on, but the goal the combined system was meant to accomplish. Maybe some of us are starting to feed that sort of information into LLMs as part of spec-driven development, and maybe an LLM of tomorrow will be capable of noticing and exploiting such optimizations.

End-to-end optimization in action! Although I'd've liked more than 1 example (pathfinding) here.

Another great optimization is storing the year as two digits, because you only need the back half…

… oh wait, nvm. Don’t preoptimize!

There's a vast space between premature optimization and not caring about optimization until it bites you, and both extremes make you (or someone else) miserable.

I don't miss it. I also found Satisfactory's old fluid system (with concepts like sloshing) wildly unintuitive. I'll go so far as to say that accurate fluid dynamics is detrimental to any game that's not about beavers and water table management.

The old system was nonfunctional and any base that used lots of fluids (like modded ones, or new space age ones) were constantly running up against nonsensical mechanics.

Programming in assembly isn't really "hard"; it mostly takes lots of discipline. Consistency and patterns are key. The language also provides very little implicit documentation, so always document which arguments are passed how and where, and which registers are caller-saved and callee-saved. Of course it is also very tedious.

Now, writing very optimized assembly is very hard, because you need to break your consistency and conventions to squeeze out all the possible performance. The larger the "kernel" you optimize, the more pattern-breaking code you need to keep in your head at a time.

Macros. Lots of macros.

Back then a lot of people started with assembly because that was the only way to make games quick enough. Throughout the years they accumulated tons of experience and routines and tools.

Not saying that it was not a huge feat, but it’s definitely a lot harder to start from scratch nowadays, even for the same platform.