[1] https://github.com/asibahi/paella/blob/main/writeup/c19.md#u...
But manipulating instruction encodings, file formats, and the like can be tedious if your language doesn't have the right capabilities, though it's not impossible.
Longer term, it also makes me wonder whether something like this could eventually reduce reliance on Clang/LLVM for the C frontend in Zig's toolchain.
> I do not know if it is me being bored with the project, or annoyed with having to build and design a data structure, that has soured me on this project. But I have really at this point lost most motivation to continue this chapter. The way Zig is designed, it makes me deal with the data structure and memory management complexity head on, and it is tiresome. It is not "simpler" than, say, Rust: it just leaves the programmer to deal with the complexity, <strike-through>gaslighting the user</strike-through> claiming it is absolutely necessary.
[0] https://github.com/asibahi/paella/blob/main/writeup/c19.md#u...
It’s interesting how quickly Zig clicked for me (although I have been writing it for a couple of years now). Some of the strategies around ownership and data-oriented design I picked up writing JavaScript. Sometimes returning a new slice and sometimes returning the same slice is a problem for memory cleanup, but I wouldn’t do it even in JavaScript, because it makes it difficult for the caller to know whether they can safely mutate the slice.
I suspect that there’s a way to write this algorithm without allocating a temporary buffer for each iteration. If I’m right that it’s just intersecting N sets, then I would start by making a copy of the first set, and on each iteration, removing items that don’t appear in the new set. I suspect the author is frustrated that Zig doesn’t have an intersect primitive for arrays, but usually when the Zig standard library doesn’t have something, it’s intentionally pushing you to a different algorithm.
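That shape is easy to sketch. Here it is in Rust, purely as an illustration (the project under discussion is in Zig, and this is not the author's code): clone the first set once up front, then shrink it in place against each remaining set, so no temporary buffer is allocated per iteration.

```rust
use std::collections::HashSet;

// Intersect N sets with a single allocation: one clone of the first set,
// then in-place filtering against each of the others.
fn intersect_all(sets: &[HashSet<u32>]) -> HashSet<u32> {
    let mut acc = match sets.first() {
        Some(s) => s.clone(), // the only allocation
        None => return HashSet::new(),
    };
    for s in &sets[1..] {
        // retain removes items not present in the next set, in place
        acc.retain(|x| s.contains(x));
        if acc.is_empty() {
            break; // intersections only shrink; stop early
        }
    }
    acc
}
```

The same in-place pattern translates directly to a sorted array or bit set in Zig, which sidesteps the per-iteration buffer entirely.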
Performance.
You definitely can write a compiler in a high-level language and given the choice I certainly prefer to on my hobby projects. Having a garbage collector makes so many compiler algorithms and data structures easier.
But I also accept that that choice means there's an upper limit to how fast my compiler will be. If you're writing a compiler that will be used to (at least aspirationally) compile huge programs, then performance really matters. Users hate waiting on the compiler.
When you want to squeeze every ounce of speed you can get out of the hardware, a low-level language that gives you explicit control over things like memory layout matters a lot.
There is also another C compiler written in Zig, Aro[1], which seems to be much more complete than TFA. Zig started using that as a library for its TranslateC functionality (for translating C headers into Zig, not whole programs) in 0.16.
Take C#. You can write a compiler in it that is very fast. It gives you explicit control over the memory layout of data structures and of course total control over what you write to disk. It is certainly not “low level”.
Dependencies have nothing to do with low-level vs. high-level; they're about package management, how well the language composes, and how rich the standard library is. Can assumptions in package A affect package B? In C that's almost impossible to avoid, because different people have different ideas about how long their objects live.
Having a rich standard library isn't just a pure positive. More code means more maintenance.
That being said Rust is definitely a much higher level language than either C or Zig. The availability of `Arc` and `Box`, the existence and reliance on `drop`, and all of `async` are things that just wouldn't exist in Zig and allow Rust programmers to think at higher levels of abstraction when it comes to memory management.
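As an illustration of what those abstractions buy (a generic sketch, not tied to any project in this thread): with `Arc`, shared data outlives whichever thread finishes last, and the deallocation point is decided by the last reference being dropped, something a Zig program would have to manage explicitly.

```rust
use std::sync::Arc;
use std::thread;

// Shared, immutable data handed to several threads. The last Arc to be
// dropped frees the Vec; there is no explicit deallocation anywhere.
fn sum_in_threads(data: Vec<u64>) -> u64 {
    let shared = Arc::new(data);
    let mut handles = Vec::new();
    for _ in 0..2 {
        let local = Arc::clone(&shared); // refcount bump, not a deep copy
        handles.push(thread::spawn(move || local.iter().sum::<u64>()));
    }
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}
```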
> Having a rich standard library isn't just a pure positive. More code means more maintenance.
I would argue it's much worse to rely on packages outside the standard library, since it's harder to gain trust in the maintenance and quality of the code you rely on. I do agree that more code is almost always just more of a burden, though.
The actual hard part of Rust is dealing with async, especially when building libraries. But that's the cost of a zero-cost async abstraction, I suppose.
I mean, C++ has RAII and things like unique_ptr; does that make it higher level than Zig?
And what if you don't use Arc or Box? Is your program now lower level than baseline Rust?
As I said, depends a lot about what you mean by low level.
C++ offers much higher-level primitives out of the box compared to Zig, so I'd say it's a higher-level language. Of course you can ignore all the features of C++ and just write C, but that's not why people pick the language.
We should really be happy that language evolution has started again. Language monoculture was dreary and unproductive.
20 years ago you would have been called insane for throwing away all the man-years of optimization baked into Oracle (or Postgres or MySQL if you were being low-rent). And look where we are today: thousands of people can build databases.
E.g., Pascal compilers have traditionally been written in Pascal, hardly a language which conjures up a "low level" image.
How could the first Pascal compiler be compiled if it was written in Pascal, but a Pascal compiler didn't yet exist?
> Pascal compilers have traditionally been written in Pascal, hardly a language which conjures up a "low level" image.
It may be the case that it doesn't conjure up such an image, but Pascal is approximately on the same rung as Zig or D—lower level than Go, higher level than assembly. If folks have a different impression, the problem is just that: their impression.
Does it really? Compilers tend to be programs that just append a bunch of data to lists, hashmaps, queues, and trees, process it, then shut down. So you can just use append-only data structures and not care too much about freeing stuff.
I never worry about memory management when I write compilers in C.
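A minimal sketch of that append-only style (illustrative, in Rust, not drawn from any particular compiler): IR nodes live in one growable array, references are plain indices rather than pointers, and nothing is freed until the whole pool is dropped at the end of compilation.

```rust
// Append-only IR pool: nodes are referenced by index and never freed
// individually; dropping the pool at end of compilation frees everything.
#[derive(Debug, PartialEq)]
enum Node {
    Const(i64),
    Add(u32, u32), // indices into the pool, not pointers
}

#[derive(Default)]
struct Pool {
    nodes: Vec<Node>,
}

impl Pool {
    fn push(&mut self, n: Node) -> u32 {
        self.nodes.push(n);
        (self.nodes.len() - 1) as u32
    }

    fn eval(&self, idx: u32) -> i64 {
        match &self.nodes[idx as usize] {
            Node::Const(v) => *v,
            Node::Add(a, b) => self.eval(*a) + self.eval(*b),
        }
    }
}
```

In C the same idea is usually a bump/arena allocator: one big buffer, a pointer that only moves forward, and a single free at exit.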
E.g. when I ported C1 from C++ to Java for Maxine, straightforward choices of modeling the IR the same and basic optimizations allowed me to make it even faster than C1. C1X was a basic SSA+CFG design with a linear scan allocator. Nothing fancy.
The Virgil compiler is written in Virgil. It's a very similar SSA+CFG design. It compiles plenty fast without a lot of low-level tricks. Though, truth be told, I went overboard optimizing[1] the x86 backend, and it's significantly faster (maybe 2x) than the nicer, prettier x86-64 backend. I introduced a bunch of fancy representation optimizations for Virgil since then, but they don't really close the gap.
[1] It's sad that even in the 2020s the best way to make something fast is to give up on abstractions and use integers and custom encodings into integers for everything. Trying to fix that though!
Don't buy it.
A decent OCaml version of a C or Zig compiler would almost certainly not be 10x slower. And it would be significantly easier to parallelize without introducing bugs, so it might even be quite a bit faster on big codebases.
Actually designing your programming language to be processed quickly (can definitively figure things out with local parsing, minimizing the number of files that need to be touched, etc.) is WAY more important than the low-level implementation for overall compilation speed.
And I suspect that the author would have gotten a lot further had he been using a GC language and not had to deal with all the low-level issues and debugging.
I like Zig, and I use it a lot. But it is NOT my general purpose language. I'm definitely going to reach for Python first unless I absolutely know that I'm going to be doing systems programming. Python (or anything garbage collected with solid libraries) simply is way more productive on short time scales for small codebases.
is it a typo for more abstractions? or is there some different meaning?
You need to implement very few optimizations to get the vast majority of compiler improvements.
Many of the papers about this suggest that we would be better off focusing on making quality of life improvements for the programmer (like better debugger integration) rather than abstruse and esoteric compiler optimizations that make understanding the generated code increasingly difficult.
And the original question was "how will they reinvent the wheel on the man-years of optimization work that went into LLVM in their own compiler infrastructure?" -- the answer is that Andrew naively believes that they can recreate comparable optimizations.
There are a whole lot of misstatements about Zig and other matters in the comments here by people who don't have much knowledge about what they are talking about--much of the discussion of using low-level vs high-level languages for writing compilers is nonsense. And one person wrote of "Zig and D" as if those languages are comparable, when D is at least as high level as C++, which it was intended to replace.
That's exactly wrong.
> There are a whole lot of misstatements about Zig and other matters in the comments here by people who don't have much knowledge about what they are talking about.
Well spoken. You should look in the mirror.