[1] https://github.com/asibahi/paella/blob/main/writeup/c19.md#u...
But manipulating instruction encodings, file formats, and the like can be tedious if your language doesn't have the right capabilities, though it's not impossible.
Longer term, it also makes me wonder whether something like this could eventually reduce reliance on Clang/LLVM for the C frontend in Zig's toolchain.
> I do not know if it is me being bored with the project, or annoyed with having to build and design a data structure, that has soured me on this project. But I have really at this point lost most motivation to continue this chapter. The way Zig is designed, it makes me deal with the data structure and memory management complexity head on, and it is tiresome. It is not "simpler" than, say, Rust: it just leaves the programmer to deal with the complexity, <strike-through>gaslighting the user</strike-through> claiming it is absolutely necessary.
[0] https://github.com/asibahi/paella/blob/main/writeup/c19.md#u...
It’s interesting how quickly Zig clicked for me (although I have been writing it for a couple of years now). Some of the strategies around ownership and data-oriented design I picked up writing JavaScript. Sometimes returning a new slice and sometimes returning the same slice is a problem for memory cleanup, but I wouldn’t do it even in JavaScript, because it makes it difficult for the caller to know whether they can safely mutate the slice.
I suspect that there’s a way to write this algorithm without allocating a temporary buffer for each iteration. If I’m right that it’s just intersecting N sets, then I would start by making a copy of the first set, and on each iteration, removing items that don’t appear in the new set. I suspect the author is frustrated that Zig doesn’t have an intersect primitive for arrays, but usually when the Zig standard library doesn’t have something, it’s intentionally pushing you to a different algorithm.
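That shape is easy to sketch. Here it is in Rust, purely as an illustration (the project under discussion is in Zig, and this is not the author's code): clone the first set once up front, then shrink it in place against each remaining set, so no temporary buffer is allocated per iteration.

```rust
use std::collections::HashSet;

// Intersect N sets with a single allocation: one clone of the first set,
// then in-place filtering against each of the others.
fn intersect_all(sets: &[HashSet<u32>]) -> HashSet<u32> {
    let mut acc = match sets.first() {
        Some(s) => s.clone(), // the only allocation
        None => return HashSet::new(),
    };
    for s in &sets[1..] {
        // retain removes items not present in the next set, in place
        acc.retain(|x| s.contains(x));
        if acc.is_empty() {
            break; // intersections only shrink; stop early
        }
    }
    acc
}
```

The same in-place pattern translates directly to a sorted array or bit set in Zig, which sidesteps the per-iteration buffer entirely.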
Performance.
You definitely can write a compiler in a high-level language and given the choice I certainly prefer to on my hobby projects. Having a garbage collector makes so many compiler algorithms and data structures easier.
But I also accept that that choice means there's an upper limit to how fast my compiler will be. If you're writing a compiler that will be used to (at least aspirationally) compile huge programs, then performance really matters. Users hate waiting on the compiler.
When you want to squeeze every ounce of speed you can get out of the hardware, a low-level language that gives you explicit control over things like memory layout matters a lot.
There is also another C compiler written in Zig, Aro[1], which seems to be much more complete than TFA. Zig started using that as a library for its TranslateC functionality (for translating C headers into Zig, not whole programs) in 0.16.
Take C#. You can write a compiler in it that is very fast. It gives you explicit control over the memory layout of data structures and of course total control over what you write to disk. It is certainly not “low level”.
Dependencies have nothing to do with low-level vs. high-level; they're about package management, how well the language composes, and how rich the standard library is. Can assumptions in package A affect package B? In C that's almost impossible to avoid, because different people have different ideas about how long their objects live.
Having a rich standard library isn't just a pure positive. More code means more maintenance.
That being said Rust is definitely a much higher level language than either C or Zig. The availability of `Arc` and `Box`, the existence and reliance on `drop`, and all of `async` are things that just wouldn't exist in Zig and allow Rust programmers to think at higher levels of abstraction when it comes to memory management.
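As an illustration of what those abstractions buy (a generic sketch, not tied to any project in this thread): with `Arc`, shared data outlives whichever thread finishes last, and the deallocation point is decided by the last reference being dropped, something a Zig program would have to manage explicitly.

```rust
use std::sync::Arc;
use std::thread;

// Shared, immutable data handed to several threads. The last Arc to be
// dropped frees the Vec; there is no explicit deallocation anywhere.
fn sum_in_threads(data: Vec<u64>) -> u64 {
    let shared = Arc::new(data);
    let mut handles = Vec::new();
    for _ in 0..2 {
        let local = Arc::clone(&shared); // refcount bump, not a deep copy
        handles.push(thread::spawn(move || local.iter().sum::<u64>()));
    }
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}
```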
> Having a rich standard library isn't just a pure positive. More code means more maintenance.
I would argue it's much worse to rely on packages outside the standard library, since it's harder to gain trust in the maintenance and quality of the code you rely on. I do agree that more code is almost always just more of a burden, though.
The actual hard part of Rust is dealing with async, especially when building libraries. But that's the cost of a zero-cost async abstraction, I suppose.
I mean, C++ has RAII and things like unique_ptr; does that make it higher level than Zig?
And what if you don't use Arc or Box? Is your program now lower level than baseline Rust?
As I said, depends a lot about what you mean by low level.
C++ offers much higher-level primitives out of the box compared to Zig, so I'd say it's a higher-level language. Of course you can ignore all the features of C++ and just write C, but that's not why people pick the language.
We should really be happy that language evolution has started again. Language monoculture was dreary and unproductive.
20 years ago you would have been called insane for throwing away all the man-years of optimization baked into Oracle (or Postgres or MySQL if you were being low-rent). And look where we are today: thousands of people can build databases.
E.g., Pascal compilers have traditionally been written in Pascal, hardly a language which conjures up a "low level" image.
How could the first Pascal compiler be compiled if it was written in Pascal, but a Pascal compiler didn't yet exist?
> Pascal compilers have traditionally been written in Pascal, hardly a language which conjures up a "low level" image.
It may be the case that it doesn't conjure up such an image, but Pascal is approximately on the same rung as Zig or D—lower level than Go, higher level than assembly. If folks have a different impression, the problem is just that: their impression.
Does it really? Compilers tend to be programs that just append a bunch of data to lists, hashmaps, queues, and trees, process it, then shut down. So you can just use append-only data structures and not care too much about freeing stuff.
I never worry about memory management when I write compilers in C.
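A minimal sketch of that append-only style (illustrative, in Rust, not drawn from any particular compiler): IR nodes live in one growable array, references are plain indices rather than pointers, and nothing is freed until the whole pool is dropped at the end of compilation.

```rust
// Append-only IR pool: nodes are referenced by index and never freed
// individually; dropping the pool at end of compilation frees everything.
#[derive(Debug, PartialEq)]
enum Node {
    Const(i64),
    Add(u32, u32), // indices into the pool, not pointers
}

#[derive(Default)]
struct Pool {
    nodes: Vec<Node>,
}

impl Pool {
    fn push(&mut self, n: Node) -> u32 {
        self.nodes.push(n);
        (self.nodes.len() - 1) as u32
    }

    fn eval(&self, idx: u32) -> i64 {
        match &self.nodes[idx as usize] {
            Node::Const(v) => *v,
            Node::Add(a, b) => self.eval(*a) + self.eval(*b),
        }
    }
}
```

In C the same idea is usually a bump/arena allocator: one big buffer, a pointer that only moves forward, and a single free at exit.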
E.g. when I ported C1 from C++ to Java for Maxine, straightforward choices of modeling the IR the same and basic optimizations allowed me to make it even faster than C1. C1X was a basic SSA+CFG design with a linear scan allocator. Nothing fancy.
The Virgil compiler is written in Virgil. It's a very similar SSA+CFG design. It compiles plenty fast without a lot of low-level tricks. Though, truth be told, I went overboard optimizing[1] the x86 backend, and it's significantly faster (maybe 2x) than the nicer, prettier x86-64 backend. I introduced a bunch of fancy representation optimizations for Virgil since then, but they don't really close the gap.
[1] It's sad that even in the 2020s the best way to make something fast is to give up on abstractions and use integers and custom encodings into integers for everything. Trying to fix that though!
Don't buy it.
A decent OCaml version of a C or Zig compiler would almost certainly not be 10x slower. And it would be significantly easier to parallelize without introducing bugs, so it might even be quite a bit faster on big codebases.
Actually designing your programming language to be processed quickly (can definitively figure things out with local parsing, minimizing the number of files that need to be touched, etc.) is WAY more important than the low-level implementation for overall compilation speed.
And I suspect that the author would have gotten a lot further had he been using a GC language and not had to deal with all the low-level issues and debugging.
I like Zig, and I use it a lot. But it is NOT my general purpose language. I'm definitely going to reach for Python first unless I absolutely know that I'm going to be doing systems programming. Python (or anything garbage collected with solid libraries) simply is way more productive on short time scales for small codebases.
is it a typo for more abstractions? or is there some different meaning?
You need to implement very few optimizations to get the vast majority of compiler improvements.
Many of the papers about this suggest that we would be better off focusing on making quality of life improvements for the programmer (like better debugger integration) rather than abstruse and esoteric compiler optimizations that make understanding the generated code increasingly difficult.
And the original question was "how will they reinvent the wheel on the man-years of optimization work that went into LLVM in their own compiler infrastructure?" -- the answer is that Andrew naively believes that they can recreate comparable optimizations.
There are a whole lot of misstatements about Zig and other matters in the comments here by people who don't have much knowledge about what they are talking about--much of the discussion of using low-level vs high-level languages for writing compilers is nonsense. And one person wrote of "Zig and D" as if those languages are comparable, when D is at least as high level as C++, which it was intended to replace.
That's exactly wrong.
> There are a whole lot of misstatements about Zig and other matters in the comments here by people who don't have much knowledge about what they are talking about.
Well spoken. You should look in the mirror.