Thoughts on Generating C

Static inline functions can sometimes serve as an optimisation barrier to compilers. Its very annoying. I've run into a lot of cases when targeting C as a compilation target where swapping something out into an always-inline function results in worse code generation, because compilers have bugs sadly

There's also the issue in that the following two things don't have the same semantics in C:

    float v = a * b + c;

    static_inline float get_thing(float a, float b) {
        return a*b;
    }

    float v = get_thing(a, b) + c;

This is just a C-ism (floating point contraction) that can make extracting things into always inlined functions still be a big net performance negative. The C spec mandates it sadly!

uintptr_t's don't actually have the same semantics as pointers either. Eg if you write:

    void my_func(strong_type1* a, strong_type2* b);

a =/= b, and we can pull the underlying type out. However, if you write:

    void my_func(some_type_that_has_a_uintptr_t1 ap, some_type_that_has_a_uintptr_t2 bp) {
        float* a = get(ap);
        float* b = get(bp);
    }

a could equal b. Semantically the uintptr_t version doesn't provide any aliasing semantics. Which may or may not be what you want depending on your higher level language semantics, but its worth keeping the distinction in mind because the compiler won't be able to optimise as well

I think I may end up coming full circle on Virgil. Circa 2005 Virgil I compiled to C and then with avr-gcc to AVR. I did that because who the heck wants to write an AVR backend? Circa 2009 I wrote a whole new compiler for Virgil III and since then it has JVM, x86, x86-64, wasm, wasm-gc and (incomplete) arm64.

I like compiler backends, but truth be told, I grow weary of compiler backends.

I have considered generating LLVM IR but it's too quirky and unstable. Given the Virgil wasm backend already has a shadow stack, it should now be possible for me to go back to square one and generate C code, but manage roots on the stack for a precise GC.

Hmm....

Having done this for a dozen of experiments/toys I fully agree with most of the post, would be nice if the the addition of must_tail attribute could be reliable across the big 3 compilers, but it's not something that can be relied on (luckily Clang seems to be fairly reliable on Windows these days).

2 additional points,

1: The article mentions DWARF, even without it you can use #line directives to give line-numbers in your generated code (and this goes a very long way when debugging), the other part is local variables and their contents.

For variables one can get a good distance by using a C++ subset(a subset that doesn't affect compile time, so avoid any std:: namespaced includes) instead and f.ex. "root/gc/smart" ptr's,etc (depending on language semantics), since the variables will show up in a debugger when you have your #line directives (so "sane" name mangling of output variables is needed).

2: The real sore point of C as a backend is GC, the best GC's are intertwined with the regular stack-frame so normal stack-walking routines also gives everything needed for accuracte GC (required for any moving GC designs, even if more naive generation collectors are possible without it).

Now if you want accurate somewhat fast portable stack-scanning the most sane way currently is to maintain a shadow-stack, where you pass prev-frame ptrs in calls and the prev-frame ptr is a ptr to the end of a flat array that is pre-pended by a magic ptr and the previous prev-frame ptr (forming a linked list with the cost of a few writes, one extra argument with no cleanup cost).

Sadly, the performant linked shadow-stack will obfuscate all your pointers for debugging since they need to be clumped into one array instead of multiple named variables (and restricts you from on-stack complex objects).

Hopefully, one can use the new C++ reflection support for shadow-stacks without breaking compile times, but that's another story.

I'm not a C programmer - having coded in high-level languages only for the past 20 years - but I've been doing a lot of WASM recently, and eagerly looking forward to the stack switching proposal so I don't have to implement an asincify-type transform for an async/await feature.

If it's true that a C program doesn't have control of the stack, what does that mean for supporting the stack switching in Wastrel? Can you not reify the stack and replace it with another from a suspended async function? Do you need some kind of userland stack for all stacks once you support WASM stack switching?

Generators don't have to put out portable code. You document what compilers are required for the output and that's something you can change with any given release of your generator. Then the generated code uses whatever works with those compilers. If you use the output with some other compiler, then that's undefined behavior w.r.t. the documentation of the generator; you are on your own. "Whatever works" could be something undocumented that works de facto.

> And finally, source-level debugging is gnarly. You would like to be able to embed DWARF information corresponding to the code you residualize; I don’t know how to do that when generating C.

I think emitting something like

    #line 12 "source.wasm"

for each line of your source before the generated code for that line does something that GDB recognizes well enough.

I’ve done something similar during my intern days as well. We had a Haskell-based C AST library that supports the subset of C we generate, and an accompanying pretty printing library for generating C code that has good formatting by default. It really was a reasonable approach for good high-level abstraction power and good optimizations.

I wonder what Zig would be like as an ILR. Easy cross compilation, plus, you can compile with runtime checks to help debug your compiler output. Might be fun for a sideproject

Has anyone defined a strict subset of C to be used as target for compilers? Or ideally a more regular and simpler language, as writing a C compiler itself is fraught with pitfalls.

I've thought of doing that, but it's too much fun writing an optimizer and code generator!

(My experience with "compile to C" is with cfront, the original C++ implementation that compiled to C. The generated code was just terrible to read.)

What language features would make C better as a target language for compilers?

Love how he put a paragraph for someone asking, "why not generate Rust?". Beautiful.

> it could be that you end up compiling a function with, like 30 arguments, or 30 return values; I don’t trust a C compiler to reliably shuffle between different stack argument needs at tail calls to or from such a function.

Yet you trust it to generate the frame for this leviathan in the first place. Sometimes C is about writing quality code, apparently, sometimes it's about spending all day trying to outsmart the compiler rather than take advantage of it.

This is weird. As soon as I thought about the subject the relevant article showed up on HN.

I was thinking about how to embed custom high level language into my backend application written in C++. Each individual script would compile to native shared lib loadable on demand so that the performance stays high. For this I was contemplating exactly this approach. Compile this high level custom language with very limited feature set to plain C and then have compiler that comes with Linux finish the job.

Last I checked static inline was merely a hint that compilers need not take. They all do, but by definition it's not a zero cost abstraction.

Java JIT compilers perform function inlining across virtual function boundaries… this is why JIT’d Java can outperform the same C or C++ code. Couple it with escape analysis to transfer short-lived allocations to be stack-allocated (avoiding GC).

Often times virtual functions are implemented in C to provide an interface (such as filesystem code in the Linux kernel) via function pointers—-just like C++ vtable lookups, these cannot be inlined at compile time.

What I wonder is whether code generated in C can be JIT-optimized by WASM runtimes with similar automatic inlining.

"static inline", the best way of getting people doing bindings in other languages to dislike your library (macros are just as bad, FWIW).

I really wish someone on the C language/compiler/linker level took a real look at the problem and actually tried to solve it in a way that isn't a pain to deal with for people that integrate with the code.

2 additional points,

Hopefully, one can use the new C++ reflection support for shadow-stacks without breaking compile times, but that's another story.

> ... [pointers] need to be clumped into one array ...

You could put each stack frame into a struct, and have the first field be a pointer to a const static stack-map data structure or function that enumerates the pointers within the frame.

BTW, the passed pointer to this struct could also be used to implement access to the calling function's variables, for when you have nested functions and closures.

Related to shadow stacks, I've had trouble convincing the C optimizer that no one else is aliasing my heap-allocated helper stacks. Supposedly there ought to be a way to tell it using restrict annotations, but those are quite fiddly: only work for function parameters, and can be dusmissed for many reasons. Does anyone know of a compiler that successfully used restrict pointers in their generated code? I'd love to be pointed towards something that works.

There's also the issue in that the following two things don't have the same semantics in C:

    float v = a * b + c;

    static_inline float get_thing(float a, float b) {
        return a*b;
    }

    float v = get_thing(a, b) + c;

This is just a C-ism (floating point contraction) that can make extracting things into always inlined functions still be a big net performance negative. The C spec mandates it sadly!

uintptr_t's don't actually have the same semantics as pointers either. Eg if you write:

    void my_func(strong_type1* a, strong_type2* b);

a =/= b, and we can pull the underlying type out. However, if you write:

    void my_func(some_type_that_has_a_uintptr_t1 ap, some_type_that_has_a_uintptr_t2 bp) {
        float* a = get(ap);
        float* b = get(bp);
    }

I wonder what Zig would be like as an ILR. Easy cross compilation, plus, you can compile with runtime checks to help debug your compiler output. Might be fun for a sideproject

The inline function receives the operands as arguments, and so whatever they are, they get converted to float. Thus the inline code is effectively like this:

  float v = (float) ((float) a) * ((float) b) + c;

Since v is float, the cast representing the return conversion can be omitted:

  float v = ((float) a) * ((float) b) + c;

Now, if a and b are already float, then it's equivalent. Otherwise not; if they are double or int, we get double or int multiplication in the original open code.

Compiler bugs and standards warts suck, but you know what sucks more? Workarounds for compiler bugs and edge cases that become pessimizing folk wisdom that we can dispell only after decades, if ever. It took about that long to convince the old guards of various projects that we could have inline functions instead of macros. I don't want to spook them into renewed skepticism.

I like compiler backends, but truth be told, I grow weary of compiler backends.

Hmm....

FWIW I think the LLVM bitcode format has stronger compatibility guarantees than the text IR. But I agree it's a bit of a pain either way; plus, if you forgo linking to the library and just rely on whatever 'llc' the user has installed, figuring out bugs is not a fun time...

> And finally, source-level debugging is gnarly. You would like to be able to embed DWARF information corresponding to the code you residualize; I don’t know how to do that when generating C.

I think emitting something like

    #line 12 "source.wasm"

for each line of your source before the generated code for that line does something that GDB recognizes well enough.

This is weird. As soon as I thought about the subject the relevant article showed up on HN.

Love how he put a paragraph for someone asking, "why not generate Rust?". Beautiful.

Last I checked static inline was merely a hint that compilers need not take. They all do, but by definition it's not a zero cost abstraction.

I've thought of doing that, but it's too much fun writing an optimizer and code generator!

(My experience with "compile to C" is with cfront, the original C++ implementation that compiled to C. The generated code was just terrible to read.)

What I wonder is whether code generated in C can be JIT-optimized by WASM runtimes with similar automatic inlining.

> ... [pointers] need to be clumped into one array ...

You could put each stack frame into a struct, and have the first field be a pointer to a const static stack-map data structure or function that enumerates the pointers within the frame.

BTW, the passed pointer to this struct could also be used to implement access to the calling function's variables, for when you have nested functions and closures.

If you have ever used something like yacc/bison, debugging it is relatively sane with gdb.

You can find all the possible tricks in making it debuggable by reading the y.tab.c

Including all the corner cases for odd compilers.

Re2c is a bit more modern if you don't need all the history of yacc.

Nim compiles to C, and it has a compiler iotion that does this.

What’s your use case for this?

I’ve done a few such things for compiling ML models, but in the end I always regretted it.

The lifetimes argument is extremely sound: this is information which you need from the developer, and not something that is easy to get when generating from a language which does not itself have lifetimes. It's an especially bad fit for the GC case he describes.

<something something about having a hammer and seeing nails everywhere> :)

Has anyone defined a strict subset of C to be used as target for compilers? Or ideally a more regular and simpler language, as writing a C compiler itself is fraught with pitfalls.

Not precisely, but C-- (hard to search for!) was a C-like (or C subset?) intermediate language for compilers to generate.

I found this Reddit thread that gives a bit more detail:

https://www.reddit.com/r/haskell/comments/1pbbon/c_as_a_proj...

and the project link:

https://www.cs.tufts.edu/~nr/c--/

Sounds like why LLVM was created? (and derivatives like MLIR and NaCL) Its IR is intended be be C-like, except that everything is well-defined and substantially more expressive than C.

For portability, hopefully C89 as well?

I think one could also use a subset compatible with a formal semantics of C. Maybe the C semantics in K Framework, CompCert C, or C0 from Verisoft. Alternatively, whatever is supported in open-source, verification tooling.

Then, we have both a precise semantics and tools to help produce robust output.

"static inline", the best way of getting people doing bindings in other languages to dislike your library (macros are just as bad, FWIW).

If it's not in the .h file it's supposed to be a private function.

What language features would make C better as a target language for compilers?

By piggybacking off GCC et al you gain very easy portability/access to a bunch of platforms that most languages would never attempt to support.

We used this in production with Nim for embedded firmware engineering at my previous job doing industrial IoT systems, which let us write a much nicer language than C (and much faster, much safer), with code-sharing between the network server (and its comms protocol) and the firmware code itself.

All can be done with C itself of course, but this let us achieve it faster and in a much nicer fashion

Not strictly for compilers (which probably don't need to use macros much), but for normal macro-codegen it would be very useful to have some some way to add line returns in macro-generated code so that it's easier to inspect with gcc -E

Nim compiles to C, and it has a compiler iotion that does this.

What’s your use case for this?

I’ve done a few such things for compiling ML models, but in the end I always regretted it.

Note that declaring no aliasing is probably unsafe for concurrent or moving garbage collectors, as then the C compiler can conveniently "forget" to either store or load values to the shadow stack at some points...

(though it is fine if GC can only happen inside a function call and the call takes the shadow stack as an argument)

The inline function receives the operands as arguments, and so whatever they are, they get converted to float. Thus the inline code is effectively like this:

  float v = (float) ((float) a) * ((float) b) + c;

Since v is float, the cast representing the return conversion can be omitted:

  float v = ((float) a) * ((float) b) + c;

Now, if a and b are already float, then it's equivalent. Otherwise not; if they are double or int, we get double or int multiplication in the original open code.

> Now, if a and b are already float, then it's equivalent.

Not necessarily! Floating-point contraction is allowable essentially within statements but not across them. By assigning the result of a * b into a value, you prohibit contraction from being able to contract with the addition into an FMA.

In practice, every compiler has fast-math flags which says stuff it and allows all of these optimizations to occur across statements and even across inline boundaries.

(Then there's also the issue of FLT_EVAL_METHOD, another area where what the standard says and what compilers actually do are fairly diametrically opposed.)

Maybe they just checked with a compiler and got the same code?

If you have ever used something like yacc/bison, debugging it is relatively sane with gdb.

You can find all the possible tricks in making it debuggable by reading the y.tab.c

Including all the corner cases for odd compilers.

Re2c is a bit more modern if you don't need all the history of yacc.

Sounds like why LLVM was created? (and derivatives like MLIR and NaCL) Its IR is intended be be C-like, except that everything is well-defined and substantially more expressive than C.

Debugging Yacc is completely insane with gdb, for other reasons, like that grammar rules aren't functions you can just put a breakpoint on, and see their backtrace, etc, as you can with a recursive descent parser.

But yes, you can put a line-oriented breakpoint on your action code and step through it.

For portability, hopefully C89 as well?

Then, we have both a precise semantics and tools to help produce robust output.

Not precisely, but C-- (hard to search for!) was a C-like (or C subset?) intermediate language for compilers to generate.

I found this Reddit thread that gives a bit more detail:

https://www.reddit.com/r/haskell/comments/1pbbon/c_as_a_proj...

and the project link:

https://www.cs.tufts.edu/~nr/c--/

By piggybacking off GCC et al you gain very easy portability/access to a bunch of platforms that most languages would never attempt to support.

All can be done with C itself of course, but this let us achieve it faster and in a much nicer fashion

If it's not in the .h file it's supposed to be a private function.

you can access it using extern from anywhere:

    // a.c
    int f( int x ) {
        return x + 1;
    }

    // b.c
    extern int f(int x );

    int main() {
        int y = f(41);
    }

but if f() had been defined as static, you couldn't do this.

<something something about having a hammer and seeing nails everywhere> :)

(though it is fine if GC can only happen inside a function call and the call takes the shadow stack as an argument)

Maybe they just checked with a compiler and got the same code?

> not something that is easy to get when generating from a language which does not itself have lifetimes

Not easy, but there are compilers that do it.

Lobster [0] started out with automatic reference counting. It has inferred static typing, specialising functions based on type, reminiscent of how Javascript JIT compilers do it. Then the type inference engine was expanded to also specialise functions based on ownership/borrowing type of its arguments. RC is still done for variables that don't fit into the ownership system but the executed ops overall got greatly reduced. The trade-off is increased code size.

I have read a few older papers about eliding reference counting ops which seem to be resulting in similar elisions, except that those had not been expressed in terms of ownership/borrowing.

I think newer versions of the Swift compiler too infer lifetimes to some extent.

When emitting Rust you could now also use reference counting smart pointers, even with cycle detection [1]. Personally I'm interested in how ownership information could be used to optimise tracing GC.

[0]:https://aardappel.github.io/lobster/memory_management.html

[1]:https://www.semanticscholar.org/paper/Breadth-first-Cycle-Co...

If I were targeting Rust for compilation, I wouldn't do lifetimes, instead everything would be unsafe Rust using raw pointers.

I'd have to do an actual project to see how annoying it is to lower semantics to unsafe Rust to know for sure, but my guess is you'd be slightly better off because you don't have to work around implicit conversions in C, the more gratuitous UBs in C, and I think I'd prefer the slightly more complete intrinsic support in Rust over C.

I mean, the argument boils down to "the language I'm compiling FROM doesn't have the same safeguards as rust". So obviously, the fault lies there. If he'd just compile FROM rust, he could then compile TO rust without running into those limitations. A rust-to-rust compiler (written in rust) would surely be ideal.

> Now, if a and b are already float, then it's equivalent.

In practice, every compiler has fast-math flags which says stuff it and allows all of these optimizations to occur across statements and even across inline boundaries.

(Then there's also the issue of FLT_EVAL_METHOD, another area where what the standard says and what compilers actually do are fairly diametrically opposed.)

The first mention of contraction in the standard (I'm looking at N3220 draft that I have handy) is:

A floating expression may be contracted, that is, evaluated as though it were a single opera- tion, thereby omitting rounding errors implied by the source code and the expression evalua- tion method.86) The FP_CONTRACT pragma in <math.h> provides a way to disallow contracted expressions. Otherwise, whether and how expressions are contracted is implementation-defined.

If you're making a language that generates C, it's probably a good idea to pin down which C compilers are supported, and control the options passed to them. Then you can more or less maintain the upper hand on issues like this.

It seems to me that either you want to allow for contraction everywhere, or not all. Allowing it only sometimes is worst of both worlds.

But yes, you can put a line-oriented breakpoint on your action code and step through it.

If I were targeting Rust for compilation, I wouldn't do lifetimes, instead everything would be unsafe Rust using raw pointers.

The first mention of contraction in the standard (I'm looking at N3220 draft that I have handy) is:

you can access it using extern from anywhere:

    // a.c
    int f( int x ) {
        return x + 1;
    }

    // b.c
    extern int f(int x );

    int main() {
        int y = f(41);
    }

but if f() had been defined as static, you couldn't do this.

"private function" doesn't mean "you can't know about this", it means "you shouldn't rely on this as a stable interface to my code".

Just because you can use the information you have to call a given function, doesn't mean you aren't violating an interface.

I don't see what you're getting at with respect to writing bindings.

The whole point of using "static" in that way is to prevent people from using it outside of the file.

If you need to call a static function (inline or otherwise) from outside of the compilation unit to use the API, then it's a bug in the API, not a problem with static.

I agree with you about pre-processor macros, though.

> not something that is easy to get when generating from a language which does not itself have lifetimes

Not easy, but there are compilers that do it.

I have read a few older papers about eliding reference counting ops which seem to be resulting in similar elisions, except that those had not been expressed in terms of ownership/borrowing.

I think newer versions of the Swift compiler too infer lifetimes to some extent.

[0]:https://aardappel.github.io/lobster/memory_management.html

[1]:https://www.semanticscholar.org/paper/Breadth-first-Cycle-Co...

It seems to me that either you want to allow for contraction everywhere, or not all. Allowing it only sometimes is worst of both worlds.

I was also reading through lobsters Memory management, which (i think) currently implements "borrow first" semantics, to do away with a lot of run-time reference counting logic, which i think is a very practical approach. Also i have doubts if reference counting overhead ever becomes too much for some languages to never consider RC ?

Tangentially, i was experimenting with a runtime library to expose such "borrow-first" semantics, such "lents" can be easily copied on a new thread stack to access shared memory, and are not involved in RC . Race-conditions detection helps to share memory without any explicit move to a new thread. It seems to work well for simpler data-structures like sequence/vectors/strings/dictionary, but have not figured a proper way to handle recursive/dynamic data-structures!

I'd be willing to sell you a rust to rust compiler. In fact, I'll even generalize it to do all sorts of other languages too at no extra charge. I just need a good name...maybe rsync?

Snark aside, the output targets of compilers need to be unsafe languages typically, since the point of a high level compiler in general is to verify difficult proofs, then emit constructs consistent with those proof results, but simplified so that they cannot be verified anymore, but can run fast since those proofs aren't needed at runtime anymore. (Incidentally this is both a strength and weakness of C, since it provides very little ability for the compiler to do proofs, the output is generally close to the input, while other languages typically have much more useful compilers since they do much more proof work at compile time to make runtime faster, while C just makes the programmer specify exactly what must be done, and leaves the proof of correctness up to the programmer)

If you allow contraction after inlining, whether or not an FMA will get contracted becomes subject to the vicissitudes of inlining and other compiler decisions that can be hard-to-predict. It turns out to be a lot harder of a problem to solve than it appears at first glance.

I don't see what you're getting at with respect to writing bindings.

The whole point of using "static" in that way is to prevent people from using it outside of the file.

If you need to call a static function (inline or otherwise) from outside of the compilation unit to use the API, then it's a bug in the API, not a problem with static.

I agree with you about pre-processor macros, though.

"private function" doesn't mean "you can't know about this", it means "you shouldn't rely on this as a stable interface to my code".

Just because you can use the information you have to call a given function, doesn't mean you aren't violating an interface.

my point was that f() had been defined static then you can't access it from outside the translation unit it is defined in - in other words, it is "private". i'm afraid i'm unclear what your point is.

I'd be willing to sell you a rust to rust compiler. In fact, I'll even generalize it to do all sorts of other languages too at no extra charge. I just need a good name...maybe rsync?

Compilers named after animals are the the most popular, so I might suggest cat?

my point was that f() had been defined static then you can't access it from outside the translation unit it is defined in - in other words, it is "private". i'm afraid i'm unclear what your point is.

Compilers named after animals are the the most popular, so I might suggest cat?

So I work in compilers, which means that I write programs that translate programs to programs. Sometimes you will want to target a language at a higher level than just, like, assembler, and oftentimes C is that language. Generating C is less fraught than writing C by hand, as the generator can often avoid the undefined-behavior pitfalls that one has to be so careful about when writing C by hand. Still, I have found some patterns that help me get good results.

Today’s note is a quick summary of things that work for me. I won’t be so vain as to call them “best practices”, but they are my practices, and you can have them too if you like.

static inline functions enable data abstraction

When I learned C, in the early days of GStreamer (oh bless its heart it still has the same web page!), we used lots of preprocessor macros. Mostly we got the message over time that many macro uses should have been inline functions; macros are for token-pasting and generating names, not for data access or other implementation.

But what I did not appreciate until much later was that always-inline functions remove any possible performance penalty for data abstractions. For example, in Wastrel, I can describe a bounded range of WebAssembly memory via a memory struct, and an access to that memory in another struct:

struct memory { uintptr_t base; uint64_t size; }; struct access { uint32_t addr; uint32_t len; };

And then if I want a writable pointer to that memory, I can do so:

#define static_inline \ static inline __attribute__((always_inline))

static_inline void* write_ptr(struct memory m, struct access a) { BOUNDS_CHECK(m, a); char *base = __builtin_assume_aligned((char *) m.base_addr, 4096); return (void *) (base + a.addr); }

(Wastrel usually omits any code for BOUNDS_CHECK, and just relies on memory being mapped into a PROT_NONE region of an appropriate size. We use a macro there because if the bounds check fails and kills the process, it’s nice to be able to use __FILE__ and __LINE__.)

Regardless of whether explicit bounds checks are enabled, the static_inline attribute ensures that the abstraction cost is entirely burned away; and in the case where bounds checks are elided, we don’t need the size of the memory or the len of the access, so they won’t be allocated at all.

If write_ptr wasn’t static_inline, I would be a little worried that somewhere one of these struct values would get passed through memory. This is mostly a concern with functions that return structs by value; whereas in e.g. AArch64, returning a struct memory would use the same registers that a call to void (*)(struct memory) would use for the argument, the SYS-V x64 ABI only allocates two general-purpose registers to be used for return values. I would mostly prefer to not think about this flavor of bottleneck, and that is what static inline functions do for me.

avoid implicit integer conversions

C has an odd set of default integer conversions, for example promoting uint8_t to signed int, and also has weird boundary conditions for signed integers. When generating C, we should probably sidestep these rules and instead be explicit: define static inline u8_to_u32, s16_to_s32, etc conversion functions, and turn on -Wconversion.

Using static inline cast functions also allows the generated code to assert that operands are of a particular type. Ideally, you end up in a situation where all casts are in your helper functions, and no cast is in generated code.

wrap raw pointers and integers with intent

Whippet is a garbage collector written in C. A garbage collector cuts across all data abstractions: objects are sometimes viewed as absolute addresses, or ranges in a paged space, or offsets from the beginning of an aligned region, and so on. If you represent all of these concepts with size_t or uintptr_t or whatever, you’re going to have a bad time. So Whippet has struct gc_ref, struct gc_edge, and the like: single-member structs whose purpose it is to avoid confusion by partitioning sets of applicable operations. A gc_edge_address call will never apply to a struct gc_ref, and so on for other types and operations.

This is a great pattern for hand-written code, but it’s particularly powerful for compilers: you will often end up compiling a term of a known type or kind and you would like to avoid mistakes in the residualized C.

For example, when compiling WebAssembly, consider struct.set‘s operational semantics: the textual rendering states, “Assert: Due to validation, val is some ref.struct structaddr.” Wouldn’t it be nice if this assertion could translate to C? Well in this case it can: with single-inheritance subtyping (as WebAssembly has), you can make a forest of pointer subtypes:

typedef struct anyref { uintptr_t value; } anyref; typedef struct eqref { anyref p; } eqref; typedef struct i31ref { eqref p; } i31ref; typedef struct arrayref { eqref p; } arrayref; typedef struct structref { eqref p; } structref;

So for a (type $type_0 (struct (mut f64))), I might generate:

typedef struct type_0ref { structref p; } type_0ref;

Then if I generate a field setter for $type_0, I make it take a type_0ref:

static inline void type_0_set_field_0(type_0ref obj, double val) { ... }

In this way the types carry through from source to target language. There is a similar type forest for the actual object representations:

typedef struct wasm_any { uintptr_t type_tag; } wasm_any; typedef struct wasm_struct { wasm_any p; } wasm_struct; typedef struct type_0 { wasm_struct p; double field_0; } type_0; ...

And we generate little cast routines to go back and forth between type_0ref and type_0* as needed. There is no overhead because all routines are static inline, and we get pointer subtyping for free: if a struct.set $type_0 0 instruction is passed a subtype of $type_0, the compiler can generate an upcast that type-checks.

fear not memcpy

In WebAssembly, accesses to linear memory are not necessarily aligned, so we can’t just cast an address to (say) int32_t* and dereference. Instead we memcpy(&i32, addr, sizeof(int32_t)), and trust the compiler to just emit an unaligned load if it can (and it can). No need for more words here!

for ABI and tail calls, perform manual register allocation

So, GCC finally has __attribute__((musttail)): praise be. However, when compiling WebAssembly, it could be that you end up compiling a function with, like 30 arguments, or 30 return values; I don’t trust a C compiler to reliably shuffle between different stack argument needs at tail calls to or from such a function. It could even refuse to compile a file if it can’t meet its musttail obligations; not a good characteristic for a target language.

Really you would like it if all function parameters were allocated to registers. You can ensure this is the case if, say, you only pass the first n values in registers, and then pass the rest in global variables. You don’t need to pass them on a stack, because you can make the callee load them back to locals as part of the prologue.

What’s fun about this is that it also neatly enables multiple return values when compiling to C: simply go through the set of function types used in your program, allocate enough global variables of the right types to store all return values, and make a function epilogue store any “excess” return values—those beyond the first return value, if any—in global variables, and have callers reload those values right after calls.

what’s not to like

Generating C is a local optimum: you get the industrial-strength instruction selection and register allocation of GCC or Clang, you don’t have to implement many peephole-style optimizations, and you get to link to to possibly-inlinable C runtime routines. It’s hard to improve over this design point in a marginal way.

There are drawbacks, of course. As a Schemer, my largest source of annoyance is that I don’t have control of the stack: I don’t know how much stack a given function will need, nor can I extend the stack of my program in any reasonable way. I can’t iterate the stack to precisely enumerate embedded pointers (but perhaps that’s fine). I certainly can’t slice a stack to capture a delimited continuation.

The other major irritation is about side tables: one would like to be able to implement so-called zero-cost exceptions, but without support from the compiler and toolchain, it’s impossible.

And finally, source-level debugging is gnarly. You would like to be able to embed DWARF information corresponding to the code you residualize; I don’t know how to do that when generating C.

(Why not Rust, you ask? Of course you are asking that. For what it is worth, I have found that lifetimes are a frontend issue; if I had a source language with explicit lifetimes, I would consider producing Rust, as I could machine-check that the output has the same guarantees as the input. Likewise if I were using a Rust standard library. But if you are compiling from a language without fancy lifetimes, I don’t know what you would get from Rust: fewer implicit conversions, yes, but less mature tail call support, longer compile times... it’s a wash, I think.)

Oh well. Nothing is perfect, and it’s best to go into things with your eyes wide open. If you got down to here, I hope these notes help you in your generations. For me, once my generated C type-checked, it worked: very little debugging has been necessary. Hacking is not always like this, but I’ll take it when it comes. Until next time, happy hacking!

Hacker Times