Most stupid thing I ever heard. If a safety violation is known at compile-time, you error at compile-time. You might never catch it in a test, and there you have the panic at the customer. He will be pleased.
1. A developer identified the necessity of inline assembly.
2. Defined the safety boundaries for 'memory-safe' inline assembly.
3. Established strict policies for memory access.
4. Curated an allowlist of permissible instructions.
5. Set rigorous test criteria and 'done' conditions.
In short, with the overall guardrails in place, a sub agent loop was run, and this level of code was produced. This raises a number of interesting points about how we should use AI. I haven't looked at all the code, but the idea of passing assembly through safe zones without memory access, and using that as a foundation to achieve this level of implementation through AI, is quite impressive
Edit to add: If I'm understanding this correctly we should be able to run this against projects and detect asm violations, I feel like this would be very valuable to be able to feed these back to maintainers
3a. rdpru (similar issues to cpuid) and rdpmc perhaps surrounded with lfence or cpuid inside the same assembly chunk
For obvious reasons, this is somewhat niche and may not even make it into production code, but it’s also important when you do need it. It’s also memory safe. I guess in such cases you’d use fast C rather than Fil-C though.
4a. rseq
Probably even less feasible than atomics TBH, as such blocks will usually also contain control flow (at least that implied by the nature of rseqs).
> Before the advent of AI, writing a parser for x86_64 assembly would have been such an annoying task that I might have never gotten around to implementing support for memory safe inline assembly [...].
It is annoying, but even before the advent of AI that didn’t stop the developers of TCC for instance.
With that said, given Fil-C is Clang/LLVM-based, shouldn’t an assembly parser, at least, be already available somewhere? I was under the impression that Clang (unlike GCC) actually parsed asm blocks.
> This includes things like asm volatile("" : : : "memory"), which is an old-school way of saying atomic_signal_fence(memory_order_seq_cst).
Not quite. AIUI, the first is just a barrier for the compiler, while the second is also a CPU memory barrier. Godbolt seems to confirm that.
I mean one that infers as much context as possible and tries to help as much as possible.
This has to be assembler specific of course. For example, I use fasm which has higher level macros. An LSP could suggest struct fields and other stuff.
Inline asm should take 10x or more effort compared to writing the surrounding c++ code and should be tested with protected pages at the edges if possible. It should always have assertions before/after that check invariants too.
Also there are a lot of cases where this won’t work. One example is implementing strlen using avx512 where you want to align the address down to a multiple of 64 and run until the end of the page, so you can do simd while avoiding segfault.
Another example is just handling loop remainders with masking in avx512.
Also it is pretty naive to think an LLM got this right
Overall it seems like a huge waste of time.
If you are writing inline asm and want to make it better, just get as many LLMs or, even better, humans to review it. LLMs are really good at finding mistakes in inline asm, with a high false positive rate though, so you have to understand the concept.
For example one bug I had was about not consuming the inputs before writing to the outputs. Compiler can assign the same register to input and outputs unless outputs are marked with & (or something like that). It was super frustrating to debug this until I asked an LLM and it found the problem.
I would guess for the use-case of "I have a C project and I want to run it in Fil-C" the ability for this to be a warning + run-time panic is very helpful for quickly getting started. Reminds me of GHC's -fdefer-type-errors.
I agree that I wouldn't want to deploy a program where those panics are reachable*, but it's still handy for local development and/or maybe the developer knows they aren't reachable.
I haven't checked, but I'd guess there's a warning and a -Werror -style flag to opt-in to having a hard error for unsafe assembly?
* Obviously a panic is better than not. But guaranteed safeness is better than either of those.
Anyway, this is also very useful for humans to use, so it's mostly a lovely coincidence this level of safety arrived with useful chatbots.
The quote uses atomic_signal_fence.
If you find a way to bypass my checks, file a bug. I tried very hard to break it. My agent loops tried even harder
There was some debugging thing where it embeds debug info using module level assembly that you have to disable.
So currently most of those still have the hacks to go down the no-inlineasm path when building with Fil-C
For the few where I reinstated the inline assembly, there were no bugs found.
It would be a good experiment to try to reinstate the inlineasm paths in all of the programs that had them. I suspect there’s a low chance of finding a bug if it’s in inline assembly that’s on the critical path.
Let's say I compile curl using Fil-C, and later an exploitable memory bug is found in curl. The implication here is that my fil-c-compiled curl will crash safely, rather than be able to be exploited? And the "cost" to me is that my curl executable will be slower than the standard one?
I might give it a try when I have a chance, I'll let you know if anything comes of it.
What happens if you ask to find the strings that will erroneously return True from validateSafeInlineAsm for disallowed asm? :)
Example of a bug found most recently was that sahf was allowed without a cc constraint.
Anyway, if you find bugs, file them. Would be fun to see if there’s a case me and my agents missed
NOTE: This is a pre-release feature. The Fil-C 0.679 release does not ship with this feature. To test this feature, you need to build from source.
GCC and clang both support an incredibly powerful inline assembly syntax. For example:
unsigned rotate(unsigned x, unsigned char c)
{
    asm("roll %1, %0" : "+r"(x) : "c"(c) : "cc");
    return x;
}
This instructs the compiler to emit assembly based on the roll %1, %0 template, where %1 is filled in with %cl, %0 is filled in with whichever register holds x, and c is moved into the %ecx register just before the roll instruction. Additionally, the compiler is told that the instruction will change the value of x and change the value of the condition flags.
This seems like it cannot possibly be safe! What if the programmer did something wrong, like omitting the + in "+r", or forgetting the "cc" clobber? In Yolo-C, if you make such a mistake, the compiler happily miscompiles your code.
Yet Fil-C supports this inline assembly syntax and it's completely safe!
This document explains why Fil-C supports inline assembly at all and then goes into the details of how that support is achieved while maintaining both programmer intent (you still get the assembly template you asked for) and complete memory safety (if you do something wrong, you'll panic or get an illegal instruction trap, at worst).
While reviewing folks' C and C++ code, I've found the following reasons for inline assembly, where 1 is most common:
Blank inline assembly to prevent compiler analysis. This includes things like asm volatile("" : : : "memory"), which is an old-school way of saying atomic_signal_fence(memory_order_seq_cst). It works because we're telling the compiler that the inline assembly clobbers all memory, which forces the compiler to serialize memory accesses, just like a signal fence would have. The contract with the compiler is clear: the compiler must emit exactly the assembly we're asking it to emit (which is blank here) without second-guessing our claims about the clobbers. That is, the compiler must not infer that because the assembly is blank then there cannot be a memory clobber. We said memory clobber, so that's what the compiler sees. Similarly, folks do stuff like asm("" : "+r"(x)). This means: the assembly may read and then write x. The assembly is blank, so this incurs no cost other than forcing the compiler to assume that it doesn't know anything about x's value after the assembly executes. This kind of data flow fence is useful for writing constant-time crypto. Fil-C has long supported blank inline assembly since it's trivially safe. Fil-C even supports "+r" constraints on pointers, in which case both the intval and lower are threaded through their own "+r"-like constraints at the LLVM IR level.
cpuid and xgetbv. The inline assembly snippets for these two instructions occur most often in code that then goes on to use SIMD intrinsics. I think this is because the __get_cpuid API in cpuid.h is confusing to use and, as far as I can tell, does not work right in either GCC or clang. Hence, packages like zstd, simdutf, simdjson, and other SIMD-using programs tend to identify CPU features by using inline assembly that invokes cpuid. They often also use inline assembly to invoke xgetbv as well. In Fil-C, __get_cpuid is fixed, so you could use that, and zxgetbv is offered as an intrinsic. However, it's better to support those inline assembly snippets without requiring folks to change their code! And there's nothing unsafe about invoking cpuid and xgetbv so long as the code specifies the right clobbers and constraints.
Arithmetic over secrets in crypto code. A great example is OpenSSH's sntrup761 implementation, which wraps key arithmetic in inline assembly to ensure that it gets exactly the right instruction and not some instruction that might have varying execution time depending on inputs. Note that this kind of code often has fallbacks to try to get the compiler to emit constant-time code even if inline assembly is not supported, but those fallbacks are unlikely to be as rigorously validated, and often rely on "optimization blocking" idioms that hurt performance and could be circumvented by a sufficiently clever compiler. Hence, it's safest to support inline assembly snippets that do this. Luckily, these snippets are also completely safe, provided that the constraints and clobbers are correct.
Atomics. Compilers have long supported intrinsics for atomic instructions. Compilers also have a long history of implementing these intrinsics incorrectly! Most recently, clang had bugs in how it lowered CAS to LL/SC on ARM64. Hence, serious lock-free programmers tend to write their atomic instructions using inline assembly at least some of the time, like in those cases where they had encountered a miscompile and so dropping to assembly was their only path to fixing the bug. Supporting atomics in inline assembly would require allowing inline assembly that accesses memory, which would mean somehow inferring what Fil-C bounds checks to do. Inline assembly that accesses memory is currently out of scope. However, memory-safe inline assembly does support fences (lfence, sfence, mfence, and serialize).
System calls. These are currently out of scope for inline assembly in Fil-C, and that's fine, since using inline assembly for syscalls is only necessary in the guts of libc implementations. Fil-C already has ports of musl and glibc, and in both cases the inline assembly for syscalls is replaced with calls to the pizlonated_syscalls.h API that Fil-C provides. However, I can imagine adding support for inline assembly that does syscalls in the future, to make it easier to port new libc's to Fil-C.
x87 long double functions. If you're working with long double on x86, then you're using the x87 80-bit floating point math. If you want access to the x87 FPU's implementations of various math functions, then often the best way to do that is to drop to inline assembly. This is totally safe, provided that the inline assembly doesn't push or pop the x87 stack, and the constraints correctly spell out which x87 stack registers were clobbered.
It's likely that folks use inline assembly for other purposes, but the above list is all that I've seen when surveying programs in the Linux userland.
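To make the first reason concrete, here is a minimal sketch of the two blank-assembly idioms, assuming a GCC- or clang-compatible compiler. The helper names compiler_barrier, value_barrier, and ct_select are hypothetical and only illustrate how a data flow fence might be used in branch-free code.

```c
#include <stdint.h>

/* Compiler-only fence: blank assembly with a "memory" clobber. */
static inline void compiler_barrier(void)
{
    __asm__ volatile("" : : : "memory");
}

/* Data flow fence: the compiler must assume x may have changed. */
static inline uint32_t value_barrier(uint32_t x)
{
    __asm__("" : "+r"(x));
    return x;
}

/* Branch-free select: returns a if bit is 1, b if bit is 0. */
static uint32_t ct_select(uint32_t bit, uint32_t a, uint32_t b)
{
    /* 0 - 1 wraps to all ones, 0 - 0 stays all zeros. */
    uint32_t mask = value_barrier(0u - bit);
    compiler_barrier();
    return (a & mask) | (b & ~mask);
}
```

Neither helper emits any instructions; they only limit what the optimizer may assume about x and about memory.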
To summarize:
Read on for details about the world's first memory safe inline assembly implementation!
When the Fil-C compiler's safety instrumentation pass (called FilPizlonator) runs, inline assembly is present in LLVM IR as a pair of strings:
The assembly string, almost exactly like it appears in the C source code, just with some characters replaced. For example, the roll example turns into roll $1, $0.
The constraint string. This uses an LLVM-specific syntax to express the constraints and clobbers. For the roll example, this is =r,{cx},0,~{cc},~{dirflag},~{fpsr},~{flags}.
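As a rough illustration of how such a constraint string breaks down, the sketch below splits it on commas and classifies each piece by its prefix. It assumes only the three shapes visible in the roll example (outputs start with =, clobbers start with ~{, everything else is an input) and no commas nested inside a piece. The names count_constraints and struct constraint_counts are hypothetical and not part of FilPizlonator.

```c
#include <stddef.h>
#include <string.h>

struct constraint_counts {
    size_t outputs;  /* pieces starting with "="  */
    size_t inputs;   /* everything else           */
    size_t clobbers; /* pieces starting with "~{" */
};

static struct constraint_counts count_constraints(const char *s)
{
    struct constraint_counts counts = {0, 0, 0};

    while (*s != '\0') {
        size_t len = strcspn(s, ","); /* length of the current piece */

        if (len >= 2 && s[0] == '~' && s[1] == '{')
            counts.clobbers++;
        else if (len >= 1 && s[0] == '=')
            counts.outputs++;
        else if (len >= 1)
            counts.inputs++;

        s += len;
        if (*s == ',')
            s++; /* skip the separator */
    }
    return counts;
}
```

For the roll constraint string, this counts one output (=r), two inputs ({cx} and the tied 0), and four clobbers.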
Hence, we can validate if an inline assembly expression is safe by:
Parsing and analyzing the assembly. If it contains memory accesses, control flow, or anything we don't recognize, we reject it.
Parsing and analyzing the constraints. If those do anything we don't recognize or support, then reject.
Ensuring that the assembly's effects are fully captured by the constraints. For example, if an assembly instruction modifies a register, then the constraints must capture that register mutation. If any instruction sets some CPU flags, then those flags must be listed as clobbers.
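The three steps can be sketched with a toy validator. This is not Fil-C's actual implementation: it assumes a single AT&T-syntax instruction, uses a two-entry allowlist chosen purely for illustration, treats any parenthesis as a memory operand, and reduces constraint parsing to a substring search. The names toy_validate and struct toy_insn are hypothetical.

```c
#include <stddef.h>
#include <string.h>

struct toy_insn {
    const char *mnemonic;
    int writes_flags; /* nonzero if the instruction modifies CPU flags */
};

/* Tiny allowlist for illustration only. */
static const struct toy_insn toy_allowlist[] = {
    { "roll", 1 },
    { "sarw", 1 },
};

/* Returns NULL if accepted, otherwise a reason for the rejection. */
static const char *toy_validate(const char *assembly, const char *constraints)
{
    size_t mnemonic_len = strcspn(assembly, " \t");
    size_t count = sizeof(toy_allowlist) / sizeof(toy_allowlist[0]);

    /* Reject anything that looks like a memory operand. */
    if (strchr(assembly, '(') != NULL)
        return "memory operand";

    for (size_t i = 0; i < count; i++) {
        const struct toy_insn *insn = &toy_allowlist[i];

        if (strlen(insn->mnemonic) != mnemonic_len ||
            strncmp(assembly, insn->mnemonic, mnemonic_len) != 0)
            continue;

        /* Flag writes must be declared as a clobber. */
        if (insn->writes_flags && strstr(constraints, "~{cc}") == NULL)
            return "flags modified without a cc clobber";
        return NULL;
    }
    return "instruction not in allowlist";
}
```

Returning a reason string rather than a boolean mirrors the idea that a rejection should explain itself.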
Before the advent of AI, writing a parser for x86_64 assembly would have been such an annoying task that I might have never gotten around to implementing support for memory safe inline assembly other than the trivial kind (where the assembly is blank).
But now, implementing a feature like this is as simple as writing a good prompt! The next section has my original prompt that I used to start work on this feature. I fed it to my own private agent harness (called T800) running with Kimi K2.7-code.
Let's add more support to Fil-C for safe, harmless inline assembly!
Please read T800.txt, README.md, and https://fil-c.org/how to understand the context of what we're doing.
Fil-C currently rejects all inline assembly except for trivially safe stuff like:
asm volatile ("" : : : "memory")
Or even:
asm ("" : "+r"(x))
Basically, Fil-C accepts inline assembly if the assembly string is blank, and goes to great lengths to handle the case where the inline assembly snippet has a variable threaded through it. This kind of thing is very common, since it allows programmers to conceal data flow from the compiler to inhibit optimizations, which can be important for things like constant-time crypto.
Let's take this further to support cases where the assembly snippet is not empty, but is still harmless!
Here are examples that should work:
__asm__ ("sarw $15,%0" : "+r"(crypto_int16_x) : : "cc");
Or:
asm volatile("cpuid\n\t" : "+a"(a), "=b"(b), "+c"(c), "=d"(d));
Or:
asm volatile("xgetbv\n\t" : "=a" (xcr0_lo), "=d" (xcr0_hi) : "c" (0));
These examples are safe because:
sarw, cpuid, and xgetbv have no meaningful side effects other than setting registers.
[...] "+r" involve threading the crypto_int16_x variable through the assembly invocation as data flow and this will not turn into a memory access unless the variable is spilled (which is fine - spills are totally legal in Fil-C, and the spills are in a part of the stack that Fil-C cannot get a pointer to).
[...] asm.
[...] sarw example, we are letting the compiler pick the register.
[...] asm modifiers correctly list clobbers for all of the registers clobbered by the instruction.
Note that these three examples look like this in LLVM IR. The sarw one is:
%0 = call i32 asm "sarw $$15,$0", "=r,0,~{cc},~{dirflag},~{fpsr},~{flags}"(i32 %x) #3
The cpuid one is:
%0 = call { i32, i32, i32, i32 } asm sideeffect "cpuid\0A\09", "={ax},={bx},={cx},={dx},0,2,~{dirflag},~{fpsr},~{flags}"(i32 undef, i32 undef) #5
The xgetbv one is:
%0 = call { i32, i32 } asm sideeffect "xgetbv\0A\09", "={ax},={dx},{cx},~{dirflag},~{fpsr},~{flags}"(i32 0) #5
It would be great to support any inline assembly that meets these criteria. To do that, we need to integrate the following into llvm/lib/Transforms/Instrumentation/FilPizlonator.cpp's handleInlineAsm function:
[...] sarb, sarw, sarl, and sarq, which are all the same instruction but with different word sizes. And sar, where the word size has to be inferred from operands.
[...] cpuid clobbers ax/bx/cx/dx.
[...] cc and if they do, make sure that the assembly constraints also lists cc as clobbered.
[...] ={ax} wasn't part of the constraint when using cpuid)
Make sure that if you reject inline assembly, then handleInlineAsm returns a nice Reason that explains why.
You should reject InlineAsm that doesn't use the AT&T dialect.
Note that your assembly parser doesn't even have to know how to parse any assembly that isn't allowlisted. I think that means that you don't even have to implement parsing of memory operand syntax or any instruction mnemonic that's not in the allowlist!
For now, add support for:
For examples of assembly snippets that should work, take a look at projects/openssh-10.3p1/sntrup761.c. Note that this file currently has a #undef __GNUC__ to prevent the inline assembly from being used. Note also that this file has C implementations of all of the inline assembly. So, I recommend creating a filc/tests test that has all of those inline assembly snippets and they are tested against their C equivalents for a variety of inputs.
Also be sure to create lots of tests for each allowlisted instruction that check that we reject unsafe uses of inline assembly (memory operands etc). And create tests for instructions that are either obviously unsafe or not yet supported to make sure we reject those. Note that the rejection will be runtime so the manifest for the test should say that the result is failure with the output including a filc safety error. There might be some existing tests that assert such a failure for inline asm that you will make succeed, since those tests might be using one of the instructions I'm requesting that you allowlist. In that case, just fix those tests' result expectation in their manifests.
If you're unsure about any x86 instructions, remember that there's https://www.felixcloutier.com/x86/
I recommend breaking this task up into steps handled by separate subagents:
[...] ./build_clang.sh) to test that but it might not pass filc/run-tests and ./build_base.sh might fail, since we might pass through assembly permissively or unsoundly. you should try to feed it some code manually via build/bin/clang -c testfile.c to see if the parser is at least not crashing, and you can add llvm::errs() print statements to print out what the parser saw and whether it worked (but disable those print statements, or put them behind if (verbose) after you're done)
[...] handleInlineAsm (for example, I don't think that parser will currently handle {ax} or ={ax}). after this, ./build_clang.sh should still build, but it might not work right (tests might not pass, build_base.sh might not work). again, you can test this with print statements and trying to run the compiler in -c mode on a standalone simple file.
[...] ./build_clang.sh should still build, but it might not work right. However, I would expect that by this point, a test of the instruction sequences from sntrup761.c should pass. Note that you SHOULD NOT ./build_base.sh to run this test; just ./build_clang.sh, since build_base.sh runs the compiler on a lot of stuff that might still not work at this stage. You can use filc/run-tests -t <testname> to run the test at this stage.
[...] filc/run-tests -t.
[...] ./build_base.sh builds. Grind on any failures you find until it builds.
[...] filc/run-tests passes. Grind on any failures you find until it passes.
Based on the above prompt, T800 wrote a pretty good initial implementation, including a healthy amount of tests. The C++ code that it added to FilPizlonator is all in a new function called validateSafeInlineAsm, which contains an assembly parser and assembly static analysis.
I then validated that this works by writing some tests by hand and removing the #undef __GNUC__ from sntrup761.c. I also reverted cpuid changes to zstd and simdutf, since it's now OK for them to use their original inline assembly for CPU identification.
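A hand-written test of this kind might compare an inline assembly snippet against a plain C equivalent over a range of inputs. The sketch below does that for the sarw example from the prompt. The function names are hypothetical, this is not one of the actual Fil-C tests, and on non-x86 targets the assembly path is compiled out so that the code still builds.

```c
#include <stdint.h>

/* Plain C reference: -1 if x is negative, 0 otherwise. */
static int16_t negative_mask_c(int16_t x)
{
    return (int16_t)(x < 0 ? -1 : 0);
}

/* Same result via an arithmetic right shift by 15 in inline assembly. */
static int16_t negative_mask_asm(int16_t x)
{
#if defined(__x86_64__) || defined(__i386__)
    __asm__ ("sarw $15,%0" : "+r"(x) : : "cc");
    return x;
#else
    return negative_mask_c(x); /* no x86 assembly on this target */
#endif
}

/* Returns the number of 16-bit inputs on which the two versions disagree. */
static int count_mismatches(void)
{
    int mismatches = 0;

    for (int32_t v = INT16_MIN; v <= INT16_MAX; v++) {
        int16_t x = (int16_t)v;

        if (negative_mask_asm(x) != negative_mask_c(x))
            mismatches++;
    }
    return mismatches;
}
```

Because a 16-bit operand has only 65536 possible values, the comparison can be exhaustive rather than sampled.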
It's worth calling out the oddest part of Fil-C inline assembly: if you get it wrong, then there is no compile-time error. Instead the inline assembly snippet turns into a Fil-C panic or an illegal instruction trap at runtime.
If FilPizlonator determined that the inline assembly is not safe, then it'll replace it with a Fil-C panic. That panic will provide diagnostics about why the assembly was rejected.
If the instruction was safe, but your CPU doesn't support it, you'll get an illegal instruction trap. This is possible because there are lots of instructions recognized by FilPizlonator that are not supported by all x86_64 CPUs. Illegal instruction traps are safe because Fil-C provides no facility for catching them. For example, a sigaction call to register a handler for SIGILL will return ENOSYS. Hence, this is just a panic, but with fewer diagnostics.
Using runtime panics has the nice property that inline assembly in dead code doesn't get in the way of porting software to Fil-C. Also, it's consistent with how Fil-C usually reports errors.
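As an illustration of the dead-code point, consider a sketch like the following. The names use_asm_path and load32 are hypothetical, and the assembly is deliberately of the kind the validation rules reject, because it reads memory. Going by those rules, Fil-C would be expected to compile this and only panic if the assembly branch were actually reached; with an ordinary compiler the same block simply executes when the branch is taken.

```c
#include <stdint.h>

/* Hypothetical switch; left at 0 so the assembly path is never reached. */
static volatile int use_asm_path = 0;

static uint32_t load32(const uint32_t *p)
{
#if defined(__x86_64__) || defined(__i386__)
    if (use_asm_path) {
        uint32_t v;

        /* Memory operand: not accepted as memory-safe inline assembly. */
        __asm__ ("movl (%1), %0" : "=r"(v) : "r"(p) : "memory");
        return v;
    }
#endif
    return *p; /* ordinary load */
}
```

The switch is volatile only so that the optimizer cannot prove the branch dead and delete the assembly before it is ever inspected.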
Finally I built a loop to implement every safe pre-AVX512 instruction.
It's worth dwelling on what a loop is, since lots of folks talk about looping without necessarily explaining what they mean. Most agent harnesses have the ability to spawn subagents. T800 is based on this architecture, but so are many of the publicly available agents. Hence, the key is to tell the agent that you want it to keep doing something by spawning subagents until it is done, with a crystal-clear criterion for what done looks like. Each subagent does a subtask and reports back. The toplevel agent decides what to do based on its understanding of what the subagents have done so far.
To this end, I had T800 create an instructions_list.txt file that contains all of the X86_64 instructions with either no annotation (if it hadn't been considered), a REJECT annotation if we rejected it, or ACCEPT if we accepted and implemented it. Then I told T800 to write a script to find the first not-yet-considered instruction in that file. These first two steps took very little time; they were just the groundwork. Finally, I told T800 to keep spawning subagents that use that script to find an instruction and then implement it until they could not find any more instructions.
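The lookup step could look something like the sketch below. The real script and the exact file format are not shown here, so this assumes one instruction per line, with an optional ACCEPT or REJECT annotation after the mnemonic. The name first_unconsidered is hypothetical, and C is used only to match the rest of the examples.

```c
#include <stddef.h>
#include <string.h>

/*
 * Scans the text of an instruction list (assumed format: one mnemonic per
 * line, optionally followed by ACCEPT or REJECT) and copies the first
 * mnemonic that has no annotation into out. Returns 1 if one was found,
 * 0 if none was found or the output buffer is too small.
 */
static int first_unconsidered(const char *text, char *out, size_t out_size)
{
    while (*text != '\0') {
        size_t line_len = strcspn(text, "\n");
        size_t name_len = strcspn(text, " \t\n");
        /* Skip the whitespace between the mnemonic and any annotation. */
        size_t rest = name_len + strspn(text + name_len, " \t");

        if (name_len > 0 && rest == line_len) {
            if (name_len >= out_size)
                return 0; /* output buffer too small */
            memcpy(out, text, name_len);
            out[name_len] = '\0';
            return 1;
        }

        text += line_len;
        if (*text == '\n')
            text++; /* move to the next line */
    }
    return 0;
}
```

A return value of 0 on a complete list is the kind of crisp "done" signal that lets a loop terminate on its own.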
Hence, the loop here is English prose that the agent takes as instruction, and those instructions lead the agent to spawn subagents. Those subagents are prompted to perform a task by the toplevel agent, not by me directly. The objective here is to get the human (me) out of the business of repeatedly telling the agent what to do, since that's exhausting. My loop instructions did include the following: if the agent detects a file called instructions_stop, then it should stop looping and instead move to the terminate phase of T800, where it performs a review/judge loop to check its work, and then stages everything for me to commit it. I did this maybe twice a day, so that I could sanity check what is happening and run some tests myself.
For the first half of the looping, I used Kimi K2.7-code, but then I switched to GLM 5.2. Interestingly, I found that Kimi K2.7-code is more paranoid; it interpreted my instructions as requiring more tests. GLM 5.2 was faster and more brave. That said, most of the super hard groundwork (including supporting static analysis of x87 instructions and their constraints) was done by Kimi, so maybe the greater paranoia I observed was due to the fact that Kimi did the heaviest lift.
It didn't take long for all of the safe pre-AVX512 X86_64 instructions to be implemented along with a plethora of tests to cover both the good case of those instructions and the bad case (which causes a Fil-C panic). This happened while I was away from the computer doing other things (like replaying Witcher 3 and porting Fedora patches for quantum crypto support in OpenSSH, which I did by hand).
As far as I know, Fil-C has the first ever implementation of memory-safe X86_64 inline assembly. It supports hundreds of instructions, including useful x87, SIMD, bitmath, flags management, and fence instructions. Basically anything that is safe within the Fil-C garbage-in, memory safety out model.