An adveserial package can claim to have a 255 tagged integer but not actually have any followup, tricking the payload parser into an incorrect offset and reading straight off into followup memory.
It's a classic thing to check for when dealing with variable length strings or binary, but it may not cross the mind when it's hiding in the Bijou64_decode(*buff, *cr) function.
The downside is the encoding size. LEB128 quickly grows to 2 bytes, but stays at 2 bytes all the way to 2^14. This is important if you're using these numbers as tags/identifiers as we were in the multicodec [1] project, or for network message lengths. bijou64 only gives you 500 <= 2 byte numbers.
The problem is linking: a compiler needs to emit code into independent translation units, which contain "missing" references to symbols in other translation units, without yet knowing where all the code will end up in the final executable. Since we don't know where the location of other code is yet, we don't know how big the number representing that location is yet, which means that we don't know how wide the variable length encoding of that number will be. If the width changes after linking, then we have to push around the surrounding code to make space for the wider integer. Unfortunately, this changes the location of all the surrounding code, so we have to recompute all the references!
The solution is to always emit un-linked var ints in the widest possible encoding (5 bytes for LEB128) that way when the references are patched during linking, no code is moved around. All integers can be converted to a non-canonical 5 byte form that is "wasteful" but its a worthwhile tradeoff because it solves this issue. Other integers that don't need to be linked can be packed in a smaller var int form to save space.
vu128: https://john-millikin.com/vu128-efficient-variable-length-in...
metric/imperial varint: https://dcreager.net/2021/03/a-better-varint/
vectorizing VByte: https://arxiv.org/abs/1503.07387
https://news.ycombinator.com/item?id=44456073 - Corrected UTF-8 (2025-07-03, 54 comments)
This "corrected UTF-8" has other problems, but I thought it's interesting how the shifted-offset idea carries over.
This looks neat, but if encoding/decoding performance is important, payload size isn't and the integer is bounded, I would just put a fixed-size integer into the payload as-is.
LEB128 (and JSON for that matter) can encode integer values of arbitrary length. This doesn't, which may or may not be important but it's different.
I'll admit that I do not do any cryptographic work with my library and therefore canonical representations aren't a huge concern in my use-cases. I merely provide various configurable limits (max value length, max depth, max items per collection) in an effort to prevent infinitely long documents from hogging my tokenizers indefinitely.
[0]: https://www.sqlite.org/src4/doc/1433690d7b/www/varint.wiki
> This causes problems for signed data if you ever want to do things like compression since you need to know the exact bytes that were signed.
If you are verifying a signature by taking some logical data structure, turning it into a byte string, and calling the verification primitive on those bytes, you likely have a design error. You should instead collect bytes, verify the signature, and then parse the bytes after verifying the signature. And remember to include enough context in those bytes so a different message signed for a different purpose by the same key doesn’t confuse you.
But testing proved that when you move to SIMD instructions, ULEB128 (https://github.com/kstenerud/bonjson/blob/main/bonjson.md#ty...) or sentinel values (https://github.com/kstenerud/bonjson/blob/main/bonjson.md#lo...) win every time because of the parallelization opportunities.
The true irony is that even SIMD text parsing would outperform this! SIMD is that powerful.
The upsides: the size of the integer is apparent upon reading the first byte, and every number has exactly one canonical representation. I wish C strings had been standardized around something similar, instead on null termination.
> ...adversarial input, which is rarely in the test suite.
This made my scratch my head. My tests for quite pedestrian APIs often contain adversarial input of obvious shapes. I though that for anything security-related (like the author's project) testing against adversarial input would be be a prominent part.
Values 0-127 are a single byte, but if that first byte has the continuation bit set, not only does that indicate the next byte has 7 more bits to contribute, it also moves the base up to the next window.
10000000 00000000 is the only way to represent 128.
10000000 10000000 00000000 is the only way to represent 16512.
Does this encoding have a name?
If you don't do this properly, you end up with things like: - SAML XSW attack due to XML signature wrapping - ASN.1 BER/DER signature forgery - Bitcoin transaction malleability attacks
And say you have it as part of some other data. If you want to be able to hash it by the raw memory bytes, many different ways to represent a number becomes a problem.
In a contrived example of a pbuf {length:int, payload:byte[1]}
LEB128 can trick you into reading the payload as part of the length, but then hopefully trigger a code check against invalid buffer read. (or one byte outside the struct if the payload is also malicious)
Binou64 can trick you to read 7 bytes into other memory, before any buffer size validation is done.
It's then not uncommon to log with a helpful; "buffer with length: 26624894573377(7 bytes of stolen data) is invalid", or just crash.
It's to the point that Bijou64_decode should perhaps take "end_adress" or "max_read" to catch this kind of attack.
(If you dont validate a malicious pbuf, you're in for a bad time regardless of integer format, but these int formats add their own way to trigger a buffer overrun despite a proper check.)
1: https://kizu.dev/svg-linked-parameters-workaround/ 2: https://www.seaofclouds.com
The first is what they describe here: as an attack. It's like why would anyone ever overflow a buffer with shellcode.
The second is that they are implementing a spec that requires appending a varint length-prefixed field to a buffer but don't really care about the space optimization, don't know the field's length when they start appending it, and don't want to put the field into a second, temporary buffer or slide it down into place. https://github.com/FFmpeg/FFmpeg/blob/468a743af1653a08f47081... vs say my own code which does the slide: https://github.com/scottlamb/retina/blob/6972ac4261ce7bf5b58...
It's uncommon but I've definitely seen it done (with media containers like Matroska, not actually LEB128) in extremely high-throughput systems that can't spare any cycles.
It might be slightly more instructions than some other serial VL (variable-length) integer codec choices, but overall I don't think it's more difficult.
The very efficient SIMD VL codecs tend to stripe (separate) the control and data bits, so they're in a different design space anyway.
If you can choose a fixed number of bytes for the length prefix, you can skip that number, do the encoding and find out the length, and then come back and fill in the length-prefix after.
But you actually don't know how many bytes it will take without doing all of the work to know the payload length (since larger payloads take more bytes to represent the length).
If you allow overlong representation you can reserve a few bytes and sometimes it'll just be the effective no-op bytes. If you don't, you won't be able to.
I happen to be guilty of a variant of this, where I don't bother emitting a 16-bit floating point number instead of a 32-bit one in my CBOR encoder even if it can be represented exactly. That one is laziness.
They might have a different definition of adversarial than you.
> My tests for quite pedestrian APIs often contain adversarial input of obvious shapes.
This doesn't seem like what I would call adversarial.
This seems like standard negative testing or boundary value analysis - which I would be shocked if they didn't do.
I'm not saying SQLite's varint implementation is ideal for every application. It's just an implementation that is one of the most used implementations, if not the most (I'd bet it is by a large margin though). It just seemed like a missed opportunity to compare it with the implementation they landed on.
EDIT: Just wanted to add, thanks for sharing that link. Interesting!
ULEB128 works in SIMD because there's only one dependent bit per byte, so you can speculatively decode and then correct later cheaply. Bijou requires you to check the first byte and then branch based on the value using all 8 bits in the decision matrix (to handle branches 0-247, 248, 249, 250, 251, 252, 253, 254, 255). This absolutely DESTROYS any parallelization opportunities.
Not to mention that non-canonical sized ints (3, 5, 6, 7) have abysmal performance compared to unaligned 2, 4, and 8 byte reads on modern processors.
I spent WAYYYYYYYY too much time exploring this...
Even though decoding the lengths must be serial (since's there's no unambiguous way to differentiate a tag and data byte), it's still doable within the wider SIMD registers, so there's some theoretical efficiency gain to be had (depending on the shape of the data).
On a general note, the continuation bit and prefix byte forms are equivalent, you just broadcast the prefix byte and compare against an increasing vector to convert it to a mask. Yeah, there's probably more fiddly SIMD if there are multiple prefixes in the register, but doable (it's just not byte-parallel, you eg. unroll the serial decode loop 8 times or whatever your maximum output byte width is, and mask out).
Simplified:
// Just maps a byte to its position in the register
__m128i idx = _mm_setr_epi8(0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15);
// Broadcast the prefix
__m128i nn = _mm_set1_epi8((char)prefix_byte);
// Get applicable locations: prefix_byte contains the length, if byte_pos < len, the corresponding byte will be set
__m128i m = _mm_cmpgt_epi8(nn, idx);
// If you *really* want a high-bit mask:
m = _mm_and_si128(m, _mm_set1_epi8((char)0x80));Interleaved Bijou has no such signal (tag and payload bytes both span 0x00–0xFF), so finding the boundaries is a dependent per-value walk with no opportunities for parallelism.
Either way, a properly written decoder (and it's like ten lines) should really not have any problems with it. I was agreeing with you.
Edit: to clarify, I was talking about the author's argument being strange, not yours.
Edit: a properly written decoder is a lot more than 10 lines if you properly deal with integer overflow and both signed and unsigned ints.
It’s nice when you work on security and accidentally get some performance for free. This is the story of a small encoding called bijou64 — a variable-length integer (varint) encoding that we developed for the Subduction CRDT sync protocol. It was intended to fix a subtle signature-verification bug by making each number only representable a single way. It turned out to also run a few times faster than the more common varint LEB128.
We didn’t set out to write a fast varint, but it turns out that our design constraints made for an encoding that has to do less work.
Many binary protocols need a compact way to encode integers that are usually small but occasionally large. Variable-length integer encodings (“varints”) solve this, but most designs treat canonicality as an afterthought — something enforced by a runtime check in the decoder rather than by the structure of the encoding itself.
Since it’s the most common varint, we’re going to pick on LEB128 a bit here. I want to emphasize how much LEB128 is a great choice for many projects, and the reasons that it was not a good choice for us also applies to the other formats that we looked at. It just happened to not be a perfect fit for our use case.
LEB128 encodes a number as a sequence of 7-bit segments, with the high bit of each byte signaling “more bytes follow” (there can be many such continuation segments, though we only show 2 segments below). This lets you avoid always writing 8 bytes (64 bits) even when you’re representing a small number (which would be mostly zeroes). This is like writing 5 instead of 000000005 just to get the correct number of characters. Putting aside that working in 7-bit is odd, this is a practical solution!

LEB128 layout
But there’s a problem: the number 0 can be encoded as the single byte 0x00, but it can also be encoded as 0x80 0x00. Or 0x80 0x80 0x00. Or any longer sequence of 0x80s ending in a zero byte. 0x80 is 1 0000000, so you can have as many of those as you want and still get 0! Most LEB128 decoders will happily accept any of them. This isn’t unique to zero; nearly every number in LEB128 can be represented many ways.

Zero represented two ways in LEB128
This causes problems for signed data if you ever want to do things like compression since you need to know the exact bytes that were signed. Having an extra 0x80 will result in a different signature.
If you only have one unique way to represent a number, you can do things like deduplicate runs of numbers when storing without needing to retain the entire original.
One solution is to simply enforce a special “canonical” form. While encoding a varint, you must ensure that you use the canonical encoding (not all libraries will always do this). When decoding, you must validate that it matches your expected format. It would be really nice to not have to do that additional work.
You could reasonably ask: who actually shows up with adversarial varints? The answer is “anyone who would benefit from your protocol mistaking two different byte strings for the same value.” For a signed protocol, that could be a lot of people.
While not about varints per se, the textbook case is ASN.1 (the abstract syntax notation behind X.509 certificates, LDAP, and many other things that are widely depended on). Canonicalisation attacks have been used against PKCS#1 v1.5, Mozilla NSS, GnuTLS, JWTs and Bitcoin transactions.
The pattern in all of these is something like:
ifs that enforces this.bijou64 eliminates the possibility of more than one encoding per integer. Just like our normal written number system, there is exactly one way to write each number. Bijou uses two tricks:
The first byte represents 0–247 as normal. If you get 0x42, it just decodes to 0x42. 248–255 switch into a different mode: they’re a tag for how many bytes to expect after this first one, which will represent the number. This is really nice for decoding, because we know how much memory to allocate as soon as we read the first byte (O(1)). LEB128, by contrast, has to keep reading bytes until it sees one without the continuation bit set (O(n)).

Byte/tag structure
The tag alone is not enough to guarantee canonicity, but it points the way: instead of repeating 0-247 in the second byte (0xF8 0x00 == 0x00), we offset the next byte by 248 (0xF8). This means that 0xF8 0x00 == 0xF8 == 248, not 0 (since 0 is already represented as 0x00).
Here’s a worked example of decoding 1738, which takes the tag plus two bytes:

Worked example
All numbers at this length (3 bytes total AKA tag + 2 data bytes) are offset by 504 (0x1F8). Each successive length increases the offset in a predictable pattern. See if you can spot it:
| Total Length | Offset |
|---|---|
| 1 | 0x00 |
| 2 | 0xF8 |
| 3 | 0x01F8 |
| 4 | 0x0101F8 |
| 5 | 0x010101F8 |
| 6 | 0x01010101F8 |
| 7 | 0x0101010101F8 |
| 8 | 0x010101010101F8 |
| 9 | 0x01010101010101F8 |
This is a lookup table based on the first byte!
There is one exception: because the numbers are always offset down, the largest values (9 bytes) need a manual check that they are in bounds. The 9-byte (tag + 8 bytes of data) slot can actually represent numbers larger than 2⁶⁴ due to the offset, but since bijou64 targets u64, we cap it there. This isn’t the canonicity problem from earlier — every in-range number still has exactly one encoding — we’re just clipping extra headroom that we don’t want. So when the tag is 255 (the biggest one), the decoder checks that the value is below the cutoff.
Bijou64 has to do all of this bit shuffling, tag lookups, and whatnot. There needs to be a cost for all of this, and there is over the regular, fixed length 64-bit numbers. Surely it’s slower than much widely used encodings like LEB128 and the very clever vu128, right? We benched it on ARM (Apple M2 Pro) and x86 (AMD Zen 5), and were not expecting what came back.
Median decode time per batch of 4096 values. Lower is better. Distributions described in the methodology.
bijou64 comes out as pretty fast!
It decodes roughly 2-10 times faster than LEB128 even when we don’t take the overhead of canonical checks on LEB128 into account. Small numbers (which encode to a single LEB128 byte) are about twice as fast in these benchmarks. Larger numbers, which force LEB128 to scan continuation bits across many bytes, are around 8–10 times faster. On a uniform full-u64 distribution — about as adversarial as a benchmark gets — bijou64 processes a batch of 4096 values in ~3 µs (≈0.75 ns per value) where LEB128 takes ~30 µs (≈7.3 ns per value).
The bar chart only shows medians, though. The CDFs underneath are where the variance story lives:
Decode times per batch of 4096 values, plotted as a CDF for each library × distribution cell. Curves further to the left are faster. Curves that rise more steeply have more consistent performance.
In these benchmarks, bijou64’s CDFs are nearly vertical — every recorded batch timing sits within a tiny band around the median. LEB128’s curves lean over and trail off to the right, because the continuation-bit scan length depends on the value, and the branch predictor never gets a chance to lock in.
As you might imagine, the delta is even larger when doing canonical decoding since bijou64 gets this “for free” thanks to its encoding for all but the largest numbers:
Canonical decode time per batch of 4096 values — that is, decode plus the runtime overlong-rejection check for libraries that need one.
In bijou64, canonical decoding is decoding — the canonicality check is the format. In the others, the canonicality check is additional work.
Encoding is also generally faster, with one exception:
Median encode time per batch of 4096 values.
On the “small” distribution (248 – 65,535), LEB128 wins by about 1.24×.
bijou64 isn’t the most compact varint for every distribution. Tier-boundary differences aside, bijou64 and LEB128 produce wire-byte counts within a couple of percent of each other across realistic workloads.
Length to represent certain numbers. These were chosen specifically to show where they differ; the vast majority of numbers are identical in length.
In hindsight, this kind of makes sense:
Since there is no continuation-bit scanning. The decoder knows immediately how many bytes to read; the encoder knows immediately how many to write. LEB128’s decoder has to scan the high bit of every byte until it finds the terminator. Branch predictors love the bijou64 pattern; they hate the LEB128 one, especially for large values where the continuation chain is long.
The payload is a single contiguous big-endian integer, not 7-bit chunks with bookkeeping bits sprinkled through. Modern CPUs have byte-swap instructions for exactly this; the compiler turns the read into a single load + bswap. LEB128’s 7-bits-per-byte layout forces the decoder to mask and shift every byte.
Tier selection is a small fixed match. For any one workload, branch prediction settles into a stable pattern almost immediately — exactly what those steep CDF curves show.
Adding OFFSET[tier] is a constant load and an add (or sub if you’re encoding). A previous version had an if and some branching, but the arithmetic version actually has fewer instructions in the hot path on most modern CPUs.
Maybe! Like most interesting questions, it depends on your goals. This is a brand-new format, and it hasn’t been nearly as battle-tested as LEB128, which has mature, well-exercised implementations in every major language. Our benchmarks are encouraging, but big claims need big evidence — and we’ve only tested on three CPUs so far (M2 Pro and Zen 5 are in the published benchmarks; a Zen 3 we tried looks similar to Zen 5).
LEB128 isn’t going away, and it shouldn’t. But if you’re designing a new format and canonicality matters — for signatures, content-addressing, or any kind of “two implementations must agree on the bytes” property — there’s an alternative that’s structurally safer and runs faster on every benchmark we threw at it.
The library is published as bijou64 on crates.io, dual MIT / Apache-2.0, and the spec is CC BY-SA 4.0 if you’d like to port it. A Wasm/JavaScript wrapper exists, and a family of width extensions (bijou32, bijou128) is sketched in the spec’s future-extensions section. If you find a bug — or, more interestingly, a workload where bijou64 loses to something we benchmarked against — we’d love to hear about it!