32-bit block ciphers are a good way to create short opaque IDs because they provide a bijection between two sets of integers. And even if your ID space is slightly smaller than 32 bits, you can easily shave off a few bits with cycle walking, e.g. to make sure your IDs fit into 31 or 63 bits: https://en.wikipedia.org/wiki/Format-preserving_encryption#F...
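Cycle walking is only a few extra lines once you have a 32-bit permutation. A minimal Python sketch, using a toy 4-round Feistel network as a stand-in for a real 32-bit cipher (the round function and key values here are arbitrary and purely illustrative, not secure):

```python
# Toy 32-bit permutation: 4-round Feistel over two 16-bit halves.
# The round function F is an arbitrary mixer, NOT a secure cipher.
KEYS = [0x1F3A, 0x9B2C, 0x44D1, 0x7E08]  # illustrative round keys

def F(r, k):
    x = (r * 0x9E37 + k) & 0xFFFF
    return x ^ (x >> 7)

def enc32(n):
    l, r = n >> 16, n & 0xFFFF
    for k in KEYS:
        l, r = r, l ^ F(r, k)
    return (l << 16) | r

def dec32(n):
    l, r = n >> 16, n & 0xFFFF
    for k in reversed(KEYS):
        l, r = r ^ F(l, k), l
    return (l << 16) | r

# Cycle walking: shrink the 32-bit permutation to a 31-bit one.
# Re-encrypt until the value lands back inside the 31-bit domain;
# the result is still a bijection on [0, 2**31).
def enc31(n):
    n = enc32(n)
    while n >= 1 << 31:
        n = enc32(n)
    return n

def dec31(n):
    n = dec32(n)
    while n >= 1 << 31:
        n = dec32(n)
    return n
```

Since half of the 32-bit space already lies inside the 31-bit target, the walk needs fewer than two encryptions per call on average.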
I especially like the RC-5 cipher for these kinds of uses. It can be implemented in just a few lines of code and there are standard test vectors for it.
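For reference, a compact RC5-32/12/16 sketch in Python (32-bit words, 12 rounds, 16-byte key), following the structure of Rivest's original description; treat it as illustrative and check it against the published test vectors before relying on it:

```python
# RC5-32/12/16: 32-bit words, 12 rounds, 16-byte key.
W, R = 32, 12
P32, Q32 = 0xB7E15163, 0x9E3779B9  # magic constants from the RC5 paper
MASK = (1 << W) - 1

def rotl(x, n):
    n %= W
    return ((x << n) | (x >> (W - n))) & MASK

def rotr(x, n):
    n %= W
    return ((x >> n) | (x << (W - n))) & MASK

def expand_key(key: bytes):
    # RC5 key schedule: fill S with magic constants, then mix in the key.
    c = len(key) // 4
    L = [int.from_bytes(key[4 * i:4 * i + 4], "little") for i in range(c)]
    S = [(P32 + i * Q32) & MASK for i in range(2 * R + 2)]
    a = b = i = j = 0
    for _ in range(3 * max(len(S), c)):
        a = S[i] = rotl((S[i] + a + b) & MASK, 3)
        b = L[j] = rotl((L[j] + a + b) & MASK, a + b)
        i, j = (i + 1) % len(S), (j + 1) % c
    return S

def encrypt(S, a, b):  # one 64-bit block as two 32-bit halves
    a, b = (a + S[0]) & MASK, (b + S[1]) & MASK
    for i in range(1, R + 1):
        a = (rotl(a ^ b, b) + S[2 * i]) & MASK
        b = (rotl(b ^ a, a) + S[2 * i + 1]) & MASK
    return a, b

def decrypt(S, a, b):
    for i in range(R, 0, -1):
        b = rotr((b - S[2 * i + 1]) & MASK, a) ^ a
        a = rotr((a - S[2 * i]) & MASK, b) ^ b
    return (a - S[0]) & MASK, (b - S[1]) & MASK
```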
If I were a nation state actor, I'd just have the CPU store the encryption keys supplied to the AES instructions somewhere, and when the data needs to be accessed, read out the stored keys.
No need to waste time deploying a backdoored CPU firmware and wait for days or weeks, and then touch the hardware a second time to extract the information.
When all AES encryption keys are already stored somewhere on the CPU, you can easily do a drive-by readout at any point in time.
The Linux kernel has a compile-time flag to disable the use of dedicated CPU instructions for encryption, but it can't be disabled at runtime. If "software encryption" is used, the nation state actor needs to physically access the device at least twice, or use a network-based exploit which could be logged.
>However, they can be very useful against passive adversaries whose capability is limited to observing identifiers, who are then unable to map them to the original value.
Really? Isn’t the Sweet32[0] attack mostly passive? “We show that a network attacker who can monitor a long-lived Triple-DES HTTPS connection between a web browser and a website can recover secure HTTP cookies by capturing around 785 GB of traffic.”
The storage required for this would be humongous, and the CPU cannot know which data the keys have been used for. Moreover, this would be too easily defeated: even if the AES instructions allow specifying a derived round key in them, you can always decline to do this and use a separate XOR instruction for combining the round keys with the intermediate states. Detecting such a use would be too difficult.
No, there is no basis for fearing that AES keys can be stored in CPUs (on the other hand, you should fear that if you store keys in a TPM, they might never be erased, even if you demand this). The greatest possible danger of adversarial behavior of a CPU exists in laptop CPUs with integrated WiFi interfaces made by Intel. Unless you disconnect the WiFi antennas, it is impossible to be certain that the remote management feature of the WiFi interface is really disabled, preventing an attacker from taking control of the laptop in a manner that cannot be detected by the operating system. The next danger in importance is in computers that have Ethernet interfaces with the ability to do remote management, where again it is impossible to be certain that this feature is disabled. (A workaround for the case when you connect such a computer to an untrusted network, e.g. directly to the Internet, is to use a USB Ethernet interface.)
AES also needs only a handful of lines of code for its implementation (using assembly). For such an application, you can even reduce the number of rounds of AES-128, e.g. from 10 to 4.
When you want truly uniform random numbers, then encrypting with AES-128, then truncating, is best. If you want invertible encryption, then you should encrypt a counter and either use a 32-bit addition or a 32-bit XOR for encrypting the 32-bit number. With a single AES-128 invocation for generating a random mask, you can encrypt four 32-bit numbers.
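The XOR variant can be sketched as follows. Here `mask128` stands in for the output of a single AES-128 invocation on a counter block (`os.urandom` is used only as a placeholder source), and a given mask must never be reused for different data:

```python
import os

# Stand-in for AES-128(key, counter): one 128-bit mask per counter value.
mask128 = os.urandom(16)
words = [int.from_bytes(mask128[4 * i:4 * i + 4], "big") for i in range(4)]

def enc4(nums):
    # Encrypt four 32-bit numbers with one 128-bit mask (XOR variant).
    return [n ^ w for n, w in zip(nums, words)]

dec4 = enc4  # XOR with the same mask is its own inverse
```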
Of course, when speed does not matter, you can use pretty much any of the historical block ciphers, because the security requirements for encrypting 32-bit numbers are very low, since they are easier to find by brute force searching than by attempting to break any kind of encryption.
It is cute, but surely there's a more efficient way than RC5? There are bijective hash functions which are much cheaper (murmur, at least).
Nowadays, there is almost never any reason to use any mode of operation other than CTR or OCB for encryption, as these do not expand the size of the encrypted data.
That said, the parent article was less about encryption and more about random number generation, which is done by encrypting a counter value, but you never need to decrypt it again.
In RNGs, the block size again does not matter, as the output can be truncated to any desired size.
Moreover, cryptography has many applications, but the most important 3 of them are data encryption, data integrity verification and random number generation.
The optimal use of a cryptographic component, like a block cipher, depends on the intended application.
If you want e.g. 32-bit random numbers, the fastest method on both Intel/AMD x86-64 CPUs and Arm AArch64 CPUs is to use AES-128 to encrypt a counter value and then truncate its output to 32 bits. The counter that is the input to AES may be initialized with an arbitrary value, e.g. 0 or the current time, and then you may increment only a 32-bit part of it, if you so desire. Similarly, for other sizes of random numbers less than 128 bits, you just truncate the output to the desired size. You can also produce random numbers uniformly distributed over a range whose size is not a power of two, by combining either multiplication or division of the output value with rejection, done either before or after the operation, to remove the bias.
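The rejection step for a range whose size is not a power of two looks like this; `rand32` is a placeholder for the truncated AES output (any uniform 32-bit source works):

```python
import random

def rand32():
    # Placeholder for truncated AES-CTR output: any uniform 32-bit source.
    return random.getrandbits(32)

def uniform_below(n):
    # Unbiased value in [0, n): reject draws at or above the largest
    # multiple of n that fits in 2**32, then reduce.
    limit = (1 << 32) - ((1 << 32) % n)
    while True:
        r = rand32()
        if r < limit:
            return r % n
```

The rejection probability is (2^32 mod n)/2^32, which is negligible for small n.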
Similarly, for message authentication, if you have some method that produces a 128-bit MAC, it can be truncated to whatever length you believe to be a good compromise between forgery resistance and message length.
For encryption, short data must be encrypted using either the CTR mode of operation or the OCB mode of operation (where only the last incomplete data block is encrypted using the CTR mode). With these modes of operation, the encrypted data can have any length, even a length that is not an integer number of bytes, without any length expansion of the encrypted message.
The advice given in the parent article is not bad, but it makes sense only on 32-bit microcontrollers, because since 2010 for x86-64 and since 2012 for AArch64, any decent CPU has AES instructions that are much faster than a software implementation of any other kind of block cipher.
Moreover, for random number generation, data integrity verification, or authentication, there are alternative methods that do not use a block cipher but a wider invertible function, and which may be more efficient, especially in microcontrollers. For instance, for generating 128-bit unpredictable random numbers, you can use a counter with either a 128-bit block cipher and a secret key, or with a 256-bit invertible mixing function whose 128-bit output value is obtained either by truncation or by summing the two halves. In the first case the unpredictability comes from the secret key, while in the second case it comes from the secret state of the counter, which cannot be recovered by observing the reduced-size output.
For applications where a high level of security is not necessary, e.g. for generating 32-bit random numbers, the already high speed of AES-128 (less than 0.5 clock cycles per output byte on recent CPUs) can be increased by reducing the number of AES rounds, e.g. from 10 to 4, with a proportional increase in throughput.
But is Murmur actually bijective?
Imagine that you want to obfuscate your order numbers in the database so that customers can't infer the volume of business by checking the order number.
You can use UUIDs, but you also want to keep the numbers short so they can be dictated over the phone. You can use random IDs, but then you need to lock them in the database during the object creation otherwise you might get collisions.
Or perhaps you have a distributed system and want to allocate a bunch of IDs for each node in advance.
RC-5 with its small block size neatly solves this. You can have a strictly increasing database sequence, give each node a range of numbers ("from 100 to 120"), and nodes can then generate IDs from them. And there can be no collisions with other nodes, because block ciphers are bijective by construction.
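A sketch of that scheme in Python, with Speck32/64 standing in as the 32-bit permutation (encryption only; the key values and word ordering are simplified placeholders for illustration):

```python
# Speck32/64: 16-bit words, 22 rounds, 64-bit key (illustrative sketch).
M16 = 0xFFFF

def ror(x, n): return ((x >> n) | (x << (16 - n))) & M16
def rol(x, n): return ((x << n) | (x >> (16 - n))) & M16

def round_keys(key_words):  # four 16-bit key words
    ks, l = [key_words[0]], list(key_words[1:])
    for i in range(21):
        l.append(((ks[i] + ror(l[i], 7)) & M16) ^ i)
        ks.append(rol(ks[i], 2) ^ l[-1])
    return ks

def encrypt32(n, ks):
    x, y = n >> 16, n & M16
    for k in ks:
        x = ((ror(x, 7) + y) & M16) ^ k
        y = rol(y, 2) ^ x
    return (x << 16) | y

# Allocate plain counter ranges to nodes; expose only encrypted IDs.
ks = round_keys([0x0100, 0x0908, 0x1110, 0x1918])  # placeholder key
node_a = [encrypt32(n, ks) for n in range(100, 120)]
node_b = [encrypt32(n, ks) for n in range(120, 140)]
# No collisions are possible: the cipher is a bijection on 32-bit values.
assert not set(node_a) & set(node_b)
```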
For these kinds of use-cases performance is not an issue.
We use memory-hard algorithms for password storage because memory is more expensive than compute. More precisely, it's die area that is costly, though at least the authors of Argon2 seem to equate the two. (If that's not correct, I based a Stack Overflow post or two on that paper, so please let me know.) It sounds to me like another storage area as large as the L1 cache would be easily visible under a microscope, and it could hold a few thousand keys at most... how would you decide which ones to keep?
Of course, the CPU is theoretically omnipotent within your hardware. It can read RAM and see "ah, you're running pgp.exe, let me store this key", but then the same applies to any key your CPU handles (RSA too, or anything not using special CPU instructions).
You could use FPE for multi-megabyte permutations, but I don't know why you would.
EDIT: What I mean is you can do cipher = truncate(plain ^ AES(zero_extend(plain))).
You can exploit the fact that the core of AES consists of 32-bit invertible mixing functions. In order to extend AES to 128-bit, a byte permutation is used, which mixes the bytes of the 32-bit words.
The AES instructions are such that you can cancel the byte permutation. In this case, you can use the AES instructions to encrypt four 32-bit words separately, instead of one 128-bit block.
Similarly by canceling the standard byte permutation and replacing it with separate permutations on the 2 halves, you can make the AES instructions independently encrypt two 64-bit words.
These AES modifications remain faster than any software cipher.
How to cancel the internal permutation and replace it with external shuffle instructions was already described in the Intel white paper published in 2010, at the launch of Westmere, the first CPU with AES instructions.
Although they are omnipresent in constrained environments and lightweight protocols, small (32-bit, 64-bit) block ciphers have a bad reputation. They are perceived as antiquated, insecure, and something to stay away from for any new application, especially in software implementations.
To some extent, thinking that way is safe, and as a general building block, larger block ciphers are more versatile and offer better security and higher usage limits than small block ciphers. In fact, driven by modern applications and protocols, the trend in some contexts is to realize that 128 bits are not enough, and to shift to large block ciphers such as Rijndael-256 and Vistrutah (a NIST submission, part of NIST’s wide-block standardization effort), or to public permutations such as Keccak.
The main problem with small block ciphers is that, well… they are small. Generally, we want the set of possible inputs and outputs to be large enough that it’s practically impossible to enumerate. If that set is small, that doesn’t hold. We also want the output of a cipher to be indistinguishable from random for an adversary. But with a 32-bit block cipher, collisions become likely after around 2^16 blocks under a given key, which is ridiculously low, and the distinguishing advantage only grows from there. Using them securely is possible, but requires very careful design of application-specific protocols.
Nonetheless, small block ciphers can remain very useful, even in modern applications because in spite of their limitations, they remain block ciphers.
A block cipher is a fundamental building block in symmetric cryptography.
Imagine a list of N elements. Then, you put them all in a bag, shake the bag really well, and randomly grab all the elements one by one, to form a new list. There are still N elements, just in a completely different order. The first one may now be the 23948234th one, etc. We just applied a random-looking permutation.
If we know what the permutation is, we can easily invert the process, and map element 23948234 back to the first one.
A block cipher is a set of two functions:
- P(k, x) -> x', which takes a secret key k, generates a random-looking permutation from it, and returns the image of x under that permutation.
- P'(k, x') -> x, which applies the inverse permutation and recovers x from P(k, x).

Without the secret key k, the permutation cannot be recovered, so given x', guessing what x is requires brute-forcing the key space or, for small block sizes, exhausting the (limited) set of possible inputs, as discussed below.
A common way to use block ciphers is to encrypt a counter. With a 32-bit block cipher, you can safely encrypt up to 2^32 counter values without getting a collision, as long as the counter doesn’t wrap around and the same key isn’t reused for a different counter domain. This is still counting, just in a shuffled order.
And in real-world applications, counters are everywhere: packet/frame indices, account identifiers, message numbers, version numbers, and anything using a monotonically incrementing key in a database.
The value of some of these counters may reveal sensitive information. Creating accounts on a website and observing changes in account identifiers is a pretty reliable way to guess how many customers the company has, and if that number is increasing or decreasing. Quite sensitive data for a public company. Similarly, opening support tickets and observing the ticket number leaks information about the company’s business.
A common mitigation is to replace counters with UUID or random identifiers, even in environments providing strong consistency guarantees.
That works, with large enough identifiers.
With UUIDv4’s 122 random bits, generating ~2^48 identifiers gives a collision probability around 2^-27, which is small but not negligible at scale.
But with a 64-bit block cipher and a counter, up to 2^64 identifiers can be safely generated with a collision probability of exactly 0.
If your application does not need more than 2^32 identifiers, a 32-bit block cipher will do the same job, and keep storage requirements down to a minimum.
Keep using a regular counter, encrypt it with a block cipher before making it public, and decrypt that later if needed to recover the original counter value.
No trusted source of randomness is necessary, and no accurate clock is necessary. The encrypted values themselves won’t be ordered, but you can keep the original counter for internal indexing and only expose the encrypted form publicly.
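The no-collision property is easy to demonstrate. The sketch below uses a trivial invertible mapping (multiplication by an odd constant mod 2^32) purely to illustrate that any permutation maps distinct counters to distinct identifiers; a real deployment would of course use an actual 32-bit block cipher with a secret key:

```python
# Any bijection on 32-bit values maps distinct counters to distinct IDs.
# Multiplication by an odd constant mod 2^32 is invertible, but offers
# no secrecy at all: a real system would use a keyed 32-bit cipher.
A = 0x9E3779B1  # odd constant, so it has a modular inverse mod 2**32

def permute32(n):
    return (n * A) & 0xFFFFFFFF

ids = [permute32(n) for n in range(100_000)]
assert len(set(ids)) == 100_000  # a permutation can never collide
```

Random 32-bit identifiers, by contrast, hit the birthday bound: at 100,000 draws, the chance of at least one collision already exceeds 2 in 3.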
What about UUIDs?
UUIDv1 allows adversaries to recover the exact creation time (down to 100 ns resolution), the clock sequence behavior, and a unique identifier of the machine that generated it. The timestamp alone is enough to leak traffic patterns, record creation rates, and sometimes even enable correlation across systems. It’s also predictable: because UUIDv1 is basically “timestamp + sequence,” once an attacker sees a few UUIDs, they can often predict nearby values. That makes UUIDv1 completely unsuitable as a public identifier, and unsuitable for anything used in URLs or APIs where guessing matters.
UUIDv4 is 122 random bits (the remaining 6 are fixed version and variant markers).
UUIDv6 is “ordered UUIDv1,” mostly for legacy. RFC 9562 basically says: v6 exists to improve DB locality for environments already using v1, but if you’re not tied to v1, you should use v7 instead.
So what about UUIDv7? It’s explicitly time-ordered using a Unix-epoch millisecond timestamp, with the remaining bits used for randomness and/or a counter to keep monotonicity within the same millisecond.
This gives you much better insertion locality than UUIDv4 in many B-tree-style indexes. However, this requires properly synchronized clocks, and v7 leaks creation time. RFC 9562 calls out that timestamps create a (small) attack surface, and if UUIDs are used for security-sensitive purposes, it recommends UUIDv4 instead.
So, you can use UUIDv7 as the database primary key, but don’t treat it as secret.
Meanwhile, the output of a block cipher hides the underlying value, while being much smaller than a UUID, which definitely matters at scale in databases. Being deterministic, it still leaks equality (same input produces same output), distinctness, and frequency patterns, but the actual values remain hidden.
Small block ciphers work well with counters, but they work equally well with anything else that fits within the block size.
And timestamps are no exception: if they represent sensitive information, a small block cipher is a simple, efficient, usually format-preserving way to encrypt and decrypt them.
If you need UUIDs, why not combine them with a small block cipher, so that the timestamp is not leaked, while the UUIDs’ properties are retained?
UUIDv7 is essentially a 48-bit timestamp followed by randomness plus version/variant bits, and optionally a counter to ensure monotonicity within the same millisecond.
Concatenate a 48-bit timestamp with either a 16-bit counter or randomness, and 64 bits of extra randomness to form your UUIDs.
Then, encrypt the first 64 bits with a 64-bit block cipher before exposing them publicly.
The encrypted portion won’t be lexically ordered, but the original UUIDv7 can still serve as the internal primary key for indexing. Expose only the UUID with the encrypted prefix publicly, and decrypt when you need the original back.
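A sketch of this construction, using Speck64/128 as the 64-bit block cipher (the key words below are arbitrary placeholders; in practice the key must be secret and fixed for the lifetime of the identifiers):

```python
import os, time

# Speck64/128: 32-bit words, 27 rounds (illustrative sketch).
M32 = 0xFFFFFFFF

def ror(x, n): return ((x >> n) | (x << (32 - n))) & M32
def rol(x, n): return ((x << n) | (x >> (32 - n))) & M32

def round_keys(kw):  # four 32-bit key words
    ks, l = [kw[0]], list(kw[1:])
    for i in range(26):
        l.append(((ks[i] + ror(l[i], 8)) & M32) ^ i)
        ks.append(rol(ks[i], 3) ^ l[-1])
    return ks

def enc64(n, ks):
    x, y = n >> 32, n & M32
    for k in ks:
        x = ((ror(x, 8) + y) & M32) ^ k
        y = rol(y, 3) ^ x
    return (x << 32) | y

def dec64(n, ks):
    x, y = n >> 32, n & M32
    for k in reversed(ks):
        y = ror(y ^ x, 3)
        x = rol(((x ^ k) - y) & M32, 8)
    return (x << 32) | y

def make_uuid(ks):
    ts = int(time.time() * 1000) & ((1 << 48) - 1)   # 48-bit ms timestamp
    prefix = (ts << 16) | int.from_bytes(os.urandom(2), "big")
    suffix = int.from_bytes(os.urandom(8), "big")
    internal = (prefix << 64) | suffix               # sortable, kept private
    public = (enc64(prefix, ks) << 64) | suffix      # timestamp hidden
    return internal, public

ks = round_keys([0x03020100, 0x0B0A0908, 0x13121110, 0x1B1A1918])
internal, public = make_uuid(ks)
assert dec64(public >> 64, ks) == internal >> 64  # recoverable with the key
```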
So far, small block ciphers sound pretty useful. But they do come with real limitations that you need to understand before reaching for one.
The obvious downside of a block cipher is that, for a given key, after observing M distinct outputs (the encryptions of M distinct inputs), we know that the encryption of any input we haven’t seen before cannot be one of the outputs we already observed.
So, for a set of N possible outputs, after having observed M distinct outputs, when a new input is encrypted, the probability to guess its encryption is 1/(N-M) per attempt.
This is where small block ciphers fall short. If an adversary’s goal is to decrypt a ciphertext, and they have an oracle telling them whether a guess is right, then depending on the protocol, they may be able to do it.
Small block ciphers are thus generally a bad idea against active adversaries.
However, they can be very useful against passive adversaries whose capability is limited to observing identifiers, who are then unable to map them to the original value.
Usage of small block ciphers is also not incompatible with traditional mechanisms for tamper detection such as MACs.
Another downside is that, for any cipher, the secret key must remain secure. With random identifiers, only the random number generator needs to be secure.
There are plenty of small block ciphers designed for hardware, that don’t perform as well in software.
In software on general-purpose CPUs, and as scary as it may sound, the best options were designed by the NSA: the SIMON and SPECK Families of Lightweight Block Ciphers.
While their design is not radically different from well-known, well-trusted ciphers, there was originally a lot of pushback due to their origin.
And, as a side effect, they got a ton of 3rd-party analysis. Due to their simplicity, they are also the first targets of new tools and techniques for cryptanalysis. However, after more than 13 years of 3rd-party analysis, there are still zero practical attacks against them, and they still have a decent security margin. New cryptanalysis techniques have been developed, but they are generic techniques that aren’t vastly more effective on these block ciphers than other ciphers of the same size.
And if the NSA really knows of completely novel techniques to break these ciphers that no one in academia ever explored, which is very unlikely, they can apply the same techniques to other widely used ciphers sharing the same construction.
Unlike the Dual_EC_DRBG backdoor, there’s little room to hide a backdoor in plain sight in such ciphers. SIMON’s round constants are generated by a simple LFSR with a fixed primitive polynomial. The base value doesn’t come with a justification like “digits of pi”, but in that context, anything not showing signs of symmetry is fine. SPECK avoids constants almost entirely.
TLDR: after more than 13 years, extensive cryptanalysis exists, no practical backdoors have been found, and attacks generally align with what the designers claimed. The only expectations they don’t meet are cultural. After all that time, it’s time to move on and consider them for their properties and applications.
SIMON and SPECK are small, simple, and fast. They can be implemented in ~20 lines of code in any programming language, and will have good performance on a wide range of CPUs.
They are versatile, supporting block sizes of 32, 48, 64, 96 and 128 bits, with key sizes from 64 to 256 bits. For a 128-bit block cipher, you’d better use AES. But for something smaller, they are a solid choice.
And if in spite of existing analysis, you still feel nervous about structural attacks and some edge-case distinguishers, there’s a simple, cheap addition you can make: key whitening (the FX/Even-Mansour construction). XOR the key before the first round and after the last round. This has been formally shown to improve security bounds under idealized models. Done.
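The whitening wrapper itself is trivial. Sketched here around a stand-in permutation (`toy_enc`/`toy_dec` are an arbitrary invertible mixer, not a real cipher; in practice E would be SPECK or SIMON):

```python
M = (1 << 32) - 1

def toy_enc(k, x):
    # Stand-in invertible 32-bit mixer, NOT a real cipher.
    for r in range(4):
        x = (x + (k ^ r)) & M
        x = ((x << 7) | (x >> 25)) & M
        x ^= x >> 16
    return x

def toy_dec(k, x):
    for r in reversed(range(4)):
        x ^= x >> 16                    # xorshift by 16 is an involution
        x = ((x >> 7) | (x << 25)) & M
        x = (x - (k ^ r)) & M
    return x

# FX-style key whitening around any block cipher E:
#   FX(k, k1, k2, x) = E(k, x ^ k1) ^ k2
def fx_encrypt(enc, k, k1, k2, x):
    return enc(k, x ^ k1) ^ k2

def fx_decrypt(dec, k, k1, k2, x):
    return dec(k, x ^ k2) ^ k1
```

The pre- and post-whitening keys k1 and k2 cost two XORs per block while raising the effective key-recovery cost in the idealized analyses.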
Want to give them a spin? Try, for example, these easy-to-use implementations of SPECK, SIMON and SIMECK in Zig and Rust.
Generally, no. They are not generic ciphers that are safe to use in a wide range of scenarios. If they are large enough and the random number generator can be trusted, random identifiers are also a more secure option. Maybe dates, frame numbers, account identifiers and other numbers publicly leaked are not considered sensitive data, so you don’t have to worry about them at all. It all depends on your applications and protocols.
However, small block ciphers are something to keep in your toolbox. You don’t have to systematically reach for AES-GCM. But even if you do, maybe the public nonce leaks information? In that case, there’s a simple way to hide it: use a small block cipher.