The new OpenZL SDDL2 (Simple Data Description Language) supports several different floating-point types. It would be worthwhile to contribute some of the FC project's experience to OpenZL. Now the OpenZL supported types:
| Type | Size |Endian|
|----------------|---------|-----|
| `Int8` | 1 byte | N/A |
| `UInt8` | 1 byte | N/A |
| `Int16LE/BE` | 2 bytes | Yes |
| `UInt16LE/BE` | 2 bytes | Yes |
| `Int32LE/BE` | 4 bytes | Yes |
| `UInt32LE/BE` | 4 bytes | Yes |
| `Int64LE/BE` | 8 bytes | Yes |
| `UInt64LE/BE` | 8 bytes | Yes |
| `Float16LE/BE` | 2 bytes | Yes |
| `Float32LE/BE` | 4 bytes | Yes |
| `Float64LE/BE` | 8 bytes | Yes |
| `BFloat16LE/BE`| 2 bytes | Yes |
| `Bytes(n)` | n bytes | N/A |
Some links:- https://github.com/facebook/openzl/releases/tag/v0.2.0
- https://openzl.org/getting-started/introduction/
This is, for lack of a better term, a "metacompressor", but it will be interesting to see which of the choices end up dominating; in my past experiences with metacompression, one algorithm is usually consistently ahead.
It is not trying to replace zstd or lz4. The idea is narrower: take blocks of doubles, try a set of float-specific predictors/transforms/coders, and emit whichever representation is smallest for that block.
It is aimed at time-series, scientific, simulation, and analytics data where the numbers often have structure: smooth curves, repeated values, fixed increments, periodic signals, predictable deltas, or low-entropy mantissas.
The API is intentionally small: "fc_enc", "fc_dec", a config struct, and a few counters to inspect which modes won. Decode is parallel and meant to be fast; encode spends more CPU searching for a better representation.
Current caveats: x86-64 only for now, tuned for IEEE-754 doubles, research-grade rather than production-hardened.
Please run it through your preferred AI once or twice with instruction to look for bugs. The version of Fc in the main branch has at least a few memory safety bugs that attacker-controlled inputs could exploit.
I'd link a chat history but the tool I used has that feature blocked for some weird reason, and the locals round these parts don't take kindly to copy-pasted AI content...
Floating point data is a mess to compress, but I think the idea here is to apply different transforms (and perhaps back-end codecs) on data and see if one fits the data so perfectly that you magically get a lot of compression.
Say you have an audio with a sawtooth, it's linear an gradient but if the peaks is "random" values like 1.245 and PI then the mantissa bits of the interpolation range will look fairly "random" to a classic compressor, whilst this compressor can test to see if there are linear gradient spans (or near linear gradient) where it stores the gradient and dumps out the "difference" bits for a regular compressor.
Or 3d coordinates for 3d models (non-stripified), plenty of repeating 8-byte doubles that will be garbage and not help a classic compressor much, building a float aware dictionary and using that would easily bring down the data by quite a few %.
(I don't agree with GP, one method might win out for certain workloads, but the idea here seems to be a pluggable utility that can help a wide range of developers with something "for free").
Copyright (c) 2026 Praveen Vaddadi thynktank@gmail.com Licensed under the Apache License, Version 2.0. See LICENSE and NOTICE.
fc is a lossless compressor for streams of IEEE-754 64-bit doubles. It
splits the input into adaptively-sized blocks (quanta), runs a competition
between many specialized codecs on each block, and emits the smallest
result. Compression and decompression are multi-threaded with POSIX
threads and the hot paths are hand-vectorized for x86-64 with AVX2 +
SSE4.2 + BMI + LZCNT.
The current version string is fc 1.56 (see fc_ver in fc.c).
Aggregate result of the bundled test_fc harness across 17 synthetic
datasets, 1 Mi doubles each (8 MiB per dataset, β136 MiB / 142.6 MB
decimal total), median-of-5 trials per dataset, 8 encoder threads,
decoder threads auto-selected, on the maintainer's x86-64 + AVX2 box.
The "Ratio" column is total_original / total_compressed; the throughput
columns are the harmonic mean of per-dataset MB/s. Numbers come straight
out of test_fc.csv and will move on different hardware β re-run
make test to regenerate.
| Codec | Ratio | Encode (MB/s) | Decode (MB/s) |
|---|---|---|---|
| fc | 3.07 | 120 | 1277 |
| zstd -3 | 2.07 | 528 | 1556 |
| zstd -9 | 2.09 | 111 | 1572 |
| fpzip | 1.71 | 123 | 121 |
| lz4hc -9 | 1.69 | 51 | 5835 |
| gorilla | 1.66 | 683 | 971 |
| lz4 | 1.62 | 2176 | 5353 |
Headline: fc is the best size-oriented floating-point
compressor in this benchmark, not the best all-around compressor. It
wins ratio on 10 of 17 datasets outright and is never worse than
roughly third on any of them.
Where fc wins (often by a lot):
fc_dec):
aggregate β 1.28 GB/s, ~10Γ faster than encode, ~10Γ faster than
fpzip, ~1.3Γ faster than gorilla, and ~80% of zstd-3 decode. Good fit
for write-once / read-many time-series stores.Where it loses:
decimal-cents (zstd-9 β 3,465Γ vs fc 268Γ), dict-16 (zstd-9 β
10,268Γ vs fc 4,660Γ), and quantized-4lvl (zstd-9 β 9,675Γ vs fc
1,037Γ). General-purpose LZ catches structure fc doesn't currently
model.random-walk (1.39 vs 1.38), climate (1.395 vs 1.380),
geo-coords (2.012 vs 2.000), and sensor-noisy (1.397 vs 1.386).
Differences are small but consistent.fc just doesn't waste headers.Hard constraints: lossless only (no quantization). The encoder takes 8-byte-aligned input in multiples of 8 bytes; the heuristics and most modes are tuned for IEEE-754 double-precision bit patterns, so other 8-byte payloads will compress but you may leave ratio on the table. Requires x86-64 with AVX2 + SSE4.2 + BMI + LZCNT.
Rule of thumb:
fc.fc vs
fpzip on your data; they're within ~1% of each other.This is a research-grade single-file library. The on-disk format is versioned by a magic number in the stream header; backwards-incompatible format changes will bump the magic. Treat compressed output as opaque β do not rely on internal layout.
| File | Purpose |
|---|---|
fc.h |
Public API (encode, decode, monitoring counters). |
fc.c |
Compressor implementation. ~50 codecs + dispatch + threads. |
gorilla.h |
Bundled Redis "Gorilla" XOR/delta codec β third-party. |
gorilla.c |
Bundled Redis "Gorilla" XOR/delta codec β third-party. |
test_fc.c |
Round-trip + benchmark harness over 17 synthetic datasets. |
Makefile |
Builds fc.o, gorilla.o, and the test_fc benchmark. |
LICENSE |
Apache License 2.0 (this project's own code). |
NOTICE |
Attribution and third-party license disclosures. |
CONTRIBUTING.md |
Contribution guide and DCO. |
The whole API is in fc.h:
typedef struct { int p, t, c; } fc_cfg;
size_t fc_enc(const void *src, size_t bytes, void *dst, fc_cfg cfg);
size_t fc_dec(const void *src, size_t bytes, void *dst);
extern const char fc_ver[];
extern unsigned long fc_dec_mode_hist[64];
extern unsigned long fc_enc_mode_time_ns[64];
extern unsigned long fc_enc_mode_calls[64];
extern unsigned long fc_enc_mode_wins[64];
void fc_dec_mode_hist_reset(void);
const char *fc_mode_name(int mode);
fc_cfg fields| Field | Meaning |
|---|---|
p |
log2 of the predictor table size. Clamped to [10, 16] inside fc_enc. The test_fc benchmark passes 18, which is silently clamped to 16. |
t |
Worker thread count for the encoder. The decoder takes no fc_cfg and auto-selects threads via sysconf(_SC_NPROCESSORS_ONLN), capped at 128 and at the per-stream block count. |
c |
Reserved. Currently unused ((void)cfg.c; in fc_enc). |
fc_enc(const void *src, size_t bytes, void *dst, fc_cfg cfg)Compresses bytes of input (a multiple of 8 β one double per 8 bytes).
Returns the number of compressed bytes written, or 0 on failure: NULL
src / dst, bytes not a multiple of 8, or an internal allocation
failure.
The destination buffer must be sized by the caller. The library does not
publish a closed-form comp_bound. The bundled benchmark allocates
2 * bytes + 64 KiB (test_fc.c), which has been sufficient for every
dataset in the test suite; for arbitrary inputs, allocate at least that
much and check the return value (the encoder will return 0 rather than
overflow).
fc_dec(const void *src, size_t bytes, void *dst)Decompresses a stream produced by fc_enc. Returns the number of
decompressed bytes written, or 0 on failure. Failure conditions:
NULL src / dst, truncated header, wrong magic, header
predsizelg2 outside [10, 16], a per-block header that claims more
input or output than remains, total decoded size that doesn't match the
header's original_bytes, or an internal allocation failure. The caller
is responsible for sizing dst from the original byte count recorded in
the stream header (use fc_dec once with a header read, or simply
allocate the original buffer length you compressed).
Note: unknown / unimplemented mode IDs are not currently treated as a hard failure β the affected block is silently left as zeros. Treat streams from untrusted sources accordingly.
The fc_*_mode_* arrays are 64-entry tables indexed by mode ID. They are
updated atomically and are intended for diagnostics only:
fc_dec_mode_hist[m] β number of blocks decoded with mode m.fc_enc_mode_calls[m] β number of times mode m was evaluated during
the encoder competition.fc_enc_mode_wins[m] β number of times mode m won and was written.fc_enc_mode_time_ns[m] β cumulative wall-clock nanoseconds spent
evaluating mode m.fc_mode_name(m) returns a short string for mode m (or "?" for
unused IDs); see the table below.
fc currently defines 50 mode IDs (0β49 used; 12, 14, and 50β63
reserved). Mode names as exposed by fc_mode_name:
0 PRED 25 PRED2
1 CONST 26 PRED_ADAPTIVE
2 STRIDE 27 VITERBI
3 XORZ 28 DELTA_BINNED
4 LZ 29 PRED_RC
5 RAW 30 PRED_INTERLEAVED
6 FLOAT32 31 BWT
7 ORDERED_DELTA 32 LZ_DICT
8 FUZZY_STRIDE 33 CONV_N
9 ALP 34 SIGN_CONV
10 TRAILING_ZERO_BP 35 CONV_DOUBLE
11 BYTE_TRANSPOSE 36 MTF_LZ
13 XOR128 37 CONV_DOUBLE_BP
15 LSB_STRIP 38 CONV_N_BINNED
16 LOOKBACK_DELTA 39 PRED_SIMD_INTERLEAVED
17 FLOAT_MULT 40 FUZZY_STRIDE_ANS
18 FCM_RLE 41 PAQ_MIXER
19 DICT 42 PAQ4_MIXER
20 DELTA2 43 BWT_MTF_TANS
21 BITPLANE 44 PRED4
22 INT_MULT 45 DELTA_DP_BINNED
23 CONV1 46 CONV_N_DP_BINNED
24 PRED_TANS 47 ELF
48 LZ_SPLIT
49 BWT_MTF_RC
Roughly grouped:
PRED, PRED2, PRED4, PRED_TANS, PRED_RC,
PRED_ADAPTIVE, PRED_INTERLEAVED, PRED_SIMD_INTERLEAVED, VITERBI,
LSB_STRIP β FCM/DFCM-style hash predictors with various residual
coders (raw, tANS, range coding) and SIMD layouts.XORZ, XOR128, LOOKBACK_DELTA, ORDERED_DELTA,
DELTA2, DELTA_BINNED, DELTA_DP_BINNED.CONST, STRIDE, FUZZY_STRIDE,
FUZZY_STRIDE_ANS, DICT, LZ_DICT, MTF_LZ.LZ, LZ_SPLIT.FLOAT32, FLOAT_MULT, INT_MULT,
ALP (Adaptive Lossless Floating-point), ELF (Erasing Last bits).BYTE_TRANSPOSE, BITPLANE, TRAILING_ZERO_BP,
SIGN_CONV, BWT, BWT_MTF_TANS, BWT_MTF_RC.CONV1, CONV_N, CONV_DOUBLE,
CONV_DOUBLE_BP, CONV_N_BINNED, CONV_N_DP_BINNED.PAQ_MIXER, PAQ4_MIXER, FCM_RLE.RAW.fc.c is the source of truth for what each mode actually does.
log2.C_QUANTUM_BYTES) but a probe
(ceq_probe_chunk_values) can grow a block up to 1 MiB
(C_QUANTUM_MAX_BYTES) when the data looks low-entropy.exp_range, sign_flips, distinct_count, looks_like_repeats,
and the running best decide which modes are worth trying) and keeps
the smallest output. Per-mode wall time, call count, and win count
are recorded in the fc_enc_mode_* counters.[1-byte mode][payload], with
length and offset bookkeeping in the stream so fc_dec can find
block boundaries without scanning the payload.fc_dec walks the block index and dispatches each
block's payload to the matching codec. Like encode, decode is
multi-threaded; fc_dec itself picks the worker count
automatically (online CPUs, capped at 128 and at the block count).make # builds fc.o, gorilla.o, and the test_fc benchmark
make test # runs ./test_fc
make clean
clang or gcc). clang is the Makefile default.pthread and libm (linked by default).The default CFLAGS enable -mavx2 -msse4.2 -mbmi -mlzcnt. The library
uses these intrinsics directly; running the binary on a CPU without them
will fault with SIGILL. There is no portable fallback.
Auto-detected by the Makefile and only used by test_fc.c for
side-by-side comparisons β they are not needed to build or use the
library itself:
libzstd (via pkg-config libzstd)liblz4 (via pkg-config liblz4)fpzip (header probe at /usr/include/fpzip.h, links -lfpzip)test_fc runs each of 17 synthetic generators (constant, linear,
parabolic, AR(2)-damped, piecewise, integer-multiples, decimal-cents,
dict-16, sin-low-freq, audio-mix, random-walk, quantized-4lvl, climate,
geo-coords, stocks, sensor-noisy, pseudo-random) at 1 Mi doubles per
dataset (overridable via FCBENCH_N), 5 trials timed at median, 8
encoder threads (compile-time THREADS), and writes test_fc.csv with
compression ratio and encode/decode throughput against the optional
baselines (zstd, lz4, fpzip, gorilla) that were detected at build time.
Round-trip correctness is checked on every dataset. A MISMATCH line in
the output is a regression.
fc_dec thread count is auto-selected and not user-configurable
(the decode API takes no fc_cfg).cfg.c is reserved and currently a no-op ((void)cfg.c;).bytes to be a multiple of 8 (one full double).cfg.p is silently clamped to [10, 16].This project's own code (fc.c, fc.h, test_fc.c, Makefile,
documentation, and build infrastructure) is licensed under the
Apache License, Version 2.0. See LICENSE.
The bundled files gorilla.c and gorilla.h are not Apache 2.0.
They are Β© Redis Ltd. and remain under their original triple license:
RSALv2, SSPLv1, or AGPLv3 (your choice). See the file headers and
NOTICE for the full text and attribution. Downstream users who
cannot accept any of those three licenses should rebuild without
gorilla.o (the core fc library does not link against it; only the
test_fc benchmark does).
See CONTRIBUTING.md. Contributions to this project's
own files are accepted under Apache 2.0; please sign off your commits
(git commit -s).