> "fc is a lossless compressor for streams of IEEE-754 64-bit doubles."

The new OpenZL SDDL2 (Simple Data Description Language) supports several different floating-point types. It would be worthwhile to contribute some of the FC project's experience to OpenZL. Now the OpenZL supported types:

  | Type           | Size    |Endian|
  |----------------|---------|-----|
  | `Int8`         | 1 byte  | N/A |
  | `UInt8`        | 1 byte  | N/A |
  | `Int16LE/BE`   | 2 bytes | Yes |
  | `UInt16LE/BE`  | 2 bytes | Yes |
  | `Int32LE/BE`   | 4 bytes | Yes |
  | `UInt32LE/BE`  | 4 bytes | Yes |
  | `Int64LE/BE`   | 8 bytes | Yes |
  | `UInt64LE/BE`  | 8 bytes | Yes |
  | `Float16LE/BE` | 2 bytes | Yes |
  | `Float32LE/BE` | 4 bytes | Yes |
  | `Float64LE/BE` | 8 bytes | Yes |
  | `BFloat16LE/BE`| 2 bytes | Yes |
  | `Bytes(n)`     | n bytes | N/A |

Some links:

- https://github.com/facebook/openzl/releases/tag/v0.2.0

- https://openzl.org/getting-started/introduction/

- https://openzl.org/sddl/sddl2-announcement/

- https://openzl.org/sddl/core-concepts/

It splits the input into adaptively-sized blocks (quanta), runs a competition between many specialized codecs on each block, and emits the smallest result.

This is, for lack of a better term, a "metacompressor", but it will be interesting to see which of the choices end up dominating; in my past experiences with metacompression, one algorithm is usually consistently ahead.

The question is, how close can OpenLZ come? (This is from the same people who develop zstd, but suitable for structured data in a generic way.)

I see you have ALP, but have you tried Chimp128 or Arrow's byte stream split?

Another library in this space is pcodec; I'd appreciate a comparison of the two.

I built "fc", a C library for compressing streams of 64-bit floating-point values without quantization.

It is not trying to replace zstd or lz4. The idea is narrower: take blocks of doubles, try a set of float-specific predictors/transforms/coders, and emit whichever representation is smallest for that block.

It is aimed at time-series, scientific, simulation, and analytics data where the numbers often have structure: smooth curves, repeated values, fixed increments, periodic signals, predictable deltas, or low-entropy mantissas.

The API is intentionally small: "fc_enc", "fc_dec", a config struct, and a few counters to inspect which modes won. Decode is parallel and meant to be fast; encode spends more CPU searching for a better representation.

Current caveats: x86-64 only for now, tuned for IEEE-754 doubles, research-grade rather than production-hardened.

Repo: https://github.com/xtellect/fc

> "fc is a lossless compressor for streams of IEEE-754 64-bit doubles."

  | Type           | Size    |Endian|
  |----------------|---------|-----|
  | `Int8`         | 1 byte  | N/A |
  | `UInt8`        | 1 byte  | N/A |
  | `Int16LE/BE`   | 2 bytes | Yes |
  | `UInt16LE/BE`  | 2 bytes | Yes |
  | `Int32LE/BE`   | 4 bytes | Yes |
  | `UInt32LE/BE`  | 4 bytes | Yes |
  | `Int64LE/BE`   | 8 bytes | Yes |
  | `UInt64LE/BE`  | 8 bytes | Yes |
  | `Float16LE/BE` | 2 bytes | Yes |
  | `Float32LE/BE` | 4 bytes | Yes |
  | `Float64LE/BE` | 8 bytes | Yes |
  | `BFloat16LE/BE`| 2 bytes | Yes |
  | `Bytes(n)`     | n bytes | N/A |

Some links:

- https://github.com/facebook/openzl/releases/tag/v0.2.0

- https://openzl.org/getting-started/introduction/

- https://openzl.org/sddl/sddl2-announcement/

- https://openzl.org/sddl/core-concepts/

Thanks, this looks super relevant. I think the transferable part is the per-block selectrover predictors, strides, deltas, exponent/mantissa-ish structure, byte transpose, fallback raw/LZ, etc.sddl2 looks like a natural place to try some of that.

It splits the input into adaptively-sized blocks (quanta), runs a competition between many specialized codecs on each block, and emits the smallest result.

I’ve never heard of a metacompressor before, what others exist?

Another library in this space is pcodec; I'd appreciate a comparison of the two.

Agreed; pcodec is probably one of the most relevant comparisons. I will add pcodec to teh benchmark

The question is, how close can OpenLZ come? (This is from the same people who develop zstd, but suitable for structured data in a generic way.)

I built "fc", a C library for compressing streams of 64-bit floating-point values without quantization.

Current caveats: x86-64 only for now, tuned for IEEE-754 doubles, research-grade rather than production-hardened.

Repo: https://github.com/xtellect/fc

I see you have ALP, but have you tried Chimp128 or Arrow's byte stream split?

I need to add it to the benchmark. My expectation is that OpenZL should be strong when the enclosing format is known and SDDL can separate typed fields cleanly. Running both on the same f64 arrays will give some information

> rather than production-hardened.

Please run it through your preferred AI once or twice with instruction to look for bugs. The version of Fc in the main branch has at least a few memory safety bugs that attacker-controlled inputs could exploit.

I'd link a chat history but the tool I used has that feature blocked for some weird reason, and the locals round these parts don't take kindly to copy-pasted AI content...

Does it assume the floats come from photos or sound or something?

What do you mean by decode is parallel?

I’ve never heard of a metacompressor before, what others exist?

I’m regarding that term loosely here- in this case it is 'try several representations/codecs for a block and store the winner.' Similar ideas show up in columnar formats choosing encodings per column/page, OpenZL selectors (asother commenters pointed here), and shuffle/transpose + backend-compressor pipelines. fc’s version is much narrower: a tournament among f64-specific modes per block.

I think the idea is that the compressor is "meta" in the sense that it directs compressors as GP mentions by selecting what's actually producing the best results, so it's not just one comrpessor but a series of supported ones plugged in to be used adaptively (controlled at a "meta" level).

Floating point data is a mess to compress, but I think the idea here is to apply different transforms (and perhaps back-end codecs) on data and see if one fits the data so perfectly that you magically get a lot of compression.

Say you have an audio with a sawtooth, it's linear an gradient but if the peaks is "random" values like 1.245 and PI then the mantissa bits of the interpolation range will look fairly "random" to a classic compressor, whilst this compressor can test to see if there are linear gradient spans (or near linear gradient) where it stores the gradient and dumps out the "difference" bits for a regular compressor.

Or 3d coordinates for 3d models (non-stripified), plenty of repeating 8-byte doubles that will be garbage and not help a classic compressor much, building a float aware dictionary and using that would easily bring down the data by quite a few %.

(I don't agree with GP, one method might win out for certain workloads, but the idea here seems to be a pluggable utility that can help a wide range of developers with something "for free").

Agreed; pcodec is probably one of the most relevant comparisons. I will add pcodec to teh benchmark

> rather than production-hardened.

I'd link a chat history but the tool I used has that feature blocked for some weird reason, and the locals round these parts don't take kindly to copy-pasted AI content...

Thank you. Fuzz safety is definitely on my list. Current focus is to broaden the benchmarks , predictors and preprocessors and see what sticks

What do you mean by decode is parallel?

It splits the input into blocks which are encoded separately, so the decoder can fire up multiple threads to decode multiple blocks in parallel.

https://github.com/xtellect/fc#how-it-works

Does it assume the floats come from photos or sound or something?

It is intended t obe mainly source agnostic (will try to add custom source predictors too). The idea is to treat input as an ordered stream of doubles and look for numeric structure like repeats, smooth deltas, fixed increments, or low-entropy bits. Target presentlyis scientific/time-series/simulation/analytics data, not photos or sound.

(I don't agree with GP, one method might win out for certain workloads, but the idea here seems to be a pluggable utility that can help a wide range of developers with something "for free").

fc — Floating-Point Compressor

fc is a lossless compressor for streams of IEEE-754 64-bit doubles. It splits the input into adaptively-sized blocks (quanta), runs a competition between many specialized codecs on each block, and emits the smallest result. Compression and decompression are multi-threaded with POSIX threads and the hot paths are hand-vectorized for x86-64 with AVX2 + SSE4.2 + BMI + LZCNT.

The current version string is fc 1.56 (see fc_ver in fc.c).

Representative numbers

Aggregate result of the bundled test_fc harness across 17 synthetic datasets, 1 Mi doubles each (8 MiB per dataset, ≈136 MiB / 142.6 MB decimal total), median-of-5 trials per dataset, 8 encoder threads, decoder threads auto-selected, on the maintainer's x86-64 + AVX2 box. The "Ratio" column is total_original / total_compressed; the throughput columns are the harmonic mean of per-dataset MB/s. Numbers come straight out of test_fc.csv and will move on different hardware — re-run make test to regenerate.

Codec	Ratio	Encode (MB/s)	Decode (MB/s)
fc	3.07	120	1277
zstd -3	2.07	528	1556
zstd -9	2.09	111	1572
fpzip	1.71	123	121
lz4hc -9	1.69	51	5835
gorilla	1.66	683	971
lz4	1.62	2176	5353

Headline: fc is the best size-oriented floating-point compressor in this benchmark, not the best all-around compressor. It wins ratio on 10 of 17 datasets outright and is never worse than roughly third on any of them.

Where fc wins (often by a lot):

Structured / analytic floats — constant ≈ 39,756× (vs. zstd-9 ≈ 11,619×), parabolic ≈ 2,973× (vs. 2.5×), int×1000 ≈ 5,388× (vs. 4.3×), piecewise ≈ 26.8× (vs. 14.9×), stocks ≈ 15.6× (vs. 7.1×).
Periodic / quasi-periodic signals — sin-low-freq, audio-mix, ar2-damped beat gorilla and fpzip by 1.3×–2.7×.
Decode is fast and parallel (auto-threaded inside fc_dec): aggregate ≈ 1.28 GB/s, ~10× faster than encode, ~10× faster than fpzip, ~1.3× faster than gorilla, and ~80% of zstd-3 decode. Good fit for write-once / read-many time-series stores.

Where it loses:

Byte-pattern-friendly quantized data — zstd wins on decimal-cents (zstd-9 ≈ 3,465× vs fc 268×), dict-16 (zstd-9 ≈ 10,268× vs fc 4,660×), and quantized-4lvl (zstd-9 ≈ 9,675× vs fc 1,037×). General-purpose LZ catches structure fc doesn't currently model.
Noisy natural-float arrays — fpzip is marginally better on random-walk (1.39 vs 1.38), climate (1.395 vs 1.380), geo-coords (2.012 vs 2.000), and sensor-noisy (1.397 vs 1.386). Differences are small but consistent.
Encode throughput — ~~120 MB/s aggregate. The mode competition is what buys the ratio; you pay for it in encoder CPU. If encode is the bottleneck, lz4 (~~2.2 GB/s) or zstd-3 (~530 MB/s) dominate.
Pseudo-random data — ratio ≈ 1.000. Same as every other lossless codec; fc just doesn't waste headers.

Hard constraints: lossless only (no quantization). The encoder takes 8-byte-aligned input in multiples of 8 bytes; the heuristics and most modes are tuned for IEEE-754 double-precision bit patterns, so other 8-byte payloads will compress but you may leave ratio on the table. Requires x86-64 with AVX2 + SSE4.2 + BMI + LZCNT.

Rule of thumb:

Want the smallest file for a floating-point stream and decode-side latency matters → fc.
Encode-bound on heterogeneous data → zstd or lz4.
Noisy scientific arrays where every byte counts → benchmark fc vs fpzip on your data; they're within ~1% of each other.

Status

This is a research-grade single-file library. The on-disk format is versioned by a magic number in the stream header; backwards-incompatible format changes will bump the magic. Treat compressed output as opaque — do not rely on internal layout.

Repository layout

File	Purpose
`fc.h`	Public API (encode, decode, monitoring counters).
`fc.c`	Compressor implementation. ~50 codecs + dispatch + threads.
`gorilla.h`	Bundled Redis "Gorilla" XOR/delta codec — third-party.
`gorilla.c`	Bundled Redis "Gorilla" XOR/delta codec — third-party.
`test_fc.c`	Round-trip + benchmark harness over 17 synthetic datasets.
`Makefile`	Builds `fc.o`, `gorilla.o`, and the `test_fc` benchmark.
`LICENSE`	Apache License 2.0 (this project's own code).
`NOTICE`	Attribution and third-party license disclosures.
`CONTRIBUTING.md`	Contribution guide and DCO.

Public API

The whole API is in fc.h:

typedef struct { int p, t, c; } fc_cfg;

size_t fc_enc(const void *src, size_t bytes, void *dst, fc_cfg cfg);
size_t fc_dec(const void *src, size_t bytes, void *dst);

extern const char  fc_ver[];
extern unsigned long fc_dec_mode_hist[64];
extern unsigned long fc_enc_mode_time_ns[64];
extern unsigned long fc_enc_mode_calls[64];
extern unsigned long fc_enc_mode_wins[64];

void        fc_dec_mode_hist_reset(void);
const char *fc_mode_name(int mode);

`fc_cfg` fields

Field	Meaning
`p`	log2 of the predictor table size. Clamped to [10, 16] inside `fc_enc`. The `test_fc` benchmark passes 18, which is silently clamped to 16.
`t`	Worker thread count for the encoder. The decoder takes no `fc_cfg` and auto-selects threads via `sysconf(_SC_NPROCESSORS_ONLN)`, capped at 128 and at the per-stream block count.
`c`	Reserved. Currently unused (`(void)cfg.c;` in `fc_enc`).

`fc_enc(const void src, size_t bytes, void dst, fc_cfg cfg)`

Compresses bytes of input (a multiple of 8 — one double per 8 bytes). Returns the number of compressed bytes written, or 0 on failure: NULL src / dst, bytes not a multiple of 8, or an internal allocation failure.

The destination buffer must be sized by the caller. The library does not publish a closed-form comp_bound. The bundled benchmark allocates 2 * bytes + 64 KiB (test_fc.c), which has been sufficient for every dataset in the test suite; for arbitrary inputs, allocate at least that much and check the return value (the encoder will return 0 rather than overflow).

`fc_dec(const void src, size_t bytes, void dst)`

Decompresses a stream produced by fc_enc. Returns the number of decompressed bytes written, or 0 on failure. Failure conditions: NULL src / dst, truncated header, wrong magic, header predsizelg2 outside [10, 16], a per-block header that claims more input or output than remains, total decoded size that doesn't match the header's original_bytes, or an internal allocation failure. The caller is responsible for sizing dst from the original byte count recorded in the stream header (use fc_dec once with a header read, or simply allocate the original buffer length you compressed).

Note: unknown / unimplemented mode IDs are not currently treated as a hard failure — the affected block is silently left as zeros. Treat streams from untrusted sources accordingly.

Monitoring counters

The fc_*_mode_* arrays are 64-entry tables indexed by mode ID. They are updated atomically and are intended for diagnostics only:

fc_dec_mode_hist[m] — number of blocks decoded with mode m.
fc_enc_mode_calls[m] — number of times mode m was evaluated during the encoder competition.
fc_enc_mode_wins[m] — number of times mode m won and was written.
fc_enc_mode_time_ns[m] — cumulative wall-clock nanoseconds spent evaluating mode m.

fc_mode_name(m) returns a short string for mode m (or "?" for unused IDs); see the table below.

Modes

fc currently defines 50 mode IDs (0–49 used; 12, 14, and 50–63 reserved). Mode names as exposed by fc_mode_name:

 0 PRED                    25 PRED2
 1 CONST                   26 PRED_ADAPTIVE
 2 STRIDE                  27 VITERBI
 3 XORZ                    28 DELTA_BINNED
 4 LZ                      29 PRED_RC
 5 RAW                     30 PRED_INTERLEAVED
 6 FLOAT32                 31 BWT
 7 ORDERED_DELTA           32 LZ_DICT
 8 FUZZY_STRIDE            33 CONV_N
 9 ALP                     34 SIGN_CONV
10 TRAILING_ZERO_BP        35 CONV_DOUBLE
11 BYTE_TRANSPOSE          36 MTF_LZ
13 XOR128                  37 CONV_DOUBLE_BP
15 LSB_STRIP               38 CONV_N_BINNED
16 LOOKBACK_DELTA          39 PRED_SIMD_INTERLEAVED
17 FLOAT_MULT              40 FUZZY_STRIDE_ANS
18 FCM_RLE                 41 PAQ_MIXER
19 DICT                    42 PAQ4_MIXER
20 DELTA2                  43 BWT_MTF_TANS
21 BITPLANE                44 PRED4
22 INT_MULT                45 DELTA_DP_BINNED
23 CONV1                   46 CONV_N_DP_BINNED
24 PRED_TANS               47 ELF
                           48 LZ_SPLIT
                           49 BWT_MTF_RC

Roughly grouped:

Predictors: PRED, PRED2, PRED4, PRED_TANS, PRED_RC, PRED_ADAPTIVE, PRED_INTERLEAVED, PRED_SIMD_INTERLEAVED, VITERBI, LSB_STRIP — FCM/DFCM-style hash predictors with various residual coders (raw, tANS, range coding) and SIMD layouts.
XOR / delta: XORZ, XOR128, LOOKBACK_DELTA, ORDERED_DELTA, DELTA2, DELTA_BINNED, DELTA_DP_BINNED.
Constant / stride / dictionary: CONST, STRIDE, FUZZY_STRIDE, FUZZY_STRIDE_ANS, DICT, LZ_DICT, MTF_LZ.
Lempel-Ziv: LZ, LZ_SPLIT.
Floating-point specific: FLOAT32, FLOAT_MULT, INT_MULT, ALP (Adaptive Lossless Floating-point), ELF (Erasing Last bits).
Transforms: BYTE_TRANSPOSE, BITPLANE, TRAILING_ZERO_BP, SIGN_CONV, BWT, BWT_MTF_TANS, BWT_MTF_RC.
Convolutional / linear models: CONV1, CONV_N, CONV_DOUBLE, CONV_DOUBLE_BP, CONV_N_BINNED, CONV_N_DP_BINNED.
Mixers: PAQ_MIXER, PAQ4_MIXER, FCM_RLE.
Fallback: RAW.

fc.c is the source of truth for what each mode actually does.

How it works

Header. The stream begins with a fixed-size header containing a magic number, the original byte count, the quantum size used, and the clamped predictor log2.
Block planning. Input is scanned and split into blocks. The default quantum is 256 KiB (C_QUANTUM_BYTES) but a probe (ceq_probe_chunk_values) can grow a block up to 1 MiB (C_QUANTUM_MAX_BYTES) when the data looks low-entropy.
Mode competition. Workers pull blocks from a queue. For each block, the encoder evaluates a feature-gated subset of the modes (exp_range, sign_flips, distinct_count, looks_like_repeats, and the running best decide which modes are worth trying) and keeps the smallest output. Per-mode wall time, call count, and win count are recorded in the fc_enc_mode_* counters.
Emission. Each block is written as [1-byte mode][payload], with length and offset bookkeeping in the stream so fc_dec can find block boundaries without scanning the payload.
Decode. fc_dec walks the block index and dispatches each block's payload to the matching codec. Like encode, decode is multi-threaded; fc_dec itself picks the worker count automatically (online CPUs, capped at 128 and at the block count).

Building

make            # builds fc.o, gorilla.o, and the test_fc benchmark
make test       # runs ./test_fc
make clean

Toolchain

A C11 compiler (clang or gcc). clang is the Makefile default.
POSIX threads.
pthread and libm (linked by default).

Required CPU features

The default CFLAGS enable -mavx2 -msse4.2 -mbmi -mlzcnt. The library uses these intrinsics directly; running the binary on a CPU without them will fault with SIGILL. There is no portable fallback.

Optional benchmark dependencies

Auto-detected by the Makefile and only used by test_fc.c for side-by-side comparisons — they are not needed to build or use the library itself:

libzstd (via pkg-config libzstd)
liblz4 (via pkg-config liblz4)
fpzip (header probe at /usr/include/fpzip.h, links -lfpzip)

Benchmark harness

test_fc runs each of 17 synthetic generators (constant, linear, parabolic, AR(2)-damped, piecewise, integer-multiples, decimal-cents, dict-16, sin-low-freq, audio-mix, random-walk, quantized-4lvl, climate, geo-coords, stocks, sensor-noisy, pseudo-random) at 1 Mi doubles per dataset (overridable via FCBENCH_N), 5 trials timed at median, 8 encoder threads (compile-time THREADS), and writes test_fc.csv with compression ratio and encode/decode throughput against the optional baselines (zstd, lz4, fpzip, gorilla) that were detected at build time.

Round-trip correctness is checked on every dataset. A MISMATCH line in the output is a regression.

Limitations

x86-64 only; no ARM/NEON path. AVX2 + SSE4.2 + BMI + LZCNT required at run time.
fc_dec thread count is auto-selected and not user-configurable (the decode API takes no fc_cfg).
cfg.c is reserved and currently a no-op ((void)cfg.c;).
The encoder requires bytes to be a multiple of 8 (one full double).
The predictor parameter cfg.p is silently clamped to [10, 16].
Unknown mode IDs in a compressed stream are not flagged as errors; affected blocks decode to zeros.
The on-disk format is not stable across major versions.

License

This project's own code (fc.c, fc.h, test_fc.c, Makefile, documentation, and build infrastructure) is licensed under the Apache License, Version 2.0. See LICENSE.

The bundled files gorilla.c and gorilla.h are not Apache 2.0. They are © Redis Ltd. and remain under their original triple license: RSALv2, SSPLv1, or AGPLv3 (your choice). See the file headers and NOTICE for the full text and attribution. Downstream users who cannot accept any of those three licenses should rebuild without gorilla.o (the core fc library does not link against it; only the test_fc benchmark does).

Contributing

See CONTRIBUTING.md. Contributions to this project's own files are accepted under Apache 2.0; please sign off your commits (git commit -s).

Contact

Maintainer: Praveen Vaddadi — thynktank@gmail.com
Security reports: same address; please do not file public issues for vulnerabilities.

Thank you. Fuzz safety is definitely on my list. Current focus is to broaden the benchmarks , predictors and preprocessors and see what sticks

It splits the input into blocks which are encoded separately, so the decoder can fire up multiple threads to decode multiple blocks in parallel.

https://github.com/xtellect/fc#how-it-works

isn't sound a time series? I guess it's not usually 64-bit doubles.

Hacker Times

Hacker Times

Fc, a lossless compressor for floating-point streams

Discussion

Discussion

fc — Floating-Point Compressor

Representative numbers

Status

Repository layout

Public API

`fc_cfg` fields

`fc_enc(const void src, size_t bytes, void dst, fc_cfg cfg)`

`fc_dec(const void src, size_t bytes, void dst)`

Monitoring counters

Modes

How it works

Building

Toolchain

Required CPU features

Optional benchmark dependencies

Benchmark harness

Limitations

License

Contributing

Contact

Hacker Times

Hacker Times

Fc, a lossless compressor for floating-point streams

Discussion

Discussion

fc — Floating-Point Compressor

Representative numbers

Status

Repository layout

Public API

fc_cfg fields

fc_enc(const void *src, size_t bytes, void *dst, fc_cfg cfg)

fc_dec(const void *src, size_t bytes, void *dst)

Monitoring counters

Modes

How it works

Building

Toolchain

Required CPU features

Optional benchmark dependencies

Benchmark harness

Limitations

License

Contributing

Contact

`fc_cfg` fields

`fc_enc(const void src, size_t bytes, void dst, fc_cfg cfg)`

`fc_dec(const void src, size_t bytes, void dst)`