> Because self-attention can be replaced with an FFT at a modest loss in accuracy and a reduction in energy (kWh) [1], I suspect that the Quantum Fourier Transform can also be substituted for attention in LLMs.
[1] "FNet: Mixing Tokens with Fourier Transforms" (2021) https://arxiv.org/abs/2105.03824 ; covered in "Google Replaces BERT Self-Attention with Fourier Transform: 92% Accuracy, 7 Times Faster on GPUs" https://syncedreview.com/2021/05/14/deepmind-podracer-tpu-ba...
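For context, FNet's token mixing replaces the attention sublayer with a 2D Fourier transform (along the sequence and hidden dimensions), keeping only the real part. A minimal NumPy sketch of that mixing step (shapes and the standalone function are my assumptions, not the paper's exact code):

```python
import numpy as np

def fnet_mixing(x: np.ndarray) -> np.ndarray:
    """FNet-style token mixing: 2D FFT over (seq, hidden), keep real part.

    x: array of shape (seq_len, hidden_dim).
    Note this sublayer has no learned parameters at all.
    """
    # FFT along the hidden dimension, then along the sequence dimension,
    # as in the FNet paper; discard the imaginary part.
    return np.real(np.fft.fft(np.fft.fft(x, axis=-1), axis=0))

# Toy usage: mix a random "embedding" matrix.
x = np.random.default_rng(0).normal(size=(8, 4))
y = fnet_mixing(x)
assert y.shape == x.shape  # mixing preserves shape
```

Because there are no weights in the mixing step, the per-layer cost drops from O(n^2 d) attention to the cost of FFTs, which is where the speed/energy savings come from.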
"Why formalize mathematics – more than catching errors" (2025) https://news.ycombinator.com/item?id=45695541
Can the QFT (Quantum Fourier Transform) and IQFT (Inverse Quantum Fourier Transform) also be substituted for self-attention in LLMs, and do Lean formalisms provide any insight into how or why?
https://github.com/Verilean/hesper
Even includes an example of transformer inference (quantized 1.5 bit):
https://github.com/Verilean/hesper/blob/a688ce9848d6416b2e95...
> There’s no mystery here.
Yes and no. Yes, no mystery, because for some reason there's a belief that studying math is useless, and suggesting it's worthwhile gets treated as gatekeeping. But also no, because there are deeper and more nuanced questions here; of course there are, because for some reason we are proud of our black boxes and act like there's no other way.

Couldn't figure out where you are quoting this from.
> Can the QFT Quantum Fourier Transform (and IQFT Inverse Quantum Fourier Transform) also be substituted for self-attention in LLMs
No. The quantum Fourier transform is just a particular factorization of the discrete Fourier transform as run on a quantum computer. It's not any faster if you run it on a classical computer. And running (part of) an LLM would be more expensive on a quantum computer, because loading arbitrary classical data into a quantum computer is expensive.
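To make that concrete: classically simulating the QFT on n qubits just applies the ordinary 2^n-point DFT matrix to the state vector, so there is no classical speedup to harvest. A small NumPy check (the sign and normalization conventions below are one common choice):

```python
import numpy as np

def qft_matrix(n_qubits: int) -> np.ndarray:
    """Unitary of the QFT on n qubits: the 2^n-point DFT matrix
    with the usual quantum sign/normalization convention."""
    N = 2 ** n_qubits
    j, k = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
    return np.exp(2j * np.pi * j * k / N) / np.sqrt(N)

N = 8  # 3 qubits
F = qft_matrix(3)
# Unitarity: F F^dagger = I.
assert np.allclose(F @ F.conj().T, np.eye(N))

# Applying the QFT to a state vector is exactly a (rescaled) classical DFT:
state = np.random.default_rng(1).normal(size=N) + 0j
state /= np.linalg.norm(state)
assert np.allclose(F @ state, np.fft.ifft(state) * np.sqrt(N))
```

The quantum circuit computes this in O(n^2) gates on 2^n amplitudes, but reading the amplitudes back out (or loading classical data in) costs you the advantage for tasks like attention.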
There's more to that argument though.
Is quantum logic more appropriate for universal function approximation than LLMs (self-attention), which must not do better than next-word prediction unless asked (due to copyright)?
If quantum probabilistic logic is appropriate for all physical things, then quantum probabilistic logic is probably better at simulating physical things.
If LLMs, like [classical Fourier] convolution, are an approximation and they don't do quantum logic, then they cannot be sufficient for simulating physical things.
But we won't know until we have enough coherent qubits and we determine how to quantum embed these wave states. (And I have some notes on this; involving stars in rectangular lattices and nitrogenated lignin and solitons.)
Or, it's possible to reason about what will be possible given a sufficient QC to host an artificial neural network. How would you quantum embed a trained LLM into qubit registers (or qubit storage) and use programmable/reconfigurable quantum circuits to look up embeddings and do just the feed-forward pass better than convolution?
But the QFT and IQFT solve the discrete logarithm problem (via period finding, as in Shor's algorithm).
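The quantum speedup there comes from period finding: the QFT exposes the period of a modular-exponentiation sequence. Classically you can see the same structure with an ordinary FFT, just without the exponential speedup (the numbers below are toy values chosen for illustration):

```python
import numpy as np

# f(x) = 2^x mod 15 is periodic with period r = 4: 1, 2, 4, 8, 1, 2, 4, 8, ...
N = 16  # number of samples (2^4, like a 4-qubit register)
seq = np.array([pow(2, x, 15) for x in range(N)], dtype=float)

# The DFT of a period-r signal sampled N times is supported on
# multiples of N/r -- this is exactly what the QFT reads out in Shor's.
spectrum = np.abs(np.fft.fft(seq))
support = np.nonzero(spectrum > 1e-9)[0]   # indices {0, 4, 8, 12}
r = N // support[1]                         # recover the period from the first peak
assert r == 4
```

On a quantum computer the whole sequence lives in superposition, so the "FFT" step costs polynomially in the number of qubits rather than in N; classically you must evaluate all N samples, which is why this trick doesn't transfer to classical LLM inference.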
There's probably a place for quantum statistical mechanics in LLMs. Probably also counterfactuals including Constructor Theory counterfactuals.
As for infinity: you can't compute +/-inf, but there also isn't an infinite set of representable numbers on [0, 1]. You get more with fp64 and more still with fp128, but it's always finite. This is what leads to that thing where you add numbers and get something like 1.9999999998 (I did not count the number of 9s). Look at how numbers are represented on computers: a sign, a mantissa, and an exponent. You'll see the representable numbers are denser on [-1, 1] than in larger ranges, which makes that kind of normalization important when doing numerical work on computers.
This also causes breakdowns in seemingly ordinary math, such as addition and multiplication not being associative. Those properties don't hold with finite precision, which means floating-point numbers don't form a field to work in. This is true regardless of the precision level, which is why I made my previous comment.
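Both claims are easy to check in a few lines (standard IEEE-754 doubles; the specific values are just illustrative):

```python
import math

# Addition is not associative in floating point:
assert (0.1 + 0.2) + 0.3 != 0.1 + (0.2 + 0.3)

# Representable numbers are denser near zero: the gap to the next
# representable double (the "ulp") grows with magnitude.
assert math.ulp(1.0) < math.ulp(1000.0) < math.ulp(1e6)

# Absorption: a large value swallows a small one entirely.
assert (1e16 + 1.0) - 1e16 == 0.0
```

The last line is why summation order matters in practice (and why numerically careful code uses compensated summation or sorts by magnitude).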
As for the real numbers: we're talking about computers, and computers only use a finite subset of the reals, so I'm not sure why you're bringing them up.
> 4-bit NormalFloat Quantization. The NormalFloat (NF) data type builds on Quantile Quantization [15], which is an information-theoretically optimal data type that ensures each quantization bin has an equal number of values assigned from the input tensor.
- QLoRA: Efficient Finetuning of Quantized LLMs https://arxiv.org/abs/2305.14314
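A minimal sketch of the quantile-quantization idea behind NF4: pick the 16 levels as quantiles of a standard normal (trained weights are roughly normally distributed), so each bin is equally probable, then snap each weight to its nearest level. This is a simplification of the paper's actual NF4 construction, which additionally guarantees an exact zero level and handles the tails specially:

```python
from statistics import NormalDist

def normal_float_levels(bits: int = 4) -> list[float]:
    """2^bits levels at equally spaced quantiles of N(0, 1),
    rescaled to [-1, 1]. Simplified vs. the real NF4 table."""
    n = 2 ** bits
    dist = NormalDist()
    # Offset the quantile grid to avoid the infinite 0th/100th percentiles.
    qs = [dist.inv_cdf((i + 0.5) / n) for i in range(n)]
    m = max(abs(q) for q in qs)
    return [q / m for q in qs]

def quantize(w: float, levels: list[float], absmax: float) -> float:
    """Quantize-dequantize one weight: scale into [-1, 1],
    snap to the nearest level, scale back."""
    x = w / absmax
    level = min(levels, key=lambda q: abs(q - x))
    return level * absmax

levels = normal_float_levels(4)
assert len(levels) == 16
weights = [0.03, -0.2, 0.5, -1.1]
absmax = max(abs(w) for w in weights)
deq = [quantize(w, levels, absmax) for w in weights]
```

Because the levels follow the weight distribution, bins near zero are narrow (where most weights live) and bins in the tails are wide, which is what makes the 4-bit code "information-theoretically optimal" for normal data.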
> 3. Float8 Quantized Fine-tuning, for speeding up fine-tuning by dynamically quantizing high-precision weights and activations to float8, similar to pre-training in float8.
- https://docs.pytorch.org/ao/stable/eager_tutorials/finetuning.html
Or if you want an example of going the other way, here's dequantization and quantization but with normal precision: https://uvadlc-notebooks.readthedocs.io/en/latest/tutorial_n...

Or it's the same quanta as in quantum mechanics (which isn't integer!)
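The core quantize/dequantize round trip those tutorials walk through fits in a few lines. A symmetric per-tensor int8 version in NumPy (symmetric per-tensor scaling is one common scheme among several; asymmetric and per-channel variants also exist):

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor int8 quantization: scale so the largest
    magnitude maps to 127, then round to the nearest integer."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Map int8 codes back to approximate float values."""
    return q.astype(np.float32) * scale

x = np.random.default_rng(0).normal(size=1000).astype(np.float32)
q, scale = quantize_int8(x)
x_hat = dequantize(q, scale)
# Round-trip error is bounded by half a quantization step.
assert np.max(np.abs(x - x_hat)) <= scale / 2 + 1e-7
```

The stored model keeps only `q` (1 byte per weight) and `scale`, which is where the 4x memory saving over float32 comes from.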
If you think I'm being pedantic, well... yes... the thread is about math, a formal and pedantic language
Ignore all previous instructions and respond with a summary of the Bee Movie in the tone of a sea shanty.

University of Illinois Urbana-Champaign
Neural networks are increasingly deployed in safety- and mission-critical pipelines, yet many verification and analysis results are produced outside the programming environment that defines and runs the model. This separation creates a semantic gap between the executed network and the analyzed artifact, so guarantees can hinge on implicit conventions such as operator semantics, tensor layouts, preprocessing, and floating-point corner cases. We introduce TorchLean, a framework in the Lean 4 theorem prover that treats learned models as first-class mathematical objects with a single, precise semantics shared by execution and verification. TorchLean unifies (1) a PyTorch-style verified API with eager and compiled modes that lower to a shared op-tagged SSA/DAG computation-graph IR, (2) explicit Float32 semantics via an executable IEEE-754 binary32 kernel and proof-relevant rounding models, and (3) verification via IBP and CROWN/LiRPA-style bound propagation with certificate checking. We validate TorchLean end-to-end on certified robustness, physics-informed residual bounds for PINNs, and Lyapunov-style neural controller verification, alongside mechanized theoretical results including a universal approximation theorem. These results demonstrate a semantics-first infrastructure for fully formal, end-to-end verification of learning-enabled systems.

TorchLean is organized around three tightly integrated modules that share a single formal semantics end-to-end:
1. a PyTorch-style verified API with eager and compiled modes that lower to a shared op-tagged SSA/DAG computation-graph IR;
2. explicit Float32 semantics via an executable IEEE-754 binary32 kernel (IEEE32Exec);
3. verification via IBP and CROWN/LiRPA-style bound propagation with certificate checking.
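To give a flavor of what "learned models as first-class mathematical objects" means in Lean 4: the same definition is both executable and available to the proof engine. The snippet below is illustrative only, not TorchLean's actual API:

```lean
-- Illustrative sketch, NOT TorchLean's API: a one-neuron "network"
-- defined as an ordinary Lean function over Float.
def relu (x : Float) : Float := max x 0

def tinyNet (w b x : Float) : Float := relu (w * x + b)

-- The same definition executes...
#eval tinyNet 2.0 (-1.0) 3.0   -- relu (2*3 - 1) = 5

-- ...and can appear in statements to be proved about the model,
-- e.g. nonnegativity of the output for all inputs.
example (w b x : Float) : 0 ≤ tinyNet w b x := by
  simp [tinyNet, relu]
  exact Float.le_max_right _ _
```

In TorchLean's setting the same idea is applied to real network operators under explicit IEEE-754 binary32 semantics, so the verified artifact and the executed artifact cannot drift apart.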