CERN uses ultra-compact AI models on FPGAs for real-time LHC data filtering

One of the authors (of one of the two models, not this particular paper) here. Just a clarification, these models are *not* burned into silicon. They are trained with brutal QAT but are put onto fpgas. For axol1tl, the weights are burned in the sense that the weights are hard-wired in the fabric (i.e., shift-add instead of conventional read-muk-add cycle), but not on the raw silicon so the chip can be reprogrammed. Though, for projects like smartpixel or HG-Cal readout, there are similar ones targeting silicon (google something like "smartpixel cern", "HGCAL autoencoder" and you will find them), and I thought it was one of them when viewing the title.

Some slides with more info: https://indico.cern.ch/event/1496673/contributions/6637931/a... The approval process for a full paper is quite lengthy in the collaboration, but a more comprehensive one is coming in the following months, if everything went smoothly.

Regarding the exact algorithm: there are a few versions of the models deployed. Before v4 (when this article was written), they are slides 9-10. The model was trained as a plain VAE that is essentially a small MLP. In inference time, the decoder was stripped and the mu^2 term from the KL div was used as the loss (contributions from terms containing sigma was found to be having negliable impact on signal efficiency). In v5 we added a VICREG block before that and used the reconstruction loss instead. Everything runs in =2 clock cycles at 40MHz clock. Since v5, hls4ml-da4ml flow (https://arxiv.org/abs/2512.01463, https://arxiv.org/abs/2507.04535) was used for putting the model on FPGAs.

For CICADA, the models was trained as a VAE again, but this time distilled with supervised loss on the anomaly score on a calibration dataset. Some slides: https://indico.global/event/8004/contributions/72149/attachm... (not up-to-date, but don't know if there other newer open ones). Both student and teacher was a conventional conv-dense models, can be found in slides 14-15.

Just sell some of my works for running qat (high-granularity quantization) and doing deployment (distributed arithmetic) of NNs in the context of such applications (i.e., FPGA deployment for <1us latency), if you are interested: https://arxiv.org/abs/2405.00645 https://arxiv.org/abs/2507.04535

Happy to take any questions.

They used a custom neural net with autoencoders, which contain convolutional layers. They trained it on previous experiment data.

https://arxiv.org/html/2411.19506v1

Why is it so hard to elaborate what AI algorithm / technique they integrate? Would have made this article much better

I've got news for you, everybody with a modern cpu uses this, which use a perceptron for branch prediction.

Might be related: https://www.youtube.com/watch?v=T8HT_XBGQUI (Big Data and AI at the CERN LHC by Dr. Thea Klaeboe Aarrestad)

https://www.youtube.com/watch?v=8IZwhbsjhvE (From Zettabytes to a Few Precious Events: Nanosecond AI at the Large Hadron Collider by Thea Aarrestad)

Page: https://www.scylladb.com/tech-talk/from-zettabytes-to-a-few-...

How are FPGAs "bruned into silicon"? Would be news to me that there are ASICs being taped out at CERN

A bit of hype in the AI wording here. This could be called a chip with hardcoded logic obtained with machine learning

Very important! This is not a LLM like the ones so often called AI these days. Its a neural network in a FPGA.

Not on the same extreme level, but I know that some coffee machines use a tiny CNN based model locally/embedded. There is a small super cheap camera integrated in the coffee machine, and the model does three things: (1) classifies the container type in order to select type of coffee, (2) image segmentation - to determine where the cup/hole is placed, (3) regression - to determine the volume and regulate how much coffee to pour.

First internship, cern, summer 1989 on the opal lepc pit, wrote offline data filtering program in FORTRAN. Blast from the past.

Thanks for the thoughtful comments and links really appreciated the high-signal feedback. We've updated the article to better reflect the actual VAE-based AXOL1TL architecture (variational autoencoder for anomaly detection). Added the arXiv paper and Thea Aarrestad's talks to the Primary Sources.

This is the spirit. I'm doing something similar: scaling a 1.8T logic system using a budget mobile device as the primary node. Just hit 537 clones today. It's all about how you structure the logic, not the CPU power.

Intuitively, I’ve always had an impression that using an analogue circuit would be feasible for neural networks (they just matrix multiplication!). These should provide instantaneous output.

Isn’t this kind of approach feasible for something so purpose-built?

Do they actually have ASICs or just FPGAs? The article seems a bit unclear.

Hey Siri, show me an example of an oxymoron!

> CERN is using extremely small, custom large language models physically burned into silicon chips to perform real-time filtering of the enormous data generated by the Large Hadron Collider (LHC).

Does string theory finally make sense when we ad AI hallucinations?

CERN has been doing HEP experiments for decades. What did it use before the current incarnation of AI? The AI label seems to be more marketing and superficial than substantial. It’s a bit sad that a place like CERN feels the need to make it public that it is on the bandwagon.

I wonder if it is a PhD thesis to prove that the data prefiltering doesn’t bias the results.

I think chips having a single LLM directly on them will be very common once LLMs have matured/reached a ceiling.

cern has been using neural networks for decades

That's what Groq did as well: burning the Transformer right onto a chip (I have to say I was impressed by the simplicity, but afterwards less so by their controversial Kushner/Saudi investment) .

the fact that 99% of LHC data is just gone forever is insane

Why did we stop calling this stuff machine learning again? this isn't even an llm, which has become the common bar for 'ai'

When is the price of fabbing silicon coming down, so every SMB can do it?

Does anyone know why they are using language models instead of a more purpose-built statistical model? My intuition is that a language model would either be overfit, or its training data would have a lot of noise unrelated to the application and significantly drive up costs.

I hope they have good results and keep all the data they need, and identify all the interesting data they're looking for. I do have a cautionary tale about mini neural networks in new experiments. We recently spent a large amount of time training a mini neural network (200k parameters) to make new predictions in a very difficult domain (predicting specific trails for further round collisions in a hash function than anyone did before.) We put up a spiffy internal dashboard[1] where we could tune parameters and see how well the neural network learns the existing results. We got to r^2 of 0.85 (that is very good correlation) on the data that already existed, from other people's records and from the data we solved for previously. It showed such a nicely dropping loss function as it trained, brings tears to the eye, we were pumped to see how it performs on data it didn't see before, data that was too far out to solve for. So many parameters to tune! We thought we could beat the world record by 1 round with it (40 instead of 39 rounds), and then let the community play with it to see if they can train it even better, to predict the inputs that let us brute force 42 round collisions, or even more. We could put up a leaderboard. The possiblities were endless, all it had to do was do extrapolate some input values by one round. We'd take the rest from there with the rest of our solving instrastructure.

After training it fully, we moved on to the inference stage, trying it on the round counts we didn't have data for! It turned out ... to have zero predictive ability on data it didn't see before. This is on well-structured, sensible extrapolations for what worked at lower round counts, and what could be selected based on real algabraic correlations. This mini neural network isn't part of our pipeline now.

[1] screenshot: https://taonexus.com/publicfiles/mar2026/neural-network.png

Might be related: https://www.youtube.com/watch?v=T8HT_XBGQUI (Big Data and AI at the CERN LHC by Dr. Thea Klaeboe Aarrestad)

https://www.youtube.com/watch?v=8IZwhbsjhvE (From Zettabytes to a Few Precious Events: Nanosecond AI at the Large Hadron Collider by Thea Aarrestad)

Page: https://www.scylladb.com/tech-talk/from-zettabytes-to-a-few-...

Do they actually have ASICs or just FPGAs? The article seems a bit unclear.

I think chips having a single LLM directly on them will be very common once LLMs have matured/reached a ceiling.

cern has been using neural networks for decades

Why did we stop calling this stuff machine learning again? this isn't even an llm, which has become the common bar for 'ai'

[1] screenshot: https://taonexus.com/publicfiles/mar2026/neural-network.png

They used a custom neural net with autoencoders, which contain convolutional layers. They trained it on previous experiment data.

https://arxiv.org/html/2411.19506v1

Why is it so hard to elaborate what AI algorithm / technique they integrate? Would have made this article much better

I'm half expecting to see "AI model" appearing as stand-in for "linear regression" at this point in the cycle.

It seems like most of the implementation is FPGA, which I wouldn’t call “physically burned into silicon.” That’s quite a stretch of language

Because if it’s not an LLM it’s not good for the current hype cycle. Calling everything AI makes the line go up.

Thanks for tracking this down. I too am annoyed when so-called technical articles omit the actual techniques.

Because it does not align with LLM Uber Alles.

I've got news for you, everybody with a modern cpu uses this, which use a perceptron for branch prediction.

At this point AI basically means "we didn't know how to solve the problem so we just threw a black box at it".

How are FPGAs "bruned into silicon"? Would be news to me that there are ASICs being taped out at CERN

CERN in fact does design custom ASICs for other things: https://indico.cern.ch/event/1115079/contributions/4693643/a...

(Probably not for this here though.)

Could they.... have someone else do it for them?

A bit of hype in the AI wording here. This could be called a chip with hardcoded logic obtained with machine learning

ML is part of AI, and has always been. AI is not equal to chatgpt and AI wasn't coined/conceived in November 2022.

AI is not a new thing, and machine learned logic definitely counts as AI.

Is a LLM logic in weights derived from machine learning?

Very important! This is not a LLM like the ones so often called AI these days. Its a neural network in a FPGA.

I guess shows the LLM-companies' marketing worked very well because that's what I immediately thought of.

> FPGA

So they aren't "burned into silicon" then? The article mentions FPGAs and ASICs but it's a bit vague. I would be surprised if ASICs actually made sense here.

Intuitively, I’ve always had an impression that using an analogue circuit would be feasible for neural networks (they just matrix multiplication!). These should provide instantaneous output.

Isn’t this kind of approach feasible for something so purpose-built?

You might wanna look at https://taalas.com/

Hey Siri, show me an example of an oxymoron!

> CERN is using extremely small, custom large language models physically burned into silicon chips to perform real-time filtering of the enormous data generated by the Large Hadron Collider (LHC).

There's no mention of SLMs or LLMs, though.

> This work represents a compelling real-world demonstration of “tiny AI” — highly specialised, minimal-footprint neural networks

FPGAs for Neural Networks have been s thing since before the LLM era.

Are they some ancient small-scale integration VLSI design? Do they broadcast on a low-frequency VHF band? Face it: Oxymorons like those are part of the technical world. "VLSI" was a current term back when whole CPUs were made out of fewer transistors than we use for register files now, and "VHF" is low frequency even by commercial broadcasting standards.

Does string theory finally make sense when we ad AI hallucinations?

It doesn't say LLM anywhere.

That's what Groq did as well: burning the Transformer right onto a chip (I have to say I was impressed by the simplicity, but afterwards less so by their controversial Kushner/Saudi investment) .

> That's what Groq did as well: burning the Transformer right onto a chip

Are you perhaps confusing Groq with the Etched approach? IIUC Etched is the company that "burned the transformer onto a chip". Groq uses LPUs that are more generalist (they can run many transformers and some other architectures) and their speed comes from using SRAM.

the fact that 99% of LHC data is just gone forever is insane

Not really. Think of the experiment as a very, very high speed camera. They can't store every frame, so they try to capture just the "interesting" ones. They also store some random ones that can be used later as controls or in case they realize they've missed something. That's the whole job of these various layers of algorithms: recognizing interesting frames. Sometimes a new experiment basically just changes the definition of "interesting"

When is the price of fabbing silicon coming down, so every SMB can do it?

My guess would be never. The closest you can get is "multi project wafers" where you get bundled with a load of other projects. As I understand it they're on the order of $100k which is cheap, but if you actually want to design and verify a chip you're looking at at least several million in salaries and software costs. Probably more like $10m, especially if you're paying US salaries. And of course that would be for a low performance design.

I think a better question would be "when are FPGAs going to stop being so ridiculously overpriced". That feels more possible to me (but still unlikely).

It's not an LLM, it is a purpose built model. https://arxiv.org/html/2411.19506v1

5 years ago we would've called it a Machine Learning algorithm. 5 years before that, a Big Data algorithm.

This might be some journalistic confusion. If you go to the CERN documentation at https://twiki.cern.ch/twiki/bin/view/CMSPublic/AXOL1TL2025 it states

> The AXOL1TL V5 architecture comprises a VICReg-trained feature extractor stacked on top of a VAE.

… they’re not? Who said they are? The article even explicitly says they’re not?

It seems like most of the implementation is FPGA, which I wouldn’t call “physically burned into silicon.” That’s quite a stretch of language

Thanks for tracking this down. I too am annoyed when so-called technical articles omit the actual techniques.

Because it does not align with LLM Uber Alles.

CERN in fact does design custom ASICs for other things: https://indico.cern.ch/event/1115079/contributions/4693643/a...

(Probably not for this here though.)

ML is part of AI, and has always been. AI is not equal to chatgpt and AI wasn't coined/conceived in November 2022.

You might wanna look at https://taalas.com/

There's no mention of SLMs or LLMs, though.

> This work represents a compelling real-world demonstration of “tiny AI” — highly specialised, minimal-footprint neural networks

FPGAs for Neural Networks have been s thing since before the LLM era.

Huh? The first paragraph literally says they are using LLMs

> [ GENEVA, SWITZERLAND — March 28, 2026 ] — CERN is using extremely small, custom large language models physically burned into silicon chips to perform real-time filtering of the enormous data generated by the Large Hadron Collider (LHC).

haha, yea they are part of it for sure, and I'm not dunking on the use of them, but I rather smile a bit when I stumble upon them.

Like (~9K) Jumbo Frames!

It doesn't say LLM anywhere.

Good catch. Corrected. Thanks!

> That's what Groq did as well: burning the Transformer right onto a chip

I think a better question would be "when are FPGAs going to stop being so ridiculously overpriced". That feels more possible to me (but still unlikely).

Doesn't this vary wildly depending on the process node though? The cutting edge stuff keeps getting increasingly ridiculous meanwhile I thought you could get something like 50 nm for cheap. I also remember seeing years ago that some university had a ~micron (IIRC) process that you could order from.

It's not an LLM, it is a purpose built model. https://arxiv.org/html/2411.19506v1

5 years ago we would've called it a Machine Learning algorithm. 5 years before that, a Big Data algorithm.

We’ve been calling neural nets AI for decades.

> 5 years before that, a Big Data algorithm.

The DNN part? Absolutely not.

I don’t know why people feel the need for such revisionism but AI has been a field encompassing things far more basic than this for longer than most commenters have been alive.

i hate that we're in this linguistic soup when it comes to algorithmic intelligence now.

This might be some journalistic confusion. If you go to the CERN documentation at https://twiki.cern.ch/twiki/bin/view/CMSPublic/AXOL1TL2025 it states

> The AXOL1TL V5 architecture comprises a VICReg-trained feature extractor stacked on top of a VAE.

… they’re not? Who said they are? The article even explicitly says they’re not?

For 40 minutes, the article claimed they used LLMs. They changed the wording twice: https://theopenreader.org/index.php?title=Journalism:CERN_Us... and https://theopenreader.org/index.php?title=Journalism%3ACERN_...

I'm half expecting to see "AI model" appearing as stand-in for "linear regression" at this point in the cycle.

There is an HIGGS dataset [1]. As name suggest, it is designed to apply machine learning to recognize Higgs bozon.

[1] https://archive.ics.uci.edu/ml/datasets/HIGGS

In my experiments, linear regression with extended (addition of squared values) attributes is very much competitive in accuracy terms with reported MLP accuracy.

> I'm half expecting to see "AI model" appearing as stand-in for "linear regression" at this point in the cycle.

Already the case with consulting companies, have seen it myself

I'm half expecting to see "AI model" appearing as stand-in for "if > 0" at this point in the cycle.

I'm sure I've seen basic hill climbing (and other optimisation algorithms) described as AI, and then used evidence of AI solving real-world science/engineering problems.

And why not, when linear regression works, it works so well it's basically magic, better than intelligence, artificial or otherwise

Having work with people who do that, I can guarantee that’s not the case. See https://ssummers.web.cern.ch/conifer/ and HSL4ML, these run BDT and CNN

That works well to get around patents btw :)

Because if it’s not an LLM it’s not good for the current hype cycle. Calling everything AI makes the line go up.

LLMs also make the cynicism go up among the HN crowd.

At this point AI basically means "we didn't know how to solve the problem so we just threw a black box at it".

I disagree. More often than not is "We know how to solve the problem, and the solution is some linear algebra"

Could they.... have someone else do it for them?

Glib, but it wont be cost effective at that small scale

And why not, when linear regression works, it works so well it's basically magic, better than intelligence, artificial or otherwise

That works well to get around patents btw :)

Having work with people who do that, I can guarantee that’s not the case. See https://ssummers.web.cern.ch/conifer/ and HSL4ML, these run BDT and CNN

I disagree. More often than not is "We know how to solve the problem, and the solution is some linear algebra"

Glib, but it wont be cost effective at that small scale

I guess shows the LLM-companies' marketing worked very well because that's what I immediately thought of.

haha, yea they are part of it for sure, and I'm not dunking on the use of them, but I rather smile a bit when I stumble upon them.

Like (~9K) Jumbo Frames!

Huh? The first paragraph literally says they are using LLMs

the site might have fixed it, to me it says "artificial intelligence" instead of LLM, still bad but not" steaming pile of poo on you bank statement" bad

Good catch. Corrected. Thanks!

We’ve been calling neural nets AI for decades.

> 5 years before that, a Big Data algorithm.

The DNN part? Absolutely not.

I don’t know why people feel the need for such revisionism but AI has been a field encompassing things far more basic than this for longer than most commenters have been alive.

> AI has been a field encompassing things far more basic than this for longer than most commenters have been alive.

When I was 13, having just started programming, I picked up a book from a "junk bin" at a book store on Artificial Intelligence. It must have been from the mid-80s if not older.

It had an entire chapter on syllogism[1] and how to implement a program to spit them out based on user input. As I recall it basically amounted to some string exteaction assuming user followed a template and string concatenation to generate the result. I distinctly recall not being impressed about such a trivial thing being part of a book on AI.

[1]: https://en.wikipedia.org/wiki/Syllogism

i hate that we're in this linguistic soup when it comes to algorithmic intelligence now.

I'm half expecting to see "AI model" appearing as stand-in for "if > 0" at this point in the cycle.

This is why I am programming now in Ocaml, files themselves are AI ( ml ).

This is essentially what any relu based neural network approximately looks like (smoother variants have replaced the original ramp function). AI, even LLMs, essentially reduce to a bunch of code like

    let v0 = 0
    let v1 = 0.40978399*(0.616*u + 0.291*v)
    let v2 = if 0 > v1 then 0 else v1

    let v3 = 0
    let v4 = 0.377928*(0.261*u + 0.468*v)
    let v5 = if 0 > v4 then 0 else v4...

There is an HIGGS dataset [1]. As name suggest, it is designed to apply machine learning to recognize Higgs bozon.

[1] https://archive.ics.uci.edu/ml/datasets/HIGGS

In my experiments, linear regression with extended (addition of squared values) attributes is very much competitive in accuracy terms with reported MLP accuracy.

The LHC has moved on a bit since then. Here's an open dataset that one collaboration used to train a transformer:

https://opendata-qa.cern.ch/record/93940

if you can beat it with linear regression we'd be happy to know.

> I'm half expecting to see "AI model" appearing as stand-in for "linear regression" at this point in the cycle.

Already the case with consulting companies, have seen it myself

Some career do-nothing-but-make-noise in my organization hired a firm to 'Do AI' on some shitty data and the outcome was basically linear regression. It turns out that you can impressive executives with linear regression if you deliver it enthusiastically enough.

I'm sure I've seen basic hill climbing (and other optimisation algorithms) described as AI, and then used evidence of AI solving real-world science/engineering problems.

Historically this was very much in the field of AI, which is such a massive field that saying something uses AI is about as useful as saying it uses mathematics. Since the term was first coined it's been constantly misused to refer to much more specific things.

From around when the term was first coined: "artificial intelligence research is concerned with constructing machines (usually programs for general-purpose computers) which exhibit behavior such that, if it were observed in human activity, we would deign to label the behavior 'intelligent.'" [1]

[1]: https://doi.org/10.1109/TIT.1963.1057864

I am somewhat cynically waiting for the AI community to rediscover the last half a century of linear algebra and optimisation techniques.

At some point someone will realise that backpropagation and adjoint solves are the same thing.

LLMs also make the cynicism go up among the HN crowd.

Hm. Is HN starting to become more skeptical of LLMs? For the past couple of years, HN has seemed worryingly enthusiastic about LLMs.

AI is not a new thing, and machine learned logic definitely counts as AI.

For those that have experience with ML, yes. For those that have recently become acquainted with it (more on business side) they seem to really struggle with this in my experience. '

Yeah, and don’t forget Eliza!

Is a LLM logic in weights derived from machine learning?

Well, yes. That's literally what it is.

Good one… but Is a DB query filter AI? I forgot to say though is sounds like a really cool thing to do

> FPGA

So they aren't "burned into silicon" then? The article mentions FPGAs and ASICs but it's a bit vague. I would be surprised if ASICs actually made sense here.

They make sense when you consider that 'on detector' electronics has all sorts of constraints that FPGAs cant compete on: Power, Density, Radiation hardness, Material budget.

First internship, cern, summer 1989 on the opal lepc pit, wrote offline data filtering program in FORTRAN. Blast from the past.

The LHC has moved on a bit since then. Here's an open dataset that one collaboration used to train a transformer:

https://opendata-qa.cern.ch/record/93940

if you can beat it with linear regression we'd be happy to know.

Hm. Is HN starting to become more skeptical of LLMs? For the past couple of years, HN has seemed worryingly enthusiastic about LLMs.

I wonder if it is a PhD thesis to prove that the data prefiltering doesn’t bias the results.

For those that have experience with ML, yes. For those that have recently become acquainted with it (more on business side) they seem to really struggle with this in my experience. '

Yeah, and don’t forget Eliza!

Well, yes. That's literally what it is.

Good one… but Is a DB query filter AI? I forgot to say though is sounds like a really cool thing to do

They make sense when you consider that 'on detector' electronics has all sorts of constraints that FPGAs cant compete on: Power, Density, Radiation hardness, Material budget.

the site might have fixed it, to me it says "artificial intelligence" instead of LLM, still bad but not" steaming pile of poo on you bank statement" bad

> AI has been a field encompassing things far more basic than this for longer than most commenters have been alive.

When I was 13, having just started programming, I picked up a book from a "junk bin" at a book store on Artificial Intelligence. It must have been from the mid-80s if not older.

[1]: https://en.wikipedia.org/wiki/Syllogism

Happy to take any questions.

This is why I am programming now in Ocaml, files themselves are AI ( ml ).

I am sure you did not forget that pattern matching.

This is essentially what any relu based neural network approximately looks like (smoother variants have replaced the original ramp function). AI, even LLMs, essentially reduce to a bunch of code like

    let v0 = 0
    let v1 = 0.40978399*(0.616*u + 0.291*v)
    let v2 = if 0 > v1 then 0 else v1

    let v3 = 0
    let v4 = 0.377928*(0.261*u + 0.468*v)
    let v5 = if 0 > v4 then 0 else v4...

Thats a bit far. Relu does check x>0 but thats just one non-linearity in the linear/non-linear sandwich that makes up universal function approximator theorem. Its more conplex than just x>0

[1]: https://doi.org/10.1109/TIT.1963.1057864

That definition moves the goalposts almost by definition, people only stopped thinking that chess demonstrated intelligence when computers started doing it.

Tbh, often enough, linear regression is exactly what is needed.

I am somewhat cynically waiting for the AI community to rediscover the last half a century of linear algebra and optimisation techniques.

At some point someone will realise that backpropagation and adjoint solves are the same thing.

There are plenty of smart people in the "AI community" already who know it. Smugly commenting does not replace actual work. If you have real insight and can make something perform better, I guarantee you that many people will listen (I don't mean twitter influencers but the actual field). If you don't know any serious researcher in AI, I have my doubts that you have any insight to offer.

I am sure they are aware...

Very cool to see you work! Early in my PhD I did some work with GNN accelerators on FPGAs (which I think later ended up in some form as a colab with some CERN or Fermilab folks) and have chatted a bit in the past with the FastML, HLS4ML, and HEP folks.

I have since pivoted a lot of my PhD work (still related the HLS and EDA). But I wonder what is the current main limitation/challenges of building these trigger systems in hardware today. For example, in my mind it seems like the EDA and tooling can be a big limitation such as reliance on commercial HLS tools which can be buggy, hard to use, and hard to debug. From experience, this makes it harder to build different optimized architectures in hardware or build co-design frameworks without having high HLS expertise or putting in a lot of extra engineering/tooling effort. Also tool runtimes make the design and debug cycle longer, especially if you are trying to DSE on post-implementation metrics since you bring in implementation tools as well.

But I might be way off here and the real challenges are with other aspects beyond the tools.

Strictly speaking, expert systems are AI as well, as in, an expert comes up with a bunch of if/else rules. So yes technically speaking even if they didn’t acquire the weights using ML and hand-coded them, it could still be called AI.

What what is? The article has nothing to do with LLMs. It even explicitly says they don’t use LLMs.

Eliza was 1960s.

In the 1990s I remember taking my friend's IRC chat history and running it through a Markov model to generate drivel, which was really entertaining.

Eliza was 1960s.

In the 1990s I remember taking my friend's IRC chat history and running it through a Markov model to generate drivel, which was really entertaining.

It is 100% valid to label an algorithm that plays tic-tac-toe as "AI"

Much of the early AI research was spent on developing various algorithms that could play board games.

Didn't even need computers, one early AI was MENACE [1], a set of 304 matchboxes which could learn how to play noughts and crosses.

[1] https://en.wikipedia.org/wiki/Matchbox_Educable_Noughts_and_...

What what is? The article has nothing to do with LLMs. It even explicitly says they don’t use LLMs.

> Is a LLM logic in weights derived from machine learning?

I was just answering this question. LLM logic in weights is fundamentally from machine learning, so yes. Wasn't really saying anything about the article.

I am sure you did not forget that pattern matching.

Tbh, often enough, linear regression is exactly what is needed.

I am sure they are aware...

> Is a LLM logic in weights derived from machine learning?

I was just answering this question. LLM logic in weights is fundamentally from machine learning, so yes. Wasn't really saying anything about the article.

[ GENEVA, SWITZERLAND — March 28, 2026 ] — CERN is using extremely small, custom artificial intelligence models physically burned into silicon chips to perform real-time filtering of the enormous data generated by the Large Hadron Collider (LHC).

LHC tunnel and detectors

OVERVIEW

Proton collision in LHC detector

The Large Hadron Collider (LHC) generates an extraordinary volume of raw data — approximately 40,000 exabytes per year, equivalent to roughly one quarter of the entire current internet. During peak operation, the data stream can reach hundreds of terabytes per second, far exceeding the capacity of any feasible storage or conventional computing system.

Because it is physically impossible to store or process the full dataset, CERN must make split-second decisions at the detector level: which collision events contain potentially groundbreaking scientific value, and which should be discarded forever. This real-time selection process is one of the most demanding computational challenges in modern science.

To meet these extreme requirements, CERN has deliberately moved away from conventional GPU or TPU-based artificial intelligence architectures. Instead, the laboratory develops highly optimized, ultra-compact AI models that are compiled and physically implemented directly into custom silicon — primarily field-programmable gate arrays (FPGAs) and application-specific integrated circuits (ASICs). These hardware-embedded models enable ultra-low-latency inference at the very edge of the detector system, where decisions must be made in microseconds or even nanoseconds.

THE DATA CHALLENGE

AXOL1TL algorithm floorplan on FPGA/ASIC

Inside the 27-kilometre ring of the Large Hadron Collider, proton bunches travel at velocities approaching the speed of light and cross paths roughly every 25 nanoseconds. Although billions of protons pass through one another during each crossing, actual hard collisions between protons remain relatively rare events.

When a collision does occur, the detectors surrounding the interaction point capture several megabytes of raw data from the resulting particle shower. This creates an overwhelming data stream: the LHC can generate up to hundreds of terabytes per second at peak luminosity. Storing or processing the full volume is physically impossible with current technology.

As a result, only about 0.02 % of all collision events are ultimately retained for further analysis. The first and most critical filtering stage, known as the Level-1 Trigger, is responsible for making these split-second decisions. It consists of approximately 1,000 field-programmable gate arrays (FPGAs) that evaluate incoming data in less than 50 nanoseconds. A highly specialized algorithm called AXOL1TL runs directly on these chips, analysing detector signals in real time and determining which events are scientifically promising enough to be preserved. All other data is discarded immediately and permanently.

AI APPROACH AND TECHNICAL STACK

High Level Trigger computing farm at CERN

CERN’s artificial intelligence models are deliberately designed to be extremely small and highly optimised for the unique constraints of the LHC environment. Unlike the large-scale language models and general-purpose AI systems commonly used in industry, these models are tailored specifically for ultra-low-latency, real-time inference at the detector level, where decisions must be made in nanoseconds.

The models are compiled using the open-source tool **HLS4ML**, which translates machine-learning models written in frameworks such as PyTorch or TensorFlow into synthesizable C++ code. This code can then be deployed directly onto field-programmable gate arrays (FPGAs), systems-on-chip (SoCs), or custom application-specific integrated circuits (ASICs). The resulting hardware implementations achieve the extreme speed required while consuming significantly less power and silicon area than conventional GPU- or TPU-based solutions.

A distinctive feature of CERN’s approach is that a substantial portion of the available chip resources is not allocated to the neural network layers themselves. Instead, these resources are used to implement extensive precomputed lookup tables. These tables store the results of common input patterns in advance, allowing the hardware to deliver near-instantaneous outputs for the vast majority of typical detector signals without performing full floating-point calculations. This hardware-first design philosophy is what enables the system to operate at the required nanosecond-scale latency.

The second filtering stage, known as the High-Level Trigger, runs on a large surface-level computing farm consisting of 25,600 CPUs and 400 GPUs. Even after the aggressive Level-1 Trigger has reduced the data volume, this farm must still process terabytes of data per second before further reducing it to approximately one petabyte of scientifically valuable data per day.

FUTURE PLANS

The current Large Hadron Collider is scheduled for a major upgrade known as the High-Luminosity LHC (HL-LHC), which is expected to begin operations in 2031. This upgrade will dramatically increase the collider’s luminosity, producing roughly ten times more data per collision and generating significantly larger event sizes than the present LHC.

CERN is already actively preparing its AI hardware pipeline to handle this anticipated surge in data volume. The laboratory is developing next-generation versions of its ultra-compact AI models, further optimizing FPGA and ASIC implementations, and enhancing the entire real-time triggering system to maintain the extreme low-latency performance required for effective event selection at much higher data rates.

This forward-looking work is considered essential to ensure that the High-Luminosity LHC can continue delivering groundbreaking scientific discoveries in particle physics over the coming decades, even as the volume of raw data grows by an order of magnitude.

IMPLICATIONS

While the wider artificial intelligence industry continues to pursue ever-larger language models that demand massive computational resources and energy, CERN is deliberately moving in the opposite direction. The laboratory is developing some of the smallest, fastest, and most efficient AI models currently in existence, optimised specifically for direct hardware implementation in FPGAs and ASICs.

This work represents a compelling real-world demonstration of “tiny AI” — highly specialised, minimal-footprint neural networks — deployed in one of the most extreme scientific environments on the planet. In the LHC’s trigger systems, where decisions must be made in nanoseconds on enormous data streams, these compact models achieve performance levels that would be unattainable with conventional general-purpose AI accelerators.

Beyond particle physics, CERN’s approach may influence the future design of high-performance computing systems in other domains that require real-time, ultra-low-latency inference under extreme data rates. Applications in autonomous systems, high-frequency trading, medical imaging, and aerospace could benefit from similar hardware-embedded, resource-efficient AI techniques. As global demand for both computing power and energy efficiency continues to grow, the CERN model offers a practical alternative to the current trend of scaling up model size, highlighting the value of extreme specialisation and hardware-level optimisation.

PRIMARY SOURCES

Source	Description	Verification Link
CERN Twiki	AXOL1TL V5 architecture and deployment details (VICReg-trained feature extractor + VAE for anomaly detection)	https://twiki.cern.ch/twiki/bin/view/CMSPublic/AXOL1TL2025
arXiv Paper	Real-time Anomaly Detection at the L1 Trigger of CMS Experiment (introduces AXOL1TL and CICADA algorithms)	https://arxiv.org/abs/2411.19506 (or html version: https://arxiv.org/html/2411.19506v1)
CERN Official	General LHC data processing and trigger systems	https://home.cern/science/computing
Thea Aarrestad Talks	Overview of tiny AI / ML for LHC triggers (highly recommended for context)
https://www.youtube.com/watch?v=T8HT_XBGQUI (Big Data and AI at the CERN LHC)
https://www.youtube.com/watch?v=8IZwhbsjhvE (From Zettabytes to a Few Precious Events: Nanosecond AI at the LHC)

NEWS FILED BY: John

Thats a bit far. Relu does check x>0 but thats just one non-linearity in the linear/non-linear sandwich that makes up universal function approximator theorem. Its more conplex than just x>0

Multiply-accumulate, then clamp negative values to zero. Every even-numbered variable is a weighted sum plus a bias (an affine transformation), and every odd-numbered variable is the ReLU gate (max(0, x)). Layer 2 feeds on the ReLU outputs of layer 1, and the final output is a plain linear combination of the last ReLU outputs

    // inputs: u, v
    // --- hidden layer 1 (3 neurons) ---
    let v0  = 0.616*u + 0.291*v - 0.135
    let v1  = if 0 > v0 then 0 else v0
    let v2  = -0.482*u + 0.735*v + 0.044
    let v3  = if 0 > v2 then 0 else v2
    let v4  = 0.261*u - 0.553*v + 0.310
    let v5  = if 0 > v4 then 0 else v4
    // --- hidden layer 2 (2 neurons) ---
    let v6  = 0.410*v1 - 0.378*v3 + 0.528*v5 + 0.091
    let v7  = if 0 > v6 then 0 else v6
    let v8  = -0.194*v1 + 0.617*v3 - 0.291*v5 - 0.058
    let v9  = if 0 > v8 then 0 else v8
    // --- output layer (binary classification) ---
    let v10 = 0.739*v7 - 0.415*v9 + 0.022
    // sigmoid squashing v10 into the range (0, 1)
    let out = 1 / (1 + exp(-v10))

That definition moves the goalposts almost by definition, people only stopped thinking that chess demonstrated intelligence when computers started doing it.

The term artificial intelligence has always been just a buzzword designed to sell whatever it needed to. IMHO, it has no meaningful value outside of a good marketing term. John McCarthy is usually the person who is given credit for coming up with the name and he has admitted in interviews that it was just to get eyeballs for funding.

But I might be way off here and the real challenges are with other aspects beyond the tools.

Thank you for the comment, and the questions are great.

The problems you described here are pretty much precise. In the past, and mostly now, we are replying on the commercial Vivado/Vitis HLS toolchains for the deployment of these networks through hls4ml, a template based compiler of the quantized models to the HLS projects. For this class of fully parallel (II=1) models, the tools usually give fine results, but indeed can be wrong sometimes (great recent example from our college's post: https://sioni.web.cern.ch/2026/03/24/debugging-fastml).

Tool runtime is another issue. For the models discussed in this post, they are not larger than ~30K LUTs, and with the low complexity (~dense only), synthesis time was fine. But for larger ones, like the ones here (https://arxiv.org/abs/2510.24784), it can take up to... a week for one HLS compilation while eating ~80G ram. Can get worse if time multiplex is in place things like #pragma HLS dataflow is used...

Personally, I do not usually DSE on post implementation/HLS results, since for the unrolled logic blocks, ok-ish performance model can be derived obtained w/o doing the synthesis (via ebops defined in HGQ, or better if using heuristics based on the rough cost of low level operations the design will translate to). But there are works doing DSE based on post HLS results (https://arxiv.org/pdf/2502.05850, real vitis synth), or using some other surrogate to get over the problem (e.g., https://arxiv.org/abs/2501.05515, using bops). High-level surrogate models are also being developed (https://arxiv.org/pdf/2511.05615).

We are also trying to get alternatives to the commercial HLS toolflows. For instance, I'm working on the direct to RTL codegen (da4ml) way (optionally via XLS), and the current work-in-progress is at https://github.com/calad0i/da4ml/tree/dev, if you are interested: all combinational or fully pipelined things are supported with reasonable performance model (~10% err in LUTs and ~20% err in latency), but multicycle, or stateful design generations still need a lot of manual intervention (not automated), which are to be implemented in the future. Since at some stages of the trigger chain, the system is/will be time-multiplexed, such functionality will be needed in the future.

Other works on this direction includes adding new backends to hls4ml that are oos (e.g., openhls/XLS), or other alternatives like chisel4ml (https://github.com/cs-jsi/chisel4ml). Hopefully, we will be no-longer reliant on the commercial tools till RTL for the incoming upgrade. That being said, Vivado still appears to be the only choice for the post RTL stages for us.