There are two non-exclusive paths I'm thinking of at the moment:
1. DRM: Might this enable a next level of DRM?
2. Hardware attestation: Might this enable a deeper level of hardware attestation?
In general, this solution would be expensive and targeted at data lakes, or areas where you want to run computation but not necessarily expose the data.
With regard to DRM, one key thing to remember is that it has to be cheap and widely deployable. Part of the reason DVDs were easily broken is that the chosen algorithm was computationally inexpensive, so it could be installed on as many clients as possible.
That is a nice speed-up compared to generic hardware, but everyone probably wants to know how much slower it is than performing the same operations on plaintext data. I'm sure a 50% penalty is acceptable; 95% probably is not.
After nearly three decades of critical technology systems architecture and management, with ongoing industry audits, I know why my hair has lost some of its color. Much of that lost color comes from security management of third-party systems: yes, the old dreaded dependencies. Eliminating those third parties is key to one's cyber sanity and hair color, yet with technology still in its infancy, some cannot distinguish the forest from the trees.
Nothing remains the same as progress moves forward, correcting past mistakes and learning along the way what works and what does not; technology platforms are no exception. Analogously, early automobiles lacked safety features such as windshield wipers and seatbelts, and the passage of time has proved their addition valuable. Few people today truly understand how things work; nearly all just want the instant-fix "pill" to alleviate their issues. That approach cannot work with security. True security is designed in from the foundation, and such secure platforms go unseen, yet we have an endless list of victims of insecure systems that "bolted on" security after the fact. This security change and more is coming to system designs, as the entire world is now fully aware of cyber security, or in this case, the lack of it.
Time: the young fail to consider it until a single moment in their life, while the old reflect on where theirs went. After that reflection on one's time, however, change becomes obvious.
However... in a world where privacy is constantly and intentionally eroded by governments and private companies, I think this will NEVER, ever reach any consumer-grade hardware. My cynical side could envision a worldwide technology export ban in the vein of RSA [0].
Why would any company offer customers real out-of-the-box E2E encryption built into their devices?
DRM was mentioned by another user. This will not be used to enable privacy for the masses.
https://en.wikipedia.org/wiki/Export_of_cryptography_from_th...
If you need to trust both the encryption and the hardware itself, it may not be suitable for your environment or threat model.
If computation can happen directly on encrypted data, does that reduce the need for trusted environments like SGX/TEE, or does it mostly complement them?
But when homomorphic encryption becomes efficient, perhaps governments can force companies to apply it (though they would lose their opportunity for backdooring, but E2EE is a thing too so I wouldn't worry too much).
Same here.
Can't wait to KYC myself in order to use a CPU.
2. No, anyone can run the FHE computations anywhere on any hardware if they have the evaluation key (which would also have to be present in any FHE hardware).
I think eGovernment is the main use case: not super high traffic (we're not voting every day), but very high privacy expectations.
It's not related to DRM or trusted computing.
We are not their clients anymore; we are just another product to sell. So they do not design chips for us, but for the benefit of other corporations.
3. Unskippable ads with data gathering at the CPU level.
This hardware won’t make the technique attractive for ALL computation. But, it could dramatically increase the range of applications.
That rules out anything latency-sensitive, but for batch workloads like aggregating encrypted medical records or running simple ML inference on private data it starts to become practical. The real unlock is not raw speed parity but getting FHE fast enough that you can justify the privacy tradeoff for specific regulated workloads.
The PC market was made shitty enough this year that mid/high-class Mac Pros/laptops are actually often a better value now (if and only if your use case is covered software-wise).
Intel does plan an RTX + amd64 SoC soon, but still pooched the memory interface with a 30-year-old mailbox kludge. Intel probably won't survive this choice without bailouts. =3
It's truly amazing how modern people just blithely sacrifice their privacy and integrity for no good reason. Just to let big tech corporations more efficiently siphon money out of the market. And then they fight you passionately when you call out those companies for being unnecessarily invasive and intrusive.
The four horsemen of the infocalypse are such profoundly reliable boogeymen, we really need a huge psychological study across all modern cultures to see why they're so effective at dismantling rational thought in the general public, and how we can inoculate society against them without damaging other important social behaviors.
Why not, when the government can just force companies to backdoor their hardware for them? That way users are secure most of the time, except from the government (until the backdoor in Intel's chips gets discovered anyway). Users get a false sense of security/privacy, so people are more likely to share their secrets with corporations, and the government gets to spy on people communicating more openly with each other.
There is basically no business demand beyond sellers and scholars.
Are we reading the same article? It's talking about homomorphic encryption, i.e. doing mathematical operations on already-encrypted data, without being aware of its cleartext contents. It's not related to SGX or other trusted computing technologies.
The textbook example application of FHE is phone-book search. The server "multiplies" the whole phonebook database file with your encrypted query, and sends the whole database file back to you every time, regardless of the query. When you decrypt the file with the key used to encrypt the query, the database is all corrupt and garbled except for the rows matching the query, so the search has effectively taken place. The only information that exists in the clear is the fact that a query happened and the size of the entire database.
Sounds fantastically energy-efficient, no? That's the problem with FHE, not risks of backdooring.
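The selection trick described above can be sketched with a toy additively homomorphic scheme. This is a minimal Paillier implementation (additively homomorphic only, not fully homomorphic, with primes far too small to be secure; all names and parameters are illustrative): the client sends an encrypted one-hot query vector, and the server selects a row without learning which one.

```python
import math
import random

# Toy Paillier keypair (tiny primes for illustration only; NOT secure)
p, q = 10007, 10009
n, n2 = p * q, (p * q) ** 2
g = n + 1
lam = math.lcm(p - 1, q - 1)

def L(x):
    return (x - 1) // n

mu = pow(L(pow(g, lam, n2)), -1, n)  # decryption constant

def enc(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def dec(c):
    return (L(pow(c, lam, n2)) * mu) % n

# Client: encrypt a one-hot selection vector asking for row 2
db = [42, 77, 1234, 9]  # server's plaintext "phonebook"
query = [enc(1 if i == 2 else 0) for i in range(len(db))]

# Server: homomorphically compute sum_i q_i * db_i without decrypting
# anything: E(m)^v = E(m*v), and a product of ciphertexts = E(sum)
result = 1
for c, v in zip(query, db):
    result = (result * pow(c, v, n2)) % n2

print(dec(result))  # -> 1234
```

The server touches every row for every query, which is exactly the bandwidth and energy cost the comment above is pointing at.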
I remember thinking how fun it was! I could see unfolding before me endless ways to configure, reconfigure, optimize, etc.
I know there are a few open-source chip efforts, but I wonder whether now is the time to pull the community together and organize more intentionally around that. Maybe open-source chipsets won't be as fast as their corporate counterparts, but I think we are definitely at an inflection point in society where we need this to maintain freedom.
If anyone is working in that area, I am very interested. I am very green, but I still have the old textbooks I could dust off (I just don't have the ol' college-provided Mentor Graphics, or I guess Siemens now, design tools anymore).
[1] https://confer.to/blog/2025/12/confessions-to-a-data-lake/
First you encrypt the data. Then you send it to hardware to compute, get result back and decrypt it.
A: "Intel/AMD is adding instructions to accelerate AES"
B: "Might this enable a next level of DRM? Might this enable a deeper level of hardware attestation?"
A: "wtf are you talking about? It's just instructions to make certain types of computations faster, it has nothing to do with DRM or hardware attestation."
B: "Not yet."
I'm sure in some way it probably helps DRM or hardware attestation to some extent, but not any more than say, 3nm process node helps DRM or hardware attestation by making it faster.
The future is bleak.
But making them available to customers, say even as a PCIe card, and then having that automatically encrypt everything you run today over an encrypted connection would be a dream.
The correct solution isn't yet another cloud service, but rather local models.
That's my point, this sounds like a way to create a backdoor for at-rest data.
Worried that your latest ask to a cloud-based AI reveals a bit too much about you? Want to know your genetic risk of disease without revealing it to the services that compute the answer?
There is a way to do computing on encrypted data without ever having it decrypted. It’s called fully homomorphic encryption, or FHE. But there’s a rather large catch. It can take thousands—even tens of thousands—of times longer to compute on today’s CPUs and GPUs than simply working with the decrypted data.
So universities, startups, and at least one processor giant have been working on specialized chips that could close that gap. Last month at the IEEE International Solid-State Circuits Conference (ISSCC) in San Francisco, Intel demonstrated its answer, Heracles, which sped up FHE computing tasks as much as 5,000-fold compared to a top-of-the-line Intel server CPU.
Startups are racing to beat Intel and each other to commercialization. But Sanu Mathew, who leads security circuits research at Intel, believes the CPU giant has a big lead, because its chip can do more computing than any other FHE accelerator yet built. “Heracles is the first hardware that works at scale,” he says.
The scale is measurable both physically and in compute performance. While other FHE research chips have been in the range of 10 square millimeters or less, Heracles is about 20 times that size and is built using Intel’s most advanced, 3-nanometer FinFET technology. And it’s flanked inside a liquid-cooled package by two 24-gigabyte high-bandwidth memory chips—a configuration usually seen only in GPUs for training AI.
RELATED: How to Compute with Data You Can’t See
In terms of scaling compute performance, Heracles showed muscle in live demonstrations at ISSCC. At its heart the demo was a simple private query to a secure server. It simulated a request by a voter to make sure that her ballot had been registered correctly. The state, in this case, has an encrypted database of voters and their votes. To maintain her privacy, the voter would not want to have her ballot information decrypted at any point; so using FHE, she encrypts her ID and vote and sends it to the government database. There, without decrypting it, the system determines if it is a match and returns an encrypted answer, which she then decrypts on her side.
On an Intel Xeon server CPU, the process took 15 milliseconds. Heracles did it in 14 microseconds. While that difference isn’t something a single human would notice, verifying 100 million voter ballots adds up to more than 17 days of CPU work versus a mere 23 minutes on Heracles.
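The article's scaling claim checks out with back-of-the-envelope arithmetic (assuming the stated per-query times apply unchanged at scale):

```python
# Sanity check of the article's figures: 15 ms/query (Xeon) vs 14 us/query
queries = 100_000_000
cpu_days = queries * 15e-3 / 86_400       # seconds -> days
heracles_minutes = queries * 14e-6 / 60   # seconds -> minutes
print(round(cpu_days, 1), round(heracles_minutes, 1))  # -> 17.4 23.3
```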
Looking back on the five-year journey to bring the Heracles chip to life, Ro Cammarota, who led the project at Intel until last December and is now at University of California Irvine, says “we have proven and delivered everything that we promised.”
FHE is fundamentally a mathematical transformation, sort of like the Fourier transform. It encrypts data using a quantum-computer-proof algorithm, but, crucially, uses corollaries to the mathematical operations usually used on unencrypted data. These corollaries achieve the same ends on the encrypted data.
One of the main things holding such secure computing back is the explosion in the size of the data once it’s encrypted for FHE, Anupam Golder, a research scientist at Intel’s circuits research lab, told engineers at ISSCC. “Usually, the size of cipher text is the same as the size of plain text, but for FHE it’s orders of magnitude larger,” he said.
While the sheer volume is a big problem, the kinds of computing you need to do with that data are also an issue. FHE is all about very large numbers that must be computed with precision. While a CPU can do that, it's very slow going: integer addition and multiplication take about 10,000 times more clock cycles in FHE. Worse still, CPUs aren't built to do such computing in parallel. Although GPUs excel at parallel operations, precision is not their strong suit. (In fact, from generation to generation, GPU designers have devoted more and more of the chip's resources to computing less and less precise numbers.)
FHE also requires some oddball operations with names like “twiddling” and “automorphism,” and it relies on a compute-intensive noise-cancelling process called bootstrapping. None of these things are efficient on a general-purpose processor. So, while clever algorithms and libraries of software cheats have been developed over the years, the need for a hardware accelerator remains if FHE is going to tackle large-scale problems, says Cammarota.
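The "twiddling" mentioned above refers to the twiddle factors of the number-theoretic transform (NTT), the polynomial transform at the core of lattice-based FHE schemes. A deliberately naive O(n²) sketch over a tiny prime field (parameters chosen purely for illustration; real FHE uses much larger primes and fast butterfly NTTs):

```python
P = 257      # NTT-friendly prime (2**8 divides P - 1)
OMEGA = 16   # primitive 4th root of unity mod 257 (16**2 = -1 mod 257)
N = 4

def ntt(a, w):
    # Naive DFT over Z_P; the powers w**(i*j) are the "twiddle factors"
    return [sum(a[j] * pow(w, i * j, P) for j in range(N)) % P
            for i in range(N)]

def intt(a):
    inv_n = pow(N, -1, P)
    return [(x * inv_n) % P for x in ntt(a, pow(OMEGA, -1, P))]

# Polynomial multiplication mod (x**4 - 1): transform, multiply
# pointwise, transform back -- the pattern FHE hardware accelerates
f, g = [1, 2, 0, 0], [3, 4, 0, 0]   # (1 + 2x) * (3 + 4x)
F, G = ntt(f, OMEGA), ntt(g, OMEGA)
h = intt([(x * y) % P for x, y in zip(F, G)])
print(h)  # -> [3, 10, 8, 0], i.e. 3 + 10x + 8x**2
```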
Heracles was initiated under a DARPA program five years ago to accelerate FHE using purpose-built hardware. It was developed as “a whole system-level effort that went all the way from theory and algorithms down to the circuit design,” says Cammarota.
Among the first problems was how to compute with numbers that were larger than even the 64-bit words that are today a CPU’s most precise. There are ways to break up these gigantic numbers into chunks of bits that can be calculated independently of each other, providing a degree of parallelism. Early on, the Intel team made a big bet that they would be able to make this work in smaller, 32-bit chunks, yet still maintain the needed precision. This decision gave the Heracles architecture some speed and parallelism, because the 32-bit arithmetic circuits are considerably smaller than 64-bit ones, explains Cammarota.
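One standard way to do this chunking (and plausibly what is meant here, though Intel's exact scheme isn't spelled out in the article) is a residue number system: represent each huge integer by its residues modulo several pairwise-coprime moduli that each fit in 32 bits, operate on every residue lane independently, and reconstruct the result with the Chinese Remainder Theorem. A minimal sketch with illustrative moduli:

```python
from math import prod

# Pairwise-coprime moduli, each fitting in 32 bits (illustrative choice)
MODULI = [2**31 - 1, 2**31, 2**31 + 1]
M = prod(MODULI)

def to_rns(x):
    return [x % m for m in MODULI]

def rns_mul(a, b):
    # Each residue lane is independent, so lanes can run in parallel
    # on small 32-bit arithmetic units
    return [(x * y) % m for x, y, m in zip(a, b, MODULI)]

def from_rns(residues):
    # Chinese Remainder Theorem reconstruction
    x = 0
    for r, m in zip(residues, MODULI):
        Mi = M // m
        x = (x + r * Mi * pow(Mi, -1, m)) % M
    return x

a, b = 123_456_789_012_345, 987_654_321
assert from_rns(rns_mul(to_rns(a), to_rns(b))) == (a * b) % M
```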
At Heracles’ heart are 64 compute cores—called tile-pairs—arranged in an eight-by-eight grid. These are what are called single instruction multiple data (SIMD) compute engines designed to do the polynomial math, twiddling, and other things that make up computing in FHE and to do them in parallel. An on-chip 2D mesh network connects the tiles to each other with wide 512-byte buses.
RELATED: Tech Keeps Chatbots From Leaking Your Data
Important to making encrypted computing efficient is feeding those huge numbers to the compute cores quickly. The sheer amount of data involved meant linking 48 gigabytes' worth of expensive high-bandwidth memory to the processor over 819-gigabyte-per-second connections. Once on the chip, data musters in 64 megabytes of cache memory—somewhat more than an Nvidia Hopper-generation GPU. From there it can flow through the array at 9.6 terabytes per second by hopping from tile-pair to tile-pair.
To ensure that computing and moving data don’t get in each other’s way, Heracles runs three synchronized streams of instructions simultaneously, one for moving data onto and off of the processor, one for moving data within it, and a third for doing the math, Golder explained.
It all adds up to some massive speed ups, according to Intel. Heracles—operating at 1.2 gigahertz—takes just 39 microseconds to do FHE’s critical math transformation, a 2,355-fold improvement over an Intel Xeon CPU running at 3.5 GHz. Across seven key operations, Heracles was 1,074 to 5,547 times as fast.
The differing ranges have to do with how much data movement is involved in the operations, explains Mathew. “It’s all about balancing the movement of data with the crunching of numbers,” he says.
“It’s very good work,” Kurt Rohloff, chief technology officer at FHE software firm Duality Technology, says of the Heracles results. Duality was part of a team that developed a competing accelerator design under the same DARPA program that Intel conceived Heracles under. “When Intel starts talking about scale, that usually carries quite a bit of weight.”
Duality’s focus is less on new hardware than on software products that do the kind of encrypted queries that Intel demonstrated at ISSCC. At the scale in use today “there’s less of a need for [specialized] hardware,” says Rohloff. “Where you start to need hardware is emerging applications around deeper machine-learning oriented operations like neural net, LLMs, or semantic search.”
Last year, Duality demonstrated an FHE-encrypted language model called BERT. Like more famous LLMs such as ChatGPT, BERT is a transformer model. However, it's only one-tenth the size of even the most compact LLMs.
John Barrus, vice president of product at Dayton, Ohio-based Niobium Microsystems, an FHE chip startup spun out of another DARPA competitor, agrees that encrypted AI is a key target of FHE chips. “There are a lot of smaller models that, even with FHE’s data expansion, will run just fine on accelerated hardware,” he says.
With no stated commercial plans from Intel, Niobium expects its chip to be “the world’s first commercially viable FHE accelerator, designed to enable encrypted computations at speeds practical for real-world cloud and AI infrastructure.” Although it hasn’t announced when a commercial chip will be available, last month the startup revealed that it had inked a deal worth 10 billion South Korean won (US $6.9 million) with Seoul-based chip design firm Semifive to develop the FHE accelerator for fabrication using Samsung Foundry’s 8-nanometer process technology.
Other startups including Fabric Cryptography, Cornami, and Optalysys have been working on chips to accelerate FHE. Optalysys CEO Nick New says Heracles hits about the level of speedup you could hope for using an all-digital system. “We’re looking at pushing way past that digital limit,” he says. His company’s approach is to use the physics of a photonic chip to do FHE’s compute-intensive transform steps. That photonics chip is on its seventh generation, he says, and among the next steps is to 3D integrate it with custom silicon to do the non-transform steps and coordinate the whole process. A full 3D-stacked commercial chip could be ready in two or three years, says New.
While competitors develop their chips, so will Intel, says Mathew. It will be improving on how much the chip can accelerate computations by fine tuning the software. It will also be trying out more massive FHE problems, and exploring hardware improvements for a potential next generation. “This is like the first microprocessor… the start of a whole journey,” says Mathew.
Within the enclave itself, DRAM and PCIe connections between the CPU and GPU are encrypted, but the CPU registers and the GPU onboard memory are plaintext. So the computation is happening on plaintext data, it’s just extremely difficult to access it from even the machine running the enclave.
Honestly, I get the feeling it would be more expensive and more effort to backdoor it.