There are two non-exclusive paths I'm thinking of at the moment:
1. DRM: Might this enable a next level of DRM?
2. Hardware attestation: Might this enable a deeper level of hardware attestation?
In general, this solution would be expensive and targeted at data lakes, or areas where you want to run computation but not necessarily expose the data.
With regard to DRM, one key thing to remember is that it has to be cheap and widely deployable. Part of the reason DVDs were easily broken is that the chosen algorithm was computationally inexpensive, so it could be installed on as many clients as possible.
That is a nice speed-up compared to generic hardware, but everyone probably wants to know how much slower it is than performing the same operations on plaintext data. I'm sure a 50% penalty is acceptable; 95% probably is not.
After nearly three decades of critical technology systems architecture and management, with ongoing industry audits, I know why my hair has lost some of its color. Much of that lost color comes from security management of third-party systems: yes, the old dreaded dependencies. Eliminating those third parties is key to one's cyber sanity and hair color, yet with technology still in its infancy, some cannot distinguish the forest from the trees.
Nothing remains the same as progress moves forward, correcting past mistakes and learning along the way what works and what does not; technology platforms are no exception. Analogously, early automobiles lacked safety features such as windshield wipers and seatbelts, and the passage of time has proved their addition valuable. Few people today truly understand how things work; nearly all just want the instant-fix "pill" to alleviate their issues. That approach cannot work with security. True security is designed in from the foundation, and such secure platforms go unseen, yet we have an endless list of victims of insecure systems that "bolted on" security after the fact. This security change and more is coming to system designs, as the entire world is now fully aware of cyber security, or in this case, the lack of it.
Time: the young fail to consider it until a single moment in their life, while the old reflect on where theirs went. After that reflection on one's time, however, change becomes obvious.
However... in a world where privacy is constantly and intentionally eroded by governments and private companies, I think this will NEVER, ever reach any consumer-grade hardware. My cynical side could envision a worldwide technology export ban in the vein of RSA [0].
Why would any company offer customers real out-of-the-box E2E encryption built into their devices?
DRM was mentioned by another user. This will not be used to enable privacy for the masses.
https://en.wikipedia.org/wiki/Export_of_cryptography_from_th...
If you need to trust both the encryption and the hardware itself, it may not be suitable for your environment or threat model.
If computation can happen directly on encrypted data, does that reduce the need for trusted environments like SGX/TEE, or does it mostly complement them?
But when homomorphic encryption becomes efficient, perhaps governments can force companies to apply it (though they would lose their opportunity for backdooring, but E2EE is a thing too so I wouldn't worry too much).
Same here.
Can't wait to KYC myself in order to use a CPU.
2. No, anyone can run the FHE computations anywhere on any hardware if they have the evaluation key (which would also have to be present in any FHE hardware).
I think eGovernment is the main use case: not super high traffic (we're not voting every day), but very high privacy expectations.
It's not related to DRM or trusted computing.
We are not their clients anymore; we are just another product to sell. So they do not design chips for us, but for the benefit of other corporations.
3. Unskippable ads with data gathering at the CPU level.
This hardware won’t make the technique attractive for ALL computation. But, it could dramatically increase the range of applications.
That rules out anything latency-sensitive, but for batch workloads like aggregating encrypted medical records or running simple ML inference on private data it starts to become practical. The real unlock is not raw speed parity but getting FHE fast enough that you can justify the privacy tradeoff for specific regulated workloads.
The PC market was made shitty enough this year that mid/high-class Mac Pros/laptops are actually often a better value now (if and only if your use case is covered software-wise).
Intel does plan an RTX + amd64 SoC soon, but still pooched the memory interface with a 30-year-old mailbox kludge. Intel probably won't survive this choice without bailouts. =3
It's truly amazing how modern people just blithely sacrifice their privacy and integrity for no good reason. Just to let big tech corporations more efficiently siphon money out of the market. And then they fight you passionately when you call out those companies for being unnecessarily invasive and intrusive.
The four horsemen of the infocalypse are such profoundly reliable boogeymen, we really need a huge psychological study across all modern cultures to see why they're so effective at dismantling rational thought in the general public, and how we can inoculate society against them without damaging other important social behaviors.
Why not, when the government can just force companies to backdoor their hardware for them? That way users are secure most of the time, except from the government (until the backdoor in Intel's chips gets discovered anyway). Users get a false sense of security/privacy, so people are more likely to share their secrets with corporations, and the government gets to spy on people communicating more openly with each other.
There is basically no business demand beyond sellers and scholars.
Are we reading the same article? It's talking about homomorphic encryption, i.e. doing mathematical operations on already-encrypted data, without being aware of its cleartext contents. It's not related to SGX or other trusted computing technologies.
The textbook example application of FHE is phone-book search. The server "multiplies" the whole phonebook database file with your encrypted query, and sends the whole database file back to you every time, regardless of the query. When you decrypt the file with the key used to encrypt the query, the database is all corrupt and garbled except for the rows matching the query, so the search has effectively taken place. The only information that exists in the clear is the fact that a query happened and the size of the entire database.
Sounds fantastically energy-efficient, no? That's the problem with FHE, not risks of backdooring.
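The selection trick described above can be sketched with a toy additively homomorphic scheme. This is a minimal Paillier implementation (additively homomorphic only, not fully homomorphic, with primes far too small to be secure; all names and parameters are illustrative): the client sends an encrypted one-hot query vector, and the server selects a row without learning which one.

```python
import math
import random

# Toy Paillier keypair (tiny primes for illustration only; NOT secure)
p, q = 10007, 10009
n, n2 = p * q, (p * q) ** 2
g = n + 1
lam = math.lcm(p - 1, q - 1)

def L(x):
    return (x - 1) // n

mu = pow(L(pow(g, lam, n2)), -1, n)  # decryption constant

def enc(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def dec(c):
    return (L(pow(c, lam, n2)) * mu) % n

# Client: encrypt a one-hot selection vector asking for row 2
db = [42, 77, 1234, 9]  # server's plaintext "phonebook"
query = [enc(1 if i == 2 else 0) for i in range(len(db))]

# Server: homomorphically compute sum_i q_i * db_i without decrypting
# anything: E(m)^v = E(m*v), and a product of ciphertexts = E(sum)
result = 1
for c, v in zip(query, db):
    result = (result * pow(c, v, n2)) % n2

print(dec(result))  # -> 1234
```

The server touches every row for every query, which is exactly the bandwidth and energy cost the comment above is pointing at.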
I remember thinking how fun it was! I could see unfolding before me endless ways to configure, reconfigure, optimize, etc.
I know there are a few open-source chip efforts, but I wonder whether now is the time to pull the community together and organize more intentionally around that. Maybe open-source chipsets won't be as fast as their corporate counterparts, but I think we are definitely at an inflection point in society where we need this to maintain freedom.
If anyone is working in that area, I am very interested. I am very green, but I still have the old textbooks I could dust off (I just don't have the ol' college-provided Mentor Graphics, or I guess Siemens now, design tools anymore).
[1] https://confer.to/blog/2025/12/confessions-to-a-data-lake/
First you encrypt the data. Then you send it to hardware to compute, get result back and decrypt it.
A: "Intel/AMD is adding instructions to accelerate AES"
B: "Might this enable a next level of DRM? Might this enable a deeper level of hardware attestation?"
A: "wtf are you talking about? It's just instructions to make certain types of computations faster, it has nothing to do with DRM or hardware attestation."
B: "Not yet."
I'm sure in some way it probably helps DRM or hardware attestation to some extent, but not any more than say, 3nm process node helps DRM or hardware attestation by making it faster.
The future is bleak.
But making them available to customers, say even as a PCIe card, and then having that automatically encrypt everything you run today over an encrypted connection would be a dream.
The correct solution isn't yet another cloud service, but rather local models.
That's my point, this sounds like a way to create a backdoor for at-rest data.
Worried that your latest ask to a cloud-based AI reveals a bit too much about you? Want to know your genetic risk of disease without revealing it to the services that compute the answer?
There is a way to do computing on encrypted data without ever having it decrypted. It’s called fully homomorphic encryption, or FHE. But there’s a rather large catch. It can take thousands—even tens of thousands—of times longer to compute on today’s CPUs and GPUs than simply working with the decrypted data.
So universities, startups, and at least one processor giant have been working on specialized chips that could close that gap. Last month at the IEEE International Solid-State Circuits Conference (ISSCC) in San Francisco, Intel demonstrated its answer, Heracles, which sped up FHE computing tasks as much as 5,000-fold compared to a top-of-the-line Intel server CPU.
Startups are racing to beat Intel and each other to commercialization. But Sanu Mathew, who leads security circuits research at Intel, believes the CPU giant has a big lead, because its chip can do more computing than any other FHE accelerator yet built. “Heracles is the first hardware that works at scale,” he says.
The scale is measurable both physically and in compute performance. While other FHE research chips have been in the range of 10 square millimeters or less, Heracles is about 20 times that size and is built using Intel’s most advanced, 3-nanometer FinFET technology. And it’s flanked inside a liquid-cooled package by two 24-gigabyte high-bandwidth memory chips—a configuration usually seen only in GPUs for training AI.
RELATED: How to Compute with Data You Can’t See
In terms of scaling compute performance, Heracles showed muscle in live demonstrations at ISSCC. At its heart the demo was a simple private query to a secure server. It simulated a request by a voter to make sure that her ballot had been registered correctly. The state, in this case, has an encrypted database of voters and their votes. To maintain her privacy, the voter would not want to have her ballot information decrypted at any point; so using FHE, she encrypts her ID and vote and sends it to the government database. There, without decrypting it, the system determines if it is a match and returns an encrypted answer, which she then decrypts on her side.
On an Intel Xeon server CPU, the process took 15 milliseconds. Heracles did it in 14 microseconds. While that difference isn’t something a single human would notice, verifying 100 million voter ballots adds up to more than 17 days of CPU work versus a mere 23 minutes on Heracles.
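The article's scaling claim checks out with back-of-the-envelope arithmetic (assuming the stated per-query times apply unchanged at scale):

```python
# Sanity check of the article's figures: 15 ms/query (Xeon) vs 14 us/query
queries = 100_000_000
cpu_days = queries * 15e-3 / 86_400       # seconds -> days
heracles_minutes = queries * 14e-6 / 60   # seconds -> minutes
print(round(cpu_days, 1), round(heracles_minutes, 1))  # -> 17.4 23.3
```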
Looking back on the five-year journey to bring the Heracles chip to life, Ro Cammarota, who led the project at Intel until last December and is now at University of California Irvine, says “we have proven and delivered everything that we promised.”
FHE is fundamentally a mathematical transformation, sort of like the Fourier transform. It encrypts data using a quantum-computer-proof algorithm, but, crucially, uses corollaries to the mathematical operations usually used on unencrypted data. These corollaries achieve the same ends on the encrypted data.
One of the main things holding such secure computing back is the explosion in the size of the data once it’s encrypted for FHE, Anupam Golder, a research scientist at Intel’s circuits research lab, told engineers at ISSCC. “Usually, the size of cipher text is the same as the size of plain text, but for FHE it’s orders of magnitude larger,” he said.
While the sheer volume is a big problem, the kinds of computing you need to do with that data are also an issue. FHE is all about very large numbers that must be computed with precision. While a CPU can do that, it's very slow going: integer addition and multiplication take about 10,000 times more clock cycles in FHE. Worse still, CPUs aren't built to do such computing in parallel. Although GPUs excel at parallel operations, precision is not their strong suit. (In fact, from generation to generation, GPU designers have devoted more and more of the chip's resources to computing less and less precise numbers.)
FHE also requires some oddball operations with names like “twiddling” and “automorphism,” and it relies on a compute-intensive noise-cancelling process called bootstrapping. None of these things are efficient on a general-purpose processor. So, while clever algorithms and libraries of software cheats have been developed over the years, the need for a hardware accelerator remains if FHE is going to tackle large-scale problems, says Cammarota.
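The "twiddling" mentioned above refers to the twiddle factors of the number-theoretic transform (NTT), the polynomial transform at the core of lattice-based FHE schemes. A deliberately naive O(n²) sketch over a tiny prime field (parameters chosen purely for illustration; real FHE uses much larger primes and fast butterfly NTTs):

```python
P = 257      # NTT-friendly prime (2**8 divides P - 1)
OMEGA = 16   # primitive 4th root of unity mod 257 (16**2 = -1 mod 257)
N = 4

def ntt(a, w):
    # Naive DFT over Z_P; the powers w**(i*j) are the "twiddle factors"
    return [sum(a[j] * pow(w, i * j, P) for j in range(N)) % P
            for i in range(N)]

def intt(a):
    inv_n = pow(N, -1, P)
    return [(x * inv_n) % P for x in ntt(a, pow(OMEGA, -1, P))]

# Polynomial multiplication mod (x**4 - 1): transform, multiply
# pointwise, transform back -- the pattern FHE hardware accelerates
f, g = [1, 2, 0, 0], [3, 4, 0, 0]   # (1 + 2x) * (3 + 4x)
F, G = ntt(f, OMEGA), ntt(g, OMEGA)
h = intt([(x * y) % P for x, y in zip(F, G)])
print(h)  # -> [3, 10, 8, 0], i.e. 3 + 10x + 8x**2
```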
Heracles was initiated under a DARPA program five years ago to accelerate FHE using purpose-built hardware. It was developed as “a whole system-level effort that went all the way from theory and algorithms down to the circuit design,” says Cammarota.
Among the first problems was how to compute with numbers that were larger than even the 64-bit words that are today a CPU’s most precise. There are ways to break up these gigantic numbers into chunks of bits that can be calculated independently of each other, providing a degree of parallelism. Early on, the Intel team made a big bet that they would be able to make this work in smaller, 32-bit chunks, yet still maintain the needed precision. This decision gave the Heracles architecture some speed and parallelism, because the 32-bit arithmetic circuits are considerably smaller than 64-bit ones, explains Cammarota.
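One standard way to do this chunking (and plausibly what is meant here, though Intel's exact scheme isn't spelled out in the article) is a residue number system: represent each huge integer by its residues modulo several pairwise-coprime moduli that each fit in 32 bits, operate on every residue lane independently, and reconstruct the result with the Chinese Remainder Theorem. A minimal sketch with illustrative moduli:

```python
from math import prod

# Pairwise-coprime moduli, each fitting in 32 bits (illustrative choice)
MODULI = [2**31 - 1, 2**31, 2**31 + 1]
M = prod(MODULI)

def to_rns(x):
    return [x % m for m in MODULI]

def rns_mul(a, b):
    # Each residue lane is independent, so lanes can run in parallel
    # on small 32-bit arithmetic units
    return [(x * y) % m for x, y, m in zip(a, b, MODULI)]

def from_rns(residues):
    # Chinese Remainder Theorem reconstruction
    x = 0
    for r, m in zip(residues, MODULI):
        Mi = M // m
        x = (x + r * Mi * pow(Mi, -1, m)) % M
    return x

a, b = 123_456_789_012_345, 987_654_321
assert from_rns(rns_mul(to_rns(a), to_rns(b))) == (a * b) % M
```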
At Heracles’ heart are 64 compute cores—called tile-pairs—arranged in an eight-by-eight grid. These are what are called single instruction multiple data (SIMD) compute engines designed to do the polynomial math, twiddling, and other things that make up computing in FHE and to do them in parallel. An on-chip 2D mesh network connects the tiles to each other with wide 512-byte buses.
RELATED: Tech Keeps Chatbots From Leaking Your Data
Important to making encrypted computing efficient is feeding those huge numbers to the compute cores quickly. The sheer amount of data involved meant linking 48 gigabytes' worth of expensive high-bandwidth memory to the processor over 819-gigabyte-per-second connections. Once on the chip, data musters in 64 megabytes of cache memory—somewhat more than an Nvidia Hopper-generation GPU. From there it can flow through the array at 9.6 terabytes per second by hopping from tile-pair to tile-pair.
To ensure that computing and moving data don’t get in each other’s way, Heracles runs three synchronized streams of instructions simultaneously, one for moving data onto and off of the processor, one for moving data within it, and a third for doing the math, Golder explained.
It all adds up to some massive speed ups, according to Intel. Heracles—operating at 1.2 gigahertz—takes just 39 microseconds to do FHE’s critical math transformation, a 2,355-fold improvement over an Intel Xeon CPU running at 3.5 GHz. Across seven key operations, Heracles was 1,074 to 5,547 times as fast.
The differing ranges have to do with how much data movement is involved in the operations, explains Mathew. “It’s all about balancing the movement of data with the crunching of numbers,” he says.
“It’s very good work,” Kurt Rohloff, chief technology officer at FHE software firm Duality Technology, says of the Heracles results. Duality was part of a team that developed a competing accelerator design under the same DARPA program that Intel conceived Heracles under. “When Intel starts talking about scale, that usually carries quite a bit of weight.”
Duality’s focus is less on new hardware than on software products that do the kind of encrypted queries that Intel demonstrated at ISSCC. At the scale in use today “there’s less of a need for [specialized] hardware,” says Rohloff. “Where you start to need hardware is emerging applications around deeper machine-learning oriented operations like neural net, LLMs, or semantic search.”
Last year, Duality demonstrated an FHE-encrypted language model called BERT. Like more famous LLMs such as ChatGPT, BERT is a transformer model. However, it's only one-tenth the size of even the most compact LLMs.
John Barrus, vice president of product at Dayton, Ohio-based Niobium Microsystems, an FHE chip startup spun out of another DARPA competitor, agrees that encrypted AI is a key target of FHE chips. “There are a lot of smaller models that, even with FHE’s data expansion, will run just fine on accelerated hardware,” he says.
With no stated commercial plans from Intel, Niobium expects its chip to be “the world’s first commercially viable FHE accelerator, designed to enable encrypted computations at speeds practical for real-world cloud and AI infrastructure.” Although it hasn’t announced when a commercial chip will be available, last month the startup revealed that it had inked a deal worth 10 billion South Korean won (US $6.9 million) with Seoul-based chip design firm Semifive to develop the FHE accelerator for fabrication using Samsung Foundry’s 8-nanometer process technology.
Other startups including Fabric Cryptography, Cornami, and Optalysys have been working on chips to accelerate FHE. Optalysys CEO Nick New says Heracles hits about the level of speedup you could hope for using an all-digital system. “We’re looking at pushing way past that digital limit,” he says. His company’s approach is to use the physics of a photonic chip to do FHE’s compute-intensive transform steps. That photonics chip is on its seventh generation, he says, and among the next steps is to 3D integrate it with custom silicon to do the non-transform steps and coordinate the whole process. A full 3D-stacked commercial chip could be ready in two or three years, says New.
While competitors develop their chips, so will Intel, says Mathew. It will be improving on how much the chip can accelerate computations by fine tuning the software. It will also be trying out more massive FHE problems, and exploring hardware improvements for a potential next generation. “This is like the first microprocessor… the start of a whole journey,” says Mathew.
Within the enclave itself, DRAM and PCIe connections between the CPU and GPU are encrypted, but the CPU registers and the GPU onboard memory are plaintext. So the computation is happening on plaintext data, it’s just extremely difficult to access it from even the machine running the enclave.
Honestly, I get the feeling it would be more expensive and more effort to backdoor it.