Thursday August 28 2025

Hacker Times

Home New Best Show About Trends

GMP damaging Zen 5 CPUs?

gmplib.org

209 33

Discussion (33 comments)

Listen to this article (with local TTS)

GMP damaging Zen 5 CPUs?

Background

We have fried two Ryzen 9950X CPUs in a few months by running GMP tests. This is not expected, of course. In this page, we provide as much information as possible to help analysing the problem.

What causes these CPUs to die while running GMP is unknown to us. While similar, it is not the widely reported Asrock motherboard problem, as the motherboards we use are of a different make (see below). It might be that the Zen 5 CPUs pull more power than specified when running GMP, or it might be that our cooling solutions are inadequate.

The 1st failure happened in late winter 2025 (February IIRC). The system was located in a dedicated room in a flat in Stockholm, Sweden. The ambient temperature in the room was low, less than 20°C.

The 2nd failure happened 2025-08-24 in a dedicated computer room, with ambient temperature just over 20°C.

After both failures, we took out the CPUs and found a discoloured area of perhaps 25mm² on the pin side. Here are two picture from different angles of the 2nd CPU:

We use a Noctua cooling solution for both systems. For the 1st system, we mounted the heat sink centred. For the 2nd system, we followed Noctua's advice of mounting things offset towards what they claim to be the hotter side of the CPU. Below is a picture of the 2nd system without the heat sink which shows that offset. Note the brackets and their pins, those pins are where the heat sink's pressure gets centred. Also note how the thermal paste has been squeezed away from that part, but is quite thick towards the left.

Here is a closeup on the heat sink, indicating how the off-centre mounting squeezed the paste to one end.

This is the cleaned front side of the CPU. Nothing much to see here, the visible damage is on its back.

System configuration

The 1st system had these components:

OS: GNU/Linux Ubuntu mbd: Asus Prime B650M-K (BIOS 3057 [flashed 2024-11-19]) cpu: AMD Zen5 X16 4300MHz (Ryzen 9950X) mem: 32768MB DDR5-4800 ECC (Samsung M324R4GA3BB0-CQK) mem: 32768MB DDR5-4800 ECC (Samsung M324R4GA3BB0-CQK) disk: NVMe SSD M.2 250GB Samsung 980 Pro case: Fractal Design Core 1100 with extra fans psu: Corsair SF450 cooler: Noctua NH-U9S

The 2nd system had these components:

OS: GNU/Linux Gentoo kernel 6.12.31 (config mbd: Asus Prime B650M-A WIFI II (BIOS 3278 [flashed 2025-08-16]) cpu: AMD Zen5 X16 4300MHz (Ryzen 9950X) mem: 49152MB DDR5-5600 ECC (Kingston KSM56E46BD8KM-48HM) mem: 49152MB DDR5-5600 ECC (Kingston KSM56E46BD8KM-48HM) disk: NVMe SSD M.2 250GB Samsung 980 Pro case: Fractal Design Core 1100 with extra fans psu: Corsair RM650 cooler: Noctua NH-U9S

Reasoning

We really don't know what is wrong here. Let us list some observations and thoughts:

The ambient temperature was in both cases quite low.
These are supposedly top-quality motherboard (We've had about 50 ASUS motherboard over the years, and only had one failure).
The offset mounting does not look right to us. But this is how things are supposed to be done.
It is possible that we caused the 2nd CPU failure by inadequate mounting of the heat sink. It surely seems bizarre to have the thermal paste squeezed like shown on the pictures above!
But note that the 1st failure happened with a more centred heat sink. We only made the off-centre mounting for the 2nd system as to minimise the risk of a repeated system failure.
The so-called TDP of the Ryzen 9950X is 170W. The used heat sinks are specified to dissipate 165W, so that seems tight. We have extra fans in the cases and a low ambient temperature, making these 5W seem negligible. But of course, this is not completely kosher.
We have a sister system for the 9950X, a 7950X which is similarly configured and built. That system actually seems to run slightly hotter, but it has been stable for a long time (under the same crazy load!).
The systems were under maximal load at the time their CPUs died, running very tight handcrafted asm loops sustaining one MULX per cycle. That might be a "hot" instruction.
Did the CPUs die of heat stroke? Modern CPUs measure their temperature and clock down if they get too hot, don't they?
We don't overclock or overvolt or play other teen games with our hardware.
The extremely thin layer of thermal paste which resulted from the off-centre heat sink mount, might seem fine. We however suspect that there might be a problem, as when the system heats and cools, things bend a little. With a remaining layer of thermal paste, such bending will not create any void as the paste's elasticity will maintain some contact. When the thermal paste is squeezed away as with the suggested Noctua mounting, an ever-so-slight void between the CPU and the heat sink might be created.
Neither of the 9950X CPUs died immediately, instead they died the exact same way after a couple of months at high load. This seems to suggest a gradual but predictable degradation.