July 22nd, 2024

July 2024 Update on Instability Reports on Intel Core 13th/14th Gen Desktop CPUs

Intel attributed instability in Core 13th and 14th Gen desktop processors to elevated operating voltage. A microcode patch is being developed for a mid-August release; forum users are waiting to test it and are asking Intel for interim guidance.

In July 2024, Intel provided an update on instability reports concerning its Core 13th and 14th Gen desktop processors. Its analysis found that a microcode algorithm was sending incorrect voltage requests, and that the resulting elevated operating voltage was causing the instability. Intel is delivering a microcode patch to correct the voltage requests, is continuing validation to confirm that the patch resolves the instability, and plans to release it to partners in mid-August. Customers experiencing instability with these processors are encouraged to contact Intel Customer Support for assistance.

In the forum thread, users said they looked forward to testing the update on different hardware configurations and acknowledged Intel's efforts to resolve the issue. Some asked what current processor owners should do until the update is available, and others asked Intel to coordinate with motherboard manufacturers on BIOS updates.

Related

Intel's woes with Core i9 CPUs crashing look worse than we thought

Intel is facing issues with Core i9 CPUs, including 13th and 14th-gen models, leading to crashes and errors. Data centers are affected, raising concerns about stability and support costs. Intel's response is crucial for restoring trust.

Dev reports Intel's laptop CPUs are also suffering from crashing issues

Dev reports Intel laptop CPUs facing crashing issues, extending to 13th and 14th-Gen processors. Instability persists despite attempted fixes, impacting flagship Core i9 HX series. Reports suggest widespread degradation, raising concerns for users.

Gamers Nexus: Intel 13th, 14th Gen CPU Oxidation Claims [video]

A YouTube video cautions against using Intel CPUs due to stability issues, lack of transparency, fabrication defects, memory instability, and reduced core multipliers. Trust in Intel products has eroded, and validation challenges are ongoing. Wendell from Level1Techs highlights concerns.

Intel says 13th and 14th Gen mobile CPUs are crashing

Intel acknowledges instability in 13th and 14th Gen mobile processors, citing different causes from desktop chips. Users advised to contact manufacturers. AMD's Ryzen 9000 launch before Intel's Arrow Lake adds pressure.

July 2024 Update on Instability Reports on Intel Core Processors

Intel addresses instability reports on Core 13th and 14th Gen processors due to voltage issues. A microcode patch is set for mid-August release after validation. Users await testing and urge communication with motherboard manufacturers.

AI: What people are saying
Comments on Intel's planned microcode patch for the high-voltage instability in Core 13th and 14th Gen processors are largely skeptical and concerned.
  • Many doubt that the issue is solely microcode-related, suspecting deeper hardware problems.
  • There are concerns about the long-term impact on CPU performance and durability due to previous over-voltage.
  • Some believe Intel's delayed response and patch release are strategically timed around competitor reviews.
  • Users are curious about the effectiveness of the microcode patch and whether it will necessitate recalls for unfixed CPUs.
  • Questions arise about Intel's communication strategy and the transparency of their announcements.
29 comments
By @phire - 3 months
I find it hard to believe that it actually is a microcode issue.

Mostly because Intel has way too much motivation to pass it off as a microcode issue, as they can fix a microcode issue for free, by pushing out a patch. If it's an actual hardware issue, then Intel will be forced to actually recall all the faulty CPUs, which could cost them billions.

The other reason is that it took them way too long to give details. If it's as simple as a buggy microcode requesting an out-of-spec voltage from the motherboard, they should have been able to diagnose the problem extremely quickly and fix it in just a few weeks. They would have detected the issue as soon as they put voltage logging on the motherboard's VRM. And according to some sources, Intel have apparently been shipping non-faulty CPUs for months now (since April, from memory), and those don't have an updated microcode.

This long delay and silence feels like they spent months of R&D trying to create a workaround, create a new voltage spec to provide the lowest voltage possible. Low enough to work around a hardware fault on as many units as possible, without too large of a performance regression, or creating new errors on other CPUs because of undervolting.

I suspect that this microcode update will only "fix" the crashes for some CPUs. My prediction is that in another month Intel will claim there are actually two completely independent issues, and reluctantly issue a recall for anything not fixed by the microcode.

By @HeliumHydride - 3 months
https://scholar.harvard.edu/files/mickens/files/theslowwinte...

"Unfortunately for John, the branches made a pact with Satan and quantum mechanics [...] In exchange for their last remaining bits of entropy, the branches cast evil spells on future generations of processors. Those evil spells had names like “scaling-induced voltage leaks” and “increasing levels of waste heat” [...] the branches, those vanquished foes from long ago, would have the last laugh."

"John was terrified by the collapse of the parallelism bubble, and he quickly discarded his plans for a 743-core processor that was dubbed The Hydra of Destiny and whose abstract Platonic ideal was briefly the third-best chess player in Gary, Indiana. Clutching a bottle of whiskey in one hand and a shotgun in the other, John scoured the research literature for ideas that might save his dreams of infinite scaling. He discovered several papers that described software-assisted hardware recovery. The basic idea was simple: if hardware suffers more transient failures as it gets smaller, why not allow software to detect erroneous computations and re-execute them? This idea seemed promising until John realized THAT IT WAS THE WORST IDEA EVER. Modern software barely works when the hardware is correct, so relying on software to correct hardware errors is like asking Godzilla to prevent Mega-Godzilla from terrorizing Japan. THIS DOES NOT LEAD TO RISING PROPERTY VALUES IN TOKYO. It’s better to stop scaling your transistors and avoid playing with monsters in the first place, instead of devising an elaborate series of monster checks-and-balances and then hoping that the monsters don’t do what monsters are always going to do because if they didn’t do those things, they’d be called dandelions or puppy hugs."

By @tux3 - 3 months
Remains to be seen how the microcode patch affects performance, and how these CPUs that have been affected by over-voltage to the point of instability will have aged in 6 months, or a few years from now.

More voltage generally improves stability, because there is more slack to close timing. Instability with high voltage suggests dangerous levels. A software patch can lower the voltage from this point on, but it can't take back any accumulated fatigue.

By @tpurves - 3 months
I think it's telling that they are delaying the microcode patch until after all the reviewers publish their Zen 5 reviews and the comparisons of those chips against current Raptor Lake performance.
By @userbinator - 3 months
Reminds me of Sudden Northwood Death Syndrome, 2002.

Looks like history may be repeating itself, or at least rhyming somewhat.

Back then, CPUs ran on fixed voltages and frequencies and only overclockers discovered the limits. Even then, it was rare to find reports of CPUs killed via overvolting, unless it was to an extreme extent --- thermal throttling, instability, and shutdown (THERMTRIP) seemed to occur before actual damage, preventing the latter from happening.

Now, with CPU manufacturers attempting to squeeze all the performance they can, they are essentially doing this overclocking/overvolting automatically and dynamically in firmware (microcode), and it's not surprising that some bug or (deliberate?) ignorance that overlooked reliability may have pushed things too far. Intel may have been more conservative with the absolute maximum voltages until recently, and of course small process sizes with higher potential for electromigration are a source of increased fragility.

Also anecdotal, but I have an 8th-gen mobile CPU that has been running hard against the thermal limits (100C) 24/7 for over 5 years (stock voltage, but with power limits all unlocked), and it is still 100% stable. This and other stories of CPUs in use for many years with clogged or even detached heatsinks seem to contribute to the evidence that high voltage is what kills CPUs, not heat or frequency.

Edit: I just looked up the VCore maximum for the 13th/14th Gen processors - the datasheet says 1.72V! That is far more than I expected for a 10nm process. For comparison, a 1st-gen i7 (45nm) was specified at 1.55V absolute maximum, and in the 32nm version they reduced that to 1.4V; then for the 22nm version it went up slightly to 1.52V.
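
The comparison above is simple arithmetic. As a rough sketch, taking the figures quoted in this comment at face value (they are not independently checked here, and the dictionary labels and function name are illustrative only), the relative increase over each older part can be computed like this:

```python
# Absolute-maximum VCore values exactly as quoted in the comment above.
# These are the commenter's numbers, not independently verified figures.
QUOTED_VCORE_MAX_V = {
    "1st-gen i7 (45nm)": 1.55,
    "32nm version": 1.40,
    "22nm version": 1.52,
}
QUOTED_13TH_14TH_GEN_V = 1.72


def percent_above(reference_v: float, value_v: float) -> float:
    """Return how far value_v sits above reference_v, in percent."""
    return (value_v - reference_v) / reference_v * 100.0


if __name__ == "__main__":
    for label, volts in QUOTED_VCORE_MAX_V.items():
        delta = percent_above(volts, QUOTED_13TH_14TH_GEN_V)
        print(f"{QUOTED_13TH_14TH_GEN_V}V is {delta:.1f}% above {label} ({volts}V)")
```

With the quoted numbers, 1.72V comes out roughly 11% above the 45nm figure, 23% above the 32nm figure, and 13% above the 22nm figure.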

By @magicalhippo - 3 months
There was recently[1] some talk about how the 13th/14th gen mobile chips also had similar issues, though Intel insisted it's something else.

Will be interesting to see how that pans out.

[1]: https://news.ycombinator.com/item?id=41026123

By @TazeTSchnitzel - 3 months
After watching https://youtube.com/watch?v=gTeubeCIwRw and some related content, I personally don't believe it's an issue fixable with microcode. I guess we'll see.
By @wnevets - 3 months
Are the CPUs that received elevated operating voltage permanently damaged?
By @Covzire - 3 months
Just want to say, I'm incredibly happy with my 7800X3D. It runs at ~70C max, like Intel chips used to, with a $35 air cooler, and it's on average the fastest chip for gaming workloads right now.
By @NBJack - 3 months
I was concerned this would happen to them, given how much power was being pushed through their chips to keep them competitive. I get the impression their innovation has either truly slowed down, or AMD thought enough 'moves' ahead with their tech/marketing/patents to paint them into a corner.

I don't think Intel is done though, at least not yet.

By @brynet - 3 months
Curious why Intel announced this on their community forums, rather than somewhere more official.
By @christkv - 3 months
The amount of current their chips pull on full boost is pretty crazy. It would definitely not surprise me if some could get damaged by extensive boosting.
By @cdchn - 3 months
I built a system last fall with an i9-13900K and have been having the weirdest crashing problems with certain games that I never had problems with before. NEVER been able to track it down, no thermal issues, no overclocking, all updated drivers and BIOS. Maybe this is finally the answer I've been looking for.
By @uticus - 3 months
Dumb question: let’s say I am in charge of procurement for a significant amount of machines, do I not have the option of ordering machines from three generations back? Are older (proven reliable) processors just not available because they’re no longer made, like my 1989 Camry?
By @firebaze - 3 months
Nice that Intel acknowledges there are problems with that CPU generation. If I read this right, the CPUs have been supplied with a too-high voltage across the board, with some tolerating the higher voltages for longer, others not so much.

Curious to see how this develops in terms of fixing defective silicon.

By @nubinetwork - 3 months
They already tried bios updates when they pushed out the "intel defaults" a couple months ago...
By @PedroBatista - 3 months
Good for Intel to finally "figure it out" but I'm not 100% sure microcode is 100% of the problem. As in everything complex enough, the "problem" can actually be many compounded problems; MB vendors' "special" tuning comes to mind.

But this is already a mess that is very hard to clean up, since I feel many of these CPUs will die in a year or two because of today's problems, but by then nobody will remember this and an RMA will be "difficult" to say the least.

By @Havoc - 3 months
> Intel is delivering a microcode patch which addresses the root cause of exposure to elevated voltages.

That’s great news for Intel, if that’s correct. If not, that’ll be a PR bloodbath.

By @salamo - 3 months
Is there any info on how to diagnose this problem? Having just put together a computer with the 14900KF, I really don't want to swap it out if not necessary.
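
No diagnostic procedure is given in the thread. One thing that can at least be checked, once a patched BIOS is available, is which microcode revision the CPU is actually running. A minimal sketch, assuming a Linux system whose /proc/cpuinfo exposes a `microcode` field (as x86 Linux does); the function name is made up for illustration, and the revision that contains the fix is not stated in the source:

```python
from pathlib import Path


def microcode_revisions(cpuinfo_text: str) -> set[str]:
    """Collect the distinct 'microcode' values from /proc/cpuinfo-style text."""
    revisions = set()
    for line in cpuinfo_text.splitlines():
        # Each entry looks like "key<tab>: value"; skip lines without a colon.
        key, sep, value = line.partition(":")
        if sep and key.strip() == "microcode":
            revisions.add(value.strip())
    return revisions


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        # Normally every logical CPU reports the same revision.
        print(sorted(microcode_revisions(cpuinfo.read_text())))
    else:
        print("/proc/cpuinfo not found; this sketch assumes Linux")
```

This only reports what is loaded. It says nothing about whether a given chip has already been degraded.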
By @ChoGGi - 3 months
Hmm, mid August is after the new Ryzens are out, I wonder how bad of a performance hit this microcode update will bring?

And will it actually fix the issue?

https://www.youtube.com/watch?v=QzHcrbT5D_Y

By @ChrisArchitect - 3 months
(updated from other post about mobile crashes)

Related:

Complaints about crashing 13th,14th Gen Intel CPUs now have data to back them up

https://news.ycombinator.com/item?id=40962736

Intel is selling defective 13-14th Gen CPUs

https://news.ycombinator.com/item?id=40946644

Intel's woes with Core i9 CPUs crashing look worse than we thought

https://news.ycombinator.com/item?id=40954500

Warframe devs report 80% of game crashes happen on Intel's Core i9 chips

https://news.ycombinator.com/item?id=40961637

By @whalesalad - 3 months
If I didn’t just recently invest in 128gb of DDR4 I’d jump ship to AMD/AM5. My 13900k has been (knock on wood) solid though - with 24/7 uptime since July 2023.
By @eigenform - 3 months
by "microcode" i assume they meant "pcode" for the PCU? (but they decided not to make that distinction here for whatever reason?)
By @Night_Thastus - 3 months
"Elevated operating voltage" my foot.

We've already seen examples of this happening on non-OC'd server-style motherboards that perfectly adhere to the intel spec. This isn't like ASUS going 'hur dur 20% more voltage' and frying chips. If that's all it was it would be obvious.

Lowering voltage may help mitigate the problem, but it sure as shit isn't the cause.

By @acrispino - 3 months
An Intel employee is posting on reddit: https://www.reddit.com/r/intel/comments/1e9mf04/intel_core_1...

A recent YouTube video by GamersNexus speculated the cause of instability might be a manufacturing issue. The employee's response follows.

Questions about manufacturing or Via Oxidation as reported by Tech outlets:

Short answer: We can confirm there was a via Oxidation manufacturing issue (addressed back in 2023) but it is not related to the instability issue.

Long answer: We can confirm that the via Oxidation manufacturing issue affected some early Intel Core 13th Gen desktop processors. However, the issue was root caused and addressed with manufacturing improvements and screens in 2023. We have also looked at it from the instability reports on Intel Core 13th Gen desktop processors and the analysis to-date has determined that only a small number of instability reports can be connected to the manufacturing issue.

For the Instability issue, we are delivering a microcode patch which addresses exposure to elevated voltages which is a key element of the Instability issue. We are currently validating the microcode patch to ensure the instability issues for 13th/14th Gen are addressed

By @loufe - 3 months
Intel cannot afford to be anything but outstanding in terms of customer experience right now. They are getting assaulted on all fronts and need to do a lot to improve their image to stay competitive.
By @xyst - 3 months
Wonder what Linus has to say on this. Dude knows how to rip into crappy Intel products
By @fefe23 - 3 months
So on one hand they are saying it's voltage (i.e. something external, not their fault, bad mainboard manufacturers!).

On the other hand they are saying they will fix it in microcode. How is that even possible?

Are they saying that their CPUs are signaling the mainboards to give them too much voltage?

Can someone make sense of this? It reminds me of Steve Jobs' You Are Holding It Wrong moment.