July 2024 Update on Instability Reports on Intel Core 13th/14th Gen Desktop CPUs
Intel addressed instability in Core 13th and 14th Gen processors due to high voltage. A microcode patch is being developed for mid-August release, with users awaiting testing and seeking guidance from Intel.
In July 2024, Intel provided an update on instability reports concerning their Core 13th and 14th Gen desktop processors. The analysis revealed that elevated operating voltage was causing instability due to a microcode algorithm issue. Intel is addressing this by delivering a microcode patch to correct the voltage requests. They are conducting further validation to ensure stability and plan to release the patch to partners in mid-August. Intel encourages customers experiencing instability with these processors to contact their Customer Support for assistance. Users in the forum expressed anticipation for testing the update with different hardware configurations and acknowledged Intel's efforts to resolve the issue. Additionally, there were queries about guidance for current processor owners until the update is available and requests for Intel to communicate with motherboard manufacturers regarding BIOS updates.
Related
Intel's woes with Core i9 CPUs crashing look worse than we thought
Intel is facing issues with Core i9 CPUs, including 13th and 14th-gen models, leading to crashes and errors. Data centers are affected, raising concerns about stability and support costs. Intel's response is crucial for restoring trust.
Dev reports Intel's laptop CPUs are also suffering from crashing issues
Dev reports Intel laptop CPUs facing crashing issues, extending to 13th and 14th-Gen processors. Instability persists despite attempted fixes, impacting flagship Core i9 HX series. Reports suggest widespread degradation, raising concerns for users.
Gamers Nexus: Intel 13th, 14th Gen CPU Oxidation Claims [video]
A YouTube video cautions against using Intel CPUs due to stability issues, lack of transparency, fabrication defects, memory instability, and reduced core multipliers. Trust in Intel products eroded, with ongoing validation challenges. Wendell from Level1Techs highlights concerns.
Intel says 13th and 14th Gen mobile CPUs are crashing
Intel acknowledges instability in 13th and 14th Gen mobile processors, citing different causes from desktop chips. Users advised to contact manufacturers. AMD's Ryzen 9000 launch before Intel's Arrow Lake adds pressure.
July 2024 Update on Instability Reports on Intel Core Processors
Intel addresses instability reports on Core 13th and 14th Gen processors due to voltage issues. A microcode patch is set for mid-August release after validation. Users await testing and urge communication with motherboard manufacturers.
- Many doubt that the issue is solely microcode-related, suspecting deeper hardware problems.
- There are concerns about the long-term impact on CPU performance and durability due to previous over-voltage.
- Some believe Intel's delayed response and patch release are strategically timed around competitor reviews.
- Users are curious about the effectiveness of the microcode patch and whether it will necessitate recalls for unfixed CPUs.
- Questions arise about Intel's communication strategy and the transparency of their announcements.
Mostly because Intel has way too much motivation to pass it off as a microcode issue, as they can fix a microcode issue for free, by pushing out a patch. If it's an actual hardware issue, then Intel will be forced to actually recall all the faulty CPUs, which could cost them billions.
The other reason is that it took them way too long to give details. If it's as simple as a buggy microcode requesting an out-of-spec voltage from the motherboard, they should have been able to diagnose the problem extremely quickly and fix it in just a few weeks. They would have detected the issue as soon as they put voltage logging on the motherboard's VRM. And according to some sources, Intel have apparently been shipping non-faulty CPUs for months now (since April, from memory), and those don't have an updated microcode.
This long delay and silence feels like they spent months of R&D trying to create a workaround: a new voltage spec that provides the lowest voltage possible. Low enough to work around a hardware fault on as many units as possible, without too large a performance regression and without creating new errors on other CPUs because of undervolting.
I suspect that this microcode update will only "fix" the crashes for some CPUs. My prediction is that in another month Intel will claim there are actually two completely independent issues, and reluctantly issue a recall for anything not fixed by the microcode.
"Unfortunately for John, the branches made a pact with Satan and quantum mechanics [...] In exchange for their last remaining bits of entropy, the branches cast evil spells on future generations of processors. Those evil spells had names like “scaling-induced voltage leaks” and “increasing levels of waste heat” [...] the branches, those vanquished foes from long ago, would have the last laugh."
"John was terrified by the collapse of the parallelism bubble, and he quickly discarded his plans for a 743-core processor that was dubbed The Hydra of Destiny and whose abstract Platonic ideal was briefly the third-best chess player in Gary, Indiana. Clutching a bottle of whiskey in one hand and a shotgun in the other, John scoured the research literature for ideas that might save his dreams of infinite scaling. He discovered several papers that described software-assisted hardware recovery. The basic idea was simple: if hardware suffers more transient failures as it gets smaller, why not allow software to detect erroneous computations and re-execute them? This idea seemed promising until John realized THAT IT WAS THE WORST IDEA EVER. Modern software barely works when the hardware is correct, so relying on software to correct hardware errors is like asking Godzilla to prevent Mega-Godzilla from terrorizing Japan. THIS DOES NOT LEAD TO RISING PROPERTY VALUES IN TOKYO. It’s better to stop scaling your transistors and avoid playing with monsters in the first place, instead of devising an elaborate series of monster checks-and-balances and then hoping that the monsters don’t do what monsters are always going to do because if they didn’t do those things, they’d be called dandelions or puppy hugs."
More voltage generally improves stability, because there is more slack to close timing. Instability with high voltage suggests dangerous levels. A software patch can lower the voltage from this point on, but it can't take back any accumulated fatigue.
Looks like history may be repeating itself, or at least rhyming somewhat.
Back then, CPUs ran on fixed voltages and frequencies and only overclockers discovered the limits. Even then, it was rare to find reports of CPUs killed via overvolting, unless it was to an extreme extent; thermal throttling, instability, and shutdown (THERMTRIP) seemed to occur before actual damage, preventing the latter from happening.
Now, with CPU manufacturers attempting to squeeze all the performance they can, they are essentially doing this overclocking/overvolting automatically and dynamically in firmware (microcode), and it's not surprising that some bug or (deliberate?) ignorance that overlooked reliability may have pushed things too far. Intel may have been more conservative with the absolute maximum voltages until recently, and of course small process sizes with higher potential for electromigration are a source of increased fragility.
Also anecdotal, but I have an 8th-gen mobile CPU that has been running hard against the thermal limits (100C) 24/7 for over 5 years (stock voltage, but with power limits all unlocked), and it is still 100% stable. This and other stories of CPUs in use for many years with clogged or even detached heatsinks seem to add to the evidence that high voltage is what kills CPUs, not heat or frequency.
Edit: I just looked up the VCore maximum for the 13th/14th Gen processors - the datasheet says 1.72V! That is far more than I expected for a 10nm process. For comparison, a 1st-gen i7 (45nm) was specified at 1.55V absolute maximum, and in the 32nm version they reduced that to 1.4V; then for the 22nm version it went up slightly to 1.52V.
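To put the figures quoted above side by side, here is a minimal Python sketch that compares each absolute-maximum voltage against the 45nm part. The numbers are exactly the ones cited in the comment and have not been checked against the datasheets; the labels and the `relative_change` helper are made up for illustration.

```python
# Absolute-maximum core voltages as quoted in the comment above.
# These are the commenter's figures, not independently verified datasheet values.
vmax_by_node = [
    ("1st-gen i7", 45, 1.55),      # (label, process node in nm, max VCore in volts)
    ("32nm version", 32, 1.40),
    ("22nm version", 22, 1.52),
    ("13th/14th Gen", 10, 1.72),
]


def relative_change(value: float, baseline: float) -> float:
    """Percentage difference of value relative to baseline (hypothetical helper)."""
    return (value - baseline) / baseline * 100.0


# Use the 45nm part as the reference point.
baseline_volts = vmax_by_node[0][2]

rows = [
    (label, node_nm, volts, relative_change(volts, baseline_volts))
    for label, node_nm, volts in vmax_by_node
]

for label, node_nm, volts, pct in rows:
    print(f"{label:<14} {node_nm:>3} nm  {volts:.2f} V  {pct:+.1f}% vs 45nm part")
```

On these numbers the 13th/14th Gen limit sits roughly 11% above the 45nm figure, even though the process node is several generations smaller.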
Will be interesting to see how that pans out.
I don't think Intel is done though, at least not yet.
Curious to see how this develops in terms of fixing defective silicon.
But this is already a mess that will be very hard to clean up, since I feel many of these CPUs will die in a year or two because of today's problems, but by then nobody will remember this and an RMA will be "difficult", to say the least.
That’s great news for Intel, if that’s correct. If not, that’ll be a PR bloodbath.
And will it actually fix the issue?
Related:
Complaints about crashing 13th,14th Gen Intel CPUs now have data to back them up
https://news.ycombinator.com/item?id=40962736
Intel is selling defective 13-14th Gen CPUs
https://news.ycombinator.com/item?id=40946644
Intel's woes with Core i9 CPUs crashing look worse than we thought
https://news.ycombinator.com/item?id=40954500
Warframe devs report 80% of game crashes happen on Intel's Core i9 chips
We've already seen examples of this happening on non-OC'd server-style motherboards that perfectly adhere to the intel spec. This isn't like ASUS going 'hur dur 20% more voltage' and frying chips. If that's all it was it would be obvious.
Lowering voltage may help mitigate the problem, but it sure as shit isn't the cause.
A recent YouTube video by GamersNexus speculated the cause of instability might be a manufacturing issue. The employee's response follows.
Questions about manufacturing or Via Oxidation as reported by Tech outlets:
Short answer: We can confirm there was a via Oxidation manufacturing issue (addressed back in 2023) but it is not related to the instability issue.
Long answer: We can confirm that the via Oxidation manufacturing issue affected some early Intel Core 13th Gen desktop processors. However, the issue was root caused and addressed with manufacturing improvements and screens in 2023. We have also looked at it from the instability reports on Intel Core 13th Gen desktop processors and the analysis to-date has determined that only a small number of instability reports can be connected to the manufacturing issue.
For the Instability issue, we are delivering a microcode patch which addresses exposure to elevated voltages which is a key element of the Instability issue. We are currently validating the microcode patch to ensure the instability issues for 13th/14th Gen are addressed.
On the other hand they are saying they will fix it in microcode. How is that even possible?
Are they saying that their CPUs are signaling the mainboards to give them too much voltage?
Can someone make sense of this? It reminds me of Steve Jobs' You Are Holding It Wrong moment.