September 27th, 2024

Lion Cove: Intel's P-Core Roars

Intel's Lion Cove mobile CPU architecture improves performance and energy efficiency with a chiplet design, lower latency, a new mid-level cache, and a split scheduler, but L3 bandwidth lags behind AMD.

Intel's latest mobile CPU architecture, Lion Cove, is designed to enhance performance and energy efficiency in its P-Core lineup, particularly within the Lunar Lake platform. This architecture represents a significant evolution from the previous generation, Meteor Lake, by adopting a chiplet design that consolidates compute functions onto a single tile while using a separate tile for low-speed I/O. Lion Cove aims to maximize per-thread performance, which is crucial for applications that do not use multiple cores effectively. The architecture features a ring bus interconnect, lower L3 latency, and lower DRAM latency because the memory controller sits on the same tile as the CPU cores. The new design includes a mid-level cache, referred to as L1.5, which reduces latency for L1D misses. Lion Cove also introduces a split scheduler for integer and floating-point operations, increasing execution capacity and efficiency. Despite these improvements, L3 bandwidth remains a relative weakness compared to competitors such as AMD's Zen 5 architecture. Overall, Lion Cove's advancements position Intel to better compete in the mobile CPU market against AMD and others.

- Lion Cove architecture enhances performance and energy efficiency for Intel's mobile CPUs.

- The design consolidates compute functions onto a single tile, improving latency and efficiency.

- A new mid-level cache (L1.5) reduces latency for L1D misses.

- The architecture features a split scheduler for better execution capacity.

- L3 bandwidth remains a challenge compared to AMD's offerings.

Intel's big plan to take on Qualcomm; promises that x86 is here to stay

Intel has launched its Lunar Lake chips, the Core Ultra 200V, enhancing power efficiency and performance in mobile computing, with models available for preorder and general availability on September 24, 2024.

Intel announces first batch of second-gen "Lunar Lake" Core Ultra laptop CPUs

Intel's next-generation Core Ultra processors, launching on September 24, 2024, promise enhanced battery life, power efficiency, and a neural processing engine for AI tasks, addressing compatibility issues with Arm-based systems.

Intel Core Ultra 200V Series Lunar Lake Launched

Intel launched its Core Ultra 200V Series processors, Lunar Lake, featuring on-package memory, a 4+4 core configuration, improved AI performance, and enhanced gaming capabilities, while facing competition from Qualcomm and AMD.

An Interview with Intel's Arik Gihon about Lunar Lake at Hot Chips 2024

Arik Gihon discussed Intel's Lunar Lake architecture, highlighting the exclusion of SMT for efficiency, relocation of E-cores, improved latency with a new L1 cache, and the adoption of PCIe 5 for bandwidth.

Intel's Redwood Cove: Baby Steps Are Still Steps

Intel's Redwood Cove architecture offers modest upgrades over Raptor Cove, featuring improved branch prediction, a doubled L1 instruction cache, increased micro-op queue size, and reduced floating-point multiplication latency.

8 comments

By @kristianp - 7 months

About 94.9 GB/s DRAM bandwidth for the Core Ultra 7 258V they measured. Aren't Intel going to respond to the 200GB/s bandwidth of the M1 Pro introduced 3 years ago? Not to mention 400GB/s of Max and 800GB/s of the Ultra?

Most of the bandwidth comes from cache hits, but for those rare workloads larger than the caches, Apple's products may be 2-8x faster?
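The "2-8x" estimate can be checked with simple division. The sketch below only uses the figures quoted in the comment; it assumes they are directly comparable, although the comment does not say whether the Apple numbers are measured or theoretical. The names `MEASURED_258V`, `APPLE_QUOTED`, and `bandwidth_ratio` are made up for illustration.

```python
# Bandwidth figures quoted in the comment above, in GB/s.
MEASURED_258V = 94.9  # DRAM bandwidth measured on the Core Ultra 7 258V
APPLE_QUOTED = {"Pro": 200.0, "Max": 400.0, "Ultra": 800.0}  # as quoted, not verified here


def bandwidth_ratio(other_gbps: float, baseline_gbps: float = MEASURED_258V) -> float:
    """Return how many times larger other_gbps is than baseline_gbps."""
    return other_gbps / baseline_gbps


# Ratio of each quoted Apple figure to the measured 258V figure, rounded to one decimal.
ratios = {name: round(bandwidth_ratio(bw), 1) for name, bw in APPLE_QUOTED.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio}x the measured 258V bandwidth")
```

These ratios describe raw DRAM bandwidth only; as the comment notes, they would translate into runtime differences only for workloads that spill out of the caches.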

By @perryh2 - 7 months

It looks awesome. I am definitely going to purchase a 14" Lunar Lake laptop from either Asus (Zenbook S14) or Lenovo (Yoga Slim). I really like my 14" MBP form factor and these look like they would be great for running Linux.

By @RicoElectrico - 7 months

> A plain memory latency test sees about 131.4 ns of DRAM latency. Creating some artificial bandwidth load drops latency to 112.4 ns.

Can someone put this in context? The values seem an order of magnitude higher than here: https://www.anandtech.com/show/16143/insights-into-ddr5-subt...

By @adrian_b - 7 months

I completely agree with the author that renaming the L1 cache memory as L0 and introducing a new L1 cache, as Intel has done, is completely misleading terminology.

The correct solution is that from the parent article, to continue to call the L1 cache memory as the L1 cache memory, because there is no important difference between it and the L1 cache memories of the previous CPUs, and to call the new cache memory that has been inserted between the L1 and L2 cache memories as the L1.5 cache memory.

Perhaps Intel did this to give the very wrong impression that the new CPUs have a bigger L1 cache memory than the old CPUs. To believe this would be incorrect, because the so-called new L1 cache has a much lower throughput and a worse latency than a true L1 cache memory of any other CPU.

The new L1.5 is not a replacement for an L1 cache, but it functions as a part of the L2 cache memory, with the same throughput as the L2 cache, but with a lower latency. As explained in the article, this has been necessary to allow Intel to expand the L2 cache to 2.5 MB in Lunar Lake and to 3 MB in Arrow Lake S (desktop CPU), in comparison with AMD, which has only a 1 MB L2 cache (but a bigger L3 cache).

According to rumors, while the top AMD desktop CPUs without stacked cache memory have an 80 MB L2+L3 cache (16 MB L2 + 64 MB L3), the top Intel model 285K might have 78 MB of cache, i.e. about the same amount, but with a different distribution on levels: 2 MB L1.5 + 40 MB L2 + 36 MB L3. Nevertheless, until now there is no official information from Intel about Arrow Lake S, whose launch is expected in a month from now, so the amount of L3 cache is not certain, only the amounts of L2 and L1.5 are known from earlier Intel presentations.
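The totals in the paragraph above can be reproduced by summing the per-level figures. This is a minimal sketch that takes the commenter's numbers at face value, including the rumored Intel values, which are not official. The dictionary and function names are hypothetical.

```python
# Cache sizes in MB, copied from the comment above.
AMD_TOP_DESKTOP = {"L2": 16, "L3": 64}
# Rumored at the time of the comment; not confirmed by Intel.
INTEL_285K_RUMORED = {"L1.5": 2, "L2": 40, "L3": 36}


def total_cache_mb(levels: dict[str, int]) -> int:
    """Sum the capacity of all listed cache levels, in MB."""
    return sum(levels.values())


def share_by_level(levels: dict[str, int]) -> dict[str, float]:
    """Return each level's share of the total capacity, in percent."""
    total = total_cache_mb(levels)
    return {level: round(100 * mb / total, 1) for level, mb in levels.items()}


amd_total = total_cache_mb(AMD_TOP_DESKTOP)
intel_total = total_cache_mb(INTEL_285K_RUMORED)

print(f"AMD:   {amd_total} MB, split {share_by_level(AMD_TOP_DESKTOP)}")
print(f"Intel: {intel_total} MB, split {share_by_level(INTEL_285K_RUMORED)}")
```

The totals are close, but the split differs: under these figures most of AMD's capacity sits in L3, while the rumored Intel configuration places roughly half of it in L2.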

Lunar Lake is an excellent design for all applications where adequate cooling is impossible, i.e. thin and light notebooks and tablets or fanless small computers.

Nevertheless, Intel could not abstain from using unfair marketing tactics. Almost all the benchmarks presented by Intel at the launch of Lunar Lake have been based on the top model 288V. Both top models 288V and 268V are likely to be unobtainium for most computer models, while at the few manufacturers that will offer this option they will be extremely overpriced.

Most available and affordable computers with Lunar Lake will not offer any better CPU than 258V, which is the one tested in the parent article. 258V has only 4.8 GHz/2.2 GHz turbo/base clock frequencies, vs. 5.1 GHz/3.3 GHz of the 288V used in the Intel benchmarks and in many other online benchmarks. So the actual experience of most Lunar Lake users will not match most published benchmarks, even if it will be good enough in comparison with any competitors in the same low-power market segment.
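To put the clock gap in numbers, the sketch below computes how far the 258V's quoted frequencies fall below the 288V's. It uses only the figures from the comment; the names are illustrative, and a clock deficit is at best a rough proxy for a performance deficit, since real results also depend on power limits and workload.

```python
# Turbo/base clocks in GHz, as quoted in the comment above.
CLOCKS_GHZ = {
    "258V": {"turbo": 4.8, "base": 2.2},
    "288V": {"turbo": 5.1, "base": 3.3},
}


def clock_deficit_pct(lower_ghz: float, higher_ghz: float) -> float:
    """Return how far lower_ghz falls below higher_ghz, in percent."""
    return round(100 * (1 - lower_ghz / higher_ghz), 1)


# Deficit of the 258V relative to the 288V for each clock type.
deficits = {
    kind: clock_deficit_pct(CLOCKS_GHZ["258V"][kind], CLOCKS_GHZ["288V"][kind])
    for kind in ("turbo", "base")
}

for kind, pct in deficits.items():
    print(f"258V {kind} clock is {pct}% below the 288V")
```

The turbo gap is small, but the base clock gap is about a third, which suggests the difference would matter most under sustained load.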

By @AzzyHN - 7 months

We'll have to see how this compares to Zen 5 once 24H2 drops.

And once more than like three Zen 5 laptops come out.

By @AStonesThrow - 7 months

I apologize in advance for my possibly off-topic linguistics-nerd pun:

Q: What do you call Windows with its UI translated to Hebrew? A: The L10N of Judah
