Testing AMD's Bergamo: Zen 4c
AMD's Bergamo server CPU, based on Zen 4c cores, prioritizes core count over clock speed for power efficiency and density. It targets cloud providers and parallel applications, emphasizing memory performance trade-offs.
AMD's Bergamo is a server CPU designed to increase core counts using density-focused Zen 4c cores, a strategy similar to those pursued by Intel and Ampere. The Zen 4c variant gives up high clock speeds in exchange for better power efficiency and area usage, which lets more cores fit into the same silicon area. Bergamo reuses AMD's Zen 4 server platform but swaps in Zen 4c CCDs. Each CCD contains two CCXes, and each CCX has its own 16 MB shared L3 cache. The higher core counts are aimed at cloud providers and parallel applications. Bergamo's memory bandwidth and latency matter a great deal here, because they directly affect application performance. In a dual-socket configuration, Bergamo scales up to 256 cores, but accessing remote memory incurs a latency penalty. Comparisons with older server CPUs like Broadwell and Westmere highlight trade-offs in latency and bandwidth. Core-to-core latency tests reveal the challenges Bergamo faces at high core counts, while benchmarks against other Zen 4 CCD variants show how core count and cache capacity affect memory bandwidth. Overall, Bergamo's design maximizes core count within a given area while balancing power efficiency against performance.
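The core and cache arithmetic behind the summary can be sketched as a small topology model. This is a minimal illustration, assuming the top Bergamo SKU (EPYC 9754) with 8 Zen 4c CCDs per socket; the CCD count is not stated in the summary above, and the `Topology` class is a hypothetical helper, not any AMD tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Topology:
    """Hypothetical model of a chiplet CPU: sockets -> CCDs -> CCXes -> cores."""
    ccds_per_socket: int
    ccxs_per_ccd: int
    cores_per_ccx: int
    l3_mb_per_ccx: int  # L3 is shared within a CCX, not across the whole CCD

    def cores_per_socket(self) -> int:
        return self.ccds_per_socket * self.ccxs_per_ccd * self.cores_per_ccx

    def l3_mb_per_socket(self) -> int:
        return self.ccds_per_socket * self.ccxs_per_ccd * self.l3_mb_per_ccx

    def cores(self, sockets: int) -> int:
        return sockets * self.cores_per_socket()

    def l3_mb_per_core(self) -> float:
        return self.l3_mb_per_ccx / self.cores_per_ccx


# Assumption: 8 Zen 4c CCDs per socket, each with two 8-core CCXes and 16 MB L3 per CCX.
BERGAMO = Topology(ccds_per_socket=8, ccxs_per_ccd=2, cores_per_ccx=8, l3_mb_per_ccx=16)

print(BERGAMO.cores_per_socket())  # 128
print(BERGAMO.cores(sockets=2))    # 256
print(BERGAMO.l3_mb_per_socket())  # 256
print(BERGAMO.l3_mb_per_core())    # 2.0
```

For comparison, a standard Zen 4 CCD pairs one 8-core CCX with 32 MB of L3, i.e. 4 MB per core, so each Zen 4c core sees half the L3 capacity of a standard Zen 4 core.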
Related
Unisoc and Xiaomi's 4nm Chips Said to Challenge Qualcomm and MediaTek
UNISOC and Xiaomi collaborate on 4nm chips challenging Qualcomm and MediaTek. UNISOC's chip features X1 big core + A78 middle core + A55 small core with Mali G715 MC7 GPU, offering competitive performance and lower power consumption. Xiaomi's Xuanjie chip includes X3 big core + A715 middle core + A510 small core with IMG CXT 48-1536 GPU, potentially integrating a MediaTek baseband. Xiaomi plans a separate mid-range phone line with Xuanjie chips, aiming to strengthen its market presence. The successful development of these 4nm chips by UNISOC and Xiaomi marks progress in domestically produced mobile chips, enhancing competitiveness.
Arm64EC – Build and port apps for native performance on Arm
Arm64EC is a new ABI for Windows 11 on Arm devices, offering native performance benefits and compatibility with x64 code. Developers can enhance app performance by transitioning incrementally and rebuilding dependencies. Specific tools help identify Arm64EC binaries and guide the transition process for Win32 apps.
Vulnerability in Popular PC and Server Firmware
Eclypsium found a critical vulnerability (CVE-2024-0762) in Intel Core processors' Phoenix SecureCore UEFI firmware, potentially enabling privilege escalation and persistent attacks. Lenovo issued BIOS updates, emphasizing the significance of supply chain security.
Finnish startup says it can speed up any CPU by 100x
A Finnish startup, Flow Computing, introduces the Parallel Processing Unit (PPU) chip, promising a 100x CPU performance boost for AI and autonomous vehicles. Despite skepticism, CEO Timo Valtonen is optimistic about partnerships and industry adoption.
Gren 0.4: New Foundations
Gren 0.4 updates its functional language with enhanced core packages, a new compiler, revamped FileSystem API, improved functions, and a community shift to Discord. These updates aim to boost usability and community engagement.
As I said before, I do believe that this is the future of CPU cores. [1] With RAM latency not really having kept pace with CPUs, having more performant cores really seems like a waste. In a cloud setting, where there is always some work to do, simpler cores but more of them really seems like the answer. It's in this environment that the weight of x86's legacy will catch up with us, and we'll need to get rid of all the transistors wasted on decoding cruft.
It should be noted that Bergamo's successor, Turin dense, which is expected by the end of the year, will have 12 compute chiplets for a total of 192 Zen 5c cores, thus bringing both more cores and faster cores.
With the right preprocessing, it should be possible to essentially stream from NIC straight to CPU, process some stuff, and output it straight to the NIC again without ever touching RAM - although I doubt current DMA hardware allows this. It'd require quite a bit of re-engineering of the on-wire protocol to remove the need for any nontrivially-sized buffers, but I reckon it isn't impossible.
It would be interesting to see a compute-bound, perfectly scaling workload compared between Bergamo and Genoa, in terms of both absolute performance and performance per watt.
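The comparison suggested above boils down to two ratios: throughput and work done per joule. A minimal sketch, assuming you already have, for each CPU, the amount of work completed, the run duration, and the average package power measured over that run; the `Run` class and `compare` function are hypothetical helpers, and no measured Bergamo or Genoa numbers are implied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """One benchmark run on one machine (all fields supplied by the tester)."""
    name: str
    work_units: float  # e.g. requests served or iterations completed
    seconds: float     # wall-clock duration of the run
    avg_watts: float   # average package power over the same interval

    def throughput(self) -> float:
        # Absolute performance: work per second.
        return self.work_units / self.seconds

    def perf_per_watt(self) -> float:
        # (work/s) / (J/s) = work per joule.
        return self.throughput() / self.avg_watts


def compare(a: Run, b: Run) -> dict[str, float]:
    """Ratios > 1.0 mean `a` beats `b` on that metric."""
    return {
        "throughput_ratio": a.throughput() / b.throughput(),
        "perf_per_watt_ratio": a.perf_per_watt() / b.perf_per_watt(),
    }
```

Reporting both ratios matters because a denser, lower-clocked part can lose on per-core speed yet still come out ahead on total throughput or on work per joule when the workload scales across all cores.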