AMD’s Ryzen CPUs have been on the market for almost three years now, offering nearly twice the performance of rival Intel CPUs at the same price points. This has forced Intel to take some drastic measures to keep up with a resurgent AMD: bumping up core counts across the board, introducing additional HEDT chips and cutting prices on existing parts are some of the most notable changes.
However, if you look at AMD’s performance over the past decade, it has been dismal, to say the least. Ever since Intel’s Core architecture landed, AMD has essentially been playing catch-up (and failing badly).
How did AMD go from being the “manufacturer of cheap, budget CPUs” to
The Older Bulldozer and FX Processors
Let’s clear up a few things. Zen is good, but it wouldn’t have been such a step up for AMD if the older Bulldozer design hadn’t been so flawed at a fundamental level. Around the time of the K10 architecture, AMD acquired ATI Technologies, which drained the company’s funds and prevented it from investing in a new CPU process.
Instead of trying to improve the existing architecture, the team decided to invest in a narrow, low-IPC, high-clock design known as Bulldozer. On paper the core counts seemed promising, but single-threaded performance was adversely affected. The result: AMD ended up losing its competitive edge and the god-awful FX processors were born.
You needed high core counts and higher operating clocks to make this work, which isn’t something AMD was able to pull off, and the rest is history. To give a clearer picture of how disadvantaged the Bulldozer chips were compared to their predecessors, here’s an example: to offer performance in line with the older Phenom II processors, Bulldozer needed a 50% higher operating speed on average. This, of course, didn’t happen, and the result was power-hungry CPUs that ran hot and unstable.
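The clock-speed argument above can be put into numbers with a rough sketch. It assumes single-threaded performance scales as IPC times clock speed, which is a simplification. The 3.4 GHz baseline is a hypothetical figure chosen for illustration, not a measured Phenom II clock. A 50% clock deficit corresponds to roughly two-thirds of the baseline IPC.

```python
def required_clock_ghz(baseline_clock_ghz, relative_ipc):
    """Clock needed to match a baseline chip when IPC is only `relative_ipc` of it.

    Single-threaded performance is modelled as IPC x clock, so matching the
    baseline requires clock_new = clock_old / relative_ipc.
    """
    if relative_ipc <= 0:
        raise ValueError("relative_ipc must be positive")
    return baseline_clock_ghz / relative_ipc


# Hypothetical numbers: a 3.4 GHz baseline and a successor with two-thirds the IPC.
needed = required_clock_ghz(3.4, 2 / 3)
print(f"Clock needed to break even: {needed:.2f} GHz")
```

Under these assumptions the newer chip would have to run 1.5 times faster just to break even, which is the gap described above.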
Have a look at the above IPC chart. Instead of going up, the IPC actually fell with Bulldozer and it took three upgrade cycles to bring it back on par with K10. The company started to move in the right direction with Steamroller and Excavator but hit a roadblock soon after. The design limitations of the Bulldozer architecture prevented the engineers from making further improvements without overhauling the layout.
This finally resulted in the brand new Zen architecture that ditched all the bottlenecks of Bulldozer and its successors, and here we are today, with the 3rd Gen Ryzen lineup leveraging the Zen 2 microarchitecture based on the 7nm node.
AMD Ryzen 3000 and Epyc Rome: The Chiplet Approach (MCM)
One of the main highlights of AMD’s Ryzen and Epyc processors is the chiplet or MCM (multi-chip module) design. Multiple heterogeneous dies are placed on the substrate and connected via the high-speed Infinity Fabric interconnect. This improves yields and reduces manufacturing costs significantly.
Traditionally, monolithic dies have been used for consumer and server CPUs alike. With these dies, as you increase the core count and the size of the die, wastage and defects increase substantially. With a high-end Xeon part, Intel has to discard more than half of the produced chips due to defects. This is why a monolithic design (while it does have its advantages) isn’t very cost-effective.
In a monolithic design (like Intel’s), yields fall and wastage rises substantially as you increase the core count and build larger chips. With AMD’s design, you are basically making eight-core dies or chiplets and connecting them together.
As the number of individual dies increases, the costs increase linearly but the wastage doesn’t increase exponentially. The overall yields improve by a significant margin, thereby reducing the manufacturing costs.
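The yield argument can be illustrated with a minimal sketch based on the textbook Poisson yield model, where the chance of a die being defect-free falls exponentially with its area. The defect density and total silicon area below are assumed values for illustration, not foundry or AMD data. The sketch also ignores the I/O die, packaging and testing costs.

```python
import math


def poisson_yield(die_area_mm2, defects_per_mm2):
    """Fraction of defect-free dies under the simple Poisson yield model."""
    return math.exp(-die_area_mm2 * defects_per_mm2)


def silicon_per_good_cpu(total_area_mm2, num_dies, defects_per_mm2):
    """Average silicon area (mm^2) consumed per working CPU.

    The CPU is built from `num_dies` equal dies. Each die is tested
    separately, so a defect only scraps the die it lands on.
    """
    die_area = total_area_mm2 / num_dies
    good_fraction = poisson_yield(die_area, defects_per_mm2)
    return num_dies * die_area / good_fraction


# Assumed values for illustration only, not foundry data.
DEFECT_DENSITY = 0.002   # defects per mm^2
TOTAL_AREA = 600.0       # mm^2 of compute silicon per CPU

monolithic = silicon_per_good_cpu(TOTAL_AREA, 1, DEFECT_DENSITY)
chiplets = silicon_per_good_cpu(TOTAL_AREA, 8, DEFECT_DENSITY)
print(f"monolithic: {monolithic:.0f} mm^2 per good CPU")
print(f"8 chiplets: {chiplets:.0f} mm^2 per good CPU")
```

With the same defect density, splitting the silicon into smaller dies means a single defect scraps a small chiplet instead of the whole processor.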
However, there’s a catch. The Infinity Fabric adds latency whenever a core communicates with the cores and cache on other chiplets. This has a noticeable impact on latency-sensitive applications such as games, which is one of the reasons Intel chips tend to be better at gaming than rival Ryzen parts.
AMD dealt with the latency penalty by doubling the L3 cache (calling it GameCache). With more data held in the on-chip cache, the cache miss rate is reduced, improving effective memory latency by up to 33ns.
According to AMD’s official figures, this improved the 1080p gaming performance by 21% on average. Zen 2’s new cache hierarchy doubles the L1 load/store bandwidth compared to Zen and many latency-sensitive applications also get a healthy boost.
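The link between a larger cache and lower effective latency can be sketched with the standard average memory access time formula: hit time plus miss rate times miss penalty. The hit time, miss penalty and miss rates below are assumed figures for illustration, not AMD measurements.

```python
def average_access_ns(hit_ns, miss_rate, miss_penalty_ns):
    """Average memory access time: hit time plus the expected miss cost."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return hit_ns + miss_rate * miss_penalty_ns


# Assumed figures for illustration; they are not AMD measurements.
L3_HIT_NS = 10.0
DRAM_PENALTY_NS = 70.0

smaller_cache = average_access_ns(L3_HIT_NS, 0.30, DRAM_PENALTY_NS)
larger_cache = average_access_ns(L3_HIT_NS, 0.20, DRAM_PENALTY_NS)
print(f"smaller L3: {smaller_cache:.1f} ns, larger L3: {larger_cache:.1f} ns")
```

The cache itself is no faster in this model. The average improves only because fewer requests have to travel out to main memory.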
Central I/O Die and CCDs
The Zen 2 design uses one chiplet (CCD) for the Ryzen 5 and 7 processors and two for the Ryzen 9 chips. However, all the chips use the same 12nm I/O die, which is connected to each CCD. It acts as the central hub for all off-chip communication, including memory and PCIe lanes.
All modern processors include a branch predictor. It essentially tries to “guess” the outcome of a branch before it is actually resolved, so the pipeline can keep fetching instructions. AMD’s Ryzen 3000 chips feature the TAGE branch predictor.
For L1-based predictions, the Hashed Perceptron (HP) branch predictor is used, while TAGE is used for L2. The main highlight is that the latter uses additional tagging to expand branch histories. The split exists because the HP predictor is used for short prefetches while TAGE is used for longer ones.
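To give a feel for how a perceptron-style predictor learns, here is a simplified sketch of the textbook perceptron branch predictor. It is not AMD’s implementation: the table size, history length and the modulo “hash” are assumptions made for illustration, the threshold uses the heuristic from the original perceptron predictor research, and TAGE is not modelled at all.

```python
class PerceptronPredictor:
    """Simplified perceptron branch predictor (illustrative, not AMD's design)."""

    def __init__(self, table_size=64, history_len=8):
        self.table_size = table_size
        # Training threshold heuristic from the perceptron predictor literature.
        self.theta = int(1.93 * history_len + 14)
        # One weight vector per table entry: a bias plus one weight per history bit.
        self.weights = [[0] * (history_len + 1) for _ in range(table_size)]
        # Global history of recent outcomes: +1 for taken, -1 for not taken.
        self.history = [-1] * history_len

    def _output(self, pc):
        w = self.weights[pc % self.table_size]  # modulo stands in for a real hash
        return w[0] + sum(wi * hi for wi, hi in zip(w[1:], self.history))

    def predict(self, pc):
        """Predict taken when the weighted sum is non-negative."""
        return self._output(pc) >= 0

    def update(self, pc, taken):
        """Train on a misprediction or a low-confidence output, then shift history."""
        y = self._output(pc)
        t = 1 if taken else -1
        if (y >= 0) != taken or abs(y) <= self.theta:
            w = self.weights[pc % self.table_size]
            w[0] += t
            for i, hi in enumerate(self.history):
                w[i + 1] += t * hi
        self.history = self.history[1:] + [t]


def count_late_mispredictions(outcomes, pc=0x40, warmup=100):
    """Count mispredictions that occur after the first `warmup` branches."""
    predictor = PerceptronPredictor()
    misses = 0
    for i, taken in enumerate(outcomes):
        if i >= warmup and predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


# A branch that strictly alternates between taken and not taken.
alternating = [i % 2 == 0 for i in range(300)]
print("late mispredictions:", count_late_mispredictions(alternating))
```

The alternating pattern is easy for this kind of predictor because the next outcome is always the opposite of the most recent history bit, so one strongly negative weight is enough to capture it.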
2x Micro-Op Cache
AMD has also doubled the micro-op cache size from 2K to 4K entries in Zen 2. Micro-operations are decoded instructions awaiting execution in the CPU pipeline, and they are stored in the micro-op cache for reuse. Intel’s Coffee Lake CPUs have a much smaller cache size of 1.5K while the newer Sunny Cove core isn’t that impressive either with 2.25K of micro-op cache.
2x Floating Point Bandwidth
The Zen 2 chips boast superior FP performance. The floating-point and load/store bandwidth has been doubled from 128-bit to 256-bit. This means that 256-bit AVX2 operations can now be processed in one go instead of being split into two 128-bit micro-ops. There are four queues in the floating-point unit, two add and two multiply. Multiply or mul is essentially repeated addition. The Zen 2 core boasts a faster mul latency of 3 cycles (down from 4).
Power Efficiency, Faster Boosts and Security Mitigations
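The effect of the wider datapath is simple arithmetic, sketched below. The helper names are hypothetical and the model only counts how many datapath-wide pieces a vector instruction has to be split into. It says nothing about scheduling or real throughput.

```python
import math


def uops_per_vector_op(vector_bits, datapath_bits):
    """Number of internal operations needed to process one vector instruction."""
    return math.ceil(vector_bits / datapath_bits)


def fp32_lanes(datapath_bits):
    """Single-precision values processed per internal operation."""
    return datapath_bits // 32


AVX2_BITS = 256
for name, width in (("128-bit datapath", 128), ("256-bit datapath", 256)):
    ops = uops_per_vector_op(AVX2_BITS, width)
    print(f"{name}: {ops} op(s) per AVX2 instruction, {fp32_lanes(width)} FP32 lanes per op")
```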
The Windows 10 May 2019 update brought many changes to the Ryzen boosting behavior. Compared to earlier, peak boost speeds can now be reached much faster (up to 30x). However, it’ll be harder to stay at these frequencies if you are using an air cooler. The boost clock behavior of the Zen 2 chips is very similar to NVIDIA’s GPU Boost. They are quite sensitive to temperature and power.
The cores not only boost in an instant but they go back to their idle states in less than an instant too. When your PC is idle, the CPU will scale down to a low-power state, drawing as little as 10-20W of power. Paired with the efficient 7nm node, this has made AMD’s Ryzen 3000 CPUs much more power efficient than Intel’s 9th Gen Coffee Lake parts.
This year started with the reveal of the x86 architecture vulnerabilities. Intel’s products were badly affected while AMD got away with a few scratches and bruises. For team blue, most of the issues have been dealt with via software and firmware updates (at the cost of performance) but AMD’s new Ryzen 3000 arsenal offers protection from all known vulnerabilities at a hardware level.
Conclusion: MCM Design + 7nm Node = Win
One of the main takeaways from the Ryzen 3000 launch is the adoption of the chiplet or MCM (Multi-Chip Module) approach. It improves scalability, reduces costs and offers more cores to the average consumer without increasing the price tag. Moore’s Law may have “slowed down” but looking at AMD’s MCM design and packaging strategies, it’s clear that the company is looking to score big in the coming years. Intel, on the other hand, continues to release revisions of the 14nm Skylake core and reminds everyone that their “real-world” performance is superior.
The post Are AMD CPUs Better than Intel? What Makes the Ryzen 3000 Processors Faster as well as Cheaper than Competing Intel Chips? appeared first on Hardware Times.