AMD RDNA 2 GPUs Have Much Better Memory Latency Versus NVIDIA's Ampere GPU Architecture

The memory latency performance of AMD's RDNA 2 and NVIDIA's Ampere GPU architectures has been examined by Chips and Cheese. The tech outlet decided to check out the GPU memory latency performance of the latest GPU architectures from team red and team green and found some interesting results.

On the CPU side, measuring cache and latency performance has become a critical pointer with the ever-increasing use of multi-chiplet designs, with multiple IO chips on the same die and, in recent cases, off-die as well (AMD Zen chiplets). GPUs are also built around several levels of cache that fill in the gap between compute and memory performance, and the source used OpenCL-based pointer chasing benchmarks to measure cache and memory latency on current-gen GPUs such as the NVIDIA Ampere and AMD RDNA 2 architectures.
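For readers unfamiliar with pointer chasing, the idea is that each load's address comes from the value returned by the previous load, so the loads cannot overlap and the average time per step approximates latency. The following is a minimal sketch of that access pattern in Python, not the OpenCL code Chips and Cheese used; the names `build_chain` and `chase` are hypothetical, and interpreter overhead means the timing here only illustrates the method rather than real cache latency.

```python
import random
import time


def build_chain(n, seed=0):
    """Return a list where chain[i] is the next index to visit.

    Sattolo's algorithm produces a permutation with a single cycle of
    length n, so the walk touches every element before it repeats.
    """
    rng = random.Random(seed)
    chain = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # j < i is what guarantees a single cycle
        chain[i], chain[j] = chain[j], chain[i]
    return chain


def chase(chain, steps):
    """Follow the chain for `steps` dependent loads.

    Returns (final index, average nanoseconds per step).
    """
    idx = 0
    start = time.perf_counter_ns()
    for _ in range(steps):
        idx = chain[idx]  # the next address depends on the value just loaded
    elapsed = time.perf_counter_ns() - start
    return idx, elapsed / steps
```

In a real benchmark of this kind, the chase loop would presumably run inside a GPU kernel, and the test would be repeated with growing array sizes so that the working set spills out of each cache level in turn, producing a step in the latency curve at each capacity boundary.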

NVIDIA Ampere GPU vs AMD RDNA 2 GPU cache and latency performance measured. (Image Credits: Chips and Cheese)

In the benchmarks, the AMD Radeon RX 6800 XT (RDNA 2 GPU) and the NVIDIA GeForce RTX 3090 (Ampere GPU) were pitted against each other. The cache and memory benchmark shows that AMD's RDNA 2 architecture fared much better than NVIDIA's Ampere GPU, offering lower cache latency despite having to check two more levels of cache on the way to memory. The Infinity Cache adds only about 20 ns over an L2 hit and is still faster than Ampere's L2.

The reason given is that the NVIDIA Ampere-based GA102 is simply a much larger GPU. Although it uses a more conventional GPU memory subsystem with only two cache levels, getting across the die takes a lot of cycles and results in over 100 ns of latency (L1 to L2). RDNA 2, on the other hand, has a latency of just 66 ns (L0 to L2). Do note that the AMD Navi 21 GPU is a lot smaller and features a 4 MB L2 cache, while the NVIDIA GA102 GPU features a 6 MB L2 cache for the whole chip. The NVIDIA A100 Ampere GPU for HPC features a massive 40 MB L2 cache.

Following is a note on the performance from Chips and Cheese:

RDNA 2's cache is fast and there's a lot of it. Compared to Ampere, latency is low at all levels. Infinity Cache only adds about 20 ns over a L2 hit and has lower latency than Ampere's L2. Amazingly, RDNA 2's VRAM latency is about the same as Ampere's, even though RDNA 2 is checking two more levels of cache on the way to memory.

In contrast, Nvidia sticks with a more conventional GPU memory subsystem with only two levels of cache and high L2 latency. Going from Ampere's SM-private L1 to L2 takes over 100 ns. RDNA's L2 is ~66 ns away from L0, even with a L1 cache between them. Getting around GA102's massive die seems to take a lot of cycles.

This could explain AMD's excellent performance at lower resolutions. RDNA 2's low latency L2 and L3 caches may give it an edge with smaller workloads, where occupancy is too low to hide latency. Nvidia's Ampere chips in comparison require more parallelism to shine.

via Chips and Cheese
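The point about occupancy and latency hiding can be made concrete with Little's law, which says the number of requests that must be in flight equals latency multiplied by throughput. The sketch below is a back-of-the-envelope illustration only: the 100 ns and 66 ns figures come from the quote above, while the request rate of one per nanosecond and the function name `requests_in_flight` are assumptions chosen for illustration, not measured values.

```python
import math


def requests_in_flight(latency_ns, requests_per_ns):
    """Little's law: concurrency needed = latency x throughput."""
    return math.ceil(latency_ns * requests_per_ns)


# Hypothetical rate, picked only to make the comparison easy to read.
ASSUMED_REQUESTS_PER_NS = 1.0

for name, latency_ns in (("Ampere L1 to L2", 100), ("RDNA 2 L0 to L2", 66)):
    needed = requests_in_flight(latency_ns, ASSUMED_REQUESTS_PER_NS)
    print(f"{name}: about {needed} independent requests to cover {latency_ns} ns")
```

Whatever the real request rate is, the ratio holds: at the same rate, the lower-latency path needs about a third fewer independent requests to stay busy, which is consistent with the suggestion that small workloads with little parallelism favor RDNA 2.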

Compared to older Pascal and Maxwell chips, the Ampere architecture has delivered much-improved latency on much larger GPUs. AMD, on the other hand, has shown some impressive gains versus older GCN and VLIW architecture-based chips. These numbers are definitely going to be interesting for comparison when the new round of chiplet-based GPUs hits the gaming segment in the coming years.

