Monday, November 7th 2022

AMD RDNA3 Navi 31 GPU Block Diagram Leaked, Confirmed to be PCIe Gen 4

An alleged leaked company slide details AMD's upcoming 5 nm "Navi 31" GPU powering the next-generation Radeon RX 7900 XTX and RX 7900 XT graphics cards. The slide details the "Navi 31" MCM, with its central graphics compute die (GCD) chiplet built on the 5 nm EUV silicon fabrication process, surrounded by six memory cache dies (MCDs), each built on the 6 nm process. The GCD interfaces with the system over a PCI-Express 4.0 x16 host interface. It features the latest-generation multimedia engine with dual-stream encoders, and the new Radiance display engine with DisplayPort 2.1 and HDMI 2.1a support. Custom interconnects tie it to the six MCDs.

Each MCD has 16 MB of Infinity Cache (L3 cache) and a 64-bit GDDR6 memory interface (two 32-bit GDDR6 paths). Six of these add up to the GPU's 384-bit GDDR6 memory interface. For all practical purposes, the GPU presents a contiguous 384-bit wide memory bus, since every modern GPU already uses multiple on-die memory controllers to achieve a wide bus. "Navi 31" hence has a total Infinity Cache size of 96 MB. That is less than the 128 MB on "Navi 21," but AMD has shored up cache sizes elsewhere across the GPU: the L0 caches on the compute units grow by 240%, the L1 caches by 300%, and the L2 cache shared among the shader engines by 50%. The slide confirms the RX 7900 XTX uses 20 Gbps GDDR6 memory, for 960 GB/s of memory bandwidth.
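The 960 GB/s figure follows directly from those numbers. A quick sketch of the arithmetic (the inputs are from the slide; the breakdown is illustrative):

[CODE]
// Back-of-the-envelope check of the memory figures quoted above.
#include <cstdio>

int main() {
    const int mcd_count = 6;
    const int bus_bits_per_mcd = 64;        // two 32-bit GDDR6 paths per MCD
    const int cache_mb_per_mcd = 16;        // Infinity Cache per MCD
    const double gbps_per_pin = 20.0;       // 20 Gbps GDDR6

    int total_bus_bits = mcd_count * bus_bits_per_mcd;          // 384-bit
    int total_cache_mb = mcd_count * cache_mb_per_mcd;          // 96 MB
    double bandwidth_gbs = total_bus_bits * gbps_per_pin / 8.0; // 960 GB/s

    printf("bus: %d-bit, cache: %d MB, bandwidth: %.0f GB/s\n",
           total_bus_bits, total_cache_mb, bandwidth_gbs);
    return 0;
}
[/CODE]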
The GCD features six Shader Engines, each with 16 compute units (or 8 dual compute units), which work out to 1,024 stream processors per engine, or 6,144 in total. AMD claims to have doubled the IPC of these stream processors over RDNA2. The new RDNA3 ALUs also support BF16 instructions. The SIMD engine of "Navi 31" has an FP32 throughput of 61.6 TFLOP/s, a 168% increase over the 23 TFLOP/s of "Navi 21." The slide doesn't quite detail the new Ray Tracing engine, but references new RT features, larger caches, and a 50% higher ray intersection rate, for up to a 1.8x RT performance increase over the RX 6950 XT at 2.505 GHz engine clocks. There are other major upgrades to the GPU's raster 3D capabilities, including a 50% increase in prims/clock and a 100% increase in prim/vertex cull rates. The pixel pipeline sees similar 50% increases in rasterized prims/clock and pixels/clock, as well as synchronous pixel-wait.
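The 61.6 TFLOP/s figure is consistent with the doubled-IPC claim, assuming dual-issue FP32 ALUs, an FMA counted as two FLOPs, and the 2.505 GHz clock quoted above. A sketch under those assumptions:

[CODE]
// FP32 throughput estimate for "Navi 31", assuming dual-issue SIMDs,
// FMA = 2 FLOPs, and the quoted 2.505 GHz engine clock.
#include <cstdio>

int main() {
    const double stream_processors = 6144;  // 6 engines x 16 CUs x 64 SPs
    const double dual_issue = 2.0;          // RDNA3 dual-issue ALUs
    const double flops_per_fma = 2.0;       // multiply + add
    const double clock_ghz = 2.505;

    double tflops = stream_processors * dual_issue * flops_per_fma
                    * clock_ghz / 1000.0;   // GFLOP/s -> TFLOP/s
    printf("FP32 throughput: %.1f TFLOP/s\n", tflops); // ~61.6
    return 0;
}
[/CODE]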
Source: VideoCardz

79 Comments on AMD RDNA3 Navi 31 GPU Block Diagram Leaked, Confirmed to be PCIe Gen 4

#1
Jism
Being PCI-E 4.0 only shows consumer graphics cards really don't need 5.0, or 6.0 for that matter.
Posted on Reply
#2
Space Lynx
Astronaut
Since Gen 4 never got saturated anyway, this doesn't really matter either way, that's my understanding. So this really isn't news?
Posted on Reply
#3
taka
This doesn't look like a new design compared to RDNA2; they just split the cache and memory controllers into more chiplets, the MCDs, and crammed more CUs into the GCD.

Posted on Reply
#4
Crackong
CallandorWoTSince Gen 4 never got saturated anyway, this doesn't really matter either way, that's my understanding. So this really isn't news?
+1
Don't really think this deserves a news headline on the front page.
Posted on Reply
#5
AsRock
TPU addict
I thought everyone knew this already; this is like Zen 1. The 8900 will be much better, maybe with double the cache (192 :eek:).

And isn't the shader clock running slower this time around (2.3)? It'll be interesting to see what AIBs end up doing with it.
Posted on Reply
#6
JAB Creations
takaThis doesn't look like a new design compared to RDNA2; they just split the cache and memory controllers into more chiplets, the MCDs, and crammed more CUs into the GCD.

Crammed, I'm sure that word is used a lot in cutting-edge technology conversations.

Posted on Reply
#7
mx500torid
The slide confirms the RX 7900 XTX uses 20 Gbps GDDR6 memory, for 960 GB/s of memory bandwidth.

I thought the 7900 XTX used 24 GB of GDDR6. The slide shows up to 24. I think this might need editing.
Posted on Reply
#8
dragontamer5788
I thought they stopped calling it a "Dual Compute Unit" and that we're now supposed to call them "Workgroup Processors" (WGPs)?

Ah well, whatever. I know what they mean. I just... thought they changed the name already.
Posted on Reply
#9
Calenhad
mx500toridThe slide confirms the RX 7900 XTX uses 20 Gbps GDDR6 memory, for 960 GB/s of memory bandwidth.

I thought the 7900 XTX used 24 GB of GDDR6. The slide shows up to 24. I think this might need editing.
You are mixing up your Gbps and GB: speed and capacity.
Posted on Reply
#10
HalfAHertz
The other hidden tidbit is that the boost clock will be 2.5 GHz.
Posted on Reply
#11
Dirt Chip
Let AMD undercut NV by 50% on price while staying within 15% on performance at the top, and everyone (both 'camps') will be very happy with it.
The only relevant SKU for Ada should be the 4090/Ti. Push the 'un-launch' for the 4080 16 GB as well; the confusion is still here.
May every tier, high to low, be dominated by Radeon this gen.
I very much hope AMD will not follow NV in the race to the top in wattage with a 7950 XTX with 4 or 5 8-pins.
Focus on efficiency and cost and you will see your market share grow like never before.
And for heaven's sake, give us a proper CUDA competitor for video encoding and AI compute already.
Posted on Reply
#12
Wirko
JismBeing PCI-E 4.0 only shows consumer graphics cards really don't need 5.0, or 6.0 for that matter.
Yes, but PCIe 5.0 would come in handy when you can't have all 16 lanes for the GPU, and those cases are not so rare on the latest motherboards.
Posted on Reply
#13
dgianstefani
TPU Proofreader
The "advanced chiplets design" would only be "disruptive" vs Monolithic if AMD either
1. Passed on the cost savings to consumers (prices are same for xtx, and 7900xt is worse cu count % wise to top sku than 6800xt was to 6900xt)
2. Actually used the chiplets to have more cu, instead of just putting memory controllers on there. Thereby maybe actually competing in RTRT performance or vs the 4090.

AMD is neither taking advantage of their cost savings and going the value route, or competing vs their competition's best. They're releasing a product months later, with zero performance advantages vs the competition? Potential 4080ti will still be faster and they've just given nvidia free reign (once again) to charge whatever they want for flagships since there's zero competition.

7950xtx could have two 7900xtx dies, but I doubt it
Posted on Reply
#14
DemonicRyzen666
takaThis doesn't look like a new design compared to RDNA2; they just split the cache and memory controllers into more chiplets, the MCDs, and crammed more CUs into the GCD.

From what I can read, the lack of TMUs is the main problem with this design.
The RX 6950 XT has 320 texture units along with 128 ROPs, which gives it 739.2 GTexel/s at 2,100-2,300 MHz,
while the RX 7900 XTX has 384 texture units along with 192 ROPs.
The texture fill rate should be 887.04 GTexel/s, not 961.9 GTexel/s. There is no way this GPU runs at that clock.


^ This part should be bigger. They doubled the instructions per clock.
I feel like they should have just doubled the RT core count in the compute part instead of increasing the shader count by 2.7 times.
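For reference, texture fill rate is just TMU count multiplied by clock, so both figures in this thread fall out of the same formula under different clock assumptions (887.04 GTexel/s at 2.31 GHz, 961.92 GTexel/s at the 2.505 GHz boost clock). A quick sketch:

[CODE]
// Texture fill rate = TMUs x clock; the two quoted figures differ
// only in the clock assumed.
#include <cstdio>

int main() {
    const int tmus = 384;                       // RX 7900 XTX texture units
    const double clocks_ghz[] = {2.310, 2.505}; // lower vs. boost clock

    for (double clk : clocks_ghz) {
        printf("%d TMUs x %.3f GHz = %.2f GTexel/s\n", tmus, clk, tmus * clk);
    }
    return 0;
}
[/CODE]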
Posted on Reply
#15
TheoneandonlyMrK
WirkoYes, but PCIe 5.0 would come in handy when you can't have all 16 lanes for the GPU, and those cases are not so rare on the latest motherboards.
I get you, but an £800-1,000 GPU should never be in anything but the top x16 slot on a motherboard for typical usage.

And it is rare, if not unheard of, for a consumer motherboard to have no x16 slot.
Posted on Reply
#16
THU31
WirkoYes, but PCIe 5.0 would come in handy when you can't have all 16 lanes for the GPU, and those cases are not so rare on the latest motherboards.
What cases would those be, other than Intel boards using a Gen5 NVMe drive, which would probably affect a tiny percentage of users?

But they definitely missed out on the marketing possibilities. Sales of Zen 4 are already terrible. They lined up the naming schemes for Zen 4 and RDNA3, but RDNA3 does not support one of Zen 4's main features, namely PCI-E 5.0. So why should anyone buy Zen 4 right now, when they can just wait for lower prices and X3D CPUs? They really shot themselves in the foot.
Posted on Reply
#17
GunShot
After the 7000 series' official announcement, with every passing hour, TRUE facts flush AMD's much-praised gibberish right down the toilet; even AMD had to come out and walk back its own nutty metrics against the 4090. :nutkick:
Posted on Reply
#18
TheLostSwede
News Editor
WirkoYes, but PCIe 5.0 would come in handy when you can't have all 16 lanes for the GPU, and those cases are not so rare on the latest motherboards.
Mostly on Intel boards.
With AMD you don't really end up with that dilemma, unless you bought a really odd motherboard where the board maker decided to do something stupid.
TheoneandonlyMrKI get you, but an £800-1,000 GPU should never be in anything but the top x16 slot on a motherboard for typical usage.

And it is rare, if not unheard of, for a consumer motherboard to have no x16 slot.
Not at all, most Z790 and many Z690 boards split the x16 slot with M.2 slots, just so the boards can be marketed as having PCIe 5.0 M.2 support, just as on AMD boards.
Posted on Reply
#19
Leiesoldat
lazy gamer & woodworker
Dirt ChipLet AMD undercut NV by 50% on price while staying within 15% on performance at the top, and everyone (both 'camps') will be very happy with it.
The only relevant SKU for Ada should be the 4090/Ti. Push the 'un-launch' for the 4080 16 GB as well; the confusion is still here.
May every tier, high to low, be dominated by Radeon this gen.
I very much hope AMD will not follow NV in the race to the top in wattage with a 7950 XTX with 4 or 5 8-pins.
Focus on efficiency and cost and you will see your market share grow like never before.
And for heaven's sake, give us a proper CUDA competitor for video encoding and AI compute already.
AMD already has a solution to CUDA, called HIP. Here is a FAQ entry from AMD's ROCm documentation:


[URL='https://rocmdocs.amd.com/en/latest/Programming_Guides/HIP-FAQ.html#id7']Is HIP a drop-in replacement for CUDA?[/URL]

No. HIP provides porting tools which do most of the work to convert CUDA code into portable C++ code that uses the HIP APIs. Most developers will port their code from CUDA to HIP and then maintain the HIP version. HIP code provides the same performance as native CUDA code, plus the benefits of running on AMD platforms.

source: rocmdocs.amd.com/en/latest/Programming_Guides/HIP-FAQ.html
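For a flavor of what such a port looks like, below is a minimal, illustrative HIP sketch (not taken from the ROCm docs); the CUDA version is nearly identical, with the cuda* calls swapped for their hip* equivalents:

[CODE]
// Minimal HIP example: scale a vector on the GPU.
#include <hip/hip_runtime.h>
#include <cstdio>

__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 1024;
    float host[n];
    for (int i = 0; i < n; ++i) host[i] = 1.0f;

    float *dev = nullptr;
    hipMalloc((void **)&dev, n * sizeof(float));     // cf. cudaMalloc
    hipMemcpy(dev, host, n * sizeof(float), hipMemcpyHostToDevice);

    // Kernel launch syntax is the same as CUDA's under hipcc.
    scale<<<(n + 255) / 256, 256>>>(dev, 2.0f, n);

    hipMemcpy(host, dev, n * sizeof(float), hipMemcpyDeviceToHost);
    hipFree(dev);

    printf("host[0] = %.1f\n", host[0]);             // prints 2.0
    return 0;
}
[/CODE]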
Posted on Reply
#20
TheoneandonlyMrK
TheLostSwedeMostly on Intel boards.
With AMD you don't really end up with that dilemma, unless you bought a really odd motherboard where the board maker decided to do something stupid.


Not at all, most Z790 and many Z690 boards split the x16 slot with M.2 slots, just so the boards can be marketed as having PCIe 5.0 M.2 support, just as on AMD boards.
This modern trend needs to die or I'm not updating my PC, simple.

Though to be fair, x8 doesn't lose that much performance.

Also, that's not right; you get 1x x16, 1x NVMe and the southbridge, unless you want two NVMe drives attached directly to the CPU, no?!
Posted on Reply
#21
bug
TheLostSwedeMostly on Intel boards.
With AMD you don't really end up with that dilemma, unless you bought a really odd motherboard where the board maker decided to do something stupid.
You could have 2-3 NVMe drives in the same system. That said, you'd also need a motherboard that only wires 4-8 PCIe 5 lanes to the GPU slot, freeing the rest for your SSDs.

And really, we've had this discussion since PCIe 2: every time a new PCIe revision comes out, it offers more bandwidth than GPUs need. It's OK to not use the latest and greatest.

Edit: Also, PCIe 5 speeds are useless on SSDs, too. It'll give you (maybe) a higher number for large sequential operations (which you rarely do), but keep the same numbers for 4K random reads (which you always do). So really, nothing to see here.
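The headroom argument is easy to quantify, since per-lane bandwidth roughly doubles with each PCIe revision. A quick sketch using approximate figures (Gen 3 rounded to 1 GB/s per lane):

[CODE]
// Approximate PCIe link bandwidth per generation and lane count.
#include <cstdio>

int main() {
    const double gen3_per_lane = 1.0; // GB/s per lane, approximate
    for (int gen = 3; gen <= 5; ++gen) {
        double per_lane = gen3_per_lane * (1 << (gen - 3)); // doubles each gen
        printf("PCIe %d.0: x8 = %2.0f GB/s, x16 = %2.0f GB/s\n",
               gen, per_lane * 8, per_lane * 16);
    }
    return 0;
}
[/CODE]

As the output shows, an x8 Gen 5 link already matches an x16 Gen 4 link in raw bandwidth.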
Posted on Reply
#22
Nopa
A completely separate 1x 5.0 x16 for the GPU and 2x 5.0 x4 for NVMe would be the best solution, whether on AMD's or Intel's boards.
Posted on Reply
#23
THU31
NopaA completely separate 1x 5.0 x16 for the GPU and 2x 5.0 x4 for NVMe would be the best solution, whether on AMD's or Intel's boards.
That is exactly how it is on Zen 4. The CPUs have 28 Gen5 lanes: 16 for the GPU, 8 for two NVMe slots, and 4 for the chipset link (which actually runs at Gen4 speeds for some reason; maybe future chipsets will have a Gen5 link that works with current CPUs).
Two NVMe slots are connected to the CPU; all additional slots are connected to the chipset (and limited by the Gen4 CPU-chipset link).

Alder/Raptor Lake only have 16 Gen5 lanes; the 4 lanes for NVMe are only Gen4. I expect Meteor Lake will only have Gen5 lanes in the CPU.
Posted on Reply
#24
Gica
THU31What cases would those be, other than Intel boards using a Gen5 NVMe drive, which would probably affect a tiny percentage of users?

But they definitely missed out on the marketing possibilities. Sales of Zen 4 are already terrible. They lined up the naming schemes for Zen 4 and RDNA3, but RDNA3 does not support one of Zen 4's main features, namely PCI-E 5.0. So why should anyone buy Zen 4 right now, when they can just wait for lower prices and X3D CPUs? They really shot themselves in the foot.
Correct! You can pay $1,000 for an AM5 platform, or you can keep the AM4 system and spend $350 on a 5800X3D. You get a top system for gaming (which surpasses the 7600X/7700X) and a decent one for other activities.
Posted on Reply
#25
TheLostSwede
News Editor
TheoneandonlyMrKThis modern trend needs to die or I'm not updating my PC, simple.

Though to be fair, x8 doesn't lose that much performance.

Also, that's not right; you get 1x x16, 1x NVMe and the southbridge, unless you want two NVMe drives attached directly to the CPU, no?!
Intel only does x16 PCIe 5.0 lanes for the "GPU", but board makers are splitting them into x8 for the "GPU" and x4 for one M.2 slot on a lot of boards, with four lanes being lost. Some higher-end boards get two PCIe 5.0 M.2 slots.

As AMD has two sets of x4 "spare" PCIe lanes (plus four to the chipset), it's not an issue on most AM5 boards, but there are some odd boards that still split the PCIe 5.0 lanes to the "GPU" and use them for M.2 slots.
bugYou could have 2-3 NVMe drives in the same system. That said, you'd also need a motherboard that only wires 4-8 PCIe 5 lanes to the GPU slot, freeing the rest for your SSDs.

And really, we've had this discussion since PCIe 2: every time a new PCIe revision comes out, it offers more bandwidth than GPUs need. It's OK to not use the latest and greatest.
Yes, x8 PCIe 5.0 lanes are in theory equivalent to x16 PCIe 4.0 lanes; the issue is that an x16 PCIe 4.0 card ends up as a PCIe 4.0 x8 card in a PCIe 5.0 x8 slot.
The PCIe spec doesn't allow eight lanes to magically end up being 16 or 32 lanes of a lower speed grade; that requires an additional, costly chip.

As for the slot mix, with bifurcation it's not a problem to split the lanes on the fly, so as long as you don't use the M.2 slots, your x16 slot remains x16.
NopaA completely separate 1x 5.0 x16 for the GPU and 2x 5.0 x4 for NVMe would be the best solution, whether on AMD's or Intel's boards.
AMD allows that; Intel does not as yet.
Posted on Reply