Agner's CPU blog

Posted: **2021-01-31, 16:22:39**

I have now tested the AMD Zen 3 (Ryzen 5800) architecture.

The Zen 1 design from AMD was quite successful, with substantial improvements over previous models. Zen 2 made significant improvements over Zen 1, and Zen 3 now turns out to be faster still. There are more execution units and several other improvements in Zen 3. AMD's claims about improved performance are basically confirmed by my tests. See link.

The throughput of the Zen 3 is now as high as six instructions per clock cycle. This may be six integer instructions or six floating point/vector instructions, or any mix of these. This is a record so far. It can do three memory operations per clock. The clock frequency is 3.8 GHz with boosts up to almost 5 GHz.

A serious bottleneck is a decoding rate of 4 instructions or 16 bytes per clock. To compensate for this, the Zen 3 has a micro-op cache with 4096 entries after the decoder.
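The effect of the decode limit can be made concrete with a simplified model. The sketch below assumes that in each clock the decoder accepts up to 4 instructions whose combined length is at most 16 bytes. It ignores fetch-block alignment and other details of the real hardware, and `decode_cycles` is a hypothetical helper, not something from the manual.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified decoder model (assumption): per clock, at most kMaxInstr
// instructions with a combined length of at most kMaxBytes bytes.
// Fetch-block alignment is ignored.
constexpr int kMaxInstr = 4;
constexpr int kMaxBytes = 16;

// Returns the number of clocks needed to decode instructions with the
// given lengths in bytes, filling each clock greedily.
int decode_cycles(const std::vector<int>& lengths) {
    for (int len : lengths) {
        assert(len >= 1 && len <= 15);  // x86 instructions are 1 to 15 bytes
    }
    int cycles = 0;
    std::size_t i = 0;
    while (i < lengths.size()) {
        int count = 0;
        int bytes = 0;
        // The first instruction always fits because 15 <= kMaxBytes,
        // so every clock makes progress.
        while (i < lengths.size() && count < kMaxInstr &&
               bytes + lengths[i] <= kMaxBytes) {
            bytes += lengths[i];
            ++count;
            ++i;
        }
        ++cycles;
    }
    return cycles;
}
```

In this model, eight 4-byte instructions decode in 2 clocks (4 per clock), while eight 7-byte instructions need 4 clocks (2 per clock). Code running from the micro-op cache avoids this limit.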

The increased throughput in terms of instructions per clock may be difficult to utilize if the software has long dependency chains (where each calculation must wait for the result of the preceding one). It is now more important than ever to avoid long dependency chains.
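A common way to break a long dependency chain is to use several independent accumulators. Here is a sketch comparing one accumulator with four. The function names are hypothetical, and four is only an illustrative choice: the number needed to hide the latency depends on the latency and throughput of the add instruction. This is a small, deliberate unroll, and it may change floating-point rounding, which is why compilers do not make this transformation on their own unless allowed to reassociate (e.g. with `-ffast-math`).

```cpp
#include <cstddef>
#include <vector>

// One accumulator: each addition must wait for the previous one,
// so the loop is limited by the latency of the add.
double sum_single(const std::vector<double>& v) {
    double s = 0.0;
    for (double x : v) s += x;
    return s;
}

// Four independent accumulators: four shorter dependency chains that
// the out-of-order core can execute in parallel.
double sum_four(const std::vector<double>& v) {
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    std::size_t i = 0;
    const std::size_t n = v.size();
    for (; i + 4 <= n; i += 4) {
        s0 += v[i];
        s1 += v[i + 1];
        s2 += v[i + 2];
        s3 += v[i + 3];
    }
    for (; i < n; ++i) s0 += v[i];  // leftover elements
    return (s0 + s1) + (s2 + s3);
}
```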

The bottleneck in the decoder appears to be difficult to overcome. This is a consequence of the messy x86 code structure, where instructions can have any length from 1 to 15 bytes and it is complicated to determine the length of each instruction. Intel processors have the same bottleneck and the same decoding rate. To get the maximum throughput, the programmer must make sure that the critical part of the program fits into the micro-op cache. It is important to avoid loop unrolling where possible in order to economize the use of the micro-op cache. (The Clang compiler often unrolls loops excessively.)
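If a particular loop is unrolled more than wanted, both Clang and GCC have loop pragmas that request no unrolling. The sketch below uses `#pragma clang loop unroll(disable) interleave_count(1)` for Clang (the second clause also stops the vectorizer from interleaving copies of the loop body) and `#pragma GCC unroll 1` for GCC. The function `scale` is a hypothetical example.

```cpp
#include <cassert>
#include <cstddef>

// Multiply every element by factor, asking the compiler not to unroll
// the loop so that it occupies fewer micro-op cache entries.
void scale(float* data, std::size_t n, float factor) {
    assert(n == 0 || data != nullptr);
#if defined(__clang__)
#pragma clang loop unroll(disable) interleave_count(1)
#elif defined(__GNUC__)
#pragma GCC unroll 1
#endif
    for (std::size_t i = 0; i < n; ++i) {
        data[i] *= factor;
    }
}
```

The option `-fno-unroll-loops`, accepted by both compilers, is a broader, file-wide alternative. Whether either helps should be verified by measurement.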

The AMD Zen 3 has a higher instruction-per-clock throughput and a bigger micro-op cache than the best current Intel processors. This makes the Zen 3 the best choice for many applications. The Zen 3 does not support the AVX512 instruction set, however. Therefore, Intel processors are likely to be faster for software that can utilize the 512-bit vector instructions. AMD have focused on higher throughput where Intel have focused on larger vectors.
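Software that has an AVX512 code path must check at run time whether the processor supports it. Here is a minimal sketch using the GCC/Clang builtin `__builtin_cpu_supports`, which is available only when targeting x86. The name `has_avx512f` is hypothetical, and returning false on other compilers and targets is an assumption of this sketch.

```cpp
// Returns true if the running CPU reports AVX512F (the AVX512 foundation
// subset), using a GCC/Clang builtin that exists only for x86 targets.
bool has_avx512f() {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    // Only strictly needed if called before constructors have run.
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx512f") != 0;
#else
    return false;  // assumption: treat other compilers/targets as lacking AVX512
#endif
}
```

On the Zen 3 this returns false. On an Intel processor with AVX512, a program can use it to select a 512-bit code path and fall back to 256-bit code elsewhere.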

The Zen 2 had the surprising feature that it can mirror memory operands inside the CPU, as I have described here. The Zen 3 does not have this feature. The feature is no doubt costly in terms of hardware complexity and temporary registers, and it is likely to be more useful in 32-bit mode than in 64-bit mode, because 32-bit code has only 8 general-purpose registers instead of 16 and therefore keeps more variables in memory. Therefore, it makes sense to prioritize the hardware resources for other improvements.

I have made a detailed description of the Zen 3 architecture in my microarchitecture manual and my list of instruction timings (link).

Posted: **2021-02-02, 8:41:22**

Is the utilization of the micro-op cache in a single-thread scenario better than in previous generations, where a single thread used only half of the cache?

Posted: **2021-02-02, 12:57:21**

@RobertS:
Two threads running the same code in the same core can both use the same entries in the µop cache in Zen 3, but they only get a throughput of 3 µops per clock each. A single thread can get a throughput of 6 µops per clock from the µop cache.

Posted: **2021-02-03, 8:20:21**

Actually, my question was about this remark you have regarding the micro-op cache in Zen/Zen 2:

The processor has an extra cache for decoded instructions. The size is indicated as 2048 µops for Zen 1 and 4096 µops for Zen 2, with a line size of 8 µops, but the effective size was measured to only slightly more than half of this number when running a single thread. The effective size is almost doubled when running two threads in the same core.

So I wanted to know whether this limitation still applies to Zen 3, or whether the CPU can use the whole cache when running a single thread.

Posted: **2021-02-03, 12:06:50**

It can use the whole µop cache in a single thread.

Posted: **2021-03-02, 14:53:55**

Thank you, very interesting (as always)!

One thing that caught my eye was the latency for DIV/IDIV. It seems that it is now almost 3x faster, did you notice such an improvement? Do you see a trend here that the new CPUs are finally getting much faster division? Are the days of using floating-point division for integers, various methods employing reciprocals and Montgomery multiplication for modular reduction slowly coming to an end?

Posted: **2021-03-03, 7:04:42**

@tuom.

Yes, division got faster. 64-bit integer division on Intel processors is still slow, but 32-bit division is faster.
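Since 32-bit division is faster than 64-bit division on the Intel processors mentioned, one workaround is to check at run time whether both operands fit in 32 bits. The sketch below, with the hypothetical name `div_u64`, shows the idea. Whether it pays off depends on the processor and on how predictable the branch is.

```cpp
#include <cassert>
#include <cstdint>

// Unsigned 64-bit division that takes the 32-bit path when both
// operands fit in 32 bits. Precondition: b != 0.
inline std::uint64_t div_u64(std::uint64_t a, std::uint64_t b) {
    assert(b != 0);
    if (((a | b) >> 32) == 0) {
        // Both operands fit in 32 bits: use a 32-bit division.
        return static_cast<std::uint32_t>(a) / static_cast<std::uint32_t>(b);
    }
    return a / b;  // general 64-bit division
}
```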

Compilers use various multiply-and-shift methods for integer division where the divisor is known at compile time. This is still relevant. Such methods are also needed for integer SIMD vectors, for which there is no integer division instruction. My C++ vector class library implements integer vector division by multiplication and shift methods.
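As an illustration of the multiply-and-shift technique, the sketch below divides an unsigned 32-bit integer by the constant 10, the kind of sequence compilers typically emit for `x / 10` on x86-64. The function name `div10` is hypothetical, and this is a generic example, not the code of the vector class library.

```cpp
#include <cstdint>

// Unsigned 32-bit division by 10 without a DIV instruction.
// 0xCCCCCCCD = ceil(2^35 / 10). The 64-bit product cannot overflow
// because both factors are below 2^32, and the rounding error of the
// constant is small enough that the result is exact for every 32-bit x.
inline std::uint32_t div10(std::uint32_t x) {
    const std::uint64_t product = static_cast<std::uint64_t>(x) * 0xCCCCCCCDu;
    return static_cast<std::uint32_t>(product >> 35);
}
```

Other divisors need a different constant and shift, and some need an extra correction step. The same multiply and shift can be applied to each lane of a SIMD vector.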

Posted: **2021-06-10, 20:54:21**

I am sure that both the 5600 and 5700X SKUs will come, though probably only next year: either when B450/X470 motherboards get their 5000-series BIOS updates, or later, when Rocket Lake launches around March 2021.

I highly doubt that AMD will stick to only the four launch SKUs, especially considering how extensive the previous-generation Ryzen lineups were. Just because AMD has always released a larger portion of its SKUs at launch in the past does not mean it won't extend the 5000 lineup in the future. Even Zen 2 (Ryzen 3000) saw numerous SKU additions well after launch.

What we are seeing is an early adopter fee. AMD has learnt its lesson from previous launches, where it released cheaper SKUs (3600, 3700X, etc.) at the same time as the more expensive models (3600X, 3800X, etc.): the cheaper non-X and lower-end X SKUs always outsell their more expensive brethren.

So AMD is just making its profit from the higher-priced SKUs (5600X/5800X) while it can (which also lets more leftover Zen 2 stock sell), and then we will see the cheaper models later on (most likely next year).

So I am fairly certain we will see the cheaper SKUs (5600/5700X) sometime early next year, followed later by price drops. It is incredibly unlikely that AMD will stick to only the four launch models; what we are seeing is just business.

They have pretty much the best all-round CPUs at the moment, and they can charge for them, especially at launch. TSMC's very mature 7 nm process has excellent yields and so is the silicon quality, but launch demand for the 5000 series guarantees sales of the higher-priced SKUs. That is why AMD is unwilling to bin chips as lower-end SKUs (5600/5700X) right now and sell them for less, regardless of whether there is enough supply capacity to meet demand. So AMD is just selling everything it has currently made available for the highest price it can get. Why bin such high-quality silicon lower and then sell it for less than you can actually get for it, especially at launch, when people are mostly prepared to pay?

So they will sell as much as they can now, and other SKUs will follow later.

Just give it time; it really is just business. AMD is still just a company, and like all companies, it is in it to make a profit. There is nothing wrong with that: they have the best products, and they are charging extra for them. Let those who are willing to pay the early adopter fee buy them now; AMD will cater to more budgets a little later on.

If we would like the older launch structure to return (cheaper SKUs such as the 3600/5600/3700X/5700X at launch), we will need Intel to get seriously back in competition, with a very strong lineup, and decent prices to boot.

Otherwise, this launch structure will likely remain the status quo for all successive launches (especially if AMD's products keep going from strength to strength). AMD doesn't need to compete with lower-priced 5600 and 5700X SKUs right out of the gate with this launch, because it is pretty much leading in all-round performance, and demand is at an all-time high. So AMD's more expensive 5600X/5800X SKUs are guaranteed to sell well thanks to the high demand at this launch.

So the limited launch SKU lineup is not a surprise, but it would be a surprise if AMD were to not release any cheaper SKUs in the 5000 lineup. It would be the most limited range of just about any CPU series ever released, if AMD were to release nothing more than the 5950X, 5900X, 5800X and 5600X SKUs. That makes no sense whatsoever.

There isn't any room for more expensive SKUs in the current lineup, but there is room for cheaper "sales-boosting" SKUs. Do you see what I am getting at? AMD's silence on upcoming models also makes financial sense: it keeps the launch lineup selling like hot cakes.

AMD doesn't want potential buyers to wait for cheaper SKUs, and that is what would likely happen if AMD outright declared that cheaper (although slightly lower-performing) SKUs were coming next year.

AMD obviously wants to keep the sales momentum of the more expensive launch SKUs going right through the frenzied holiday sales season, until the very moment it announces additional SKUs.

AMD's goal is to achieve tremendous (and its best to date) launch and holiday earnings. So we should only expect announcements of additional (and cheaper) SKUs next year (within the first three months at the latest). It doesn't make financial sense for AMD to announce or release them before then, but they will come.

There have already been rumors about a cheaper 5600 coming, but no official word as yet. And I don't expect any official word until after the holiday and launch sales frenzy (so likely early next year).
