AMD 'heavy equipment' CPUs
Posted: 2022-12-05, 20:01:05
Recently I've been reading your 'microarchitecture' guide with great interest. Thanks for making this available!
Regarding the AMD family 15h CPUs, on pg. 220 you write:
"It saves power quite aggressively by slowing down the clock speed most of the time. Some versions also lower the voltage to the CPU when the clock speed is reduced. The maximum clock speed is only obtained after a long sequence of CPU-intensive code."
I believe that the power-saving states (the ones with lower frequency/voltage) are entered in response to a software command, specifically a write to the P-state control register, MSRC001_0062. Generally this would be controlled through ACPI, and there should be BIOS settings to alter this behavior. On the other hand, I believe that Core Performance Boost (CPB, a.k.a. 'turbo') does change frequency/voltage without software intervention, although there should also be a BIOS setting to enable or disable this feature. I wonder if Core Performance Boost could have affected your experiments on this CPU. In particular, on page 228 you write:
"The measured throughput is two reads or one read and one write per clock cycle when only one thread is active. We would not expect the throughput to be less when multiple threads are active because each core has separate load/store units and level-1 data cache. But my measurements indicate that level-1 cache throughput is several times lower when multiple threads are running, even if the threads are running in different units that do not share any level-1 or level-2 cache."
I wonder if, while having CPB enabled, running a test with multiple threads caused the CPU to intermittently change between the various boosted P-states and/or the base frequency, leading to strange results.
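In case it is useful for checking this, here is a minimal Python sketch of how one might sample the relevant registers on Linux before and after a test run. It is only a sketch under assumptions: the MSR addresses and bit positions (HWCR bit 25 as the boost-disable bit, the P-state status register at MSRC001_0063, APERF/MPERF at 0xE8/0xE7) are my reading of AMD's family 15h documentation and should be verified for the exact model, the helper names are my own, and `read_msr` needs root plus the `msr` kernel module.

```python
import os
import struct

# MSR addresses and bit positions are assumptions taken from my reading of
# AMD's family 15h BKDG; verify them against the manual for your exact model.
MSR_HWCR = 0xC0010015           # bit 25 (CpbDis) set => boost disabled
MSR_PSTATE_STATUS = 0xC0010063  # bits 2:0 = current software P-state
MSR_MPERF = 0xE7                # advances at a fixed reference rate
MSR_APERF = 0xE8                # advances at the actual core clock rate


def read_msr(cpu, msr):
    """Read one 64-bit MSR via Linux's /dev/cpu/N/msr (root + msr module)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        # The msr driver uses the file offset as the register number.
        return struct.unpack("<Q", os.pread(fd, 8, msr))[0]
    finally:
        os.close(fd)


def cpb_enabled(hwcr):
    """True if the assumed CpbDis bit (bit 25) of HWCR is clear."""
    return ((hwcr >> 25) & 1) == 0


def current_pstate(status):
    """Extract the assumed 3-bit current P-state field."""
    return status & 0x7


def effective_ratio(aperf0, mperf0, aperf1, mperf1):
    """Average core clock relative to the reference clock between two samples.

    Differences are taken modulo 2**64 so a counter wrap is tolerated.
    """
    mask = (1 << 64) - 1
    d_aperf = (aperf1 - aperf0) & mask
    d_mperf = (mperf1 - mperf0) & mask
    if d_mperf == 0:
        raise ValueError("MPERF did not advance between samples")
    return d_aperf / d_mperf
```

The idea would be to read APERF and MPERF on each core with `read_msr` just before and just after the multithreaded cache test and pass the four values to `effective_ratio`. A ratio that differs noticeably between the single-threaded and multithreaded runs would suggest the cores were not running at the same frequency in both cases, so per-cycle throughput figures based on a fixed-rate counter would be skewed.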
According to AMD's manual, the L1 data cache is write-through and there is a queue which holds the data until it can be written to L2. So there is an explanation for reduced write performance when two threads are running on the same module at least. Threads on different modules are more difficult to explain...