Agner`s CPU blog

Software optimization resources | E-mail subscription to this blog | www.agner.org

Test results for AMD Bulldozer processor
Author:  Date: 2012-03-20 09:16
Hi, thanks - I see now what you mean. 4 tagged instructions goes to decoder 0-3; if decoder 0 gets 2MOP then decoder 3 resources are used/stalled. So, it's likely a decoder (3) shares the bus with decoder 0 for outputting MOPS to the OOOE scheduler.
Still, I cannot understand the huge IPC penalty of BD over SB. The LS is almost the same since nehalem (2R/1R1W), BD has a VERY slow REP MOVS (1/3 of SB, sounds very worrying if you consider that mem/str/array copies are still implemented with rep movx), but it cannot account for the performance loss. The lack of the 3rd ALU is important, yet OOOE could easily mix MOV and other instructions in between - for full(?) throughput I need to schedule asm instructions manually on IA.
So, while CMT is surely slown down alot by the decoder - what do you think of single-core performance pitfall, even with regards to K10 architecture? BD decoder/retirement seems better than K10 (max 4 MOPS), yet it lags behind.
 
thread Test results for AMD Bulldozer processor new - Agner - 2012-03-02
replythread Test results for AMD Bulldozer processor new - Massimo - 2012-03-13
reply Test results for AMD Bulldozer processor new - Agner - 2012-03-14
last reply Test results for AMD Bulldozer processor new - Alex - 2012-03-14
replythread Test results for AMD Bulldozer processor new - fellix - 2012-03-15
last replythread Test results for AMD Bulldozer processor new - Agner - 2012-03-16
last replythread Test results for AMD Bulldozer processor new - Massimo - 2012-03-16
last replythread Test results for AMD Bulldozer processor new - Agner - 2012-03-17
reply Test results for AMD Bulldozer processor new - avk - 2012-03-17
last replythread Test results for AMD Bulldozer processor new - Massimo - 2012-03-17
last replythread Test results for AMD Bulldozer processor new - Agner - 2012-03-17
last replythread Test results for AMD Bulldozer processor - Massimo - 2012-03-20
last replythread Test results for AMD Bulldozer processor new - Agner - 2012-03-21
last reply Cache WT performance of the AMD Bulldozer CPU new - GordonBGood - 2012-06-05
reply Test results for AMD Bulldozer processor new - zan - 2012-04-03
replythread Multithreads load-store throughput for bulldozer new - A-11 - 2014-06-27
last replythread Multithreads load-store throughput for bulldozer new - Bigos - 2014-06-28
last reply Multithreads load-store throughput for bulldozer new - A-11 - 2014-07-04
last reply Store forwarding stalls of piledriver new - A-11 - 2014-09-07