Massimo wrote:
Do you think AMD will add a trace cache to fix the
bad dual-core decoder throughput like Intel did?
Decoding is often a bottleneck in CISC designs. The trace cache on Intel's NetBurst (P4) was not very successful. I think it would be better to have one set of decoders per thread in the Bulldozer. AMD has instruction boundaries marked in the code cache which, strangely, Intel doesn't. So an extra set of decoders would be just a matter of die space and power consumption, and it would greatly increase the throughput.

It is strange that the floating point throughput is higher than the integer throughput on the Bulldozer. Later versions of Bulldozer can also do register-to-register moves in the two AGU pipelines, according to AMD manuals. I guess they will add more instructions to these pipelines to get an integer throughput of 4 instructions per clock cycle in the future.

Others have criticized the cache design on the Bulldozer. I am not an expert in cache performance so I will not comment on that.
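
To put rough numbers on the decoder argument above, here is a minimal sketch of the arithmetic. It is a toy upper-bound model, not a description of the actual hardware: the function name `per_thread_decode_rate` is made up for illustration, and the decoder width of 4 instructions per clock is an assumed example value.

```python
def per_thread_decode_rate(decoder_width, decoder_sets, active_threads):
    """Toy upper bound on instructions decoded per clock, per thread.

    Assumptions (illustrative only):
    - each decoder set handles at most `decoder_width` instructions per clock,
    - decoder capacity is split evenly between the active threads,
    - a single thread can never use more than one decoder set.
    """
    if decoder_width <= 0 or decoder_sets <= 0 or active_threads <= 0:
        raise ValueError("all arguments must be positive")
    total_capacity = decoder_width * decoder_sets
    return min(decoder_width, total_capacity / active_threads)


# Assumed example width: 4 instructions per clock per decoder set.
shared = per_thread_decode_rate(4, decoder_sets=1, active_threads=2)
private = per_thread_decode_rate(4, decoder_sets=2, active_threads=2)
print(f"one shared decoder set:  {shared} instructions/clock/thread")
print(f"one decoder set per thread: {private} instructions/clock/thread")
```

Under these assumptions, two threads sharing one set of decoders get at most half the decoder width each, while one set per thread restores the full width to both. Real throughput also depends on instruction mix, fetch bandwidth and the execution pipelines, which this sketch ignores.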