Hello, John. In our low-level tests at ixbt.com we have confirmed that the L1D has 8-byte banks (this was also confirmed by an engineer on the Sandy Bridge architecture team), with address bits 5:3 selecting the bank. The 4-byte-access case is easy to solve: the out-of-order memory subsystem (Intel's term: MD) can reorder reads so that each same-cycle pair hits different banks: A+0 & A+8, then A+4 & A+12, then the same for the next 4 reads and 2 banks, etc. (A = the line's address). Alternatively, by delaying the first access (producing one "conflict" event for the PMC), all remaining loads can be issued without reordering while still hitting different banks: A+0 & (none), A+4 & A+8, A+12 & A+16…

DDR operation of the cache bit-lines is possible, but it removes the possibility of (practically, the need for) precharge. Without precharge, the bit-lines would have to swing 0<=>1 and back up to twice per clock. That requires fast (HT) transistors with high parameter uniformity (a big problem at 45 nm and below) and, most importantly, would ruin the performance/watt metric of such a cache. Both Intel and AMD avoid this at all costs, e.g. converting 6T bit-cells to 8T (for L1s and L2s) just to save power.

But I'm still curious how Intel resolved bank conflicts in Haswell. The naive solution is to make all banks 3-ported (2R+1W), which would require 10T cells. But early die shots show only a slightly larger L1D area compared to Ivy Bridge, with the same aspect ratio. Hm?…

While we're at it, may I ask why AMD's memory controllers are so slow, especially on writes? They never reach even 50% of theoretical peak throughput; Intel's can do more. See the AIDA64 "cache & memory benchmark" results, like this: www.easycom.com.ua/data/nouts/1302101905/img/38_aida64_memory-cache.jpg
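To make the bank arithmetic from my first paragraph concrete, here's a minimal sketch (my own illustration, not Intel code): 8 banks of 8 bytes each, selected by address bits 5:3, plus the two issue schedules I described. The `bank()` helper and the address lists are just assumptions encoding what I wrote above.

```python
# Sketch of the bank-index arithmetic described above.
# Assumption: 8 banks x 8 bytes, bank selected by address bits 5:3.

def bank(addr):
    """L1D bank index for a byte address: bits 5:3."""
    return (addr >> 3) & 0x7

A = 0x1000  # hypothetical start of a 64-byte cache line

# Scheme 1: MD reorders 4-byte loads so each same-cycle pair hits
# different banks: A+0 & A+8, A+4 & A+12, then the next two banks.
reordered_pairs = [(A + 0, A + 8), (A + 4, A + 12),
                   (A + 16, A + 24), (A + 20, A + 28)]
for a, b in reordered_pairs:
    assert bank(a) != bank(b)  # no bank conflict within a pair

# Scheme 2: delay the first access (one "conflict" PMC event), then
# issue the rest in order: A+0 & (none), A+4 & A+8, A+12 & A+16...
in_order_pairs = [(A + 0, None), (A + 4, A + 8), (A + 12, A + 16)]
for a, b in in_order_pairs:
    if b is not None:
        assert bank(a) != bank(b)  # still no conflict, without reordering

print("both schedules are conflict-free")
```

Both loops pass: with 8-byte banks, any two 4-byte loads 8 or more bytes apart (within the line) land in different banks, which is exactly what both schedules exploit.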