Agner's CPU blog

Test results for Intel's Sandy Bridge processor
Author: John D. McCalpin  Date: 2015-08-18 09:45
Hi Agner,
When I was doing some very fine-grained performance testing on Haswell (Xeon E5-2667 v3), I saw some anomalies that reminded me of your comments on the AVX "warm-up" period on Sandy Bridge. The test code is an L1-contained summation of a single vector. For N=2048 and 256-bit VADDPD instructions, it should take 512 cycles (plus some overhead).
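The 512-cycle figure can be checked with a small back-of-the-envelope calculation. The sketch below assumes 64-bit elements (VADDPD operates on packed doubles), one 256-bit add issued per cycle, and enough independent accumulators to hide the add latency. The function and parameter names are illustrative only and do not come from the original test code.

```python
# Back-of-the-envelope cycle estimate for an L1-contained vector summation.
# Assumptions (not from the original test code): 64-bit elements, one vector
# add issued per cycle, add latency fully hidden by independent accumulators.

def expected_cycles(n, element_bits=64, vector_bits=256, adds_per_cycle=1):
    """Return the minimum cycle count to sum n elements with vector adds."""
    lanes = vector_bits // element_bits          # elements per vector register
    vector_adds = -(-n // lanes)                 # ceiling division
    return -(-vector_adds // adds_per_cycle)     # ceiling division


full_width = expected_cycles(2048)                    # 256-bit VADDPD
half_width = expected_cycles(2048, vector_bits=128)   # 128-bit adds, for comparison

print(full_width, half_width)
```

With N=2048 this gives 512 vector adds for the 256-bit case, which matches the expected count quoted above before loop overhead is added.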
What I observed was:
(1) an initial "emulation" period of 4-7 iterations that took ~2200 cycles each,
(2) a "transition" iteration that took over 31,000 cycles -- about 25,500 halted and about 5,500 active,
(3) "normal" behavior of 512 or 516 cycles for the rest of the iterations (after subtracting the approximate overhead).
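The three phases listed above can be separated mechanically from a list of per-iteration cycle counts. The following is a minimal sketch, not the method used for the measurements. The thresholds (5% tolerance around the normal count, a factor of 10 for the transition) are arbitrary assumptions, and the sample data is made up from the rounded figures in the list rather than taken from real counter readings.

```python
# Label per-iteration cycle counts as "warm-up", "transition", or "normal".
# Thresholds are illustrative assumptions, not values from the measurements.

def classify_iterations(cycles, normal=512, tolerance=0.05, transition_factor=10):
    """Return one label per entry in cycles."""
    labels = []
    for c in cycles:
        if c <= normal * (1 + tolerance):
            labels.append("normal")        # full-speed 256-bit execution
        elif c >= normal * transition_factor:
            labels.append("transition")    # the single very long iteration
        else:
            labels.append("warm-up")       # the slow initial iterations
    return labels


# Hypothetical sample built from the rounded numbers quoted in the list.
sample = [2200, 2200, 2200, 2200, 2200, 31000, 516, 512, 512]
labels = classify_iterations(sample)

# Slowdown of a warm-up iteration relative to the normal case.
slowdown = 2200 / 512

print(labels)
print(round(slowdown, 1))
```

The ratio of roughly 4.3 is where the "1/4-speed" description of the warm-up iterations comes from.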

I added an outer loop with a (non-256-bit) "spinner" to see how long it takes for the processor to revert to initial behavior. If the spinner between outer loop iterations was less than 1 millisecond, the subsequent inner iterations ran at full speed. If the spinner between outer loop iterations was more than 1 millisecond, the subsequent inner iterations showed the behavior above.

This behavior occurs even if the core frequency is bound to any of the available frequencies (except perhaps the lowest frequency -- I need to go back and double-check those results). Performance counters showed that the core was running at the requested frequency in each case (comparing actual and reference cycles gave the expected ratio).
There are no kernel cycles, even during the transition.
The performance counters for micro-ops dispatched to the various ports show only very minor differences between the "warm-up" and "normal" cycles.
I could not find *any* performance counters (other than cycles) that could distinguish between the 1/4-speed "warm-up" and "normal" operations (but I have not tried all of them).
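The frequency check mentioned above (comparing actual and reference cycles) amounts to a simple ratio. A minimal sketch, assuming the reference counter ticks at a fixed, known frequency while the core is not halted, and that the two counts cover the same interval. The function name and the example numbers are hypothetical.

```python
# Estimate the average core frequency from two cycle counters read over the
# same interval. Assumes the reference counter runs at a fixed, known rate.

def effective_frequency_ghz(core_cycles, ref_cycles, ref_frequency_ghz):
    """Scale the reference frequency by the ratio of core to reference cycles."""
    if ref_cycles <= 0:
        raise ValueError("ref_cycles must be positive")
    return ref_frequency_ghz * core_cycles / ref_cycles


# Hypothetical readings: 2,000,000 core cycles against 3,000,000 reference
# cycles of a 3.0 GHz reference clock.
freq = effective_frequency_ghz(2_000_000, 3_000_000, 3.0)

print(freq)
```

If the core is pinned to a given frequency, the computed value should stay at that frequency in both the warm-up and the normal phases, which is what the post reports.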

So this looks like a very low-level emulation of the 256-bit pipeline by forcing everything through the bottom 128-bit pipe, with a remarkably slow transition when the upper 128-bit pipe is enabled. Perhaps the current draw is so large that the chip has to wait for the voltages to settle, even with no frequency change?

I did not look for evidence of the overhead of the transition in the other direction -- I assume it will be much quicker to turn off the upper 128-bit FP pipe than to turn it on.

 