Agner's CPU blog


Test results for Intel's Sandy Bridge processor

Author: Agner

Date: 2013-08-09 01:53

This looks like an alignment issue. The code is fetched in 16-byte blocks. Instructions that cross a 16-byte boundary (or possibly a 32-byte boundary?) are decoded less efficiently. The µop cache is coupled to the instruction cache, with a maximum of three 6-µop entries per 32-byte block of code. I don't really understand how this translates into inefficiency when instructions of certain lengths execute out of the µop cache.
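To make the boundary arithmetic concrete, here is a minimal sketch in Python. The function name `crosses_boundary` and the constant `MAX_UOPS_PER_32B` are made up for illustration. It only checks whether an instruction of a given length at a given address spans two blocks; it says nothing about how large the actual decoding penalty is.

```python
def crosses_boundary(address: int, length: int, block: int = 16) -> bool:
    """Return True if the bytes [address, address + length) span two blocks.

    Hypothetical helper: 'block' is 16 for the fetch blocks mentioned above,
    or 32 to test against the 32-byte blocks the µop cache works with.
    """
    first_block = address // block
    last_block = (address + length - 1) // block
    return first_block != last_block


# Upper limit implied by the figures above: three entries of six µops each
# per 32-byte block of code.
MAX_UOPS_PER_32B = 3 * 6
```

For example, a 3-byte instruction at offset 14 occupies bytes 14 to 16 and so crosses a 16-byte boundary, while the same instruction at offset 13 does not.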

I have done some experiments to test your claim that fuseable instructions decode less efficiently:

xchg r8,r9    ; 3 µops. Decodes alone
or eax,eax    ; 1 µop, D0
or ebx,ebx    ; 1 µop, D1
or ecx,ecx    ; 1 µop, D2
or edx,edx    ; 1 µop, D3

This decodes in 2 clocks. If the last OR is changed to an AND, it decodes in 3 clocks. AND can fuse with a following branch, while OR cannot. The decoders will not put a fuseable arithmetic/logic instruction in decoder D3, because then they cannot check in the same clock cycle whether the next instruction is a branch. There is no effect when this code executes out of the µop cache.
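The two clock counts above can be reproduced with a toy model of the decoders. This is only a sketch under simplifying assumptions, not a description of the real hardware: it assumes four decoders, that a multi-µop instruction takes a whole clock alone (as noted for XCHG above), and that a fuseable instruction is pushed to the next clock instead of going into D3. The names `decode_clocks` and `DECODERS` are made up for illustration, and the caller decides which instructions are marked fuseable.

```python
from typing import List, Tuple

# (mnemonic, number of µops, fuseable with a following branch?)
Instr = Tuple[str, int, bool]

DECODERS = 4  # D0..D3, assumed for this toy model


def decode_clocks(instrs: List[Instr]) -> int:
    """Count decode clocks under the simplified rules stated in the lead-in."""
    clocks = 0
    i = 0
    while i < len(instrs):
        clocks += 1
        if instrs[i][1] > 1:
            # Assumption: a multi-µop instruction decodes alone.
            i += 1
            continue
        slot = 0
        while i < len(instrs) and slot < DECODERS:
            _, uops, fuseable = instrs[i]
            if uops > 1:
                break  # a multi-µop instruction starts a new clock
            if fuseable and slot == DECODERS - 1:
                break  # a fuseable instruction is not placed in D3
            i += 1
            slot += 1
    return clocks


xchg = ("xchg", 3, False)
or_ = ("or", 1, False)   # OR is treated as not fuseable
and_ = ("and", 1, True)  # AND is treated as fuseable

clocks_or = decode_clocks([xchg, or_, or_, or_, or_])
clocks_and = decode_clocks([xchg, or_, or_, or_, and_])
```

With these inputs the model gives 2 clocks for the all-OR sequence and 3 clocks when the last OR is replaced by AND, matching the measurements. It does not model the µop cache, where no such effect was seen.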

