Agner's CPU blog


INC/DEC throughput
Author: Peter Cordes  Date: 2017-10-09 23:52
Everything except your table says 1c throughput for INC/DEC, not 0.5c, for both KNL and Silvermont, and Intel's manual explains why.

InstLatx64 for Silvermont reports a throughput of one INC r32 per clock (1c), but 0.5c for ADD r32, imm8. The same holds for Goldmont, except that ADD r32, imm8 is 0.33c there.

I haven't found InstLatx64 results for actual KNL hardware, but Intel's optimization manual describes KNL as having the same extra flag-merging uop for INC/DEC as Silvermont. So I expect INC is worth avoiding on KNL as well whenever you're bottlenecked on issue throughput rather than decode. (Is there an IDQ between decode and issue that would let the expansion from 1 decoded uop to 2 issued uops fill decode bubbles? If so, INC could still have an apparent cost of 0.5c in the right circumstances.)

According to Intel's optimization manual, 17.1.2 Out-of-Order Engine (KNL):

Additionally, some instructions in the Knights Landing microarchitecture will be decoded as one uop by
the front end but need to expand to two operations for execution. These complex uops will have an allocation
throughput of one per cycle. Examples of these instructions are:


  • POP: integer load data + ESP update, PUSH: integer store data + ESP update.
  • INC/DEC: add to register + update partial flags
  • Gather: two VPU uops
  • CALL / RET: JMP + ESP update
  • LEA with 3 sources

(I lightly edited the list to group related instructions better.)

This means there's *always* a flag-merging uop, even when nothing reads INC's flags. It also explains why the sustained throughput of INC/DEC is only one per clock (1c), not two per clock (0.5c), according to measurements other than yours and to Intel's published tables. It's nice that the integer register update itself doesn't have a false dependency on flags, though, so they made it a lot less bad than on P4.
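
To make the decode-vs-allocation argument concrete, here is a rough back-of-the-envelope model in Python. It is only a sketch, not a simulation of the real pipeline, and it rests on assumptions: simple uops allocate two per cycle (inferred from the 0.5c ADD figure), each complex uop such as INC takes a whole allocation cycle (per the manual text quoted above), and a queue between decode and allocation lets the two stages overlap perfectly, which is exactly the open question raised earlier. The names `ALLOC_WIDTH`, `alloc_cycles`, `loop_cycles` and the `decode_cycles` parameter are made up for illustration.

```python
from fractions import Fraction

# Assumption: two simple uops can be allocated per cycle (ADD at 0.5c).
ALLOC_WIDTH = 2


def alloc_cycles(n_simple, n_complex):
    """Allocation cycles per loop iteration.

    Simple uops share the 2-wide allocator; each complex uop
    (INC, DEC, PUSH, POP, ...) is modelled as one full cycle.
    """
    return Fraction(n_simple, ALLOC_WIDTH) + n_complex


def loop_cycles(n_simple, n_complex, decode_cycles=0):
    """Cycles per iteration: the slower of decode and allocation.

    decode_cycles is a hypothetical input: how long the front end needs
    to decode one iteration. Assumes decode and allocation overlap fully.
    """
    return max(Fraction(decode_cycles), alloc_cycles(n_simple, n_complex))


if __name__ == "__main__":
    # Pure throughput tests: 8 back-to-back ADDs vs 8 back-to-back INCs.
    print(loop_cycles(8, 0) / 8)   # 1/2 cycle per ADD
    print(loop_cycles(0, 8) / 8)   # 1 cycle per INC

    # A 4-instruction loop body, counter updated with ADD vs INC.
    print(loop_cycles(4, 0))       # 2   (allocation-bound, ADD)
    print(loop_cycles(3, 1))       # 5/2 (allocation-bound, INC costs 0.5c more)

    # Same loop when decode needs 3 cycles per iteration: INC is free.
    print(loop_cycles(4, 0, decode_cycles=3))  # 3
    print(loop_cycles(3, 1, decode_cycles=3))  # 3
```

In this model, replacing ADD with INC costs an extra 0.5c per iteration only when allocation is the bottleneck; once decode is slower than allocation, the extra allocation slot hides in the decode bubbles. Whether real KNL or Silvermont hardware behaves this way depends on the buffering between decode and allocation, which the model simply assumes.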

 
thread Test results for Knights Landing - Agner - 2016-11-26
reply Test results for Knights Landing - Nathan Kurz - 2016-11-26
replythread Test results for Knights Landing - Tom Forsyth - 2016-11-27
reply Test results for Knights Landing - Søren Egmose - 2016-11-27
last reply Test results for Knights Landing - Agner - 2016-11-30
replythread Test results for Knights Landing - Joe Duarte - 2016-12-03
replythread Test results for Knights Landing - Agner - 2016-12-04
last reply Test results for Knights Landing - Constantinos Evangelinos - 2016-12-05
last replythread Test results for Knights Landing - John McCalpin - 2016-12-06
replythread Test results for Knights Landing - Agner - 2016-12-06
last reply Test results for Knights Landing - John McCalpin - 2016-12-08
last reply Test results for Knights Landing - Joe Duarte - 2016-12-07
replythread Test results for Knights Landing - zboson - 2016-12-28
last reply VZEROUPPER - Agner - 2016-12-28
replythread Test results for Knights Landing - Ioan Hadade - 2017-07-13
last reply Test results for Knights Landing - Agner - 2017-07-13
last replythread INC/DEC throughput - Peter Cordes - 2017-10-09
last reply INC/DEC throughput - Agner - 2017-10-10