I got different results from those listed in the instruction tables for MOVQ r64, mm/x and MOVQ mm/x, r64 on Intel Skylake.
How do I know I have a Skylake?
- ran lscpu command, and it printed out: "Model name: Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz" and then I googled it and found that it has a Skylake uarch
- confirmed with: echo | clang -E - -march=native -### ; part of the output said: ' "-target-cpu" "skylake" ' (with clang version 6.0.1)
What does the instruction table say the latencies are?
- MOVQ r64, mm/x ; latency 2
- MOVQ mm/x, r64 ; latency 1
What do my results say the latency is?
- As you say in your docs, it's hard to separate the latency of moves to alternate register files
- I found that the average latency of the two moves is 2. If the latency were 2 one way and 1 the other way, the average latency would be 1.5.
How did I test the latency?
- used PMCTestB64.nasm
"
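mov ebp, 100            ; outer loop counter: 100 iterations
align 16
LL:
%REP 50                 ; 50 dependent pairs of moves per iteration
movq rax,xmm0           ; xmm -> r64
movq xmm0,rax           ; r64 -> xmm, depends on the previous move
%ENDREP
dec ebp
jnz LL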
"
The result was
    Clock  Core cyc  Instruct      Uops
    20118     20036     10205     10105
    20092     19988     10205     10105
    20050     19986     10205     10105
    20054     19988     10205     10105
    20088     19987     10205     10105
    20118     19987     10205     10105
The instruction tables say that the latency for each of the corresponding MOVDs on Skylake is 2 and that they use the same ports. The Intel C/C++ intrinsics page says the latency for both is 2: https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=cvtsi (I know that your results often disagree with theirs; I just wanted to be thorough).
Hopefully I went wrong somewhere, because it would be beneficial to my research if the latency were lower.
Thank you for creating these resources and making them publicly available. I appreciate you.