Vector Class Discussion

 
Thread: Generating constants without load?
  myocytebd - 2018-05-15
  reply: Agner - 2018-05-15
  reply: myocytebd - 2018-05-15
  reply: Agner - 2018-05-15
 
Generating constants without load?
Author: myocytebd  Date: 2018-05-15 01:28
How can I properly generate constants without a memory load, using vectorclass or intrinsics?

Code like this doesn't work with gcc: at -Os or above it is "optimized" into a 128-bit load.
__m128i ud = _mm_undefined_si128();       // register with unspecified contents
__m128i one = _mm_cmpeq_epi32(ud, ud);    // x == x in every lane: all bits set (pcmpeqd)
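A self-contained sketch of the same idiom, assuming an x86-64 target with SSE2 (the default for x86-64 GCC and Clang). It compares a defined value with itself instead of the result of `_mm_undefined_si128()`, so the all-ones result does not depend on how a particular compiler folds an undefined input. The helper names `all_ones_from` and `lane_i32` are hypothetical.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper: x == x holds in every 32-bit lane, so
   _mm_cmpeq_epi32 (pcmpeqd) sets every bit of the result whatever
   x contains. Whether the compiler actually emits pcmpeqd or loads
   a constant from memory is still its own decision. */
static inline __m128i all_ones_from(__m128i x)
{
    return _mm_cmpeq_epi32(x, x);
}

/* Hypothetical helper: extract lane i (0..3) as a signed 32-bit int. */
static inline int32_t lane_i32(__m128i v, int i)
{
    int32_t tmp[4];
    _mm_storeu_si128((__m128i *)tmp, v);
    return tmp[i];
}
```

For example, `all_ones_from(_mm_set_epi32(1, 2, 3, 4))` yields -1 in all four lanes.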

   
Generating constants without load?
Author: Agner Date: 2018-05-15 02:22
-Os means optimize for size. Loading a constant from memory takes more space, in both the code cache and the data cache, and it may cause cache misses. Does it also happen when optimizing for speed?

I think I have seen the GNU compiler do the opposite: replace a load of all 1s with _mm_cmpeq_epi32. I don't know how it weighs the two alternatives, but the load you are reporting seems suboptimal.
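The reason the compiler may substitute one form for the other is that both spellings of the all-ones vector are bit-identical. A minimal sketch (function names hypothetical); which instruction sequence a given GCC version actually emits for each can be checked with `gcc -O2 -S`.

```c
#include <immintrin.h>
#include <string.h>

/* Spelling 1: ask for the constant directly. */
static inline __m128i ones_via_set1(void)
{
    return _mm_set1_epi32(-1);
}

/* Spelling 2: compare a value with itself (any defined value works). */
static inline __m128i ones_via_cmpeq(void)
{
    __m128i z = _mm_setzero_si128();
    return _mm_cmpeq_epi32(z, z);
}

/* Returns 1 if the two vectors have exactly the same bits. */
static inline int same_bits(__m128i a, __m128i b)
{
    return memcmp(&a, &b, sizeof a) == 0;
}
```

Since the values are identical, choosing between a constant-pool load and `pcmpeqd reg, reg` is purely a code-generation cost decision.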

This was reported as a bug in 2009: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=41084
I don't know why it hasn't been fixed. I think GCC's Bugzilla is the right place to ask.

   
Generating constants without load?
Author: myocytebd  Date: 2018-05-15 04:58
Thanks for the hints.

It doesn't seem to be affected by -mtune or the -O level, but it is strongly affected by the surrounding code.
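Because the result depends on the surrounding code, one workaround (not proposed in this thread, and specific to GCC/Clang inline-asm syntax) is to hide the instruction from the constant folder. A minimal sketch, assuming an x86-64 target; `ones_no_load` is a hypothetical name.

```c
#include <immintrin.h>

/* The asm statement is opaque to the optimizer, so it cannot be
   folded into a load from a constant pool: pcmpeqd is emitted as
   written. "=x" requests an SSE register (xmm0-xmm15), which the
   legacy-SSE encoding of pcmpeqd can address. No input operand is
   needed: comparing a register with itself yields all ones
   regardless of its previous contents. */
static inline __m128i ones_no_load(void)
{
    __m128i r;
    __asm__("pcmpeqd %0, %0" : "=x"(r));
    return r;
}
```

The trade-off is that the compiler no longer knows `r` is a constant, so it cannot fold it into later operations. When compiling with -mavx, the VEX form `vpcmpeqd %0, %0, %0` would avoid mixing legacy-SSE and AVX encodings.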

In a micro-benchmark with no other ALU instructions, gcc turns it into a load,
e.g. movemask(pcmpeq(x0, x0)) -> movaps; movemask
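A test case of the shape described above might look like the following. This is a hypothetical sketch: it assumes the byte-mask form `_mm_movemask_epi8` (pmovmskb), since the post doesn't say which movemask was used, and it passes a defined argument rather than an uninitialized `x0`. Making it a non-static function keeps it visible in `gcc -S` output, so one can see whether the compiler emits `pcmpeqd`, a load, or folds the whole thing to a constant.

```c
#include <immintrin.h>

/* Compare x with itself (all ones) and collect one bit per byte.
   Whatever instructions the compiler chooses, the result is 0xFFFF. */
int all_ones_mask(__m128i x)
{
    return _mm_movemask_epi8(_mm_cmpeq_epi32(x, x));
}
```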

When there are a few other ALU instructions, gcc keeps it as intended.

In real code, when more than one constant is generated nearby, some of them may be turned into loads.
(I grep the assembly for "movaps.*rip" to find potentially problematic ones, and check them.)
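When several constants are needed close together, a commonly used technique is to generate a single all-ones register and derive the others from it with shifts, so none of them needs a load. A minimal sketch with hypothetical helper names; as discussed above, the compiler may still decide to replace these sequences with loads.

```c
#include <immintrin.h>
#include <stdint.h>

/* 0xFFFFFFFF in each 32-bit lane. */
static inline __m128i ones_i32(void)
{
    __m128i z = _mm_setzero_si128();
    return _mm_cmpeq_epi32(z, z);
}

/* 0x00000001: logical right shift of all ones by 31. */
static inline __m128i one_i32(void)
{
    return _mm_srli_epi32(ones_i32(), 31);
}

/* 0x80000000 (sign mask): left shift of all ones by 31. */
static inline __m128i sign_mask_i32(void)
{
    return _mm_slli_epi32(ones_i32(), 31);
}

/* 0x7FFFFFFF (abs mask): logical right shift of all ones by 1. */
static inline __m128i abs_mask_i32(void)
{
    return _mm_srli_epi32(ones_i32(), 1);
}

/* Read lane 0 as an unsigned 32-bit value for checking. */
static inline uint32_t lane0_u32(__m128i v)
{
    return (uint32_t)_mm_cvtsi128_si32(v);
}
```

Each derived constant costs one extra shift, which is often cheaper than a potential cache miss on a constant-pool load.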

   
Generating constants without load?
Author: Agner Date: 2018-05-15 05:29
The GCC developers say that the bug has been resolved. If you can make a minimal reproducible example that is not optimized, please post it at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=41084