OK, so I did some further digging and found that Clang complains here, in vectorf512e.h:2995:
// no zeroing, need to blend
const int maskb = ((i0 >> 3) & 1) | ((i1 >> 2) & 2) | ((i2 >> 1) & 4) | (i3 & 8) |
((i4 << 1) & 0x10) | ((i5 << 2) & 0x20) | ((i6 << 3) & 0x40) | ((i7 << 4) & 0x80);
return _mm256_blend_ps(ta, tb, maskb); // blend

Working with a colleague, we figured out that it is complaining because of the left shifts on signed int values, which is what all the template parameters are. When we remove the shifted terms, just to see whether it compiles, and reduce it to the following:
const int maskb = ((i0 >> 3) & 1) | ((i1 >> 2) & 2) | ((i2 >> 1) & 4) | (i3 & 8);
return _mm256_blend_ps(ta, tb, maskb); // blend
it compiles fine. We then made the following change:
constexpr int maskb = ((i0 >> 3) & 1) | ((i1 >> 2) & 2) | ((i2 >> 1) & 4) | (i3 & 8) |
((i4 & 8) << 1) | ((i5 & 8) << 2) | ((i6 & 8) << 3) | ((i7 & 8) << 4);
return _mm256_blend_ps(ta, tb, maskb); // blend
which, looking at the code naively, keeps the same semantics. I did not go through all the complex logic in the code to make sure what I was doing was right, but at a glance it seems OK. All we did was move the bitwise AND inside, so that the shift is applied to a value already masked down to 0 or 8 (and therefore never negative), rather than shifting first and doing the bitwise AND afterwards. This code compiles with Clang. Do you think this would be the right change to make? Thanks.
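
As a quick sanity check on the equivalence, here is a minimal sketch that compares the two forms for a range of index values. The helper names (shift_then_mask, mask_then_shift, forms_agree) are made up for illustration and are not part of the library. The shift-first form is evaluated on unsigned values so that the check itself never shifts a negative int, and the sketch assumes the usual two's complement representation of int.

```cpp
#include <cassert>

// Hypothetical helper: the original "shift first, then mask" form.
// Computed on unsigned so the comparison itself never shifts a negative int.
// For s = 1..4 the mask (8u << s) is 0x10, 0x20, 0x40, 0x80 as in the original.
constexpr int shift_then_mask(int i, int s) {
    return static_cast<int>((static_cast<unsigned>(i) << s) & (8u << s));
}

// Hypothetical helper: the proposed "mask first, then shift" form.
// (i & 8) is always 0 or 8, so the left shift operates on a non-negative value.
constexpr int mask_then_shift(int i, int s) {
    return (i & 8) << s;
}

// Returns true if both forms agree for every index in [lo, hi]
// and for each of the four shift amounts used in maskb (1 to 4).
inline bool forms_agree(int lo, int hi) {
    for (int i = lo; i <= hi; ++i) {
        for (int s = 1; s <= 4; ++s) {
            if (shift_then_mask(i, s) != mask_then_shift(i, s)) {
                return false;
            }
        }
    }
    return true;
}

// A negative index such as -1 is accepted in a constant expression here,
// because the value being shifted is 8, not -1.
static_assert(mask_then_shift(-1, 1) == 0x10, "bit 3 of -1 moves to bit 4");
```

Calling forms_agree(-256, 255) covers negative indexes as well as the normal ones, which is the case where the original shift was the problem.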