Here is the snippet of code:

float uf[8][8];
Vec8f uL;
uL = gather8f<0,8,16,24,32,40,48,56>(&(uf[0][0]));

The goal of this code is to take the elements uf[*][0] (the first element of each row) and put them in a single Vec8f. It segfaults in normal use, compiled with g++ 5.4.0 under Cygwin on Windows 10 using the flags g++ -O3 -mavx2 -mfma.
When I ran it under gdb, it didn't segfault and did what I expected. For instruction sets below AVX, where I don't have gather8f, I have equivalent code that does the same thing with 8 inserts, which seems inefficient. I'd appreciate any suggestions. This is part of a larger program. I could try to make a small program that shows the issue, but I am not sure how easy that would be.

Regards, James
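
P.S. To be explicit about the result I expect, here is a minimal scalar sketch of what the gather should produce. It is plain C++ and does not use the vector class library. The name gather_column0 is hypothetical and only for illustration.

```cpp
#include <cstddef>

// Scalar reference for the intended gather (hypothetical helper, not a
// library function): copies uf[0][0], uf[1][0], ..., uf[7][0], i.e. the
// flat indices 0, 8, 16, ..., 56 of the array, into out[0..7].
void gather_column0(const float (&uf)[8][8], float (&out)[8]) {
    for (std::size_t row = 0; row < 8; ++row) {
        out[row] = uf[row][0];
    }
}
```

Comparing the contents of uL against out from this loop should show whether the gather returns the right values in the cases where it does not crash.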