Vector Class Discussion

 
Fallbacks for gatherXX()? - Chris Newbold - 2017-07-10
Fallbacks for gatherXX()? - Agner - 2017-07-10
Fallbacks for gatherXX()? - Chris Newbold - 2017-07-11
Fallbacks for gatherXX()? - Agner - 2017-07-11
Fallbacks for gatherXX()? - Chris Newbold - 2017-07-11
Fallbacks for gatherXX()? - Agner - 2017-07-11
 
Fallbacks for gatherXX()?
Author: Chris Newbold Date: 2017-07-10 13:58
I've been playing around with some toy examples coded with VCL and exploring the potential for write-once, compile-many code that targets multiple CPU architectures at runtime. I coded my first examples with AVX2 in mind, and wound up using gather4d() and gather8f() quite often. When I then tried compiling for older instruction sets, I discovered that these functions are undefined. Is this an oversight or intended? If intended, is there a recommended alternative coding pattern that works across multiple architectures? Thanks.
   
Fallbacks for gatherXX()?
Author: Agner Date: 2017-07-10 23:35
You may use the lookup functions. Gathering non-contiguous data is very inefficient anyway on CPUs that don't have these instructions.

Sometimes you can avoid the problem by reorganizing the data or by using permute functions.
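
For illustration only, here is a minimal sketch of the compile-time fallback pattern being asked about. It deliberately avoids the VCL headers so that it stays self-contained: gather4d_portable is a hypothetical helper, not a VCL function, and the std::array types are stand-ins for the vector classes. It uses the AVX2 gather intrinsic when the compiler targets AVX2 and a plain scalar loop otherwise.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__AVX2__)
#include <immintrin.h>
#endif

// Hypothetical helper (not part of VCL): read four doubles from `table`
// at the element positions given in `idx`.
inline std::array<double, 4> gather4d_portable(
        const double* table, std::size_t table_size,
        const std::array<std::int32_t, 4>& idx) {
    (void)table_size;  // only used by the debug checks below
    for (std::int32_t i : idx) {
        assert(i >= 0 && static_cast<std::size_t>(i) < table_size);
        (void)i;
    }
    std::array<double, 4> out{};
#if defined(__AVX2__)
    // Hardware gather: four 32-bit indexes, scale 8 = sizeof(double).
    const __m128i vidx =
        _mm_loadu_si128(reinterpret_cast<const __m128i*>(idx.data()));
    _mm256_storeu_pd(out.data(), _mm256_i32gather_pd(table, vidx, 8));
#else
    // Scalar fallback for targets without gather instructions.
    for (std::size_t k = 0; k < 4; ++k) {
        out[k] = table[idx[k]];
    }
#endif
    return out;
}
```

Both paths return the same values, so callers do not need to know which one was compiled in. As noted above, the scalar path is slow, so reorganizing the data is usually the better fix when that is possible.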

   
Fallbacks for gatherXX()?
Author: Chris Newbold Date: 2017-07-11 07:25
Thanks. In my case I'm actually trying to write the routines that will reorganize data into a friendlier layout. Here's another one that looks more like an error; in this case I'm trying to compile blend4d for plain AVX:

In file included from vectorclass/vectorclass.h:51:
vectorclass/vectorf256.h:2485:17: error: no case matching constant switch condition '17' [-Werror]
switch (s2) {
^~
vectorclass/vectorf256.h:2556:16: note: in instantiation of function template specialization 'permute4d<1, 3, 5, 7>' requested here
return permute4d<i0, i1, i2, i3> (a);
^
foo.cpp:421:32: note: in instantiation of function template specialization 'blend4d<1, 3, 5, 7>' requested here
const Vec4d out1 = blend4d<1, 3, 5, 7>(in1, in2);

   
Fallbacks for gatherXX()?
Author: Agner Date: 2017-07-11 10:12
I don't know why you get this message. Which compiler are you using?
You can ignore this error: the case s2 = 17 is covered at line 2444 above, so execution never reaches line 2485.

BTW, the indexes i2, i3 are out of range, but that is not the cause of the message.

   
Fallbacks for gatherXX()?
Author: Chris Newbold Date: 2017-07-11 11:19
The error manifests with Apple LLVM version 8.0.0 (clang-800.0.42.1), which is Xcode 8.1; the same code compiles without incident on GCC 4.9. I agree with your assessment; the warning from clang seems too aggressive.

Regarding your comment on the out-of-range indexes, perhaps I'm misunderstanding how to use blend. With two Vec4d inputs, isn't this legal: blend4d<1, 3, 5, 7>(in1, in2)? This seemed consistent with the documentation and examples for blend4d.

Thanks!

   
Fallbacks for gatherXX()?
Author: Agner Date: 2017-07-11 13:29
Sorry, the indexes would be out of range with permute4d, not with blend4d.
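
To make the index ranges concrete, here is a small scalar model of the two index conventions. It is only a sketch: Vec4, permute4_model and blend4_model are hypothetical stand-ins built on std::array, not VCL code. It assumes the convention discussed above, where a permute takes indexes 0..3 into its single input, while a blend takes indexes 0..3 for the first input and 4..7 for the second.

```cpp
#include <array>

using Vec4 = std::array<double, 4>;  // stand-in for Vec4d

// One input vector: every index must lie in 0..3.
template <int i0, int i1, int i2, int i3>
Vec4 permute4_model(const Vec4& a) {
    static_assert(i0 >= 0 && i0 < 4 && i1 >= 0 && i1 < 4 &&
                  i2 >= 0 && i2 < 4 && i3 >= 0 && i3 < 4,
                  "permute index out of range 0..3");
    return Vec4{{a[i0], a[i1], a[i2], a[i3]}};
}

// Selects one element: indexes 0..3 read from a, 4..7 read from b.
template <int i>
double select_elem(const Vec4& a, const Vec4& b) {
    static_assert(i >= 0 && i < 8, "blend index out of range 0..7");
    return (i < 4 ? a : b)[i & 3];
}

// Two input vectors: every index must lie in 0..7.
template <int i0, int i1, int i2, int i3>
Vec4 blend4_model(const Vec4& a, const Vec4& b) {
    return Vec4{{select_elem<i0>(a, b), select_elem<i1>(a, b),
                 select_elem<i2>(a, b), select_elem<i3>(a, b)}};
}
```

In this model blend4_model<1, 3, 5, 7>(a, b) takes elements 1 and 3 of a followed by elements 1 and 3 of b, whereas permute4_model<1, 3, 5, 7>(a) is rejected at compile time by the static_assert.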