Your vectorclass is very useful to me and well documented. I have one question about FMA in the vectorclass. You write in your text:
"The FMA3 and FMA4 instruction sets are not handled directly by any code in the
vector class library, but by the compiler. The compiler will automatically combine
a floating point multiplication and a subsequent addition or subtraction into a
single instruction." But according to the following answer, the compiler won't use FMA unless you allow a relaxed floating-point model, and even then it might not do so:
stackoverflow.com/questions/15933100/how-to-use-fused-multiply-add-fma-instructions-with-see-avx/15933677?noredirect=1#comment22702114_15933677
There are specific FMA intrinsics, e.g. _mm_fmadd_ps(), which could be used. When I search the vectorclass source I don't find any (which matches what you say in your text). Can you explain why you don't support these instructions directly?
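
To show why it matters whether the fusion is explicit, here is a minimal sketch. It uses the standard std::fma() from <cmath> as a portable stand-in for _mm_fmadd_ps(), which needs FMA3 hardware and a compiler flag such as -mfma. The function names and input values are made up for illustration and are not part of the vectorclass.

```cpp
#include <cmath>

// Multiply then add with two roundings. The volatile store forces the
// product to be rounded to double, so the compiler cannot contract
// the expression into a single FMA instruction.
double unfused_mul_add(double a, double b, double c) {
    volatile double product = a * b;
    return product + c;
}

// Multiply and add with a single rounding, as an FMA instruction does.
double fused_mul_add(double a, double b, double c) {
    return std::fma(a, b, c);
}

// Hypothetical inputs: a * b is exactly 1 - 2^-54, which is not
// representable in double and rounds to 1.0 when stored on its own.
const double kA = 1.0 + std::ldexp(1.0, -27);
const double kB = 1.0 - std::ldexp(1.0, -27);
const double kC = -1.0;
```

With these inputs the unfused version returns 0, while the fused version keeps the low-order bits and returns -2^-54. So the two forms are not interchangeable, which I assume is why a compiler will not substitute one for the other under a strict floating-point model.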