Hi Agner,
With your permission, I would like to try a different approach, which might be easier for me. My code already has two versions of each function.
One targets CPUs with SSE4, and the other targets CPUs with AVX.
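For reference, here is a minimal sketch of that kind of two-version setup. It assumes GCC or Clang on x86-64, a CPU with at least SSE4.1, and no global `-mavx` flag. It uses raw intrinsics rather than VCL classes, relying on the per-function `target` attribute and `__builtin_cpu_supports` for runtime dispatch. The names `add_sse4`, `add_avx` and `add_dispatch` are hypothetical.

```cpp
#include <immintrin.h>
#include <cstddef>

// SSE4.1 version: the target attribute enables SSE4.1 for this function only.
// Without a global -mavx flag, it is emitted with legacy (non-VEX) SSE code.
__attribute__((target("sse4.1")))
void add_sse4(const float* a, const float* b, float* out, std::size_t n) {
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {  // 4 floats per __m128
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
    for (; i < n; ++i) out[i] = a[i] + b[i];  // scalar tail
}

// AVX version of the same kernel, using 256-bit __m256 vectors.
__attribute__((target("avx")))
void add_avx(const float* a, const float* b, float* out, std::size_t n) {
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {  // 8 floats per __m256
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(out + i, _mm256_add_ps(va, vb));
    }
    for (; i < n; ++i) out[i] = a[i] + b[i];  // scalar tail
}

// Runtime dispatch: use the AVX version only if the CPU reports AVX support.
void add_dispatch(const float* a, const float* b, float* out, std::size_t n) {
    if (__builtin_cpu_supports("avx")) {
        add_avx(a, b, out, n);
    } else {
        add_sse4(a, b, out, n);
    }
}
```

This works for raw intrinsics. Whether VCL functions such as exp() can be split the same way is what I am asking below.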
What I would like to be able to do is the following:
1. When I call a VCL function on __m128 (for instance exp()), I would like the generated code to use SSE4 instructions only.
2. When I call a VCL function on __m256 (for instance exp()), I would like the generated code to use AVX instructions only.

Is there a way to do this with VCL?

Thank you.