Agner's CPU blog

Posted: **2021-09-19, 22:30:25**

I'm working on patching an old piece of code used to reduce the angles of a 3D float vector to the range ±Pi. The original code used loops to implement a horribly inaccurate version of IEEE remainder.

Replacing the loops with FPREM1 (using 2Pi as the divisor) has been working well so far, but I'd really like to use SSE instructions instead, since FPREM1 is slow and the angles can easily be loaded into an XMM register and processed as packed singles. The optimization guide recommends: "Multiply by the reciprocal divisor, get the fractional part by subtracting the truncated value, and then multiply by the divisor," but this frequently fails to produce correct results when the input angles are a multiple of Pi.

Is there a simple way to make the SSE version more accurately behave like true IEEE remainder?

Posted: **2021-09-20, 5:14:17**

Calculating the remainder with reasonable accuracy is complicated when x is large. The vector class library does this in its sin, cos, and tan functions; see the file vectormath_trig.h in https://github.com/vectorclass/version2
If you are using the reduced x as the input to a trigonometric function anyway, then I would recommend using the vector class library.
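
For moderate angles, one difference from the recipe quoted in the question is worth illustrating: IEEE remainder rounds the quotient to the nearest integer, whereas subtracting the truncated value behaves like fmod and keeps the sign of the input. The following is only a rough sketch of that idea, not the method used in the vector class library. The function name `reduce_angle_ps` and the two-constant split of 2Pi are assumptions made for this illustration, and the sketch assumes the default MXCSR rounding mode (round to nearest) and inputs small enough that the quotient stays below about 2^16.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: reduce four packed float angles to roughly [-Pi, Pi].
   Approximates IEEE remainder(x, 2*Pi) for moderate |x| only.
   Assumes MXCSR is in its default round-to-nearest mode. */
static inline __m128 reduce_angle_ps(__m128 x)
{
    /* 1 / (2*Pi) */
    const __m128 inv_two_pi = _mm_set1_ps(0.15915494309189535f);
    /* 2*Pi split into a short high part (few mantissa bits, so q * hi
       stays exact for modest q) plus a small low correction. */
    const __m128 two_pi_hi = _mm_set1_ps(6.28125f);
    const __m128 two_pi_lo = _mm_set1_ps(1.9353071795864769e-3f);

    /* q = quotient rounded to the nearest integer (CVTPS2DQ uses the
       current rounding mode), converted back to float. */
    __m128 q = _mm_cvtepi32_ps(_mm_cvtps_epi32(_mm_mul_ps(x, inv_two_pi)));

    /* r = x - q * 2*Pi, subtracted in two steps to limit rounding error. */
    __m128 r = _mm_sub_ps(x, _mm_mul_ps(q, two_pi_hi));
    return _mm_sub_ps(r, _mm_mul_ps(q, two_pi_lo));
}
```

For a 3D vector, the three angles can be loaded into the low three lanes and the fourth lane ignored. Inputs very close to an odd multiple of Pi can still come out as +Pi where FPREM1 gives -Pi (or the reverse), because the product with the reciprocal is itself rounded, and accuracy degrades as |x| grows, which is the complication mentioned above.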

SSE replacement for FPREM1