Vector Class Discussion

 
single message square_sub_x - zboson - 2015-05-15
 
square_sub_x
Author:  Date: 2015-05-15 07:29
I have implemented a vectorized double-double (https://en.wikipedia.org/wiki/Quadruple-precision_floating-point_format#Double-double_arithmetic) type using your Vector Class Library. Your mul_sub_x function is very useful. For example to multiply two doubles to get a double-double the function is simply

template<typename TYPEV>
static inline DOUBLEFLOATN two_prod(TYPEV const &a, TYPEV const &b) {
TYPEV p = a*b;
TYPEV e = mul_sub_x(a, b, p);
return DOUBLEFLOATN(p, e);
}

where TYVEV is e.g. Vec2d in which case DOUBLEFLOATN is something like struct {Vec2d hi; Vec2d lo;}

However, in my application I need x*x more often than I need x*y. You already provide a square() function but in order to implement a square() function for double-double I need (for efficiency) a square_sub_x function so I though you might want to consider this. Below you can find an example of a square_sub_x function I created for Vec2d based on your mul_sub_x function. I understand of course that you cannot implement everything for every special case but I just thought it was something you might want to consider.

Thank you for your very useful Vector Class Library.

static inline Vec2d square_sub_x(Vec2d const & a, Vec2d const & c) {
#ifdef __FMA__
return _mm_fmsub_pd(a, a, c);
#elif defined (__FMA4__)
return _mm_msub_pd(a, a, c);
#else
// calculate a * b - c with extra precision
Vec2q upper_mask = -(1LL << 27); // mask to remove lower 27 bits
Vec2d a_high = a & Vec2d(_mm_castsi128_pd(upper_mask));// split into high and low parts
Vec2d a_low = a - a_high;
Vec2d r1 = a_high * a_high; // this product is exact
Vec2d r2 = r1 - c; // subtract c from high product
Vec2d r3 = r2 + 2*(a_high * a_low) + a_low * a_low; // add rest of product
return r3; // + ((r2 - r1) + c);
#endif
}