Vector Class Discussion

unpack four bytes to four ints - chad - 2013-06-04

unpack four bytes to four ints - chad - 2013-06-05

unpack four bytes to four ints

Author: chad Date: 2013-06-04 08:00

I'm reading in pixel data where each pixel is an integer in RGBA format. I first unpack the four bytes to four ints and then convert them to floats. One way I could do this is to use extend_low/extend_high, but I would have to do this four times to get four integers. Instead, I think it's more efficient to use the _mm_cvtepu8_epi32 intrinsic, which unpacks four bytes directly to four ints. Is there a reason this intrinsic is not used by the vectorclass?
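For reference, `_mm_cvtepu8_epi32` (SSE4.1, instruction `pmovzxbd`) reads only the lowest four bytes of its 128-bit argument and zero-extends each one into a 32-bit lane. The scalar model below is only a sketch of those semantics for illustration; `Bytes16`, `Ints4` and `cvtepu8_epi32_model` are hypothetical names, not part of the vectorclass or the intrinsics headers.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for a 128-bit register viewed as 16 x uint8 or 4 x uint32.
using Bytes16 = std::array<std::uint8_t, 16>;
using Ints4   = std::array<std::uint32_t, 4>;

// Scalar model of _mm_cvtepu8_epi32: bytes 0..3 are zero-extended to 32 bits.
// Bytes 4..15 of the source are ignored entirely.
Ints4 cvtepu8_epi32_model(const Bytes16& src) {
    Ints4 out{};
    for (std::size_t i = 0; i < 4; ++i) {
        out[i] = static_cast<std::uint32_t>(src[i]);  // unsigned widening = zero extension
    }
    return out;
}
```

Because only the low four bytes are read, any group of four bytes other than the first has to be moved into lane 0 (here with `permute4ui`) before each call.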

Here is the code I use now which unpacks four pixels into 12 floats.

```cpp
#include "vectorclass.h"  // _mm_cvtepu8_epi32 requires SSE4.1

void int4_to_float12(int *x, float *y, const int offset) {
    // load 4 pixels, convert them from AoS to SoA, expand them to 12 floats
    Vec16uc c16 = Vec16uc().load(x);
    // RGBARGBARGBARGBA -> RRRRGGGGBBBBAAAA
    Vec4ui i4 = (Vec4ui)permute16uc<0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15>(c16);

    // move each 4-byte group to lane 0, then zero-extend its bytes to 32 bits
    Vec4ui row0 = _mm_cvtepu8_epi32(permute4ui<0, -1, -1, -1>(i4)); // RRRR
    Vec4ui row1 = _mm_cvtepu8_epi32(permute4ui<1, -1, -1, -1>(i4)); // GGGG
    Vec4ui row2 = _mm_cvtepu8_epi32(permute4ui<2, -1, -1, -1>(i4)); // BBBB
    //Vec4ui row3 = _mm_cvtepu8_epi32(permute4ui<3, -1, -1, -1>(i4)); // AAAA

    // store_a requires each destination to be 16-byte aligned
    to_float(row0).store_a(&y[0*offset]);
    to_float(row1).store_a(&y[1*offset]);
    to_float(row2).store_a(&y[2*offset]);
    //to_float(row3).store_a(&y[3*offset]);
}
```
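When vectorizing a function like this, a plain scalar version is handy for checking results. The following is a sketch, assuming each pixel's bytes lie in memory in the order R, G, B, A (which is what the permute pattern relies on), that `int` is 4 bytes, and that `offset` is the distance in floats between the R, G and B output rows. `int4_to_float12_ref` is a hypothetical name, not part of the original code.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical scalar reference for int4_to_float12: reads 4 RGBA pixels
// (16 bytes, assuming sizeof(int) == 4) and writes R0..R3 to y[0..3],
// G0..G3 to y[offset..offset+3] and B0..B3 to y[2*offset..2*offset+3].
// Alpha is dropped, as in the SIMD version.
void int4_to_float12_ref(const int* x, float* y, int offset) {
    assert(offset >= 4);  // output rows must not overlap
    unsigned char bytes[16];
    std::memcpy(bytes, x, sizeof bytes);  // byte view of the four pixels
    for (int channel = 0; channel < 3; ++channel) {  // 0 = R, 1 = G, 2 = B
        for (int pixel = 0; pixel < 4; ++pixel) {
            // byte index 4*pixel + channel is the same gather that the
            // permute16uc<0,4,8,12, 1,5,9,13, 2,6,10,14, ...> pattern performs
            y[channel * offset + pixel] =
                static_cast<float>(bytes[4 * pixel + channel]);
        }
    }
}
```

Comparing this element by element against the SIMD output, for a few pixel values including 0 and 255, catches mistakes in the permute indices or row order.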


unpack four bytes to four ints

Author: chad Date: 2013-06-05 04:51

I thought a bit more carefully about this function. Since I'm unpacking four pixels at once, each extend_low/extend_high call widens several channels at once, so the per-row permutes are no longer needed. The new version of the function uses only the vectorclass, and it's even slightly faster than the previous version, which used _mm_cvtepu8_epi32.

```cpp
#include "vectorclass.h"

void int4_to_float12_v2(int *x, float *y, const int offset) {
    // load 4 pixels (load_a requires x to be 16-byte aligned) and transpose
    // RGBARGBARGBARGBA -> RRRRGGGGBBBBAAAA
    Vec16uc c16 = permute16uc<0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15>(Vec16uc().load_a(x));

    Vec8us low  = extend_low(c16);   // RRRRGGGG as 16-bit
    Vec8us high = extend_high(c16);  // BBBBAAAA as 16-bit
    Vec4ui row0 = extend_low(low);   // RRRR
    Vec4ui row1 = extend_high(low);  // GGGG
    Vec4ui row2 = extend_low(high);  // BBBB
    //Vec4ui row3 = extend_high(high); // AAAA

    // store_a requires each destination to be 16-byte aligned
    to_float(row0).store_a(&y[0*offset]);
    to_float(row1).store_a(&y[1*offset]);
    to_float(row2).store_a(&y[2*offset]);
    //to_float(row3).store_a(&y[3*offset]);
}
```
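The lane bookkeeping in this version can be checked with a scalar model. For unsigned vectors, `extend_low`/`extend_high` on a `Vec16uc` zero-extend the lower/upper eight bytes to a `Vec8us`, and the same functions on a `Vec8us` zero-extend the lower/upper four elements to a `Vec4ui`. The sketch below mimics that with `std::array`; `extend_low_model` and `extend_high_model` are hypothetical helpers, not vectorclass functions. Since the permuted vector holds R in bytes 0-3, G in 4-7, B in 8-11 and A in 12-15, two extensions at the byte level followed by one per channel row give exactly the RRRR, GGGG and BBBB rows.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of extend_low for unsigned vectors:
// take the lower half and zero-extend each element to the type Wide.
template <typename Wide, typename Narrow, std::size_t N>
std::array<Wide, N / 2> extend_low_model(const std::array<Narrow, N>& v) {
    std::array<Wide, N / 2> out{};
    for (std::size_t i = 0; i < N / 2; ++i) out[i] = static_cast<Wide>(v[i]);
    return out;
}

// Hypothetical model of extend_high for unsigned vectors:
// take the upper half and zero-extend each element to the type Wide.
template <typename Wide, typename Narrow, std::size_t N>
std::array<Wide, N / 2> extend_high_model(const std::array<Narrow, N>& v) {
    std::array<Wide, N / 2> out{};
    for (std::size_t i = 0; i < N / 2; ++i) out[i] = static_cast<Wide>(v[N / 2 + i]);
    return out;
}
```

For example, `extend_low_model<std::uint32_t>(extend_high_model<std::uint16_t>(c16))` picks out bytes 8-11 of `c16`, i.e. the BBBB row.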
