Vector Class Discussion

 
thread Use intrinsics in SSE4.1/AVX2 for int conversion - HolyWu - 2016-07-18
last reply Use intrinsics in SSE4.1/AVX2 for int conversion - Agner - 2016-07-18
 
Use intrinsics in SSE4.1/AVX2 for int conversion
Author: HolyWu Date: 2016-07-18 12:13
We often need to convert data stored as uint8/uint16 (e.g. image or video) to int32 before doing calculations in 32-bit floating point. SSE4.1 provides _mm_cvtepu*_epi32 and AVX2 provides _mm256_cvtepu*_epi32. When SSE4.1 or AVX2 is available, these should be faster than the usual extend_low()+extend_high() method of getting 32-bit integers from 8-bit integers. I wonder if you can add another type of conversion functions for direct (u)int8-to-int32 or (u)int16-to-int32, utilizing the intrinsics in SSE4.1 and AVX2.

Best regards.

   
Use intrinsics in SSE4.1/AVX2 for int conversion
Author: Agner Date: 2016-07-18 13:57
HolyWu wrote:
I wonder if you can add another type of conversion functions for direct (u)int8-to-int32 or (u)int16-to-int32
I will consider this in the next update. Until then, you can just use the intrinsics.
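
For reference, a minimal sketch of what using the intrinsics directly could look like with SSE4.1. The function names u8x4_to_float and u16x4_to_float and the TARGET_SSE41 macro are hypothetical and not part of the vector class library. The macro assumes GCC or Clang, where it enables SSE4.1 for just these functions; with other compilers it expands to nothing and the usual compiler options apply.

```cpp
#include <smmintrin.h>  // SSE4.1: _mm_cvtepu8_epi32, _mm_cvtepu16_epi32
#include <cstdint>
#include <cstring>

// Assumption: GCC/Clang per-function target attribute. Hypothetical helper macro.
#if defined(__GNUC__) || defined(__clang__)
#define TARGET_SSE41 __attribute__((target("sse4.1")))
#else
#define TARGET_SSE41
#endif

// Hypothetical helper: convert 4 uint8 values at src to 4 floats at dst.
TARGET_SSE41
void u8x4_to_float(const std::uint8_t* src, float* dst) {
    std::int32_t packed;
    std::memcpy(&packed, src, sizeof packed);   // read exactly 4 bytes
    __m128i bytes = _mm_cvtsi32_si128(packed);  // 4 bytes in the low 32 bits
    __m128i ints  = _mm_cvtepu8_epi32(bytes);   // zero-extend to 4 x int32
    _mm_storeu_ps(dst, _mm_cvtepi32_ps(ints));  // int32 -> float, unaligned store
}

// Hypothetical helper: convert 4 uint16 values at src to 4 floats at dst.
TARGET_SSE41
void u16x4_to_float(const std::uint16_t* src, float* dst) {
    // Read exactly 8 bytes into the low half of the register.
    __m128i words = _mm_loadl_epi64(reinterpret_cast<const __m128i*>(src));
    __m128i ints  = _mm_cvtepu16_epi32(words);  // zero-extend to 4 x int32
    _mm_storeu_ps(dst, _mm_cvtepi32_ps(ints));  // int32 -> float, unaligned store
}
```

Each call replaces the two-step widening (8 to 16 bit, then 16 to 32 bit) with a single instruction for the low four elements. For signed input the corresponding intrinsics are _mm_cvtepi8_epi32 and _mm_cvtepi16_epi32. The AVX2 variants named above work the same way on eight elements, taking a __m128i and returning a __m256i.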