Vector Class Discussion

simple ray casting test slower when vectorized
Author:  Date: 2013-03-03 05:57
OK, I simplified my example further (it seemed that in the non-vectorized code the compiler was able to condense multiple identical lines within the loop).

Now we get the following result, which looks as expected: the vectorclass code uses only a single packed "vaddpd", whereas the non-vectorized loop does three consecutive "vaddsd"s:

non-vectorized: double pos[3],ray_dir[3];
------------------------------------------
for(t=t0;t<t1;t+=dt)
{
    for(k=0;k<3;k++) pos[k] += dt*ray_dir[k];
}

.L2:
vaddsd %xmm4, %xmm0, %xmm0
vaddsd %xmm8, %xmm3, %xmm3
vaddsd %xmm7, %xmm2, %xmm2
vucomisd %xmm0, %xmm5
vaddsd %xmm6, %xmm1, %xmm1
ja .L2

result: pos=330.082,9961.07,817.343 ----> CPU = 686.4 ms

Vectorized: Vec3d v_pos,v_ray_dir;
-----------------------------------

for(t=t0;t<t1;t+=dt)
{
    v_pos += dt*v_ray_dir;
}
.L2:
vaddsd %xmm4, %xmm0, %xmm0
vaddpd %ymm2, %ymm1, %ymm1
vucomisd %xmm0, %xmm3
ja .L2

result: v_pos=330.082,9961.07,817.343 ----> CPU = 468.0 ms


This looks really good: it gives a speedup of about a factor of 1.5 (686.4 ms / 468.0 ms ≈ 1.47), which corresponds nicely to the vectorized loop containing only 4 instructions vs. the 6 instructions of the scalar variant.

So, with this proving the concept of a convenient 'zero-overhead' SIMD vector class, I am going to convert parts of the 'real-world' code of the main ray-marcher application. This will be interesting, because it will probably be a tight race between RAM bandwidth and compute bandwidth (SIMD/AVX): the ray marcher has to plough through dozens of gigabytes of voxel data (tricubically interpolated, i.e. requiring 64 memory fetches per voxel ;-)

BTW: Applications like the dozens-of-GB ray marcher are the main reason why I prefer a many-core CPU + SIMD/AVX + 100+ GB memory architecture over GPU-based solutions, which may be even faster compute-wise but still only provide 4 GB of RAM ...

simple ray casting test slower when vectorized - epsilon - 2013-03-02
simple ray casting test slower when vectorized - Agner - 2013-03-03
simple ray casting test slower when vectorized - epsilon - 2013-03-03
simple ray casting test slower when vectorized - Agner - 2013-03-03
simple ray casting test slower when vectorized - chad - 2013-03-14