ddriver ddriver - 9 months ago 50
C Question

Is it possible to cast floats directly to __m128 if they are 16-byte aligned?

Is it safe/possible/advisable to cast floats directly to __m128 if they are 16-byte aligned?

I noticed that using _mm_load_ps and _mm_store_ps to "wrap" a raw array adds a significant overhead.

What are potential pitfalls I should be aware of?


There is actually no overhead in using the load and store instructions; I got some numbers mixed up, and that is why I seemed to get better performance. Even though I was able to do some HORRENDOUS mangling with raw memory addresses in a __m128 instance, when I ran the test it took TWICE AS LONG to complete without the _mm_load_ps instruction, probably falling back to some fail-safe code path.


What makes you think that _mm_load_ps and _mm_store_ps "add a significant overhead"? This is the normal way to load/store float data to/from SSE registers, assuming the source/destination is memory (and any other method eventually boils down to this anyway).
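
For comparison, here is a minimal sketch of the two approaches side by side. The function names add_with_load_store and add_with_cast are hypothetical, made up for this illustration. It assumes an x86 target with SSE, that all pointers are 16-byte aligned, and that n is a multiple of 4. The cast version relies on the compiler treating __m128 as a type that may alias float (GCC and Clang declare it that way in their headers); that is an assumption about the toolchain, not something the C standard promises. With optimization enabled, both loops would typically compile down to the same aligned load/store instructions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>  /* SSE: __m128, _mm_load_ps, _mm_store_ps, _mm_add_ps */

/* Hypothetical helper: returns nonzero if p is 16-byte aligned. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Explicit aligned loads and stores. Requires 16-byte-aligned pointers
   and n to be a multiple of 4. */
static void add_with_load_store(const float *a, const float *b,
                                float *out, size_t n)
{
    assert(is_aligned16(a) && is_aligned16(b) && is_aligned16(out));
    for (size_t i = 0; i < n; i += 4) {
        __m128 va = _mm_load_ps(a + i);   /* aligned load of 4 floats */
        __m128 vb = _mm_load_ps(b + i);
        _mm_store_ps(out + i, _mm_add_ps(va, vb));  /* aligned store */
    }
}

/* Same computation through pointer casts. The compiler still has to emit
   loads and stores to move data between memory and SSE registers.
   Assumes the compiler allows __m128 to alias float (true for GCC/Clang). */
static void add_with_cast(const float *a, const float *b,
                          float *out, size_t n)
{
    assert(is_aligned16(a) && is_aligned16(b) && is_aligned16(out));
    const __m128 *va = (const __m128 *)a;
    const __m128 *vb = (const __m128 *)b;
    __m128 *vo = (__m128 *)out;
    for (size_t i = 0; i < n / 4; ++i)
        vo[i] = _mm_add_ps(va[i], vb[i]);
}
```

If the pointer is not actually 16-byte aligned, both versions are invalid: the aligned load can fault, and the cast dereferences a misaligned __m128. In that case _mm_loadu_ps and _mm_storeu_ps are the intrinsics to use instead.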