jww - 6 months ago
C Question

vgetq_lane_u64(x, 0) versus vget_low_u64(x)

ARM intrinsics include functions to extract scalars of different sizes. The functions are documented most completely in the ARM® C Language Extensions:

ET vgetQ_lane_ST(T vec, const int lane);

gets the value from the specified lane of an input vector. There are
24 intrinsics.


T vget_high_ST(T2 a);
T vget_low_ST(T2 a);

gets the high, or low, half of a 128-bit vector. There are 24 intrinsics.

I know an equivalence exists in some circumstances. For example, on a little-endian machine, the following holds true for 64-bit values:

uint64x2_t x = ...;
vgetq_lane_u64(x, 0) == vget_low_u64(x);

A similar equivalence exists for the high lane:

uint64x2_t x = ...;
vgetq_lane_u64(x, 1) == vget_high_u64(x);

My question is, what are the practical differences since both functions return a scalar? Should one be preferred over the other?


I'd consider the overlap an implementation detail. "...since both functions return a scalar" isn't even true, for starters: vgetq_lane_u64() returns a uint64_t, which is a scalar; vget_low_u64() returns a uint64x1_t, which is a unit-length vector. Consider that this guy also exists:

uint64_t vget_lane_u64(uint64x1_t v, const int lane)

Semantically, use vget_{high,low} wherever you have a Q register output from a vector operation and need to split it to pass the data into further vector operations on D registers. Use vget{,q}_lane when you are actually extracting a single value to pass off to scalar code. I'm pretty sure that implicit conversion between unit-length vector types and scalar types isn't actually guaranteed anywhere, so I certainly wouldn't rely on it.
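
As a rough sketch of that split, something like the following shows the two styles side by side. The helper names sum_via_halves and sum_via_lanes are made up for illustration, and the #else branches are only a plain-C fallback so the snippet still builds on a non-NEON host; the intrinsics themselves come from arm_neon.h.

```c
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#endif

/* Hypothetical helper: split the Q register and keep working on D registers. */
uint64_t sum_via_halves(uint64_t lo, uint64_t hi)
{
#ifdef HAVE_NEON
    uint64x2_t x = vcombine_u64(vcreate_u64(lo), vcreate_u64(hi));
    /* vget_low_u64/vget_high_u64 return uint64x1_t, so they feed vadd_u64 directly. */
    uint64x1_t d = vadd_u64(vget_low_u64(x), vget_high_u64(x));
    /* Explicit lane read when the value finally has to cross into scalar code. */
    return vget_lane_u64(d, 0);
#else
    return lo + hi; /* fallback for non-NEON builds (assumption, not NEON code) */
#endif
}

/* Hypothetical helper: pull each lane out as a scalar straight away. */
uint64_t sum_via_lanes(uint64_t lo, uint64_t hi)
{
#ifdef HAVE_NEON
    uint64x2_t x = vcombine_u64(vcreate_u64(lo), vcreate_u64(hi));
    /* vgetq_lane_u64 returns a plain uint64_t, so ordinary + applies. */
    return vgetq_lane_u64(x, 0) + vgetq_lane_u64(x, 1);
#else
    return lo + hi; /* fallback for non-NEON builds (assumption, not NEON code) */
#endif
}
```

Both give the same number, but the first one only leaves the vector unit at the very end through an explicit vget_lane_u64, rather than leaning on any uint64x1_t-to-uint64_t conversion.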