jww - 1 year ago

vgetq_lane_u64(x, 0) versus vget_low_u64(x)

ARM intrinsics include functions to extract scalars of different sizes. The functions are documented most completely in the ARM® C Language Extensions:

```
ET vgetQ_lane_ST(T vec, const int lane);
```

gets the value from the specified lane of an input vector. There are
24 intrinsics.
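To make the lane form concrete, here is a minimal sketch for the `u64` case. It assumes a target with Advanced SIMD, detected through the ACLE feature macro `__ARM_NEON`. The helper names `first_lane` and `second_lane` are hypothetical, and the `#else` branch is a plain-C stand-in added only so the sketch also compiles on non-ARM machines.

```c
#include <stdint.h>

#ifdef __ARM_NEON
#include <arm_neon.h>

/* Load two 64-bit values into a 128-bit (Q) vector and read one lane back
 * as a plain uint64_t. The lane argument must be a compile-time constant,
 * so each lane gets its own helper instead of taking a runtime index. */
uint64_t first_lane(const uint64_t in[2])
{
    uint64x2_t v = vld1q_u64(in);   /* v = { in[0], in[1] } */
    return vgetq_lane_u64(v, 0);    /* result is a scalar, not a vector */
}

uint64_t second_lane(const uint64_t in[2])
{
    uint64x2_t v = vld1q_u64(in);
    return vgetq_lane_u64(v, 1);
}
#else
/* Plain-C stand-ins (an assumption) so the sketch also builds without NEON. */
uint64_t first_lane(const uint64_t in[2])  { return in[0]; }
uint64_t second_lane(const uint64_t in[2]) { return in[1]; }
#endif
```

Because `lane` has to be a constant expression, compilers typically reject a runtime index such as `vgetq_lane_u64(v, i)`.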

And:

```
T vget_high_ST(T2 a);
T vget_low_ST(T2 a);
```

gets the high, or low, half of a 128-bit vector. There are 24
intrinsics.
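The half-extraction intrinsics return vectors rather than scalars, which is easiest to see by feeding their results straight into another vector intrinsic. The sketch below assumes the ACLE `__ARM_NEON` feature macro and uses a hypothetical `swap_pair` wrapper (with a plain-C fallback so it compiles elsewhere). It splits a `uint64x2_t` with `vget_low_u64`/`vget_high_u64` and rebuilds it in the opposite order with `vcombine_u64`.

```c
#include <stdint.h>

#ifdef __ARM_NEON
#include <arm_neon.h>

/* Swap the two 64-bit halves of the vector loaded from in[] and store the
 * result to out[]. lo and hi are uint64x1_t (D-register) vectors, so they
 * go straight into vcombine_u64 without any round trip through scalars. */
void swap_pair(const uint64_t in[2], uint64_t out[2])
{
    uint64x2_t v  = vld1q_u64(in);
    uint64x1_t lo = vget_low_u64(v);       /* holds in[0] */
    uint64x1_t hi = vget_high_u64(v);      /* holds in[1] */
    vst1q_u64(out, vcombine_u64(hi, lo));  /* first argument becomes the low half */
}
#else
/* Plain-C stand-in (an assumption) so the sketch also builds without NEON. */
void swap_pair(const uint64_t in[2], uint64_t out[2])
{
    uint64_t a = in[0], b = in[1];  /* copy first so in == out is safe */
    out[0] = b;
    out[1] = a;
}
#endif
```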

I know an equivalence exists in some circumstances. For example, on a little-endian machine, the following holds true for 64-bit values:

```
uint64x2_t x = ...;
vgetq_lane_u64(x, 0) == vget_low_u64(x);
```

A similar equivalence exists for the high lane:

```
uint64x2_t x = ...;
vgetq_lane_u64(x, 1) == vget_high_u64(x);
```
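Taken literally, the two comparisons above mix a `uint64_t` with a `uint64x1_t`, so they are not scalar-to-scalar comparisons, and whether they compile at all depends on the compiler's vector extensions. A sketch of a well-typed version pulls the single lane out of the D vector with `vget_lane_u64` before comparing. It assumes the ACLE `__ARM_NEON` feature macro and a little-endian target as in the question; the four helper names are hypothetical, and the `#else` branch is a plain-C stand-in so it also compiles elsewhere.

```c
#include <stdint.h>

#ifdef __ARM_NEON
#include <arm_neon.h>

/* Route 1: vgetq_lane_u64 already yields a uint64_t scalar. */
uint64_t low_via_lane(const uint64_t in[2])  { return vgetq_lane_u64(vld1q_u64(in), 0); }
uint64_t high_via_lane(const uint64_t in[2]) { return vgetq_lane_u64(vld1q_u64(in), 1); }

/* Route 2: vget_low_u64/vget_high_u64 yield uint64x1_t vectors, so the
 * single lane is extracted with vget_lane_u64 before it is a uint64_t. */
uint64_t low_via_half(const uint64_t in[2])  { return vget_lane_u64(vget_low_u64(vld1q_u64(in)), 0); }
uint64_t high_via_half(const uint64_t in[2]) { return vget_lane_u64(vget_high_u64(vld1q_u64(in)), 0); }
#else
/* Plain-C stand-ins (an assumption) so the sketch also builds without NEON. */
uint64_t low_via_lane(const uint64_t in[2])  { return in[0]; }
uint64_t high_via_lane(const uint64_t in[2]) { return in[1]; }
uint64_t low_via_half(const uint64_t in[2])  { return in[0]; }
uint64_t high_via_half(const uint64_t in[2]) { return in[1]; }
#endif
```

With both sides reduced to `uint64_t`, `low_via_lane(a) == low_via_half(a)` is an ordinary scalar comparison.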

My question is: what are the practical differences, given that both functions return a scalar? Should one be preferred over the other?

I'd consider the overlap an implementation detail. "...since both functions return a scalar" isn't even true, for starters: `vgetq_lane_u64()` returns a `uint64_t`, which is a scalar; `vget_low_u64()` returns a `uint64x1_t`, which is a unit-length vector. Consider that this guy also exists:
```
uint64_t vget_lane_u64(uint64x1_t v, const int lane);
```
Semantically, use `vget_{high,low}` wherever you have a Q-register output from a vector operation and need to split it to pass the data into further vector operations on D registers. Use `vget{,q}_lane` when you are actually extracting a single value to pass off to scalar code. I'm pretty sure that implicit conversion between unit-length vector types and scalar types isn't actually guaranteed anywhere, so I certainly wouldn't rely on it.
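The split between the two idioms can be sketched with a horizontal add of a `uint64x2_t`. One hypothetical helper stays in the vector domain (`vget_low_u64`/`vget_high_u64` into `vadd_u64`, then a single `vget_lane_u64` at the end); the other extracts both lanes with `vgetq_lane_u64` and adds them in scalar code. It assumes the ACLE `__ARM_NEON` feature macro, and the `#else` branch is a plain-C stand-in so the sketch also compiles elsewhere.

```c
#include <stdint.h>

#ifdef __ARM_NEON
#include <arm_neon.h>

/* Vector idiom: split the Q register into D halves, keep working with
 * vector intrinsics, and leave the vector domain only for the final value. */
uint64_t hsum_vector_path(const uint64_t in[2])
{
    uint64x2_t x   = vld1q_u64(in);
    uint64x1_t sum = vadd_u64(vget_low_u64(x), vget_high_u64(x));
    return vget_lane_u64(sum, 0);   /* D vector -> scalar */
}

/* Scalar idiom: extract each lane as a uint64_t and add in ordinary C. */
uint64_t hsum_scalar_path(const uint64_t in[2])
{
    uint64x2_t x = vld1q_u64(in);
    return vgetq_lane_u64(x, 0) + vgetq_lane_u64(x, 1);
}
#else
/* Plain-C stand-ins (an assumption) so the sketch also builds without NEON. */
uint64_t hsum_vector_path(const uint64_t in[2]) { return in[0] + in[1]; }
uint64_t hsum_scalar_path(const uint64_t in[2]) { return in[0] + in[1]; }
#endif
```

Both paths compute the same value, including wraparound modulo 2^64, since `vadd_u64` and unsigned C addition both wrap. The difference is where the data lives: the first keeps intermediate results in vector registers, which matters once more vector work follows the split.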