jww - 1 year ago 90

C Question

ARM intrinsics include functions to extract scalars of different sizes. The functions are documented most completely in the ARM® C Language Extensions (ACLE):

`ET vget[q]_lane_ST(T vec, const int lane);`

gets the value from the specified lane of an input vector. There are 24 intrinsics.
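A minimal sketch of the two lane-extraction forms, assuming an ARM target with NEON and `<arm_neon.h>` (the `#if` uses the usual feature-test macros `__ARM_NEON` / `__ARM_NEON__`, and the helper names `last_lane_d` and `last_lane_q` are hypothetical). The form without `q` reads a lane of a 64-bit (D) vector, the `q` form a lane of a 128-bit (Q) vector, and both hand back an ordinary scalar of the element type:

```c
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#include <stdint.h>

/* Hypothetical helpers: return the last lane of a D vector and of a Q vector.
 * Both results are plain uint16_t scalars. The lane argument must be a
 * compile-time constant in range, so it cannot come from a loop variable. */
static uint16_t last_lane_d(uint16x4_t v) { return vget_lane_u16(v, 3); }
static uint16_t last_lane_q(uint16x8_t v) { return vgetq_lane_u16(v, 7); }
#endif /* NEON */
```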

And:

```
T vget_high_ST(T2 a);
T vget_low_ST(T2 a);
```

gets the high, or low, half of a 128-bit vector. There are 24 intrinsics.
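To show what the "half" intrinsics return, here is a sketch (assuming a NEON target; `swap_halves_u32` is a hypothetical helper) that splits a `uint32x4_t` into two `uint32x2_t` halves and puts them back together in the opposite order with `vcombine_u32`. Note that the halves are 64-bit vectors, not scalars:

```c
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#include <stdint.h>

/* Hypothetical helper: swap the two 64-bit halves of a 128-bit vector.
 * vget_low_u32 / vget_high_u32 each yield a uint32x2_t (a D-register vector);
 * vcombine_u32(low, high) rebuilds a uint32x4_t from two such halves. */
static uint32x4_t swap_halves_u32(uint32x4_t v)
{
    uint32x2_t lo = vget_low_u32(v);   /* lanes 0 and 1 */
    uint32x2_t hi = vget_high_u32(v);  /* lanes 2 and 3 */
    return vcombine_u32(hi, lo);       /* old high half becomes the new low half */
}
#endif /* NEON */
```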

I know an equivalence exists in some circumstances. For example, on a little-endian machine, the following holds true for 64-bit values:

```
uint64x2_t x = ...;
vgetq_lane_u64(x, 0) == vget_low_u64(x);
```

A similar equivalence exists for the high lane:

```
uint64x2_t x = ...;
vgetq_lane_u64(x, 1) == vget_high_u64(x);
```

My question is, what are the practical differences since both functions return a scalar? Should one be preferred over the other?

Answer

I'd consider the overlap an implementation detail. "...since both functions return a scalar" isn't even true, for starters: `vgetq_lane_u64()` returns a `uint64_t`, which is a scalar; `vget_low_u64()` returns a `uint64x1_t`, which is a unit-length vector. Consider that this guy also exists:

```
uint64_t vget_lane_u64(uint64x1_t v, const int lane)
```
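A sketch of the type difference, assuming a NEON target (`low_lane_direct` and `low_lane_via_half` are hypothetical names). Getting the low lane as a `uint64_t` takes one intrinsic from the Q vector, or two if you go through the half, because `vget_low_u64` alone stops at `uint64x1_t`:

```c
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#include <stdint.h>

/* Direct route: Q vector -> scalar in one step. */
static uint64_t low_lane_direct(uint64x2_t x)
{
    return vgetq_lane_u64(x, 0);
}

/* Two-step route: Q vector -> unit-length D vector -> scalar.
 * The intermediate uint64x1_t is still a vector type; vget_lane_u64
 * is the call that actually produces a uint64_t. */
static uint64_t low_lane_via_half(uint64x2_t x)
{
    uint64x1_t lo = vget_low_u64(x);
    return vget_lane_u64(lo, 0);
}
#endif /* NEON */
```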

Semantically, use `vget_{high,low}` wherever you have a Q register output from a vector operation and need to split it to pass the data into further vector operations on D registers. Use `vget{,q}_lane` when you are actually extracting a single value to pass off to scalar code. I'm pretty sure that implicit conversion between unit-length vector types and scalar types isn't actually guaranteed anywhere, so I certainly wouldn't rely on it.
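Both rules in one place, as a sketch assuming a NEON target (`hsum_u32` is a hypothetical helper that sums four lanes, wrapping on overflow). The halves from `vget_low_u32`/`vget_high_u32` stay in vector types while they feed further D-register operations (`vadd_u32`, then the pairwise add `vpadd_u32`), and only the final value is extracted with `vget_lane_u32` for scalar code:

```c
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#include <stdint.h>

/* Hypothetical helper: horizontal sum of the four 32-bit lanes of a Q vector. */
static uint32_t hsum_u32(uint32x4_t v)
{
    /* Split the Q vector and keep working on D vectors: {v0+v2, v1+v3}. */
    uint32x2_t s = vadd_u32(vget_low_u32(v), vget_high_u32(v));
    /* Pairwise add of s with itself: {s0+s1, s0+s1}. */
    s = vpadd_u32(s, s);
    /* Only now leave vector land and hand a scalar to ordinary code. */
    return vget_lane_u32(s, 0);
}
#endif /* NEON */
```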