bitwise bitwise - 5 months ago 86
iOS Question

RGBA to ABGR: Inline arm neon asm for iOS/XCode

This code(very similar code, haven't tried exactly this code) compiles using Android NDK, but not with XCode/armv7+arm64/iOS

Errors in comments:

uint32_t *src;
uint32_t *dst;

#ifdef __ARM_NEON
__asm__ volatile(
"vld1.32 {d0, d1}, [%[src]] \n" // error: Vector register expected
"vrev32.8 q0, q0 \n" // error: Unrecognized instruction mnemonic
"vst1.32 {d0, d1}, [%[dst]] \n" // error: Vector register expected
:
: [src]"r"(src), [dst]"r"(dst)
: "d0", "d1"
);
#endif


What's wrong with this code?




EDIT1:

I rewrote the code using intrinsics:

uint8x16_t x = vreinterpretq_u8_u32(vld1q_u32(src));
uint8x16_t y = vrev32q_u8(x);
vst1q_u32(dst, vreinterpretq_u32_u8(y));


After disassembling, I get the following, which is a variation I have already tried:

vld1.32 {d16, d17}, [r0]!
vrev32.8 q8, q8
vst1.32 {d16, d17}, [r1]!


So my code looks like this now, but gives the exact same errors:

__asm__ volatile("vld1.32 {d0, d1}, [%0]! \n"
"vrev32.8 q0, q0 \n"
"vst1.32 {d0, d1}, [%1]! \n"
:
: "r"(src), "r"(dst)
: "d0", "d1"
);


EDIT2:

Reading through the disassembly, I actually found a second version of the function. It turns out that arm64 uses a slightly different instruction set. For example, the arm64 assembly uses
rev32.16b v0, v0
instead. The whole function listing(which I can't make heads or tails of) is below:

_My_Function:
cmp w2, #0
add w9, w2, #3
csel w8, w9, w2, lt
cmp w9, #7
b.lo 0x3f4
asr w9, w8, #2
ldr x8, [x0]
mov w9, w9
lsl x9, x9, #2
ldr q0, [x8], #16
rev32.16b v0, v0
str q0, [x1], #16
sub x9, x9, #16
cbnz x9, 0x3e0
ret

Answer

As stated in the edits to the original question, it turned out that I needed a different assembly implementation for arm64 and armv7.

#ifdef __ARM_NEON
  #if __LP64__
asm volatile("ldr q0, [%0], #16  \n"
             "rev32.16b v0, v0   \n"
             "str q0, [%1], #16  \n"
             : "=r"(src), "=r"(dst)
             : "r"(src), "r"(dst)
             : "d0", "d1"
             );
  #else
asm volatile("vld1.32 {d0, d1}, [%0]! \n"
             "vrev32.8 q0, q0         \n"
             "vst1.32 {d0, d1}, [%1]! \n"
             : "=r"(src), "=r"(dst)
             : "r"(src), "r"(dst)
             : "d0", "d1"
             );
  #endif
#else

The intrinsics code that I posted in the original post generated surprisingly good assembly though, and also generated the arm64 version for me, so it may be a better idea to use intrinsics instead in the future.