shd, 4 years ago

Writing inline assembly in C++

I have the following code in C++:

inline void armMultiply(const float* __restrict__ src1,
                        const float* __restrict__ src2,
                        float* __restrict__ dst)
{
    __asm volatile(
                 "vld1.f32 {q0}, [%[src1]:128]!      \n\t"
                 :
                 :[dst] "r" (dst), [src1] "r" (src1), [src2] "r" (src2)
                 );
}


Why do I get the error "vector register expected"?

Answer

You're getting this error because your inline assembly is written for 32-bit ARM (AArch32), but you're compiling for 64-bit ARM (AArch64). The message comes from clang; GCC would have reported a different error.

Assembly syntax (inline or not) differs between 32-bit and 64-bit ARM, so you need to guard it, e.g. with #if defined(__ARM_NEON__) && !defined(__aarch64__), or, if you want separate assembly for 64-bit and 32-bit: #ifdef __aarch64__ ... #elif defined(__ARM_NEON__), and so on.
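To check which branch of such a guard a particular compiler invocation actually selects, a small sketch using the same macro checks can report the target at compile time. simdTargetName is a hypothetical helper, not part of any library; __aarch64__ and __ARM_NEON__ are the macros the answer already relies on, tested in the same order.

```cpp
// Hypothetical helper: reports which branch of the guard the compiler selected.
// Order matters: __aarch64__ is checked first, then 32-bit NEON.
constexpr const char* simdTargetName()
{
#if defined(__aarch64__)
    return "aarch64";     // 64-bit ARM: ld1/st1 with v registers
#elif defined(__ARM_NEON__)
    return "arm32-neon";  // 32-bit ARM with NEON: vld1/vst1 with q/d registers
#else
    return "no-neon";     // anything else: NEON inline assembly won't assemble here
#endif
}
```

Building the same file with an AArch64 and an ARMv7 toolchain and printing simdTargetName() shows which assembly block each build would compile.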

As others commented, unless you really need to hand-tune the generated assembly, intrinsics can be just as good (and in some cases better than what you would write yourself). You can, for example, do the two loads, the multiply and the store with vld1q_f32, vmulq_f32 and vst1q_f32 (the 128-bit "q" variants, matching the four floats held in q0), and the same intrinsics compile for both 32-bit and 64-bit ARM.
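A minimal intrinsics sketch, assuming armMultiply is meant to compute the element-wise product of four floats (dst[i] = src1[i] * src2[i]); the question only shows the first load, so this is an interpretation based on the function name. armMultiplyIntrinsics is a hypothetical name, and the scalar fallback exists only so the sketch also builds on non-ARM hosts.

```cpp
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

// Assumed semantics: dst[i] = src1[i] * src2[i] for i = 0..3.
inline void armMultiplyIntrinsics(const float* __restrict__ src1,
                                  const float* __restrict__ src2,
                                  float* __restrict__ dst)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    float32x4_t a = vld1q_f32(src1);  // load four floats from src1
    float32x4_t b = vld1q_f32(src2);  // load four floats from src2
    vst1q_f32(dst, vmulq_f32(a, b));  // multiply lane-wise and store to dst
#else
    // Portable fallback for hosts without NEON.
    for (int i = 0; i < 4; ++i)
        dst[i] = src1[i] * src2[i];
#endif
}
```

Because the compiler allocates the registers itself, no clobber lists or per-architecture #ifdef branches are needed on the ARM path.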

EDIT:

The corresponding inline assembly line for loading into a SIMD register on 64-bit ARM (AArch64) would be:

"ld1 {v0.4s}, [%[src1]], #16      \n\t"

To support both, your function could look like this instead:

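inline void armMultiply(const float* __restrict__ src1,
                        const float* __restrict__ src2,
                        float* __restrict__ dst)
{
#ifdef __aarch64__
    // Load four floats into v0; the post-index (#16) writes the advanced
    // address back to src1, so src1 must be an in/out ("+r") operand.
    __asm volatile(
                 "ld1 {v0.4s}, [%[src1]], #16      \n\t"
                 :[src1] "+r" (src1)
                 :[dst] "r" (dst), [src2] "r" (src2)
                 :"v0", "memory"   // v0 is overwritten; memory is read via src1
                 );
#elif defined(__ARM_NEON__)
    // Same load on 32-bit ARM; "!" writes the advanced address back to src1.
    __asm volatile(
                 "vld1.f32 {q0}, [%[src1]:128]!      \n\t"
                 :[src1] "+r" (src1)
                 :[dst] "r" (dst), [src2] "r" (src2)
                 :"d0", "d1", "memory"   // q0 is the pair d0:d1
                 );
#else
#error this requires neon
#endif
}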
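If you do stay with inline assembly, the same guard structure extends to the full multiply. This sketch assumes the same element-wise semantics as above (dst[i] = src1[i] * src2[i]); armMultiplyAsm is a hypothetical name. It drops the post-increment, so all pointers can stay input-only operands, and it omits the :128 alignment hint, so src1 need not be 16-byte aligned. The scalar branch stands in for #error only so the sketch builds on non-ARM hosts.

```cpp
// Assumed semantics: dst[i] = src1[i] * src2[i] for i = 0..3.
inline void armMultiplyAsm(const float* __restrict__ src1,
                           const float* __restrict__ src2,
                           float* __restrict__ dst)
{
#if defined(__aarch64__)
    __asm volatile(
        "ld1  {v0.4s}, [%[src1]]     \n\t"  // v0 = src1[0..3]
        "ld1  {v1.4s}, [%[src2]]     \n\t"  // v1 = src2[0..3]
        "fmul v0.4s, v0.4s, v1.4s    \n\t"  // v0 = v0 * v1, lane-wise
        "st1  {v0.4s}, [%[dst]]      \n\t"  // dst[0..3] = v0
        :
        : [dst] "r" (dst), [src1] "r" (src1), [src2] "r" (src2)
        : "v0", "v1", "memory");            // clobbered registers; memory is read and written
#elif defined(__ARM_NEON__)
    __asm volatile(
        "vld1.f32 {q0}, [%[src1]]    \n\t"  // q0 = src1[0..3]
        "vld1.f32 {q1}, [%[src2]]    \n\t"  // q1 = src2[0..3]
        "vmul.f32 q0, q0, q1         \n\t"  // q0 = q0 * q1, lane-wise
        "vst1.f32 {q0}, [%[dst]]     \n\t"  // dst[0..3] = q0
        :
        : [dst] "r" (dst), [src1] "r" (src1), [src2] "r" (src2)
        : "d0", "d1", "d2", "d3", "memory"); // q0 = d0:d1, q1 = d2:d3
#else
    // Portable fallback for hosts without NEON.
    for (int i = 0; i < 4; ++i)
        dst[i] = src1[i] * src2[i];
#endif
}
```

The "memory" clobber is needed because the assembly reads and writes through the pointers without declaring them as memory operands.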