Etan Etan - 9 days ago 4
Swift Question

Assembler on 64-bit iOS (A64)

I'm trying to replace certain methods with asm-implementations. Target is arm64 on iOS (iPhone 5S or newer). I want to use a dedicated assembler-file, as the inline assembler comes with additional overhead, and is quite cumbersome to use with A64 memory offsets.

There isn't much documentation on this on the Internet, so I'm unsure whether my approach is the right one. Therefore, I'll describe the process I followed to move a function to ASM.




The candidate function for this question is a 256-bit integer comparison function.

UInt256.h


@import Foundation;

typedef struct {
uint64_t value[4];
} UInt256;

bool eq256(const UInt256 *lhs, const UInt256 *rhs);


Bridging-Header.h


#import "UInt256.h"


Reference implementation (Swift)


let result = x.value.0 == y.value.0
&& x.value.1 == y.value.1
&& x.value.2 == y.value.2
&& x.value.3 == y.value.3


UInt256.s


.globl _eq256
.align 2
_eq256:
ldp x9, x10, [x0]
ldp x11, x12, [x1]
cmp x9, x11
ccmp x10, x12, 0, eq
ldp x9, x10, [x0, 16]
ldp x11, x12, [x1, 16]
ccmp x9, x11, 0, eq
ccmp x10, x12, 0, eq
cset x0, eq
ret





Resources I found






Questions

I've tested the code using XCTest, creating two random numbers, running both the Swift and the Asm implementations on them and verifying that both report the same result. The code seems to be correct.
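
A minimal sketch of such a test in C, assuming the same `UInt256` layout as above. The names `eq256_ref`, `eq256_fn` and `check_eq256` are made up for illustration and are not part of the project. Besides random inputs, it also checks identical inputs and inputs that differ in a single bit, because two independent random 256-bit values are practically never equal and so never exercise the "equal" path.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    uint64_t value[4];
} UInt256;

/* Hypothetical function-pointer type for the implementation under test. */
typedef bool (*eq256_fn)(const UInt256 *, const UInt256 *);

/* C port of the Swift reference implementation. */
bool eq256_ref(const UInt256 *lhs, const UInt256 *rhs) {
    return lhs->value[0] == rhs->value[0]
        && lhs->value[1] == rhs->value[1]
        && lhs->value[2] == rhs->value[2]
        && lhs->value[3] == rhs->value[3];
}

/* Returns the number of checks on which impl gives the wrong answer. */
int check_eq256(eq256_fn impl, int rounds) {
    int failures = 0;
    for (int i = 0; i < rounds; i++) {
        UInt256 a, b;
        for (int w = 0; w < 4; w++) {
            /* rand() yields few random bits; good enough for a sketch. */
            a.value[w] = ((uint64_t)rand() << 32) ^ (uint64_t)rand();
            b.value[w] = ((uint64_t)rand() << 32) ^ (uint64_t)rand();
        }
        /* Independent random values: compare against the reference. */
        failures += impl(&a, &b) != eq256_ref(&a, &b);
        /* Identical values: must compare equal. */
        b = a;
        failures += impl(&a, &b) != true;
        /* One flipped bit in each word in turn: must compare unequal. */
        for (int w = 0; w < 4; w++) {
            b = a;
            b.value[w] ^= (uint64_t)1 << (i % 64);
            failures += impl(&a, &b) != false;
        }
    }
    return failures;
}
```

Calling `check_eq256(eq256, 1000)` with the assembly routine linked in should return 0 if both implementations agree.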


  1. In the asm file: The .align seems to be for optimization - is this really necessary, and if yes, what is the correct value to align to?

  2. Is there any source that clearly explains what the calling convention for my specific function signature is?

    a. How can I know that the inputs are actually passed via x0 and x1?

    b. How can I know that it is correct to pass the output in x0?

    c. How can I know that it is safe to clobber x9-x12 and the status registers?

    d. Is the function called the same way when I call it from C instead of Swift?

  3. What does "Indirect result location register" mean for the r8 register description in the ARM document?

  4. Do I need any other assembler directives besides .globl?

  5. When I set breakpoints, the debugger seems to get confused where it actually is, showing incorrect lines etc. Am I doing something wrong?


Answer
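  1. The .align 2 directive is required for program correctness. A64 instructions need to be aligned on 32-bit boundaries. With Apple's assembler the argument is a power of two, so .align 2 means 2^2 = 4 bytes, which is the correct value here.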
  2. The documentation you linked seems clear to me and unfortunately this isn't the place to ask for recommendations.
    • You can determine that the arguments lhs and rhs are passed in X0 and X1 by following the instructions given in section 5.4.2 (Parameter Passing Rules) of the Procedure Call Standard for the ARM 64-bit Architecture (AArch64) document you linked. Since the parameters are both pointers, the only specific rule that applies is C.7.
    • You can determine which register is used to return values by following the instructions given in section 5.5 (Result Return). This just has you following the same rules as for parameters. Since the function returns an integer, only rule C.7 applies, and so the value is returned in X0.
    • It's safe to change the values stored in registers X9 through X12 because they're listed as temporary registers in the table given in section 5.1.1 (General-purpose Registers).
    • The question is really whether the function is called the same way in Swift as in C. Both the Procedure Call Standard document and the Apple-specific exceptions document you linked are defined in terms of C and C++. Presumably Swift follows the same conventions, but I don't know if Apple has made that explicit anywhere.
  3. The purpose of R8 is described in section 5.5 (Result Return). It's used when the return value is too big to fit into the registers used to return values. In that case the caller creates a buffer for the return value and puts its address in R8. The function then copies the return value into this buffer.
  4. I don't believe you need anything else in your example assembly program.
  5. You've asked too many questions. You should post a separate and more detailed question describing your problem.
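
To make points 2 and 3 concrete, here is a small C sketch under the assumption that the compiler follows the Procedure Call Standard as described above. The function names `uint256_is_zero` and `uint256_from_u64` are hypothetical and only serve as illustration. The comments state where the standard places each argument and result.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint64_t value[4];
} UInt256;

/* Pointer argument: rule C.7 assigns it to the first free
   general-purpose register, X0. The bool result comes back in X0
   (its low 32 bits are W0). */
bool uint256_is_zero(const UInt256 *v) {
    return (v->value[0] | v->value[1] | v->value[2] | v->value[3]) == 0;
}

/* A 32-byte struct returned by value is too large for the result
   registers, so the caller reserves the storage and passes its
   address in X8 (the "indirect result location register"). The
   callee writes the result into that buffer. */
UInt256 uint256_from_u64(uint64_t low) {
    UInt256 r = { { low, 0, 0, 0 } };
    return r;
}
```

Compiling a file like this to assembly (the compiler's -S option) for an arm64 target and reading the output is a quick way to check the register assignment for a given signature.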

I should say one advantage of writing your code using inline assembly is that you wouldn't have to worry about any of this. Something like the following untested C code shouldn't be too unwieldy:

bool eq256(const UInt256 *lhs, const UInt256 *rhs) {
     const __int128 *lv = (__int128 const *) lhs->value;
     const __int128 *rv = (__int128 const *) rhs->value;

     uint64_t l1, l2, r1, r2, ret;

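     /* "=&r" (early clobber): the temporaries are written before all of
        the memory inputs have been read, so they must not share
        registers with the input addresses. */
     asm("ldp       %1, %2, %5\n\t"
         "ldp       %3, %4, %6\n\t"
         "cmp       %1, %3\n\t"
         "ccmp      %2, %4, 0, eq\n\t"
         "ldp       %1, %2, %7\n\t"
         "ldp       %3, %4, %8\n\t"
         "ccmp      %1, %3, 0, eq\n\t"
         "ccmp      %2, %4, 0, eq\n\t"
         "cset      %0, eq"
         : "=r" (ret), "=&r" (l1), "=&r" (l2), "=&r" (r1), "=&r" (r2)
         : "Ump" (lv[0]), "Ump" (rv[0]), "Ump" (lv[1]), "Ump" (rv[1])
         : "cc");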

     return ret;
}

Ok, maybe it's a little unwieldy.