Johannes Johannes - 1 month ago 5x
C Question

Need an efficient subtraction algorithm modulo a number

For given numbers

, I would like to calculate
x-y mod n
in C. Look at this example:

int substract_modulu(int x, int y, int n)
return (x-y) % n;

As long as
, we are fine. In the other case, however, the modulu operation is undefined.

You can think of
. I would like the result to be positive, so if
, then
((x-y)-substract_modulu(x,y,n))/ n
shall be an integer.

What is the fastest algorithm you know for that? Is there one which avoids any calls of


As many have pointed out, in current C and C++ standards, x % n is no longer implementation-defined for any values of x and n. It is undefined behaviour in the cases where x / n is undefined [1]. Also, x - y is undefined behaviour in the case of integer overflow, which is possible if the signs of x and y might differ.

So the main problem for a general solution is avoiding integer overflow, either in the division or the subtraction. If we know that x and y are non-negative and n is positive, then overflow and division by zero are not possible, and we can confidently say that (x - y) % n is defined. Unfortunately, x - y might be negative, in which case so will be the result of the % operator.

It's easy to correct for the result being negative if we know that n is positive; all we have to do is unconditionally add n and do another modulo operation. That's unlikely to be the best solution, unless you have a computer where division is faster than branching.

If a conditional load instruction is available (pretty common these days), then the compiler will probably do well with the following code, which is portable and well-defined, subject to the constraints that x,y ≥ 0 ∧ n > 0:

((x - y) % n) + ((x > y) ? 0 : n)

For example, gcc produces this code for my core I5 (although it's generic enough to work on any non-Paleozoic intel chip):

    idivq   %rcx
    cmpq    %rsi, %rdi
    movl    $0, %eax
    cmovg   %rax, %rcx
    leaq    (%rdx,%rcx), %rax

which is cheerfully branch-free. (Conditional move is usually a lot faster than branching.)

Another way of doing this would be (except that the function sign needs to be written):

((x - y) % n) + (sign(x > y) & (unsigned long)n)

where sign is all 1s if its argument is negative, and otherwise 0. One possible implementation of sign (adapted from bithacks) is

unsigned long sign(unsigned long x) {
  return x >> (sizeof(long) * CHAR_BIT - 1);

This is portable (casting negative integer values to unsigned is defined), but it may be slow on architectures which lack high-speed shift. It's unlikely to be faster than the previous solution, but YMMV. TIAS.

Neither of these produce correct results for the general case where integer overflow is possible. It's very difficult to deal with integer overflow. (One particularly annoying case is n == -1, although you can test for that and return 0 without any use of %.) Also, you need to decide your preference for the result of modulo of negative n. I personally prefer the definition where x%n is either 0 or has the same sign as n -- otherwise why would you bother with a negative divisor -- but applications differ.

The three-modulo solution proposed by Tom Tanner will work if n is not -1 and n + n does not overflow. n == -1 will fail if either x or y is INT_MIN, and the simple fix of using abs(n) instead of n will fail if n is INT_MIN. The cases where n has a large absolute value could be replaced with comparisons, but there are a lot of corner cases, and made more complicated by the fact that the standard does not require 2's complement arithmetic, so it's not easily predictable what the corner cases are [2].

As a final note, some tempting solutions do not work. You cannot just take the absolute value of (x - y):

(-z) % n == -(z % n) == n - (z % n) ≠ z % n (unless z % n happens to be n / 2)

And, for the same reason, you cannot just take the absolute value of the result of modulo.

Also, you cannot just cast (x - y) to unsigned:

(unsigned)z == z + 2k (for some k) if z < 0
(z + 2k) % n == (z % n) + (2k % n) ≠ z % n unless (2k % n) == 0

[1] x/n and x%n are both undefined if n==0. But x%n is also undefined if x/n is "not representable" (i.e. there was integer overflow), which will happen on twos-complement machines (that is, all the ones you care about) if x is most negative representable number and n == -1. It's clear why x/n should be undefined in this case, but slightly less so in the case of x%n, since that value is (mathematically) 0.

[2] Most people who complain about the difficulty of predicting the results of floating-point arithmetic haven't spent much time trying to write truly portable integer arithmetic code :)