Fastest way to calculate a 128-bit integer modulo a 64-bit integer

I have a 128-bit unsigned integer A and a 64-bit unsigned integer B. What's the fastest way to calculate

A % B
- that is the (64-bit) remainder from dividing A by B?

I'm looking to do this in either C or assembly language, but I need to target the 32-bit x86 platform. This unfortunately means that I cannot take advantage of compiler support for 128-bit integers, nor of the x64 architecture's ability to perform the required operation in a single instruction.


Thank you for the answers so far. However, it appears to me that the suggested algorithms would be quite slow - wouldn't the fastest way to perform a 128-bit by 64-bit division be to leverage the processor's native support for 64-bit by 32-bit division? Does anyone know if there is a way to perform the larger division in terms of a few smaller divisions?

Re: How often does B change?

Primarily I'm interested in a general solution - what calculation would you perform if A and B are likely to be different every time?

However, a second possible situation is that B does not vary as often as A - there may be as many as 200 As to divide by each B. How would your answer differ in this case?

You can use the division version of Russian Peasant Multiplication.

To find the remainder, execute (in pseudo-code):

X = B;

while (X <= A/2)
{
    X <<= 1;
}

while (A >= B)
{
    if (A >= X)
        A -= X;
    X >>= 1;
}

The modulus is left in A.

You'll need to implement the shifts, comparisons and subtractions to operate on values made up of a pair of 64-bit numbers, but that's fairly trivial.

This will loop at most 255 times (with a 128-bit A): up to 127 doublings of X in the first loop, then up to 128 compare-and-halve steps in the second. Of course you need to do a pre-check for a zero divisor.
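As a rough illustration, the shift-and-subtract loop might be written in C along these lines. This is only a sketch: the `u128` struct, its helper functions and the name `mod128by64` are hypothetical names made up for this example, and it assumes a C99 compiler that provides `uint64_t` on the 32-bit target. The scaling loop runs while X <= A/2, which guarantees A < 2X afterwards, so one subtraction per step is enough and the doubling can never overflow 128 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 128-bit unsigned value stored as two 64-bit halves. */
typedef struct { uint64_t hi, lo; } u128;

/* Returns nonzero if a >= b. */
static int u128_ge(u128 a, u128 b)
{
    return a.hi > b.hi || (a.hi == b.hi && a.lo >= b.lo);
}

/* Returns a - b; the caller guarantees a >= b. */
static u128 u128_sub(u128 a, u128 b)
{
    u128 r;
    r.lo = a.lo - b.lo;
    r.hi = a.hi - b.hi - (a.lo < b.lo);   /* borrow out of the low half */
    return r;
}

/* Shift left by one bit; the top bit of lo moves into hi. */
static u128 u128_shl1(u128 a)
{
    a.hi = (a.hi << 1) | (a.lo >> 63);
    a.lo <<= 1;
    return a;
}

/* Shift right by one bit; the bottom bit of hi moves into lo. */
static u128 u128_shr1(u128 a)
{
    a.lo = (a.lo >> 1) | (a.hi << 63);
    a.hi >>= 1;
    return a;
}

/* Remainder of a divided by b, using only shifts, compares and subtracts. */
uint64_t mod128by64(u128 a, uint64_t b)
{
    u128 divisor = { 0, b };
    u128 x = divisor;
    u128 half_a = u128_shr1(a);

    assert(b != 0);                 /* pre-check for a zero divisor */

    /* Double x while x <= a/2, so that afterwards a < 2x. */
    while (u128_ge(half_a, x))
        x = u128_shl1(x);

    /* Subtract x when it fits, then halve it, until a < b. */
    while (u128_ge(a, divisor)) {
        if (u128_ge(a, x))
            a = u128_sub(a, x);
        x = u128_shr1(x);
    }

    return a.lo;                    /* a < b here, so a.hi is zero */
}
```

For example, `mod128by64((u128){ 1, 0 }, 10)` computes 2^64 mod 10. Passing the structs by value keeps the sketch short; a tuned version for 32-bit x86 would more likely work on four 32-bit words in place.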

