Daeyoung Lim - 2 months ago 11

# Python and R are returning different results where they should be exactly the same

[Python numpy code]

``````
In [171]: A1*b
Out[171]:
array([ -7.55603523e-01,   7.18519356e-01,   3.98628050e-03,
         9.27047917e-04,  -1.31074698e-03,   1.44455190e-03,
         1.02676602e-03,   5.03891225e-02,  -1.15752426e-03,
        -2.43685270e-02,   5.88382307e-03,   2.63372861e-04])

In [172]: (A1*b).sum()
Out[172]: -1.6702134467139196e-16
``````

[R code]

``````
> cholcholT[2,] * b
 [1] -0.7556035225  0.7185193560  0.0039862805  0.0009270479 -0.0013107470
 [6]  0.0014445519  0.0010267660  0.0503891225 -0.0011575243 -0.0243685270
[11]  0.0058838231  0.0002633729
> sum(cholcholT[2,] * b)
[1] -9.616873e-17
``````

The first block is the numpy code and the second is R. Up to the element-wise product of the two vectors, they return the same result. However, when I sum the elements, the results differ. I don't believe this is caused by the precision settings, since both use double precision. Why is this happening?

You are experiencing what is called catastrophic cancellation. Your sum mixes positive and negative terms that cancel almost exactly, which amounts to subtracting numbers that differ only very slightly. As a result, you get a number whose error is very large relative to its value. The error stems from the rounding that happens whenever your system stores a value that cannot be represented exactly in binary.
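As a minimal sketch of the effect (the value `1e-15` is chosen purely for illustration and is not taken from the question), the following subtracts two nearly equal doubles. The rounding that happened when `1.0 + tiny` was stored is invisible in `a` itself, but it dominates the small difference:

```python
# Illustration only: 'tiny' is an arbitrary small value, not data from the question.
tiny = 1e-15

a = 1.0 + tiny        # the sum is rounded to the nearest double when stored
diff = a - 1.0        # mathematically this should be exactly 'tiny'

# Relative error of the difference compared with the value we expected.
rel_error = abs(diff - tiny) / tiny

print(diff)           # close to, but not equal to, 1e-15
print(rel_error)      # roughly 0.11, i.e. an error of about 11 %
```

The absolute error is on the order of 10^-16 in both `a` and `diff`, but relative to `diff` it is enormous.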

Intuitively, you can think of this as the same difficulty you have when writing 1/3 as a decimal number. You would have to write 0.3333..., with infinitely many 3s after the decimal point. You cannot do this, and your computer can't either.

So your computer has to round the numbers somewhere.

You can see the rounding errors if you use something like

``````
print("{:.20e}".format(0.1))  # 1.00000000000000005551e-01
``````

You will see that after the 16th digit or so, the number you wanted to store (1.0000000000000000000...×10^-1) differs from the number the computer actually stores (1.00000000000000005551...×10^-1).
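If you want to see the stored value in full rather than to 20 digits, one option is the standard-library `decimal` and `fractions` modules, which convert a float without rounding it again. This is only a sketch for inspection; the names `stored`, `wanted` and `ratio` are made up for the example:

```python
from decimal import Decimal
from fractions import Fraction

stored = Decimal(0.1)      # exact decimal expansion of the double nearest to 0.1
wanted = Decimal("0.1")    # the decimal number 0.1 itself

print(stored)              # starts with 0.1000000000000000055511...
print(stored > wanted)     # True: the stored double is slightly too large

# The double is really a fraction whose denominator is a power of two.
ratio = Fraction(0.1)
print(ratio.denominator == 2**55)  # True
```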

To see the order of magnitude of this inaccuracy, you can look at the machine epsilon. In simplified terms, this is the smallest amount, relative to your value, that you can add to the value such that the computer can still distinguish the result from the original (that is, the increment is not rounded away when the result is stored in memory).

If you execute

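``````
import numpy as np

# Machine epsilon for double precision: the gap between 1.0 and the next larger float
eps = np.finfo(float).eps
print(eps)  # 2.220446049250313e-16
``````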

you can see that this value lies on the order of magnitude of 10^-16.
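To make the "can still distinguish" part concrete, here is a small sketch using `sys.float_info.epsilon` from the standard library, which for double precision is the same value as `np.finfo(float).eps`. The variable names are only for this example:

```python
import sys

eps = sys.float_info.epsilon       # about 2.22e-16 for double precision

# Adding eps to 1.0 gives the next representable double above 1.0.
grows = (1.0 + eps) > 1.0

# Adding half of eps is rounded away: the result is stored as 1.0 again.
lost = (1.0 + eps / 2) == 1.0

# The effect is relative: next to 1e16, even adding 1.0 is rounded away.
scaled = (1e16 + 1.0) == 1e16

print(eps, grows, lost, scaled)
```

All three comparisons evaluate to `True`, which shows that the absolute size of the rounding error grows with the magnitude of the numbers involved.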

The computer represents floats in a form like `SIGN|EXPONENT|FRACTION`. To simplify greatly: if computer memory stored numbers in decimal format, a number like -0.0053 would be stored as `1|-2|.53|`. `1` stands for the negative sign, and `-2` means 'FRACTION times 10^-2'.
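The real layout is binary rather than decimal. As a rough sketch, assuming the usual IEEE 754 double format (1 sign bit, 11 exponent bits, 52 fraction bits) and ignoring special cases such as zero, subnormals, infinity and NaN, the three fields can be pulled apart like this. The helper `decompose` is hypothetical and written just for this illustration:

```python
import struct

def decompose(x: float):
    """Split a normal double into (sign, unbiased exponent, 52-bit fraction)."""
    # Reinterpret the 8 bytes of the double as one unsigned 64-bit integer.
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    sign = bits >> 63                          # 1 means negative
    exponent = ((bits >> 52) & 0x7FF) - 1023   # remove the exponent bias
    fraction = bits & ((1 << 52) - 1)          # bits after the implicit leading 1
    return sign, exponent, fraction

sign, exponent, fraction = decompose(-0.0053)
print(sign, exponent, fraction)

# Rebuild the value: (-1)^sign * (1 + fraction / 2^52) * 2^exponent
rebuilt = (-1) ** sign * (1 + fraction / 2**52) * 2.0**exponent
print(rebuilt == -0.0053)   # True: the three fields describe the stored double
```

Here the exponent is a power of two instead of a power of ten, but the idea is the same as in the decimal picture.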

When floats are summed, the computer must bring the operands to the same exponent before it can add or subtract the digits of the FRACTION. As a result, the rounding error of your sum is governed by the largest exponent in your data, which is -1 (your largest values are about 7×10^-1). The rounding error will therefore be on the order of 10^-16 × 10^-1 = 10^-17. Your result is of this order of magnitude as well, so it consists almost entirely of rounding error. NumPy and R do not necessarily add the terms in the same order or with the same intermediate precision, so each accumulates slightly different rounding errors and ends up with a different, equally meaningless, residue.
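The values printed in the question are rounded, so the original numbers cannot be reproduced here. As a stand-in, this sketch uses three made-up values that cancel mathematically to zero and adds them in two different orders. It only illustrates that the order of addition changes the residue; it does not claim to show which order NumPy or R actually use:

```python
import math

# Made-up values whose true sum is zero; not the data from the question.
x, y, z = 0.1, 0.2, -0.3

left_first = (x + y) + z     # add the first two terms, then the third
right_first = x + (y + z)    # add the last two terms, then the first

# math.fsum tracks the lost low-order bits and returns the correctly
# rounded sum of the stored doubles.
accurate = math.fsum([x, y, z])

print(left_first)    # 5.551115123125783e-17
print(right_first)   # 2.7755575615628914e-17
print(accurate)      # 2.7755575615628914e-17
```

Both residues are on the order of 10^-17, they differ by a factor of two, and neither is the "true" answer of zero, which mirrors the -1.67e-16 versus -9.6e-17 discrepancy above. If you need to compare such results, compare against a tolerance (for example with `math.isclose` and an `abs_tol`) instead of expecting identical digits.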