terdon - 6 months ago 21

Perl Question

Consider this (all commands run on a 64-bit Arch Linux system):

- Perl (v5.24.0)

`$ perl -le 'print 10190150730169267102/1000%10'`

6

- `awk` (GNU Awk 4.1.3)

`$ awk 'BEGIN{print 10190150730169267102/1000%10}'`

6

- R (3.3.1)

`> (10190150730169267102/1000)%%10`

[1] 6

- `bc`

`$ echo 10190150730169267102/1000%10 | bc`

7

- Python 2 (2.7.12)

`>>> print(10190150730169267102/1000%10)`

7

- Python 3 (3.5.2)

`>>> print(10190150730169267102/1000%10)`

8.0

So, Perl, `gawk` and `R` agree with each other, as do `bc` and Python 2, while Python 3 gives a third result.

Could someone explain what is going on behind the scenes here? What are the limitations in each language and why do they behave quite so differently?

Answer

You're seeing different results for two reasons:

1. The division step is doing two different things: in some of the languages you tried, it represents *integer* division, which discards the fractional part of the result and keeps only the integer part. In others it represents actual mathematical division (which, following Python's terminology, I'll call "true division" below), returning a floating-point result close to the true quotient.
2. In some languages (those with support for arbitrary precision), the large numerator value `10190150730169267102` is represented exactly; in others, it's replaced by the nearest representable floating-point value.

The different combinations of the possibilities in 1. and 2. above give you the different results.

In detail: in Perl, awk, and R, we're working with floating-point values and true division. The value `10190150730169267102` is too large to store in a machine integer, so it's stored in the usual IEEE 754 binary64 floating-point format. That format can't represent that particular value exactly, so what gets stored is the closest value that *is* representable in that format, which is `10190150730169266176.0`. Now we divide that approximation by `1000`, again giving a floating-point result. The exact quotient, `10190150730169266.176`, is again not exactly representable in the binary64 format, and we get the closest representable float, which happens to be `10190150730169266.0`. Taking a remainder modulo `10` gives `6`.
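The floating-point path can be imitated in Python 3 by converting the numerator to `float` before doing any arithmetic. This is only a sketch of the effect, not of what Perl, awk, or R do internally, and it assumes Python's `float` is IEEE 754 binary64 (the case on common platforms). The variable names are mine.

```python
# Round the literal to the nearest binary64 value first, then do
# ordinary floating-point division and remainder.
# Assumption: Python's float is an IEEE 754 binary64 double.
numerator = float(10190150730169267102)   # nearest representable double
quotient = numerator / 1000               # float division, rounded again
remainder = quotient % 10

print(int(numerator))   # 10190150730169266176
print(int(quotient))    # 10190150730169266
print(remainder)        # 6.0
```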

In bc and Python 2, we're working with arbitrary-precision integers and integer division. Both those languages can represent the numerator exactly. The division result is then `10190150730169267` (we're doing *integer division*, not *true division*, so the fractional part is discarded), and the remainder modulo `10` is `7`. (This is oversimplifying a bit: the format that bc is using internally is somewhat closer to Python's `Decimal` type than to an arbitrary-precision integer type, but in this case the effect is the same.)
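For comparison, here is a small sketch of the same integer-only computation in Python 3, where integer division is spelled `//` (Python 2's `/` on two integers behaves the same way for non-negative operands like these). The variable names are illustrative.

```python
# Exact integer arithmetic: Python ints are arbitrary precision,
# and // discards the fractional part of the quotient.
numerator = 10190150730169267102
quotient = numerator // 1000
remainder = quotient % 10

print(quotient)    # 10190150730169267
print(remainder)   # 7
```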

In Python 3, we're working with arbitrary-precision integers and true division. The numerator is represented exactly, but the result of the division is the nearest floating-point value to the true quotient. In this case the exact quotient is `10190150730169267.102`, and the closest representable floating-point value is `10190150730169268.0`. Taking the remainder of that value modulo `10` gives `8`.
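To see why the rounding goes up rather than down, the exact quotient can be compared with its two neighbouring floats. The sketch below uses `fractions.Fraction` for the exact value; the names `lower` and `upper` are my own, and it again assumes `float` is IEEE 754 binary64, where adjacent values of this magnitude are 2 apart.

```python
from fractions import Fraction

numerator = 10190150730169267102
exact = Fraction(numerator, 1000)   # exact rational quotient
quotient = numerator / 1000         # true division: nearest binary64 float

# Between 2**53 and 2**54 only even integers are representable, so the
# candidate floats on either side of the exact quotient are these two.
lower, upper = 10190150730169266, 10190150730169268
print(float(lower) == lower, float(upper) == upper)   # True True
print(exact - lower > upper - exact)                  # True: upper is closer
print(int(quotient))                                  # 10190150730169268
print(quotient % 10)                                  # 8.0
```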

Summary:

- Perl, awk, R: floating-point approximations, true division
- bc, Python 2: arbitrary-precision integers, integer division
- Python 3: arbitrary-precision integers, true division