Zubin - 11 months ago
R Question

Weird error in R when importing (64-bit) integer with many digits

I am importing a CSV file with a single column of very long integers (for example, 2121020101132507598).


When I import these integers as strings they come through correctly, but when I import them as integers the last few digits change. I have no idea what is going on. Each row below shows the value read as a string, followed by the same value read as a number:

1 "4031320121153001444" 4031320121153001472
2 "4113020071082679601" 4113020071082679808
3 "4073020091116779570" 4073020091116779520
4 "2081720101128577687" 2081720101128577792
5 "4041720081087539887" 4041720081087539712
6 "4011120071074301496" 4011120071074301440
7 "4021520051054304372" 4021520051054304256
8 "4082520061068996911" 4082520061068997120
9 "4082620101129165548" 4082620101129165312

Answer

As others have noted, you can't represent integers that large in R's native integer type, which is 32-bit. But R isn't reading those values into integers; it's reading them into double-precision numerics.
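A minimal sketch of that behaviour, using a throwaway temporary file as a stand-in for the real CSV (the file contents and the variable names are assumptions for illustration only):

```r
# The largest value R's native integer type can hold
.Machine$integer.max   # 2147483647, i.e. 2^31 - 1

# Stand-in for the real file: one column of very long integers
path <- tempfile(fileext = ".csv")
writeLines(c("4031320121153001444", "4113020071082679601"), path)

# Default import: values too large for integer are stored as double
df_num <- read.csv(path, header = FALSE)
is.double(df_num$V1)   # TRUE

# Forcing character keeps every digit intact
df_chr <- read.csv(path, header = FALSE, colClasses = "character")
df_chr$V1
```

Reading with `colClasses = "character"` is the safe starting point for any of the big-integer packages mentioned below, since nothing is lost before conversion.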

Double precision has a 53-bit significand, so it can represent only about 15 to 16 significant decimal digits exactly, which is why your numbers are rounded beyond that point. See the gmp, Rmpfr, and int64 packages for potential solutions. I don't see a function in any of them for reading from a file, though; maybe you could cook something up by looking at their sources.
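The precision limit can be checked in base R. This is a rough sketch assuming an IEEE 754 platform (which covers essentially every machine R runs on); the variable names are only illustrative:

```r
# Number of bits in a double's significand
significand_bits <- .Machine$double.digits   # 53 on IEEE 754 platforms

# Every integer up to 2^53 is exactly representable; beyond that, gaps appear
exact_limit <- 2^significand_bits            # about 9.0e15
exact_limit + 1 == exact_limit               # TRUE: the +1 is rounded away

# For values between 2^61 and 2^62 (about 2.3e18 to 4.6e18),
# neighbouring doubles are 2^9 = 512 apart
spacing <- 2^(62 - significand_bits)
spacing                                      # 512

# So adding 1 to a number of that size changes nothing
(2^61 + 1) == 2^61                           # TRUE
```

Most of the values in the question are around 4.0e18, so each one is moved to the nearest multiple of 512, which matches the altered trailing digits.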

UPDATE: Here's how you can get your file into an int64 object:

# Requires the int64 package
library(int64)

# This assumes your numbers are the only column in the file.
# Read them in however you like; just ensure they're read in as character.
a <- scan("temp.csv", what = "")
ia <- as.int64(a)
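As far as I know, the int64 package has since been archived on CRAN, so it may not install on a current R. A similar approach with the bit64 package, which is not mentioned in the original answer and is swapped in here as an assumption, might look like the sketch below. The temporary file stands in for `temp.csv`:

```r
library(bit64)   # assumed to be installed; provides the integer64 class

# Stand-in for temp.csv: one column of very long integers
path <- tempfile(fileext = ".csv")
writeLines(c("4031320121153001444", "4113020071082679601"), path)

# Read as character first, then convert to 64-bit integers
a  <- scan(path, what = "", quiet = TRUE)
ia <- as.integer64(a)

# The digits should survive a round trip back to character
identical(as.character(ia), a)

# Arithmetic stays in 64-bit integers rather than falling back to double
ia[2] - ia[1]
```

The important step is the same as in the answer: read the column as character, and only then convert to a type that can hold all 19 digits.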