Master-chip - 1 year ago

Python Question

I have a 30MB .txt file with **one** line of data.

Unfortunately, every method I've tried, and every method I found on the internet (`mmap.read()`, `readline()`, …), seems to rely on the fact that each line is small, so that memory consumption is only as big as the biggest line in the file. Here's the code I've been using:

```
import mmap
import time

f = open('log.txt', 'w')  # log file; `f` wasn't defined in the snippet I posted

start = time.clock()
z = open('Number.txt', 'r+')
m = mmap.mmap(z.fileno(), 0)
a = int(m.read())
z.close()
end = time.clock()

secs = end - start
print("Number read in", "%s" % secs, "seconds.", file=f)
print("Number read in", "%s" % secs, "seconds.")
f.flush()
del end, start, secs, z, m
```

Other than splitting the number across several lines, which I'd rather not do, is there a cleaner method that won't require the better part of an hour?

By the way, I don't necessarily have to use text files.

I have: Windows 8.1 64-Bit, 16GB RAM, Python 3.5.1

Answer Source

The file read is quick (<1s):

```
with open('number.txt') as f:
    data = f.read()
```

Converting a 30-million-digit string to an integer, that's slow:

```
z = int(data)  # still waiting...
```
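The bottleneck is the conversion itself: CPython's classic decimal-string parse takes time roughly quadratic in the digit count (very recent CPython releases add a faster path for huge inputs), so doubling the digits roughly quadruples the time. A quick timing sketch, with sizes chosen as arbitrary examples:

```python
import sys
import time

# Python 3.11+ refuses huge int(str) conversions by default; lift the guard.
if hasattr(sys, 'set_int_max_str_digits'):
    sys.set_int_max_str_digits(0)

for n_digits in (50000, 100000, 200000):
    s = '7' * n_digits
    t0 = time.perf_counter()
    int(s)                     # the expensive step
    elapsed = time.perf_counter() - t0
    print(n_digits, 'digits ->', round(elapsed, 3), 'seconds')
```

Timings vary by machine and Python version, but the growth pattern is what matters.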

If you instead store the number as raw big- or little-endian binary data, then `int.from_bytes(data, 'big')` is much quicker.
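A practical consequence: pay the slow decimal parse once, save the raw bytes, and every later load is nearly instant. A minimal sketch, assuming hypothetical filenames `Number.txt` and `Number.bin` and a small stand-in number:

```python
# Build a small stand-in for the 30MB decimal file.
with open('Number.txt', 'w') as f:
    f.write('123456789' * 400)            # 3600 digits; imagine 30 million

# One-time slow step: parse the decimal text.
with open('Number.txt') as f:
    n = int(f.read())

# Store the integer as raw big-endian bytes.
with open('Number.bin', 'wb') as f:
    f.write(n.to_bytes((n.bit_length() + 7) // 8, 'big'))

# Every later run: fast binary load via int.from_bytes.
with open('Number.bin', 'rb') as f:
    n2 = int.from_bytes(f.read(), 'big')
```

Note that `to_bytes` and `from_bytes` must agree on byte order; `'big'` is used on both sides here.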

If I did my math right, the binary file would only be about 12.5MB instead of 30MB:

```
import math
import os

# A base-10 digit carries log2(10) ~ 3.32 bits of information.
n_bits = math.ceil(30000000 * math.log2(10))   # 99657843 bits for a 30M-digit number
n_bytes = math.ceil(n_bits / 8)                # 12457231 bytes, ~12.5MB, so the file is smaller :^)

data = os.urandom(n_bytes)                     # generate some random bytes of that size
z = int.from_bytes(data, 'big')                # convert to integer (<1s)
print(z.bit_length())                          # ~99.66 million bits, as expected
```