bfletch - 1 year ago
Python Question

How to speed up printing line by line to a file in Python?

Say I'm printing numbers from two arrays into a file:

from numpy import random
number_of_points = 10000
a = random.rand(number_of_points)
b = random.rand(number_of_points)
fh = open('file.txt', 'w')
for i in range(number_of_points):
    for j in range(number_of_points):
        print('%f %f' % (a[i], b[j]), file=fh)

I feel this is making lots of calls to the system to print, whereas sending one call containing this information would be faster. Is this correct? If so, how could I do this? Are there faster ways to implement this?

Answer Source

print has a lot of bells and whistles you're not using, and you're using C-style looping with indexing instead of direct iteration; both add needless overhead. You might be able to speed it up a bit by limiting the Python-level work and pushing it down to the C layer.

For example, in this case, you could replace the whole doubly-nested loop structure with:

import itertools

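# To keep the original %f formatting you could use '%f %f\n'.__mod__ instead,
# but with plain map rather than starmap, since __mod__ takes the whole tuple
# as a single argument. I just find the modern format strings a little nicer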
fh.writelines(itertools.starmap('{} {}\n'.format, itertools.product(a, b)))

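which uses product to generate the same pairs as your nested loops without any indexing, starmap plus str.format to build the lines, and fh.writelines to exhaust the iterator created by starmap, writing every line to the file with a single function call instead of 100,000,000 calls to print. Note that '{}' formats each float with its default string representation rather than the six decimal places of '%f'; use '{:f} {:f}\n' as the format string if the output must match exactly.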
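As a rough sketch of the '%f %f\n'.__mod__ variant mentioned in the code comment, the following checks that it reproduces the output of the original nested print loop. The small array size and the io.StringIO buffers standing in for the real file are assumptions made only to keep the example quick; plain map is used because product yields one tuple per pair and __mod__ accepts exactly one argument.

```python
import io
import itertools

from numpy import random

number_of_points = 100  # assumption: kept small so the example finishes quickly
a = random.rand(number_of_points)
b = random.rand(number_of_points)

# Reference output: the original nested print loop, written to an in-memory buffer.
expected = io.StringIO()
for i in range(number_of_points):
    for j in range(number_of_points):
        print('%f %f' % (a[i], b[j]), file=expected)

# Same output via map: product yields (x, y) tuples, and str.__mod__ takes
# exactly one argument, so each tuple is passed through whole.
actual = io.StringIO()
actual.writelines(map('%f %f\n'.__mod__, itertools.product(a, b)))
```

Swapping either buffer for a real file object opened with open('file.txt', 'w') should behave the same way, since writelines accepts any iterable of strings.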

Aside from the fixed setup cost (independent of the number of items) of creating the iterators and passing the final one to fh.writelines, the actual iteration, formatting and I/O work takes place entirely at the C layer on the CPython reference interpreter, so it should run noticeably faster than the Python-level loop.
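One way to check the claim on your own machine is a small timing harness like the sketch below. It is illustrative only: plain lists of floats stand in for the NumPy arrays so it has no third-party dependency, the array size is far smaller than in the question, and the helper names with_print and with_writelines are made up for this example. The '{:f} {:f}\n' format string is used so both versions write identical files.

```python
import itertools
import os
import random
import tempfile
import timeit

number_of_points = 200  # assumption: far smaller than the question's 10000
a = [random.random() for _ in range(number_of_points)]
b = [random.random() for _ in range(number_of_points)]


def with_print(path):
    # The question's approach: indexed nested loops, one print call per line.
    with open(path, 'w') as fh:
        for i in range(number_of_points):
            for j in range(number_of_points):
                print('%f %f' % (a[i], b[j]), file=fh)


def with_writelines(path):
    # The answer's approach: a single writelines call consuming a lazy iterator.
    with open(path, 'w') as fh:
        fh.writelines(
            itertools.starmap('{:f} {:f}\n'.format, itertools.product(a, b)))


with tempfile.TemporaryDirectory() as tmp:
    print_path = os.path.join(tmp, 'print.txt')
    writelines_path = os.path.join(tmp, 'writelines.txt')
    # Run each version three times and report the total elapsed time.
    print_seconds = timeit.timeit(lambda: with_print(print_path), number=3)
    writelines_seconds = timeit.timeit(
        lambda: with_writelines(writelines_path), number=3)
    with open(print_path) as f1, open(writelines_path) as f2:
        same_output = f1.read() == f2.read()

print('print loop: %.3fs, writelines: %.3fs, identical output: %s'
      % (print_seconds, writelines_seconds, same_output))
```

The absolute numbers will vary with hardware, disk and Python version, so treat the result as a relative comparison rather than a benchmark.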