Utumbu - 1 year ago
Python Question

Python: Most optimal way to read file line by line

I have a large input file to read from, so I don't want to load it all into memory at once. Iterating with

for line in fo:

in the traditional way won't work, for reasons I'll explain, but I feel some modification of that approach is what I need right now. Consider the following file:

3 # No of tests that will follow
3 # No of points in current test
1 # 1st x-coordinate
2 # 2nd x-coordinate
3 # 3rd x-coordinate
2 # 1st y-coordinate
4 # 2nd y-coordinate
6 # 3rd y-coordinate

What I need is to read variable-sized chunks of lines, pair the coordinates into tuples, add those tuples to a list of cases, and then move on to reading the next case from the file.

I thought of this:

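with open(input_file) as f:
    T = int(next(f))                 # number of tests
    for _ in range(T):
        N = int(next(f))             # number of points in this test
        xs = []
        for i in range(N):           # the N x-coordinates come first
            xs.append(int(next(f)))
        ys = []
        for i in range(N):           # followed by the N y-coordinates
            ys.append(int(next(f)))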

Then I pair the two lists up into tuples. I feel there must be a cleaner way to do this. Any suggestions?

EDIT: The y-coordinates need a separate for loop, because the x- and y-coordinates of a point are n lines apart. So: read line i, read line i+n, and repeat n times for each case.
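One way to express the "line i pairs with line i+n" layout is to read a test's whole block of 2n lines at once and split it in half. This is a minimal sketch, not the accepted answer's approach: `read_cases` is a hypothetical helper name, it assumes the file holds bare integers (no `#` annotations), and it uses `itertools.islice` so the file is still consumed incrementally, one test at a time.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple


def read_cases(lines: Iterable[str]) -> Iterator[List[Tuple[int, int]]]:
    """Yield one list of (x, y) tuples per test case (hypothetical helper)."""
    it = iter(lines)
    num_tests = int(next(it))
    for _ in range(num_tests):
        n = int(next(it))
        # Take exactly the next 2n lines: n x-coordinates, then n y-coordinates.
        block = [int(line) for line in islice(it, 2 * n)]
        if len(block) != 2 * n:
            raise ValueError(f"expected {2 * n} coordinate lines, got {len(block)}")
        # Line i (an x) pairs with line i + n (its y): zip the two halves.
        yield list(zip(block[:n], block[n:]))


# A one-test version of the question's sample, with the annotations removed.
sample = ["1\n", "3\n", "1\n", "2\n", "3\n", "2\n", "4\n", "6\n"]
print(list(read_cases(sample)))  # [[(1, 2), (2, 4), (3, 6)]]
```

With a real file, the object returned by `open(input_file)` can be passed to `read_cases` directly, since file objects iterate line by line.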

Answer

This might not be the shortest possible solution, but I believe it is “pretty optimal”.

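def parse_number(stream):
    # Read the next line, drop any trailing '#' comment, and convert to int.
    return int(next(stream).partition('#')[0].strip())

def parse_coords(stream, count):
    # Eagerly read `count` consecutive numbers from the stream.
    return [parse_number(stream) for _ in range(count)]

def parse_test(stream):
    count = parse_number(stream)
    # Arguments are evaluated left to right, so the x-coordinates are read
    # before the y-coordinates; zip() then pairs them into (x, y) tuples.
    return list(zip(parse_coords(stream, count), parse_coords(stream, count)))

def parse_file(stream):
    # Lazily yield one parsed test at a time.
    for _ in range(parse_number(stream)):
        yield parse_test(stream)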

It eagerly parses all coordinates of a single test, but each test is only parsed lazily, when you ask for it.
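To see the eager/lazy split in action, the sketch below wraps the input in a hypothetical counting iterator (`CountingLines`, not part of the answer) and checks how many lines have been consumed at each step. It repeats the answer's functions so it runs on its own, and the input is made-up data in the question's format without the `#` annotations.

```python
from typing import Iterator, List


def parse_number(stream):
    return int(next(stream).partition('#')[0].strip())

def parse_coords(stream, count):
    return [parse_number(stream) for _ in range(count)]

def parse_test(stream):
    count = parse_number(stream)
    return list(zip(parse_coords(stream, count), parse_coords(stream, count)))

def parse_file(stream):
    for _ in range(parse_number(stream)):
        yield parse_test(stream)


class CountingLines:
    """Hypothetical iterator wrapper that counts how many lines were read."""

    def __init__(self, lines: List[str]) -> None:
        self._it: Iterator[str] = iter(lines)
        self.consumed = 0

    def __iter__(self) -> "CountingLines":
        return self

    def __next__(self) -> str:
        line = next(self._it)
        self.consumed += 1
        return line


# Made-up input: 2 tests, with 1 point and 2 points respectively.
stream = CountingLines(["2", "1", "5", "7", "2", "1", "2", "3", "4"])
tests = parse_file(stream)
print(stream.consumed)          # 0: creating the generator reads nothing
first = next(tests)
print(first, stream.consumed)   # [(5, 7)] 4: the test count plus the first test
second = next(tests)
print(second, stream.consumed)  # [(1, 3), (2, 4)] 9
```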

You can use it like this to iterate over the tests:

if __name__ == '__main__':
    with open('input.txt') as istr:
        for test in parse_file(istr):
            print(test)  # placeholder: process each test's list of (x, y) tuples here
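For a quick check without a file on disk, `io.StringIO` can stand in for `istr`, since it iterates line by line just like an open text file. The input below is made-up data in the question's format, including `#` annotations (which `parse_number` strips); this is a usage sketch, not part of the original answer.

```python
import io


def parse_number(stream):
    # Drop anything after '#' so annotated lines still parse.
    return int(next(stream).partition('#')[0].strip())

def parse_coords(stream, count):
    return [parse_number(stream) for _ in range(count)]

def parse_test(stream):
    count = parse_number(stream)
    return list(zip(parse_coords(stream, count), parse_coords(stream, count)))

def parse_file(stream):
    for _ in range(parse_number(stream)):
        yield parse_test(stream)


# Made-up input: 2 tests, the first matching the question's sample.
SAMPLE = """2   # number of tests
3   # points in test 1
1
2
3
2
4
6
1   # points in test 2
10
20
"""

with io.StringIO(SAMPLE) as istr:
    tests = list(parse_file(istr))

print(tests)  # [[(1, 2), (2, 4), (3, 6)], [(10, 20)]]
```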

Better function names might help distinguish the eager functions from the lazy ones; I'm experiencing a lack of naming creativity right now.