Brian Z - 1 month ago
Python Question

How to read only the lines in a text file that come after a certain string using Python?

Using Python, I'd like to read into a dictionary all of the lines in a text file that come after a particular string. I'd like to do this over thousands of text files.

I'm able to identify and print out the particular string ('Abstract') using the following code (taken from this Stack Overflow answer):

for files in filepath:
    with open(files, 'r') as f:
        for line in f:
            if 'Abstract' in line:
                print(line)


But how do I tell Python to read only the lines that come after the string?

Answer

Just start another loop when you reach the line you want to start from:

for files in filepath:
    with open(files, 'r') as f:
        for line in f:
            if 'Abstract' in line:
                for line in f:  # now you are at the lines you want
                    pass  # do work

A file object is its own iterator, so when we reach the line with Abstract in it, the inner loop continues the iteration from the next line until the iterator is consumed. The outer loop then finds nothing left to read and ends as well.

A simple example:

gen = (n for n in range(8))

for x in gen:
    if x == 3:
        print("starting second loop")
        for x in gen:
            print("In second loop",x)
    else:
        print("In first loop", x)

In first loop 0
In first loop 1
In first loop 2
starting second loop
In second loop 4
In second loop 5
In second loop 6
In second loop 7
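
The same nested-loop idea can be wrapped in a small helper. This is only a sketch: the function name lines_after is made up for illustration, and io.StringIO stands in for a real open file (it behaves like a file object in that it is also its own iterator).

```python
import io


def lines_after(f, marker):
    """Return the lines of f that follow the first line containing marker.

    Hypothetical helper, not part of any library.
    """
    for line in f:
        if marker in line:
            # The outer loop has already consumed the marker line,
            # so iterating over f again yields only what remains.
            return [rest.rstrip("\n") for rest in f]
    return []  # marker never found


# io.StringIO is used here in place of an open text file.
sample = io.StringIO("Title\nAbstract\nfirst line\nsecond line\n")
result = lines_after(sample, "Abstract")
print(result)  # ['first line', 'second line']
```

Returning from inside the outer loop makes it explicit that nothing after the first match is examined a second time.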

You can also use itertools.dropwhile to consume the lines up to the point you want.

from itertools import dropwhile

for files in filepath:
    with open(files, 'r') as f:
        # skip every line until one contains "Abstract"
        dropped = dropwhile(lambda _line: "Abstract" not in _line, f)
        next(dropped, "")  # discard the "Abstract" line itself
        for line in dropped:
            print(line)
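
Since the question asks for a dictionary built over many files, here is one possible way to collect the results with dropwhile. The question does not say what the keys should be, so this sketch assumes a mapping from each file path to the list of lines after the marker. The name read_after_marker and the temporary demo file are illustrative only.

```python
import os
import tempfile
from itertools import dropwhile


def read_after_marker(paths, marker="Abstract"):
    """Map each path to the list of lines that follow the marker line.

    Hypothetical helper; the dict layout (path -> lines) is an assumption.
    """
    result = {}
    for path in paths:
        with open(path, "r") as f:
            dropped = dropwhile(lambda line: marker not in line, f)
            next(dropped, "")  # skip the marker line itself
            result[path] = [line.rstrip("\n") for line in dropped]
    return result


# Demo on a throwaway file; real code would pass the actual file paths.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "paper.txt")
    with open(demo_path, "w") as out:
        out.write("Title\nAbstract\nline one\nline two\n")
    demo = read_after_marker([demo_path])
    demo_lines = demo[demo_path]

print(demo_lines)  # ['line one', 'line two']
```

A file that never contains the marker ends up with an empty list, because dropwhile drops every line and next(dropped, "") falls back to its default.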