Using Python, I'd like to read all of the lines in a text file that come after a particular string into a dictionary. I'd like to do this over thousands of text files.
I'm able to identify and print out the particular string ('Abstract') using the following code (taken from this Stack Overflow answer):
    for files in filepath:
        with open(files, 'r') as f:
            for line in f:
                if 'Abstract' in line:
                    print(line)
Just start another loop when you reach the line you want to start from:

    for files in filepath:
        with open(files, 'r') as f:
            for line in f:
                if 'Abstract' in line:
                    for line in f:
                        # now you are at the lines you want
                        print(line, end="")  # do work here
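A file object is its own iterator, so when we reach the line containing Abstract, the inner loop continues iterating from the next line until it has consumed the iterator. Because both loops share that iterator, the outer loop also ends once the inner loop finishes.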
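The same idea can be wrapped in a small function. This is a minimal sketch: `lines_after` is a hypothetical helper name, and `io.StringIO` stands in for a real file because, like a file object, it is its own iterator.

```python
import io

def lines_after(lines, marker):
    """Return the lines that follow the first line containing `marker`.

    `lines` can be any iterable of strings, e.g. an open file object.
    """
    it = iter(lines)  # for a file object, iter(f) returns f itself
    for line in it:
        if marker in line:
            # The comprehension draws from the same iterator, so it
            # resumes at the line right after the marker line.
            return [rest for rest in it]
    return []  # marker never found

# io.StringIO stands in for a real file here (hypothetical sample text).
sample = io.StringIO("Title\nAuthors\nAbstract\nFirst line\nSecond line\n")
print(iter(sample) is sample)             # True
print(lines_after(sample, "Abstract"))    # ['First line\n', 'Second line\n']
```

Returning an empty list when the marker is missing is a design choice for this sketch; raising an exception would work just as well if a missing marker should be treated as an error.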
A simple example:
    gen = (n for n in range(8))
    for x in gen:
        if x == 3:
            print("starting second loop")
            for x in gen:
                print("In second loop", x)
        else:
            print("In first loop", x)

Output:

    In first loop 0
    In first loop 1
    In first loop 2
    starting second loop
    In second loop 4
    In second loop 5
    In second loop 6
    In second loop 7
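You can also use itertools.dropwhile to skip the lines up to the point you want. dropwhile discards items while the predicate is true and then yields everything from the first line that contains "Abstract" onward, so the call to next(dropped, "") discards the Abstract line itself (the "" default avoids a StopIteration when a file has no such line).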
    from itertools import dropwhile

    for files in filepath:
        with open(files, 'r') as f:
            dropped = dropwhile(lambda _line: "Abstract" not in _line, f)
            next(dropped, "")  # skip the 'Abstract' line itself
            for line in dropped:
                print(line, end="")
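To get back to the original goal of a dictionary over many files, the dropwhile approach can collect each file's remaining text. This is a sketch under assumptions not in the question: `read_after_marker` is a hypothetical name, the dictionary maps each path (as a string) to the text after the marker line, files without the marker map to an empty string, and the temporary files exist only for the demo.

```python
import tempfile
from itertools import dropwhile
from pathlib import Path

def read_after_marker(paths, marker="Abstract"):
    """Map each file path to the text after the first line containing `marker`."""
    result = {}
    for path in paths:
        with open(path, "r") as f:
            # Drop lines until one contains the marker; that line is yielded first.
            dropped = dropwhile(lambda line: marker not in line, f)
            next(dropped, "")  # skip the marker line itself (no error if absent)
            result[str(path)] = "".join(dropped)
    return result

# Demo with throwaway files; in practice `paths` could come from something
# like Path("some_dir").glob("*.txt") (hypothetical directory).
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "paper1.txt").write_text("Title\nAbstract\nWe study X.\nResults follow.\n")
    Path(tmp, "paper2.txt").write_text("No marker here\n")
    abstracts = read_after_marker(sorted(Path(tmp).glob("*.txt")))
    for name, text in abstracts.items():
        print(Path(name).name, repr(text))
```

Because each file is opened, read once and closed inside the loop, only the collected text stays in memory, which keeps this workable for thousands of files.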