Colton T - 1 month ago
Python Question

Read only the rows between certain strings in Python

So I have a text file that I am trying to read with csv in Python, but I only want the rows between two rows that start with certain strings. Reading the data itself is no problem; I have:

import csv
with open('path to file', 'r') as inf:
    reader = csv.reader(inf, delimiter=" ")
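For reference, here is a small sketch of how `csv.reader` with `delimiter=" "` splits lines. It uses an in-memory `io.StringIO` in place of the real file, and the sample text is made up. Two details matter for this problem: a blank line comes back as an empty list, so `row[0]` would raise `IndexError` on it, and two consecutive spaces produce an empty-string field.

```python
import csv
import io

# Hypothetical sample text standing in for the PDF-to-text output
sample = "string1 header\n\n1  2 3\nstring2 footer\n"

rows = list(csv.reader(io.StringIO(sample), delimiter=" "))
for row in rows:
    print(row)
# ['string1', 'header']
# []
# ['1', '', '2', '3']
# ['string2', 'footer']
```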


To get all the data, I can loop through the reader and append each row to a list:

raw_data = []
for row in reader:
    raw_data.append(row)
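As an aside, because `reader` is itself an iterator, that loop is equivalent to calling `list(reader)`. A quick check, again using an in-memory `io.StringIO` with made-up text in place of the file:

```python
import csv
import io

text = "a b\nc d\n"

# Explicit loop, as in the question
raw_data = []
for row in csv.reader(io.StringIO(text), delimiter=" "):
    raw_data.append(row)

# Equivalent one-liner: list() consumes the whole iterator
raw_data_2 = list(csv.reader(io.StringIO(text), delimiter=" "))
print(raw_data == raw_data_2)  # True
```

Either way the reader is exhausted afterwards, so a second loop over the same `reader` would yield nothing.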


I know I can get the rows I want by doing something like:

for idx, row in enumerate(raw_data):
    if row[0] == 'string1':
        begin_idx = idx
    elif row[0] == 'string2':
        end_idx = idx
data = raw_data[begin_idx + 1:end_idx]


However, I was hoping to do this all at once during the first pass through the text file, so if anyone has any ideas on how this could be done, it would be appreciated.

Note: the reason I am not just looking up the indices of the rows I want is that those rows are just lists of integers that change each time I run this. The PDF-to-text conversion I run isn't very clean, so the row titles don't line up with the actual data for each row.

Answer

Iterators are handy here: a `for ... in` loop simply calls `next()` on an iterator such as `reader`, so a second loop over the same `reader` picks up exactly where the first one stopped. That lets you do this in one linear pass: when you hit the starting string, loop over the reader separately until you reach the ending string. Try this:
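To see the mechanism in isolation, here is a minimal sketch that uses a plain list iterator with hypothetical items in place of `reader`. The outer loop never sees the items the inner loop consumed.

```python
# A plain list iterator, standing in for csv.reader
it = iter(["a", "start", "x", "y", "end", "b"])

outer_seen = []
collected = []
for item in it:
    outer_seen.append(item)
    if item == "start":
        # The inner loop keeps calling next() on the same iterator,
        # so it resumes right after "start"...
        for item in it:
            if item == "end":
                break
            collected.append(item)
        # ...and after the break, the outer loop resumes after "end".

print(collected)   # ['x', 'y']
print(outer_seen)  # ['a', 'start', 'b']
```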

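import csv

data = []
# Keep the loop inside the with block; otherwise the file is already
# closed when the reader is iterated.
with open('path to file', 'r', newline='') as inf:
    reader = csv.reader(inf, delimiter=" ")
    for row in reader:
        # 'row and' skips blank lines, which csv.reader returns as []
        if row and row[0] == 'string1':
            # Keep consuming the same reader until the end marker
            for row in reader:
                if row and row[0] == 'string2':
                    break
                data.append(row)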
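If you need this in more than one place, the same idea can be wrapped in a function. Below is a sketch, assuming the markers are matched against the first space-separated field; the names `rows_between` and `rows_between_itertools` are hypothetical and the sample text is made up. The second variant swaps in `itertools.dropwhile`/`takewhile`, and unlike the nested loop it only returns the first block.

```python
import csv
import io
from itertools import dropwhile, takewhile


def rows_between(reader, start, stop):
    """Collect rows strictly between a row whose first field is `start`
    and the next row whose first field is `stop`, for every such block,
    in one linear pass over `reader`."""
    data = []
    for row in reader:
        if row and row[0] == start:
            for row in reader:
                if row and row[0] == stop:
                    break
                data.append(row)
    return data


def rows_between_itertools(reader, start, stop):
    """Same idea for the first block only, using itertools."""
    # Skip everything before the start marker...
    after_start = dropwhile(lambda r: not (r and r[0] == start), reader)
    next(after_start, None)  # ...then drop the marker row itself
    # Take rows until the stop marker (the marker itself is not included)
    return list(takewhile(lambda r: not (r and r[0] == stop), after_start))


sample = "junk\nstring1 x\n1 2\n3 4\nstring2 y\nmore junk\n"
reader = csv.reader(io.StringIO(sample), delimiter=" ")
print(rows_between(reader, "string1", "string2"))  # [['1', '2'], ['3', '4']]
```

Both versions read each line once; the nested-loop version is easier to extend if the block can appear several times in the file.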