Emily T. - 4 months ago
Python Question

How can I count a word in all lines that are two lines after a specific line?

This might sound a bit confusing, so I'll try to explain it with an example. Take these lines:

next line 1
^^^^^^^^^^^^^^^^^^
red blue dark ten lemon
next line 2
^^^^^^^^^^^^^^^^^^^
hat 45 no dad fate orange
next line 3
^^^^^^^^^^^^^^^^^^^
tan rat lovely lemon eat
you him lemon Daniel her


I am only interested in the count of "lemon" in lines that have "next line" two lines above them. So the output I expect is "2 lemons".

Any help will be greatly appreciated!

My attempt so far is:

#!/usr/bin/env python
#import the numpy library
import numpy as np

lemon = 0

logfile = open('file','r')

for line in logfile:

    words = line.split()

    words = np.array(words)
    if np.any(words == 'next line'):
        if np.any(words == 'lemon'):
            lemon +=1
print "Total number of lemons is %d" % (lemon)


but this counts "lemon" only if it's on the same line as "next line".
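A quick check of why the `words == 'next line'` test in the attempt cannot match at all: `line.split()` splits on whitespace, so the two-word phrase never survives as a single element, and a substring test on the raw line is what the check actually needs. This is a minimal Python 3 sketch without numpy; the helper names are hypothetical.

```python
# Minimal sketch: why comparing split() tokens against a two-word phrase fails.

def has_phrase_token(line, phrase):
    """Mimics the attempt: look for the phrase among whitespace-split tokens."""
    return phrase in line.split()

def has_phrase_substring(line, phrase):
    """Substring test on the raw line, which is what the check needs."""
    return phrase in line

sample = "next line 1\n"
print(sample.split())                             # ['next', 'line', '1']
print(has_phrase_token(sample, "next line"))      # False: no single token equals 'next line'
print(has_phrase_substring(sample, "next line"))  # True
```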

Answer

For each line you need to be able to access the line two lines before it. To do that, you can use itertools.tee() to create two independent iterators over the file object, advance one of them by two lines, and then use itertools.izip() to pair each line with the line two positions after it:

from itertools import tee, izip
with open('file') as logfile:
    spam, logfile = tee(logfile)
    # consume the first two lines of spam
    next(spam)
    next(spam)
    # each pair is (line i, line i + 2)
    for pre, line in izip(logfile, spam):
        if 'next line' in pre:
            print line.count('lemon')
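In Python 3, `itertools.izip` no longer exists and the built-in `zip` is already lazy, so the same tee-based pairing might look like the sketch below. It wraps the question's sample lines in `io.StringIO` so it runs without a file. `lemon_counts` and its keyword parameters are hypothetical names, and `next(ahead, None)` is a small deviation from the answer so that input shorter than two lines yields nothing instead of raising StopIteration.

```python
import io
from itertools import tee

# Sample input from the question; the '^^^' underline lines are part of the data.
SAMPLE = """next line 1
^^^^^^^^^^^^^^^^^^
red blue dark ten lemon
next line 2
^^^^^^^^^^^^^^^^^^^
hat 45 no dad fate orange
next line 3
^^^^^^^^^^^^^^^^^^^
tan rat lovely lemon eat
you him lemon Daniel her
"""

def lemon_counts(lines, marker="next line", word="lemon"):
    """Yield word counts for every line that sits two lines after a marker line."""
    ahead, lines = tee(lines)
    # Advance one copy by two lines; the default avoids StopIteration on short input.
    next(ahead, None)
    next(ahead, None)
    # zip stops when 'ahead' runs out; each pair is (line i, line i + 2).
    for pre, line in zip(lines, ahead):
        if marker in pre:
            yield line.count(word)

print(list(lemon_counts(io.StringIO(SAMPLE))))  # [1, 0, 1]
```

With a real file, passing the open file object works the same way, e.g. `sum(lemon_counts(logfile))` gives the total.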

Or, if you just want the total number of "lemon" occurrences, you can use a generator expression inside sum():

from itertools import tee, izip
with open('file') as logfile:
    spam, logfile = tee(logfile)
    # consume first two lines of spam
    next(spam)
    next(spam)
    print sum(line.count('lemon') for pre, line in izip(logfile, spam) if 'next line' in pre)
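If tee feels indirect, a bounded `collections.deque` can serve as a two-line look-behind window instead. This is a swapped-in technique, not the one the answer uses, sketched for Python 3; `total_lemons` and its keyword parameters are hypothetical names.

```python
import io
from collections import deque

def total_lemons(lines, marker="next line", word="lemon"):
    """Sum word counts over lines that sit two lines after a marker line."""
    previous = deque(maxlen=2)  # holds the two lines before the current one
    total = 0
    for line in lines:
        # Once the window is full, previous[0] is the line two positions back.
        if len(previous) == 2 and marker in previous[0]:
            total += line.count(word)
        previous.append(line)  # the oldest line drops out automatically
    return total

sample_text = (
    "next line 1\n^^^\nred blue dark ten lemon\n"
    "next line 2\n^^^\nhat 45 no dad fate orange\n"
    "next line 3\n^^^\ntan rat lovely lemon eat\n"
    "you him lemon Daniel her\n"
)
print(total_lemons(io.StringIO(sample_text)))  # 2
```

For the question's file, something like `with open('file') as logfile: print(total_lemons(logfile))` should work. Like the tee version, it makes a single pass and only keeps a couple of lines in memory.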