uu3708 - 4 years ago
Python Question

How can I parse multi-line blocks from a text file using Python?

I have edited the sample.

Here is a sample textfile.txt:

text line1
text line2
dummy text dummy test dummy test
dummy test dummy test dummy test
text line3
text line4

I want to parse it into an array like this:

"text line1\ntext line2" → array[0]

"text line3\ntext line4" → array[1]

How should I code this in Python?

Answer Source

Python's itertools.groupby() function works well for this:

from itertools import groupby

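with open('input.txt') as f_input:
    # Group consecutive lines; k is False for separator lines, so "if k" drops them
    data = [list(g) for k, g in groupby(f_input, lambda x: not x.startswith("-------!@#$-------")) if k]
    # Join each group's lines back into a single string
    data = [''.join(x) for x in data]

print(data)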

This gives you data holding:

['text line1\ntext line2\n', 'text line3\ntext line4\n']

The first list comprehension reads the file and groups consecutive lines that do not start with your separator line. After that step, data holds:

[['text line1\n', 'text line2\n'], ['text line3\n', 'text line4\n']]
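To see where that intermediate result comes from, here is a minimal sketch that lists every (key, group) pair groupby() yields before the "if k" filter is applied. The in-memory list stands in for the file and its contents are assumed; the name is_content is only illustrative.

```python
from itertools import groupby

SEPARATOR = "-------!@#$-------"  # separator line used in the answer's code

# Assumed sample input: an in-memory list standing in for the file's lines.
lines = [
    "text line1\n",
    "text line2\n",
    SEPARATOR + "\n",
    "text line3\n",
    "text line4\n",
]

def is_content(line):
    """Key function: True for normal lines, False for separator lines."""
    return not line.startswith(SEPARATOR)

# groupby() starts a new group each time the key value changes.
pairs = [(key, list(group)) for key, group in groupby(lines, is_content)]

for key, group in pairs:
    print(key, group)
```

The group whose key is False holds only the separator line, and that is the group "if k" discards.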

Next, a second list comprehension joins each group's lines back into a single string.


data[0] --> 'text line1\ntext line2\n'
data[1] --> 'text line3\ntext line4\n'
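If you prefer something reusable, the two steps could be wrapped in a small helper that takes the separator test as a parameter. This is only a sketch: the name split_blocks is hypothetical, and it assumes that in the edited sample from the question every unwanted line starts with "dummy". It also strips the trailing newline from each block, which the code above does not do.

```python
from itertools import groupby

def split_blocks(lines, is_separator):
    """Join each run of consecutive non-separator lines into one string."""
    blocks = []
    for keep, group in groupby(lines, lambda line: not is_separator(line)):
        if keep:
            # Drop the trailing newline so each block ends with its last word.
            blocks.append("".join(group).rstrip("\n"))
    return blocks

# The edited sample from the question, as an in-memory list of lines.
sample = [
    "text line1\n",
    "text line2\n",
    "dummy text dummy test dummy test\n",
    "dummy test dummy test dummy test\n",
    "text line3\n",
    "text line4\n",
]

# Assumption: every line to discard starts with "dummy".
array = split_blocks(sample, lambda line: line.startswith("dummy"))
print(array)
```

Because the helper accepts any iterable of lines, an open file object can be passed in place of the list.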

To leave out sections containing certain words, the second list comprehension could be replaced with this one:

data = [''.join(x) for x in data if 'dummy test' not in ''.join(x)]
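As a rough end-to-end check of that filter, the sketch below runs it on an assumed input in which separator lines split the file into three sections and the middle one contains the unwanted phrase. The input layout is an assumption, and each section is joined once and then tested, rather than joined twice.

```python
from itertools import groupby

SEPARATOR = "-------!@#$-------"  # separator line used in the answer's code

# Assumed input: three sections split by separator lines, the middle one unwanted.
lines = [
    "text line1\n",
    "text line2\n",
    SEPARATOR + "\n",
    "dummy test dummy test dummy test\n",
    SEPARATOR + "\n",
    "text line3\n",
    "text line4\n",
]

# Step 1: group consecutive non-separator lines.
groups = [list(g) for k, g in groupby(lines, lambda x: not x.startswith(SEPARATOR)) if k]

# Step 2: join each group into one string.
sections = [''.join(x) for x in groups]

# Step 3: keep only the sections that do not contain the unwanted phrase.
data = [s for s in sections if 'dummy test' not in s]

print(data)
```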