davo36 davo36 - 6 months ago 20
Python Question

How do I read part of a file into a DataFrame with Python?

I have text files which look like this:

0.289
--------
A B C D E
--------
EBA
E-D
EB-
EED
EBD
EBE
E-D
E-D


Now I want to read the various bits into data structures.

I want to learn how to use dataframes, so I want to open the file, read the first value into a float, skip a line, read the next line into a built-in list, skip another line, and read the rest into a dataframe.

The file-reading routines for dataframes seem to work on the whole file, so I'm not sure how to do this.

It must also be possible to add rows to a dataframe; I just don't know how to do it, and the documentation is so extensive (which is good) that I'd need to read 50 pages to find the answer...

Edit: I can do it like this, but there must be a nicer way:

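import pandas

alignmentMatrix = []
with open("DataFile.txt", 'r') as f:
    theta = f.readline().strip()      # first line: the value, still a string here
    f.readline()                      # skip the dashed separator
    alphabet = f.readline().split()   # letters as a built-in list
    f.readline()                      # skip the second separator
    for line in f:
        row = list(line.strip())      # one character per column
        alignmentMatrix.append(row)
alignmentDF = pandas.DataFrame(alignmentMatrix)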


And so I end up with this:

   0  1  2
0  E  B  A
1  E  -  D
2  E  B  -
3  E  E  D
4  E  B  D
5  E  B  E
6  E  -  D
7  E  -  D


So it's a 2D dataframe.

Answer

You can't do much better than your example for the first few lines. However, you can read the remainder as a fixed-width file with pandas.read_fwf:

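import pandas

with open("test.txt", 'r') as f:
    theta = float(f.readline().strip())  # first line as a float
    f.readline()                         # skip the dashed separator
    alphabet = f.readline().split()      # letters as a built-in list
    f.readline()                         # skip the second separator
    # parse the remaining lines as three one-character columns
    alignmentDF = pandas.read_fwf(f, widths=[1,1,1], header=None)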
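
To try this without a file on disk, here is a minimal sketch that feeds the sample from the question through an in-memory io.StringIO buffer. The helper name read_alignment and the SAMPLE constant are made up for illustration, and the widths assume every data row is exactly three characters wide, as in the sample.

```python
import io

import pandas

# Sample content copied from the question; io.StringIO stands in for the
# real file so the sketch runs without touching disk.
SAMPLE = """0.289
--------
A B C D E
--------
EBA
E-D
EB-
EED
EBD
EBE
E-D
E-D
"""


def read_alignment(handle):
    """Hypothetical helper: parse the header by hand, the body with read_fwf."""
    theta = float(handle.readline().strip())  # first line: a single float
    handle.readline()                         # skip the dashed separator
    alphabet = handle.readline().split()      # letters as a built-in list
    handle.readline()                         # skip the second separator
    # read_fwf continues from the current position of the open handle.
    # Assumption: each remaining row is three one-character fields.
    frame = pandas.read_fwf(handle, widths=[1, 1, 1], header=None)
    return theta, alphabet, frame


theta, alphabet, alignment_df = read_alignment(io.StringIO(SAMPLE))
print(theta)
print(alphabet)
print(alignment_df)
```

For a real file, the same helper should work with the handle returned by open("test.txt", 'r'). If the rows can have different lengths, the fixed widths no longer apply and the loop from the question is probably the safer route.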