Rakesh Adk7 - 1 month ago
Python Question

How to read a text file that is delimited by whitespace into a DataFrame?

I have a text file that is formatted this way:

A00 0010 00000
A001 0011 00000
A00911 0019 00000
A0100 0020 10000


I want to read this file into a DataFrame. So I tried:

import pandas as pd
path = "path/to/file.txt"  # placeholder: replace with the actual file path
df = pd.read_csv(path, sep='\t', header=None)


What I got was a DataFrame with 4 rows and one column.

0
0 A00 0010 00000
1 A001 0011 00000
2 A00911 0019 00000
3 A0100 0020 10000

[4 rows x 1 columns]


This is because the values are not separated by "\t". The number of spaces between the columns varies from row to row, depending on the length of the string in each column.

The desired DataFrame should have four rows and three columns.

        0     1      2
0     A00  0010  00000
1    A001  0011  00000
2  A00911  0019  00000
3   A0100  0020  10000

[4 rows x 3 columns]

Answer

You could pass delim_whitespace=True to read_csv so that any run of whitespace is treated as a column separator, along with dtype=str so that every value stays a string and keeps its leading zeros:

df = pd.read_csv(path, delim_whitespace=True, header=None, dtype=str)
df
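
For newer pandas versions, note that delim_whitespace is deprecated since pandas 2.2 in favor of sep=r"\s+", which splits on any run of whitespace in the same way. Below is a minimal, self-contained sketch of that variant. It assumes the file looks like the sample in the question, and it reads from an in-memory string (the name raw is made up for this example) instead of a real file path:

```python
import io

import pandas as pd

# Sample rows from the question, with a varying number of spaces between
# columns. `raw` is a stand-in for the real file, used only for illustration.
raw = (
    "A00     0010 00000\n"
    "A001    0011 00000\n"
    "A00911  0019 00000\n"
    "A0100   0020 10000\n"
)

# sep=r"\s+" treats any run of whitespace as one delimiter;
# dtype=str keeps every value as a string, so leading zeros survive.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", header=None, dtype=str)
print(df.shape)  # (4, 3)

# For comparison: without dtype=str, pandas infers integer columns
# and "0010" is read as the number 10.
df_inferred = pd.read_csv(io.StringIO(raw), sep=r"\s+", header=None)
print(df_inferred.dtypes)
```

With a real file, the io.StringIO(raw) argument would simply be replaced by the file path.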
