mpjan mpjan - 5 months ago 10
Python Question

How can I remove extra whitespace from strings when parsing a csv file in Pandas?

I have the following file named 'data.csv':

1997,Ford,E350
1997, Ford , E350
1997,Ford,E350,"Super, luxurious truck"
1997,Ford,E350,"Super ""luxurious"" truck"
1997,Ford,E350," Super luxurious truck "
"1997",Ford,E350
1997,Ford,E350
2000,Mercury,Cougar


And I would like to parse it into a pandas DataFrame so that the DataFrame looks as follows:

Year Make Model Description
0 1997 Ford E350 None
1 1997 Ford E350 None
2 1997 Ford E350 Super, luxurious truck
3 1997 Ford E350 Super "luxurious" truck
4 1997 Ford E350 Super luxurious truck
5 1997 Ford E350 None
6 1997 Ford E350 None
7 2000 Mercury Cougar None


The best I could do was:

pd.read_table("data.csv", sep=r',', names=["Year", "Make", "Model", "Description"])


Which gets me:

Year Make Model Description
0 1997 Ford E350 None
1 1997 Ford E350 None
2 1997 Ford E350 Super, luxurious truck
3 1997 Ford E350 Super "luxurious" truck
4 1997 Ford E350 Super luxurious truck
5 1997 Ford E350 None
6 1997 Ford E350 None
7 2000 Mercury Cougar None


How can I get the DataFrame without those whitespaces?

Answer

You could use converters:

import pandas as pd

def strip(text):
    try:
        return text.strip()
    except AttributeError:
        return text

def make_int(text):
    return int(text.strip('" '))

table = pd.read_table("data.csv", sep=r',',
                      names=["Year", "Make", "Model", "Description"],
                      converters = {'Description' : strip,
                                    'Model' : strip,
                                    'Make' : strip,
                                    'Year' : make_int})
print(table)

yields

   Year     Make   Model              Description
0  1997     Ford    E350                     None
1  1997     Ford    E350                     None
2  1997     Ford    E350   Super, luxurious truck
3  1997     Ford    E350  Super "luxurious" truck
4  1997     Ford    E350    Super luxurious truck
5  1997     Ford    E350                     None
6  1997     Ford    E350                     None
7  2000  Mercury  Cougar                     None
Comments