user2186862 user2186862 - 1 year ago 160
Python Question

pandas read_csv ignore separator in last column

I have a file with the following structure (first row is the header, filename is

test.dat
):

ID_OBS LAT LON ALT TP TO LT_min LT_max STATIONNAME
ALT_NOA_000 82.45 -62.52 210.0 FM 0 0.0 24.0 Alert, Nunavut, Canada


How do I instruct pandas to read the entire station name (in this example, Alert, Nunavut, Canada) as a single element? I use
delim_whitespace=True
in my code, but that does not work, since the station name contains whitespace characters.

Running:

import pandas as pd
test = pd.read_csv('./test.dat', delim_whitespace=True, header=1)
print(test.to_string())


Produces:

ID_OBS LAT LON ALT TP TO LT_min LT_max STATIONNAME
ALT_NOA_000 82.45 -62.52 210.0 FM 0 0.0 24.0 Alert, Nunavut, Canada


Quickly reading through the tutorials did not help. What am I missing here?

Answer Source

I often approach these by writing my own little parser. In general there are ways to bend pandas to your will, but I find this way is often easier:

Code:

import re

def parse_my_file(filename):
    with open(filename) as f:
        for line in f:
            yield re.split(r'\s+', line.strip(), 8)

# build the generator        
my_parser = parse_my_file('test.dat')

# first element returned is the columns
columns = next(my_parser)

# build the data frame
df = pd.DataFrame(my_parser, columns=columns)
print(df)

Results:

        ID_OBS    LAT     LON    ALT  TP TO LT_min LT_max  \
0  ALT_NOA_000  82.45  -62.52  210.0  FM  0    0.0   24.0   

              STATIONNAME  
0  Alert, Nunavut, Canada 
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download