Ron Maor Ron Maor - 29 days ago 19
Python Question

Importing .DAT file into pandas dataframe

I have a .DAT file with rows like these:

2016 01 01 00 00 19 348 2.05 7 618.4
2016 01 01 00 01 19 351 2.05 7 618.4
2016 01 01 00 02 18 0 2.05 7 618.4
2016 01 01 00 03 17 353 2.05 7 618.4
2016 01 01 00 04 19 346 2.02 7 618.4
2016 01 01 00 05 20 345 2.00 7 618.4
2016 01 01 00 06 22 348 1.97 7 618.4
.......


the data format is:

year month day hour minute(HST) wind_speed(kts) wind_direction(dec) temperature(C) relative_humidity(%) pressure


I want to import the .DAT file into a pandas dataframe, with the year-month-day-hour-minute as a single index column, and the rest of the values as separate columns.

Any suggestions?

Thanks !!

Answer

You can use read_csv:

import pandas as pd
import numpy as np
from pandas.compat import StringIO
import datetime as dt

temp=u"""2016 01 01 00 00 19 348 2.05 7 618.4
2016 01 01 00 01 19 351 2.05 7 618.4
2016 01 01 00 02 18 0 2.05 7 618.4
2016 01 01 00 03 17 353 2.05 7 618.4
2016 01 01 00 04 19 346 2.02 7 618.4
2016 01 01 00 05 20 345 2.00 7 618.4
2016 01 01 00 06 22 348 1.97 7 618.4"""
#after testing replace StringIO(temp) to filename

parser = lambda date: pd.datetime.strptime(date, '%Y %m %d %H %M')
df = pd.read_csv(StringIO(temp), 
                 sep="\s+", #separator whitespace
                 index_col=0, #convert first column to datetimeindex
                 date_parser=parser, #function for converting dates
                 parse_dates=[[0,1,2,3,4]], #columns to datetime
                 header=None) #none header

Then need set column names, because if use parameter names get:

NotImplementedError: file structure not yet supported

df.columns = ['wind_speed(kts)', 'wind_direction(dec)', 'temperature(C)', 'relative_humidity(%)', 'pressure'] 
#remove index name
df.index.name = None 
print (df)
                     wind_speed(kts)  wind_direction(dec)  temperature(C)  \
2016-01-01 00:00:00               19                  348            2.05   
2016-01-01 00:01:00               19                  351            2.05   
2016-01-01 00:02:00               18                    0            2.05   
2016-01-01 00:03:00               17                  353            2.05   
2016-01-01 00:04:00               19                  346            2.02   
2016-01-01 00:05:00               20                  345            2.00   
2016-01-01 00:06:00               22                  348            1.97   

                     relative_humidity(%)  pressure  
2016-01-01 00:00:00                     7     618.4  
2016-01-01 00:01:00                     7     618.4  
2016-01-01 00:02:00                     7     618.4  
2016-01-01 00:03:00                     7     618.4  
2016-01-01 00:04:00                     7     618.4  
2016-01-01 00:05:00                     7     618.4  
2016-01-01 00:06:00                     7     618.4  

print (df.dtypes)
wind_speed(kts)           int64
wind_direction(dec)       int64
temperature(C)          float64
relative_humidity(%)      int64
pressure                float64
dtype: object

print (df.index)
DatetimeIndex(['2016-01-01 00:00:00', '2016-01-01 00:01:00',
               '2016-01-01 00:02:00', '2016-01-01 00:03:00',
               '2016-01-01 00:04:00', '2016-01-01 00:05:00',
               '2016-01-01 00:06:00'],
              dtype='datetime64[ns]', freq=None)
Comments