Franklin Harvey Franklin Harvey - 1 month ago 13
Python Question

Converting mixed-format .DAT to .CSV (or anything else)

I have a large collection of DAT files that need to be converted (eventually to a unique file type). The DAT's have a mixed amount of whitespace between fields, and the column headers are on different lines. Any advice?

ALT_RAD
ALT_RAD2
DIRECT D_GLOBAL U_GLOBAL Zenith
Year Mn Dy Hr Mi DIFFUSE2 D_IR U_IR
2004 9 1 0 1 1.04 79.40 78.67 303.58 61.06 310.95 85.142
2004 9 1 0 2 0.71 74.36 73.91 303.80 57.82 310.92 85.171
2004 9 1 0 3 0.67 71.80 71.64 304.25 56.84 310.98 85.199
2004 9 1 0 4 0.75 74.35 74.83 304.21 59.68 310.89 85.227


I have a basic script:

import sys
with open(sys.argv[1], r) as input_file:
newLines = []
for line in input_file:
newLines.append(newLine)


Which I will certainly change to account for mixed whitespace, but I don't know how to work with the wonky column headers.

Eventually I want my headers to just be:

Year Month Day Hour Minute Direct Diffuse2 D_Global D_IR U_Global U_IR Zenith

Answer

Treat those header lines in the input file with all the disdain they deserve. (Or, in other words, read them and discard them.)

headers='Year Month Day Hour Minute Direct Diffuse2 D_Global D_IR U_Global U_IR Zenith'
with open ( 'temp.dat') as input_file:
    with open ('temp_2.csv', 'w') as output_file:
        output_file.write('"%s"\n'%'","'.join(headers.split()))
        for count, line in enumerate(input_file):
            if count<4: continue
            outLine = ','.join(line.split())
            output_file.write(outLine + '\n')