Alex Savin - 4 months ago
Python Question

How to convert .dat to .csv using Python?

I have a file.dat which looks like:

id | user_id | venue_id | latitude | longitude | created_at

---------+---------+----------+-----------+-----------+-----------------

984301 |2041916 |5222 | | |2012-04-21 17:39:01

984222 |15824 |5222 |38.8951118 |-77.0363658|2012-04-21 17:43:47

984315 |1764391 |5222 | | |2012-04-21 17:37:18

984234 |44652 |5222 |33.800745 |-84.41052 | 2012-04-21 17:43:43


I need to get a CSV file with the rows that have empty latitude and longitude removed, like:

id,user_id,venue_id,latitude,longitude,created_at

984222,15824,5222,38.8951118,-77.0363658,2012-04-21T17:43:47

984234,44652,5222,33.800745,-84.41052,2012-04-21T17:43:43

984291,105054,5222,45.5234515,-122.6762071,2012-04-21T17:39:22


I tried to do that using the following code:

with open('file.dat', 'r') as input_file:
    lines = input_file.readlines()
    newLines = []
    for line in lines:
        newLine = line.strip('|').split()
        newLines.append(newLine)

with open('file.csv', 'w') as output_file:
    file_writer = csv.writer(output_file)
    file_writer.writerows(newLines)


But I still get a CSV file with "|" symbols and rows with empty latitude/longitude.
Where is the mistake?
Ultimately I need to load the resulting CSV file into a DataFrame, so maybe there is a way to reduce the number of steps.

Answer

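str.strip() removes only leading and trailing characters from a string, so line.strip('|') leaves the pipes between fields untouched, and split() with no argument then splits on whitespace rather than on "|".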
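A quick check on a single row makes the difference visible. This is only an illustrative sketch: the row is copied from the question, and the variable names attempt and fields are made up for the comparison.

```python
# One data row copied from the question's file.dat
line = "984222 |15824 |5222 |38.8951118 |-77.0363658|2012-04-21 17:43:47\n"

# Original attempt: strip('|') only trims pipes at the two ends of the string
# (there are none here), and split() with no argument splits on whitespace,
# so the pipes stay glued to the values.
attempt = line.strip('|').split()

# Split on the pipe first, then strip the padding from each field.
fields = [x.strip() for x in line.split('|')]

print(attempt)  # ['984222', '|15824', '|5222', '|38.8951118', '|-77.0363658|2012-04-21', '17:43:47']
print(fields)   # ['984222', '15824', '5222', '38.8951118', '-77.0363658', '2012-04-21 17:43:47']
```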
You want to split the lines on "|", then strip each element of the resulting list:

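import csv

with open('file.dat') as input_file:
    newLines = []
    for line in input_file:
        newLine = [x.strip() for x in line.split('|')]
        if len(newLine) == 6 and newLine[3] and newLine[4]:  # drops the dashed line and rows without coordinates
            newLines.append(newLine)

with open('file.csv', 'w', newline='') as output_file:  # newline='' (Python 3) avoids blank lines on Windows
    file_writer = csv.writer(output_file)
    file_writer.writerows(newLines)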
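
Since the end goal is a DataFrame, it may be simpler to skip the intermediate CSV and read the file with pandas directly. The following is a minimal sketch, assuming pandas is installed and that the file has exactly the layout shown in the question. The function name load_checkins is hypothetical, and the in-memory sample stands in for file.dat.

```python
import io

import pandas as pd


def load_checkins(source):
    """Load the pipe-delimited dump and keep only rows that have coordinates.

    `source` is a path or a file-like object. The function name is hypothetical.
    """
    # Read everything as text first so the padding can be stripped uniformly.
    df = pd.read_csv(source, sep="|", dtype=str)
    df.columns = [name.strip() for name in df.columns]
    df = df.apply(lambda col: col.str.strip())

    # Blank coordinates are "" after stripping; the dashed line under the
    # header has no coordinate fields at all (NaN). Drop both kinds of row.
    coords = df[["latitude", "longitude"]].fillna("")
    df = df[(coords != "").all(axis=1)].reset_index(drop=True)

    # Convert the columns that need real types; the id columns stay as text.
    df[["latitude", "longitude"]] = df[["latitude", "longitude"]].astype(float)
    df["created_at"] = pd.to_datetime(df["created_at"])
    return df


# Small in-memory sample in the same layout as file.dat
sample = io.StringIO(
    "id | user_id | venue_id | latitude | longitude | created_at\n"
    "---------+---------+----------+-----------+-----------+-----------------\n"
    "984301 |2041916 |5222 | | |2012-04-21 17:39:01\n"
    "984222 |15824 |5222 |38.8951118 |-77.0363658|2012-04-21 17:43:47\n"
)
df = load_checkins(sample)
print(df)
```

With a real file, load_checkins('file.dat') would take the place of the sample. If a CSV is still wanted, df.to_csv('file.csv', index=False, date_format='%Y-%m-%dT%H:%M:%S') should write the timestamps in the T-separated form shown in the question.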