bhjghjh - 4 months ago
Python Question

Reading and doing calculations from a .dat file in Python

I need to read a .dat file in Python that has 12 columns and millions of rows. For my calculation I need to divide columns 2, 3 and 4 by column 1. Do I need to delete the unwanted columns before I load the .dat file? If not, how do I select specific columns and have Python do the math?

An example of the .dat file would be

I am new to Python, so some guidance on how to open the file, read it and do the calculation would be appreciated.

Based on your suggestion, I have added the code I am using as a starting point:

from sys import argv

import pandas as pd

script, filename = argv

txt = open(filename)

print "Here's your file %r:" % filename

def your_func(row):
    return row['x-momentum'] / row['mass']

columns_to_keep = ['mass', 'x-momentum']
dataframe = pd.read_csv('~/Pictures', delimiter="," , usecols=columns_to_keep)
dataframe['new_column'] = dataframe.apply(your_func, axis=1)

And this is the error I get when I run it:

Traceback (most recent call last):
File "", line 18, in <module>
dataframe = pd.read_csv('~/Pictures', delimiter="," , usecols=columns_to_keep)
File "/home/trina/anaconda2/lib/python2.7/site-packages/pandas/io/", line 529, in parser_f
return _read(filepath_or_buffer, kwds)
File "/home/trina/anaconda2/lib/python2.7/site-packages/pandas/io/", line 295, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "/home/trina/anaconda2/lib/python2.7/site-packages/pandas/io/", line 612, in __init__
File "/home/trina/anaconda2/lib/python2.7/site-packages/pandas/io/", line 747, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "/home/trina/anaconda2/lib/python2.7/site-packages/pandas/io/", line 1119, in __init__
self._reader = _parser.TextReader(src, **kwds)
File "pandas/parser.pyx", line 518, in pandas.parser.TextReader.__cinit__ (pandas/parser.c:5030)
ValueError: No columns to parse from file


After looking at your flash.dat file, it's clear you need to do a little cleanup before you process it. The following code converts it to a CSV file:

import csv

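# read flash.dat into a list of lists, one list of fields per line
with open("./flash.dat") as f:
    datContent = [line.strip().split() for line in f]

# write it as a new CSV file ("wb" is the Python 2 csv convention)
with open("./flash.csv", "wb") as f:
    writer = csv.writer(f)
    writer.writerows(datContent)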
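
For a file with millions of rows, holding every line in a list before writing may use a lot of memory. A minimal sketch of the same conversion done one line at a time is below. It assumes the file is whitespace-delimited (as the split() call above implies) and is written for Python 3, where the csv module wants a text-mode file opened with newline="". dat_to_csv is a hypothetical helper name, not part of any library.

```python
import csv


def dat_to_csv(dat_path, csv_path):
    """Convert a whitespace-delimited text file to CSV, one line at a time."""
    with open(dat_path) as src, open(csv_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        for line in src:
            fields = line.split()  # split on any run of whitespace
            if fields:  # skip blank lines
                writer.writerow(fields)
```

Calling dat_to_csv("./flash.dat", "./flash.csv") would write flash.csv next to the input without keeping more than one line in memory at a time.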

Now use pandas to compute the new column.

import pandas as pd

def your_func(row):
    return row['x-momentum'] / row['mass']

columns_to_keep = ['#time', 'x-momentum', 'mass']
dataframe = pd.read_csv("./flash.csv", usecols=columns_to_keep)
dataframe['new_column'] = dataframe.apply(your_func, axis=1)

print dataframe
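
If the goal is to divide several columns by the same column, apply with axis=1 calls a Python function once per row, which tends to be slow on millions of rows. A rough sketch of a vectorized alternative follows. It is written for Python 3 and assumes the data is whitespace-delimited, in which case read_csv can parse it directly with sep=r"\s+" and no CSV conversion is needed. The inline sample and the y-momentum and z-momentum column names are made up for illustration; only #time, mass and x-momentum appear in the code above.

```python
import io

import pandas as pd

# Hypothetical stand-in for flash.dat: whitespace-separated, one header line.
sample = io.StringIO(
    "#time mass x-momentum y-momentum z-momentum\n"
    "0.0 2.0 4.0 6.0 8.0\n"
    "1.0 4.0 2.0 8.0 0.0\n"
)

# Assumed names for the columns to be divided by mass.
numerators = ["x-momentum", "y-momentum", "z-momentum"]

# sep=r"\s+" splits on any run of whitespace; usecols loads only the named columns.
df = pd.read_csv(sample, sep=r"\s+", usecols=["mass"] + numerators)

# Divide each numerator column by mass, row by row, in one vectorized call.
ratios = df[numerators].div(df["mass"], axis=0).add_suffix("/mass")
df = df.join(ratios)
```

With the real file, pass its path instead of the sample buffer. If even the selected columns do not fit in memory, read_csv also takes a chunksize argument that yields the file in pieces.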