Emmanuel Sunil Emmanuel Sunil - 9 months ago 58
Python Question

Auto convert strings and float columns using genfromtxt from numpy/python

I have several different data files that I need to import using genfromtxt. Each data file has different content. For example, file 1 may have all floats, file 2 may have all strings, and file 3 may have a combination of floats and strings etc. Also the number of columns vary from file to file, and since there are hundreds of files, I don't know which columns are floats and strings in each file. However, all the entries in each column are the same data type.

Is there a way to set up a converter for genfromtxt that will detect the type of data in each column and convert it to the right data type?



If you're able to use the Pandas library, pandas.read_csv is much more generally useful than np.genfromtxt, and will automatically handle the kind of type inference mentioned in your question. The result will be a dataframe, but you can get out a numpy array in one of several ways. e.g.

import pandas as pd
data = pd.read_csv(filename)

# get a numpy array; this will be an object array if data has mixed/incompatible types
arr = data.values

# get a record array; this is how numpy handles mixed types in a single array
arr = data.to_records()

pd.read_csv has dozens of options for various forms of text inputs; see more in the pandas.read_csv documentation.