Python Question

Auto convert strings and float columns using genfromtxt from numpy/python

I have several different data files that I need to import using genfromtxt. Each data file has different content. For example, file 1 may have all floats, file 2 may have all strings, and file 3 may have a combination of floats and strings. The number of columns also varies from file to file, and since there are hundreds of files, I don't know which columns are floats and which are strings in each file. However, all the entries in any given column have the same data type.

Is there a way to set up a converter for genfromtxt that will detect the type of data in each column and convert it to the right data type?


Answer Source

If you're able to use the pandas library, pandas.read_csv is more generally useful than np.genfromtxt, and it automatically handles the kind of per-column type inference described in your question. The result is a DataFrame, but you can get a NumPy array out of it in one of several ways, for example:

import pandas as pd
data = pd.read_csv(filename)

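# get a NumPy array; this will be an object array if the columns have mixed/incompatible types
arr = data.to_numpy()  # same result as data.values; to_numpy() needs pandas >= 0.24

# get a record array; this is how NumPy handles differently typed columns in a single array
# (index=False keeps the DataFrame index from being added as an extra field)
rec = data.to_records(index=False)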
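
To see the inference at work without touching a real file, here is a minimal self-contained sketch. The in-memory CSV text and its column names (name, value, count) are made up for illustration and are not from the question's data files.

```python
import io

import pandas as pd

# Hypothetical sample data: one string column, one float column, one int column.
csv_text = "name,value,count\nalpha,1.5,3\nbeta,2.5,4\n"

# read_csv accepts any file-like object, so StringIO stands in for a file on disk.
df = pd.read_csv(io.StringIO(csv_text))

# Each column gets its own inferred dtype.
print(df.dtypes)

# A record array keeps one dtype per field instead of falling back to object.
records = df.to_records(index=False)
print(records.dtype)
```

The same loop can be run over hundreds of files without knowing the column layout ahead of time, since each call infers the types from that file's contents.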

pd.read_csv has dozens of options for handling various forms of text input; see the pandas.read_csv documentation for details.
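
If adding pandas is not an option, np.genfromtxt can also infer a type for each column when it is called with dtype=None. The sketch below assumes comma-delimited files with a header row; the sample text and column names are invented for illustration, so adjust delimiter and names to match the real files.

```python
import io

import numpy as np

# Hypothetical sample data standing in for one of the data files.
text = "name,value,count\nalpha,1.5,3\nbeta,2.5,4\n"

data = np.genfromtxt(
    io.StringIO(text),
    delimiter=",",
    names=True,        # take field names from the first row
    dtype=None,        # infer the type of each column individually
    encoding="utf-8",  # decode string columns as str rather than bytes
)

# The result is a structured array with one dtype per field.
print(data.dtype)
print(data["value"])
```

Fields are then accessed by name, e.g. data["name"] or data["count"], and a file whose columns are all floats or all strings goes through the same call unchanged.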