Dariush - 1 year ago
Python Question

How to convert a string array of mixed data types

Suppose I have read a file into a 2D matrix of mixed data stored as strings (an example row is shown below):

# an example row of the matrix
['529997' '46623448' '2122110124' '2310' '2054' '2' '66' '' '2010/11/03-12:42:08' '26' 'CLEARING' '781' '30' '3' '0' '0' '1']

I want to convert this data to the appropriate data types so that I can do statistical analysis on it with NumPy and SciPy.

All columns are integers except index 8, which is a datetime, and index 10, which is a plain string.
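For the NumPy/SciPy analysis, one way to keep these mixed columns together in a single array is a structured dtype built from the column layout just described. This is only an illustrative sketch: the field names `c0` to `c16` are hypothetical, and the 16-character limit for the text column is an assumption.

```python
import numpy as np

# Hypothetical field names c0..c16, one per column of the example row.
# Index 8 is a timestamp, index 10 is free text, everything else is an integer.
def column_type(i):
    if i == 8:
        return "datetime64[s]"  # second resolution matches '2010/11/03-12:42:08'
    if i == 10:
        return "U16"            # assumed maximum length for the text column
    return "i8"                 # 64-bit integers

row_dtype = np.dtype([(f"c{i}", column_type(i)) for i in range(17)])

# A preallocated table for n rows; each field can later be filled in bulk,
# e.g. table["c0"] = converted_column_0
table = np.zeros(3, dtype=row_dtype)
```

Individual columns are then plain NumPy arrays (`table["c0"]`, `table["c8"]`, ...) that can be passed directly to NumPy or SciPy functions.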


What is the easiest way to do this conversion?


Performance is more important than readability: I have to convert 4.5 million rows of data and then process them!
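Since performance matters more than readability here, one option is to convert column by column with NumPy instead of looping over rows in Python. A minimal sketch, assuming the whole file already sits in a 2D string array with 17 columns in the layout above; the function name `convert_columns` is hypothetical, and the timestamps are rewritten to ISO 8601 because NumPy's `datetime64` parsing expects that form.

```python
import numpy as np

def convert_columns(data):
    """Convert a 2D array of strings (rows x 17 columns) one column at a time.

    Assumes column 8 holds timestamps like '2010/11/03-12:42:08',
    column 10 holds plain strings, every other column holds integers,
    and '' should be read as 0. Returns a dict: column index -> array.
    """
    data = np.asarray(data, dtype=str)
    columns = {}
    for i in range(data.shape[1]):
        col = data[:, i]
        if i == 8:
            # '2010/11/03-12:42:08' -> '2010/11/03T12:42:08' -> '2010-11-03T12:42:08'
            iso = np.char.replace(np.char.replace(col, "-", "T"), "/", "-")
            columns[i] = iso.astype("datetime64[s]")
        elif i == 10:
            columns[i] = col  # keep the text column as strings
        else:
            # Treat empty strings as 0, then parse the whole column at once.
            columns[i] = np.where(col == "", "0", col).astype(np.int64)
    return columns
```

Each returned column is a typed NumPy array (for example `convert_columns(data)[0].mean()`). Whether this is faster than a per-row Python loop depends on the data, so it is worth timing both on a sample.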

Answer

I like clear code like this:

from datetime import datetime

input_row = ['529997', '46623448', '2122110124', '2310', '2054',
             '2', '66', '', '2010/11/03-12:42:08', '26',
             'CLEARING', '781', '30', '3', '0', '0', '1']

# parse timestamps such as '2010/11/03-12:42:08'
_date = lambda x: datetime.strptime(x, "%Y/%m/%d-%H:%M:%S")
# prepending '0' is only necessary because '' should be treated as 0
# (note: it also makes negative values such as '-5' fail to parse)
_int = lambda x: int('0' + x)

# specify the type parser for each of the 17 columns
parsers = 8 * [_int] + [_date, _int, str] + 6 * [_int]

output_row = [parse(value) for parse, value in zip(parsers, input_row)]

Depending on your needs, use an iterator (e.g. a generator) instead of a list, so that rows are converted one at a time rather than all held in memory at once. This can greatly reduce the amount of memory you need.
