Nicolás Medina - 3 months ago
Python Question

numpy.column_stack with numeric and string arrays

I have several arrays of the same length; some hold floats and others hold strings. When I pass them to numpy.column_stack, the function converts the floats to strings. For example:

import numpy as np

a = np.array([3.4,3.4,6.4])
b = np.array(['holi','xlo','xlo'])

B = np.column_stack((a,b))

print(B)
>>> [['3.4' 'holi']
 ['3.4' 'xlo']
 ['6.4' 'xlo']]

type(B[0,0])
>>> numpy.string_   (numpy.str_ on Python 3)


Why? Is it possible to avoid it?
Thanks a lot for your time.

Answer

A NumPy array holds a single dtype, so column_stack casts the floats to the one type that can represent both inputs, which is a string. To store mixed-type data like this, you would need either an object dtype array or a structured array. Going with object dtype, we can convert any one of the input arrays to object dtype upfront and then stack it alongside the rest. The remaining arrays are converted automatically to object dtype, giving a stacked array of that type. Thus, we would have an implementation like so -

np.column_stack((a.astype(object),b))

Sample run to show how to construct a stacked array and retrieve the individual arrays back -

In [88]: a
Out[88]: array([ 3.4,  3.4,  6.4])

In [89]: b
Out[89]: 
array(['holi', 'xlo', 'xlo'], 
      dtype='|S4')

In [90]: out = np.column_stack((a.astype(object),b))

In [91]: out
Out[91]: 
array([[3.4, 'holi'],
       [3.4, 'xlo'],
       [6.4, 'xlo']], dtype=object)

In [92]: out[:,0].astype(float)
Out[92]: array([ 3.4,  3.4,  6.4])

In [93]: out[:,1].astype(str)
Out[93]: 
array(['holi', 'xlo', 'xlo'], 
      dtype='|S4')
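
The other option mentioned above, a structured array, keeps each column in its native dtype instead of boxing every element as a Python object. A minimal sketch, assuming the same a and b as in the question; the field names 'value' and 'label' are made up for illustration, and the result is a 1-D array of records rather than a 2-D array:

```python
import numpy as np

a = np.array([3.4, 3.4, 6.4])
b = np.array(['holi', 'xlo', 'xlo'])

# One record per row, with one field per input array.
# 'value' and 'label' are hypothetical field names chosen for this example.
rec = np.empty(len(a), dtype=[('value', a.dtype), ('label', b.dtype)])
rec['value'] = a
rec['label'] = b

# Each field comes back as a regular array in its original dtype,
# so no astype() round trip is needed.
print(rec['value'].dtype)  # float64
print(rec['label'])
print(rec[0])              # first record: the float and the string together
```

Columns are then accessed by name (rec['value']) rather than by position (out[:,0]), which may or may not suit code that expects a plain 2-D array.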