And And - 3 months ago 37
Python Question

Write multiple Numpy arrays of different dtype as columns of CSV file

What would be the best way to write multiple numpy arrays of different dtype as different columns of a single CSV file?

For instance, given the following arrays:

array([[1, 2],
       [3, 4],
       [5, 6]])

array([[ 10.,  20.],
       [ 30.,  40.],
       [ 50.,  60.]])


I would like to obtain a file (delimiter irrelevant):

1 2 10.0 20.0
3 4 30.0 40.0
5 6 50.0 60.0


Optimally, I would like to be able to write a list of arrays this way, where a format/dtype can be different for every array.

I tried looking at savetxt, but it's not clear to me how to use it if the arrays have different types.

Answer
In [38]: a=np.arange(1,7).reshape(3,2)
In [39]: b=np.arange(10,70.,10).reshape(3,2)
In [40]: c=np.concatenate((a,b),axis=1)
In [41]: c
Out[41]: 
array([[  1.,   2.,  10.,  20.],
       [  3.,   4.,  30.,  40.],
       [  5.,   6.,  50.,  60.]])

concatenate upcasts the integers, so all values in c are float. The default savetxt format is a general float ('%.18e'):

In [43]: np.savetxt('test.csv',c)
In [44]: cat test.csv
1.000000000000000000e+00 2.000000000000000000e+00 1.000000000000000000e+01 2.000000000000000000e+01
3.000000000000000000e+00 4.000000000000000000e+00 3.000000000000000000e+01 4.000000000000000000e+01
5.000000000000000000e+00 6.000000000000000000e+00 5.000000000000000000e+01 6.000000000000000000e+01

With a custom fmt I can get:

In [46]: np.savetxt('test.csv',c,fmt='%2d %2d %5.1f %5.1f')
In [47]: cat test.csv
 1  2  10.0  20.0
 3  4  30.0  40.0
 5  6  50.0  60.0
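
To apply the same idea to a list of arrays with one format per array, the fmt string can be built programmatically. The following is a minimal sketch, assuming every array is 2-D with the same number of rows; save_columns is a hypothetical helper name, not part of NumPy.

```python
import io
import numpy as np

def save_columns(target, arrays, formats, delimiter=" "):
    """Hypothetical helper: write 2-D arrays side by side, one format per array."""
    # savetxt needs one specifier per column, so repeat each array's
    # format once for every column that array contributes.
    col_formats = []
    for arr, f in zip(arrays, formats):
        col_formats.extend([f] * arr.shape[1])
    # hstack upcasts to a common dtype (float here), as concatenate does.
    stacked = np.hstack(arrays)
    np.savetxt(target, stacked, fmt=delimiter.join(col_formats))

a = np.arange(1, 7).reshape(3, 2)
b = np.arange(10, 70., 10).reshape(3, 2)

buf = io.StringIO()          # a filename such as 'test.csv' works as well
save_columns(buf, [a, b], ["%2d", "%5.1f"])
text = buf.getvalue()
print(text)
```

Because the integers pass through a float array, values larger than 2**53 would lose precision with this approach.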

More generally we can make a c with a compound dtype. It isn't needed here with just floats and ints, but with strings it would matter. But we still need a long fmt to display the columns correctly.

np.rec.fromarrays is an easy way to generate a structured array. Unfortunately it only works with flattened arrays, so for your (3,2) arrays I need to list the columns separately.

In [52]: c = np.rec.fromarrays((a[:,0],a[:,1],b[:,0],b[:,1]))
In [53]: c
Out[53]: 
rec.array([(1, 2, 10.0, 20.0), (3, 4, 30.0, 40.0), (5, 6, 50.0, 60.0)], 
          dtype=[('f0', '<i4'), ('f1', '<i4'), ('f2', '<f8'), ('f3', '<f8')])
In [54]: np.savetxt('test.csv',c,fmt='%2d %2d %5.1f %5.1f')
In [55]: cat test.csv
 1  2  10.0  20.0
 3  4  30.0  40.0
 5  6  50.0  60.0

I'm using the same savetxt.
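
Here is a sketch of the case where the structured array does matter: a string column next to the numeric ones. The labels array is made up for illustration, and iterating over arr.T is just one way to list the columns without typing each one out.

```python
import io
import numpy as np

a = np.arange(1, 7).reshape(3, 2)
b = np.arange(10, 70., 10).reshape(3, 2)
labels = np.array(["x", "y", "z"])   # made-up string column for illustration

# arr.T iterates over the columns of a 2-D array, so this lists
# a[:,0], a[:,1], b[:,0], b[:,1] without writing them by hand.
columns = [labels] + [col for arr in (a, b) for col in arr.T]
rec = np.rec.fromarrays(columns)     # fields f0..f4 keep their own dtypes

buf = io.StringIO()                  # a filename works as well
np.savetxt(buf, rec, fmt="%s %2d %2d %5.1f %5.1f")
text = buf.getvalue()
print(text)
```

A plain concatenate of these five columns would turn everything into strings, which is why the per-field dtypes are useful here.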

I could also make a structured array with 2 fields, each being 2 columns. I'm not sure if savetxt would work with that or not.

savetxt essentially iterates over the 1st dimension of your array, and does a formatted write on each row, roughly:

for row in arr:
    f.write(fmt%tuple(row))

where fmt is derived from your parameter.

It wouldn't be hard to write your own version that iterates on 2 arrays, and does a separate formatted write for each pair of rows.

for r1,r2 in zip(a,b):
    print('%2d %2d'%tuple(r1), '%5.1f %5.1f'%tuple(r2))
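
A slightly more general sketch of that loop, writing to a file object and accepting any number of arrays, each with its own row format. write_rows is a hypothetical name, and the arrays are assumed to be 2-D with equal row counts.

```python
import io
import numpy as np

def write_rows(fh, arrays, row_formats, delimiter=" "):
    """Hypothetical helper: one formatted write per row, one format per array."""
    # zip(*arrays) yields the matching row from every array at once.
    for rows in zip(*arrays):
        parts = [f % tuple(r) for f, r in zip(row_formats, rows)]
        fh.write(delimiter.join(parts) + "\n")

a = np.arange(1, 7).reshape(3, 2)
b = np.arange(10, 70., 10).reshape(3, 2)

buf = io.StringIO()   # or: with open('test.csv', 'w') as buf: ...
write_rows(buf, [a, b], ["%2d %2d", "%5.1f %5.1f"])
text = buf.getvalue()
print(text)
```

Since the arrays are never combined, each one keeps its own dtype and no upcasting takes place.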

===================

Trying a compound dtype

In [60]: np.dtype('2i,2f')
Out[60]: dtype([('f0', '<i4', (2,)), ('f1', '<f4', (2,))])
In [61]: c=np.zeros(a.shape[0], np.dtype('2i,2f'))
In [62]: c['f0']=a
In [63]: c['f1']=b
In [64]: c
Out[64]: 
array([([1, 2], [10.0, 20.0]), ([3, 4], [30.0, 40.0]),
       ([5, 6], [50.0, 60.0])], 
      dtype=[('f0', '<i4', (2,)), ('f1', '<f4', (2,))])
In [65]: np.savetxt('test.csv',c,fmt='%2d %2d %5.1f %5.1f')
---
ValueError: fmt has wrong number of % formats:  %2d %2d %5.1f %5.1f

So writing a compound dtype like this does not work. Considering that a row of c looks like:

In [69]: tuple(c[0]) 
Out[69]: (array([1, 2], dtype=int32), array([ 10.,  20.], dtype=float32))

I shouldn't be surprised.

I can save the two blocks with %s format, but that leaves me with brackets.

In [66]: np.savetxt('test.csv',c,fmt='%s %s')
In [67]: cat test.csv
[1 2] [ 10.  20.]
[3 4] [ 30.  40.]
[5 6] [ 50.  60.]

I think there is a function in numpy.lib.recfunctions that flattens the dtype. But I can also do that with a view:

In [72]: np.savetxt('test.csv',c.view('i,i,f,f'),fmt='%2d %2d %5.1f %5.1f')
In [73]: cat test.csv
 1  2  10.0  20.0
 3  4  30.0  40.0
 5  6  50.0  60.0
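
As an alternative to the view, recent NumPy versions provide structured_to_unstructured in numpy.lib.recfunctions. A sketch, assuming a NumPy version that includes that function; note that it converts to a plain 2-D array of a common dtype rather than a flattened structured array:

```python
import io
import numpy as np
from numpy.lib import recfunctions as rfn

a = np.arange(1, 7).reshape(3, 2)
b = np.arange(10, 70., 10).reshape(3, 2)

# Same compound dtype as above: two fields, each holding 2 values.
c = np.zeros(a.shape[0], np.dtype('2i,2f'))
c['f0'] = a
c['f1'] = b

# Each element of a subarray field becomes its own column.
flat = rfn.structured_to_unstructured(c)

buf = io.StringIO()   # a filename works as well
np.savetxt(buf, flat, fmt='%2d %2d %5.1f %5.1f')
text = buf.getvalue()
print(flat.shape)
print(text)
```

The result is a float array, so for purely numeric data this ends up in the same place as the concatenate.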

So as long as you are dealing with numeric values, the simple concatenate is just as good as the more complex structured approaches.

============
