And - 6 months ago 54

Python Question

What would be the best way to write multiple numpy arrays of different dtype as different columns of a single CSV file?

For instance, given the following arrays:

`array([[1, 2], [3, 4], [5, 6]])`

`array([[10., 20.], [30., 40.], [50., 60.]])`

I would like to obtain a file (delimiter irrelevant):

`1 2 10.0 20.0`

`3 4 30.0 40.0`

`5 6 50.0 60.0`

Optimally, I would like to be able to write a list of arrays this way, where a format/dtype can be different for every array.

I tried looking at `savetxt`.

Answer

```
In [38]: a=np.arange(1,7).reshape(3,2)
In [39]: b=np.arange(10,70.,10).reshape(3,2)
In [40]: c=np.concatenate((a,b),axis=1)
In [41]: c
Out[41]:
array([[  1.,   2.,  10.,  20.],
       [  3.,   4.,  30.,  40.],
       [  5.,   6.,  50.,  60.]])
```

After the concatenation all values are floats, and the default `savetxt` format is a general float format (`%.18e`):

```
In [43]: np.savetxt('test.csv',c)
In [44]: cat test.csv
1.000000000000000000e+00 2.000000000000000000e+00 1.000000000000000000e+01 2.000000000000000000e+01
3.000000000000000000e+00 4.000000000000000000e+00 3.000000000000000000e+01 4.000000000000000000e+01
5.000000000000000000e+00 6.000000000000000000e+00 5.000000000000000000e+01 6.000000000000000000e+01
```

With a custom `fmt` I can get:

```
In [46]: np.savetxt('test.csv',c,fmt='%2d %2d %5.1f %5.1f')
In [47]: cat test.csv
1 2 10.0 20.0
3 4 30.0 40.0
5 6 50.0 60.0
```

More generally, we can make `c` with a compound dtype. It isn't needed here with just floats and ints, but with strings it would matter. We still need a long `fmt` to display the columns correctly.

`np.rec.fromarrays` is an easy way to generate a structured array. Unfortunately it only works with flattened arrays, so for your (3,2) arrays I need to list the columns separately.

```
In [52]: c = np.rec.fromarrays((a[:,0],a[:,1],b[:,0],b[:,1]))
In [53]: c
Out[53]:
rec.array([(1, 2, 10.0, 20.0), (3, 4, 30.0, 40.0), (5, 6, 50.0, 60.0)],
          dtype=[('f0', '<i4'), ('f1', '<i4'), ('f2', '<f8'), ('f3', '<f8')])
In [54]: np.savetxt('test.csv',c,fmt='%2d %2d %5.1f %5.1f')
In [55]: cat test.csv
1 2 10.0 20.0
3 4 30.0 40.0
5 6 50.0 60.0
```

I'm using the same `savetxt` call.
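Listing the columns by hand gets tedious with wider arrays. As a minimal sketch, the columns can be collected in a comprehension before being passed to `np.rec.fromarrays`. The helper name `columns_to_recarray` is hypothetical (it is not part of NumPy), and the sketch assumes every input is a 2-D array with the same number of rows.

```python
import io
import numpy as np

a = np.arange(1, 7).reshape(3, 2)
b = np.arange(10, 70., 10).reshape(3, 2)

def columns_to_recarray(arrays):
    """Hypothetical helper: pack every column of every 2-D array as one field."""
    # arr.T iterates over the columns of arr as 1-D arrays
    columns = [col for arr in arrays for col in arr.T]
    return np.rec.fromarrays(columns)

rec = columns_to_recarray([a, b])

# Write to an in-memory buffer instead of a file so the result is easy to inspect
buf = io.StringIO()
np.savetxt(buf, rec, fmt='%d %d %.1f %.1f')
text = buf.getvalue()
print(text)
```

Each field keeps the dtype of the array it came from, so the integer columns are not upcast to float as they are with `np.concatenate`.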

I could also make a structured array with 2 fields, each being 2 columns. I'm not sure if `savetxt` would work with that or not.

`savetxt` essentially iterates over the 1st dimension of your array, and does a formatted write on each row, roughly:

```
for row in arr:
    f.write(fmt % tuple(row))
```

where `fmt` is derived from your parameter.
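To make that description concrete, here is a small sketch that runs the loop by hand and compares it with what `np.savetxt` writes. It assumes a plain 2-D float array and a `fmt` string that already contains one `%` format per column. The names `manual` and `reference` are only for illustration.

```python
import io
import numpy as np

c = np.array([[1., 2., 10., 20.],
              [3., 4., 30., 40.],
              [5., 6., 50., 60.]])
fmt = '%2d %2d %5.1f %5.1f'

# Hand-written version of the loop: one formatted write per row
manual = io.StringIO()
for row in c:
    manual.write(fmt % tuple(row) + '\n')

# What savetxt itself produces for the same array and format
reference = io.StringIO()
np.savetxt(reference, c, fmt=fmt)

same = manual.getvalue() == reference.getvalue()
print(same)
```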

It wouldn't be hard to write your own version that iterates on 2 arrays, and does a separate formatted write for each pair of rows.

```
for r1, r2 in zip(a, b):
    print('%2d %2d' % tuple(r1), '%5.1f %5.1f' % tuple(r2))
```
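That idea extends to a list of arrays with one format per array, which is what the question asks for. The following is only a sketch: `write_arrays` is a hypothetical helper, not a NumPy function, and it assumes all arrays are 2-D with the same number of rows and that each format is a single `%`-style specifier applied to every column of its array.

```python
import io
import numpy as np

def write_arrays(fh, arrays, fmts, delimiter=' '):
    """Hypothetical helper: write 2-D arrays side by side, one format per array."""
    nrows = arrays[0].shape[0]
    if any(arr.shape[0] != nrows for arr in arrays):
        raise ValueError('all arrays must have the same number of rows')
    # zip(*arrays) yields one tuple per output line: the matching row of each array
    for rows in zip(*arrays):
        fields = []
        for row, fmt in zip(rows, fmts):
            # Apply this array's format to each value in its row
            fields.extend(fmt % value for value in row)
        fh.write(delimiter.join(fields) + '\n')

a = np.arange(1, 7).reshape(3, 2)
b = np.arange(10, 70., 10).reshape(3, 2)
names = np.array([['x'], ['y'], ['z']])  # a string column, to show mixed dtypes

buf = io.StringIO()
write_arrays(buf, [a, b, names], ['%d', '%.1f', '%s'], delimiter=',')
result = buf.getvalue()
print(result)
```

Because each array is formatted separately, nothing is upcast and string columns can sit next to numeric ones. Passing an open file object instead of the `StringIO` buffer writes the same text to disk.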

===================

Trying a compound dtype

```
In [60]: np.dtype('2i,2f')
Out[60]: dtype([('f0', '<i4', (2,)), ('f1', '<f4', (2,))])
In [61]: c=np.zeros(a.shape[0], np.dtype('2i,2f'))
In [62]: c['f0']=a
In [63]: c['f1']=b
In [64]: c
Out[64]:
array([([1, 2], [10.0, 20.0]), ([3, 4], [30.0, 40.0]),
       ([5, 6], [50.0, 60.0])],
      dtype=[('f0', '<i4', (2,)), ('f1', '<f4', (2,))])
In [65]: np.savetxt('test.csv',c,fmt='%2d %2d %5.1f %5.1f')
---
ValueError: fmt has wrong number of % formats: %2d %2d %5.1f %5.1f
```

So writing a compound dtype like this does not work. Considering that a row of `c` looks like:

```
In [69]: tuple(c[0])
Out[69]: (array([1, 2], dtype=int32), array([ 10., 20.], dtype=float32))
```

I shouldn't be surprised.

I can save the two blocks with the `%s` format, but that leaves me with brackets.

```
In [66]: np.savetxt('test.csv',c,fmt='%s %s')
In [67]: cat test.csv
[1 2] [ 10. 20.]
[3 4] [ 30. 40.]
[5 6] [ 50. 60.]
```

I think there is a `np.rec` function that flattens the dtype, but I can also do that with a `view`:

```
In [72]: np.savetxt('test.csv',c.view('i,i,f,f'),fmt='%2d %2d %5.1f %5.1f')
In [73]: cat test.csv
1 2 10.0 20.0
3 4 30.0 40.0
5 6 50.0 60.0
```

So as long as you are dealing with numeric values, the simple concatenate is just as good as the more complex structured approaches.
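For the numeric case, the long `fmt` string does not have to be typed out. `savetxt` also accepts a sequence of formats, one per column, so the list can be built from one format per array. A minimal sketch, assuming 2-D numeric arrays (the names `arrays` and `per_array_fmt` are only for illustration):

```python
import io
import numpy as np

a = np.arange(1, 7).reshape(3, 2)
b = np.arange(10, 70., 10).reshape(3, 2)
arrays = [a, b]
per_array_fmt = ['%d', '%.1f']  # one format per array

# Repeat each array's format once per column of that array
fmt = [f for arr, f in zip(arrays, per_array_fmt) for _ in range(arr.shape[1])]

buf = io.StringIO()
# hstack upcasts the integers to float; '%d' still prints them as integers
np.savetxt(buf, np.hstack(arrays), fmt=fmt, delimiter=',')
csv_text = buf.getvalue()
print(csv_text)
```

One caveat of the upcast: integers too large to be represented exactly as a float64 would lose precision, in which case a structured array or a hand-written loop is the safer route.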

============