zemekeneng - 2 months ago

Python Question

I have a 2D Numpy array that I would like to put in a pandas Series (not a DataFrame):

```
>>> import pandas as pd
>>> import numpy as np
>>> a = np.zeros((5, 2))
>>> a
array([[ 0., 0.],
       [ 0., 0.],
       [ 0., 0.],
       [ 0., 0.],
       [ 0., 0.]])
```

But this throws an error:

```
>>> s = pd.Series(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/miniconda/envs/pyspark/lib/python3.4/site-packages/pandas/core/series.py", line 227, in __init__
    raise_cast_failure=True)
  File "/miniconda/envs/pyspark/lib/python3.4/site-packages/pandas/core/series.py", line 2920, in _sanitize_array
    raise Exception('Data must be 1-dimensional')
Exception: Data must be 1-dimensional
```
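The message points at the root cause: the `pd.Series` constructor only accepts one-dimensional data, and `a` has two dimensions. A minimal sketch that reproduces the check; the exception class depends on the pandas version (the version in the traceback raised a bare `Exception`, recent releases raise `ValueError`), so it catches the generic `Exception`:

```python
import numpy as np
import pandas as pd

a = np.zeros((5, 2))
print(a.ndim)  # 2, but a Series needs 1-D data

raised = False
try:
    pd.Series(a)
except Exception as exc:
    # Older pandas (as in the traceback) raised a bare Exception;
    # newer versions raise ValueError. Both are caught here.
    raised = True
    print(type(exc).__name__, exc)
```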

It is possible with a hack:

```
>>> s = pd.Series(map(lambda x:[x], a)).apply(lambda x:x[0])
>>> s
0 [0.0, 0.0]
1 [0.0, 0.0]
2 [0.0, 0.0]
3 [0.0, 0.0]
4 [0.0, 0.0]
```

Is there a better way?

Answer

Well, you can use the `numpy.ndarray.tolist` function, like so:

```
>>> a = np.zeros((5,2))
>>> a
array([[ 0., 0.],
       [ 0., 0.],
       [ 0., 0.],
       [ 0., 0.],
       [ 0., 0.]])
>>> a.tolist()
[[0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
>>> pd.Series(a.tolist())
0 [0.0, 0.0]
1 [0.0, 0.0]
2 [0.0, 0.0]
3 [0.0, 0.0]
4 [0.0, 0.0]
dtype: object
```
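To see what the `tolist()` version actually stores, here is a small sketch that inspects the resulting Series. The `np.array(...)` round trip at the end is an illustrative addition, not part of the original answer:

```python
import numpy as np
import pandas as pd

a = np.zeros((5, 2))
s = pd.Series(a.tolist())  # one Python list per row of `a`

print(s.dtype)          # object: each value is an arbitrary Python object
print(type(s.iloc[0]))  # <class 'list'>
print(s.iloc[0])        # [0.0, 0.0]

# Turning the lists back into an array recovers the original 2-D shape
back = np.array(s.tolist())
print(back.shape)       # (5, 2)
```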

EDIT:

A faster way to accomplish a similar result is simply `pd.Series(list(a))`. This makes a Series of NumPy arrays instead of Python lists, so it should be faster than `a.tolist()`, which returns a list of Python lists.
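A sketch comparing the two constructions: it checks the element types and times both on a larger array. The array size and repeat count are arbitrary choices, and timings depend on hardware and library versions, so no particular numbers are implied. It also checks that the `list(a)` elements share memory with `a`, since iterating over a 2-D array yields its rows as views:

```python
import timeit

import numpy as np
import pandas as pd

a = np.zeros((5, 2))

s_lists = pd.Series(a.tolist())  # elements are Python lists
s_arrays = pd.Series(list(a))    # elements are 1-D NumPy arrays, one per row

print(type(s_lists.iloc[0]).__name__)   # list
print(type(s_arrays.iloc[0]).__name__)  # ndarray
print(s_arrays.iloc[0].shape)           # (2,)

# Iterating over `a` yields row views, so the stored arrays share memory with `a`
print(np.shares_memory(s_arrays.iloc[0], a))  # True

# Rough timing on a larger array (size and repeat count chosen arbitrarily)
big = np.zeros((10_000, 2))
t_lists = timeit.timeit(lambda: pd.Series(big.tolist()), number=10)
t_arrays = timeit.timeit(lambda: pd.Series(list(big)), number=10)
print(f"tolist(): {t_lists:.4f}s   list(a): {t_arrays:.4f}s")
```

Because the stored rows are views, modifying `a` in place afterwards also changes what the Series holds; building it from `list(a.copy())` avoids that if independence matters.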

Source (Stackoverflow)
