mortysporty - 3 years ago 241

Python Question

I have a dataframe:

```
import pandas as pd

df = pd.DataFrame({'age' : [(1, 2), (1, 3), (1, 1)],
                   'year' : [(20, 30), (30, 40), (30, 40)]})

df
Out[58]:
      age      year
0  (1, 2)  (20, 30)
1  (1, 3)  (30, 40)
2  (1, 1)  (30, 40)
```

I want to convert this into a NumPy array like this:

```
array([[ 1,  2, 20, 30],
       [ 1,  3, 30, 40],
       [ 1,  1, 30, 40]])
```

That is, each row of the dataframe becomes a row of the matrix, and each tuple column of the dataframe becomes two columns of the matrix. There could conceivably be more tuple columns in the dataframe (resulting in more columns in the array).

So, if `col_names = ['age', 'year']`, I want something like `numpy_array = some_clever_expression(col_names)`.


Answer

Stack with `np.concatenate` to get a flattened 1D array, then reshape:

```
np.concatenate(np.concatenate(df.values)).reshape(df.shape[0],-1)
```

Sample output:

```
In [460]: np.concatenate(np.concatenate(df.values)).reshape(df.shape[0],-1)
Out[460]:
array([[ 1,  2, 20, 30],
       [ 1,  3, 30, 40],
       [ 1,  1, 30, 40]])
```
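To see why the double `np.concatenate` works, it can help to print the intermediate shapes. The sketch below rebuilds the question's dataframe and assumes every row flattens to the same number of values (otherwise the final `reshape` would fail); the variable names are just for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'age': [(1, 2), (1, 3), (1, 1)],
                   'year': [(20, 30), (30, 40), (30, 40)]})

# df.values is a 2D object array holding one tuple per cell
cells = df.values
print(cells.shape, cells.dtype)  # (3, 2) object

# First concatenate: joins the rows end to end -> 1D object array of tuples
tuples = np.concatenate(cells)
print(tuples.shape)  # (6,)

# Second concatenate: each tuple is converted to a numeric array and joined
flat = np.concatenate(tuples)
print(flat.shape)  # (12,)

# Reshape back to one row per dataframe row; -1 lets NumPy infer the width
result = flat.reshape(df.shape[0], -1)
print(result)
```

The `-1` in `reshape` is what makes this generalize to more tuple columns: the width is inferred from the total number of values.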

Alternatively, we could use `np.hstack` to get the same flattened 1D array, which still needs the same `reshape` afterwards:

```
np.hstack(np.hstack(df.values))
```
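A quick check, under the same sample data as the question, that the `np.hstack` route yields the same flat array as the double `np.concatenate` and becomes the target matrix once reshaped:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'age': [(1, 2), (1, 3), (1, 1)],
                   'year': [(20, 30), (30, 40), (30, 40)]})

# Flatten with hstack, and with the double concatenate for comparison
flat_h = np.hstack(np.hstack(df.values))
flat_c = np.concatenate(np.concatenate(df.values))

# Both should be the same 1D array of 12 numbers
print(np.array_equal(flat_h, flat_c))  # True

# The hstack result still needs the reshape to get one row per record
result = flat_h.reshape(df.shape[0], -1)
print(result)
```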

To select specific columns, simply index into those columns, get the array data, and proceed as above. Thus, for a list of column names in `col_names`, use `df[col_names].values` instead of `df.values`.
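Putting this together, a possible version of the question's `some_clever_expression` might look like the sketch below. The helper name `tuple_columns_to_array` is hypothetical, and it uses `DataFrame.to_numpy()`, which newer pandas versions recommend over `.values` (they behave the same here). It assumes every row flattens to the same number of values.

```python
import numpy as np
import pandas as pd


def tuple_columns_to_array(df, col_names):
    """Hypothetical helper: flatten the tuple cells of `col_names` into a 2D array.

    Assumes every row flattens to the same number of values; otherwise the
    final reshape raises a ValueError.
    """
    # Select the requested columns, in the requested order
    cells = df[col_names].to_numpy()
    # Flatten the rows of tuples into one 1D array of numbers
    flat = np.concatenate(np.concatenate(cells))
    # One output row per dataframe row; the width is inferred
    return flat.reshape(len(df), -1)


df = pd.DataFrame({'age': [(1, 2), (1, 3), (1, 1)],
                   'year': [(20, 30), (30, 40), (30, 40)]})

print(tuple_columns_to_array(df, ['age', 'year']))
print(tuple_columns_to_array(df, ['year']))
```

Because the columns are selected first, both the subset and the column order of the output follow `col_names`.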
