Michael Hooreman - 9 months ago 47

Python Question

I have a pandas DataFrame with one column containing arrays. I'd like to "flatten" it by repeating the values of the other columns for each element of the arrays.

I managed to do it by iterating over every row and building a temporary list of values, but that is "pure Python" and slow.

Is there a way to do this in pandas/NumPy? In other words, I am trying to improve the `flatten` function in the example below.

Thanks a lot.

```
import pandas as pd

toConvert = pd.DataFrame({
    'x': [1, 2],
    'y': [10, 20],
    'z': [(101, 102, 103), (201, 202)]
})

def flatten(df):
    tmp = []

    def backend(r):
        x = r['x']
        y = r['y']
        zz = r['z']
        # Emit one output row per element of the tuple in 'z'.
        for z in zz:
            tmp.append({'x': x, 'y': y, 'z': z})

    df.apply(backend, axis=1)
    return pd.DataFrame(tmp)

print(flatten(toConvert).to_string(index=False))
```

Which gives:

```
x   y    z
1  10  101
1  10  102
1  10  103
2  20  201
2  20  202
```

Answer

Here's a NumPy-based solution -

```
import numpy as np
# list() is required on Python 3, where map() returns an iterator
np.column_stack((toConvert[['x','y']].values.\
repeat(list(map(len,toConvert.z)),axis=0),np.hstack(toConvert.z)))
```
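
To see why the one-liner works, it can be split into three steps. This is only a sketch, assuming Python 3 and the `toConvert` frame from the question; the intermediate names `lens`, `repeated` and `flat_z` are illustrative and do not appear in the original answer.

```python
import numpy as np
import pandas as pd

toConvert = pd.DataFrame({
    'x': [1, 2],
    'y': [10, 20],
    'z': [(101, 102, 103), (201, 202)]
})

# Step 1: number of elements in each row's tuple.
lens = [len(t) for t in toConvert['z']]

# Step 2: repeat each (x, y) row once per element of its tuple.
repeated = toConvert[['x', 'y']].values.repeat(lens, axis=0)

# Step 3: concatenate all tuples into one flat 1-D array.
flat_z = np.hstack(toConvert['z'].tolist())

# Glue the flat column onto the repeated rows.
result = np.column_stack((repeated, flat_z))
print(result)
```

Because `repeated` and `flat_z` are built from the same per-row lengths, they always have the same number of rows, which is what lets `np.column_stack` join them.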

Sample run -

```
In [78]: toConvert
Out[78]:
   x   y                z
0  1  10  (101, 102, 103)
1  2  20       (201, 202)
In [79]: np.column_stack((toConvert[['x','y']].values.\
    ...: repeat(list(map(len,toConvert.z)),axis=0),np.hstack(toConvert.z)))
Out[79]:
array([[  1,  10, 101],
       [  1,  10, 102],
       [  1,  10, 103],
       [  2,  20, 201],
       [  2,  20, 202]])
```
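
The result above is a plain NumPy array, while the question asks for a DataFrame. One possible way to wrap the same idea is sketched below. `flatten_np` and its `list_col` parameter are hypothetical names chosen for illustration, and the sketch assumes Python 3, a frame with at least one row, and a recent pandas. It repeats row positions rather than raw values, so each of the other columns keeps its own dtype.

```python
import numpy as np
import pandas as pd


def flatten_np(df, list_col='z'):
    """Hypothetical helper, not part of pandas or NumPy."""
    # Number of elements in each row's sequence.
    lens = df[list_col].map(len).values
    # Row position i appears lens[i] times, e.g. [0, 0, 0, 1, 1] for the sample data.
    positions = np.repeat(np.arange(len(df)), lens)
    # Repeat the remaining columns by position; this keeps each column's dtype.
    out = df.drop(columns=[list_col]).iloc[positions].reset_index(drop=True)
    # Flatten all sequences into one 1-D array (assumes at least one row).
    out[list_col] = np.concatenate([np.asarray(v) for v in df[list_col]])
    return out


toConvert = pd.DataFrame({
    'x': [1, 2],
    'y': [10, 20],
    'z': [(101, 102, 103), (201, 202)]
})

flat = flatten_np(toConvert)
print(flat.to_string(index=False))
```

On pandas 0.25 or later, `toConvert.explode('z')` is a built-in alternative. It keeps the original index labels and leaves `z` with object dtype, so a `reset_index(drop=True)` and an `astype` may be needed afterwards.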