Michael Hooreman - 1 year ago 203

# Python pandas: flatten with arrays in column

I have a pandas DataFrame in which one column contains arrays (tuples, in the example below). I'd like to "flatten" it by repeating the values of the other columns for each element of the arrays.

I managed to do it by building a temporary list of values while iterating over every row, but that is pure Python and slow.

Is there a way to do this in pandas/NumPy? In other words, I'm trying to improve the `flatten` function in the example below.

Thanks a lot.

```python
import pandas as pd

toConvert = pd.DataFrame({
    'x': [1, 2],
    'y': [10, 20],
    'z': [(101, 102, 103), (201, 202)]
})

def flatten(df):
    tmp = []
    def backend(r):
        # Append one dict per element of the 'z' tuple, repeating x and y
        x = r['x']
        y = r['y']
        zz = r['z']
        for z in zz:
            tmp.append({'x': x, 'y': y, 'z': z})
    df.apply(backend, axis=1)  # called only for its side effect of filling tmp
    return pd.DataFrame(tmp)

print(flatten(toConvert).to_string(index=False))
```

Which gives:

```
x   y    z
1  10  101
1  10  102
1  10  103
2  20  201
2  20  202
```

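Here's a NumPy-based solution:

```python
import numpy as np
# list(...) is needed on Python 3, where map() returns a lazy iterator
np.column_stack((toConvert[['x', 'y']].values.repeat(list(map(len, toConvert.z)), axis=0),
                 np.hstack(toConvert.z.tolist())))
```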
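The one-liner above repeats each `[x, y]` row once per element of its `z` tuple, flattens all the tuples into one array, and stacks the two side by side. Its result is a bare NumPy array, so the column names are lost and every column is forced to a common dtype. The following is a minimal sketch of the same repeat-and-concatenate idea that keeps a DataFrame; `flatten_repeat` is a hypothetical helper name, and it assumes every entry of the array column is a non-empty sequence of numbers.

```python
import numpy as np
import pandas as pd

def flatten_repeat(df, col):
    """Hypothetical helper: one output row per element of df[col],
    repeating the values of every other column (their dtypes are kept)."""
    lens = df[col].map(len).to_numpy()             # number of elements in each row's sequence
    rows = np.repeat(np.arange(len(df)), lens)     # row positions, each repeated lens[i] times
    out = df.drop(columns=col).iloc[rows].reset_index(drop=True)
    # Flatten all sequences into one 1-D array (assumes non-empty numeric sequences)
    out[col] = np.concatenate([np.asarray(v) for v in df[col]])
    return out[list(df.columns)]                   # restore the original column order

toConvert = pd.DataFrame({
    'x': [1, 2],
    'y': [10, 20],
    'z': [(101, 102, 103), (201, 202)],
})
print(flatten_repeat(toConvert, 'z').to_string(index=False))
```

Selecting rows by position with `iloc` rather than by label means a duplicated index in the input does not multiply rows unexpectedly.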

Sample run:

```
In [78]: toConvert
Out[78]:
   x   y                z
0  1  10  (101, 102, 103)
1  2  20       (201, 202)

In [79]: np.column_stack((toConvert[['x', 'y']].values.repeat(list(map(len, toConvert.z)), axis=0),
    ...:                  np.hstack(toConvert.z.tolist())))
Out[79]:
array([[  1,  10, 101],
       [  1,  10, 102],
       [  1,  10, 103],
       [  2,  20, 201],
       [  2,  20, 202]])
```
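On newer pandas versions (0.25 and later), `DataFrame.explode` performs this row repetition directly; the `ignore_index` argument used below requires pandas 1.1 or later. This is a pandas alternative to the NumPy answer above, not part of it, sketched here on the question's own `toConvert` data.

```python
import pandas as pd

toConvert = pd.DataFrame({
    'x': [1, 2],
    'y': [10, 20],
    'z': [(101, 102, 103), (201, 202)],
})

# explode() emits one row per element of each list-like (here, tuple) in 'z',
# repeating 'x' and 'y'; ignore_index=True renumbers the result 0..n-1.
flat = toConvert.explode('z', ignore_index=True)

# The exploded column keeps object dtype, so convert it back to integers.
flat['z'] = flat['z'].astype(int)

print(flat.to_string(index=False))
```

`explode` turns an empty tuple into a single row with NaN in `z`, in which case the `astype(int)` step would fail; it assumes every tuple is non-empty and contains integers.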