user5779223 user5779223 - 1 month ago 11
Python Question

Failing to convert lists in a data frame to numpy arrays with python-pandas

For a data frame df:

name  list1                  list2
a     [1, 3, 10, 12, 20..]   [2, 6, 23, 29...]
b     [2, 10, 14, 3]         [4, 7, 8, 13...]
c     []                     [98, 101, 200]
...


I want to convert list1 and list2 to np.array and then hstack them. Here is what I did:

df.pv = df.apply(lambda row: np.hstack((np.asarray(row.list1), np.asarray(row.list2))), axis=1)


And I got such an error:

ValueError: Shape of passed values is (138493, 175), indices imply (138493, 4)


where 138493 == len(df).


Please note that some values in list1 and list2 are empty lists, [], and that the lists have different lengths across rows. Do you know what causes the error and how I can fix it? Thanks in advance!

EDIT:

When I try to convert just one list to an array:

df.apply(lambda row: np.asarray(row.list1), axis=1)


An error also occurs:

ValueError: Empty data passed with indices specified.

Answer

Your apply function is almost correct. All you have to do is convert the output of the np.hstack() function back to a Python list.

df.apply(lambda row: list(np.hstack((np.asarray(row.list1), np.asarray(row.list2)))), axis=1)

The code is shown below (including the df creation):

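import numpy as np
import pandas as pd

# Sample frame: row 'c' has an empty list1
df = pd.DataFrame([('a',[1, 3, 10, 12, 20],[2, 6, 23, 29]),
                   ('b',[2, 10, 1.4, 3],[4, 7, 8, 13]),
                   ('c',[],[98, 101, 200])],
                   columns = ['name','list1','list2'])

# Stack both lists per row, then convert the result back to a Python list
df['list3'] = df.apply(lambda row: list(np.hstack((np.asarray(row.list1), np.asarray(row.list2)))), axis=1)

print(df['list3'])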

Output:

0              [1, 3, 10, 12, 20, 2, 6, 23, 29]
1    [2.0, 10.0, 1.4, 3.0, 4.0, 7.0, 8.0, 13.0]
2                          [98.0, 101.0, 200.0]
Name: list3, dtype: object
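
A side note on the output above: row 2 comes back as floats even though its list2 holds only integers. As far as I can tell, that is because np.asarray([]) defaults to float64 and np.hstack promotes the result to the common dtype. Here is a minimal sketch of that behaviour. The variable names are made up for illustration, and passing dtype=int is only an assumption about what you might want:

```python
import numpy as np

empty = np.asarray([])             # an empty list becomes a float64 array
ints = np.asarray([98, 101, 200])  # an integer array

# hstack promotes to the common dtype, so the integers turn into floats
joined = np.hstack((empty, ints))
print(empty.dtype, joined.dtype)

# Assumption: integer output is wanted, so give the empty array an integer dtype
joined_int = np.hstack((np.asarray([], dtype=int), ints))
print(joined_int.dtype, joined_int.tolist())
```

Note that forcing dtype=int would truncate rows that really contain floats (such as the 1.4 in row 1), so it only makes sense when all lists are known to hold integers.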

If you want a numpy array, the only way I could get it to work is:

df['list3'] = df['list3'].apply(lambda x: np.array(x))

print(type(df['list3'].iloc[0]))
# <class 'numpy.ndarray'>
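
A possible alternative, which was not part of the approach above, is to build the arrays with a list comprehension instead of DataFrame.apply and wrap them in a Series that shares the frame's index. This sidesteps the question of how apply reshapes array results. Treat it as a sketch under the assumption that the columns are named list1 and list2 as in the question; the name arrays is mine:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([('a', [1, 3, 10, 12, 20], [2, 6, 23, 29]),
                   ('b', [2, 10, 1.4, 3], [4, 7, 8, 13]),
                   ('c', [], [98, 101, 200])],
                  columns=['name', 'list1', 'list2'])

# One stacked array per row, built outside of apply
arrays = [np.hstack((np.asarray(a), np.asarray(b)))
          for a, b in zip(df['list1'], df['list2'])]

# Wrap in a Series aligned to df's index; each cell holds one ndarray
df['list3'] = pd.Series(arrays, index=df.index)

print(type(df['list3'].iloc[0]))
print(df['list3'].iloc[2])
```

Each cell of list3 then holds an ndarray directly, so the second conversion step is not needed.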