Dreamer Dreamer - 24 days ago 6
Python Question

Separate multiple values in a row resulting in one value in one row

I have a data frame df1 with a column named "Actors". For example:

Actors
Mel Blanc*Arthur Q. Bryan
Kimberly J. Brown*Daniel Roebuck
Kazunari Aizawa*Aki Morita*Teruhiko Nobukuni
Mel Blanc
Aki Morita


As we can see above, some rows contain multiple actors separated by *. A few actors also appear more than once, for example "Mel Blanc" and "Aki Morita" in the case above.

I want to create a new data frame df2 that has only one actor in each row and has the duplicate values removed. So the result should be:

Actors
Mel Blanc
Arthur Q. Bryan
Kimberly J. Brown
Daniel Roebuck
Kazunari Aizawa
Aki Morita
Teruhiko Nobukuni


How do I perform this task using pandas?

Answer

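Try this (df here is the data frame called df1 in the question). It splits each row on *, stacks the pieces into a single column, resets the index, and drops the duplicates: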

In [76]: df.Actors.str.split('*', expand=True).stack().reset_index(level=[0,1], drop=True).drop_duplicates()
Out[76]:
0            Mel Blanc
1      Arthur Q. Bryan
2    Kimberly J. Brown
3       Daniel Roebuck
4      Kazunari Aizawa
5           Aki Morita
6    Teruhiko Nobukuni
dtype: object
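
Note that the result above is a Series, while the question asks for a data frame df2. Here is a minimal, self-contained sketch that returns a data frame. It swaps in a different technique from the answer: Series.explode instead of expand=True plus stack. It assumes pandas 0.25 or newer, where explode is available. The sample data is copied from the question.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    "Actors": [
        "Mel Blanc*Arthur Q. Bryan",
        "Kimberly J. Brown*Daniel Roebuck",
        "Kazunari Aizawa*Aki Morita*Teruhiko Nobukuni",
        "Mel Blanc",
        "Aki Morita",
    ]
})

# Assumption: pandas >= 0.25, which provides Series.explode.
df2 = (
    df1["Actors"]
    .str.split("*")           # each cell becomes a list of actor names
    .explode()                # one list element per row
    .drop_duplicates()        # keep the first occurrence of each actor
    .reset_index(drop=True)   # renumber the rows 0..n-1
    .to_frame()               # Series -> DataFrame; column stays "Actors"
)

print(df2)
```

Because drop_duplicates keeps the first occurrence, the actors stay in the order in which they first appear in df1.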