Noobie - 1 year ago 233
Python Question

# Pandas: how to get the unique values of a column that contains a list of values?

Consider the following dataframe

``````df = pd.DataFrame({'name' : [['one two','three four'], ['one'],[], [],['one two'],['three']],
'col' : ['A','B','A','B','A','B']})
df.sort_values(by='col',inplace=True)

df
Out[62]:
col                   name
0   A  [one two, three four]
2   A                     []
4   A              [one two]
1   B                  [one]
3   B                     []
5   B                [three]
``````

I would like to get a column that keeps track of all the unique strings included in
`name`
for each combination of
`col`
.

That is, the expected output is

``````df
Out[62]:
col                   name    unique_list
0   A  [one two, three four]    [one two, three four]
2   A                     []    [one two, three four]
4   A              [one two]    [one two, three four]
1   B                  [one]    [one, three]
3   B                     []    [one, three]
5   B                [three]    [one, three]
``````

Indeed, say for group A, you can see that the unique set of strings included in
`[one two, three four]`
,
`[]`
and
`[one two]`
is
`[one two]`

I can obtain the corresponding number of unique values using Pandas : how to get the unique number of values in cells when cells contain lists? :

``````df['count_unique']=df.groupby('col')['name'].transform(lambda x: list(pd.Series(x.apply(pd.Series).stack().reset_index(drop=True, level=1).nunique())))

df
Out[65]:
col                   name count_unique
0   A  [one two, three four]            2
2   A                     []            2
4   A              [one two]            2
1   B                  [one]            2
3   B                     []            2
5   B                [three]            2
``````

but replacing
`nunique`
with
`unique`
above fails.

Any ideas?
Thanks!

``````df['unique_list'] = df.col.map(df.groupby('col')['name'].sum().apply(np.unique))