Eric Hansen Eric Hansen - 6 months ago 49
Python Question

Finding intersection of values in a column associated with unique values in another column Pandas

If I have a DataFrame like this (very minimal example)

col1 col2
0 a 1
1 a 2
2 b 1
3 b 2
4 b 4
5 c 1
6 c 2
7 c 3


and I want the intersection of all
col2
values when they are related to their unique
col1
values (so in this case, the intersection would be
[1,2]
), how can I do so with Pandas? Another way to word this would be the values in
col2
that exist for every unique value in
col1
.

My (bad) solution was to get the unique
col1
elements with
unique
, and then build dictionaries from each unique element in
col1
and then take the set intersection of those dictionary values. I feel like there is a mechanism I should be using to relate the columns together however that could make this much easier.

Answer

One way is to use pivot_table:

In [11]: cross = df.pivot_table(index="col1", columns="col2", aggfunc='size') == 1

In [12]: cross
Out[12]:
col2     1     2      3      4
col1
a     True  True  False  False
b     True  True  False   True
c     True  True   True  False

In [13]: cross.all()
Out[13]:
col2
1     True
2     True
3    False
4    False
dtype: bool

In [14]: cross.columns[cross.all()]
Out[14]: Int64Index([1, 2], dtype='int64', name='col2')
Comments