Mihir Shanvir -4 years ago 106

Python Question

I have a DataFrame with 3 columns (A,B,C) and a large number of rows. There are different types of elements in each of these columns: A1,A2... B1,B2... and C1, C2... respectively.

I want to find the number of times a particular combination (say (A1,B2,C2)) occurs in a row. Then I want to generate a (summary) list of all combinations whose frequency-counts exceed a fixed threshold. Example: combinations number of counts (A1,B2,C2) 5 (A2,B2,C2) 7 .... if the fixed value is 6.

I am new to pandas and numpy. Can this be done efficiently using pandas, and if so how?

Recommended for you: Get network issues from **WhatsUp Gold**. **Not end users.**

Answer Source

```
df = pd.DataFrame({'A':['A1','A1','A2','A3'],
'B':[4,4,6,4],
'C':[7,7,9,7]})
print (df)
A B C
0 4 7 C1
1 4 7 C1
2 6 9 C2
3 4 7 C3
```

For count of all combination use `groupby`

+ `size`

:

```
s = df.groupby(["A", "B","C"]).size()
print (s)
A B C
A1 4 7 2
A2 6 9 1
A3 4 7 1
dtype: int64
```

For filtered list by value add `boolean indexing`

:

```
L = s.index[s > 1].tolist()
print (L)
[('A1', 4, 7)]
```

Recommended from our users: **Dynamic Network Monitoring from WhatsUp Gold from IPSwitch**. ** Free Download**

Latest added