James Eaves - 4 months ago 5

Python Question

I have a pandas dataframe where one column is a list of all courses taken by a student. The index is the student's ID.

I'd like to find the most common set of courses across all students. For instance, if the dataframe looks like this:

| ID | Courses |
|----|---------|
| 1  | [A, C]  |
| 2  | [A, C]  |
| 3  | [A, C]  |
| 4  | [B, C]  |
| 5  | [B, C]  |
| 6  | [K, D]  |

...

Then I'd like the output to return the most common sets and their frequency, something like:

`{[A,C]: 3, [B,C]: 2}`

Answer

```
import pandas as pd
# create example data
a = range(6)
b = [['A', 'C'], ['A', 'C'], ['A', 'C'], ['B', 'C'], ['B', 'C'], ['K', 'D']]
df = pd.DataFrame({'ID': a, 'Courses': b})
# convert the lists in the Courses column to tuples: lists are unhashable,
# so value_counts() cannot count them, while tuples can be hashed
df['Courses'] = df['Courses'].apply(tuple)
print(df.Courses.value_counts())
```

Output:

```
(A, C) 3
(B, C) 2
(K, D) 1
Name: Courses, dtype: int64
```
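The reason for the tuple conversion can be seen without pandas at all. The following is only a minimal sketch using the standard library's `collections.Counter`; the variable names (`courses`, `error_message`, `counts`) are made up for illustration and are not part of the original answer. Counting raw lists fails because lists cannot be hashed, while counting tuples works:

```python
from collections import Counter

# Same example data as above, as a plain list of lists
courses = [['A', 'C'], ['A', 'C'], ['A', 'C'], ['B', 'C'], ['B', 'C'], ['K', 'D']]

# Lists are mutable and therefore unhashable, so they cannot be used as Counter keys
error_message = None
try:
    Counter(courses)
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# Tuples are hashable, so converting each list first makes counting possible
counts = Counter(tuple(c) for c in courses)

# most_common(n) returns the n most frequent keys with their counts
print(counts.most_common(2))
```

`value_counts()` relies on hashing in the same way, which is why the `apply(tuple)` step is needed before counting.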

**Edit (as my answer was accepted):**

jezrael describes (first as a comment to my answer) a much more compact version of the same approach:

```
import pandas as pd
a = range(6)
b = [['A', 'C'], ['A', 'C'], ['A', 'C'], ['B', 'C'], ['B', 'C'], ['K', 'D']]
df = pd.DataFrame({'ID': a, 'Courses': b})
print(df.Courses.apply(tuple).value_counts()) # list->tuple and counting in one line!
```
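The question asks for a dict-like result and talks about *sets* of courses, so two small additions may be useful. This is a sketch under two assumptions that the question does not confirm: that the order of courses within a list should not matter (so `['C', 'A']` counts as `['A', 'C']`), and that only the top two combinations are wanted. The names `normalized` and `top_two` are illustrative.

```python
import pandas as pd

# Example data with some lists in a different order (assumed for illustration)
b = [['A', 'C'], ['C', 'A'], ['A', 'C'], ['B', 'C'], ['C', 'B'], ['K', 'D']]
df = pd.DataFrame({'Courses': b}, index=pd.Index(range(1, 7), name='ID'))

# Sort each list before converting it to a tuple, so that the same
# combination of courses always produces the same hashable key
normalized = df['Courses'].apply(lambda courses: tuple(sorted(courses)))

# value_counts() sorts by frequency (descending); keep the top two as a dict
top_two = normalized.value_counts().head(2).to_dict()
print(top_two)
```

If course order is meaningful in your data, drop the `sorted` call and use `apply(tuple)` as in the answer.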

Source (Stack Overflow)
