Giada - 9 months ago
Python Question

Python interaction between rows and columns

I have the following DataFrame:

import pandas as pd

df = pd.DataFrame({
    'student': ['a'] * 3 + ['b'] * 3 + ['c'] * 4,
    'semester': [1, 1, 2, 2, 1, 1, 2, 2, 2, 2],
    'passed_exam': [True, False] * 5,
    'exam': [
        'French', 'English', 'Italian', 'Chinese', 'Russian',
        'German', 'Chinese', 'Spanish', 'English', 'French'
    ]
})

print(df)

   passed_exam     exam  semester student
0         True   French         1       a
1        False  English         1       a
2         True  Italian         2       a
3        False  Chinese         2       b
4         True  Russian         1       b
5        False   German         1       b
6         True  Chinese         2       c
7        False  Spanish         2       c
8         True  English         2       c
9        False   French         2       c


Does anybody know how to find the number of students that each student interacted with (through exams)?

Something like this:

   passed_exam     exam  semester student  total_st
0         True   French         1       a         1
1        False  English         1       a         1
2         True  Italian         2       a         1
3        False  Chinese         2       b         1
4         True  Russian         1       b         1
5        False   German         1       b         1
6         True  Chinese         2       c         2
7        False  Spanish         2       c         2
8         True  English         2       c         2
9        False   French         2       c         2


Thank you in advance!

Answer Source

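I interpret "number of students that each student interacted with (through exams)" as the number of students who sat the same exam in the same semester.

Under that reading, you can count the students per exam and semester and merge the counts back onto the original rows: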

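# Count the students sitting each exam in each semester
df1 = (df
       .groupby(["exam", "semester"], as_index=False)["student"].agg("count")
       .rename(columns={"student": "total_st"}))
# Attach the counts to the original rows; the merge uses the shared columns exam and semester
df.merge(df1).sort_values(["semester", "student"])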

   passed_exam     exam  semester student  total_st
0         True   French         1       a         1
1        False  English         1       a         1
5         True  Russian         1       b         1
6        False   German         1       b         1
2         True  Italian         2       a         1
3        False  Chinese         2       b         2
4         True  Chinese         2       c         2
7        False  Spanish         2       c         1
8         True  English         2       c         1
9        False   French         2       c         1
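
The expected output in the question (1 for a and b, 2 for c) seems to match a slightly different reading: the number of distinct other students with whom a student shares at least one exam, regardless of semester. Here is a minimal sketch under that assumption, using a self-merge on exam. The column name student_other and the variable pairs are helpers introduced here for illustration, not something from the question.

```python
import pandas as pd

df = pd.DataFrame({
    'student': ['a'] * 3 + ['b'] * 3 + ['c'] * 4,
    'semester': [1, 1, 2, 2, 1, 1, 2, 2, 2, 2],
    'passed_exam': [True, False] * 5,
    'exam': [
        'French', 'English', 'Italian', 'Chinese', 'Russian',
        'German', 'Chinese', 'Spanish', 'English', 'French'
    ]
})

# Pair every student with everyone who sat the same exam (assumption: semester is ignored)
se = df[['student', 'exam']]
pairs = se.merge(se, on='exam', suffixes=('', '_other'))

# Drop the rows where a student is paired with themselves
pairs = pairs[pairs['student'] != pairs['student_other']]

# Count the distinct other students per student
counts = pairs.groupby('student')['student_other'].nunique()

# Map the counts back onto the original rows; students with no shared exam get 0
df['total_st'] = df['student'].map(counts).fillna(0).astype(int)

print(df)
```

If the semester should matter as well, merging on both exam and semester instead of exam alone would restrict the pairs to students who sat the exam in the same term.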