Trying to create--in Python using multiple data frames--the equivalent of a countifs in Excel that would span multiple sheets.
I need a new column count of records on another data frame based on criteria from the current data frame.
See Excel impression of what I want to do in python, also here.
sum(1 for x in students['Student ID'] if x == 1)
sum(1 for x in exams['Exam Grade'] if x >= 70)
Basically what you'll want to do is set up two dataframes, say
df1 for the "exams passed" information and
df2 for the marks on each exam.
To get yourself started, you can read in your excel files like this:
df1 = pd.read_excel('filename1.xlsx') df2 = pd.read_excel('filename2.xlsx')
Then for each row in
df1 you want to segment
df2 and get the length of the segmented dataframe.
First though you might want to make list of information for each row in df1, which could be done like this:
student_info = df1[['Student ID', 'Enrollment Date', 'Qualification Date']].values
Then you can iterate through the rows like this:
N_exams_passed =  # Store counts for each student in a list for s_id, s_enroll, s_qual in student_info: N_exams_passed.append(len(df2[(df2['Student ID']==s_id) & (df2['Exam Date']>=s_enroll) & (df2['Exam Date']<=s_qual) & (df2['Grade']>=70)]) )
Then add/replace the column in df1:
df1['Exams Passed'] = N_exams_passed
In order to compare the dates properly you will need to convert them to datetime objects in each pandas dataframe, and I will leave this up to you. Hint: you can use the