pmv pmv - 7 months ago 15
Python Question

Referencing a list from list of lists

I have two dataframes:
One: a score card for scoring student marks.
Second: a student dataset.

I want to apply the score card to a given student dataset to compute scores and aggregate them. I'm trying to develop a generic function that takes the
score card and applies it to any student marks dataset.

import pandas as pd
score_card_data = {
'subject_id': ['MATHS', 'SCIENCE', 'ARTS'],
'bin_list': [[0,25,50,75,100], [0,20,40,60,80,100], [0,20,40,60,80,100]],
'bin_value': [[1,2,3,4], [1,2,3,4,5], [3,4,5,6,7] ]}
score_card_data = pd.DataFrame(score_card_data, columns = ['subject_id', 'bin_list', 'bin_value'])
score_card_data

student_scores = {
'STUDENT_ID': ['S1', 'S2', 'S3','S4','S5'],
'MATH_MARKS': [10,15,25,65,75],
'SCIENCE_MARKS': [8,15,20,35,85],
'ARTS_MARKS':[55,90,95,88,99]}
student_scores = pd.DataFrame(student_scores, columns = ['STUDENT_ID', 'MATH_MARKS', 'SCIENCE_MARKS','ARTS_MARKS'])
student_scores


Steps:
1. Define the bins
2. Apply the bins over the columns

bins = list(score_card_data.loc[score_card_data['subject_id'] == 'MATHS', 'bin_list'])
student_scores['MATH_SCORE'] = pd.cut(student_scores['MATH_MARKS'],bins, labels='MATHS_MARKS')

Error: ValueError: object too deep for desired array


I'm trying to convert the cell value to a string, but it is detected as an object. Is there any way to resolve this?

How can I make the function more generic?

Thanks
Pari

Answer

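bins is a list that contains a single list (the bin_list cell for MATHS), and passing that nested list to pd.cut is what raises the ValueError. You can just use bins[0] to extract the inner list: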

bins[0]
[0, 25, 50, 75, 100]

type(bins[0])
<class 'list'>

student_scores['MATH_SCORE'] = pd.cut(student_scores['MATH_MARKS'], bins[0])

  STUDENT_ID  MATH_MARKS  SCIENCE_MARKS  ARTS_MARKS MATH_SCORE
0         S1          10              8          55    (0, 25]
1         S2          15             15          90    (0, 25]
2         S3          25             20          95    (0, 25]
3         S4          65             35          88   (50, 75]
4         S5          75             85          99   (50, 75]

I left out the labels argument because it needs a list of four labels (one per bin) given there are five cutoffs / bin edges, not a single string such as 'MATHS_MARKS'.
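As for making this more generic, here is a minimal sketch of one possible approach, not a definitive implementation. It loops over the score card rows and uses each row's bin_value list as the labels for pd.cut. The function name apply_score_card, the marks_columns mapping (needed because the subject_id 'MATHS' does not match the column name 'MATH_MARKS'), the '_SCORE' column naming, the TOTAL_SCORE column, and the choice of include_lowest=True are all assumptions of mine rather than anything from the question.

```python
import pandas as pd

score_card_data = pd.DataFrame({
    'subject_id': ['MATHS', 'SCIENCE', 'ARTS'],
    'bin_list': [[0, 25, 50, 75, 100], [0, 20, 40, 60, 80, 100], [0, 20, 40, 60, 80, 100]],
    'bin_value': [[1, 2, 3, 4], [1, 2, 3, 4, 5], [3, 4, 5, 6, 7]],
})

student_scores = pd.DataFrame({
    'STUDENT_ID': ['S1', 'S2', 'S3', 'S4', 'S5'],
    'MATH_MARKS': [10, 15, 25, 65, 75],
    'SCIENCE_MARKS': [8, 15, 20, 35, 85],
    'ARTS_MARKS': [55, 90, 95, 88, 99],
})

# Hypothetical mapping from subject_id to the marks column it scores.
marks_columns = {
    'MATHS': 'MATH_MARKS',
    'SCIENCE': 'SCIENCE_MARKS',
    'ARTS': 'ARTS_MARKS',
}


def apply_score_card(scores, score_card, marks_columns):
    """Return a copy of scores with one *_SCORE column per subject and a total."""
    result = scores.copy()
    score_cols = []
    for row in score_card.itertuples(index=False):
        marks_col = marks_columns[row.subject_id]
        score_col = marks_col.replace('_MARKS', '_SCORE')
        # row.bin_list and row.bin_value are already plain lists here,
        # so there is no nested list to unwrap.
        binned = pd.cut(
            result[marks_col],
            bins=row.bin_list,
            labels=row.bin_value,
            include_lowest=True,  # assumption: a mark of 0 falls in the first bin
        )
        # Convert the categorical labels to numbers so they can be summed.
        result[score_col] = binned.astype(float)
        score_cols.append(score_col)
    result['TOTAL_SCORE'] = result[score_cols].sum(axis=1)
    return result


scored = apply_score_card(student_scores, score_card_data, marks_columns)
print(scored)
```

Note that pd.cut requires the number of labels to be one less than the number of bin edges, which holds for every row of the score card above. Marks outside the bin range come back as NaN and are skipped by the sum.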
