Hi i am stata user and now iam trying to pass my codes in stata to python/pandas. In this case i want to create a new variables
size
id year entry cohort jobs
1 2009 0 NaN 3
1 2012 1 2012 3
1 2013 0 2012 4
1 2014 0 2012 11
2 2010 1 2010 11
2 2011 0 2010 12
2 2012 0 2010 13
3 2007 0 NaN 38
3 2008 0 NaN 58
3 2012 1 2012 58
3 2013 0 2012 70
4 2007 0 NaN 231
4 2008 0 NaN 241
df['size'] = np.where((1 <= df['jobs'] <= 9),'Micro',np.where((10 <= df['jobs'] <= 49),'Small'),np.where((50 <= df['jobs'] <= 200),'Median'),np.where((200 <= df['empleo']),'Big','NaN'))
What you are trying to do is called binning use pd.cut
i.e
df['new'] = pd.cut(df['jobs'],bins=[1,10,50,201,np.inf],labels=['micro','small','medium','big'])
Output:
id year entry cohort jobs new
0 1 2009 0 NaN 3 micro
1 1 2012 1 2012.0 3 micro
2 1 2013 0 2012.0 4 micro
3 1 2014 0 2012.0 11 small
4 2 2010 1 2010.0 11 small
5 2 2011 0 2010.0 12 small
6 2 2012 0 2010.0 13 small
7 3 2007 0 NaN 38 small
8 3 2008 0 NaN 58 medium
9 3 2012 1 2012.0 58 medium
10 3 2013 0 2012.0 70 medium
11 4 2007 0 NaN 231 big
12 4 2008 0 NaN 241 big
For multiple conditions you have to go for np.select
not np.where
. Hope that helps.
numpy.select(condlist, choicelist, default=0)
Where condlist is the list of your condtions, and choicelist is the list of choices if condition is met. default = 0, here you can put that as np.nan