user1828605 user1828605 - 1 month ago 9
Python Question

Displaying unique name with total of column value in a group with additional variables in python

I'm learning Python and thought working on a project might be the best way to learn it. I have about 200,000 rows of data in which the data shows list of medication for the patient. Here's a sample of the data.

PTID PTNAME MME DRNAME DRUGNAME SPLY STR QTY FACTOR
1 PATIENT, A 2700 DR, A OXYCODONE HCL 15 MG 30 15 120 1.5
1 PATIENT, A 2700 DR, B OXYCODONE HCL 15 MG 30 15 120 1.5
2 PATIENT, B 4050 DR, C MORPHINE SULFATE ER 15 MG 30 15 270 1
2 PATIENT, B 4050 DR, C MORPHINE SULFATE ER 15 MG 30 15 270 1
2 PATIENT, B 840 DR, A OXYCODONE-ACETAMINOPHE 10MG-32 14 10 56 1.5
2 PATIENT, B 1350 DR, C OXYCODONE-ACETAMINOPHE 5 MG-32 15 5 180 1.5
3 PATIENT, C 1350 DR, C OXYCODONE-ACETAMINOPHE 5 MG-32 15 5 180 1.5
3 PATIENT, C 1800 DR, D OXYCODONE-ACETAMINOPHE 10MG-32 30 10 120 1.5


I've been thinking about this a lot and have tried many ways but none of the code produce any results or makes any sense. Honestly, I don't even know where to begin. A little help would be highly appreciated.

So, what I want to do is consolidate the data for each patients and calculate the
Total MME
for each patient. The
DRUGNAME
should show the one that has higher MME. In other words, the dataframe should only have one row for each patient.

One thing I did try is

groupby_ptname = semp.groupby('PTNAME').apply(lambda x: x.MME.sum())


which shows unique patient names with total MME, but I'm not sure how to add other variables in this new dataframe.

Answer

IIUC you can do it this way:

In [62]: df.sort_values('MME').groupby('PTNAME').agg({'MME':'sum', 'DRUGNAME':'last'})
Out[62]:
                                  DRUGNAME    MME
PTNAME
PATIENT, A             OXYCODONE HCL 15 MG   5400
PATIENT, B       MORPHINE SULFATE ER 15 MG  10290
PATIENT, C  OXYCODONE-ACETAMINOPHE 10MG-32   3150

or with .reset_index():

In [64]: df.sort_values('MME').groupby('PTNAME').agg({'MME':'sum', 'DRUGNAME':'last'}).reset_index()
Out[64]:
       PTNAME                        DRUGNAME    MME
0  PATIENT, A             OXYCODONE HCL 15 MG   5400
1  PATIENT, B       MORPHINE SULFATE ER 15 MG  10290
2  PATIENT, C  OXYCODONE-ACETAMINOPHE 10MG-32   3150

UPDATE: more fun with agg() function

In [84]: agg_funcs = {
    ...:     'MME':{'MME_max':'last',
    ...:            'MME_total':'sum'},
    ...:     'DRUGNAME':{'DRUGNAME_max_MME':'last'}
    ...: }
    ...:
    ...: rslt = (df.sort_values('MME')
    ...:          .groupby('PTNAME')
    ...:          .agg(agg_funcs)
    ...:          .reset_index()
    ...: )
    ...: rslt.columns = [tup[1] if tup[1] else tup[0] for tup in rslt.columns]
    ...:

In [85]: rslt
Out[85]:
       PTNAME  MME_total  MME_max                DRUGNAME_max_MME
0  PATIENT, A       5400     2700             OXYCODONE HCL 15 MG
1  PATIENT, B      10290     4050       MORPHINE SULFATE ER 15 MG
2  PATIENT, C       3150     1800  OXYCODONE-ACETAMINOPHE 10MG-32