ybb ybb - 2 months ago 11
Python Question

Set values of groups in pandas conditionally python

I have a DataFrame with the following columns:

duration, cost, channel
2 180 TV1
1 200 TV2
2 300 TV3
1 nan TV1
2 nan TV2
2 nan TV3
2 nan TV1
1 40 TV2
1 nan TV3


Some of the cost values are NaN, and to fill them I need to do the following:


  • group by channel

  • within a channel, sum the available cost and divide by the number of occurrences, i.e. all rows in that channel, including those with a missing cost (average)

  • reassign values for all rows within that channel:

    • if duration = 1, cost = average * 1.5

    • if duration = 2, cost = average




Example:
For the TV2 channel we have 3 entries, one of which has a null cost. So I need to do the following:

average = (200 + 40) / 3 = 80
if duration = 1, cost = 80 * 1.5 = 120
if duration = 2, cost = 80

duration, cost, channel
2 180 TV1
1 120 TV2
2 300 TV3
1 nan TV1
2 80 TV2
2 nan TV3
2 nan TV1
1 120 TV2
1 nan TV3
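
The TV2 arithmetic above can be checked with a few lines of pandas. This is only a sketch: the names tv2, average and filled are illustrative and not part of the original post. The point it shows is that the sum skips the NaN while the divisor still counts all three rows.

```python
import numpy as np
import pandas as pd

# The three TV2 rows from the sample data (one cost is missing).
tv2 = pd.DataFrame({
    "duration": [1, 2, 1],
    "cost": [200, np.nan, 40],
})

# Series.sum() skips NaN by default, while len() counts every row,
# so this is (200 + 40) / 3 rather than the mean of the known values.
average = tv2["cost"].sum() / len(tv2)

# Rows with duration 1 get 1.5 times the average; duration 2 gets the average.
filled = np.where(tv2["duration"] == 1, average * 1.5, average)

print(average)          # 80.0
print(filled.tolist())  # [120.0, 80.0, 120.0]
```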


I know I should do df.groupby('channel') and then apply a function to each group.
The problem is that I need to modify not only the null values: I need to modify all cost values within a group if at least one cost is null.

Any tips or help would be appreciated.

Thanks!

Answer

If I understand your problem correctly, you want something like:

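def myfunc(group):

    # work on a copy so the caller's frame is not modified in place
    group = group.copy()

    # only modify cost if the group contains NaNs
    if group['cost'].isna().any():

        # set all cost values to (sum of known costs) / (number of rows)
        group['cost'] = group['cost'].sum() / len(group)

        # multiply by 1.5 where the duration equals 1;
        # .loc avoids the chained assignment group['cost'][mask] = ...
        group.loc[group['duration'] == 1, 'cost'] *= 1.5

    return group


# group_keys=False keeps the original row labels instead of adding a channel index level
df.groupby('channel', group_keys=False).apply(myfunc)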

   duration  cost channel
0         2    60     TV1
1         1   120     TV2
2         2   100     TV3
3         1    90     TV1
4         2    80     TV2
5         2   100     TV3
6         2    60     TV1
7         1   120     TV2
8         1   150     TV3
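
The same result can also be computed without groupby.apply, which may be preferable on larger frames. The following is a hedged sketch rather than the answerer's method: it rebuilds the sample data from the question, and the names by_channel, average, has_nan, new_cost and result are assumptions introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "duration": [2, 1, 2, 1, 2, 2, 2, 1, 1],
    "cost": [180, 200, 300, np.nan, np.nan, np.nan, np.nan, 40, np.nan],
    "channel": ["TV1", "TV2", "TV3", "TV1", "TV2", "TV3", "TV1", "TV2", "TV3"],
})

by_channel = df.groupby("channel")["cost"]

# Per-row group average: sum of known costs divided by all rows in the group.
# transform("sum") skips NaN; transform("size") counts NaN rows too.
average = by_channel.transform("sum") / by_channel.transform("size")

# True for every row whose channel has at least one missing cost.
has_nan = df["cost"].isna().groupby(df["channel"]).transform("any")

# Duration 1 costs 1.5 times the average, duration 2 costs the average.
new_cost = average * np.where(df["duration"] == 1, 1.5, 1.0)

# Only overwrite channels that contain a NaN; leave complete channels alone.
result = df.assign(cost=df["cost"].where(~has_nan, new_cost))
print(result)
```

Because every channel in the sample contains at least one NaN, all nine costs are replaced, and the values agree with the table printed by myfunc above.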