nciao nciao - 3 months ago 8
Python Question

Append a new column based on existing columns

Pandas newbie here.

I'm trying to create a new column in my data frame that will serve as a training label when I feed this into a classifier.

The value of the label column is 1.0 if a given Id has (Value1 > 0) or (Value2 > 0) for Apples or Pears, and 0.0 otherwise.

My dataframe is row indexed by Id and looks like this:

Out[30]:
            Value1                                                   Value2  \
ProductName    7Up     Apple Cheetos     Onion      Pear PopTart       7Up
ProductType Drinks Groceries  Snacks Groceries Groceries  Snacks    Drinks
Id
100            0.0       1.0     2.0       4.0       0.0     0.0       0.0
101            3.0       0.0     0.0       0.0       3.0     0.0       4.0
102            0.0       0.0     0.0       0.0       0.0     2.0       0.0


ProductName     Apple Cheetos     Onion      Pear PopTart
ProductType Groceries  Snacks Groceries Groceries  Snacks
Id
100               1.0     3.0       3.0       0.0     0.0
101               0.0     0.0       0.0       2.0     0.0
102               0.0     0.0       0.0       0.0     1.0


If the pandas wizards could give me a hand with the syntax for this operation - my mind is struggling to put it all together.

Thanks!

Answer

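Define a function that takes one row and looks only at the Apple and Pear columns. With these multi-level columns, x['Value1'] > 0 gives a whole Series per row, which cannot be used directly in an if statement, so reduce the comparison with any():

def new_column(x):
    # x is one row; its index is the column MultiIndex of df
    fruit = x.index.get_level_values('ProductName').isin(['Apple', 'Pear'])
    # True if any Value1 or Value2 entry for Apple or Pear is positive
    if (x[fruit] > 0).any():
        return 1.0
    return 0.0

Apply it to each row and store the result as the new column (the name 'Label' is just an example):

df['Label'] = df.apply(new_column, axis=1)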
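
If the frame is large, a vectorized version may be worth trying instead of a row-wise apply. The sketch below rebuilds a small frame shaped like the one in the question so it can run on its own; the construction code and the names is_fruit, label and 'Label' are assumptions for illustration, not part of the original data.

```python
import pandas as pd

# Rebuild a frame like the one in the question (assumed layout):
# an unnamed top level (Value1/Value2), then ProductName, then ProductType.
products = [("7Up", "Drinks"), ("Apple", "Groceries"), ("Cheetos", "Snacks"),
            ("Onion", "Groceries"), ("Pear", "Groceries"), ("PopTart", "Snacks")]
columns = pd.MultiIndex.from_tuples(
    [(value, name, ptype)
     for value in ("Value1", "Value2")
     for name, ptype in products],
    names=[None, "ProductName", "ProductType"],
)
data = [
    [0, 1, 2, 4, 0, 0, 0, 1, 3, 3, 0, 0],
    [3, 0, 0, 0, 3, 0, 4, 0, 0, 0, 2, 0],
    [0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 1],
]
df = pd.DataFrame(data, index=pd.Index([100, 101, 102], name="Id"),
                  columns=columns, dtype=float)

# Boolean mask over the columns: Apple and Pear under either Value1 or Value2.
is_fruit = df.columns.get_level_values("ProductName").isin(["Apple", "Pear"])

# A row gets 1.0 when any selected entry is positive, otherwise 0.0.
label = (df.loc[:, is_fruit] > 0).any(axis=1).astype(float)

# Append the label as a new column.
df["Label"] = label
print(label)
```

For the three rows shown in the question this labels Ids 100 and 101 as 1.0 and Id 102 as 0.0, since 102 only has positive values for PopTart.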