piRSquared - 1 month ago
Python Question

How do I fill null values while using pd.Series.__add__?

Consider the two series s1 and s2:


s1 = pd.Series([1, 2], name='A')
s2 = pd.Series([1], name='A')


When I add them

s1 + s2

0    2.0
1    NaN
Name: A, dtype: float64


I get NaN for index 1.


Instead I could do

s1.add(s2, fill_value=0)

0    2.0
1    2.0
Name: A, dtype: float64
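
For anyone wondering where the NaN comes from: + first aligns the two series on the union of their indexes, and s2 has no value at index 1. The sketch below spells that out with Series.align and shows that, for this particular data, filling the gaps by hand matches add(..., fill_value=0). The variable names left, right, plain, manual and filled are only illustrative. One difference worth knowing: fill_value leaves a position as NaN when it is missing on both sides, whereas fillna on both operands would not.

```python
import pandas as pd

s1 = pd.Series([1, 2], name='A')
s2 = pd.Series([1], name='A')

# Align both series on the union of their indexes; s2 gains a NaN at index 1.
left, right = s1.align(s2, join='outer')

# Plain addition propagates that NaN.
plain = left + right

# Filling the gaps before adding matches add(..., fill_value=0) for this data.
manual = left.fillna(0) + right.fillna(0)
filled = s1.add(s2, fill_value=0)

print(plain)
print(filled)
```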


However, I can't use add; I need to use __add__. The problem is that pd.Series.__add__ doesn't have a fill_value parameter.




Context

So you understand why __add__ is important to me:

What I'm trying to do is subclass pd.Series. I want to add two members of my subclass with a plus sign (+) and have the default be to fill the missing values with 0. In order to use +, I have to define __add__ in my subclass. I'd like to leverage pd.Series.__add__ and pass the appropriate parameter and value but, as I've said, pd.Series.__add__ doesn't have the fill_value parameter.

In contrast, pd.DataFrame.__add__ does have fill_value.




What I've Tried

There is a na_op parameter that I suspect I can pass something to, but I have no idea what.

s1.__add__(s2, na_op=0)

0    2.0
1    NaN
Name: A, dtype: float64


This is not what I want. To be clear, I need to use s1.__add__(s2, **kwargs), where kwargs contains a keyword argument that will get me

0    2.0
1    2.0
Name: A, dtype: float64





This is the subclass code I've put together. Hopefully it helps highlight what I'm trying to do.

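import numpy as np
import pandas as pd

class SubDataFrame(pd.Series):

    # Extra attributes pandas should carry over to derived objects
    _metadata = ['date']

    @property
    def _constructor(self):
        return SubDataFrame

    def __init__(self, *args, **kwargs):
        # pd.datetime is gone from recent pandas; pd.Timestamp gives the same default (today)
        self.date = pd.to_datetime(kwargs.pop('date', pd.Timestamp.now().normalize()))
        super().__init__(*args, **kwargs)

    def __add__(self, other, *args, **kwargs):
        # What I'd like to do, if __add__ accepted fill_value:
        # kwargs.setdefault('fill_value', 0)
        return super().__add__(other, *args, **kwargs)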

Answer

Not 100% sure I follow, but are you just trying to override the __add__ method? Excluding the other stuff you need to do to properly subclass pandas objects, roughly:

import pandas as pd

class PiR2Series(pd.Series):

    def __init__(self, *args, **kwargs):
        super(PiR2Series, self).__init__(*args, **kwargs)

    def __add__(self, other):
        # + now delegates to add(), which does accept fill_value
        return self.add(other, fill_value=0)

Then you can do:

s1 = PiR2Series([1, 2], name='A')
s2 = PiR2Series([1], name='A')

s1 + s2

0    2.0
1    2.0
Name: A, dtype: float64

Disclaimer: I haven't really done much subclassing of pandas objects, so I can't guarantee that the above is the proper thing to do.
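
For what it's worth, here is a slightly fuller sketch of the same idea. It is not a definitive implementation, just one way to combine the override with the _constructor property from the question so that results stay in the subclass. The class name FilledSeries is made up for illustration. The isinstance check is a precaution on my part: depending on the pandas version, Series.add may route scalar operands back through the + operator, which would call the override again, so the sketch only intercepts Series operands and hands everything else to the stock __add__.

```python
import pandas as pd


class FilledSeries(pd.Series):
    """Hypothetical Series subclass whose + treats missing values as 0."""

    @property
    def _constructor(self):
        # Make pandas operations return FilledSeries instead of pd.Series.
        return FilledSeries

    def __add__(self, other):
        if isinstance(other, pd.Series):
            # The flexible method accepts fill_value, unlike __add__.
            return self.add(other, fill_value=0)
        # Scalars and other operands keep the stock behaviour.
        return super().__add__(other)


s1 = FilledSeries([1, 2], name='A')
s2 = FilledSeries([1], name='A')

result = s1 + s2
print(result)
print(type(result).__name__)
```

Note that this only covers the case where the left operand is a FilledSeries. If a plain pd.Series can appear on the left, __radd__ would presumably need the same treatment.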