sanguineturtle - 2 months ago
Python Question

Pandas DataFrame Object Inheritance or Object Use?

I am building a library for working with very specific structured data, and I am building its infrastructure on top of Pandas. Currently I am writing several data containers for different use cases, such as CTMatrix for Country x Time data, to house methods appropriate for all Country x Time structured data.

I am currently debating between

Option 1: Object Inheritance

import pandas as pd

class CTMatrix(pd.DataFrame):
    pass  # methods etc. here


or Option 2: Object Use

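class CTMatrix(object):
    def __init__(self, *args, **kwargs):
        # hold a DataFrame instance, not the DataFrame class itself
        self._data = pd.DataFrame(*args, **kwargs)

Then use getter and setter methods to control access to _data, etc.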


From a software engineering perspective, is there an obvious choice here?

My thoughts so far are:

Option 1:


  1. Can use DataFrame methods directly on the CTMatrix class (like CTMatrix.sort()) without having to support them via methods on the encapsulated _data object, as Option #2 requires

  2. Updates and new methods in Pandas are inherited automatically, except for methods that are overridden by local class methods



But


  1. Complications with some methods such as __init__() and having to pass the arguments up to the superclass, e.g. super(CTMatrix, self).__init__(*args, **kw)
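The `__init__()` complication mentioned above can be sketched roughly as follows. This is only an illustration under assumptions: the `units` attribute and the sample data are hypothetical, and it uses the Python 3 form `super().__init__()` rather than the Python 2 form in the question. Listing the attribute name in `_metadata` follows the pandas guidance for custom attributes on subclasses.

```python
import pandas as pd


class CTMatrix(pd.DataFrame):
    # pandas treats names listed in _metadata as ordinary instance attributes
    # of the subclass ("units" is a hypothetical attribute for this sketch).
    _metadata = ["units"]

    def __init__(self, *args, units=None, **kwargs):
        # Pass everything DataFrame understands up to the superclass...
        super().__init__(*args, **kwargs)
        # ...and keep only the subclass-specific argument here.
        self.units = units


# Hypothetical Country x Time data: rows are countries, columns are years.
ct = CTMatrix({"2000": [1.0, 2.0]}, index=["AUS", "NZL"], units="USD")
```

The extra keyword has to be stripped out before calling the superclass, because `DataFrame.__init__` would not accept it.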



Option 2:


  1. More control over the class and its behavior

  2. Possibly more resilient to updates in Pandas?



But


  1. Having to use a getter() or non-hidden attribute to use the object like a DataFrame, such as CTMatrix.data.sort()



Are there any additional downsides for taking the approach in Option #1?

Answer

I would avoid subclassing DataFrame, because many of the DataFrame methods will return a new DataFrame and not another instance of your CTMatrix object.
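A small sketch of the problem, assuming a plain subclass that does not override the `_constructor` property pandas documents for subclassing. The `countries()` method and the sample data are made up for illustration.

```python
import pandas as pd


class CTMatrix(pd.DataFrame):
    def countries(self):
        # Hypothetical custom method that only CTMatrix provides.
        return list(self.index)


ct = CTMatrix({"2000": [1.0, 2.0], "2001": [3.0, 4.0]}, index=["AUS", "NZL"])

# An ordinary DataFrame method builds its result through the default
# constructor, so the result is a plain DataFrame, not a CTMatrix.
first = ct.head(1)
lost_subclass = type(first) is pd.DataFrame
has_custom_method = hasattr(first, "countries")
```

Here `first` no longer carries `countries()`, so any chained call that relies on the custom methods would fail after the first DataFrame operation.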

There are a few open issues on GitHub around this.

More generally, this is a question of composition vs. inheritance. I would be especially wary of benefit #2 of Option 1 (inheriting updates and new methods). It might seem great now, but unless you keep a close eye on updates to Pandas (and it is a fast-moving target), you can easily end up with unexpected consequences, and your code will end up intertwined with Pandas.
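For comparison, a minimal sketch of the composition approach, under assumptions: the `data` property and the `sort_by_year()` method are hypothetical names, not part of Pandas or of the question. The idea is that the class exposes only the operations it chooses to support and wraps each result itself.

```python
import pandas as pd


class CTMatrix:
    """Country x Time container that owns a DataFrame instead of being one."""

    def __init__(self, data):
        self._data = pd.DataFrame(data)

    @property
    def data(self):
        # Hand out a copy so callers cannot mutate the internal frame.
        return self._data.copy()

    def sort_by_year(self, year):
        # Delegate to pandas, then re-wrap so the caller gets a CTMatrix back.
        return CTMatrix(self._data.sort_values(by=year))


ct = CTMatrix(pd.DataFrame({"2000": [2.0, 1.0]}, index=["AUS", "NZL"]))
result = ct.sort_by_year("2000")
```

The cost is that every supported operation has to be written out, but a change in Pandas can only affect the few places where the wrapper calls into it.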
