sanguineturtle - 1 year ago
Python Question

Pandas DataFrame Object Inheritance or Object Use?

I am building a library for working with very specific structured data, and I am building my infrastructure on top of Pandas. I am writing a number of different data containers for different use cases, such as CTMatrix for Country x Time data, to house the methods appropriate for all Country x Time structured data.

I am currently debating between two options:

Option 1: Object Inheritance

import pandas as pd
class CTMatrix(pd.DataFrame):
    # methods etc. here
    pass

or Option 2: Object Use

import pandas as pd
class CTMatrix(object):
    def __init__(self, data):
        self._data = pd.DataFrame(data)  # a DataFrame instance, not the class

and then use getter and setter methods to control access to _data, etc.
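A minimal sketch of Option 2, assuming (hypothetically) that rows are countries and columns are years. The `data` property, `_validate` helper and `totals_by_country` method are illustrative names, not from the question. The property pair plays the role of the getter and setter, so every write passes through one validation point.

```python
import pandas as pd


class CTMatrix:
    """Country x Time container that wraps a DataFrame (composition).

    Hypothetical layout: index = countries, columns = years.
    """

    def __init__(self, data):
        self._data = self._validate(pd.DataFrame(data))

    @staticmethod
    def _validate(df):
        # Example invariant (an assumption): each country appears only once.
        if df.index.has_duplicates:
            raise ValueError("duplicate country labels in index")
        return df.copy()

    @property
    def data(self):
        # Getter: hand out a copy so callers cannot mutate internal state.
        return self._data.copy()

    @data.setter
    def data(self, value):
        # Setter: every replacement goes through the same validation.
        self._data = self._validate(pd.DataFrame(value))

    def totals_by_country(self):
        # Domain method built on top of the wrapped DataFrame.
        return self._data.sum(axis=1)


m = CTMatrix(pd.DataFrame({2000: [1, 2], 2001: [3, 4]}, index=["AUS", "USA"]))
print(m.totals_by_country().to_dict())
```

The cost is visible here: any DataFrame functionality you want on a `CTMatrix` has to be written (or forwarded) explicitly.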

From a software engineering perspective, is there an obvious choice here?

My thoughts so far are:

Option 1:

Pros:

  1. Can use DataFrame methods directly on the CTMatrix class (like …) without having to re-expose them through wrapper methods on the encapsulated object, as Option #2 requires

  2. Updates and new methods in Pandas are inherited automatically, except for methods overridden by local class methods

Cons:

  1. Complications with some methods, such as …, and having to pass the constructor arguments up to the superclass with
    super(MyDF, self).__init__(*args, **kw)
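For comparison, a minimal sketch of Option 1 under the same hypothetical layout (rows are countries, columns are years). It forwards all constructor arguments to `DataFrame.__init__` (Python 3's zero-argument `super()` is equivalent to the `super(MyDF, self)` form) and then checks an example invariant; `totals_by_country` is an illustrative name.

```python
import pandas as pd


class CTMatrix(pd.DataFrame):
    """Country x Time data as a DataFrame subclass (inheritance).

    Hypothetical layout: index = countries, columns = years.
    """

    def __init__(self, *args, **kw):
        # Pass every argument through unchanged, then add subclass checks.
        super().__init__(*args, **kw)
        if self.index.has_duplicates:
            raise ValueError("duplicate country labels in index")

    def totals_by_country(self):
        # `self` is the DataFrame, so pandas methods are called directly.
        return self.sum(axis=1)


m = CTMatrix({2000: [1, 2], 2001: [3, 4]}, index=["AUS", "USA"])
print(m.shape)  # inherited DataFrame attribute, no wrapper needed
print(m.totals_by_country().to_dict())
```

The check in `__init__` only runs when a `CTMatrix` is constructed directly.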

Option 2:

Pros:

  1. More control over the class and its behavior

  2. Possibly more resilient to updates in Pandas?

Cons:

  1. Having to use a getter() or a non-hidden attribute to use the object like a DataFrame, such as …
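One common way to soften that downside (a general Python technique, not something from the question): forward a small whitelist of DataFrame attributes through `__getattr__`, so read-only helpers work on the wrapper while everything else stays hidden. The class layout and the `_DELEGATED` set are hypothetical choices for illustration.

```python
import pandas as pd


class CTMatrix:
    # Hypothetical whitelist of DataFrame attributes exposed on the wrapper.
    _DELEGATED = frozenset({"head", "tail", "describe", "shape", "to_csv"})

    def __init__(self, data):
        self._data = pd.DataFrame(data)

    def __getattr__(self, name):
        # Python calls this only when normal lookup fails, i.e. for names
        # CTMatrix does not define itself.
        if name in self._DELEGATED:
            return getattr(self._data, name)
        raise AttributeError(
            f"{type(self).__name__!r} object has no attribute {name!r}"
        )


m = CTMatrix(pd.DataFrame({2000: [1, 2], 2001: [3, 4]}, index=["AUS", "USA"]))
print(m.shape)    # forwarded to the wrapped DataFrame
print(m.head(1))  # returns a plain DataFrame, not a CTMatrix
```

Anything a forwarded method returns is an ordinary pandas object, so this trades some convenience for the return-type caveat the answer describes.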

Are there any additional downsides to taking the approach in Option #1?

Answer

I would avoid subclassing DataFrame, because many DataFrame methods return a new plain DataFrame rather than another instance of your CTMatrix class, so your custom methods disappear after the first operation.
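A small sketch of that behavior; `NaiveCT` and `KeepsTypeCT` are hypothetical names. The second class overrides the `_constructor` property, which is the hook the pandas "Extending pandas" guide documents for keeping a subclass type through operations. It addresses only the return type, not the broader maintenance concern.

```python
import pandas as pd


class NaiveCT(pd.DataFrame):
    # No constructor override: pandas builds results as plain DataFrames.
    def totals_by_country(self):
        return self.sum(axis=1)


class KeepsTypeCT(pd.DataFrame):
    @property
    def _constructor(self):
        # Tells pandas which class to use for DataFrame-shaped results.
        return KeepsTypeCT

    def totals_by_country(self):
        return self.sum(axis=1)


data = {2000: [1, 2], 2001: [3, 4]}
naive = NaiveCT(data, index=["AUS", "USA"])
kept = KeepsTypeCT(data, index=["AUS", "USA"])

print(type(naive.head(1)).__name__)  # DataFrame
print(type(kept.head(1)).__name__)   # KeepsTypeCT
```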

There are a few open issues on GitHub about this, e.g.:

More generally, this is a question of composition vs. inheritance. I would be especially wary of benefit #2 of Option 1 (automatically inheriting Pandas updates). It might seem great now, but unless you keep a close eye on updates to Pandas (and it is a fast-moving target), you can easily end up with unexpected consequences, and your code will end up intertwined with Pandas internals.
