I am building a library for working with very specific structured data and I am building my infrastructure on top of Pandas. Currently I am writing a bunch of different data containers for different use cases, such as CTMatrix for Country x Time Data etc. to house methods appropriate for all CountryxTime structured data.
I am currently debating between
Option 1: Object Inheritance
methods etc. here
_data = pd.DataFrame
then use getter, setter methods to control access to _data etc.
super(MyDF, self).__init__(*args, **kw)
I would avoid subclassing
DataFrame, because many of the
DataFrame methods will return a new
DataFrame and not another instance of your
There are a few of open issues on GitHub around this e.g.:
More generally, this is a question of composition vs inheritance. I would be especially wary of benefit #2. It might seem great now, but unless you are keeping a close eye on updates to Pandas (and it is a fast moving target), you can easily end up with unexpected consequences and your code will end up intertwined with Pandas.