I want to create a class derived from `pandas.DataFrame` with a slightly different `__init__()`. I'll store some additional data in new attributes and finally call `DataFrame.__init__()`.
```python
from pandas import DataFrame

class DataFrameDerived(DataFrame):
    def __init__(self, *args, **kwargs):
        self.derived = True  # the error below is raised on this line
        super().__init__(*args, **kwargs)

DataFrameDerived({'a': [1, 2, 3]})
```
This code raises the following error when creating the new attribute (`self.derived = True`):

```
RecursionError: maximum recursion depth exceeded while calling a Python object
```
It is possible, but the implementation isn't very open to extension. Indeed, the official docs suggest using alternatives.

The implementation of `pd.DataFrame` is complex. It involves multiple inheritance with various mixins, and it also uses the attribute getting/setting hooks, like `__getattr__` and `__setattr__`, among other things to provide syntactic sugar so that `df.some_column` and `df.some_column = whatever` work without the `df['some_column']` syntax. If you look at the stack trace, you can see that `__setattr__` is involved.

Knowing this, one might blindly just use `object.__setattr__` instead to bypass this, i.e. write `object.__setattr__(self, 'derived', True)` in place of `self.derived = True`. But again, without really understanding the implementation, you are just crossing your fingers and hoping "it works". Which it may. But as noted in the linked docs, you will probably also want to override the "constructor" methods, so that your data frame type returns data frames of its own type when you call DataFrame methods.
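A minimal sketch of that route, assuming a reasonably recent pandas. It combines the `object.__setattr__` workaround with the subclassing pattern from the pandas "Extending pandas" docs: list the extra attribute in `_metadata` and override `_constructor`. Only the `derived` attribute comes from the question; the rest is an illustration that has not been checked against every pandas version.

```python
import pandas as pd


class DataFrameDerived(pd.DataFrame):
    # Names listed here are treated by pandas as metadata rather than
    # columns; __finalize__ copies them onto the results of many operations.
    _metadata = ["derived"]

    def __init__(self, *args, **kwargs):
        # Bypass DataFrame.__setattr__, which may inspect the columns
        # before they exist; object.__setattr__ writes to __dict__ directly.
        object.__setattr__(self, "derived", True)
        super().__init__(*args, **kwargs)

    @property
    def _constructor(self):
        # Make DataFrame methods return DataFrameDerived instead of DataFrame.
        return DataFrameDerived


df = DataFrameDerived({"a": [1, 2, 3]})
subset = df[df["a"] > 1]  # boolean indexing builds a new frame via _constructor
print(type(subset).__name__, subset.derived)
```

Without the `_constructor` override, `subset` would be a plain `DataFrame` and would not have the `derived` attribute.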
Instead of inheritance, an alternative is to register a custom accessor namespace. This is one of the simpler ways to extend pandas, if it fits your use case.
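A short sketch of the accessor approach using `pd.api.extensions.register_dataframe_accessor`. The accessor name `spread` and the method `column_range` are hypothetical, chosen only for illustration; the point is that the extra behaviour hangs off `df.spread` while `df` stays an ordinary `DataFrame`.

```python
import pandas as pd


# Hypothetical accessor name "spread": afterwards every DataFrame has df.spread.
@pd.api.extensions.register_dataframe_accessor("spread")
class SpreadAccessor:
    def __init__(self, pandas_obj: pd.DataFrame) -> None:
        # pandas passes the DataFrame the accessor was accessed on.
        self._obj = pandas_obj

    def column_range(self) -> pd.Series:
        # Hypothetical helper: max minus min for each numeric column.
        numeric = self._obj.select_dtypes(include="number")
        return numeric.max() - numeric.min()


df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 40, 20]})
print(df.spread.column_range())
```

Because nothing is subclassed, there is no `__init__` or `_constructor` to get right; the trade-off is that state stored on the accessor does not travel with the results of DataFrame operations.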
Without knowing more about what exactly you are trying to accomplish, it is difficult to suggest the best way forward. But you should definitely start by reading all of the "Extending pandas" docs I've linked to.