Is there an R function or package that records the operations applied to a tibble/data frame?
For example, suppose I did the following:
library(dplyr)
data(iris)
my_table <- iris %>% filter(Sepal.Length > 6) %>% filter(Species == 'virginica')
I would want the output to look something like this:
display_filter_function(my_table)
output:
Step  filter
1     Sepal.Length > 6
2     Species == 'virginica'
I imagine this would be similar to the functionality provided by the recipes package, but without needing to use the step_*() family of functions.
I've written a little module for you. It is a standalone resource with only one dependency beyond base R: namely dplyr itself. The module is long, so I have put it at the bottom of this post. You can find the code itself under the Module section, and its usage is demonstrated under the Usage section.

This approach could in principle be extended to all dplyr functions, and to other generic functions as well. To keep things manageable, I have implemented it for dplyr::filter() alone.

Background
This module leverages the R concept of generic functions, like print(), format(), mean() and summary(). Suppose you wish to print() a data.frame object. The generic print() function does not do the work itself! Rather, it dispatches to some print.*() method, via the line UseMethod("print").

Now the native data.frame class has its own special print() method, called print.data.frame(). So when UseMethod() seeks a matching ("print") method, it finds print.data.frame() ready and waiting! It is the print.data.frame() function that actually handles the printing for the data.frame.

More generally, a generic function like fn() can be implemented for an S3 class like cls, with a method of the form fn.cls().
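A small, self-contained illustration of the dispatch described above. It reuses the placeholder names fn() and cls from the text; they are illustrative only and not part of any package. It also shows the body of base R's print() generic, which is the single UseMethod("print") call.

```r
# Toy generic named after the fn() placeholder: like print(), it does no
# work itself and only dispatches on the class of its first argument.
fn <- function(x, ...) UseMethod("fn")

# Method for the (hypothetical) S3 class "cls".
fn.cls <- function(x, ...) "handled by fn.cls()"

# Fallback for any object whose class has no fn.*() method.
fn.default <- function(x, ...) "handled by fn.default()"

obj <- structure(list(), class = "cls")

fn(obj)  # returns "handled by fn.cls()"
fn(42)   # no fn.numeric() exists, so this returns "handled by fn.default()"

# Base R's print() generic has the same one-line shape.
body(print)
```

Deleting fn.cls() would send obj to fn.default() as well, which is the fallback behaviour described next.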
The fn.default() method handles fn() for unimplemented classes. So in the absence of a print.cls() function, UseMethod() would dispatch a cls object to print.default().

Approach
By defining a custom S3 class called hst_obj (for "historical object"), I override the generic behavior of dplyr::filter(), which is designed to dispatch via UseMethod("filter"). To that end, I implement the function filter.hst_obj().

When you call dplyr::filter() on a hst_obj object, filter.hst_obj() jumps into action! Whenever it filters the object, it also records the filtration criteria in the special attribute obj_hst, which maintains the "object history".

This history is a tibble with four columns:
step: the filter() step in the workflow.
order: the criterion within that filter() step.
expr: the actual code (language) for the criterion (Sepal.Length > 6), useful for programmatic manipulation in R.
text: a textual (character) representation of that code ("Sepal.Length > 6"), for visual clarity.

Usage
You'll want to load dplyr itself, and then source() the module (mod.R) from (say) your working directory.

Warning
The modular function filter.hst_obj() must be loaded into the same workspace where you use dplyr::filter(). Per the documentation:

Here is a simple workflow on the iris dataset.

Now we transform the dataset into a "historical object" called iris_hst, via as_hst_obj().

Per is_hst_obj(), it is indeed a historical object.

However, its history via get_hst() is still blank.

We now perform the same workflow on the historical dataset iris_hst, which yields a consistent output.
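To make the mechanism concrete, here is a minimal sketch of how a filter.hst_obj() method and an as_hst_obj() constructor could be written. The class name hst_obj, the attribute obj_hst and the four history columns follow the description above; the function bodies are assumptions (criteria are captured with base R's substitute() and deparse()), not the module's actual code.

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical constructor: tag a data frame as a "historical object"
# and give it an empty history tibble in the "obj_hst" attribute.
as_hst_obj <- function(x) {
  attr(x, "obj_hst") <- tibble(step = integer(), order = integer(),
                               expr = list(), text = character())
  class(x) <- unique(c("hst_obj", class(x)))
  x
}

# Sketch of the S3 method that dplyr::filter() reaches via UseMethod().
filter.hst_obj <- function(.data, ...) {
  hst <- attr(.data, "obj_hst")

  # Capture the unevaluated criteria; drop named arguments such as .preserve.
  exprs <- as.list(substitute(list(...)))[-1]
  if (!is.null(names(exprs))) exprs <- exprs[names(exprs) == ""]
  exprs <- unname(exprs)

  step <- if (nrow(hst) == 0) 1L else max(hst$step) + 1L
  new_rows <- tibble(
    step  = rep(step, length(exprs)),
    order = seq_along(exprs),
    expr  = exprs,
    text  = vapply(exprs, function(e) paste(deparse(e), collapse = " "),
                   character(1))
  )

  # Filter the plain data (class tag and history removed), so dispatch
  # reaches dplyr's own data.frame method instead of recursing.
  plain <- .data
  attr(plain, "obj_hst") <- NULL
  class(plain) <- setdiff(class(plain), "hst_obj")
  out <- dplyr::filter(plain, ...)

  # Re-attach the class tag and the extended history.
  attr(out, "obj_hst") <- bind_rows(hst, new_rows)
  class(out) <- c("hst_obj", class(out))
  out
}

iris_hst <- as_hst_obj(iris)
res_hst <- iris_hst %>%
  filter(Sepal.Length > 6) %>%
  filter(Species == "virginica")

# Inspect the recorded steps (the expr list column is omitted for brevity).
attr(res_hst, "obj_hst")[, c("step", "order", "text")]
```

Because this sketch defines filter.hst_obj() in the global environment, UseMethod("filter") finds it when filter() is called from there, in line with the warning above. Passing several criteria to one call, as in filter(iris_hst, Sepal.Length > 6, Petal.Width > 2), would record one step with order 1 and 2.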
Crucially, we can now access the history via get_hst().

We can also "reset" the history via reset_hst(), which clears the tibble of historical data.

Finally, we can revert to an "unhistorical" object via un_hst_obj(), which removes the hst_obj classification and deletes the obj_hst attribute.

Module
Here is the module. I recommend saving it locally as (say) mod.R in your working directory. I also recommend the box package, which can load such modules painlessly via box::use(./mod).
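For orientation, here is a minimal sketch of the helpers used under Usage: is_hst_obj(), get_hst(), reset_hst() and un_hst_obj(). The bodies are assumptions written against the hst_obj class and obj_hst attribute described under Approach; they are not a copy of the module. The example object is built by hand with structure(), so the sketch runs without any filter() method.

```r
suppressPackageStartupMessages(library(dplyr))  # for tibble()

# Hypothetical helpers; they assume the "hst_obj" class and the
# "obj_hst" history attribute described under Approach.
is_hst_obj <- function(x) inherits(x, "hst_obj")

get_hst <- function(x) {
  stopifnot(is_hst_obj(x))
  attr(x, "obj_hst")
}

# Keep the columns of the history tibble but drop all of its rows.
reset_hst <- function(x) {
  stopifnot(is_hst_obj(x))
  attr(x, "obj_hst") <- attr(x, "obj_hst")[0, ]
  x
}

# Remove the class tag and delete the history attribute.
un_hst_obj <- function(x) {
  attr(x, "obj_hst") <- NULL
  class(x) <- setdiff(class(x), "hst_obj")
  x
}

# A hand-built historical object with one recorded criterion.
x <- structure(
  data.frame(a = 1:3),
  class   = c("hst_obj", "data.frame"),
  obj_hst = tibble(step = 1L, order = 1L,
                   expr = list(quote(a > 1)), text = "a > 1")
)

get_hst(x)             # one recorded criterion
get_hst(reset_hst(x))  # same four columns, zero rows
un_hst_obj(x)          # a plain data.frame again
```

Keeping the zero-row tibble in reset_hst(), rather than deleting the attribute, means later filter() calls on the reset object still find a history with the expected columns.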