I'm experiencing some issues with the multivariate feature drift function in deepchecks. I'm trying to plot a rolling multivariate drift time series to look for instances of high drift scores, but the rolling score is extremely volatile.
The dataset has only 2 features (x-values). Both are numeric. There are no categorical features. The training/reference data has 150 rows. The test data has 480 rows.
I start by calculating the multivariate drift score of the first 70 rows of the test data against the training data, and I append this score to a list.
Then I drop the first data point (sorted by date) and add the next one, so the window stays at 70 data points. I use this new batch of 70 to calculate the multivariate drift score against the training data and append it to the same list (i.e. my rolling drift score is calculated in steps of 1).
I continue shifting the window down by one data point each time until I run out of test data.
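To make the procedure concrete, here is a minimal sketch of the rolling loop. The names rolling_drift_scores and mean_shift_score are hypothetical and only for illustration. The scorer is a simple placeholder, not the deepchecks check; in my actual run the score comes from the deepchecks multivariate drift check.

```python
from typing import Callable, List, Sequence, Tuple

Row = Tuple[float, float]  # two numeric features per row


def rolling_drift_scores(
    reference: Sequence[Row],
    test: Sequence[Row],
    score_fn: Callable[[Sequence[Row], Sequence[Row]], float],
    window: int = 70,
    step: int = 1,
) -> List[float]:
    """Score every window of `test` (already sorted by date) against `reference`."""
    scores = []
    # The last valid window starts at len(test) - window.
    for start in range(0, len(test) - window + 1, step):
        batch = test[start:start + window]
        scores.append(score_fn(reference, batch))
    return scores


def mean_shift_score(reference: Sequence[Row], batch: Sequence[Row]) -> float:
    """Placeholder scorer (NOT deepchecks): absolute gap between first-feature means."""
    ref_mean = sum(row[0] for row in reference) / len(reference)
    batch_mean = sum(row[0] for row in batch) / len(batch)
    return abs(ref_mean - batch_mean)


# Synthetic stand-ins with the same shapes as my data: 150 reference rows, 480 test rows.
reference = [(float(i % 10), float(i % 7)) for i in range(150)]
test = [(float(i % 10), float(i % 7)) for i in range(480)]

scores = rolling_drift_scores(reference, test, mean_shift_score)
print(len(scores))  # 480 - 70 + 1 = 411 windows
```

With a window of 70 and a step of 1, consecutive batches share 69 of their 70 rows, so I would expect neighbouring scores to be close to each other.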
What I get in the end is a very volatile chart. One window can return a drift score of 0.5, the next 0, and the one after that 0.2.
What is going on with this multivariate drift score function? Is there a way to fix this, or is my dataset too small? I read that deepchecks uses the HistGradientBoostingClassifier class from sklearn.ensemble. Perhaps the dataset/batch size is too small to obtain a smooth graph.
Thanks for your help!