Issues with DeepChecks' Multivariate Drift

Asked by Tan Yu on 30 June 2023 at 01:28

I'm having some issues with the multivariate feature drift check in deepchecks. I'm trying to plot a rolling multivariate drift score as a time series to find periods of high drift, but the rolling score is extremely volatile.

The dataset has only 2 features (x-values). Both are numeric. There are no categorical features. The training/reference data has 150 rows. The test data has 480 rows.

I start by calculating the multivariate drift score of the first 70 rows of the test data against the training data and append this score to a list.

Then I drop the first data point (the data is sorted by date) and add the next one, so the window stays at 70 data points. I calculate the multivariate drift score of this new batch of 70 against the training data and append it to the list (i.e., the rolling drift score advances one data point at a time).

I continue shifting the window down by one data point each time until I run out of test data.
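The windowing can be sketched in plain Python as follows. This is a minimal sketch: `rolling_windows` and `test_rows` are hypothetical names, and the actual drift computation (running deepchecks' multivariate drift check on each window against the 150 reference rows) is left out so the example runs on its own.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def rolling_windows(rows: Sequence[T], window: int = 70, step: int = 1) -> Iterator[List[T]]:
    """Yield consecutive windows of `window` rows, advancing `step` rows each time.

    `rows` is assumed to be sorted by date already (hypothetical helper).
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    # The last window starts at len(rows) - window, so it ends exactly at the final row.
    for start in range(0, len(rows) - window + 1, step):
        yield list(rows[start:start + window])


# Stand-in for the 480 date-sorted test rows; each window would get one drift score.
test_rows = list(range(480))
windows = list(rolling_windows(test_rows))
print(len(windows))  # 411
```

With 480 test rows, a window of 70 and a step of 1 give 480 - 70 + 1 = 411 windows. Adjacent windows share 69 of their 70 rows, so the underlying data barely changes from one score to the next; a jump from 0.5 to 0 between neighbouring windows is therefore more plausibly noise in the drift estimate than real drift.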

What I end up with is a very volatile chart: one window might return a drift score of 0.5, the next 0, and the one after that 0.2.

What is going on with this multivariate drift score? Is there a way to fix this, or is my dataset too small? I read that deepchecks uses HistGradientBoostingClassifier() from sklearn.ensemble, so perhaps the batch size is too small to obtain a smooth graph.
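As far as I understand the deepchecks documentation, the multivariate drift check trains a domain classifier to tell reference rows from test rows, measures its AUC on held-out rows, and rescales that AUC to a drift score of roughly max(0, 2 * AUC - 1). The sketch below does not call deepchecks. It assumes a classifier that has learned nothing (i.e. no real drift) by drawing random scores, and the holdout sizes of 45 reference and 21 test rows (30% of 150 and 70) are an assumption, not deepchecks' actual split. The function names are hypothetical.

```python
import random
import statistics
from typing import List, Sequence


def auc(neg_scores: Sequence[float], pos_scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney statistic: P(pos score > neg score), ties count 1/2."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def drift_score(auc_value: float) -> float:
    """Rescale AUC to [0, 1]; chance level (0.5) and anything below it map to 0."""
    return max(0.0, 2.0 * auc_value - 1.0)


def simulate_no_drift(n_ref: int = 45, n_test: int = 21,
                      trials: int = 200, seed: int = 0) -> List[float]:
    """Drift scores from a domain classifier with no real signal (assumed holdout sizes)."""
    rng = random.Random(seed)
    scores = []
    for _ in range(trials):
        # With no drift, the classifier's outputs are unrelated to the true label.
        ref = [rng.random() for _ in range(n_ref)]
        test = [rng.random() for _ in range(n_test)]
        scores.append(drift_score(auc(ref, test)))
    return scores


scores = simulate_no_drift()
zeros = sum(s == 0.0 for s in scores)
print(f"zero scores: {zeros}/{len(scores)}, max: {max(scores):.2f}, "
      f"stdev: {statistics.pstdev(scores):.2f}")
```

Under these assumptions the AUC on about 66 held-out rows has a standard deviation of roughly 0.077 even with no drift, so the rescaled score swings by about 0.15, and the clipping at 0 turns every below-chance AUC (about half of them) into exactly 0. That pattern of zeros interleaved with scores of a few tenths resembles the chart described above, which suggests the volatility comes from the small evaluation sample rather than from the data.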

Thanks for your help!
