Example input
The drift score, in this case, is the ROC AUC score of the domain classifier computed on a validation dataset. This score is compared to the ROC AUC of a random classifier at a set percentile. To ensure the result is statistically meaningful, we repeat the calculation 1000 times with randomly assigned target class probabilities. This produces a distribution with a mean of 0.5. We then take the 95th percentile of this distribution and compare it to the ROC AUC score of the domain classifier. If the classifier score is higher, we consider data drift to be detected. (You can also set a different percentile as a parameter.) This approach helps identify how well a machine learning model can distinguish between current and reference data. It also protects against false positive drift results, since we explicitly compare the classifier score against the "best random score" we could obtain.
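The procedure described above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the implementation of any particular library. The function names (roc_auc, random_auc_threshold, drift_detected) are hypothetical, the random "probabilities" are assumed to be drawn uniformly from [0, 1), and the percentile is approximated by indexing into the sorted list of random scores.

```python
import random


def roc_auc(labels, scores):
    """ROC AUC via the rank-sum (Mann-Whitney U) formula; tied scores share an average rank."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j to cover the run of tied scores starting at i.
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(label for _, label in pairs)
    n_neg = n - n_pos
    rank_sum_pos = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def random_auc_threshold(labels, n_repeats=1000, percentile=95, seed=0):
    """Score random predictions n_repeats times and return the chosen percentile."""
    rng = random.Random(seed)
    aucs = sorted(
        roc_auc(labels, [rng.random() for _ in labels]) for _ in range(n_repeats)
    )
    # Simple index-based approximation of the percentile (an assumption of this sketch).
    idx = min(len(aucs) - 1, int(len(aucs) * percentile / 100))
    return aucs[idx]


def drift_detected(labels, classifier_scores, **kwargs):
    """Flag drift when the classifier's ROC AUC exceeds the random-score threshold."""
    score = roc_auc(labels, classifier_scores)
    threshold = random_auc_threshold(labels, **kwargs)
    return score > threshold, score, threshold


# Labels: 0 = reference row, 1 = current row (validation split of the domain classifier).
labels = [0] * 100 + [1] * 100
separable_scores = [0.1] * 100 + [0.9] * 100   # classifier separates the datasets
uninformative_scores = [0.5] * 200             # classifier cannot tell them apart

print(drift_detected(labels, separable_scores)[0])
print(drift_detected(labels, uninformative_scores)[0])
```

In this sketch the threshold depends only on the labels and the number of repetitions, so a smaller validation set yields a wider random distribution and therefore a higher bar for declaring drift.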
Example output
Question:
What is the purpose of the 95th percentile of the distribution?
A. To serve as the threshold against which the ROC AUC score of the domain classifier is compared
B. To identify how well a machine learning model can distinguish between current and reference data
C. To protect against false positive drift results
D. To set a different percentile as a parameter
Correct Answer: A. To serve as the threshold against which the ROC AUC score of the domain classifier is compared