A technique used to evaluate how different the test set is from the training set, commonly used in Kaggle competitions to assess the likelihood of a leaderboard shakeup. It's a form of black-box testing for datasets.
- Label each element of a dataset: 1 if in train or 0 if in test.
- Train a model to predict the label using features from the set.
If the model performs well (ROC AUC clearly above 0.5, the score expected from random guessing), then there is likely a difference between the datasets.
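The two steps and the ROC AUC check can be sketched in plain Python. This is only a minimal illustration on synthetic data, not the code used for any competition: the note does not say which model to use, so the small logistic regression here is an assumption, and the names `roc_auc`, `fit_logistic`, `adversarial_auc` and `sample` are hypothetical helpers. The score is computed on a held-out half so that a high value is not just the model memorising rows.

```python
import math
import random


def roc_auc(labels, scores):
    """ROC AUC: chance that a random label-1 row scores above a random label-0 row."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_logistic(rows, labels, epochs=200, lr=0.1):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    n, d = len(rows), len(rows[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(rows, labels):
            z = b + sum(wi * xi for wi, xi in zip(w, x))
            err = 1.0 / (1.0 + math.exp(-z)) - y  # predicted probability minus label
            grad_b += err
            for j in range(d):
                grad_w[j] += err * x[j]
        b -= lr * grad_b / n
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
    return w, b


def adversarial_auc(train_rows, test_rows, seed=0):
    """Label rows by origin, fit on one half, return ROC AUC on the held-out half."""
    rows = list(train_rows) + list(test_rows)
    labels = [1] * len(train_rows) + [0] * len(test_rows)  # 1 = train, 0 = test
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)
    half = len(order) // 2
    fit_idx, eval_idx = order[:half], order[half:]
    w, b = fit_logistic([rows[i] for i in fit_idx], [labels[i] for i in fit_idx])
    scores = [b + sum(wi * xi for wi, xi in zip(w, rows[i])) for i in eval_idx]
    return roc_auc([labels[i] for i in eval_idx], scores)


# Synthetic demo data (assumption, not from any competition): two Gaussian features.
rng = random.Random(42)


def sample(n, shift):
    """Draw n rows; `shift` moves the mean of the first feature."""
    return [[rng.gauss(shift, 1.0), rng.gauss(0.0, 1.0)] for _ in range(n)]


same_auc = adversarial_auc(sample(300, 0.0), sample(300, 0.0))
shifted_auc = adversarial_auc(sample(300, 0.0), sample(300, 2.0))
print(f"same distribution: {same_auc:.3f}, shifted test set: {shifted_auc:.3f}")
```

When both sets come from the same distribution the score should sit near 0.5, and it should rise well above 0.5 once the second set is shifted. With real tabular features a stronger off-the-shelf classifier would probably be the more usual choice; the hand-written model only keeps the sketch dependency-free.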
In this example, I compared the train and test set of the Plant Pathology 2020 - FGVC7 competition.