Adversarial Validation
A technique used to evaluate how different the test set is from the training set, commonly used in Kaggle competitions to assess the likelihood of a Leaderboard Shakeup. It's a form of Black-box Tests for datasets.
Steps:
- Label each element of the combined dataset: 1 if it comes from the train set, 0 if it comes from the test set.
- Train a model to predict this label using the features of the set.
- If the model performs well (ROC AUC significantly above 0.5), then there is likely a difference between the datasets.
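The steps above can be sketched as follows, using scikit-learn and synthetic data (the feature matrices and the deliberate shift are illustrative assumptions, not from any real competition):

```python
# Minimal adversarial validation sketch, assuming scikit-learn and NumPy.
# The "train"/"test" arrays below are synthetic; one feature of the test
# set is shifted on purpose to simulate a distribution difference.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, size=(500, 5))
test = rng.normal(0.0, 1.0, size=(500, 5))
test[:, 0] += 1.0  # deliberate shift in feature 0

# Step 1: label each row 1 if from train, 0 if from test, then pool them.
X = np.vstack([train, test])
y = np.concatenate([np.ones(len(train)), np.zeros(len(test))])

# Steps 2-3: train a classifier to separate the sets and check its ROC AUC.
auc = cross_val_score(
    RandomForestClassifier(n_estimators=100, random_state=0),
    X, y, cv=5, scoring="roc_auc",
).mean()
print(f"adversarial AUC: {auc:.3f}")
```

An AUC near 0.5 means the classifier cannot tell the sets apart (they look alike); an AUC well above 0.5, as produced by the shifted feature here, signals a distribution difference. Feature importances of the adversarial model can then point to which features drift.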
In this example, I compared the train and test set of the Plant Pathology 2020 - FGVC7 competition.