Adversarial Validation

May 12, 2021 permanent EDA

A technique used to evaluate how different the test set is from the training set, commonly used in Kaggle competitions to assess the likelihood of a Leaderboard Shakeup. It's a form of Black-box Test for datasets.

Steps:

Combine the train and test sets, and label each element 1 if it comes from train or 0 if it comes from test.
Train a model to predict that label using the features from the combined set.

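If the model performs well (ROC AUC clearly above 0.5, the score expected from random guessing), then there is likely a difference between the datasets. A score close to 0.5 means the model cannot tell the two sets apart.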
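The steps above can be sketched on tabular data roughly as follows. This is a minimal illustration, not the code used for the competition example: the `adversarial_auc` helper, the random forest classifier, the 5-fold cross-validation and the synthetic Gaussian data are all assumptions made for the sketch, built on scikit-learn and NumPy.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict


def adversarial_auc(train_X, test_X, seed=0):
    """Out-of-fold ROC AUC of a classifier separating train rows from test rows.

    Hypothetical helper written for this note; not part of any library.
    """
    X = np.vstack([train_X, test_X])
    # Step 1: label rows 1 if they come from train, 0 if they come from test.
    y = np.concatenate([np.ones(len(train_X)), np.zeros(len(test_X))])
    # Step 2: train a model to predict that label. Out-of-fold predictions
    # stop the score being inflated by rows the model has already seen.
    model = RandomForestClassifier(n_estimators=100, random_state=seed)
    proba = cross_val_predict(model, X, y, cv=5, method="predict_proba")[:, 1]
    return roc_auc_score(y, proba)


# Synthetic data (an assumption for illustration only).
rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, size=(500, 5))
same = rng.normal(0.0, 1.0, size=(500, 5))     # drawn like train
shifted = rng.normal(1.0, 1.0, size=(500, 5))  # every feature mean moved by 1

auc_same = adversarial_auc(train, same)
auc_shifted = adversarial_auc(train, shifted)
print(f"same distribution:    AUC = {auc_same:.2f}")
print(f"shifted distribution: AUC = {auc_shifted:.2f}")
```

When the two sets come from the same distribution the AUC should sit near 0.5, and it should climb towards 1.0 as the sets diverge. For image data such as the competition below, the same idea applies with an image classifier in place of the random forest.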

In this example, I compared the train and test set of the Plant Pathology 2020 - FGVC7 competition.

Backlinks

Out-of-Domain

Tags

Notes by Lex Toumbourou
