SMOTE
SMOTE, or Synthetic Minority Oversampling Technique, is a technique for addressing class imbalance in datasets by creating synthetic examples for a minority class.
The steps involved in SMOTE are:
- Randomly pick a point from the minority class.
- Find its K-Nearest Neighbours in the minority class.
- Randomly select one of these neighbours and use it to create a synthetic example.
Image by Emilia Orellana
A synthetic sample is created by taking the vector difference between the feature vector of the randomly selected neighbour and the original instance, multiplying this difference by a random number between 0 and 1, and then adding it to the feature vector of the original instance.
Here is a mathematical representation:
Let be a randomly chosen minority class sample and be a randomly chosen neighbour of .
The synthetic instance is created as follows:
Where is a random number between 0 and 1.
This interpolation method allows the synthetic samples to be as realistic and representative as possible, lying directly along the line segment between two existing minority class samples in the feature space.