Binary Cross-Entropy Loss
Binary Cross-Entropy (BCE), also known as log loss, is a loss function used to train binary and multi-label classification models.
It's nearly identical to Negative Log-Likelihood except it supports any number of positive labels (including zero).
For each value in a set of model outputs, we first apply the Sigmoid Function to get a prediction pred between 0 and 1, then take -log(pred) if the corresponding label is positive or -log(1-pred) if it is negative.
For a single binary output, the function can be expressed as:
{% notebook permanent/notebooks/bce-loss-function.ipynb cells[0:2] %}
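Or in math:
$$
L(p, y) = -\left(y \log(p) + (1 - y) \log(1 - p)\right)
$$
Where $p$ is the model's prediction and $y$ is the true label.
Since $y$ will either be $1$ or $0$, either $y$ or $(1 - y)$ will be 0, ensuring we only keep one value. That's equivalent to the if statement in code.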
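To make the term selection concrete, here is a minimal pure-Python sketch that evaluates the single-output loss on one raw model output. The helper names `sigmoid` and `binary_cross_entropy` and the example value are illustrative assumptions, not taken from the notebook.

```python
import math


def sigmoid(x: float) -> float:
    """Squash a raw model output (logit) into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def binary_cross_entropy(logit: float, label: int) -> float:
    """BCE for one output: -(y*log(p) + (1-y)*log(1-p)), with p = sigmoid(logit)."""
    p = sigmoid(logit)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


logit = 2.0  # arbitrary example value (assumption for illustration)

# Positive label: the (1 - y) term is multiplied by 0, leaving -log(p).
loss_pos = binary_cross_entropy(logit, 1)
# Negative label: the y term is multiplied by 0, leaving -log(1 - p).
loss_neg = binary_cross_entropy(logit, 0)

print(f"label=1: {loss_pos:.4f}")  # label=1: 0.1269
print(f"label=0: {loss_neg:.4f}")  # label=0: 2.1269
```

A confident positive prediction is cheap when the label is positive and expensive when it is negative, which is the behaviour the loss is meant to reward.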
For multi-label outputs, the function takes the mean (or sometimes the sum) of the log values across all labels:
{% notebook permanent/notebooks/bce-loss-function.ipynb cells[2:4] %}
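That is represented in math as follows, where $N$ is the number of outputs, $p_i$ is the prediction for output $i$, and $y_i$ is its true label:
$$
L = -\frac{1}{N} \sum_{i=1}^{N} \left(y_i \log(p_i) + (1 - y_i) \log(1 - p_i)\right)
$$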
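As a rough numeric illustration of the multi-label case, the sketch below computes the per-label losses in plain Python and reduces them with either a mean or a sum. The function name `multi_label_bce`, its `reduction` argument, and the example values are assumptions made for this sketch.

```python
import math


def sigmoid(x: float) -> float:
    """Squash a raw model output (logit) into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def multi_label_bce(logits, labels, reduction="mean"):
    """Per-label BCE, reduced with a mean (default) or a sum."""
    losses = []
    for logit, label in zip(logits, labels):
        p = sigmoid(logit)
        # Keep -log(p) for positive labels and -log(1 - p) for negative ones.
        losses.append(-math.log(p) if label == 1 else -math.log(1.0 - p))
    total = sum(losses)
    return total / len(losses) if reduction == "mean" else total


# Made-up example: three labels, only the first one is positive.
logits = [2.0, -1.0, 0.5]
labels = [1, 0, 0]

mean_loss = multi_label_bce(logits, labels)
sum_loss = multi_label_bce(logits, labels, reduction="sum")

print(f"mean: {mean_loss:.4f}, sum: {sum_loss:.4f}")
```

The mean keeps the loss on the same scale regardless of how many labels there are, while the sum grows with the label count.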
PyTorch provides the function via the nn.BCELoss class. It's the equivalent of nn.NLLLoss in multi-class classification with a single true label per input.
{% notebook permanent/notebooks/bce-loss-function.ipynb cells[4:5] %}
which is equivalent to this function:
{% notebook permanent/notebooks/bce-loss-function.ipynb cells[5:7] %}
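Use nn.BCEWithLogitsLoss if your model architecture doesn't apply the Sigmoid Function in the final layer; it applies the sigmoid internally before computing the loss. It plays the same role for binary and multi-label problems that nn.CrossEntropyLoss plays for multi-class problems in PyTorch (see Categorical Cross-Entropy Loss), since both take raw, unnormalised model outputs.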
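The relationship between the two PyTorch classes can be checked with a short sketch. The tensors below are made-up example values; the sketch only assumes the default `reduction="mean"` behaviour of both classes.

```python
import torch
from torch import nn

# Made-up raw model outputs (logits) and float targets for one input with three labels.
logits = torch.tensor([[2.0, -1.0, 0.5]])
targets = torch.tensor([[1.0, 0.0, 0.0]])

# nn.BCELoss expects probabilities, so the sigmoid is applied first.
loss_from_probs = nn.BCELoss()(torch.sigmoid(logits), targets)

# nn.BCEWithLogitsLoss takes the raw logits and applies the sigmoid internally.
loss_from_logits = nn.BCEWithLogitsLoss()(logits, targets)

print(torch.allclose(loss_from_probs, loss_from_logits))
```

Both calls should agree up to floating-point error, so the choice between them comes down to whether the model's final layer already applies a sigmoid.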
(Howard et al., 2020) (pg. 256-257)
References
Jeremy Howard, Sylvain Gugger, and Soumith Chintala. Deep Learning for Coders with Fastai and PyTorch: AI Applications without a PhD. O'Reilly Media, Inc., Sebastopol, California, 2020. ISBN 978-1-4920-4552-6.