Binary Cross-Entropy Loss
Binary Cross-Entropy (BCE), also known as log loss, is a loss function used to train binary and multi-label classification models.
It's nearly identical to Negative Log-Likelihood except it supports any number of positive labels (including zero).
For each value in a set of model outputs, we first apply the Sigmoid Activation Function to get a prediction between 0 and 1, then take -log(pred) if the corresponding label is positive or -log(1-pred) if it is negative.
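The two steps described above (sigmoid first, then the log) can be sketched in plain Python. The helper names `sigmoid` and `bce_from_logit` are illustrative assumptions, not part of any library, and the input is assumed to be a raw, unbounded model output (a logit).

```python
from math import exp, log

def sigmoid(logit):
    """Squash a raw model output (logit) into the open interval (0, 1)."""
    return 1 / (1 + exp(-logit))

def bce_from_logit(logit, label):
    """Apply sigmoid first, then take -log(pred) or -log(1 - pred)."""
    pred = sigmoid(logit)
    return -log(pred) if label == 1 else -log(1 - pred)

# A confident prediction that matches the label yields a small loss;
# the same logit paired with the opposite label yields a large one.
print(bce_from_logit(3.0, 1))
print(bce_from_logit(3.0, 0))
```

The examples that follow skip the sigmoid step and start from values that are already probabilities.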
For a single binary output, the function can be expressed as:
from math import log

def binary_cross_entropy_single_label(pred, label):
    # pred is a probability in (0, 1); label is 0 or 1.
    if label == 1:
        return -log(pred)
    return -log(1-pred)

pred = 0.99
label = 1
binary_cross_entropy_single_label(pred, label)
0.01005033585350145
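Or in math:

$$L(\hat{y}, y) = -\left(y \cdot \log(\hat{y}) + (1 - y) \cdot \log(1 - \hat{y})\right)$$

Where $\hat{y}$ is the model's prediction and $y$ is the true label.
Since $y$ will be either $0$ or $1$, either $y$ or $1 - y$ will be 0, ensuring we only keep one term. That's equivalent to the if statement in code.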
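As a quick check of that equivalence, the following sketch compares a single-expression form of the loss with the branching form. Both function names are hypothetical, and the predictions are assumed to lie strictly between 0 and 1.

```python
from math import log, isclose

def bce_closed_form(pred, label):
    # Both terms are always evaluated; the label (0 or 1) zeroes one of them out.
    return -(label * log(pred) + (1 - label) * log(1 - pred))

def bce_branching(pred, label):
    # The same rule written as an explicit branch.
    return -log(pred) if label == 1 else -log(1 - pred)

for pred, label in [(0.99, 1), (0.05, 0), (0.3, 1), (0.3, 0)]:
    assert isclose(bce_closed_form(pred, label), bce_branching(pred, label))
```

One practical difference: the single-expression form evaluates both logarithms, so it fails on a prediction of exactly 0 or 1 even when the offending term would be multiplied by zero.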
For multi-label outputs, the function takes the mean (or sometimes sum) of each of the log values:
from numpy import mean

def binary_cross_entropy(preds, labels):
    return mean([
        binary_cross_entropy_single_label(pred, label)
        for pred, label in zip(preds, labels)
    ])

preds = [0.99, 0.05, 0.95]
labels = [1, 0, 1]
binary_cross_entropy(preds, labels)
0.03754564154286754
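That is represented in math as follows:

$$L(\hat{y}, y) = -\frac{1}{N} \sum_{i=1}^{N} \left(y_i \cdot \log(\hat{y}_i) + (1 - y_i) \cdot \log(1 - \hat{y}_i)\right)$$

Where $N$ is the number of outputs, $\hat{y}_i$ is the $i$-th prediction and $y_i$ is its label.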
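The choice between mean and sum can be made explicit with a small sketch. The function name and its `reduction` parameter are assumptions for illustration (the parameter name is borrowed from the `reduction` argument of PyTorch's loss classes), and the example only uses the standard library.

```python
from math import log

def binary_cross_entropy_reduced(preds, labels, reduction="mean"):
    """BCE over several outputs; `reduction` is an illustrative argument."""
    losses = [
        -log(pred) if label == 1 else -log(1 - pred)
        for pred, label in zip(preds, labels)
    ]
    if reduction == "mean":
        return sum(losses) / len(losses)
    if reduction == "sum":
        return sum(losses)
    raise ValueError(f"unknown reduction: {reduction!r}")

preds = [0.99, 0.05, 0.95]
labels = [1, 0, 1]
mean_loss = binary_cross_entropy_reduced(preds, labels)        # about 0.0375
sum_loss = binary_cross_entropy_reduced(preds, labels, "sum")  # three times the mean

# Multi-label targets may contain no positive label at all.
no_positive = binary_cross_entropy_reduced([0.1, 0.2, 0.05], [0, 0, 0])
```

With a sum, the loss grows with the number of outputs, so the mean is usually easier to compare across different output sizes.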
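PyTorch provides the function via the nn.BCELoss class. It plays the same role that nn.NLLLoss plays in multi-class classification with a single true label per input: both expect the final activation to have been applied already (sigmoid for nn.BCELoss, log-softmax for nn.NLLLoss).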
from torch import tensor, where, nn
preds = tensor([0.99, 0.05, 0.95]).float()
labels = tensor([1, 0, 1]).float()
nn.BCELoss()(preds, labels)
tensor(0.0375)
which is equivalent to this function:

def binary_cross_entropy(preds, labels):
    return -where(labels==1, preds, 1-preds).log().mean()

binary_cross_entropy(preds, labels)
tensor(0.0375)
Use nn.BCEWithLogitsLoss if your model architecture doesn't apply the Sigmoid Activation Function on the final layer; it applies the sigmoid internally before computing the loss. It is the binary counterpart of nn.CrossEntropyLoss in PyTorch, which likewise takes raw logits (see Categorical Cross-Entropy Loss).
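Folding the sigmoid into the loss also allows a numerically safer rearrangement, and the PyTorch documentation cites numerical stability as a reason to prefer nn.BCEWithLogitsLoss. The sketch below illustrates that idea in plain Python. It is not PyTorch's actual implementation, and both function names are hypothetical.

```python
from math import exp, log, log1p, isclose

def bce_sigmoid_then_log(logit, label):
    # Naive route: sigmoid first, then the BCE rule.
    pred = 1 / (1 + exp(-logit))
    return -log(pred) if label == 1 else -log(1 - pred)

def bce_with_logits(logit, label):
    # Algebraically identical, rearranged so that log(0) cannot occur:
    # max(x, 0) - x * y + log(1 + exp(-|x|))
    return max(logit, 0.0) - logit * label + log1p(exp(-abs(logit)))

# The two routes agree for moderate logits.
for logit in (-3.0, 0.5, 4.0):
    for label in (0, 1):
        assert isclose(bce_with_logits(logit, label),
                       bce_sigmoid_then_log(logit, label))

# For a large logit the sigmoid rounds to exactly 1.0 in floating point, so the
# naive route would evaluate log(0) for label 0; the rearranged form stays finite.
print(bce_with_logits(40.0, 0))
```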
Howard et al. (2020), pp. 256-257
References
Jeremy Howard, Sylvain Gugger, and Soumith Chintala. Deep Learning for Coders with Fastai and PyTorch: AI Applications without a PhD. O'Reilly Media, Inc., Sebastopol, California, 2020. ISBN 978-1-4920-4552-6.