Binary Cross-Entropy Loss

Aug 08, 2021 permanent MachineLearning

Binary Cross-Entropy (BCE), also known as log loss, is a loss function used to train binary and multi-label classification models.

It's nearly identical to Negative Log-Likelihood, except that each output is scored independently, so an example can have any number of positive labels (including zero).

For each value in a set of model outputs, we first apply the Sigmoid Activation Function to get a prediction between 0 and 1, then take -log(pred) if the corresponding label is positive or -log(1-pred) if it is negative.
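The two steps described above can be sketched in plain Python. This is only an illustration: the helper names `sigmoid` and `bce_from_logit` and the raw output value `2.0` are assumptions made for the example, not something defined elsewhere in these notes.

```python
from math import exp, log

def sigmoid(x):
    # Squash a raw model output (logit) into the range (0, 1).
    return 1 / (1 + exp(-x))

def bce_from_logit(logit, label):
    # Step 1: turn the raw output into a predicted probability.
    pred = sigmoid(logit)
    # Step 2: take -log(pred) for a positive label, -log(1 - pred) otherwise.
    return -log(pred) if label == 1 else -log(1 - pred)

# Hypothetical raw output of 2.0 (a fairly confident "positive" prediction).
print(bce_from_logit(2.0, 1))  # small loss: the prediction agrees with the label
print(bce_from_logit(2.0, 0))  # larger loss: the prediction disagrees with the label
```

The same raw output yields a small loss when it agrees with the label and a much larger one when it does not.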

For a single binary output, the function can be expressed as:

In [1]:

from math import log

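def binary_cross_entropy_single_label(pred, label):
    # pred is the predicted probability of the positive class, in (0, 1).
    if label == 1:
        # Positive label: penalise a low predicted probability.
        return -log(pred)

    # Negative label: penalise a high predicted probability.
    return -log(1-pred)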

In [2]:

pred = 0.99
label = 1

binary_cross_entropy_single_label(pred, label)

Out[2]:

0.01005033585350145

Or in math:

$L(p, y) = -\big(\underbrace{y \times \log(p)}_{\text{Expr 1}} + \underbrace{(1 - y) \times \log(1 - p)}_{\text{Expr 2}}\big)$

where $p$ is the model's prediction and $y$ is the true label.

Since $y$ is either $1$ or $0$, either $\text{Expr 1}$ or $\text{Expr 2}$ will be $0$, so we only keep one $\log$ value. That's equivalent to the if statement in the code.
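As a quick check of that equivalence, here is a minimal sketch that writes the loss both ways and compares the results. The function names `bce_arithmetic` and `bce_branching` are hypothetical, chosen only for this example.

```python
from math import log, isclose

def bce_arithmetic(pred, label):
    # label is 1 or 0, so exactly one of the two terms is non-zero.
    expr_1 = label * log(pred)
    expr_2 = (1 - label) * log(1 - pred)
    return -(expr_1 + expr_2)

def bce_branching(pred, label):
    # The same choice made explicitly with an if statement.
    if label == 1:
        return -log(pred)
    return -log(1 - pred)

for pred, label in [(0.99, 1), (0.05, 0), (0.3, 1), (0.3, 0)]:
    assert isclose(bce_arithmetic(pred, label), bce_branching(pred, label))
```

One practical difference: the arithmetic form evaluates both logs, so it fails when pred is exactly 0 or 1 even if that term would be multiplied by zero.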

For multi-label outputs, the function takes the mean (or sometimes the sum) of the per-label log values:

In [3]:

from numpy import mean

def binary_cross_entropy(preds, labels):
    return mean([
        binary_cross_entropy_single_label(pred, label) for pred, label in zip(preds, labels)])

In [4]:

preds = [0.99, 0.05, 0.95]
labels = [1, 0, 1]

binary_cross_entropy(preds, labels)

Out[4]:

0.03754564154286754

That is represented in math as follows:

$L(P, Y) = -\frac{1}{N} \sum_{i=1}^{N} \left(Y_{i} \times \log(P_{i}) + (1 - Y_{i}) \times \log(1 - P_{i})\right)$
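To make the mean-versus-sum choice concrete, the formula can be sketched with a switchable reduction. The function name `binary_cross_entropy_reduced` and its `reduction` argument are assumptions for this illustration; the inputs are the same three predictions and labels used in the example above.

```python
from math import log

def binary_cross_entropy_reduced(preds, labels, reduction="mean"):
    # Per-label losses, following the formula term by term.
    losses = [
        -(y * log(p) + (1 - y) * log(1 - p))
        for p, y in zip(preds, labels)
    ]
    total = sum(losses)
    if reduction == "sum":
        return total
    if reduction == "mean":
        # Dividing by N gives the 1/N factor in the formula.
        return total / len(losses)
    raise ValueError(f"unknown reduction: {reduction!r}")

preds = [0.99, 0.05, 0.95]
labels = [1, 0, 1]

print(binary_cross_entropy_reduced(preds, labels, reduction="mean"))
print(binary_cross_entropy_reduced(preds, labels, reduction="sum"))
```

The summed loss is simply N times the mean, so the two differ only in scale, which mostly matters when choosing a learning rate.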

PyTorch provides the function via the nn.BCELoss class. It's the binary and multi-label counterpart of nn.NLLLoss, which is used for multi-class classification with a single true label per input.

In [5]:

from torch import tensor, where, nn

preds = tensor([0.99, 0.05, 0.95]).float()
labels = tensor([1, 0, 1]).float()

nn.BCELoss()(preds, labels)

Out[5]:

tensor(0.0375)

That is equivalent to this function:

In [6]:

def binary_cross_entropy(preds, labels):
    # Keep pred where the label is 1 and 1-pred otherwise, then average the negative logs.
    return -where(labels==1, preds, 1-preds).log().mean()

In [7]:

binary_cross_entropy(preds, labels)

Out[7]:

tensor(0.0375)

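Use nn.BCEWithLogitsLoss if your model architecture doesn't perform the Sigmoid Activation Function on the final layer; it applies the sigmoid and the BCE loss in a single step. That makes it the counterpart of nn.CrossEntropyLoss in PyTorch, which combines log-softmax with nn.NLLLoss for multi-class problems (see Categorical Cross-Entropy Loss).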
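As an illustrative sketch of the nn.BCEWithLogitsLoss note above, the snippet below compares it with applying the sigmoid manually before nn.BCELoss. The logit values are made up for the example.

```python
import torch
from torch import nn

# Hypothetical raw model outputs (no sigmoid applied yet) and their labels.
logits = torch.tensor([4.6, -2.9, 2.9])
labels = torch.tensor([1.0, 0.0, 1.0])

# Sigmoid and BCE computed together from the raw outputs.
loss_from_logits = nn.BCEWithLogitsLoss()(logits, labels)

# Sigmoid applied by hand, then plain BCE on the resulting probabilities.
loss_from_probs = nn.BCELoss()(torch.sigmoid(logits), labels)

print(loss_from_logits, loss_from_probs)
```

Both routes should give the same value up to floating-point error; the combined version is generally preferred because it is more numerically stable for large-magnitude outputs.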

Howard et al. (2020) (pp. 256-257)

References

Jeremy Howard, Sylvain Gugger, and Soumith Chintala. Deep Learning for Coders with Fastai and PyTorch: AI Applications without a PhD. O'Reilly Media, Inc., Sebastopol, California, 2020. ISBN 978-1-4920-4552-6.

Notes by Lex Toumbourou
