Binary Cross-Entropy Loss

Binary Cross-Entropy (BCE), also known as log loss, is a loss function used to train binary and multi-label classifiers.

It's nearly identical to Negative Log-Likelihood, except that each output is scored independently, so an input can have any number of positive labels (including zero).

For each value in a set of model outputs, we first apply the Sigmoid Activation Function to get a prediction between 0 and 1, then take -log(pred) if the corresponding label is positive or -log(1 - pred) if it is negative.

For a single binary output, the function can be expressed as:
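A minimal sketch in plain Python, assuming the model emits a raw score (logit) for one output. The names `sigmoid` and `binary_cross_entropy` are illustrative and do not come from any library, and this naive version makes no attempt at numerical stability:

```python
import math


def sigmoid(x: float) -> float:
    """Squash a raw model output (logit) into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def binary_cross_entropy(logit: float, label: int) -> float:
    """BCE for a single raw output and a single 0/1 label."""
    pred = sigmoid(logit)
    if label == 1:
        # Positive label: penalise a low predicted probability.
        return -math.log(pred)
    # Negative label: penalise a high predicted probability.
    return -math.log(1.0 - pred)


# The same fairly confident prediction (sigmoid(2.0) is about 0.88),
# scored against each possible label.
confident_right = binary_cross_entropy(2.0, 1)  # about 0.127
confident_wrong = binary_cross_entropy(2.0, 0)  # about 2.127
```

The loss is small when the prediction agrees with the label and grows quickly when a confident prediction is wrong.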

Or in math:

$$L(p, y) = -\left(\underbrace{y \times \log(p)}_{\text{Expr 1}} + \underbrace{(1 - y) \times \log(1 - p)}_{\text{Expr 2}}\right)$$

Where $p$ is the model's prediction and $y$ is the true label.

Since $y$ will either be $1$ or $0$, $\text{Expr 1}$ or $\text{Expr 2}$ will be $0$, ensuring we only keep one $\log$ value. That's equivalent to the if statement in code.

For multi-label outputs, the function takes the mean (or sometimes the sum) of the per-output log values:
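A sketch of the multi-label case in plain Python, under the same assumption that the model emits raw scores. The function name `multi_label_bce` and its `reduction` parameter are hypothetical, chosen here only for illustration:

```python
import math


def sigmoid(x: float) -> float:
    """Squash a raw model output (logit) into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def multi_label_bce(logits, labels, reduction="mean"):
    """BCE over several independent outputs, reduced by mean or sum."""
    losses = []
    for logit, label in zip(logits, labels):
        pred = sigmoid(logit)
        # Keep exactly one log term per output, chosen by the label.
        losses.append(-math.log(pred) if label == 1 else -math.log(1.0 - pred))
    total = sum(losses)
    return total / len(losses) if reduction == "mean" else total


# Three labels for one input: the first and third are positive.
logits = [2.0, -2.0, 0.0]
labels = [1, 0, 1]

mean_loss = multi_label_bce(logits, labels)                  # about 0.316
sum_loss = multi_label_bce(logits, labels, reduction="sum")  # about 0.947
```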

That is represented in math as follows:

$$L(P, Y) = -\frac{1}{N} \sum\limits_{i=1}^{N} \left(Y_{i} \times \log(P_{i}) + (1 - Y_{i}) \times \log(1 - P_{i})\right)$$

PyTorch provides the function via the nn.BCELoss class. It's the multi-label counterpart of nn.NLLLoss, which is used for multi-class classification with a single true label per input.
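A short usage sketch, with made-up example values. Note that `nn.BCELoss` expects probabilities, so the raw scores are passed through `torch.sigmoid` first, and the targets must be floats with the same shape as the input:

```python
import torch
from torch import nn

# One input with three independent labels (example values only).
logits = torch.tensor([[2.0, -2.0, 0.0]])
labels = torch.tensor([[1.0, 0.0, 1.0]])

# nn.BCELoss works on probabilities, not raw scores.
probs = torch.sigmoid(logits)

loss_fn = nn.BCELoss()  # the default reduction is "mean"
loss = loss_fn(probs, labels)  # about 0.316
```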

This is equivalent to the following function:
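One way such a function might look, written directly from the mean formula above. The name `binary_cross_entropy` is illustrative. Unlike this sketch, `nn.BCELoss` also clamps its log outputs to be at least -100, so the two can differ when a prediction is exactly 0 or 1:

```python
import torch
from torch import nn


def binary_cross_entropy(probs: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Mean BCE over probabilities and float 0/1 labels of the same shape."""
    # labels * log(p) survives only where the label is 1;
    # (1 - labels) * log(1 - p) survives only where the label is 0.
    losses = -(labels * probs.log() + (1 - labels) * (1 - probs).log())
    return losses.mean()


# Example values only: compare the sketch against the built-in class.
probs = torch.sigmoid(torch.tensor([[2.0, -2.0, 0.0]]))
labels = torch.tensor([[1.0, 0.0, 1.0]])

manual = binary_cross_entropy(probs, labels)
builtin = nn.BCELoss()(probs, labels)
matches = torch.allclose(manual, builtin)
```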

Use nn.BCEWithLogitsLoss if your model architecture doesn't apply the Sigmoid Activation Function in the final layer. It plays the same role that nn.CrossEntropyLoss plays for multi-class classification, which also takes raw logits (see Categorical Cross-Entropy Loss).
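As a quick check, with made-up example values, `nn.BCEWithLogitsLoss` on raw scores should give the same result as applying `torch.sigmoid` and then `nn.BCELoss`. The PyTorch documentation describes the fused version as the more numerically stable of the two:

```python
import torch
from torch import nn

# Raw scores from a model with no sigmoid on its final layer.
logits = torch.tensor([[2.0, -2.0, 0.0]])
labels = torch.tensor([[1.0, 0.0, 1.0]])

# Fused: sigmoid and BCE in a single step.
with_logits = nn.BCEWithLogitsLoss()(logits, labels)

# Two-step: explicit sigmoid, then BCE on the probabilities.
two_step = nn.BCELoss()(torch.sigmoid(logits), labels)

same = torch.allclose(with_logits, two_step)
```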

Howard et al. (2020) (pg. 256-257)


Jeremy Howard, Sylvain Gugger, and Soumith Chintala. Deep Learning for Coders with Fastai and PyTorch: AI Applications without a PhD. O'Reilly Media, Inc., Sebastopol, California, 2020. ISBN 978-1-4920-4552-6.