Softmax Function

Converts a vector of numbers into probabilities that sum to 1.

The Softmax Activation Function converts a vector of numbers into a vector of probabilities that sum to 1. It's applied to a model's outputs (or Logits) in Multi-class Classification.

It is the multi-class extension of the Sigmoid Activation Function.

The equation is:

\sigma(\vec{z})_{i} = \frac{e^{z_i}}{\sum\limits_{j=1}^{K}e^{z_j}}
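Since it is the multi-class extension of the sigmoid, the two-class case reduces to the sigmoid of the difference between the two logits:

\sigma(\vec{z})_{1} = \frac{e^{z_1}}{e^{z_1} + e^{z_2}} = \frac{1}{1 + e^{-(z_1 - z_2)}} = \operatorname{sigmoid}(z_1 - z_2)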

The intuition is that e^{z_i} is always positive and grows quickly, so larger logits get amplified and most of the probability mass concentrates on a single class. This makes softmax less useful for problems where an input might not contain any of the labels at all; for those, use multiple binary output columns with the Sigmoid Activation Function.

Howard et al. (2020), pp. 223-227
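A quick numpy sketch of that difference (the logits here are made-up values for illustration):

import numpy as np

logits = np.array([1.0, 2.0, 5.0])

# Softmax: the classes compete, the probabilities sum to 1, and the largest
# logit takes almost all of the mass (roughly 0.02, 0.05, 0.94 here).
print(np.exp(logits) / np.exp(logits).sum())

# Independent sigmoids: each class is judged on its own, so several labels
# (or none) can be likely at once (roughly 0.73, 0.88, 0.99 here).
print(1 / (1 + np.exp(-logits)))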

Code example:

import numpy as np
import pandas as pd

def softmax(x):
    # Exponentiate every element, then normalise so the outputs sum to 1.
    return np.exp(x) / np.exp(x).sum()

logits = np.array([-3.5, -2.37, 1.54, 5.23])  # some arbitrary numbers I made up that could have come out of a neural network
probs = softmax(logits)

df = pd.DataFrame({'logit': logits, 'prob': probs}, index=['woman', 'man', 'camera', 'tv'])
df
        logit      prob
woman   -3.50  0.000158
man     -2.37  0.000488
camera   1.54  0.024348
tv       5.23  0.975007

Softmax is part of the Categorical Cross-Entropy Loss: it is applied to the logits before passing the result to the Negative Log-Likelihood function.
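A minimal numpy sketch of that composition for a single sample, reusing the example logits from above (the stability shift by the max is just a common implementation detail):

import numpy as np

def log_softmax(x):
    # Shift by the max so the exponentials cannot overflow.
    shifted = x - x.max()
    return shifted - np.log(np.exp(shifted).sum())

logits = np.array([-3.5, -2.37, 1.54, 5.23])
target = 3  # index of the true class ('tv' in the table above)

# Categorical cross-entropy for one sample is the negative log-likelihood
# of the softmax probability assigned to the true class.
loss = -log_softmax(logits)[target]
print(loss)  # roughly 0.0253, i.e. -log(0.975007)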

Temperature Scaling can be applied to the logits to adjust the sharpness of the distribution: a low temperature makes it more confident about the highest values, while a high temperature makes it flatter and more diverse.
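A small sketch of how that could look, dividing the logits by a temperature before the softmax (the helper name and the example temperatures are just illustrative):

import numpy as np

def softmax_with_temperature(logits, temperature=1.0):
    # T < 1 sharpens the distribution, T > 1 flattens it; T = 1 is plain softmax.
    scaled = logits / temperature
    scaled = scaled - scaled.max()  # stability shift
    return np.exp(scaled) / np.exp(scaled).sum()

logits = np.array([-3.5, -2.37, 1.54, 5.23])
print(softmax_with_temperature(logits, 0.5))  # sharper: 'tv' gets ~0.999
print(softmax_with_temperature(logits, 1.0))  # the original probabilities
print(softmax_with_temperature(logits, 5.0))  # flatter: roughly [0.09, 0.12, 0.26, 0.53]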

References

Jeremy Howard, Sylvain Gugger, and Soumith Chintala. Deep Learning for Coders with Fastai and PyTorch: AI Applications without a PhD. O'Reilly Media, Inc., Sebastopol, California, 2020. ISBN 978-1-4920-4552-6.