Mel Spectrogram

A Mel Spectrogram is a visual representation of a sound wave that shows how its frequency content, mapped onto the Mel scale, changes over time.

Here is a Mel Spectrogram of a trumpet recording:

Melspectrogram example of a Trumpet

The process of generating a Mel Spectrogram works like this:

  1. Break the audio signal down into short (usually overlapping) frames.
  2. Convert each frame from the time domain into the frequency domain using a Fourier Transform.
  3. Map the frequencies onto the Mel scale, which more closely matches how we perceive pitch. This is done by applying a Mel filter bank.
  4. Plot the resulting Mel values over time.
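
The steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the helper names (`mel_filter_bank`, `mel_spectrogram`), the parameters (`n_fft`, `hop`, `n_mels`), the Hann window, the use of the power spectrum, and the unnormalised triangular filters are all assumptions made for the example. Audio libraries often make different choices, such as normalising the filters or converting the result to decibels before plotting.

```python
import numpy as np

def hz_to_mel(f):
    # Commonly used Hz -> Mel conversion (assumed form, constant 700)
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    # Inverse of hz_to_mel
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filter_bank(sample_rate, n_fft, n_mels):
    """Triangular filters spaced evenly on the Mel scale.

    Returns an array of shape (n_mels, n_fft // 2 + 1).
    """
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    # n_mels + 2 edge points: filter i spans edges i, i + 1 (peak), i + 2
    edges_hz = mel_to_hz(np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_mels + 2))
    bank = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        left, centre, right = edges_hz[i], edges_hz[i + 1], edges_hz[i + 2]
        rising = (fft_freqs - left) / (centre - left)
        falling = (right - fft_freqs) / (right - centre)
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank

def mel_spectrogram(signal, sample_rate, n_fft=1024, hop=256, n_mels=40):
    # Step 1: break the signal into short, overlapping frames
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop : i * hop + n_fft] for i in range(n_frames)])
    # Step 2: window each frame and take its Fourier Transform (power spectrum)
    power = np.abs(np.fft.rfft(frames * np.hanning(n_fft), axis=1)) ** 2
    # Step 3: apply the Mel filter bank to every frame
    mel = power @ mel_filter_bank(sample_rate, n_fft, n_mels).T
    # Step 4 (plotting) is left to the caller; rows are Mel bands, columns are frames
    return mel.T

# Usage: one second of a 440 Hz tone sampled at 16 kHz
sr = 16000
tone = np.sin(2 * np.pi * 440.0 * np.arange(sr) / sr)
spec = mel_spectrogram(tone, sr)
print(spec.shape)  # (40, 59)
```

The returned matrix can be passed to any image-plotting routine to draw the spectrogram, with time on the horizontal axis and Mel band on the vertical axis.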

Mel Scale

The Mel Scale is a perceptual scale of audio frequencies. In other words, equal distances on the scale correspond to pitch differences that listeners perceive as equal.

The Mel scale is roughly logarithmic in frequency and is anchored so that 1000 Mel corresponds to 1 kHz. You can convert a frequency f in Hz to Mel using this formula:

$$\mathrm{Mel}(f) = 2595 \log_{10}\left(1 + \frac{f}{700}\right)$$
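
As a quick check of the conversion, here is a small sketch using only the standard library. It assumes the widely used form of the formula with the constant 700, under which 1000 Hz lands at roughly 1000 Mel. The function names `hz_to_mel` and `mel_to_hz` are illustrative, not taken from any particular library.

```python
import math

def hz_to_mel(f_hz: float) -> float:
    """Convert a frequency in Hz to Mel (assumed form with constant 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel: float) -> float:
    """Invert hz_to_mel: convert a Mel value back to Hz."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

# The scale is anchored so that 1 kHz maps to about 1000 Mel
print(round(hz_to_mel(1000.0)))

# The same 100 Hz step covers far more Mel at low frequencies than at high ones
low_step = hz_to_mel(200.0) - hz_to_mel(100.0)
high_step = hz_to_mel(4100.0) - hz_to_mel(4000.0)
print(low_step > high_step)
```

The shrinking step size at higher frequencies is what makes the scale perceptual: a 100 Hz change is easy to hear near 100 Hz but much harder to notice near 4 kHz.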

Frequency vs Mel Scale plot