
Notes by Lex Toumbourou

Multi-head Attention

Apr 23, 2024 · permanent

A key layer in the Transformer architecture, consisting of several Scaled Dot-Product Attention modules ("heads") run in parallel, with their outputs concatenated and passed through a final linear projection.

Figure: Multi-head attention diagram.
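As a rough sketch of the idea (not code from the original note), a minimal multi-head attention layer in PyTorch might look like the following. The names (`MultiHeadAttention`, `d_model`, `num_heads`) are illustrative assumptions, and the sketch omits masking and dropout:

```python
import torch
import torch.nn.functional as F
from torch import nn


class MultiHeadAttention(nn.Module):
    def __init__(self, d_model: int, num_heads: int):
        super().__init__()
        assert d_model % num_heads == 0, "d_model must divide evenly across heads"
        self.num_heads = num_heads
        self.d_head = d_model // num_heads
        # One projection each for queries, keys and values, plus an output projection.
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        self.w_o = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        batch, seq_len, d_model = x.shape

        # Project, then split into heads: (batch, heads, seq_len, d_head).
        def split(t: torch.Tensor) -> torch.Tensor:
            return t.view(batch, seq_len, self.num_heads, self.d_head).transpose(1, 2)

        q, k, v = split(self.w_q(x)), split(self.w_k(x)), split(self.w_v(x))

        # Scaled dot-product attention, applied independently per head.
        scores = q @ k.transpose(-2, -1) / self.d_head**0.5
        weights = F.softmax(scores, dim=-1)
        out = weights @ v

        # Concatenate the heads back together and project.
        out = out.transpose(1, 2).reshape(batch, seq_len, d_model)
        return self.w_o(out)


# Example usage: a batch of 2 sequences of length 10 with model dimension 64.
x = torch.randn(2, 10, 64)
mha = MultiHeadAttention(d_model=64, num_heads=8)
print(mha(x).shape)  # torch.Size([2, 10, 64])
```

The key design point is that splitting `d_model` across heads keeps the total computation roughly the same as a single large attention module, while letting each head attend to different aspects of the sequence.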