Multi-head Attention
A key layer in the Transformer architecture that runs several Scaled Dot-Product Attention modules ("heads") in parallel, concatenates their outputs, and applies a final linear projection to produce the layer's output.
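
The sketch below illustrates the idea, assuming the layer sizes and projection layout of the original Transformer paper ("Attention Is All You Need"); the names d_model, num_heads, and the class itself are illustrative, not any particular library's API.

```python
import math
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    """Minimal multi-head attention sketch: parallel scaled dot-product heads."""

    def __init__(self, d_model: int, num_heads: int):
        super().__init__()
        assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
        self.num_heads = num_heads
        self.d_head = d_model // num_heads
        # Learned projections for queries, keys, values, and the output.
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        self.w_o = nn.Linear(d_model, d_model)

    def forward(self, query, key, value, mask=None):
        batch, _, d_model = query.shape

        def split_heads(x):
            # (batch, seq, d_model) -> (batch, heads, seq, d_head)
            return x.view(batch, -1, self.num_heads, self.d_head).transpose(1, 2)

        q = split_heads(self.w_q(query))
        k = split_heads(self.w_k(key))
        v = split_heads(self.w_v(value))

        # Scaled dot-product attention, computed independently per head.
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_head)
        if mask is not None:
            scores = scores.masked_fill(mask == 0, float("-inf"))
        weights = torch.softmax(scores, dim=-1)
        context = weights @ v

        # Concatenate the heads and apply the final output projection.
        context = context.transpose(1, 2).contiguous().view(batch, -1, d_model)
        return self.w_o(context)

# Example: self-attention, where queries, keys, and values are the same tensor.
attn = MultiHeadAttention(d_model=512, num_heads=8)
x = torch.randn(2, 10, 512)   # (batch, sequence length, d_model)
out = attn(x, x, x)           # shape: (2, 10, 512)
```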
