Self-Attention
Self-Attention, sometimes called intra-attention, is an Attention Mechanism in which each token in an input sequence is represented as a weighted average of the token representations in the sequence, with the weights determined by how relevant the tokens are to one another.
The most common implementation of self-attention comes from the Transformer architecture: Scaled Dot-Product Attention, which computes Attention(Q, K, V) = softmax(Q Kᵀ / √d_k) V, where the queries Q, keys K, and values V are linear projections of the same input sequence.
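As a concrete illustration, here is a minimal NumPy sketch of scaled dot-product self-attention. The toy dimensions, random projection matrices, and the function name scaled_dot_product_attention are illustrative assumptions, not part of any particular library's API:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V for a single sequence.

    Q, K: arrays of shape (seq_len, d_k); V: array of shape (seq_len, d_v).
    Returns the attended output of shape (seq_len, d_v).
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                       # pairwise token relevance, (seq_len, seq_len)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)        # row-wise softmax over the sequence
    return weights @ V                                    # weighted average of value vectors

# Self-attention: Q, K, and V are all projections of the same input X.
rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
X = rng.normal(size=(seq_len, d_model))
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
out = scaled_dot_product_attention(X @ W_q, X @ W_k, X @ W_v)
print(out.shape)  # (4, 8): one attended vector per input token
```

Each row of the softmax output sums to 1, so every output token is literally a weighted average of the value vectors, matching the description above.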