Transformer Basics
Introduction to the "Attention Is All You Need" paper
Encoder & Decoder
Positional encoding
Why multi-head attention is so good!