Introduction to Transformers

The original Transformer is based on the encoder-decoder architecture, used for tasks such as machine translation, where a sequence of words is translated from one language to another.


Encoder


1. Positional Embedding
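Since self-attention itself is order-agnostic, the Transformer adds positional information to the token embeddings. A minimal NumPy sketch of the sinusoidal positional encoding from the original paper (function name and shapes are illustrative):

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encoding: sin on even dims, cos on odd dims."""
    pos = np.arange(seq_len)[:, None]            # (seq_len, 1)
    i = np.arange(d_model)[None, :]              # (1, d_model)
    # Each dimension pair (2i, 2i+1) shares the same wavelength.
    angles = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])
    pe[:, 1::2] = np.cos(angles[:, 1::2])
    return pe                                    # added to the token embeddings
```

The resulting matrix is simply summed with the embedding matrix before the first encoder layer.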

2. Self-Attention
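Self-attention lets every position attend to every other position in the same sequence via scaled dot-product attention. A minimal sketch, assuming query/key/value matrices have already been projected (the function name is illustrative):

```python
import numpy as np

def self_attention(q, k, v):
    """Scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V."""
    d_k = q.shape[-1]
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(d_k)   # (seq, seq) similarity
    # Numerically stable softmax over the last axis.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                                # weighted sum of values
```

The scaling by sqrt(d_k) keeps the dot products from growing with dimension, which would otherwise push the softmax into near-one-hot regions with tiny gradients.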

3. Multi-Head Attention
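Multi-head attention runs several attention operations in parallel on lower-dimensional projections, letting different heads specialize in different relations. A minimal sketch assuming externally supplied projection matrices (all names are illustrative):

```python
import numpy as np

def multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """Split d_model into num_heads heads, attend per head, concat, project."""
    seq_len, d_model = x.shape
    d_head = d_model // num_heads

    def split_heads(t):
        # (seq, d_model) -> (heads, seq, d_head)
        return t.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split_heads(x @ w_q), split_heads(x @ w_k), split_heads(x @ w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    heads = w @ v                                     # (heads, seq, d_head)
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ w_o                               # final output projection
```

In the original architecture all four projection matrices are learned, and d_model is chosen divisible by num_heads (e.g. 512 with 8 heads).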

Resources