There are three main architectures for transformer models:

encoders
decoders
encoder-decoders

The initial success of the early transformer models triggered a Cambrian explosion in model development: researchers built models on datasets of varying size and nature, introduced new pretraining objectives, and tweaked the architecture to further improve performance. Although the zoo of models is growing fast, its members can still be divided into these three categories.

In this section we'll provide a brief overview of the most important transformer models in each class. Let's start by taking a look at the transformer family tree.

The Transformer Tree of Life

[Figure: the transformer family tree]

Overview

BERT, DistilBERT, RoBERTa, GPT, BART, …
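
As a rough sketch, the models named above can be grouped by the three architecture classes. The names `ARCHITECTURE_BY_MODEL` and `models_in` are hypothetical, chosen only for illustration, and the table covers only the models listed here, not the full family tree.

```python
# Illustrative grouping of the models named above by architecture class.
# The dictionary and helper names are assumptions made for this sketch.
ARCHITECTURE_BY_MODEL = {
    "BERT": "encoder",
    "DistilBERT": "encoder",
    "RoBERTa": "encoder",
    "GPT": "decoder",
    "BART": "encoder-decoder",
}


def models_in(architecture):
    """Return the model names in the given class, in insertion order."""
    return [
        name
        for name, arch in ARCHITECTURE_BY_MODEL.items()
        if arch == architecture
    ]


for architecture in ("encoder", "decoder", "encoder-decoder"):
    print(architecture, models_in(architecture))
```

A plain dictionary keeps the lookup explicit; a new model can be added by inserting one entry with its class.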
