Recap of the main idea
We need multiple attention mechanisms
Multi-head attention
Visualization shows that each attention head focuses on something different
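
To make the recap concrete, here is a minimal sketch of multi-head attention in plain Python. It assumes scaled dot-product self-attention and uses random projection matrices in place of learned weights. All names (`multi_head_attention`, `random_matrix`, `num_heads`, and so on) are illustrative and do not come from the slides. Each head gets its own query, key, and value projections, so each head produces its own attention weights.

```python
import math
import random


def matmul(a, b):
    # Plain list-of-lists matrix product: (n x m) times (m x p) -> (n x p).
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    # Numerically stable softmax over one row of scores.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v):
    # Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.
    d_k = len(q[0])
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights


def random_matrix(rows, cols, rng):
    # Stand-in for learned parameters (assumption: untrained, random weights).
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def multi_head_attention(x, num_heads, rng):
    d_model = len(x[0])
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    d_head = d_model // num_heads

    head_outputs, head_weights = [], []
    for _ in range(num_heads):
        # Each head has its own projections, so it can attend differently.
        w_q = random_matrix(d_model, d_head, rng)
        w_k = random_matrix(d_model, d_head, rng)
        w_v = random_matrix(d_model, d_head, rng)
        out, weights = attention(matmul(x, w_q), matmul(x, w_k), matmul(x, w_v))
        head_outputs.append(out)
        head_weights.append(weights)

    # Concatenate the head outputs per token, then mix them with W_o.
    concat = [sum((head[i] for head in head_outputs), []) for i in range(len(x))]
    w_o = random_matrix(d_model, d_model, rng)
    return matmul(concat, w_o), head_weights


rng = random.Random(0)
x = random_matrix(4, 8, rng)  # 4 tokens, model dimension 8
output, head_weights = multi_head_attention(x, num_heads=2, rng=rng)
```

Here `head_weights[h][i]` is the attention distribution of token `i` in head `h`. Plotting these matrices side by side is one way to produce the kind of per-head visualization mentioned above. With random weights the patterns are arbitrary; a trained model is needed to see meaningful specialization.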