What are Attention based Models?

Attention models, also called attention mechanisms, are deep learning techniques that let a network place additional focus on a specific component of its input. In deep learning, attention means concentrating on one part of the input in particular and noting its relative importance. Attention is most often used within the class of models called sequence-to-sequence models, whose aim, as the name suggests, is to produce an output sequence from an input sequence, where the two sequences may in general have different lengths.

In broad strokes, attention is expressed as a function that maps a query and a set of key-value pairs to an output, where the query, keys, values, and output are all vectors. The output is calculated as a weighted sum of the values, with the weight assigned to each value given by a compatibility function of the query with the corresponding key.
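The query/key/value description above can be sketched in a few lines of NumPy. This is a minimal illustration, not a production implementation: it uses a plain dot product as the compatibility function and softmax to normalize the scores into weights. The array shapes and the random toy data are assumptions for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max before exponentiating for numerical stability
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / np.sum(e, axis=axis, keepdims=True)

def attention(query, keys, values):
    """Dot-product attention: weight each value by the
    compatibility (here, the dot product) of the query with its key."""
    scores = keys @ query        # one raw compatibility score per key
    weights = softmax(scores)    # normalize scores so they sum to 1
    return weights @ values      # output = weighted sum of the values

# Toy example: 4 key-value pairs with 3-dimensional keys and values
rng = np.random.default_rng(0)
keys = rng.normal(size=(4, 3))
values = rng.normal(size=(4, 3))
query = rng.normal(size=3)

output = attention(query, keys, values)
```

Note that the output has the same dimensionality as the values, and the softmax guarantees the weights form a probability distribution over the key-value pairs.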

In practice, attention allows neural networks to approximate the visual attention mechanism humans use. Like people processing a new scene, the model studies a certain point of an image with intense, “high resolution” focus, while perceiving the surrounding areas in “low resolution,” and then adjusts the focal point as the network begins to understand the scene.


  • Our goal is to generate the context vectors.
  • For example, context vector C1 tells us how much importance/attention should be given to each of the inputs x0, x1, x2, x3.
  • The attention layer in turn contains 3 subparts:
    1. Feed Forward Network
    2. Softmax Calculation
    3. Context vector generation
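The three subparts above can be sketched as a single pass in NumPy. This is a simplified, hypothetical example: the feed-forward "network" is reduced to one learned weight vector `W`, and the four encoder states x0..x3 are random placeholders; a real model would learn these parameters during training.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array of scores
    e = np.exp(x - np.max(x))
    return e / e.sum()

rng = np.random.default_rng(1)
hidden = 3                             # dimensionality of encoder states (illustrative)
inputs = rng.normal(size=(4, hidden))  # the four encoder states x0, x1, x2, x3

# 1. Feed Forward Network: a hypothetical learned layer that scores each input
W = rng.normal(size=(hidden,))
scores = inputs @ W                    # one raw score per input x_i

# 2. Softmax Calculation: turn raw scores into attention weights summing to 1
alphas = softmax(scores)

# 3. Context vector generation: weighted sum of the inputs
c1 = alphas @ inputs
```

Here `alphas` plays the role of the attention given to each input, and `c1` is the resulting context vector C1.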

This is an improvement over earlier models because we are asking the model to make an informed choice about where to focus. Given enough data, the model should be able to learn these attention weights just as humans do, and in practice attention-based models do work better than vanilla encoder-decoder models.