What are Encoder-Decoder Models?

An encoder-decoder model maps an input to an output sequence. In image captioning, for example, the model receives an image as input and outputs a sequence of words that describes it.

Encoder

  • A stack of several recurrent units (LSTM or GRU cells for better performance), each of which accepts a single element of the input sequence, collects information from that element, and propagates it forward.
  • In the question-answering problem, the input sequence is the collection of all words in the question. Each word is represented as x_i, where i is the position of that word in the sequence.
  • The hidden state h_t at time step t is computed from the previous hidden state h_{t-1} and the current input x_t using the formula:

h_t = f(W^{(hh)} h_{t-1} + W^{(hx)} x_t)
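
The encoder recurrence above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the activation f is taken to be tanh, the initial hidden state is all zeros, and the names matvec, encoder_step, and encode, along with the toy weights, are hypothetical and not from the original text.

```python
import math

def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def encoder_step(W_hh, W_hx, h_prev, x_t):
    """One recurrence step: h_t = f(W_hh h_{t-1} + W_hx x_t), with f = tanh (assumed)."""
    from_hidden = matvec(W_hh, h_prev)
    from_input = matvec(W_hx, x_t)
    return [math.tanh(a + b) for a, b in zip(from_hidden, from_input)]

def encode(W_hh, W_hx, inputs, hidden_size):
    """Run the recurrence over the input sequence and return every hidden state."""
    h = [0.0] * hidden_size  # assumption: zero initial hidden state
    states = []
    for x_t in inputs:
        h = encoder_step(W_hh, W_hx, h, x_t)
        states.append(h)
    return states

# Toy example: three one-hot "words", hidden size 2 (all values are illustrative).
W_hh = [[0.5, 0.0],
        [0.0, 0.5]]
W_hx = [[1.0, 0.0, 0.0],
        [0.0, 1.0, 0.0]]
inputs = [[1.0, 0.0, 0.0],
          [0.0, 1.0, 0.0],
          [0.0, 0.0, 1.0]]
states = encode(W_hh, W_hx, inputs, hidden_size=2)
final_state = states[-1]  # the last hidden state summarizes the whole input
```

In practice each unit would be an LSTM or GRU cell rather than this plain recurrence, but the flow of information from one step to the next is the same.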

Decoder

  • A stack of several recurrent units where each predicts an output y_t at a time step t.
  • Each recurrent unit accepts a hidden state from the previous unit and produces an output as well as its own hidden state.
  • In the question-answering problem, the output sequence is the collection of all words in the answer. Each word is represented as y_i, where i is the position of that word in the sequence.
  • Each hidden state h_t is computed from the previous hidden state h_{t-1} using the formula:

h_t = f(W^{(hh)} h_{t-1})
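
The decoder recurrence above can be sketched the same way. This is a rough illustration, not a definitive implementation. Several pieces are assumptions not stated in the text: f is taken to be tanh, the initial state h_init stands in for whatever state the decoder starts from, and each output y_t is obtained by a softmax over a hypothetical projection matrix W_hy followed by picking the most likely index. All function names and weights are made up for the example.

```python
import math

def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def softmax(scores):
    """Turn raw scores into probabilities (shifted by the max for stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def decoder_step(W_hh, h_prev):
    """One recurrence step: h_t = f(W_hh h_{t-1}), with f = tanh (assumed)."""
    return [math.tanh(a) for a in matvec(W_hh, h_prev)]

def decode(W_hh, W_hy, h_init, num_steps):
    """Unroll the decoder and return one predicted vocabulary index per step."""
    h = h_init
    outputs = []
    for _ in range(num_steps):
        h = decoder_step(W_hh, h)
        probs = softmax(matvec(W_hy, h))  # assumed output layer, not in the text
        outputs.append(probs.index(max(probs)))
    return outputs

# Toy example with a two-word vocabulary (all values are illustrative).
W_hh = [[0.0, 1.0],
        [1.0, 0.0]]  # swaps the two hidden components at every step
W_hy = [[1.0, 0.0],
        [0.0, 1.0]]
h_init = [0.9, 0.1]
tokens = decode(W_hh, W_hy, h_init, num_steps=3)
print(tokens)  # [1, 0, 1]
```

Because the toy W_hh swaps the hidden components at every step, the predicted index alternates, which makes it easy to see that each output depends only on the hidden state handed over from the previous unit.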