What are Encoders-Decoders?

board-infinity · 15 October 2022 10:42

Encoder-decoder models map an input to an output sequence. In image captioning, for example, the model receives an image as input and outputs a sequence of words that describes it.

Encoder

A stack of several recurrent units (typically LSTM or GRU cells, which tend to perform better than plain RNN cells) where each accepts a single element of the input sequence, collects information about that element, and propagates it forward.
In the question-answering problem, the input sequence is the collection of all words from the question. Each word is represented as x_i, where i is the position of that word in the sequence.
The hidden state h_t at time step t is computed from the previous hidden state h_(t-1) and the current input x_t using the formula:

$$ h_t = f\left(W^{(hh)} h_{t-1} + W^{(hx)} x_t\right) $$
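
The encoder recurrence can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not a trained model: f is taken to be tanh, the initial hidden state is zero, and the weight matrices and one-hot "word" vectors are hand-picked toy values. The helper names (matvec, encoder_step, encode) are hypothetical.

```python
import math

def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def encoder_step(W_hh, W_hx, h_prev, x_t):
    """One recurrence step: h_t = tanh(W_hh h_(t-1) + W_hx x_t)."""
    from_hidden = matvec(W_hh, h_prev)
    from_input = matvec(W_hx, x_t)
    # Assumption: f is tanh; the article leaves f unspecified.
    return [math.tanh(a + b) for a, b in zip(from_hidden, from_input)]

def encode(W_hh, W_hx, inputs, hidden_size):
    """Run the recurrence over the whole sequence and return the last hidden state."""
    h = [0.0] * hidden_size  # assumption: zero initial hidden state
    for x_t in inputs:
        h = encoder_step(W_hh, W_hx, h, x_t)
    return h

# Toy weights (hypothetical): hidden size 2, input size 3.
W_hh = [[0.5, 0.0], [0.0, 0.5]]
W_hx = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

# Two one-hot vectors standing in for the words x_1 and x_2 of a question.
inputs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

context = encode(W_hh, W_hx, inputs, hidden_size=2)
print(context)
```

The final hidden state (called context here) summarizes the whole input sequence and is what the decoder starts from.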

Decoder

A stack of several recurrent units where each predicts an output y_t at a time step t.
Each recurrent unit accepts a hidden state from the previous unit and produces an output as well as its own hidden state.
In the question-answering problem, the output sequence is the collection of all words from the answer. Each word is represented as y_i, where i is the position of that word in the sequence.
Any hidden state h_t is computed from the previous hidden state h_(t-1) using the formula:

$$ h_t = f\left(W^{(hh)} h_{t-1}\right) $$
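
A minimal decoder loop following this recurrence might look like the sketch below. Several parts are assumptions, not taken from the article: f is tanh, the output y_t is obtained by a softmax over a linear projection of h_t (the matrix W_hy is hypothetical), the predicted word is the highest-probability index, and all weights are toy values. The starting hidden state stands in for the encoder's final hidden state.

```python
import math

def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def softmax(z):
    """Turn a list of scores into probabilities that sum to 1."""
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def decoder_step(W_hh, W_hy, h_prev):
    """One step: h_t = tanh(W_hh h_(t-1)), then y_t = softmax(W_hy h_t)."""
    h_t = [math.tanh(v) for v in matvec(W_hh, h_prev)]
    # Assumption: the output layer is a softmax over a linear projection.
    y_t = softmax(matvec(W_hy, h_t))
    return h_t, y_t

def decode(W_hh, W_hy, context, steps):
    """Unroll the decoder for a fixed number of steps; return predicted word indices."""
    h = context
    tokens = []
    for _ in range(steps):
        h, y = decoder_step(W_hh, W_hy, h)
        tokens.append(max(range(len(y)), key=y.__getitem__))  # argmax
    return tokens

# Toy weights (hypothetical): hidden size 2, vocabulary of 3 words.
W_hh = [[0.0, 1.0], [1.0, 0.0]]
W_hy = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]

context = [0.9, 0.1]  # stands in for the encoder's final hidden state
tokens = decode(W_hh, W_hy, context, steps=3)
print(tokens)
```

A fixed number of steps keeps the sketch short; practical decoders usually stop when a special end-of-sequence word is predicted.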