**Encoder-decoder models allow for a process in which a machine learning model generates a sentence describing an image**. It receives the image as the input and outputs a sequence of words.

**Encoder**

- A stack of several recurrent units (LSTM or GRU cells for better performance) where each accepts a single element of the input sequence, collects information from that element, and propagates it forward.
- In the question-answering problem, the input sequence is the collection of all words from the question. Each word is represented as *x_i*, where *i* is the position of that word in the question.
- The hidden states *h_t* are computed using the formula:

h_t = f(W^(hh) h_(t-1) + W^(hx) x_t)
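The encoder recurrence above can be sketched in a few lines of numpy. This is a minimal illustration, not a production implementation: the activation *f* is assumed to be tanh (a common choice), and the word vectors and weight shapes are made up for the example.

```python
import numpy as np

def rnn_encoder(xs, W_hh, W_hx, h0):
    # Apply h_t = f(W^(hh) h_(t-1) + W^(hx) x_t) over the sequence,
    # with f = tanh (an assumed, common choice of activation).
    h = h0
    for x in xs:
        h = np.tanh(W_hh @ h + W_hx @ x)
    return h  # final hidden state: the context vector passed to the decoder

# Hypothetical sizes and random stand-ins for embedded question words.
rng = np.random.default_rng(0)
hidden_dim, input_dim = 4, 3
W_hh = rng.standard_normal((hidden_dim, hidden_dim)) * 0.1
W_hx = rng.standard_normal((hidden_dim, input_dim)) * 0.1
question = [rng.standard_normal(input_dim) for _ in range(5)]
context = rnn_encoder(question, W_hh, W_hx, np.zeros(hidden_dim))
```

The final hidden state summarizes the whole input sequence; it is what the decoder receives as its initial hidden state.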

**Decoder**

- A stack of several recurrent units where each predicts an output *y_t* at time step *t*.
- Each recurrent unit accepts the hidden state from the previous unit and produces an output as well as its own hidden state.
- In the question-answering problem, the output sequence is the collection of all words from the answer. Each word is represented as *y_i*, where *i* is the position of that word in the answer.
- Any hidden state *h_t* is computed using the formula:

h_t = f(W^(hh) h_(t-1))
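The decoder recurrence can be sketched the same way. Again a minimal illustration under assumptions not stated in the text: *f* is taken to be tanh, and each output *y_t* is produced by a softmax readout over a hypothetical vocabulary (a common design, though the formula above only defines the hidden-state update).

```python
import numpy as np

def rnn_decoder(context, W_hh, W_yh, steps):
    # Hidden-state update h_t = f(W^(hh) h_(t-1)), f = tanh (assumed).
    # Each step also emits y_t = softmax(W^(yh) h_t), an assumed readout
    # giving a probability distribution over the output vocabulary.
    h = context
    outputs = []
    for _ in range(steps):
        h = np.tanh(W_hh @ h)
        scores = W_yh @ h
        probs = np.exp(scores - scores.max())  # numerically stable softmax
        probs /= probs.sum()
        outputs.append(probs)
    return outputs

# Hypothetical sizes: 4-dim hidden state, 6-word vocabulary.
rng = np.random.default_rng(1)
hidden_dim, vocab_size = 4, 6
W_hh = rng.standard_normal((hidden_dim, hidden_dim)) * 0.1
W_yh = rng.standard_normal((vocab_size, hidden_dim))
context = rng.standard_normal(hidden_dim)  # stand-in for the encoder's output
answer_dists = rnn_decoder(context, W_hh, W_yh, steps=3)
```

Each element of `answer_dists` is a distribution over the vocabulary; picking the argmax (or sampling) at each step yields the predicted answer words.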