Backpropagation algorithm in ANNs.

Let me first summarize how an ANN works. We feed some input data to the ANN, and it computes an output using its coefficients (learnable parameters). To find these parameters, we define a loss function that measures the distortion between the ground truth and the ANN's output. Our aim is to find the parameters that minimize this loss function.
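As a minimal sketch of this forward-pass-plus-loss setup, here is a single-layer network in NumPy with a sigmoid activation and a mean-squared-error loss. The names (`forward`, `mse_loss`) and the layer sizes are illustrative assumptions, not anything prescribed by the text:

```python
import numpy as np

def sigmoid(z):
    # Non-linearity applied to the layer's pre-activation.
    return 1.0 / (1.0 + np.exp(-z))

def forward(W, b, x):
    # Learnable parameters W (weights) and b (biases) map input x to an output.
    return sigmoid(W @ x + b)

def mse_loss(y_pred, y_true):
    # "Distortion" between the network's output and the ground truth.
    return np.mean((y_pred - y_true) ** 2)

rng = np.random.default_rng(0)
W = rng.normal(size=(2, 3))       # 3 inputs -> 2 outputs
b = np.zeros(2)
x = np.array([0.5, -1.0, 2.0])    # example input
y_true = np.array([1.0, 0.0])     # example ground truth

y_pred = forward(W, b, x)
loss = mse_loss(y_pred, y_true)
```

Training then amounts to adjusting `W` and `b` so that `loss` gets as small as possible.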

If the system is linear, we can find the parameters analytically, since the problem becomes a convex optimization problem. However, in ANNs we apply non-linearities to capture more complex structure in the data, and this makes the problem non-convex (the error surface is not convex everywhere). To solve that kind of problem, we use gradient-based optimization during training. Backpropagation is the process of sending the error signal (the loss we computed after a forward pass) back through the layers of the ANN to compute the gradients of the loss with respect to the parameters. These gradients tell us in which direction, and by how much, we should update the ANN's parameters.
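The backward pass and the gradient update can be sketched for the same single-layer sigmoid/MSE setup. This is an assumed toy configuration, not the only way to do it; the point is the chain rule (loss → activation → weights) followed by a gradient-descent step:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(W, b, x):
    return sigmoid(W @ x + b)

def loss_of(W, b, x, y_true):
    y_pred = forward(W, b, x)
    return np.mean((y_pred - y_true) ** 2)

def backward(W, b, x, y_true):
    # Chain rule: dL/dW = dL/dy * dy/dz * dz/dW.
    y_pred = forward(W, b, x)
    dL_dy = 2.0 * (y_pred - y_true) / y_pred.size  # MSE derivative
    dy_dz = y_pred * (1.0 - y_pred)                # sigmoid derivative
    delta = dL_dy * dy_dz                          # error signal at this layer
    dW = np.outer(delta, x)                        # dL/dW
    db = delta                                     # dL/db
    return dW, db

rng = np.random.default_rng(0)
W = rng.normal(size=(2, 3))
b = np.zeros(2)
x = np.array([0.5, -1.0, 2.0])
y_true = np.array([1.0, 0.0])

lr = 0.5  # learning rate: how far to step along the negative gradient
loss_before = loss_of(W, b, x, y_true)
for _ in range(200):
    dW, db = backward(W, b, x, y_true)
    W -= lr * dW   # move against the gradient: the direction that reduces loss
    b -= lr * db
loss_after = loss_of(W, b, x, y_true)
```

In a deep network the same `delta` is propagated layer by layer (multiplied by each layer's weights and activation derivative), which is exactly the "sending the error signal back through the layers" described above.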