What are the variants of Back Propagation?

We feed a batch (of any size) of input data (images, text, audio, video, tabular, or multimodal) to our neural network and get some outputs. We then calculate a loss (MSE, BCE, etc.), which tells us how far the predictions are from the ground truth: the higher the loss, the worse the fit. Now we need to update the weights and biases of the network so that on the next forward pass the loss is lower than before. We propagate this loss from the last layer back to the first (that's why it's called backpropagation), telling each layer: here's the loss, please adjust your weights and biases. Concretely, we compute gradients and update the weights and biases like this:

grad_w = dL/dw   (L: loss, w: weights)

grad_b = dL/db   (b: bias term)

Updating the weights and biases:

w = w - lr * grad_w   (lr: learning rate)

b = b - lr * grad_b

This update rule is called gradient descent.
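The whole loop described above can be sketched in a few lines of Python. This is a minimal illustration for a single linear neuron (y_hat = w*x + b) trained with MSE; the toy data, learning rate, and step count are assumptions for the example, not part of the original text.

```python
# Toy data: ground truth is w = 2, b = 1 (an assumed example)
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]
n = len(xs)

w, b = 0.0, 0.0   # initial weights and bias
lr = 0.05         # learning rate

for step in range(2000):
    # Forward pass: feed inputs, get outputs
    y_hat = [w * x + b for x in xs]
    # MSE loss: how far we are from the ground truth
    loss = sum((p - t) ** 2 for p, t in zip(y_hat, ys)) / n
    # Backward pass: grad_w = dL/dw, grad_b = dL/db via the chain rule
    grad_w = sum(2 * (p - t) * x for p, t, x in zip(y_hat, ys, xs)) / n
    grad_b = sum(2 * (p - t) for p, t in zip(y_hat, ys)) / n
    # Gradient-descent update: w = w - lr*grad_w, b = b - lr*grad_b
    w -= lr * grad_w
    b -= lr * grad_b
```

After training, w and b converge close to the true values 2 and 1, and the loss on each pass is lower than on the previous one, which is exactly what the update rule is designed to achieve.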