What is Cyclic Learning Rate?

The objectives of the cyclical learning rate (CLR) are two-fold:

  • CLR offers a practical way to set the global learning rate for training neural networks, eliminating the need to run many experiments to find the best value, at essentially no additional computational cost.
  • CLR also provides a principled way to find a good learning rate range (LR range) for an experiment by introducing the LR Range Test.
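
To make the first point concrete, the simplest CLR policy from Smith's paper is the triangular schedule: the learning rate ramps linearly from a lower bound to an upper bound and back again over each cycle. Below is a minimal sketch; the function name and the default `step_size`, `base_lr`, and `max_lr` values are illustrative choices, not fixed by the method.

```python
import math

def triangular_clr(iteration, step_size=2000, base_lr=1e-4, max_lr=1e-2):
    """Triangular cyclical learning rate (Smith, 2017).

    The LR rises linearly from base_lr to max_lr over step_size
    iterations, then falls back to base_lr over the next step_size
    iterations, and the cycle repeats.
    """
    cycle = math.floor(1 + iteration / (2 * step_size))
    x = abs(iteration / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1 - x)
```

For example, with the defaults above the schedule returns `base_lr` at iteration 0, peaks at `max_lr` at iteration 2000, and is back at `base_lr` by iteration 4000.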

A Case Study of CLR in Python

We will be doing this using the classic MNIST dataset, probably the most popular dataset for getting started with Computer Vision and Deep Learning. We will use keras extensively throughout the experiment; keras ships with a built-in loader for the dataset. We will start the experiment by importing it and performing some basic EDA.

from keras.datasets import mnist

Using TensorFlow backend.

You have imported the dataset successfully. Now, you will do some basic visualizations of the dataset.

import matplotlib.pyplot as plt 

(X_train, y_train), (X_test, y_test) = mnist.load_data() 

# Plot the first 4 images as grayscale, in a single row of 4 subplots
for i in range(4):
    plt.subplot(1, 4, i + 1)
    plt.imshow(X_train[i], cmap=plt.get_cmap('gray'))
# Show the plot 
plt.show()

Why should we use Cyclical Learning Rates?

  • The first reason is that our network may become stuck in saddle points or local minima, and a low learning rate may not be sufficient to break out of the area and descend into regions of the loss landscape with lower loss.

  • Secondly, our model and optimizer may be very sensitive to the initial choice of learning rate. If we choose it poorly, the model may be stuck from the very start.
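
The LR Range Test mentioned earlier addresses the second problem directly: train for a short run while increasing the learning rate linearly, then plot loss against learning rate; the band where the loss falls steadily is a sensible choice for CLR's lower and upper bounds. A minimal sketch of such a linearly increasing schedule follows; the function name and the default bounds are illustrative, not from any library.

```python
def lr_range_test_schedule(iteration, total_iterations, min_lr=1e-7, max_lr=1.0):
    """Linearly increase the learning rate over a short exploratory run.

    Record the training loss at each iteration alongside the LR this
    function returns; the LR interval where loss decreases steadily is
    a good candidate [base_lr, max_lr] band for the cyclical schedule.
    """
    fraction = iteration / max(1, total_iterations - 1)
    return min_lr + (max_lr - min_lr) * fraction
```

For instance, over a 100-iteration test the schedule starts at `min_lr` and reaches `max_lr` exactly at the final iteration.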