Explain the role of the Convolution Layer in CNN

Convolution is a linear operation that applies a small filter across a larger input to produce an output feature map.

Convolution layer: This layer performs an operation called convolution, which is what gives the convolutional neural network its name. It extracts features from the input images by multiplying a set of weights with the input.

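This technique was designed for two-dimensional input (an array of data). The multiplication is performed between a patch of the input array and a 2D array of weights called a filter or kernel: the elementwise products are summed to give one value of the feature map, and the filter is then moved to the next position.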
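The multiplication described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a production implementation: it assumes no padding and a stride of 1, and the function name `conv2d_valid`, the 4x4 input, and the 2x2 kernel are made up for the example. Like most deep learning libraries, it does not flip the kernel, so strictly speaking it computes a cross-correlation.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and
    sum the elementwise products at each position."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            patch = image[r:r + kh, c:c + kw]   # small square of input data
            out[r, c] = np.sum(patch * kernel)  # multiply by weights, then sum
    return out

# Hypothetical 4x4 input and 2x2 filter, chosen only for illustration.
image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.array([[1.0, 0.0],
                   [0.0, -1.0]])

feature_map = conv2d_valid(image, kernel)
print(feature_map.shape)  # a 4x4 input and a 2x2 filter give a 3x3 feature map
```

The feature map is smaller than the input because the filter can only be placed where it fits entirely inside the array.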

The convolution layer is the component that detects features in images. Because it learns image features from small squares of input data, it preserves the spatial relationship between neighboring pixels.

A typical CNN begins with about three to ten principal layers in which the main computation is convolution. For this reason, these layers are often called convolutional layers.

These are followed by two fully connected layers, which function as a conventional neural network. The convolutional layers extract features from the images, and the fully connected layers classify the images based on those features.
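As a rough sketch of the classification stage, the feature maps can be flattened into one vector and passed through two fully connected layers. Everything specific here is an assumption made for illustration: the layer sizes, the random weights, the ReLU in the hidden layer, the softmax output, and the names `classify`, `w1`, `b1`, `w2`, and `b2`. In a real network the weights would be learned during training.

```python
import numpy as np

def softmax(z):
    """Turn raw scores into probabilities that sum to 1."""
    z = z - z.max()          # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def classify(feature_maps, w1, b1, w2, b2):
    """Flatten the feature maps and apply two fully connected layers."""
    x = np.concatenate([m.ravel() for m in feature_maps])  # one feature vector
    hidden = np.maximum(w1 @ x + b1, 0.0)                  # first layer + ReLU
    return softmax(w2 @ hidden + b2)                       # second layer -> class probabilities

rng = np.random.default_rng(0)
# Hypothetical output of the convolutional layers: four 3x3 feature maps (36 features).
feature_maps = [rng.random((3, 3)) for _ in range(4)]
w1, b1 = rng.standard_normal((16, 36)) * 0.1, np.zeros(16)  # 36 features -> 16 hidden units
w2, b2 = rng.standard_normal((5, 16)) * 0.1, np.zeros(5)    # 16 hidden units -> 5 classes

probs = classify(feature_maps, w1, b1, w2, b2)
print(probs.argmax())  # index of the predicted class
```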

Each principal layer starts with normalization. A convolutional filter is then applied to create a feature map, and a ReLU activation is applied to that feature map. Next, subsampling reduces the size of the feature map, producing a subsampled layer. These four steps are then repeated in the next principal layer.
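The four steps of a principal layer can be chained as in the sketch below. The text does not say which normalization or subsampling method is used, so this example assumes simple standardization (zero mean, unit variance) and 2x2 max pooling; the function names and the 9x9 random input are likewise illustrative only.

```python
import numpy as np

def normalize(x, eps=1e-8):
    """Step 1 (assumed form): shift to zero mean and scale to unit variance."""
    return (x - x.mean()) / (x.std() + eps)

def conv2d_valid(x, kernel):
    """Step 2: slide the filter over the input (no padding, stride 1)."""
    kh, kw = kernel.shape
    out = np.zeros((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(x[r:r + kh, c:c + kw] * kernel)
    return out

def relu(x):
    """Step 3: replace negative values with zero."""
    return np.maximum(x, 0.0)

def max_pool(x, size=2):
    """Step 4 (assumed form): keep the maximum of each size x size block."""
    h, w = x.shape
    h, w = h - h % size, w - w % size        # drop rows/columns that do not fill a block
    blocks = x[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))

def principal_layer(x, kernel):
    """Normalization -> convolution -> ReLU -> subsampling."""
    return max_pool(relu(conv2d_valid(normalize(x), kernel)))

rng = np.random.default_rng(0)
image = rng.random((9, 9))               # hypothetical 9x9 input
kernel = rng.standard_normal((3, 3))     # hypothetical 3x3 filter

out = principal_layer(image, kernel)
print(out.shape)  # 9x9 -> 7x7 after convolution -> 3x3 after pooling
```

Stacking several principal layers amounts to feeding `out` into `principal_layer` again with a new filter.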

Many convolutional filters are applied to the input image and to each of the feature maps that follow. As a result, not one but many feature maps are created from the input and from each feature map. In other words, the convolutional layers of a CNN form a tree-like structure.
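The branching described above can be made concrete by counting feature maps. The sketch below follows the description literally: every filter in a bank is applied separately to every incoming map. The filter counts (4 and 2), the sizes, and the helper name `apply_filter_bank` are assumptions chosen for the example.

```python
import numpy as np

def conv2d_valid(x, kernel):
    """Slide the filter over the input (no padding, stride 1)."""
    kh, kw = kernel.shape
    out = np.zeros((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(x[r:r + kh, c:c + kw] * kernel)
    return out

def apply_filter_bank(x, filters):
    """Apply every filter in the bank to one input; one feature map per filter."""
    return [conv2d_valid(x, k) for k in filters]

rng = np.random.default_rng(0)
image = rng.random((8, 8))                      # hypothetical 8x8 input
layer1_filters = rng.standard_normal((4, 3, 3)) # four 3x3 filters
layer2_filters = rng.standard_normal((2, 3, 3)) # two 3x3 filters

# First level: one input branches into four feature maps.
level1 = apply_filter_bank(image, layer1_filters)

# Second level: each of those maps branches again into two feature maps.
level2 = [m for parent in level1
            for m in apply_filter_bank(parent, layer2_filters)]

print(len(level1), len(level2))
```

Many practical implementations instead let each filter span all incoming feature maps and sum the results, so the number of maps per layer equals the number of filters rather than multiplying at every level.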