What are the problems associated with the Convolution operation and how can one resolve them?

As we know, convolving a 6 X 6 input with a 3 X 3 filter produces a 4 X 4 output. Let’s generalize this idea:

If the input is n X n and the filter size is f X f, then the output size will be (n-f+1) X (n-f+1):

  • Input: n X n
  • Filter size: f X f
  • Output: (n-f+1) X (n-f+1)
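The formula above can be sketched as a tiny helper (the function name is mine, just for illustration):

```python
def conv_output_size(n, f):
    """Output size of a valid convolution: n x n input, f x f filter."""
    return n - f + 1

print(conv_output_size(6, 3))  # 6 - 3 + 1 = 4, matching the example above
```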

There are primarily two disadvantages here:

  • Every time we apply a convolution operation, the image shrinks.
  • Pixels at the corners and edges of the image fall inside far fewer filter windows than the central pixels, so the borders contribute less to the output and information there can be lost.
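A quick sketch makes the second disadvantage concrete: counting how many 3 X 3 windows cover each pixel of a 6 X 6 input shows the imbalance between corner and central pixels.

```python
import numpy as np

n, f = 6, 3
usage = np.zeros((n, n), dtype=int)

# Slide the f x f window over every valid position and count
# how often each input pixel falls inside it.
for i in range(n - f + 1):
    for j in range(n - f + 1):
        usage[i:i + f, j:j + f] += 1

print(usage[0, 0])  # corner pixel: covered by 1 window
print(usage[3, 3])  # central pixel: covered by 9 windows
```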

To overcome these problems, we can pad the image with an additional border, i.e., add one pixel of zeros all around the edges. The input then becomes an 8 X 8 matrix instead of 6 X 6. Convolving it with a 3 X 3 filter now produces a 6 X 6 output, the same as the original shape of the image. This is where padding comes into the picture:

Padding: The convolution operation reduces the size of the image, i.e., the spatial dimensions decrease, leading to information loss. As we stack more convolutional layers, the volume or feature map keeps shrinking layer after layer.

Zero padding allows us to control the size of the feature map.

Padding is used to make the output size the same as the input size: setting the output n+2p-f+1 equal to n gives p = (f-1)/2, so for an odd filter size f, a padding of (f-1)/2 preserves the spatial dimensions.

Padding amount p = the number of rows and columns of zeros inserted at the top, bottom, left, and right of the image. After applying padding:

  • Input: n X n
  • Padding: p
  • Filter size: f X f
  • Output: (n+2p-f+1) X (n+2p-f+1)
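The whole pipeline above can be verified with a short NumPy sketch: pad a 6 X 6 image with p = 1, convolve with a 3 X 3 filter via explicit sliding windows, and check that the output matches (n+2p-f+1) X (n+2p-f+1).

```python
import numpy as np

n, f, p = 6, 3, 1
image = np.arange(n * n, dtype=float).reshape(n, n)
kernel = np.ones((f, f))

# np.pad defaults to zero padding: (n+2p) x (n+2p) = 8 x 8
padded = np.pad(image, p)

out_size = n + 2 * p - f + 1  # 6 + 2 - 3 + 1 = 6
out = np.zeros((out_size, out_size))
for i in range(out_size):
    for j in range(out_size):
        out[i, j] = np.sum(padded[i:i + f, j:j + f] * kernel)

print(padded.shape)  # (8, 8)
print(out.shape)     # (6, 6), same as the input
```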