Explain the different layers in a CNN

The different layers involved in the architecture of a CNN are as follows:

1. Input Layer: The input layer of a CNN holds the image data. An image is represented as a three-dimensional matrix (height x width x channels). In this description, we then reshape each image into a single column.

For example, suppose we have the MNIST dataset, where each image has dimension 28 x 28 = 784 pixels; we reshape it into a 784 x 1 column before feeding it to the network. If there are "k" training examples in the dataset, the dimension of the input becomes (784, k).
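As a rough illustration, the NumPy snippet below flattens a small batch of 28 x 28 images into the (784, k) layout described above (the batch size k = 3 and the random pixel values are made-up for the example):

```python
import numpy as np

# Hypothetical batch of k = 3 MNIST-style images, each 28 x 28 pixels.
images = np.random.rand(3, 28, 28)

# Flatten each image into a 784 x 1 column and stack them side by side.
flattened = images.reshape(3, -1).T
print(flattened.shape)  # (784, 3), i.e. (784, k)
```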

2. Convolutional Layer: This layer performs the convolution operation: small filter windows (kernels) slide over the data, and each window position produces one value of the resulting feature map, as sketched below.
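A minimal sketch of that sliding-window idea in plain NumPy, assuming a single-channel image, stride 1, and no padding (the 5 x 5 image and the vertical-edge kernel are made-up values):

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide the kernel over the image (stride 1, no padding) and
    return the resulting feature map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = image[i:i + kh, j:j + kw]          # the small "picture window"
            feature_map[i, j] = np.sum(window * kernel)  # one output value per position
    return feature_map

image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.array([[1., 0., -1.],
                   [1., 0., -1.],
                   [1., 0., -1.]])  # a simple vertical-edge detector
print(convolve2d(image, kernel).shape)  # (3, 3)
```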

3. ReLU Layer: This layer introduces non-linearity into the network by converting all negative values in the feature map to zero. The final output is a rectified feature map, as in the small example below.
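For instance, applying ReLU element-wise to a tiny made-up feature map zeroes out the negative entries:

```python
import numpy as np

feature_map = np.array([[-2.0, 3.5],
                        [ 1.2, -0.7]])

# ReLU: negative activations become zero, positive ones pass through unchanged.
rectified = np.maximum(0, feature_map)
print(rectified)  # [[0.  3.5]
                  #  [1.2 0. ]]
```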

4. Pooling Layer: Pooling is a down-sampling operation that reduces the spatial dimensions of the feature map (max pooling, which keeps the largest value in each window, is the most common choice); see the example below.
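A small example of 2 x 2 max pooling with stride 2 on a made-up 4 x 4 feature map; each output value is the maximum of one 2 x 2 window, so height and width are halved:

```python
import numpy as np

feature_map = np.array([[1., 3., 2., 4.],
                        [5., 6., 1., 2.],
                        [7., 2., 9., 0.],
                        [3., 4., 1., 8.]])

# 2 x 2 max pooling with stride 2: reshape into non-overlapping blocks
# and take the maximum of each block.
pooled = feature_map.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)  # [[6. 4.]
               #  [7. 9.]]
```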

5. Fully Connected Layer: This layer takes the flattened feature maps from the previous layers and combines them to identify and classify the objects in the image.
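Conceptually, a fully connected layer is just a matrix multiplication plus a bias over the flattened features; the sketch below uses made-up sizes (8 input features, 3 classes):

```python
import numpy as np

# Hypothetical flattened feature vector from the last pooling layer (length 8).
features = np.random.rand(8)

# Every output neuron is connected to every input feature.
num_classes = 3
W = np.random.rand(num_classes, 8)
b = np.random.rand(num_classes)

scores = W @ features + b   # one raw score ("logit") per class
print(scores.shape)         # (3,)
```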

6. Softmax / Logistic Layer: The softmax or logistic layer is the last layer of the CNN and sits after the fully connected layers. A logistic (sigmoid) unit is used for binary classification problems, while softmax is used for multi-class classification problems.
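The snippet below shows both functions on made-up scores: softmax turns a vector of class scores into probabilities that sum to 1, while the logistic (sigmoid) squashes a single score into (0, 1):

```python
import numpy as np

scores = np.array([2.0, 1.0, 0.1])   # raw class scores from the FC layer

# Softmax for multi-class problems: exponentiate (shifted for numerical
# stability) and normalise so the outputs sum to 1.
shifted = np.exp(scores - scores.max())
probs = shifted / shifted.sum()
print(probs)        # approx [0.659, 0.242, 0.099]
print(probs.sum())  # 1.0

# Logistic (sigmoid) for binary problems: squashes one score into (0, 1).
sigmoid = 1.0 / (1.0 + np.exp(-2.0))
print(sigmoid)      # approx 0.881
```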

7. Output Layer: This layer contains the label in the form of a one-hot encoded vector.
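For example, a label of 7 in a 10-class problem such as MNIST becomes the following one-hot vector:

```python
import numpy as np

label = 7                  # hypothetical digit label for a 10-class problem
one_hot = np.zeros(10)
one_hot[label] = 1.0       # a 1 at the label's index, 0 everywhere else
print(one_hot)             # [0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
```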

More broadly, there are three types of layers in a CNN.

  1. The input layer (the first layer, which takes the input) - this layer takes the pixel values of an image as input
  2. Hidden layers (there can be more than one) - hidden layers are the layers that make the CNN deep (the more hidden layers, the deeper the network). They take values from the input layer and apply mathematical operations such as convolution and pooling, which is how the network learns the patterns in an image. The first hidden layers pick up simple spatial features of the image, and later hidden layers combine them into more complex features
  3. Output layer (the last layer of the network, which produces the prediction) - produces a probability for every class
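Putting the pieces together, here is a minimal sketch of such a network in Keras, assuming TensorFlow is installed; the filter counts, layer sizes, and 10-class MNIST-style output are illustrative choices, not prescribed by the answer above:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),                        # input layer: 28x28 grayscale image
    tf.keras.layers.Conv2D(32, (3, 3), activation="relu"),    # convolution + ReLU
    tf.keras.layers.MaxPooling2D((2, 2)),                     # pooling (down-sampling)
    tf.keras.layers.Flatten(),                                 # flatten the feature maps
    tf.keras.layers.Dense(64, activation="relu"),              # fully connected layer
    tf.keras.layers.Dense(10, activation="softmax"),           # softmax output: 10 class probabilities
])

model.summary()
```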