CNN Layers.

- INPUT [32x32x3] will hold the raw pixel values of the image, in this case an image of width 32, height 32, and with three color channels R, G, B.
- CONV layer will compute the output of neurons that are connected to local regions in the input, each computing a dot product between their weights and a small region they are connected to in the input volume. This may result in volume such as [32x32x12] if we decided to use 12 filters.
- RELU layer will apply an elementwise activation function, such as max(0, x), thresholding at zero.
- POOL layer will perform a downsampling operation along the spatial dimensions (width, height), resulting in volume such as [16x16x12].
- FC (i.e. fully-connected) layer will compute the class scores, resulting in a volume of size [1x1x10], where each of the 10 numbers corresponds to a class score, such as among the 10 categories of CIFAR-10. As with ordinary Neural Networks and as the name implies, each neuron in this layer will be connected to all the numbers in the previous volume.
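The shape transformations above can be traced with a minimal NumPy sketch. The filter size (3x3 with zero-padding 1) and the 2x2 max-pooling window are illustrative assumptions not stated in the list; only the volume shapes [32x32x3] → [32x32x12] → [16x16x12] → [1x1x10] come from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# INPUT: a single 32x32 RGB image (height, width, channels).
x = rng.standard_normal((32, 32, 3))

# CONV: 12 filters, each 3x3x3; zero-padding of 1 keeps the 32x32
# spatial size, so the output volume is [32x32x12].
filters = rng.standard_normal((12, 3, 3, 3)) * 0.01
padded = np.pad(x, ((1, 1), (1, 1), (0, 0)))
conv = np.zeros((32, 32, 12))
for i in range(32):
    for j in range(32):
        patch = padded[i:i + 3, j:j + 3, :]  # local region of the input
        # dot product between each filter's weights and the local region
        conv[i, j, :] = np.tensordot(filters, patch, axes=([1, 2, 3], [0, 1, 2]))

# RELU: elementwise max(0, x) thresholding at zero.
relu = np.maximum(conv, 0)

# POOL: 2x2 max pooling halves width and height -> [16x16x12].
pool = relu.reshape(16, 2, 16, 2, 12).max(axis=(1, 3))

# FC: each of the 10 output neurons connects to all 16*16*12 numbers
# in the previous volume, producing 10 class scores.
w = rng.standard_normal((16 * 16 * 12, 10)) * 0.01
scores = pool.reshape(-1) @ w

print(x.shape, conv.shape, pool.shape, scores.shape)
```

Real implementations vectorize the convolution loop; the explicit loop here just makes the "dot product over a local region" idea visible.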

For further information, refer to this page: https://cs231n.github.io/convolutional-networks/#pool

A typical CNN begins with several principal layers, often around three to ten, in which the main computation is convolution. Because of this, these layers are usually called convolutional layers.

These are followed by two fully connected layers that function as an ordinary neural network: the convolutional layers extract features from the images, and the fully connected layers classify the images based on those features.
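The division of labor can be sketched as follows: the extracted features are flattened and passed through two fully connected layers to produce class scores. The feature-volume shape (16x16x12) and the hidden-layer width (64 units) are illustrative assumptions, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical feature volume produced by the convolutional layers.
features = rng.standard_normal((16, 16, 12))

# Two fully connected layers acting as an ordinary neural network.
w1 = rng.standard_normal((16 * 16 * 12, 64)) * 0.01  # hidden layer
w2 = rng.standard_normal((64, 10)) * 0.01            # 10 class scores

hidden = np.maximum(features.reshape(-1) @ w1, 0)  # FC + ReLU
scores = hidden @ w2
predicted_class = int(np.argmax(scores))
print(scores.shape, predicted_class)
```

The point is that classification happens entirely on the flattened features; the fully connected layers never see the raw pixels.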

Each principal layer starts with normalization. A convolutional filter is then applied to create a feature map, followed by a ReLU activation. Finally, subsampling reduces the size of the feature map, producing a subsampled layer. These four steps are then repeated.
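The four steps of a principal layer can be sketched as a single function applied repeatedly. The choice of normalization (zero mean, unit variance over the volume), the 3x3 filters, and the 2x2 max-pooling subsampling are illustrative assumptions, not prescribed by the text.

```python
import numpy as np

rng = np.random.default_rng(1)

def principal_layer(x, n_filters):
    """One principal layer: normalize -> convolve -> ReLU -> subsample."""
    # 1. Normalization (here: zero mean, unit variance over the volume).
    x = (x - x.mean()) / (x.std() + 1e-8)
    # 2. Convolution producing a feature map with n_filters channels.
    h, w, c = x.shape
    filters = rng.standard_normal((n_filters, 3, 3, c)) * 0.1
    padded = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    fmap = np.zeros((h, w, n_filters))
    for i in range(h):
        for j in range(w):
            patch = padded[i:i + 3, j:j + 3, :]
            fmap[i, j, :] = np.tensordot(filters, patch,
                                         axes=([1, 2, 3], [0, 1, 2]))
    # 3. ReLU activation.
    fmap = np.maximum(fmap, 0)
    # 4. Subsampling: 2x2 max pooling halves width and height.
    return fmap.reshape(h // 2, 2, w // 2, 2, n_filters).max(axis=(1, 3))

# Repeating the four steps: 32x32x3 -> 16x16x8 -> 8x8x16
x = rng.standard_normal((32, 32, 3))
for n in (8, 16):
    x = principal_layer(x, n)
print(x.shape)
```

Each repetition shrinks the spatial dimensions while (typically) increasing the number of feature channels, which is the usual shape progression in CNNs.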