Explain the role of the flattening layer in a CNN

After a series of convolution and pooling operations on the image's feature representation, the output of the final pooling layer is flattened into a single long vector (a one-dimensional array).

This process of converting all the resulting 2D feature maps into one vector is called flattening.
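As a minimal sketch of what this looks like in practice, the NumPy snippet below flattens a small stack of feature maps. The array name `feature_maps` and its shape (2 maps of 3x3) are assumptions chosen only for illustration.

```python
import numpy as np

# Hypothetical output of the last pooling layer: 2 feature maps, each 3x3.
feature_maps = np.arange(18).reshape(2, 3, 3)

# Flattening unrolls every value into one 1D vector; no values are dropped.
flat = feature_maps.reshape(-1)

print(feature_maps.shape)  # (2, 3, 3)
print(flat.shape)          # (18,)
```

Flattening only changes the shape: the vector holds exactly the same 18 values as the original feature maps, so no learnable parameters are involved.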

The flattened output is fed as input to a fully connected neural network, which may have any number of hidden layers, to learn the non-linear relationships present in the feature representation.

CNNs take 3D input (height, width, channels) or 4D volumetric input, and perform 1D, 2D, or 3D convolutions using filters of one, two, or three dimensions, respectively. Flattening means that any output with more than one dimension is converted to 1D. This is done so that the output of the CNN can be fed to a fully connected network (which classifies the features learned by the CNN) or to a softmax unit to obtain class probabilities. Because the convolutional output has more than one dimension while a fully connected layer expects a 1D feature vector for each example, flattening is applied after the convolutional layers.
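The hand-off from convolutional output to a fully connected layer and softmax can be sketched with plain NumPy. Everything here is an illustrative assumption rather than a real trained model: the batch size of 4, the 5x5x8 output shape, the 3 classes, and the randomly initialised `weights` and `bias`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical CNN output: batch of 4 images, 5x5 spatial grid, 8 channels.
cnn_output = rng.standard_normal((4, 5, 5, 8))

# Flatten everything except the batch axis: (4, 5, 5, 8) -> (4, 200).
flat = cnn_output.reshape(cnn_output.shape[0], -1)

# Hypothetical fully connected layer mapping 200 features to 3 classes.
num_classes = 3
weights = rng.standard_normal((flat.shape[1], num_classes)) * 0.01
bias = np.zeros(num_classes)
logits = flat @ weights + bias

# Softmax turns each row of logits into class probabilities.
shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)

print(flat.shape)   # (4, 200)
print(probs.shape)  # (4, 3)
```

Note that the batch axis is kept: each image gets its own 200-element vector, which is what the matrix multiplication in the fully connected layer requires.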