Explain the AlexNet architecture.

AlexNet is a convolutional neural network that is 8 layers deep: 5 convolutional layers followed by 3 fully connected layers. You can load a pretrained version of the network, trained on more than a million images from the ImageNet database [1]. The pretrained network can classify images into 1000 object categories, such as keyboard, mouse, pencil, and many animals.

AlexNet first came to public attention when it won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2012, with a top-5 test error of 15.3% against 26.2% for the second-best entry. That result showed that deep convolutional neural networks could solve large-scale image classification.

Before exploring AlexNet, it is essential to understand what a convolutional neural network is. Convolutional neural networks are a variant of neural networks whose hidden layers consist of convolutional layers, pooling layers, fully connected layers, and normalization layers.

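Convolution is the process of sliding a small filter (kernel) over an image or signal and computing a weighted sum of the values under it at each position, which produces a feature map. Now what is pooling? It is a sample-based discretization process that summarizes each small sub-region of its input with a single value (for example, the maximum). The main reason to use it is to reduce the dimensionality of the input, which also allows assumptions to be made about the features contained in the binned sub-regions.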
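The two operations above can be sketched in plain Python. This is a minimal illustration, not AlexNet's own implementation: the helper names `conv2d`, `relu`, and `max_pool` are hypothetical, there is no padding or multi-channel support, and, as in most deep-learning libraries, `conv2d` computes cross-correlation (the kernel is not flipped).

```python
from typing import List

Matrix = List[List[float]]


def conv2d(image: Matrix, kernel: Matrix, stride: int = 1) -> Matrix:
    """Slide `kernel` over `image` (no padding) and take a weighted sum at each position."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0.0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i * stride + di][j * stride + dj] * kernel[di][dj]
            row.append(total)
        out.append(row)
    return out


def relu(x: Matrix) -> Matrix:
    """Element-wise max(0, v): the nonlinearity applied after each convolution."""
    return [[max(0.0, v) for v in row] for row in x]


def max_pool(x: Matrix, size: int = 2, stride: int = 2) -> Matrix:
    """Keep the largest value in each size x size window (windows overlap if stride < size)."""
    out_h = (len(x) - size) // stride + 1
    out_w = (len(x[0]) - size) // stride + 1
    return [
        [
            max(x[i * stride + di][j * stride + dj] for di in range(size) for dj in range(size))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


image = [
    [1, 2, 0, 1],
    [3, 1, 1, 0],
    [0, 2, 4, 1],
    [1, 0, 2, 3],
]
kernel = [[1, -1], [1, -1]]  # responds to horizontal intensity changes

feature_map = conv2d(image, kernel)             # 4x4 -> 3x3
activated = relu(feature_map)                   # negative responses clipped to 0
pooled = max_pool(activated, size=2, stride=1)  # 3x3 -> 2x2
print(pooled)  # [[2.0, 4.0], [0.0, 4.0]]
```

Because the pooling stride (1) is smaller than the window (2), neighboring windows overlap, similar in spirit to AlexNet's 3x3 windows with stride 2; the feature map still shrinks from 3x3 to 2x2.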

AlexNet was one of the first convolutional networks to use GPUs to speed up training; the network was split across two GPUs.

  1. The AlexNet architecture consists of 5 convolutional layers, 3 max-pooling layers, 2 normalization layers, and 3 fully connected layers, the last of which feeds a 1000-way softmax.

  2. Each convolutional layer consists of convolutional filters followed by the nonlinear activation function ReLU.

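  3. The pooling layers perform overlapping max pooling: each 3x3 window is reduced to its maximum value, and the window moves with a stride of 2, so neighboring windows overlap.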

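  4. The input size is fixed because of the fully connected layers: the first fully connected layer expects a flattened feature vector of fixed length (6x6x256 = 9216 values).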

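  5. The input size is given in most places (including the original paper) as 224x224x3, but for the first convolutional layer (96 filters of 11x11, stride 4, no padding) to produce a 55x55 output, the input has to be 227x227x3, since (227 - 11)/4 + 1 = 55. In practice the 224x224 image is padded to 227x227.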

  6. Overall, AlexNet has about 60 million parameters.
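The layer sizes and the parameter count in the list above can be checked with a little arithmetic. The sketch below assumes the layer settings of the original AlexNet paper (96, 256, 384, 384, and 256 filters; 3x3 max pooling with stride 2; fully connected layers of 4096, 4096, and 1000 units) but treats the network as a single stream, ignoring the split of some layers across two GPUs in the paper. Normalization layers are omitted because they have no learnable parameters, and the helpers `out_size` and `alexnet_summary` are hypothetical; `out_size` raises an error where real frameworks would silently round down.

```python
# Hypothetical helpers: AlexNet-style shape and parameter bookkeeping (single stream,
# no two-GPU split, normalization layers omitted because they have no weights).

def out_size(n: int, kernel: int, stride: int, pad: int = 0) -> int:
    """Spatial output size of a conv or pooling layer: (n - kernel + 2*pad) / stride + 1."""
    span = n - kernel + 2 * pad
    if span % stride != 0:
        # Real frameworks floor here; raising makes the 224 vs 227 mismatch visible.
        raise ValueError(
            f"{n}x{n} input does not fit kernel={kernel}, stride={stride}, pad={pad}"
        )
    return span // stride + 1


# (name, kind, output channels, kernel, stride, padding); pooling keeps the channel count.
CONV_POOL = [
    ("conv1", "conv", 96, 11, 4, 0),
    ("pool1", "pool", 0, 3, 2, 0),
    ("conv2", "conv", 256, 5, 1, 2),
    ("pool2", "pool", 0, 3, 2, 0),
    ("conv3", "conv", 384, 3, 1, 1),
    ("conv4", "conv", 384, 3, 1, 1),
    ("conv5", "conv", 256, 3, 1, 1),
    ("pool3", "pool", 0, 3, 2, 0),
]
FULLY_CONNECTED = [("fc6", 4096), ("fc7", 4096), ("fc8", 1000)]  # fc8 feeds the softmax


def alexnet_summary(input_size: int = 227, in_channels: int = 3):
    """Return per-layer (name, output shape, parameter count) rows and the total."""
    size, channels, total, rows = input_size, in_channels, 0, []
    for name, kind, out_channels, k, stride, pad in CONV_POOL:
        size = out_size(size, k, stride, pad)
        params = 0  # max pooling has no learnable parameters
        if kind == "conv":
            params = k * k * channels * out_channels + out_channels  # weights + biases
            channels = out_channels
        total += params
        rows.append((name, f"{size}x{size}x{channels}", params))

    features = size * size * channels  # flattened 6x6x256 = 9216: this fixes the input size
    for name, units in FULLY_CONNECTED:
        params = features * units + units  # weights + biases
        total += params
        rows.append((name, str(units), params))
        features = units
    return rows, total


rows, total = alexnet_summary(227)
for name, shape, params in rows:
    print(f"{name:6}{shape:>12}{params:>14,}")
print(f"total {total:,}")  # total 62,378,344

try:
    alexnet_summary(224)
except ValueError as err:
    print(err)  # 224x224 input does not fit kernel=11, stride=4, pad=0
```

Under these assumptions the total comes to 62,378,344, of which about 58.6 million sit in the three fully connected layers. The original two-GPU version, where some convolutions only see half of the previous layer's channels, has slightly fewer, in line with the commonly quoted figure of about 60 million.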