Deep Learning Advanced Interview Questions

Q1. Which is better: Deep Networks or Shallow ones, and why?
Both networks, be they shallow or deep, are capable of approximating any function. What matters is how precise the network is in getting results. A shallow network works with only a few features, as it cannot extract more, but a deep network goes deeper by computing efficiently and working with many more features/parameters.
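As a rough illustration (a minimal Keras sketch; the layer widths and input size are assumptions for illustration, not recommendations), a shallow network uses a single wide hidden layer, while a deep network composes several layers:

from tensorflow import keras

# Shallow: one wide hidden layer.
shallow = keras.Sequential([
    keras.layers.Dense(512, activation="relu", input_shape=(784,)),
    keras.layers.Dense(10, activation="softmax"),
])

# Deep: several narrower hidden layers that compose features hierarchically.
deep = keras.Sequential([
    keras.layers.Dense(128, activation="relu", input_shape=(784,)),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(10, activation="softmax"),
])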
Q2. Why is Weight Initialization important in Neural Networks?
Weight initialization is one of the most important steps in training a neural network. Bad weight initialization can prevent a network from learning, while good weight initialization gives quicker convergence and a better overall error. For example, in Theano-style code:
import theano.tensor as T

# weights_hidden, weights_output, bias_hidden and bias_output are Theano
# shared variables defined elsewhere in the network.
params = [weights_hidden, weights_output, bias_hidden, bias_output]

def sgd(cost, params, lr=0.05):
    grads = T.grad(cost=cost, wrt=params)  # gradients of the cost w.r.t. each parameter
    updates = []
    for p, g in zip(params, grads):
        updates.append((p, p - g * lr))  # gradient descent step: p <- p - lr * g
    return updates

updates = sgd(cost, params)

Biases can generally be initialized to zero. The rule of thumb for the weights is to set them close to zero without being too small.
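A minimal NumPy sketch of that rule (the layer sizes are assumptions for illustration; the scaling is the common Xavier/Glorot heuristic):

import numpy as np

n_in, n_out = 784, 128  # example layer sizes (assumed for illustration)

# Weights: small random values centred on zero, scaled by the layer's
# fan-in and fan-out (Xavier/Glorot uniform initialization).
limit = np.sqrt(6.0 / (n_in + n_out))
weights_hidden = np.random.uniform(-limit, limit, size=(n_in, n_out))

# Biases: zeros.
bias_hidden = np.zeros(n_out)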
Q3. What’s the difference between a feed-forward and a backpropagation neural network?
A Feed-Forward Neural Network is a type of neural network architecture in which the connections are "fed forward", i.e. they do not form cycles. The term "feed-forward" also describes how values travel: from the input layer to the hidden layers, and from the hidden layers to the output layer.
Backpropagation is a training algorithm consisting of 2 steps:
Feed-Forward the values.
Calculate the error and propagate it back to the earlier layers.
So to be precise, forward-propagation is part of the backpropagation algorithm but comes before back-propagating.
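A minimal NumPy sketch of the two steps for a single sigmoid layer (the shapes and the squared-error loss are assumptions for illustration):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.random.randn(4, 3)         # batch of 4 inputs with 3 features
y = np.random.randn(4, 1)         # targets
W = np.random.randn(3, 1) * 0.01  # small initial weights
b = np.zeros(1)

# Step 1: feed-forward the values.
out = sigmoid(x @ W + b)

# Step 2: calculate the error and propagate it back.
err = out - y                   # gradient of 0.5 * squared error w.r.t. out
grad_z = err * out * (1 - out)  # chain rule through the sigmoid
grad_W = x.T @ grad_z           # gradient w.r.t. the weights
grad_b = grad_z.sum(axis=0)     # gradient w.r.t. the bias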
Q4. What are Hyperparameters? Name a few used in any Neural Network.
Hyperparameters are the variables which determine the network structure (e.g., the number of hidden units) and the variables which determine how the network is trained (e.g., the learning rate).
Hyperparameters are set before training.
Number of Hidden Layers
Network Weight Initialization
Activation Function
Learning Rate
Momentum
Number of Epochs
Batch Size
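As an illustration (a minimal Keras sketch; all values are arbitrary assumptions, not recommendations), here is where each of these hyperparameters appears:

from tensorflow import keras

model = keras.Sequential([
    # number of hidden layers/units, weight initialization, activation function
    keras.layers.Dense(64, activation="relu",
                       kernel_initializer="glorot_uniform",
                       input_shape=(20,)),
    keras.layers.Dense(1, activation="sigmoid"),
])

# learning rate and momentum
opt = keras.optimizers.SGD(learning_rate=0.01, momentum=0.9)
model.compile(optimizer=opt, loss="binary_crossentropy")

# number of epochs and batch size (x_train, y_train assumed to exist):
# model.fit(x_train, y_train, epochs=20, batch_size=32)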
Q5. Explain the different Hyperparameters related to Network and Training.
Network Hyperparameters
The number of Hidden Layers: Many hidden units within a layer, combined with regularization techniques, can increase accuracy. A smaller number of units may cause underfitting.
Network Weight Initialization: Ideally, it is better to use different weight initialization schemes according to the activation function used in each layer. In practice, a uniform distribution is mostly used.
Activation function: Activation functions are used to introduce nonlinearity to models, which allows deep learning models to learn nonlinear prediction boundaries.
Training Hyperparameters
Learning Rate: The learning rate defines how quickly a network updates its parameters. A low learning rate slows down the learning process but converges smoothly; a larger learning rate speeds up learning but may not converge.
Momentum: Momentum helps to know the direction of the next step with the knowledge of the previous steps. It helps to prevent oscillations. A typical choice of momentum is between 0.5 and 0.9 (see the sketch after this list).
The number of epochs: The number of epochs is the number of times the whole training data is shown to the network while training. Increase the number of epochs until the validation accuracy starts decreasing even while training accuracy keeps increasing (a sign of overfitting).
Batch size: Mini-batch size is the number of sub-samples given to the network after which a parameter update happens. A good default for batch size is 32; also try 64, 128, 256, and so on.
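A minimal NumPy sketch of the momentum update described above (names and values are assumptions for illustration):

import numpy as np

def sgd_momentum_step(param, grad, velocity, lr=0.01, momentum=0.9):
    # Blend the previous step direction into the current one,
    # which damps oscillations and smooths the descent path.
    velocity = momentum * velocity - lr * grad
    return param + velocity, velocity

w = np.array([1.0, -2.0])   # parameters
v = np.zeros_like(w)        # running velocity
g = np.array([0.5, 0.3])    # gradient from the current mini-batch
w, v = sgd_momentum_step(w, g, v)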
Q6. What is Dropout?
Dropout is a regularization technique used to avoid overfitting and thus increase generalization power. Generally, we should use a small dropout rate of 20%-50% of neurons, with 20% providing a good starting point. A probability too low has minimal effect, and a value too high results in under-learning by the network.
Also, use a larger network: you are likely to get better performance when dropout is used on a larger network, as it gives the model more of an opportunity to learn independent representations.
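A minimal NumPy sketch of (inverted) dropout, using the 20% rate suggested above (the array shape is an assumption for illustration):

import numpy as np

def dropout(activations, rate=0.2, training=True):
    if not training:
        return activations  # dropout is disabled at inference time
    # Randomly zero out `rate` of the neurons and rescale the survivors
    # so the expected activation stays the same.
    keep_prob = 1.0 - rate
    mask = np.random.rand(*activations.shape) < keep_prob
    return activations * mask / keep_prob

h = np.random.randn(4, 10)  # activations of a hidden layer
h_dropped = dropout(h, rate=0.2)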
Q7. In training a neural network, you notice that the loss does not decrease in the few starting epochs. What could be the reason?
The reasons for this could be:
The learning rate is too low
The regularization parameter is too high
The network is stuck at a local minimum
Q8. Name a few deep learning frameworks
TensorFlow
Caffe
The Microsoft Cognitive Toolkit/CNTK
Torch/PyTorch
MXNet
Chainer
Keras
Q9. What are Tensors?
Tensors are the de facto standard for representing data in deep learning. They are just multidimensional arrays that allow you to represent data with higher dimensions. In deep learning, you generally deal with high-dimensional data sets, where the dimensions refer to the different features present in the data set.
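For instance (a NumPy sketch; the shapes are assumptions for illustration):

import numpy as np

scalar = np.array(5.0)              # rank-0 tensor
vector = np.array([1.0, 2.0, 3.0])  # rank-1 tensor
matrix = np.zeros((3, 4))           # rank-2 tensor

# A batch of 32 RGB images of 64x64 pixels: a rank-4 tensor.
images = np.zeros((32, 64, 64, 3))
print(images.ndim, images.shape)    # 4 (32, 64, 64, 3)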
Q10. List a few advantages of TensorFlow.
It has platform flexibility
It is easily trainable on CPU as well as GPU for distributed computing.
TensorFlow has auto differentiation capabilities
It has advanced support for threads, asynchronous computation, and queues.
It is customizable and open source.
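For example, the auto-differentiation capability (a minimal TensorFlow 2 sketch):

import tensorflow as tf

x = tf.Variable(3.0)
with tf.GradientTape() as tape:
    y = x ** 2       # y = x^2
grad = tape.gradient(y, x)
print(grad.numpy())  # 6.0, since dy/dx = 2x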