**Q1. Which are better, Deep Networks or Shallow ones, and why?**

Both kinds of network, shallow or deep, are capable of approximating any function. What matters is how precise the network is in getting results. A shallow network works with only a few features, as it cannot extract more, whereas a deep network computes efficiently over many more features/parameters.

**Q2. Why is Weight Initialization important in Neural Networks?**

Weight initialization is one of the most important steps. Bad weight initialization can prevent a network from learning, while good weight initialization gives quicker convergence and a better overall error.

The following snippet (using Theano's symbolic gradients) illustrates how parameter updates are built from gradients:

```python
import theano.tensor as T  # Theano tensor module for symbolic gradients

# Model parameters (symbolic shared variables defined elsewhere).
params = [weights_hidden, weights_output, bias_hidden, bias_output]

def sgd(cost, params, lr=0.05):
    # Symbolically differentiate the cost with respect to each parameter.
    grads = T.grad(cost=cost, wrt=params)
    updates = []
    for p, g in zip(params, grads):
        updates.append([p, p - g * lr])  # plain gradient-descent step
    return updates

updates = sgd(cost, params)
```

Biases can generally be initialized to zero. The rule for setting the weights is to keep them close to zero without being too small.
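As a sketch of that rule, here is one common heuristic in plain NumPy (the fan-in-scaled uniform range is an assumed Glorot/Xavier-style scheme, not prescribed by the answer above):

```python
import numpy as np

def init_layer(n_in, n_out, rng=np.random.default_rng(0)):
    # Glorot/Xavier-style uniform initialization (one common heuristic):
    # weights close to zero, but not so small that signals vanish.
    limit = np.sqrt(6.0 / (n_in + n_out))
    weights = rng.uniform(-limit, limit, size=(n_in, n_out))
    bias = np.zeros(n_out)  # biases can safely start at zero
    return weights, bias

w, b = init_layer(784, 128)
```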

**Q3. What’s the difference between a feed-forward and a backpropagation neural network?**

A Feed-Forward Neural Network is a Neural Network architecture in which the connections are "fed forward", i.e. do not form cycles. The term "feed-forward" also describes how an input travels from the input layer to the hidden layer(s) and from there to the output layer.

Backpropagation is a training algorithm consisting of 2 steps:

Feed the values forward.

Calculate the error and propagate it back to the earlier layers.

So, to be precise, forward propagation is part of the backpropagation algorithm, but it comes before back-propagating.
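The two steps above can be sketched in plain NumPy for a one-hidden-layer network with a sigmoid activation and mean squared error (all names, shapes, and the learning rate are illustrative assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))          # 4 samples, 3 input features
y = rng.normal(size=(4, 1))          # targets
w1 = rng.normal(size=(3, 5)) * 0.1   # input -> hidden weights
w2 = rng.normal(size=(5, 1)) * 0.1   # hidden -> output weights

# Step 1: feed the values forward.
h = sigmoid(x @ w1)                  # hidden activations
y_hat = h @ w2                       # linear output
loss = 0.5 * np.mean((y_hat - y) ** 2)

# Step 2: calculate the error and propagate it back to earlier layers.
d_out = (y_hat - y) / y.size              # dLoss/dy_hat for the MSE
grad_w2 = h.T @ d_out                     # gradient w.r.t. output weights
d_hidden = (d_out @ w2.T) * h * (1 - h)   # chain rule through the sigmoid
grad_w1 = x.T @ d_hidden                  # gradient w.r.t. hidden weights

# One gradient-descent step with the propagated error lowers the loss.
lr = 0.1
w1 = w1 - lr * grad_w1
w2 = w2 - lr * grad_w2
new_loss = 0.5 * np.mean((sigmoid(x @ w1) @ w2 - y) ** 2)
```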

**Q4. What are Hyperparameters? Name a few used in any Neural Network.**

Hyperparameters are the variables that determine the network structure (e.g. the number of hidden units) and the variables that determine how the network is trained (e.g. the learning rate). Hyperparameters are set before training. Common ones include:

Number of Hidden Layers

Network Weight Initialization

Activation Function

Learning Rate

Momentum

Number of Epochs

Batch Size

**Q5. Explain the different Hyperparameters related to Network and Training.**

Network Hyperparameters

The number of Hidden Layers: Many hidden units within a layer, combined with regularization techniques, can increase accuracy. A smaller number of units may cause underfitting.

Network Weight Initialization: Ideally, it may be better to use different weight initialization schemes according to the activation function used in each layer. Mostly, a uniform distribution is used.

Activation function: Activation functions are used to introduce nonlinearity to models, which allows deep learning models to learn nonlinear prediction boundaries.

Training Hyperparameters

Learning Rate: The learning rate defines how quickly a network updates its parameters. A low learning rate slows down the learning process but converges smoothly; a larger learning rate speeds up learning but may not converge.

Momentum: Momentum helps determine the direction of the next step using knowledge of the previous steps, which helps to prevent oscillations. A typical choice of momentum is between 0.5 and 0.9.

The number of epochs: The number of epochs is the number of times the whole training data is shown to the network during training. Increase the number of epochs until the validation accuracy starts decreasing even while training accuracy is increasing (overfitting).

Batch size: The mini-batch size is the number of sub-samples given to the network after which a parameter update happens. A good default for batch size is 32; also try 64, 128, 256, and so on.
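As an illustration of where each training hyperparameter enters, here is a minimal NumPy sketch of mini-batch SGD with momentum on a linear least-squares problem (the function and variable names are assumptions for this example, not any library's API):

```python
import numpy as np

def train(X, y, w, lr=0.05, momentum=0.9, epochs=10, batch_size=32):
    # Illustrative training loop: each named argument is one of the
    # training hyperparameters discussed above.
    velocity = np.zeros_like(w)
    for epoch in range(epochs):                     # number of epochs
        for start in range(0, len(X), batch_size):  # mini-batch size
            xb = X[start:start + batch_size]
            yb = y[start:start + batch_size]
            grad = xb.T @ (xb @ w - yb) / len(xb)   # squared-error gradient
            # Momentum carries over the direction of previous steps.
            velocity = momentum * velocity - lr * grad
            w = w + velocity
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w                    # noiseless targets for the sketch
w = train(X, y, np.zeros(3))
```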

**Q6. What is Dropout?**

Dropout is a regularization technique for avoiding overfitting, thus increasing generalization power. Generally, we should use a small dropout rate of 20%-50% of neurons, with 20% providing a good starting point. A probability too low has minimal effect, and a value too high results in under-learning by the network.

Use a larger network. You are likely to get better performance when dropout is used on a larger network, giving the model more opportunity to learn independent representations.
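A minimal NumPy sketch of one common formulation, inverted dropout, is shown below (assumed here for illustration; real frameworks provide this as a built-in layer):

```python
import numpy as np

def dropout(activations, rate=0.2, training=True,
            rng=np.random.default_rng(0)):
    # Inverted dropout: randomly zero a fraction `rate` of units and
    # rescale the survivors so the expected activation is unchanged.
    # At test time the layer does nothing.
    if not training:
        return activations
    keep = rng.random(activations.shape) >= rate
    return activations * keep / (1.0 - rate)

h = np.ones((4, 10))
h_train = dropout(h, rate=0.5)       # surviving units scaled by 1/(1-0.5)
h_test = dropout(h, training=False)  # identity at test time
```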

**Q7. In training a neural network, you notice that the loss does not decrease in the first few epochs. What could be the reason?**

The reasons for this could be:

The learning rate is too low

The regularization parameter is too high

The optimization is stuck at a local minimum

**Q8. Name a few deep learning frameworks**

TensorFlow

Caffe

The Microsoft Cognitive Toolkit/CNTK

Torch/PyTorch

MXNet

Chainer

Keras

**Q9. What are Tensors?**

Tensors are the de facto standard for representing data in deep learning. They are just multidimensional arrays, which allow you to represent data with higher dimensions. In deep learning, you generally deal with high-dimensional data sets, where the dimensions refer to the different features present in the data set.
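In NumPy terms, a minimal illustration of tensors of increasing rank:

```python
import numpy as np

scalar = np.array(5.0)              # rank-0 tensor
vector = np.array([1.0, 2.0, 3.0])  # rank-1 tensor
matrix = np.zeros((2, 3))           # rank-2 tensor
# A batch of 32 RGB images of 64x64 pixels: a rank-4 tensor,
# where each axis corresponds to a feature of the data set.
images = np.zeros((32, 64, 64, 3))
```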

**Q10. List a few advantages of TensorFlow.**

It has platform flexibility

It is easily trainable on CPU as well as GPU for distributed computing.

TensorFlow has auto differentiation capabilities

It has advanced support for threads, asynchronous computation, and queues.

It is customizable and open source.