Deep Learning Prep Flashcards

1
Q

Difference between AI, Machine learning and Deep Learning

A

AI is the broad field of making machines exhibit intelligent behaviour. Machine learning is a subset of AI that uses statistical methods to enable machines to improve with experience. Deep learning is a subset of ML that loosely simulates the brain and makes the computation of multi-layer neural networks feasible.

2
Q

Is deep learning better than ML?

A

Deep learning is more useful for working with high-dimensional data, i.e. when we have a large number of inputs or inputs with different types of data. For smaller, structured data sets, classical ML is often sufficient.

3
Q

What is a perceptron and how does it work?

A

Deep learning borrows the idea of neurons functioning like biological neurons. A perceptron is a linear model used for binary classification. It models a single neuron: it takes a set of inputs, each with its own weight, computes their weighted sum plus a bias, and passes the result through a step activation to produce a binary output.
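A minimal numpy sketch of that forward pass (the inputs, weights and bias here are made-up values):

import numpy as np

x = np.array([1.0, 0.5, -0.2])   # inputs
w = np.array([0.4, -0.6, 0.9])   # one weight per input
b = 0.1                          # bias
z = np.dot(w, x) + b             # weighted sum plus bias
output = 1 if z > 0 else 0       # step activation -> binary class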

4
Q

What is the role of weights and biases?

A

Weights scale each input and so decide how strongly it influences whether the neuron activates. The bias shifts the activation threshold; normally it is treated as just another weighted input, namely the weight on a constant input of 1.
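A quick numpy sketch of the "bias as another weighted input" trick (the values are made up):

import numpy as np

x = np.array([1.0, 0.5])
w = np.array([0.4, -0.6])
b = 0.1
z_explicit = np.dot(w, x) + b        # bias kept separate

x_aug = np.append(x, 1.0)            # constant input of 1
w_aug = np.append(w, b)              # bias folded in as its weight
z_folded = np.dot(w_aug, x_aug)      # same value as z_explicit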

5
Q

What are the activation functions?

A
  • Linear / Identity
  • Unit or Binary Step
  • Sigmoid or Logistic
  • Tanh
  • ReLU
  • Softmax

The activation function decides whether a neuron should be activated by computing the weighted sum of its inputs and adding the bias. Its purpose is to introduce non-linearity into the output of the neuron; without it, stacked layers would collapse into a single linear map.
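Numpy sketches of the listed functions (illustrative implementations, not a library API):

import numpy as np

def linear(z):                   # identity: passes z through unchanged
    return z

def binary_step(z):              # unit / binary step
    return np.where(z > 0, 1, 0)

def sigmoid(z):                  # logistic, squashes to (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):                     # squashes to (-1, 1)
    return np.tanh(z)

def relu(z):                     # rectified linear unit
    return np.maximum(0, z)

def softmax(z):                  # exponentiate and normalize to sum to 1
    e = np.exp(z - np.max(z))    # subtract max for numerical stability
    return e / e.sum()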

6
Q

Explain how a perceptron learns.

A

4 steps (see the sketch below):

  • Initialize the weights and the threshold
  • Provide an input and calculate the output
  • Update the weights
  • Repeat steps 2 and 3
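A minimal numpy version of those four steps, trained on an AND gate as a toy example (the learning rate and epoch count are made-up choices):

import numpy as np

def train_perceptron(X, y, lr=0.1, epochs=10):
    w = np.zeros(X.shape[1])                             # step 1: initialize weights...
    b = 0.0                                              # ...and threshold (bias)
    for _ in range(epochs):                              # step 4: repeat steps 2 and 3
        for xi, target in zip(X, y):
            output = 1 if np.dot(w, xi) + b > 0 else 0   # step 2: compute output
            error = target - output                      # step 3: update by the error
            w += lr * error * xi
            b += lr * error
    return w, b

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])                               # AND gate, linearly separable
w, b = train_perceptron(X, y)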
7
Q

What is the significance of cost or loss function?

A

A cost (or loss) function measures the accuracy of the neural network with respect to a given training sample and its expected output. It summarizes the performance of the network as a whole; in deep learning the goal is to minimize the cost function, and for that we use gradient descent.
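One common cost function is the mean squared error; a numpy sketch with made-up predictions:

import numpy as np

def mse(y_true, y_pred):                 # average squared difference
    return np.mean((y_true - y_pred) ** 2)

mse(np.array([1.0, 0.0, 1.0]),
    np.array([0.9, 0.2, 0.8]))           # ≈ 0.03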

8
Q

What is gradient descent?

A

Gradient descent is an optimization algorithm used to minimize a function by iteratively moving in the direction of steepest descent, i.e. the negative of the gradient. The update rule is w ← w − η·∇J(w), where η is the learning rate.
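A toy one-step sketch on f(w) = w**2, whose gradient is 2w:

w, lr = 3.0, 0.1
grad = 2 * w           # gradient of w**2 at w = 3 is 6
w = w - lr * grad      # 3.0 - 0.6 = 2.4, one step toward the minimum at 0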

9
Q

What are the benefits of mini-batch gradient descent?

A

It is more computationally efficient than stochastic gradient descent. It generalizes better, because the noise in mini-batch updates tends to settle in flat minima. Mini-batches also approximate the gradient of the entire training set, which helps us avoid poor local minima.
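A sketch of how mini-batches are typically drawn (the batch size of 32 is an arbitrary, common choice):

import numpy as np

def minibatches(X, y, batch_size=32):
    idx = np.random.permutation(len(X))        # shuffle once per epoch
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]  # next slice of indices
        yield X[batch], y[batch]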

10
Q

What are the steps for using a gradient descent algorithm?

A

Steps:

  • Initialize random weights and biases
  • Pass an input through the network and get values from the output layer
  • Calculate the error between the actual value and the predicted value
  • Go to each neuron which contributes to the error and change its respective values to reduce the error
  • Reiterate until you find the best weights for the network
11
Q

Create a gradient descent in Python

A
# Theano-style SGD; weights_hidden, weights_output, bias_hidden,
# bias_output and cost are assumed defined elsewhere as shared variables
import theano.tensor as T

params = [weights_hidden, weights_output, bias_hidden, bias_output]

# build the update rule p <- p - lr * grad(cost, p) for each parameter
def sgd(cost, params, lr=0.05):
    grads = T.grad(cost=cost, wrt=params)   # symbolic gradients w.r.t. each param
    updates = []
    for p, g in zip(params, grads):
        updates.append([p, p - g * lr])
    return updates

updates = sgd(cost, params)

12
Q

What are the shortcomings of a single layer perceptron?

A

A single-layer perceptron cannot classify data points that are not linearly separable (the classic example is XOR). It also cannot solve complex problems with a large number of parameters.

13
Q

What is a multi-layer perceptron?

A

A multi-layer perceptron (MLP) is a deep artificial neural network composed of more than one perceptron: an input layer to receive the signal, an output layer that makes a decision or prediction about the input, and, in between, an arbitrary number of hidden layers that are the true computational engine of the MLP. It has three kinds of nodes (a Keras sketch follows the list):

  • Input nodes
  • Hidden nodes
  • Output nodes
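A minimal MLP sketch in Keras (the layer sizes and the 8-feature input are made-up choices):

from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

model = Sequential([
    Dense(16, activation="relu", input_shape=(8,)),   # hidden layer
    Dense(1, activation="sigmoid"),                   # output layer
])
model.compile(optimizer="sgd", loss="binary_crossentropy")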
14
Q

What exactly is data normalization and why do we need it?

A

Data normalization is a very important preprocessing step, used to rescale values into a specific range to ensure better convergence during backpropagation. In general it boils down to subtracting the mean from each data point and dividing by the standard deviation, so that each feature has zero mean and unit variance.
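A per-feature standardization sketch in numpy (the data here is random, just for shape):

import numpy as np

X = np.random.rand(100, 3)                       # 100 samples, 3 features
X_norm = (X - X.mean(axis=0)) / X.std(axis=0)    # zero mean, unit variance per feature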

15
Q

Which are better, deep NNs or shallow ones?

A

At every level the network learns a new, more abstract representation of the input, so deeper networks can express more complex functions than shallow ones of comparable size. In practice, deeper networks work better, provided there is enough data and training is done carefully.

16
Q

What is weight initialization in a deep NN?

A

Weight initialization sets the starting values of the weights before training. Bad initialization can prevent a NN from learning, while a good one can speed up convergence and lead to a lower overall error. The rule of thumb is to set the weights close to zero without being too small.
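One common recipe consistent with that rule is Xavier/Glorot-style scaling; a numpy sketch with made-up layer sizes:

import numpy as np

n_in, n_out = 64, 32                                      # fan-in and fan-out of the layer
w = np.random.randn(n_in, n_out) * np.sqrt(1.0 / n_in)    # small, but scaled to fan-in
b = np.zeros(n_out)                                       # biases can start at zero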

17
Q

What is the difference between a feed forward and back propagation NN?

A

Feed forward: connections feed forward only and do not form cycles.

Back propagation: consists of two steps; the first is feeding the values forward, and the second is calculating the error and propagating it back to the earlier layers to update the weights.

18
Q

What are hyperparameters? Name a few in any NN

A

Hyperparameters are the variables which determine the network structure and the variables which determine how the network is trained (e.g. learning rate, number of epochs, batch size).

19
Q

Explain the different hyperparameters related to the network and to training.

A

Network hyperparameters:

  • Number of hidden layers: the layers between the input and the output layers
  • Network weight initialization: mostly uniform weight distributions are used
  • Activation function: used to introduce non-linearity into the model, which allows it to learn non-linear prediction boundaries. The rectifier activation function (ReLU) is generally the most popular.

Training hyperparameters:

  • Learning rate: defines how quickly the network updates its parameters. A low learning rate slows down learning but converges smoothly; a larger one speeds up learning but may not converge as smoothly. A decaying learning rate is usually preferred to get the best of both worlds.
  • Momentum: helps identify the direction of the next step using knowledge of the previous step, which dampens oscillation; typical values are 0.5-0.9 (see the sketch after this list)
  • Number of epochs: the number of times the network is shown the full training data. Too high a number can lead to overfitting.
  • Batch size: the number of samples the network processes before updating its parameters, usually 16, 32 or 64 (the exact number is somewhat arbitrary)
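A sketch of the momentum update from the list above (the learning rate, momentum and starting values are made up but typical):

def momentum_step(w, grad, velocity, lr=0.01, momentum=0.9):
    velocity = momentum * velocity - lr * grad    # remember the previous direction
    return w + velocity, velocity                 # step along the damped direction

w, v = 1.0, 0.0
w, v = momentum_step(w, grad=2.0, velocity=v)     # v = -0.02
w, v = momentum_step(w, grad=2.0, velocity=v)     # v = -0.038: same direction, bigger step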
20
Q

What is dropout?

A

Dropout is a regularization technique to avoid overfitting and thereby increase validation accuracy, i.e. the generalizing power of the network.
Generally we use a dropout value of 20%-50% of neurons. A value too low has minimal effect and a value too high results in under-learning by the network.
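Dropout as a layer in Keras (the 30% rate and layer sizes are made-up choices within the usual range):

from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Dropout

model = Sequential([
    Dense(128, activation="relu", input_shape=(20,)),
    Dropout(0.3),                 # randomly zeroes 30% of activations during training
    Dense(1, activation="sigmoid"),
])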

21
Q

While training a NN you notice that the loss does not decrease in the first few epochs. What could be the reason?

A
  • The learning rate is too low
  • The regularization parameter is too high
  • The optimizer is stuck at a local minimum
22
Q

Name a few deep learning frameworks

A
  • TensorFlow
  • PyTorch
  • Keras
  • CNTK
  • Caffe
  • Chainer
23
Q

What are tensors?

A

Tensors are the de facto representation of data in deep learning. They are multidimensional arrays that allow you to represent data of higher dimensions. In deep learning you generally deal with high-dimensional data sets, where the dimensions refer to the different features present in the data set.
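Tensors of increasing rank, sketched in numpy (the shapes are made-up examples):

import numpy as np

scalar = np.array(5.0)                        # rank 0
vector = np.array([1.0, 2.0])                 # rank 1
matrix = np.array([[1.0, 2.0], [3.0, 4.0]])   # rank 2
images = np.zeros((32, 28, 28, 3))            # rank 4: batch, height, width, channels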

24
Q

List a few advantages of Tensorflow

A
  • It has platform flexibility
  • It is easily trainable on CPU as well as GPU for distributed computing
  • TensorFlow has auto-differentiation capabilities
  • It has advanced support for threads and asynchronous computation
  • It is customizable and open source
25
Q

What is computational graph?

A

A computational graph is a series of TensorFlow operations arranged as nodes in a graph. Each node takes zero or more tensors as input and produces a tensor as output.
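A tiny example; in TensorFlow 2.x, tf.function traces Python code into such a graph (the function and values here are made up):

import tensorflow as tf

@tf.function
def f(a, b):
    c = a * b              # node: consumes two tensors, emits one
    return c + 1.0         # node: consumes one tensor, emits the result

f(tf.constant(3.0), tf.constant(4.0))   # -> 13.0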

26
Q

What is a CNN?

A

A CNN is a class of deep neural networks most commonly applied to analyzing visual imagery. Unlike a standard NN, where the input is a vector, here the input is a multi-channel image. CNNs use a variation of multilayer perceptrons designed to require minimal pre-processing.

27
Q

What are the different layers of a CNN?

A

Convolution: this layer comprises a set of independent filters that slide over the input.
ReLU: this layer is used with the convolutional layer to add non-linearity.
Pooling: its function is to progressively reduce the spatial size of the representation, reducing the number of parameters and the computation in the network.
Full connectedness: neurons in a fully connected layer have full connections to all activations in the previous layer, as in regular NNs. (A Keras sketch stacking these layers follows.)
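The four layer types stacked in Keras (the filter counts and the 28x28 grayscale input are made-up choices):

from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential([
    Conv2D(16, (3, 3), activation="relu", input_shape=(28, 28, 1)),  # convolution + ReLU
    MaxPooling2D((2, 2)),                                            # pooling
    Flatten(),
    Dense(10, activation="softmax"),                                 # fully connected
])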

28
Q

What is an RNN?

A

Recurrent NNs are a type of artificial NN designed to recognize patterns in sequences of data, such as text, genomes, handwriting, the spoken word or numerical time-series data.
RNNs are trained with the backpropagation algorithm. Because of their internal memory, RNNs can remember important things about the input they receive, which enables them to be very precise in predicting what comes next.

29
Q

What are some issues faced when training an RNN?

A

Recurrent NNs use the backpropagation algorithm for training, but it is applied at every timestep; this is commonly known as Backpropagation Through Time (BPTT). Two issues that arise in this process are vanishing gradients and exploding gradients.

30
Q

What is the vanishing gradient problem, and how is it harmful?

A

In gradient-based training, each of the neural network’s weights receives an update proportional to the partial derivative of the error function with respect to that weight. The problem is that in some cases, especially across many layers or timesteps, the gradient becomes vanishingly small, effectively preventing the weight from changing its value, so the early layers stop learning.

31
Q

What is exploding gradient?

A

Exploding gradients are a problem in which large error gradients accumulate and result in very large updates to the NN model weights during training.
Gradient descent works best when these updates are small and controlled.
When the magnitudes of the gradients accumulate, the network becomes unstable, which can cause poor predictions or even a model that reports nothing useful.

32
Q

Explain the importance of LSTM

A

Long Short-Term Memory (LSTM) is an artificial recurrent neural network architecture used in the field of deep learning.
Unlike standard feedforward NNs, an LSTM has feedback connections, which make it a general-purpose computer.
It can process not only single data points but entire sequences of data.
LSTMs are a special kind of recurrent NN capable of learning long-term dependencies, which mitigates the vanishing gradient problem of plain RNNs.
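An LSTM layer over sequences in Keras (the 100-step, 8-feature input shape is a made-up example):

from tensorflow.keras import Sequential
from tensorflow.keras.layers import LSTM, Dense

model = Sequential([
    LSTM(64, input_shape=(100, 8)),   # 100 timesteps, 8 features per step
    Dense(1),                         # e.g. predict the next value
])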

33
Q

Explain autoencoders and their uses.

A

An autoencoder is an unsupervised neural network that applies backpropagation, setting the target values to be equal to the inputs.
Autoencoders are used to reduce our inputs to a smaller latent representation.
If anyone needs the original data, they can (approximately) reconstruct it from the compressed representation.
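A minimal dense autoencoder in Keras (the 784-dimensional input and 32-unit code are made-up choices, e.g. flattened 28x28 images):

from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

autoencoder = Sequential([
    Dense(32, activation="relu", input_shape=(784,)),  # encoder: input -> code
    Dense(784, activation="sigmoid"),                  # decoder: code -> reconstruction
])
autoencoder.compile(optimizer="adam", loss="mse")
# trained with the inputs as their own targets: autoencoder.fit(X, X, ...)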

34
Q

In terms of dimensionality reduction, how does Autoencoder differ from PCA?

A

It is more efficient to learn several layers with an autoencoder than one huge transformation with PCA, and unlike PCA an autoencoder can learn non-linear transformations.
An autoencoder provides a representation at each layer as output.
It can make use of pre-trained layers from another model, applying transfer learning to enhance the encoder/decoder.

35
Q

What is Autoencoding used for?

A
  • Image coloring: converting a black-and-white image back to color
  • Feature variation: extracting only the required features of an image and generating the output with any noise removed
  • Dimensionality reduction: the reconstructed image is the same as the input but with reduced dimensions
  • Image denoising: reconstructing a clean image from a noisy input
36
Q

What are the layers of the Autoencoder?

A
  • Encoder: This part of the network compresses the input into a latent space representation
  • Code: This part of the network represents the compressed input which is fed to the decoder
  • Decoder: This layer decodes the encoded image back to the original dimension
37
Q

What is a Restricted Boltzmann Machine?

A

A Restricted Boltzmann Machine (RBM) is an undirected graphical model that has played a major role in deep learning frameworks in recent times; "restricted" means there are no connections between units within the same layer, only between the visible and hidden layers.
It is an algorithm useful for dimensionality reduction, classification, regression, collaborative filtering, feature learning and topic modelling.