Deep Learning Prep Flashcards

1
Q

Difference between AI, Machine learning and Deep Learning

A

AI is the broad field of making machines exhibit intelligent behaviour. Machine learning is a subset of AI that uses statistical methods to enable machines to improve with experience. Deep learning is a subset of ML that loosely simulates the brain and makes the computation of multi-layer neural networks feasible.

2
Q

Is deep learning better than ML?

A

Deep learning is more useful for working with high-dimensional data, i.e. when we have a large number of inputs or inputs with different types of data. For smaller, structured data sets, classical ML is often sufficient.

3
Q

What is a perceptron and how does it work?

A

Deep learning borrows the idea of neurons functioning like biological neurons. A perceptron is a linear model used for binary classification. It models a single neuron: it takes a set of inputs, each with its own weight, computes their weighted sum plus a bias, and passes the result through a step activation to produce a binary output.
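A minimal numpy sketch of that forward pass (the inputs, weights and bias here are made-up values):

import numpy as np

x = np.array([1.0, 0.5, -0.2])   # inputs
w = np.array([0.4, -0.6, 0.9])   # one weight per input
b = 0.1                          # bias
z = np.dot(w, x) + b             # weighted sum plus bias
output = 1 if z > 0 else 0       # step activation -> binary class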

4
Q

What is the role of weights and biases?

A

Weights scale each input and so decide how strongly it influences whether the neuron activates. The bias shifts the activation threshold; normally it is treated as just another weighted input, namely the weight on a constant input of 1.
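A quick numpy sketch of the "bias as another weighted input" trick (the values are made up):

import numpy as np

x = np.array([1.0, 0.5])
w = np.array([0.4, -0.6])
b = 0.1
z_explicit = np.dot(w, x) + b        # bias kept separate

x_aug = np.append(x, 1.0)            # constant input of 1
w_aug = np.append(w, b)              # bias folded in as its weight
z_folded = np.dot(w_aug, x_aug)      # same value as z_explicit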

5
Q

What are the activation functions?

A
  • Linear / Identity
  • Unit or Binary Step
  • Sigmoid or Logistic
  • Tanh
  • ReLU
  • Softmax

The activation function decides whether a neuron should be activated by computing the weighted sum of its inputs and adding the bias. Its purpose is to introduce non-linearity into the output of the neuron; without it, stacked layers would collapse into a single linear map.
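Numpy sketches of the listed functions (illustrative implementations, not a library API):

import numpy as np

def linear(z):                   # identity: passes z through unchanged
    return z

def binary_step(z):              # unit / binary step
    return np.where(z > 0, 1, 0)

def sigmoid(z):                  # logistic, squashes to (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):                     # squashes to (-1, 1)
    return np.tanh(z)

def relu(z):                     # rectified linear unit
    return np.maximum(0, z)

def softmax(z):                  # exponentiate and normalize to sum to 1
    e = np.exp(z - np.max(z))    # subtract max for numerical stability
    return e / e.sum()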

6
Q

Explain how a perceptron learns.

A

4 steps (see the sketch below):

  • Initialize the weights and the threshold
  • Provide an input and calculate the output
  • Update the weights
  • Repeat steps 2 and 3
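A minimal numpy version of those four steps, trained on an AND gate as a toy example (the learning rate and epoch count are made-up choices):

import numpy as np

def train_perceptron(X, y, lr=0.1, epochs=10):
    w = np.zeros(X.shape[1])                             # step 1: initialize weights...
    b = 0.0                                              # ...and threshold (bias)
    for _ in range(epochs):                              # step 4: repeat steps 2 and 3
        for xi, target in zip(X, y):
            output = 1 if np.dot(w, xi) + b > 0 else 0   # step 2: compute output
            error = target - output                      # step 3: update by the error
            w += lr * error * xi
            b += lr * error
    return w, b

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])                               # AND gate, linearly separable
w, b = train_perceptron(X, y)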
7
Q

What is the significance of cost or loss function?

A

A cost (or loss) function measures the accuracy of the neural network with respect to a given training sample and its expected output. It summarizes the performance of the network as a whole; in deep learning the goal is to minimize the cost function, and for that we use gradient descent.
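One common cost function is the mean squared error; a numpy sketch with made-up predictions:

import numpy as np

def mse(y_true, y_pred):                 # average squared difference
    return np.mean((y_true - y_pred) ** 2)

mse(np.array([1.0, 0.0, 1.0]),
    np.array([0.9, 0.2, 0.8]))           # ≈ 0.03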

8
Q

What is gradient descent?

A

Gradient descent is an optimization algorithm used to minimize a function by iteratively moving in the direction of steepest descent, i.e. the negative of the gradient. The update rule is w ← w − η·∇J(w), where η is the learning rate.
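A toy one-step sketch on f(w) = w**2, whose gradient is 2w:

w, lr = 3.0, 0.1
grad = 2 * w           # gradient of w**2 at w = 3 is 6
w = w - lr * grad      # 3.0 - 0.6 = 2.4, one step toward the minimum at 0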

9
Q

What are the benefits of mini-batch gradient descent?

A

It is more computationally efficient than stochastic gradient descent. It generalizes better, because the noise in mini-batch updates tends to settle in flat minima. Mini-batches also approximate the gradient of the entire training set, which helps us avoid poor local minima.
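A sketch of how mini-batches are typically drawn (the batch size of 32 is an arbitrary, common choice):

import numpy as np

def minibatches(X, y, batch_size=32):
    idx = np.random.permutation(len(X))        # shuffle once per epoch
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]  # next slice of indices
        yield X[batch], y[batch]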

10
Q

What are the steps for using a gradient descent algorithm?

A

Steps:

  • Initialize random weights and biases
  • Pass an input through the network and get values from the output layer
  • Calculate the error between the actual value and the predicted value
  • Go to each neuron which contributes to the error and change its respective values to reduce the error
  • Reiterate until you find the best weights for the network
11
Q

Create a gradient descent in Python

A
# Theano-style SGD; weights_hidden, weights_output, bias_hidden,
# bias_output and cost are assumed defined elsewhere as shared variables
import theano.tensor as T

params = [weights_hidden, weights_output, bias_hidden, bias_output]

# build the update rule p <- p - lr * grad(cost, p) for each parameter
def sgd(cost, params, lr=0.05):
    grads = T.grad(cost=cost, wrt=params)   # symbolic gradients w.r.t. each param
    updates = []
    for p, g in zip(params, grads):
        updates.append([p, p - g * lr])
    return updates

updates = sgd(cost, params)

12
Q

What are the shortcomings of a single layer perceptron?

A

A single-layer perceptron cannot classify data points that are not linearly separable (the classic example is XOR). It also cannot solve complex problems with a large number of parameters.

13
Q

What is a multi-layer perceptron?

A

A multi-layer perceptron (MLP) is a deep artificial neural network composed of more than one perceptron: an input layer to receive the signal, an output layer that makes a decision or prediction about the input, and, in between, an arbitrary number of hidden layers that are the true computational engine of the MLP. It has three kinds of nodes (a Keras sketch follows the list):

  • Input nodes
  • Hidden nodes
  • Output nodes
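A minimal MLP sketch in Keras (the layer sizes and the 8-feature input are made-up choices):

from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

model = Sequential([
    Dense(16, activation="relu", input_shape=(8,)),   # hidden layer
    Dense(1, activation="sigmoid"),                   # output layer
])
model.compile(optimizer="sgd", loss="binary_crossentropy")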
14
Q

What exactly is data normalization and why do we need it?

A

Data normalization is a very important preprocessing step, used to rescale values into a specific range to ensure better convergence during backpropagation. In general it boils down to subtracting the mean from each data point and dividing by the standard deviation, so that each feature has zero mean and unit variance.
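A per-feature standardization sketch in numpy (the data here is random, just for shape):

import numpy as np

X = np.random.rand(100, 3)                       # 100 samples, 3 features
X_norm = (X - X.mean(axis=0)) / X.std(axis=0)    # zero mean, unit variance per feature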

15
Q

Which are better, deep NNs or shallow ones?

A

At every level the network learns a new, more abstract representation of the input, so deeper networks can express more complex functions than shallow ones of comparable size. In practice, deeper networks work better, provided there is enough data and training is done carefully.

16
Q

What is weight initialization in a deep NN?

A

Weight initialization sets the starting values of the weights before training. Bad initialization can prevent a NN from learning, while a good one can speed up convergence and lead to a lower overall error. The rule of thumb is to set the weights close to zero without being too small.
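One common recipe consistent with that rule is Xavier/Glorot-style scaling; a numpy sketch with made-up layer sizes:

import numpy as np

n_in, n_out = 64, 32                                      # fan-in and fan-out of the layer
w = np.random.randn(n_in, n_out) * np.sqrt(1.0 / n_in)    # small, but scaled to fan-in
b = np.zeros(n_out)                                       # biases can start at zero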

17
Q

What is the difference between a feed forward and back propagation NN?

A

Feed forward: connections feed forward only and do not form cycles.

Back propagation: consists of two steps; the first is feeding the values forward, and the second is calculating the error and propagating it back to the earlier layers to update the weights.

18
Q

What are hyperparameters? Name a few in any NN

A

Hyperparameters are the variables which determine the network structure and the variables which determine how the network is trained (e.g. learning rate, number of epochs, batch size).

19
Q

Explain the different hyperparameters related to the network and to training.

A

Network hyperparameters:

  • Number of hidden layers: the layers between the input and the output layers
  • Network weight initialization: mostly uniform weight distributions are used
  • Activation function: used to introduce non-linearity into the model, which allows it to learn non-linear prediction boundaries. The rectifier activation function (ReLU) is generally the most popular.

Training hyperparameters:

  • Learning rate: defines how quickly the network updates its parameters. A low learning rate slows down learning but converges smoothly; a larger one speeds up learning but may not converge as smoothly. A decaying learning rate is usually preferred to get the best of both worlds.
  • Momentum: helps identify the direction of the next step using knowledge of the previous step, which dampens oscillation; typical values are 0.5-0.9 (see the sketch after this list)
  • Number of epochs: the number of times the network is shown the full training data. Too high a number can lead to overfitting.
  • Batch size: the number of samples the network processes before updating its parameters, usually 16, 32 or 64 (the exact number is somewhat arbitrary)
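A sketch of the momentum update from the list above (the learning rate, momentum and starting values are made up but typical):

def momentum_step(w, grad, velocity, lr=0.01, momentum=0.9):
    velocity = momentum * velocity - lr * grad    # remember the previous direction
    return w + velocity, velocity                 # step along the damped direction

w, v = 1.0, 0.0
w, v = momentum_step(w, grad=2.0, velocity=v)     # v = -0.02
w, v = momentum_step(w, grad=2.0, velocity=v)     # v = -0.038: same direction, bigger step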
20
Q

What is dropout?

A

Dropout is a regularization technique to avoid overfitting and thereby increase validation accuracy, i.e. the generalizing power of the network.
Generally we use a dropout value of 20%-50% of neurons. A value too low has minimal effect and a value too high results in under-learning by the network.
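Dropout as a layer in Keras (the 30% rate and layer sizes are made-up choices within the usual range):

from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Dropout

model = Sequential([
    Dense(128, activation="relu", input_shape=(20,)),
    Dropout(0.3),                 # randomly zeroes 30% of activations during training
    Dense(1, activation="sigmoid"),
])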

21
Q

While training a NN you notice that the loss does not decrease in the first few epochs. What could be the reason?

A
  • The learning rate is too low
  • The regularization parameter is too high
  • The optimizer is stuck at a local minimum
22
Q

Name a few deep learning frameworks

A
  • TensorFlow
  • PyTorch
  • Keras
  • CNTK
  • Caffe
  • Chainer
23
Q

What are tensors?

A

Tensors are the de facto representation of data in deep learning. They are multidimensional arrays that allow you to represent data of higher dimensions. In deep learning you generally deal with high-dimensional data sets, where the dimensions refer to the different features present in the data set.
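Tensors of increasing rank, sketched in numpy (the shapes are made-up examples):

import numpy as np

scalar = np.array(5.0)                        # rank 0
vector = np.array([1.0, 2.0])                 # rank 1
matrix = np.array([[1.0, 2.0], [3.0, 4.0]])   # rank 2
images = np.zeros((32, 28, 28, 3))            # rank 4: batch, height, width, channels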

24
Q

List a few advantages of Tensorflow

A
  • It has platform flexibility
  • It is easily trainable on CPU as well as GPU for distributed computing
  • TensorFlow has auto-differentiation capabilities
  • It has advanced support for threads and asynchronous computation
  • It is customizable and open source
25
Q

What is computational graph?

A

A computational graph is a series of TensorFlow operations arranged as nodes in a graph. Each node takes zero or more tensors as input and produces a tensor as output.
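A tiny example; in TensorFlow 2.x, tf.function traces Python code into such a graph (the function and values here are made up):

import tensorflow as tf

@tf.function
def f(a, b):
    c = a * b              # node: consumes two tensors, emits one
    return c + 1.0         # node: consumes one tensor, emits the result

f(tf.constant(3.0), tf.constant(4.0))   # -> 13.0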

26
Q

What is a CNN?

A

A CNN is a class of deep neural networks most commonly applied to analyzing visual imagery. Unlike a standard NN, where the input is a vector, here the input is a multi-channel image. CNNs use a variation of multilayer perceptrons designed to require minimal pre-processing.

27
Q

What are the different layers of a CNN?

A

Convolution: this layer comprises a set of independent filters that slide over the input.
ReLU: this layer is used with the convolutional layer to add non-linearity.
Pooling: its function is to progressively reduce the spatial size of the representation, reducing the number of parameters and the computation in the network.
Full connectedness: neurons in a fully connected layer have full connections to all activations in the previous layer, as in regular NNs. (A Keras sketch stacking these layers follows.)
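The four layer types stacked in Keras (the filter counts and the 28x28 grayscale input are made-up choices):

from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential([
    Conv2D(16, (3, 3), activation="relu", input_shape=(28, 28, 1)),  # convolution + ReLU
    MaxPooling2D((2, 2)),                                            # pooling
    Flatten(),
    Dense(10, activation="softmax"),                                 # fully connected
])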

28
Q

What is an RNN?

A

Recurrent NNs are a type of artificial NN designed to recognize patterns in sequences of data, such as text, genomes, handwriting, the spoken word or numerical time-series data.
RNNs are trained with the backpropagation algorithm. Because of their internal memory, RNNs can remember important things about the input they receive, which enables them to be very precise in predicting what comes next.

29
Q

What are some issues faced when training an RNN?

A

Recurrent NNs use the backpropagation algorithm for training, but it is applied at every timestep; this is commonly known as Backpropagation Through Time (BPTT). Two issues that arise in this process are vanishing gradients and exploding gradients.

30
Q

What is the vanishing gradient problem, and how is it harmful?

A

In gradient-based training, each of the neural network’s weights receives an update proportional to the partial derivative of the error function with respect to that weight. The problem is that in some cases, especially across many layers or timesteps, the gradient becomes vanishingly small, effectively preventing the weight from changing its value, so the early layers stop learning.

31
Q

What is exploding gradient?

A

Exploding gradients are a problem in which large error gradients accumulate and result in very large updates to the NN model weights during training.
Gradient descent works best when these updates are small and controlled.
When the magnitudes of the gradients accumulate, the network becomes unstable, which can cause poor predictions or even a model that reports nothing useful.

32
Q

Explain the importance of LSTM

A

Long Short-Term Memory (LSTM) is an artificial recurrent neural network architecture used in the field of deep learning.
Unlike standard feedforward NNs, an LSTM has feedback connections, which make it a general-purpose computer.
It can process not only single data points but entire sequences of data.
LSTMs are a special kind of recurrent NN capable of learning long-term dependencies, which mitigates the vanishing gradient problem of plain RNNs.
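An LSTM layer over sequences in Keras (the 100-step, 8-feature input shape is a made-up example):

from tensorflow.keras import Sequential
from tensorflow.keras.layers import LSTM, Dense

model = Sequential([
    LSTM(64, input_shape=(100, 8)),   # 100 timesteps, 8 features per step
    Dense(1),                         # e.g. predict the next value
])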

33
Q

Explain autoencoders and their uses.

A

An autoencoder is an unsupervised neural network that applies backpropagation, setting the target values to be equal to the inputs.
Autoencoders are used to reduce our inputs to a smaller latent representation.
If anyone needs the original data, they can (approximately) reconstruct it from the compressed representation.
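A minimal dense autoencoder in Keras (the 784-dimensional input and 32-unit code are made-up choices, e.g. flattened 28x28 images):

from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

autoencoder = Sequential([
    Dense(32, activation="relu", input_shape=(784,)),  # encoder: input -> code
    Dense(784, activation="sigmoid"),                  # decoder: code -> reconstruction
])
autoencoder.compile(optimizer="adam", loss="mse")
# trained with the inputs as their own targets: autoencoder.fit(X, X, ...)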

34
Q

In terms of dimensionality reduction, how does Autoencoder differ from PCA?

A

It is more efficient to learn several layers with an autoencoder than one huge transformation with PCA, and unlike PCA an autoencoder can learn non-linear transformations.
An autoencoder provides a representation at each layer as output.
It can make use of pre-trained layers from another model, applying transfer learning to enhance the encoder/decoder.

35
Q

What is Autoencoding used for?

A
  • Image coloring: converting a black-and-white image back to color
  • Feature variation: extracting only the required features of an image and generating the output with any noise removed
  • Dimensionality reduction: the reconstructed image is the same as the input but with reduced dimensions
  • Image denoising: reconstructing a clean image from a noisy input
36
Q

What are the layers of the Autoencoder?

A
  • Encoder: This part of the network compresses the input into a latent space representation
  • Code: This part of the network represents the compressed input which is fed to the decoder
  • Decoder: This layer decodes the encoded image back to the original dimension
37
Q

What is a Restricted Boltzmann Machine?

A

A Restricted Boltzmann Machine (RBM) is an undirected graphical model that has played a major role in deep learning frameworks in recent times; "restricted" means there are no connections between units within the same layer, only between the visible and hidden layers.
It is an algorithm useful for dimensionality reduction, classification, regression, collaborative filtering, feature learning and topic modelling.