Neural Networks Flashcards

1
Q

What kind of problems can neural nets solve? 👶

A

Neural nets are good at solving non-linear problems. Good examples are problems that are relatively easy for humans (because of experience, intuition, understanding, etc.) but difficult for traditional regression models: speech recognition, handwriting recognition, image identification, etc.

2
Q

How does a typical fully-connected feed-forward neural network work? ⭐️

A

In a typical fully-connected feed-forward network, each neuron receives input from every element of the previous layer, so its receptive field is the entire previous layer. Each neuron computes a weighted sum of its inputs, adds a bias, and passes the result through an activation function; the output then feeds the next layer. Fully-connected layers are usually used to represent feature vectors for input data in classification problems, but they can be expensive to train because of the number of parameters and computations involved.
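A minimal NumPy sketch of one forward pass through such a network (the layer sizes and the helper name dense_layer are purely illustrative):

```python
import numpy as np

def dense_layer(x, W, b, activation):
    # One fully-connected layer: every input element feeds every neuron.
    return activation(np.dot(W, x) + b)

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=4)                         # input feature vector with 4 features
W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)  # hidden layer with 8 neurons
W2, b2 = rng.normal(size=(1, 8)), np.zeros(1)  # output layer with 1 neuron

h = dense_layer(x, W1, b1, relu)               # hidden activations
y = dense_layer(h, W2, b2, sigmoid)            # e.g. a probability for binary classification
print(y)
```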

3
Q

Why do we need activation functions? 👶

A

The main idea of using neural networks is to learn complex nonlinear functions. If we do not put an activation function between the layers of a neural network, we are just stacking multiple linear layers on top of one another, and the whole network can only learn a linear function. The nonlinearity comes only from the activation function, which is why we need activation functions.
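A quick NumPy check of this point (all names here are illustrative): two linear layers with no activation in between collapse into a single linear layer.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=3)
W1 = rng.normal(size=(5, 3))
W2 = rng.normal(size=(2, 5))

# Two stacked linear layers with no activation in between...
out_stacked = W2 @ (W1 @ x)

# ...are exactly equivalent to a single linear layer with weight matrix W2 @ W1.
out_single = (W2 @ W1) @ x

print(np.allclose(out_stacked, out_single))   # True: stacking adds no expressive power
```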

4
Q

What are the problems with sigmoid as an activation function? ‍⭐️

A

The derivative of the sigmoid function is almost zero for large positive or negative inputs. This causes the vanishing gradient problem: during backpropagation the network will not learn (or will learn extremely slowly). One possible way to solve this problem is to use the ReLU activation function.
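A small NumPy sketch of this saturation effect (illustrative only): the sigmoid gradient peaks at 0.25 and falls towards zero as the input grows in magnitude.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)              # at most 0.25, reached at z = 0

for z in [0.0, 2.0, 5.0, 10.0, -10.0]:
    print(z, sigmoid_grad(z))         # the gradient shrinks towards 0 as |z| grows
```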

5
Q

What is ReLU? How is it better than sigmoid or tanh? ‍⭐️

A

ReLU is an abbreviation for Rectified Linear Unit. It is an activation function defined as f(x) = max(0, x): it outputs 0 for all negative inputs and x for all positive inputs. ReLU is very cheap to compute, and while the sigmoid and tanh activation functions saturate for large inputs, ReLU is unbounded above and its gradient is 1 for any positive input, which helps address the vanishing gradient problem.
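An illustrative NumPy sketch of ReLU and its gradient (the helper names are made up for this example):

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def relu_grad(z):
    return (z > 0).astype(float)   # gradient is exactly 1 for every positive input

z = np.array([-10.0, -1.0, 0.5, 10.0, 100.0])
print(relu(z))                     # [0.  0.  0.5 10. 100.] -- no upper saturation
print(relu_grad(z))                # [0. 0. 1. 1. 1.]       -- gradient does not vanish for z > 0
```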

6
Q

How can we initialize the weights of a neural network? ⭐️

A

Proper initialization of the weight matrices in a neural network is very important. Broadly, there are two ways to initialize them (see the sketch after this list).

Initializing weights with zeros. Setting all weights to zero makes the network no better than a linear model. Note that setting the biases to 0 does not cause any trouble: the non-zero weights take care of breaking the symmetry, so even with zero biases the values in every neuron are still different.
Initializing weights randomly. Assigning random values to the weights is better than setting them to 0, but the scale of the random values matters.
a) If the weights are initialized with very large values, the term np.dot(W,X)+b becomes very large, and an activation function like sigmoid() maps it close to 1, where the gradient changes very slowly and learning takes a long time.
b) If the weights are initialized with very low (large negative) values, the activation gets mapped close to 0, and the situation is the same as above. This problem is often referred to as the vanishing gradient.
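A minimal sketch of the cases above, plus one common scaled scheme (He initialization, mentioned here as a typical example rather than something stated on the card):

```python
import numpy as np

rng = np.random.default_rng(0)
fan_in, fan_out = 256, 128

# Zero initialization: every neuron in the layer computes the same output and
# receives the same gradient, so the symmetry between neurons is never broken.
W_zero = np.zeros((fan_out, fan_in))

# Random initialization with a large scale pushes sigmoid inputs far from 0,
# where the sigmoid saturates and its gradient is tiny (slow learning).
W_large = rng.normal(scale=10.0, size=(fan_out, fan_in))

# A common compromise (an assumption here, not from the card) is to scale the
# random values by the layer size, e.g. He initialization for ReLU layers.
W_he = rng.normal(scale=np.sqrt(2.0 / fan_in), size=(fan_out, fan_in))

b = np.zeros(fan_out)   # zero biases are fine; the random weights break the symmetry
```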

7
Q

What if we set all the weights of a neural network to 0? ‍⭐️

A

If all the weights of a neural network are set to zero, the output of every neuron is the same (W*x = 0), so the gradients backpropagated to each connection in a layer are also the same. All the weights in a layer therefore learn the same thing: the symmetry between neurons is never broken, and the network cannot learn useful distinct features.
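A tiny NumPy illustration of this symmetry problem (layer sizes chosen arbitrarily):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([1.0, 2.0, 3.0])
W1, b1 = np.zeros((4, 3)), np.zeros(4)   # every weight set to zero
h = sigmoid(W1 @ x + b1)
print(h)                                  # [0.5 0.5 0.5 0.5] -- every neuron is identical

# Because every neuron in the layer computes exactly the same function of x,
# backpropagation assigns exactly the same gradient to every row of W1, and
# gradient descent updates all rows identically: the neurons stay clones of
# each other, so the layer never learns distinct features.
```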

8
Q

What regularization techniques for neural nets do you know? ‍⭐️

A

L1 Regularization - Defined as the sum of the absolute values of the individual parameters, added to the loss. The L1 penalty drives a subset of the weights to exactly zero, suggesting that the corresponding features may safely be discarded.
L2 Regularization - Defined as the sum of the squares of the individual parameters, usually scaled by a regularization hyperparameter alpha. It results in weight decay: large weights are penalized and shrink toward zero.
Data Augmentation - Creating additional synthetic training examples (for instance by transforming existing ones) and adding them to the training set.
Dropout - One of the most effective regularization techniques for neural nets. A random subset of the nodes in each layer is deactivated in each forward pass, so the network effectively trains on a different set of nodes in every iteration (see the sketch after this list).
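An illustrative NumPy sketch of two of these techniques, an L2 penalty and inverted dropout (the function names and the drop probability are assumptions for this example):

```python
import numpy as np

rng = np.random.default_rng(0)

def l2_penalty(weights, alpha):
    # L2 term added to the loss: alpha * sum of squared parameters (weight decay).
    return alpha * sum(np.sum(W ** 2) for W in weights)

def dropout(h, p_drop, training=True):
    # Inverted dropout: zero a random fraction p_drop of the activations during
    # training and rescale the rest so the expected activation stays the same.
    if not training:
        return h
    mask = rng.random(h.shape) >= p_drop
    return h * mask / (1.0 - p_drop)

h = rng.normal(size=8)                        # activations from some hidden layer
print(dropout(h, p_drop=0.5))                 # roughly half the units are zeroed each pass
print(l2_penalty([rng.normal(size=(4, 3))], alpha=1e-4))
```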
