Machine Learning Flashcards

1
Q

What are CNNs good for?

A

Image recognition: CNNs can identify and extract local features from images for classification, and they work well on any data with spatial structure. CNN architectures do typically still incorporate fully connected layers at the end for the final classification.

2
Q

What are FCNNs good for?

A

Fully connected neural networks are good for classification and other general-purpose tasks (they are structure-agnostic), but they are inefficient for images, since treating every pixel as an independent input produces an enormous number of parameters.

3
Q

What happens if you add additional layers to a neural network?

A

You get more (and more abstract) feature extraction up to a point, beyond which the extra capacity leads to overfitting.

4
Q

How to tell if you are overfitting the training data?

A

You have good performance on the training data but noticeably worse performance on held-out evaluation data.

5
Q

How to improve performance when overfitting?

A

Increase regularization, add more training examples, or reduce the number of features used.

6
Q

What is an epoch?

A

An epoch is one complete pass of the training data through the model. In each subsequent epoch, the same data is passed through again with the updated weights in order to further improve performance.
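
A minimal sketch of how epochs drive a training loop: plain NumPy gradient descent on a toy one-weight model (all names here are illustrative):

```python
import numpy as np

# Toy data: y = 3x plus noise
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 3 * x + rng.normal(scale=0.1, size=100)

w, lr = 0.0, 0.1
for epoch in range(20):                  # each epoch = one full pass over the data
    grad = 2 * np.mean((w * x - y) * x)  # gradient of mean squared error w.r.t. w
    w -= lr * grad                       # the updated weight carries into the next epoch
print(round(w, 3))                       # approaches 3.0
```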

7
Q

What is batch normalization?

A

Batch normalization is a technique intended to solve the problem of internal covariate shift: the distribution of each layer's inputs changes every time the weights are updated, so training takes much longer because each layer is “chasing a moving target”. In batch normalization, each input variable to a layer is normalized to a fixed mean and standard deviation (computed over the mini-batch), so that the layer sees roughly the same distribution throughout training.
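
A minimal NumPy sketch of the normalization step, following the standard formulation with learnable scale (gamma) and shift (beta) parameters:

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the mini-batch, then rescale and shift."""
    mean = x.mean(axis=0)                    # per-feature mean over the batch
    var = x.var(axis=0)                      # per-feature variance over the batch
    x_hat = (x - mean) / np.sqrt(var + eps)  # zero mean, unit variance
    return gamma * x_hat + beta              # learnable scale and shift

x = np.random.randn(32, 4)                   # batch of 32 examples, 4 features
out = batch_norm(x, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=0), out.std(axis=0))     # ~0 and ~1 per feature
```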

8
Q

Pros and cons of batch normalization?

A

Pros: mostly good. It leads to faster convergence, decreases the importance of the initial weights, and requires less data for generalization. Cons: it is unreliable with sufficiently small batch sizes (the batch estimates of the mean and variance become inaccurate), and it makes test conditions differ from training conditions (a single test example is effectively a batch of size 1).

9
Q

What is regularization?

A

Regularization is a method for reducing overfitting. It is a regression technique that simplifies a model by adding a penalty term on the coefficients to the loss, shrinking the coefficient estimates toward zero (as in ridge or lasso regression).
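
As a sketch, an L2 (ridge) penalty added to a least-squares loss might look like this in NumPy (the function and the lam parameter are illustrative):

```python
import numpy as np

def ridge_loss(w, X, y, lam=0.1):
    residuals = X @ w - y
    # ordinary least-squares term plus an L2 penalty that shrinks w toward zero
    return np.sum(residuals ** 2) + lam * np.sum(w ** 2)

# An L1 (lasso) penalty would instead use lam * np.sum(np.abs(w))
```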

10
Q

What does LSTM stand for, what does biLSTM mean, and what are these networks good for?

A

Long short-term memory and bi-directional LSTM, respectively. LSTMs are used primarily for sequences of data with dependencies between neighboring entries. A bi-directional LSTM looks at both forward and backward relations (essentially it adds an additional LSTM layer running in the opposite direction). They are most useful for natural language processing and other speech tasks.
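
As one concrete illustration, a bidirectional LSTM layer in PyTorch (assuming torch is available; the sizes are arbitrary):

```python
import torch
import torch.nn as nn

# One bidirectional LSTM: a forward and a backward pass over each sequence
lstm = nn.LSTM(input_size=16, hidden_size=32, batch_first=True, bidirectional=True)

x = torch.randn(8, 20, 16)  # batch of 8 sequences, 20 steps, 16 features per step
out, (h, c) = lstm(x)
print(out.shape)            # torch.Size([8, 20, 64]): forward + backward hidden states
```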

11
Q

What happens in cross-validation?

A

Cross-validation splits the data into k subsets (folds), trains on k-1 of them, and tests on the remaining one, rotating so that each fold serves as the test set exactly once; the k scores are then averaged.
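
A sketch of k-fold cross-validation, assuming scikit-learn is available (the model and data are placeholders):

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LogisticRegression

X = np.random.randn(100, 5)
y = (X[:, 0] > 0).astype(int)

kf = KFold(n_splits=5, shuffle=True, random_state=0)
scores = []
for train_idx, test_idx in kf.split(X):  # each fold takes a turn as the test set
    model = LogisticRegression().fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[test_idx], y[test_idx]))
print(np.mean(scores))                   # average accuracy over the k folds
```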

12
Q

Bias vs variance in ML?

A

Bias = error from the simplifying assumptions made to more easily approximate the target function (like a CNN's spatial assumptions for images).
Variance = the amount the learned target function would change given new training data.
Too much variance: overfitting. Too much bias: underfitting. Try to find the trade-off.

13
Q

What is an activation function?

A

An activation function takes a neuron's input and maps it to a corresponding output, typically squashing it into a fixed range such as [0, 1] (sigmoid) or [-1, 1] (tanh).

14
Q

Why are activation functions nonlinear?

A

The decision boundary needs to be nonlinear to fully account for the non-linear combinations of weights and inputs needed in classification (which is the case for most problems). If the network were linear, multiple hidden layers would simply collapse into one.
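
The collapse is easy to demonstrate: two stacked linear layers without activations compose into a single linear map, as in this NumPy sketch:

```python
import numpy as np

W1 = np.random.randn(4, 8)   # "hidden layer" weights
W2 = np.random.randn(8, 3)   # "output layer" weights
x = np.random.randn(4)

two_layers = (x @ W1) @ W2   # linear layer followed by linear layer
one_layer = x @ (W1 @ W2)    # the equivalent single linear layer
print(np.allclose(two_layers, one_layer))  # True: the extra layer added nothing
```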

15
Q

When to use ReLU and when to use softmax?

A

Softmax is primarily used as the output layer of a neural network, especially for multi-class classification, because it turns raw scores into a probability distribution over the classes. ReLU is better for hidden layers and is very computationally cheap (tanh and sigmoid are more expensive to compute).
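
Both are one-liners in NumPy; this sketch includes the usual max-subtraction trick for numerical stability in softmax:

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)    # cheap: just a threshold at zero

def softmax(z):
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()         # outputs are positive and sum to 1

logits = np.array([2.0, 1.0, 0.1])
print(softmax(logits))         # class probabilities for a multi-class output
```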

16
Q

What is the least squares method?

A

The least squares method is a regression technique for approximating a solution to an overdetermined system of equations (more equations than unknowns). It finds the optimal parameters by minimizing the sum of squared residuals, i.e. it uses a quadratic loss function.
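
A sketch using NumPy's built-in solver on a toy overdetermined system:

```python
import numpy as np

# 100 noisy observations of y = 2x + 1, but only two unknowns: slope and intercept
x = np.linspace(0, 1, 100)
y = 2 * x + 1 + np.random.normal(scale=0.05, size=100)

A = np.column_stack([x, np.ones_like(x)])       # design matrix [x, 1]
coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)  # minimizes ||A @ coeffs - y||^2
print(coeffs)                                   # approximately [2.0, 1.0]
```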

17
Q

What does a pooling layer do?

A

A pooling layer reduces the dimensions of a hidden layer by combining the outputs of small clusters of neurons in the previous layer into a single value (e.g. their maximum or average) in the next layer.
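
A minimal NumPy sketch of 2x2 max pooling with stride 2 on a single-channel feature map:

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling, stride 2: each output value is the max of a 2x2 block."""
    h, w = x.shape
    return x[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fmap = np.arange(16).reshape(4, 4)
print(max_pool_2x2(fmap))  # 2x2 output: both spatial dimensions halved
```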