Final Exam Prep Flashcards

1
Q

What is density estimation?

A

Estimating the probability distribution that generated a dataset by fitting a probability density function to it.

2
Q

What is Occam’s razor?

A

The simplest explanation is usually right.

3
Q

Overfitting will give a low training set error but a high test set error. True or false?

A

True.

4
Q

Underfitting will give a high training set error and a high test set error. True or False?

A

True.

5
Q

Does an underfitted model have high or low bias?

A

High bias.

6
Q

What is generalization?

A

Generalization is the ability of a trained model to perform well on new, unseen data.

7
Q

What is overfitting?

A

Overfitting is when the model fits the training data too closely, capturing noise as well as the underlying pattern, so it doesn’t generalise to new data.
Avoid overfitting with cross-validation, early stopping, or pruning; training for too long may hinder generalisation.

8
Q

What is underfitting?

A

Underfitting is when the model is too simple to fit the data, so it doesn’t capture the underlying pattern.

9
Q

What is regularisation?

A
  • A technique used to penalise model complexity and so reduce overfitting.
  • Adds a penalty term to the cost function so that extra complexity (e.g. fitting outliers) is discouraged.
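A minimal sketch of the idea, with made-up data and weights: an L2 (ridge) penalty simply adds the sum of squared weights, scaled by a factor lambda, to the ordinary sum-of-squared-errors cost.

```python
# Sketch: L2 (ridge) regularisation adds a penalty on weight size
# to the ordinary sum-of-squared-errors cost. The data and weights
# here are made up for illustration.

def sse(preds, targets):
    """Ordinary sum of squared errors."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets))

def ridge_cost(preds, targets, weights, lam):
    """SSE plus lambda times the sum of squared weights."""
    return sse(preds, targets) + lam * sum(w ** 2 for w in weights)

preds, targets = [1.0, 2.0], [1.0, 1.0]
weights = [3.0, 4.0]          # large weights => large penalty
print(ridge_cost(preds, targets, weights, lam=0.0))   # 1.0 (no penalty)
print(ridge_cost(preds, targets, weights, lam=0.1))   # 1.0 + 0.1*25 = 3.5
```

Larger lambda pushes the optimiser toward smaller weights, i.e. a simpler model.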

10
Q

What is linear regression?

A

A line fitted to data points, placed so as to minimise the sum of squared errors (SSE). Used for continuous problems with independent and dependent variables.
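A minimal sketch with made-up data points, using the closed-form least-squares solution (which follows from setting the SSE derivatives to zero):

```python
# Sketch: simple linear regression fitted by minimising the SSE.
# The data points are made up and lie exactly on y = 2x + 1.

def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
            / sum((x - mean_x) ** 2 for x in xs)
    intercept = mean_y - slope * mean_x
    return slope, intercept

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]        # exactly y = 2x + 1
slope, intercept = fit_line(xs, ys)
print(slope, intercept)           # 2.0 1.0
```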

11
Q

What is logistic regression?

A

A regression model used for classification. It passes a weighted sum of the inputs through the logistic (sigmoid) function, which squashes the result into the range (0, 1); thresholding that value gives a class label of 0 or 1.
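A minimal sketch of the prediction step, with made-up weights and inputs (training the weights is a separate optimisation step not shown here):

```python
import math

# Sketch: the core of logistic regression is the sigmoid function,
# which squashes any real-valued score into (0, 1). The weights and
# input here are made up for illustration.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, x):
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    prob = sigmoid(score)
    return (1 if prob >= 0.5 else 0), prob

label, prob = predict(weights=[2.0, -1.0], bias=0.0, x=[3.0, 1.0])
print(label)   # 1  (sigmoid(5) is close to 1)
```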

12
Q

What is optimisation in machine learning?

A

Finding the parameters that minimise a cost function. A popular method is gradient descent, which searches for local minima.
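A minimal sketch of gradient descent on a toy cost function f(x) = (x - 3)², whose minimum is at x = 3:

```python
# Sketch: gradient descent repeatedly moves the parameter against
# the gradient of the cost, scaled by the learning rate.

def grad(x):
    return 2 * (x - 3)           # derivative of (x - 3)**2

x = 0.0                          # arbitrary starting point
learning_rate = 0.1
for _ in range(100):
    x -= learning_rate * grad(x)

print(round(x, 4))               # 3.0
```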

13
Q

What is the learning rate?

A

The step size that controls how much the parameters change on each gradient descent update.
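A toy illustration of why the value matters, minimising f(x) = x²: a small learning rate converges, while one that is too large makes the parameter overshoot and diverge.

```python
# Sketch: effect of the learning rate on gradient descent for
# f(x) = x**2 (gradient 2x). Values are made up for illustration.

def run(learning_rate, steps=20, x=1.0):
    for _ in range(steps):
        x -= learning_rate * 2 * x
    return x

print(run(0.1))    # shrinks toward 0 (converges)
print(run(1.1))    # grows in magnitude (diverges)
```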

14
Q

What is the curse of dimensionality?

A

The more dimensions (features) you have, the more data you need to build models that generalize well. More features doesn’t necessarily mean better classification: there is an optimum number, beyond which you can get overfitting.

15
Q

What is the bias/variance tradeoff?

A

It is hard for a model to be both complex enough and simple enough at the same time (the underfitting/overfitting balance).
Simple model = high bias and low variance
Complex model = low bias and high variance

16
Q

What is backpropagation?

A

Basically, after forward propagating, work backwards through the network adjusting all the weights and biases in proportion to their contribution to the error. You repeat this until you find weights and biases that minimise the cost function.
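A minimal sketch on a single sigmoid neuron with one weight and one bias, so the chain rule is visible in two lines (the training pair and hyperparameters are made up):

```python
import math

# Sketch: backpropagation for one sigmoid neuron and a squared-error
# cost (out - target)**2, applying the chain rule step by step.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

x, target = 1.0, 0.0
w, b = 2.0, 0.0
lr = 0.5

for _ in range(200):
    # forward pass
    z = w * x + b
    out = sigmoid(z)
    # backward pass: chain rule
    d_out = 2 * (out - target)       # d cost / d out
    d_z = d_out * out * (1 - out)    # times sigmoid derivative
    w -= lr * d_z * x                # d z / d w = x
    b -= lr * d_z                    # d z / d b = 1

print(sigmoid(w * x + b))            # small: output moves toward 0.0
```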

17
Q

What is feature/subset selection?

A

Using/Choosing only a number of features from the dataset to construct a model.

18
Q

What is feature extraction?

A

Feature extraction is a general term for methods of constructing combinations of features to provide insight into a dataset.

19
Q

What is MLE?

A

A method for estimating the parameters of a model.

It chooses the parameter values under which the observed data points are most likely (i.e. it maximises the likelihood).
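A minimal sketch for a Gaussian, with made-up data: for normal data the likelihood is maximised by the sample mean and the (biased) sample variance, which we verify numerically by scanning candidate means.

```python
import math

# Sketch: MLE for a Gaussian. The closed-form estimates are the
# sample mean and (biased) sample variance; we check numerically
# that no nearby candidate mean gives a higher log-likelihood.

data = [1.9, 2.1, 2.0, 2.2, 1.8]

def log_likelihood(mu, sigma2):
    return sum(-0.5 * math.log(2 * math.pi * sigma2)
               - (x - mu) ** 2 / (2 * sigma2) for x in data)

mu_hat = sum(data) / len(data)
var_hat = sum((x - mu_hat) ** 2 for x in data) / len(data)

candidates = [mu_hat + d / 100 for d in range(-50, 51)]
best = max(candidates, key=lambda mu: log_likelihood(mu, var_hat))
print(mu_hat, abs(best - mu_hat) < 1e-9)   # the sample mean wins
```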

20
Q

What is Bias in ML?

A

Bias is the difference between the average prediction of our model and the correct value which we are trying to predict.

21
Q

What is Variance in ML?

A

Variance is the variability of a model’s predictions for a given data point; it tells us how spread out the predictions are.

22
Q

How is Bayes Rule used in ML?

A

A probabilistic method for classifying new data based on previous examples.
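A minimal illustration with made-up priors and likelihoods: Bayes’ rule turns a class prior and a per-class likelihood of the observed feature into a posterior over classes, p(class|x) ∝ p(x|class) · p(class).

```python
# Sketch: classifying with Bayes' rule. The priors and likelihoods
# below are made-up values for a toy spam example.

priors = {"spam": 0.4, "ham": 0.6}
likelihood = {"spam": 0.7, "ham": 0.1}   # p(word appears | class)

def posterior(word_present=True):
    unnorm = {c: priors[c] * (likelihood[c] if word_present
                              else 1 - likelihood[c])
              for c in priors}
    total = sum(unnorm.values())
    return {c: p / total for c, p in unnorm.items()}

post = posterior(word_present=True)
print(post)                        # spam has the higher posterior
print(max(post, key=post.get))     # 'spam'
```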

23
Q

What are discriminant functions?

A

Basically, classifying by evaluating a function for each class and assigning the input to the class whose function gives the highest score.

24
Q

What are decision rules?

A

Basically, a lot of if-then statements for classifying.

25
Q

What is non-parametric classification?

A

Classification that doesn’t make assumptions about the underlying model. Good when you have no prior knowledge and don’t want to commit to choosing the right features.

26
Q

What is NN-classification?

A

Nearest-neighbour classification. Set a k-value and classify based on the k nearest neighbours: if the majority of them are red, then the point is classified as red.

27
Q

What’s the difference between discriminative and generative learning?

A

A discriminative model models the decision boundary between the classes and learns the conditional probability distribution p(y|x).

A generative model explicitly models the actual distribution of each class and learns the joint probability distribution p(x,y). It predicts the conditional probability with the help of Bayes’ theorem [1].

28
Q

What are kernel density estimators?

A

A smoothing function used to estimate distributions.
Histogram bins are sharp: every point in a bin is treated the same regardless of where in the bin it falls. KDEs are smooth and take each point’s exact position into account.
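A minimal sketch with a Gaussian kernel and made-up data and bandwidth: each data point contributes a smooth bump, and the estimate at x is the (normalised) sum of the bumps.

```python
import math

# Sketch: a Gaussian kernel density estimator. Nearby points raise
# the density smoothly rather than in sharp histogram steps. The
# data and bandwidth are made up for illustration.

data = [1.0, 1.2, 1.1, 3.0]
bandwidth = 0.4

def kde(x):
    norm = 1.0 / (len(data) * bandwidth * math.sqrt(2 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2)
                      for xi in data)

print(kde(1.1) > kde(3.0))   # True: density is higher near the cluster
print(kde(10.0) < 1e-6)      # True: essentially zero far from all data
```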

29
Q

What is a mixture model?

A

Clustering method that fits a combination of Gaussian distributions to unlabelled data.
  • Useful when the data has multiple peaks
  • Used when the clusters are not perfectly circular
  • Uses the EM algorithm to adjust the parameters of each Gaussian
  • Gives soft clustering, vs k-means’ hard clustering

30
Q

What are autoencoders?

A

Autoencoders are a type of neural network that attempts to output its own input, i.e. learn the identity function.

It does this in two parts, encoding and decoding. In the encoding phase it tries to compress the features, and in the decoding phase it tries to reconstruct the input from the compressed representation.

31
Q

Why are autoencoders used?

A
  • The most common use is finding a more suitable representation of the input data, which can then be used, for example, in a neural network for classification.
  • Outlier detection
32
Q

What is LDA?

A
  • Focuses on maximising separability among classes.
  • Finds a new axis and projects the data onto it in a way that maximises separation
    => Maximise the distance between the means of all classes
    => Minimise the variation within each class
33
Q

What is k-Nearest Neighbour Classification?

A
- Make a prediction by finding the most common class among the k nearest training points
 => Hyperparameters: distance metric and k (number of neighbours)
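A minimal sketch with Euclidean distance and made-up training points in two classes:

```python
import math
from collections import Counter

# Sketch: k-nearest-neighbour classification. The training points
# and labels are made up for illustration.

train = [((0.0, 0.0), "red"), ((0.1, 0.2), "red"),
         ((1.0, 1.0), "blue"), ((0.9, 1.1), "blue"),
         ((0.2, 0.1), "red")]

def classify(point, k=3):
    # sort training points by Euclidean distance, then majority-vote
    nearest = sorted(train, key=lambda item: math.dist(point, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(classify((0.1, 0.1)))   # 'red'  (all three nearest are red)
print(classify((1.0, 1.0)))   # 'blue'
```

Both hyperparameters from the card appear directly: `k` and the distance metric (here `math.dist`).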
34
Q

What is Model Selection?

A
  • Choosing which model to use by comparing performance on held-out data.
  • The test error being much greater than the training error is an indication of overfitting.
  • Simple model => poor predictive power on the test set
  • Complex model => too specific to the training set, therefore high error on the test set