Final Exam Prep Flashcards

1
Q

What is density estimation?

A

Estimating the probability distribution that generated a dataset by fitting a probability density function to it.

2
Q

What is Occam’s razor?

A

The simplest explanation is usually right.

3
Q

Overfitting will give a low training set error but a high test set error. True or false?

A

True.

4
Q

Underfitting will give a high training set error and a high test set error. True or False?

A

True.

5
Q

Does an underfitted model have high or low bias?

A

High bias.

6
Q

What is generalization?

A

Generalization is the ability of a trained model to perform well on new, unseen data.

7
Q

What is overfitting?

A

Overfitting is when the model fits the training data too closely, capturing noise as well as the underlying pattern, so it doesn’t generalise to new data.
Avoid overfitting with cross-validation, early stopping, or pruning; training for too long may hinder generalisation.

8
Q

What is underfitting?

A

Underfitting is when the model is too simple to fit the data, so it doesn’t capture the underlying pattern.

9
Q

What is regularisation?

A
  • A technique used to penalise model complexity and so reduce overfitting.
  • Adds a penalty term to the cost function so that extra complexity (e.g. fitting outliers) is discouraged.
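A minimal sketch of the idea, with made-up data and weights: an L2 (ridge) penalty simply adds the sum of squared weights, scaled by a factor lambda, to the ordinary sum-of-squared-errors cost.

```python
# Sketch: L2 (ridge) regularisation adds a penalty on weight size
# to the ordinary sum-of-squared-errors cost. The data and weights
# here are made up for illustration.

def sse(preds, targets):
    """Ordinary sum of squared errors."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets))

def ridge_cost(preds, targets, weights, lam):
    """SSE plus lambda times the sum of squared weights."""
    return sse(preds, targets) + lam * sum(w ** 2 for w in weights)

preds, targets = [1.0, 2.0], [1.0, 1.0]
weights = [3.0, 4.0]          # large weights => large penalty
print(ridge_cost(preds, targets, weights, lam=0.0))   # 1.0 (no penalty)
print(ridge_cost(preds, targets, weights, lam=0.1))   # 1.0 + 0.1*25 = 3.5
```

Larger lambda pushes the optimiser toward smaller weights, i.e. a simpler model.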

10
Q

What is linear regression?

A

A line fitted to data points, placed so as to minimise the sum of squared errors (SSE). Used for continuous problems with independent and dependent variables.
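A minimal sketch with made-up data points, using the closed-form least-squares solution (which follows from setting the SSE derivatives to zero):

```python
# Sketch: simple linear regression fitted by minimising the SSE.
# The data points are made up and lie exactly on y = 2x + 1.

def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
            / sum((x - mean_x) ** 2 for x in xs)
    intercept = mean_y - slope * mean_x
    return slope, intercept

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]        # exactly y = 2x + 1
slope, intercept = fit_line(xs, ys)
print(slope, intercept)           # 2.0 1.0
```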

11
Q

What is logistic regression?

A

A regression model used for classification. It passes a weighted sum of the inputs through the logistic (sigmoid) function, which squashes the result into the range (0, 1); thresholding that value gives a class label of 0 or 1.
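A minimal sketch of the prediction step, with made-up weights and inputs (training the weights is a separate optimisation step not shown here):

```python
import math

# Sketch: the core of logistic regression is the sigmoid function,
# which squashes any real-valued score into (0, 1). The weights and
# input here are made up for illustration.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, x):
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    prob = sigmoid(score)
    return (1 if prob >= 0.5 else 0), prob

label, prob = predict(weights=[2.0, -1.0], bias=0.0, x=[3.0, 1.0])
print(label)   # 1  (sigmoid(5) is close to 1)
```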

12
Q

What is optimisation in machine learning?

A

Finding the parameters that minimise a cost function. A popular method is gradient descent, which searches for local minima.
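A minimal sketch of gradient descent on a toy cost function f(x) = (x - 3)², whose minimum is at x = 3:

```python
# Sketch: gradient descent repeatedly moves the parameter against
# the gradient of the cost, scaled by the learning rate.

def grad(x):
    return 2 * (x - 3)           # derivative of (x - 3)**2

x = 0.0                          # arbitrary starting point
learning_rate = 0.1
for _ in range(100):
    x -= learning_rate * grad(x)

print(round(x, 4))               # 3.0
```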

13
Q

What is the learning rate?

A

The step size that controls how much the parameters change on each gradient descent update.
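A toy illustration of why the value matters, minimising f(x) = x²: a small learning rate converges, while one that is too large makes the parameter overshoot and diverge.

```python
# Sketch: effect of the learning rate on gradient descent for
# f(x) = x**2 (gradient 2x). Values are made up for illustration.

def run(learning_rate, steps=20, x=1.0):
    for _ in range(steps):
        x -= learning_rate * 2 * x
    return x

print(run(0.1))    # shrinks toward 0 (converges)
print(run(1.1))    # grows in magnitude (diverges)
```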

14
Q

What is the curse of dimensionality?

A

The more dimensions (features) you have, the more data you need to build models that generalize well. More features doesn’t necessarily mean better classification: there is an optimum number, beyond which you can get overfitting.

15
Q

What is the bias/variance tradeoff?

A

It is hard for a model to be both complex enough and simple enough at the same time (the underfitting/overfitting balance).
Simple model = high bias and low variance
Complex model = low bias and high variance

16
Q

What is backpropagation?

A

Basically, after forward propagating, work backwards through the network adjusting all the weights and biases in proportion to their contribution to the error. You repeat this until you find weights and biases that minimise the cost function.
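A minimal sketch on a single sigmoid neuron with one weight and one bias, so the chain rule is visible in two lines (the training pair and hyperparameters are made up):

```python
import math

# Sketch: backpropagation for one sigmoid neuron and a squared-error
# cost (out - target)**2, applying the chain rule step by step.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

x, target = 1.0, 0.0
w, b = 2.0, 0.0
lr = 0.5

for _ in range(200):
    # forward pass
    z = w * x + b
    out = sigmoid(z)
    # backward pass: chain rule
    d_out = 2 * (out - target)       # d cost / d out
    d_z = d_out * out * (1 - out)    # times sigmoid derivative
    w -= lr * d_z * x                # d z / d w = x
    b -= lr * d_z                    # d z / d b = 1

print(sigmoid(w * x + b))            # small: output moves toward 0.0
```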

17
Q

What is feature/subset selection?

A

Using/Choosing only a number of features from the dataset to construct a model.

18
Q

What is feature extraction?

A

Feature extraction is a general term for methods of constructing combinations of features to provide insight into a dataset.

19
Q

What is MLE?

A

A method for estimating the parameters of a model.

It chooses the parameter values under which the observed data points are most likely (i.e. it maximises the likelihood).
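A minimal sketch for a Gaussian, with made-up data: for normal data the likelihood is maximised by the sample mean and the (biased) sample variance, which we verify numerically by scanning candidate means.

```python
import math

# Sketch: MLE for a Gaussian. The closed-form estimates are the
# sample mean and (biased) sample variance; we check numerically
# that no nearby candidate mean gives a higher log-likelihood.

data = [1.9, 2.1, 2.0, 2.2, 1.8]

def log_likelihood(mu, sigma2):
    return sum(-0.5 * math.log(2 * math.pi * sigma2)
               - (x - mu) ** 2 / (2 * sigma2) for x in data)

mu_hat = sum(data) / len(data)
var_hat = sum((x - mu_hat) ** 2 for x in data) / len(data)

candidates = [mu_hat + d / 100 for d in range(-50, 51)]
best = max(candidates, key=lambda mu: log_likelihood(mu, var_hat))
print(mu_hat, abs(best - mu_hat) < 1e-9)   # the sample mean wins
```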

20
Q

What is Bias in ML?

A

Bias is the difference between the average prediction of our model and the correct value which we are trying to predict.

21
Q

What is Variance in ML?

A

Variance is the variability of a model’s predictions for a given data point; it tells us how spread out the predictions are.

22
Q

How is Bayes Rule used in ML?

A

A probabilistic method for classifying new data based on previous examples.
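A minimal illustration with made-up priors and likelihoods: Bayes’ rule turns a class prior and a per-class likelihood of the observed feature into a posterior over classes, p(class|x) ∝ p(x|class) · p(class).

```python
# Sketch: classifying with Bayes' rule. The priors and likelihoods
# below are made-up values for a toy spam example.

priors = {"spam": 0.4, "ham": 0.6}
likelihood = {"spam": 0.7, "ham": 0.1}   # p(word appears | class)

def posterior(word_present=True):
    unnorm = {c: priors[c] * (likelihood[c] if word_present
                              else 1 - likelihood[c])
              for c in priors}
    total = sum(unnorm.values())
    return {c: p / total for c, p in unnorm.items()}

post = posterior(word_present=True)
print(post)                        # spam has the higher posterior
print(max(post, key=post.get))     # 'spam'
```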

23
Q

What are discriminant functions?

A

Basically, classifying by evaluating a function for each class and assigning the input to the class whose function gives the highest score.

24
Q

What are decision rules?

A

Basically, a lot of if-then statements for classifying.

25
Q

What is non-parametric classification?

A

Classification that doesn’t make assumptions about the underlying model. Good when you have no prior knowledge and don’t want to commit to choosing the right features.

26
Q

What is NN-classification?

A

Nearest-neighbour classification. Set a k-value and classify based on the k nearest neighbours: if the majority of them are red, then the point is classified as red.

27
Q

What’s the difference between discriminative and generative learning?

A

A discriminative model models the decision boundary between the classes and learns the conditional probability distribution p(y|x).

A generative model explicitly models the actual distribution of each class and learns the joint probability distribution p(x,y). It predicts the conditional probability with the help of Bayes’ theorem [1].

28
Q

What are kernel density estimators?

A

A smoothing function used to estimate distributions.
Histogram bins are sharp: every point in a bin is treated the same regardless of where in the bin it falls. KDEs are smooth and take each point’s exact position into account.
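A minimal sketch with a Gaussian kernel and made-up data and bandwidth: each data point contributes a smooth bump, and the estimate at x is the (normalised) sum of the bumps.

```python
import math

# Sketch: a Gaussian kernel density estimator. Nearby points raise
# the density smoothly rather than in sharp histogram steps. The
# data and bandwidth are made up for illustration.

data = [1.0, 1.2, 1.1, 3.0]
bandwidth = 0.4

def kde(x):
    norm = 1.0 / (len(data) * bandwidth * math.sqrt(2 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2)
                      for xi in data)

print(kde(1.1) > kde(3.0))   # True: density is higher near the cluster
print(kde(10.0) < 1e-6)      # True: essentially zero far from all data
```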

29
Q

What is a mixture model?

A

Clustering method that fits a combination of Gaussian distributions to unlabelled data.
  • Useful when the data has multiple peaks
  • Used when the clusters are not perfectly circular
  • Uses the EM algorithm to adjust the parameters of each Gaussian
  • Gives soft clustering, vs k-means’ hard clustering

30
Q

What are autoencoders?

A

Autoencoders are a type of neural network that attempts to output its own input, i.e. learn the identity function.

It does this in two parts, encoding and decoding. In the encoding phase it tries to compress the features, and in the decoding phase it tries to reconstruct the input from the compressed representation.

31
Q

Why are autoencoders used?

A
  • The most common use is finding a more suitable representation of the input data, which can then be used, for example, in a neural network for classification.
  • Outlier detection
32
Q

What is LDA?

A
  • Focuses on maximising separability among classes.
  • Finds a new axis and projects the data onto it in a way that maximises separation
    => Maximise the distance between the means of all classes
    => Minimise the variation within each class
33
Q

What is k-Nearest Neighbour Classification?

A
- Make a prediction by finding the most common class among the k nearest training points
 => Hyperparameters: distance metric and k (number of neighbours)
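A minimal sketch with Euclidean distance and made-up training points in two classes:

```python
import math
from collections import Counter

# Sketch: k-nearest-neighbour classification. The training points
# and labels are made up for illustration.

train = [((0.0, 0.0), "red"), ((0.1, 0.2), "red"),
         ((1.0, 1.0), "blue"), ((0.9, 1.1), "blue"),
         ((0.2, 0.1), "red")]

def classify(point, k=3):
    # sort training points by Euclidean distance, then majority-vote
    nearest = sorted(train, key=lambda item: math.dist(point, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(classify((0.1, 0.1)))   # 'red'  (all three nearest are red)
print(classify((1.0, 1.0)))   # 'blue'
```

Both hyperparameters from the card appear directly: `k` and the distance metric (here `math.dist`).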
34
Q

What is Model Selection?

A
  • Choosing which model to use by comparing performance on held-out data.
  • The test error being much greater than the training error is an indication of overfitting.
  • Simple model => poor predictive power on the test set
  • Complex model => too specific to the training set, therefore high error on the test set