Basic concepts in Machine Learning Flashcards
Explain the three types of machine learning
Supervised: use labelled data to predict outputs from inputs.
Unsupervised: learn structure from unlabelled data.
Reinforcement: an agent takes actions in an environment to maximise cumulative reward.
Explain what regression and classification are and give examples of each
Regression: learning a function mapping inputs to ℝ (a real-valued output), e.g. predicting heights, house prices, etc.
Classification: learning a function mapping inputs to a discrete set of outputs (membership of a class), e.g. dog vs. cat, digit recognition.
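A minimal sketch contrasting the two settings on toy data (assuming NumPy and scikit-learn are available; the data and models are illustrative, not from the notes):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)

# Regression: real-valued target, e.g. a noisy linear relationship.
X_reg = rng.uniform(0, 10, size=(100, 1))
y_reg = 3.0 * X_reg.ravel() + rng.normal(0, 1, size=100)        # y in R
reg = LinearRegression().fit(X_reg, y_reg)
print("regression prediction:", reg.predict([[5.0]]))           # a real number

# Classification: discrete target, e.g. class 0 vs class 1.
X_clf = rng.normal(0, 1, size=(100, 2))
y_clf = (X_clf[:, 0] + X_clf[:, 1] > 0).astype(int)             # y in {0, 1}
clf = LogisticRegression().fit(X_clf, y_clf)
print("classification prediction:", clf.predict([[0.5, 0.5]]))  # a class label
```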
What are the main challenges facing machine learning?
Low quality and quantity of data
Non-stationary data
Overfitting and underfitting
Define overfitting and underfitting
Overfitting is learning the training dataset too well so that the model fails to generalise.
Underfitting is when the model is too simple to capture the dependencies in the data, so it predicts poorly even on the training set.
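A toy sketch of both failure modes (assuming scikit-learn; the quadratic data and the choice of degrees are illustrative): a degree-1 polynomial underfits, while a degree-15 polynomial fitted to 40 points tends to overfit, showing low training error but higher held-out error.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(40, 1))
y = X.ravel() ** 2 + rng.normal(0, 1, size=40)             # quadratic signal + noise
X_test = rng.uniform(-3, 3, size=(200, 1))
y_test = X_test.ravel() ** 2 + rng.normal(0, 1, size=200)

for degree in (1, 2, 15):                                  # underfit, good fit, overfit
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression()).fit(X, y)
    train_mse = mean_squared_error(y, model.predict(X))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"degree {degree}: train MSE {train_mse:.2f}, test MSE {test_mse:.2f}")
```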
What are the input space, outcome space and action space in statistical learning?
Input space: set of possible inputs, dimensionality = number of features.
Outcome space: the set the outcome labels come from, e.g. ℝ or {0, 1}.
Action space: the space of predictions (actions). Not always the same as the outcome space, e.g. predicting a probability of membership of a class when the outcome is in {0, 1}.
What is a loss function?
L : Y × A → ℝ is a function used to penalise poor predictions. Ideally it attains its minimum (and is at least stationary) when the prediction equals the outcome.
Give two examples of loss functions for regression and one for classification
SE loss: (y - ŷ)²
AE loss: |y - ŷ|
Log loss: -(y log(ŷ) + (1 - y) log(1 - ŷ))
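A small sketch of these losses in plain NumPy (the function names and the clipping constant are my own, for illustration):

```python
import numpy as np

def squared_error(y, y_hat):
    # SE loss: (y - y_hat)^2
    return (y - y_hat) ** 2

def absolute_error(y, y_hat):
    # AE loss: |y - y_hat|
    return np.abs(y - y_hat)

def log_loss(y, y_hat, eps=1e-12):
    # Log loss for y in {0, 1} and a predicted probability y_hat in (0, 1).
    y_hat = np.clip(y_hat, eps, 1 - eps)   # avoid log(0)
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

print(squared_error(2.0, 3.5))    # 2.25
print(absolute_error(2.0, 3.5))   # 1.5
print(log_loss(1, 0.9))           # about 0.105
```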
How do SEL and AEL hold up when it comes to outliers in the data?
AE loss is less sensitive to outliers: it penalises large errors linearly rather than quadratically, so a single outlier influences the fit less.
Define the risk functional for a given loss function
The expected loss when using f as a prediction function.
R(f) = E[L(Y, f(X))]
Define the Bayes’ prediction function
The Bayes prediction function is the function that minimises the risk functional over all possible prediction functions.
Can we usually find this?
No: finding it requires knowing the true distribution of the data, which we do not have in practice. ML models instead approximate it from a finite sample.
Show that the Bayes’ prediction function for SEL is the mean.
See notes!
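The notes are not reproduced here, but a standard sketch (using the risk definition above, and minimising the conditional risk pointwise) goes as follows:

```latex
\[
  \mathbb{E}\!\left[(Y - a)^2 \mid X = x\right]
    = \mathbb{E}[Y^2 \mid X = x] - 2a\,\mathbb{E}[Y \mid X = x] + a^2,
\]
\[
  \frac{d}{da}\,\mathbb{E}\!\left[(Y - a)^2 \mid X = x\right]
    = 2a - 2\,\mathbb{E}[Y \mid X = x] = 0
    \;\Longrightarrow\;
    a^\ast = \mathbb{E}[Y \mid X = x],
\]
so \(f^\ast(x) = \mathbb{E}[Y \mid X = x]\): the Bayes prediction function for SE loss is the conditional mean.
```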
What is the Bayes’ prediction function for AEL?
The median (See notes!)
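A brief sketch of why (not in the flashcard itself), assuming the conditional distribution of Y given X = x has a density so that we can differentiate under the expectation:

```latex
\[
  \frac{d}{da}\,\mathbb{E}\!\left[\lvert Y - a\rvert \mid X = x\right]
    = \mathbb{P}(Y < a \mid X = x) - \mathbb{P}(Y > a \mid X = x) = 0
    \;\Longrightarrow\;
    \mathbb{P}(Y \le a^\ast \mid X = x) = \tfrac{1}{2},
\]
so \(f^\ast(x) = \operatorname{median}(Y \mid X = x)\).
```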
Define the empirical risk functional and the empirical risk minimiser
The empirical risk functional is the average loss over the training data: R̂(f) = (1/n) Σᵢ L(yᵢ, f(xᵢ)). The empirical risk minimiser (ERM) is the function that minimises this functional.
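A minimal sketch (plain NumPy, names my own) of the empirical risk and its minimiser over the simple hypothesis class of constant predictions f(x) = c; it recovers the sample mean for SE loss and (approximately) the sample median for AE loss, mirroring the Bayes results above:

```python
import numpy as np

rng = np.random.default_rng(2)
y = rng.normal(5.0, 2.0, size=200)                 # training outcomes

def empirical_risk(c, y, loss):
    # Average loss over the training data for the constant prediction c.
    return np.mean(loss(y, c))

candidates = np.linspace(y.min(), y.max(), 2001)
se_risks = [empirical_risk(c, y, lambda yy, cc: (yy - cc) ** 2) for c in candidates]
ae_risks = [empirical_risk(c, y, lambda yy, cc: np.abs(yy - cc)) for c in candidates]

print("ERM for SE loss:", candidates[np.argmin(se_risks)], "| sample mean:", y.mean())
print("ERM for AE loss:", candidates[np.argmin(ae_risks)], "| sample median:", np.median(y))
```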
What can we do to avoid overfitting?
We could constrain our hypothesis space, i.e. minimise the empirical risk subject to the prediction function lying in some restricted class of functions.
How can you ensure the empirical risk converges to the true risk as the number of data points grows?
Evaluate the risk on an independent test set: for a fixed prediction function, the average test-set loss is an unbiased estimate of the true risk and converges to it as the test set grows.
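A minimal sketch (assuming scikit-learn; the data is synthetic and illustrative) of holding out a test set to estimate the risk:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(3)
X = rng.uniform(0, 1, size=(500, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(0, 0.1, size=500)

# The test set is not used for fitting, so its average loss estimates the true risk.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
model = LinearRegression().fit(X_train, y_train)

print("training risk (SE):", mean_squared_error(y_train, model.predict(X_train)))
print("test risk (SE):", mean_squared_error(y_test, model.predict(X_test)))
```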
Define the constrained risk minimiser
The function within the constrained hypothesis space which minimises the risk functional.
Define the constrained empirical risk minimiser
The function within the constrained hypothesis space which minimises the empirical risk functional.
Define the excess risk. How does this decompose?
ER = R[f̂] - R[f*], where f̂ is the learned (constrained empirical risk) minimiser and f* is the Bayes prediction function.
It decomposes into an estimation error plus an approximation error (sketched below).
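Writing f̂ for the constrained empirical risk minimiser, f* for the Bayes prediction function, and f_F for the constrained risk minimiser in the hypothesis space F (the symbol f_F is my shorthand), the decomposition is:

```latex
\[
  \underbrace{R(\hat{f}) - R(f^\ast)}_{\text{excess risk}}
    = \underbrace{R(\hat{f}) - R(f_{\mathcal{F}})}_{\text{estimation error}}
    \;+\;
    \underbrace{R(f_{\mathcal{F}}) - R(f^\ast)}_{\text{approximation error}}
\]
```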
Explain the trade-off that occurs between these components as we increase the “size” of our hypothesis space.
Increasing the “size” of H decreases the approximation error, since the constrained risk minimiser moves closer to the Bayes prediction function, but increases the estimation error, since with a fixed amount of data it is harder to identify the best function within a larger H.
For what loss function are bias and variance defined?
SE loss
What do the bias and variance each show?
Bias measures the average difference (over training sets) between the learned prediction function and the Bayes prediction function.
Variance measures the sensitivity of the learned prediction function to the particular training set drawn.
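For SE loss these appear in the standard pointwise decomposition, writing f̂_D for the prediction function learned from a random training set D and f* for the Bayes prediction function (notation assumed here):

```latex
\[
  \mathbb{E}_{D}\!\left[\bigl(\hat{f}_{D}(x) - f^\ast(x)\bigr)^{2}\right]
    = \underbrace{\bigl(\mathbb{E}_{D}[\hat{f}_{D}(x)] - f^\ast(x)\bigr)^{2}}_{\text{bias}^{2}}
    \;+\;
    \underbrace{\operatorname{Var}_{D}\!\bigl(\hat{f}_{D}(x)\bigr)}_{\text{variance}}
\]
```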