Basic concepts in Machine Learning Flashcards

1
Q

Explain the three types of machine learning

A

Supervised: use labelled data to predict outputs from inputs.
Unsupervised: learn structure from unlabelled data.
Reinforcement: an agent takes actions in an environment to maximise cumulative reward.

2
Q

Explain what regression and classification are and give examples of each

A

Regression: learning a function mapping inputs to ℝ (a continuous output), e.g. predicting heights or house prices.
Classification: learning a function mapping inputs to discrete outputs (membership of a class), e.g. dog vs. cat, digit recognition.

3
Q

What are the main challenges facing machine learning?

A

Insufficient quantity or quality of data
Non-stationary data (the underlying distribution changes over time)
Overfitting/underfitting

4
Q

Define overfitting and underfitting

A

Overfitting is learning the training dataset too well, so that the model fails to generalise to new data.
Underfitting is using too simple a model, one which fails to capture the dependencies in the data.
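
A minimal NumPy sketch of both failure modes (the polynomial-degree setup is an illustration, not from the cards): fit polynomials of increasing degree to noisy samples of a sine curve and compare training and test errors.

```python
import numpy as np

rng = np.random.default_rng(0)

# 15 noisy training samples from a smooth target function
x_train = np.linspace(0, 1, 15)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, x_train.size)

# A dense, noise-free test grid to measure generalisation
x_test = np.linspace(0, 1, 200)
y_test = np.sin(2 * np.pi * x_test)

for degree in (1, 3, 12):
    coeffs = np.polyfit(x_train, y_train, degree)  # least-squares polynomial fit
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree:2d}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```

Degree 1 typically underfits (high error on both sets), while degree 12 typically overfits (training error near zero, test error much larger).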

5
Q

What are the input space, outcome space and action space in statistical learning?

A

Input space: the set of possible inputs; its dimensionality is the number of features.
Outcome space: the set the outcome labels come from, e.g. ℝ or {0, 1}.
Action space: the space of predictions. Not always the outcome space, e.g. predicting a probability of membership of a class.
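
For example (an illustration, not from the cards): in probabilistic binary classification with d features, the input space is ℝ^d, the outcome space is {0, 1}, and the action space is [0, 1].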

6
Q

What is a loss function?

A

L : Y × A → ℝ is a function used to penalise poor predictions; it should be stationary, ideally at a minimum, when the prediction equals the intended outcome.

7
Q

Give two examples of loss functions for regression and one for classification

A

SE loss: L(y, ŷ) = (y − ŷ)²
AE loss: L(y, ŷ) = |y − ŷ|
Log loss (classification): L(y, ŷ) = −(y log ŷ + (1 − y) log(1 − ŷ))
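
A sketch of these three losses in NumPy (the function names are my own):

```python
import numpy as np

def se_loss(y, y_hat):
    """Squared-error loss: penalises errors quadratically."""
    return (y - y_hat) ** 2

def ae_loss(y, y_hat):
    """Absolute-error loss: penalises errors linearly."""
    return np.abs(y - y_hat)

def log_loss(y, y_hat, eps=1e-12):
    """Log loss for binary classification: y in {0, 1}, y_hat a probability."""
    y_hat = np.clip(y_hat, eps, 1 - eps)  # guard against log(0)
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
```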

8
Q

How do SEL and AEL hold up when it comes to outliers in the data?

A

AE is less sensitive to outliers: it penalises large errors only linearly, whereas SE penalises them quadratically, so a single outlier can dominate the average SE.
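
A quick numerical illustration (the residuals are made up for the example):

```python
import numpy as np

# Three small residuals and one outlier
residuals = np.array([0.1, -0.2, 0.1, 5.0])

print(np.mean(residuals ** 2))     # mean SE ~ 6.27: dominated by the outlier
print(np.mean(np.abs(residuals)))  # mean AE ~ 1.35: grows only linearly
```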

9
Q

Define the risk functional for a given loss function

A

The expected loss when using f as a prediction function.

R(f) = E[L(Y, f(X))]

10
Q

Define the Bayes’ prediction function

A

The Bayes prediction function f* is the function that minimises the risk functional over all possible prediction functions: f* = argmin_f R(f).

11
Q

Can we usually find this?

A

No. Finding it requires the true joint distribution of (X, Y), which we do not have in practice; ML models instead minimise an empirical estimate of the risk.

12
Q

Show that the Bayes’ prediction function for SEL is the mean.

A

See notes!
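
The notes are not reproduced here; a standard sketch of the argument minimises the conditional risk pointwise over the action a:

```latex
r(a) = \mathbb{E}\left[(Y - a)^2 \mid X = x\right]
     = \mathbb{E}[Y^2 \mid X = x] - 2a\,\mathbb{E}[Y \mid X = x] + a^2,
\qquad
r'(a) = 2a - 2\,\mathbb{E}[Y \mid X = x] = 0
\;\Rightarrow\; a^* = \mathbb{E}[Y \mid X = x].
```

Since r''(a) = 2 > 0 this stationary point is a minimum, so f*(x) = E[Y | X = x], the conditional mean.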

13
Q

What is the Bayes’ prediction function for AEL?

A

The median (See notes!)
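
A sketch of the usual argument, assuming a continuous conditional distribution:

```latex
r(a) = \mathbb{E}\left[\,|Y - a|\;\middle|\;X = x\right],
\qquad
r'(a) = P(Y < a \mid X = x) - P(Y > a \mid X = x) = 0
\;\Rightarrow\; P(Y \le a^* \mid X = x) = \tfrac{1}{2},
```

i.e. a* is the conditional median.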

14
Q

Define the empirical risk functional and the empirical risk minimiser

A

The empirical risk functional is the average loss over the training data. The minimiser is the function that minimises this functional.
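
In symbols, for training data (x_1, y_1), …, (x_n, y_n):

```latex
\hat{R}_n(f) = \frac{1}{n} \sum_{i=1}^{n} L\bigl(y_i, f(x_i)\bigr),
\qquad
\hat{f}_n = \operatorname*{arg\,min}_{f} \hat{R}_n(f).
```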

15
Q

What can we do to avoid overfitting?

A

We can constrain our hypothesis space, i.e. minimise the empirical risk subject to the prediction function lying in some constrained function space.

16
Q

How can you ensure the empirical risk converges to the true risk as the number of data points grows?

A

Evaluate the empirical risk on an independent test set; the empirical risk on the training data is biased downwards for the learned function.
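
For a fixed prediction function f and i.i.d. test points drawn independently of the training data, the law of large numbers gives

```latex
\hat{R}_n(f) = \frac{1}{n} \sum_{i=1}^{n} L\bigl(y_i, f(x_i)\bigr)
\;\xrightarrow{\text{a.s.}}\;
\mathbb{E}\bigl[L(Y, f(X))\bigr] = R(f)
\quad \text{as } n \to \infty.
```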

17
Q

Define the constrained risk minimiser

A

The function within the constrained hypothesis space which minimises the risk functional.

18
Q

Define the constrained empirical risk minimiser

A

The function within the constrained hypothesis space which minimises the empirical risk functional.

19
Q

Define the excess risk. How does this decompose?

A

ER = R[f̂] − R[f_Bayes], the amount by which the risk of our prediction function f̂ exceeds the Bayes risk.

It decomposes into an estimation error plus an approximation error.
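
Writing f̂ for our prediction function (e.g. the constrained empirical risk minimiser), f_H for the constrained risk minimiser and f* for the Bayes prediction function:

```latex
\underbrace{R(\hat{f}) - R(f^*)}_{\text{excess risk}}
=
\underbrace{R(\hat{f}) - R(f_{\mathcal{H}})}_{\text{estimation error}}
+
\underbrace{R(f_{\mathcal{H}}) - R(f^*)}_{\text{approximation error}}.
```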

20
Q

Explain the trade-off that occurs between these components as we increase the “size” of our hypothesis space.

A

Increasing the “size” of H decreases the approximation error, since the constrained risk minimiser gets closer to the Bayes prediction function, but increases the estimation error, since with a fixed amount of data it becomes harder to identify the best function within the larger H.

21
Q

For what loss function are bias and variance defined?

A

SE loss

22
Q

What do the bias and variance each show?

A

Bias measures the average difference, over random training sets, between the learned prediction function and the Bayes prediction function.
Variance measures the sensitivity of the learned prediction function to the particular training set drawn.
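
For SE loss, at a fixed point x and over random training sets D, the standard decomposition (with f* the Bayes prediction function) is

```latex
\mathbb{E}_{\mathcal{D}}\left[\bigl(\hat{f}_{\mathcal{D}}(x) - f^*(x)\bigr)^2\right]
=
\underbrace{\bigl(\mathbb{E}_{\mathcal{D}}[\hat{f}_{\mathcal{D}}(x)] - f^*(x)\bigr)^2}_{\text{bias}^2}
+
\underbrace{\mathbb{E}_{\mathcal{D}}\left[\bigl(\hat{f}_{\mathcal{D}}(x) - \mathbb{E}_{\mathcal{D}}[\hat{f}_{\mathcal{D}}(x)]\bigr)^2\right]}_{\text{variance}}.
```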