Statistical Learning and Intro to ML Flashcards

1
Q

What is machine learning?

A

The field of study that gives a computer the ability to learn without being explicitly programmed.

A program is said to learn from experience E with respect to a task T and performance measure P if its performance at T, as measured by P, improves with experience E.

2
Q

Why is machine learning taking off now?

A
More is available now than ever before:
  1. Data
  2. Computing power
  3. Better models
3
Q

What are some applications for the following ML models:

  1. Predictions
  2. Recommender systems
  3. Image Processing
  4. Natural Language
  5. Generation
A
  1. Predictions
    a. Weather/temperature
    b. Traffic
    c. Consumer Interest
    d. Stocks
    e. Diagnosis
    f. Loans/ Insurance
  2. Recommender systems
    a. Purchasing Behaviour
    b. Viewing Behaviour
  3. Image Processing
    a. Classification
    b. Object Identification
  4. Natural Language
    a. Processing
    b. Sentiment Analysis
    c. Translation
  5. Generation
    a. Music
    b. Natural Language
4
Q

What are the three types of machine learning and how do they differ?

A

Supervised
These are algorithms that learn by mapping inputs to known outputs (labels).
• Regression: predict a continuous output. (House prices, football scores)
• Classification: predict membership of a class. (What genre a song is)

Unsupervised

These are algorithms that learn structure in the data without labels. This is more like looking for patterns, for when there is no single correct answer. For example, it can be used to look for clusters of data points.

Reinforcement

Software takes actions to maximise some notion of cumulative reward. So you might give a positive reward for winning a game and a negative reward for losing, but you won't tell it how to get from one to the other.

5
Q

What are some common problems and obstacles to machine learning?

A
  1. Insufficient quantity of data
  2. A non-representative training set
  3. Poor data quality
  4. Bad or missing features
  5. Non-stationarity (the correct answer changes over time)
  6. Covariate shift (the distribution of the inputs changes over time)
  7. Overfitting or underfitting
  8. Applying the wrong algorithm for the problem
  9. Having insufficient computing power available
6
Q

What are the three spaces in machine learning? Describe what they do.

A

The inputs live in the input space (often denoted X).

The outputs (labels) live in the output space (often denoted Y).

The predictions the learner produces live in the action space, which need not equal the output space. For example, for a binary label in {0, 1} the action might be a probability in [0, 1].

7
Q

Why is the input X and the output y?

A

X is uppercase because it is a matrix of inputs, with one row per example and one column per feature:

Feature 1   Feature 2
    1           2
    3           4
   ...         ...

y is lowercase because it is a vector of outputs, one entry per example:

a
b
c
d
...
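As a concrete sketch (hypothetical data, with plain Python lists standing in for the matrix and vector):

```python
# X: matrix of inputs, one row per example, one column per feature.
X = [
    [1.0, 2.0],   # example 1: feature 1, feature 2
    [3.0, 4.0],   # example 2
]

# y: vector of outputs, one entry per example.
y = [10.0, 20.0]

# Capital X signals a 2-D matrix; lowercase y a 1-D vector.
assert len(X) == len(y)   # one output per input row
assert len(X[0]) == 2     # two features per example
```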
8
Q

What is a loss function?

A

A loss function determines the penalty paid by a learning algorithm based on how well or how badly it predicts.
Loss functions take many forms, e.g. squared loss for regression or 0-1 loss for classification.
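Two common examples, sketched in plain Python (the function names are my own):

```python
def squared_loss(y_hat, y):
    """Squared error: heavily penalises large mistakes (regression)."""
    return (y_hat - y) ** 2

def zero_one_loss(y_hat, y):
    """0-1 loss: penalty 1 for a wrong class, 0 for a right one."""
    return 0 if y_hat == y else 1

print(squared_loss(3.0, 5.0))         # 4.0
print(zero_one_loss("rock", "jazz"))  # 1
```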

9
Q

What is risk?

What is a risk functional?

A

Risk is a way of gauging the average loss that a function incurs: the expected value of the loss, R(f) = E[ℓ(f(X), Y)].

A risk functional is the mapping f ↦ R(f); it is called a functional because its input is itself a function.

10
Q

How do you calculate the expected value?

A

Discrete: E[X] = Σ x · p(x), the sum of each value times its probability.
Continuous: E[X] = ∫ x f(x) dx, where f is the probability density function.
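A quick discrete check in Python, using a fair six-sided die as a made-up example:

```python
# Fair die: each value 1..6 has probability 1/6.
values = [1, 2, 3, 4, 5, 6]
p = 1 / 6

# E[X] = sum of x * p(x) over all values x.
expected = sum(x * p for x in values)
print(round(expected, 10))  # 3.5
```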

11
Q

What is the Bayes function?

A

The function (not the functional) which minimises the true risk over all possible functions:

f* = argmin_f R(f)

Denoted f*.

12
Q

What is the empirical risk functional and why do we use it?

A

The average of the loss function over all n data points:

R̂(f) = (1/n) Σ ℓ(f(xᵢ), yᵢ)

We use it because the true risk is an expectation over the unknown data distribution; the empirical risk is an estimate we can actually compute from the sample.
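Sketched in plain Python, assuming a squared loss and a toy candidate function f(x) = 2x (all names and data are hypothetical):

```python
def f(x):
    return 2 * x  # some candidate prediction function

xs = [1.0, 2.0, 3.0]
ys = [2.5, 3.5, 6.5]  # observed outputs

# Empirical risk: the average loss over the n data points.
n = len(xs)
empirical_risk = sum((f(x) - y) ** 2 for x, y in zip(xs, ys)) / n
print(empirical_risk)  # 0.25
```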

13
Q

Describe the Empirical Risk Minimiser

A

The function which minimises the empirical risk:

f̂ = argmin_f R̂(f)

It is analogous to the Bayes function, which minimises the true risk.

14
Q

What does it mean to constrain a function?

A

Constraining means reducing the set of possible functions that are allowed. An example might be only allowing linear functions.

15
Q

What is the constrained empirical risk minimiser?

A

Risk gauges the performance of a function via its expected loss, and the empirical risk is the average loss over the data. Here the set of allowed functions is constrained to a hypothesis space F (for example, only linear functions). The constrained empirical risk minimiser is the function within F that minimises the empirical risk:

f̂_F = argmin over f ∈ F of R̂(f)
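A minimal sketch: constrain the hypothesis space to lines through the origin, f(x) = a·x, and pick the a that minimises the empirical squared loss. For this particular space the minimiser has the closed form a = Σ x·y / Σ x² (the example data is made up):

```python
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

# Hypothesis space F = { f(x) = a*x : a real }.
# Minimising (1/n) * sum((a*x_i - y_i)**2) over a gives:
a_hat = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def f_hat(x):
    """The constrained empirical risk minimiser within F."""
    return a_hat * x

print(round(a_hat, 3))  # 1.99
```

Any other slope a would give a strictly larger average squared loss on this sample, which is exactly what "minimiser within the constrained space" means.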

16
Q

What are approximation and estimation error?

A

Approximation error is the gap between the best function in the chosen hypothesis space and the Bayes function; it is a property of the space, not of the data.

Estimation error is the gap between the function actually learned from the finite data and the best function in the hypothesis space; with more (or better) data it shrinks.

You can increase the size of the hypothesis space so as to reduce the approximation error, but this makes it much harder to find the right function within that space, so the estimation error tends to grow.

17
Q

What is bias and what is variance?

A

Bias: when the fitted functions are inflexible to the dataset and are biased towards whatever functions you chose for your hypothesis space.

Variance: when the fitted functions follow one particular dataset so closely that they yield a completely different answer at the same point whenever a different sample is used.

18
Q

What are bias and variance linked to?

A

Bias is linked to underfitting and high approximation error.

Variance is linked to overfitting and high estimation error.