Statistical Learning and Intro to ML Flashcards

Question 1

Q

What is machine learning?

Answer

A

The field of study that gives a computer the ability to learn without being explicitly programmed.

A program is said to learn a task T from an experience E if its performance P improves with its experience E.

Question 2

Q

Why is machine learning taking off now?

Answer

A

Three things are now more widely available than before:
Data
Computing power
Better models

Question 3

Q

What are some applications for the following ML models:

Predictions
Recommender systems
Image Processing
Natural Language
Generation

Answer

A

Predictions
a. Weather/temperature
b. Traffic
c. Consumer Interest
d. Stocks
e. Diagnosis
f. Loans/ Insurance
Recommender systems
a. Purchasing Behaviour
b. Viewing Behaviour
Image Processing
a. Classification
b. Object Identification
Natural Language
a. Processing
b. Sentiment Analysis
c. Translation
Generation
a. Music
b. Natural Language

Question 4

Q

What are the three types of machine learning and how do they differ?

Answer

A

Supervised
These are algorithms that learn a mapping from inputs to outputs using labelled examples.
• Regression: Predict a numerical output. (House prices, football scores)
• Classification: Predict membership of a class. (What genre a song is)

Unsupervised

These are algorithms that learn structure from data without labels, so there is no correct answer to compare against. It is more like looking for patterns. For example, it can be used to look for clusters of data points.

Reinforcement

How software takes actions in an environment so as to maximise some notion of cumulative reward. So you might give a positive reward for winning a game and a negative reward for losing, but you won't tell it how to get from one to the other.

Question 5

Q

What are some common problems and obstacles to machine learning?

Answer

A

Insufficient quantity of data
A non-representative training set
Poor data quality
Bad or missing features
Non-stationarity (the correct answer changes over time)
Covariate drift (the distribution of the inputs changes over time)
Overfitting or underfitting
Applying the wrong algorithm for the problem
Having insufficient computing power available

Question 6

Q

What are the three spaces in machine learning? Describe what they do.

Answer

A

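The input space contains the inputs (the features the model sees).

The output space contains the true outputs (the labels or outcomes).

The action space contains the predictions (actions) the model can produce. The learned function maps the input space to the action space.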

Question 7

Q

Why is the input X and the output y?

Answer

A

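X is a matrix of inputs, with one row per example and one column per feature:

Feature 1    Feature 2
1            2
3            4
...          ...

y is the vector of outputs, with one entry per row of X:

a
b
c
d
...

By convention, the uppercase X marks a matrix and the lowercase y marks a vector.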
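
As a rough illustration, assuming NumPy is used (the card names no library), X could be stored as a 2-D array and y as a 1-D array. The feature values are the placeholder numbers from the card; the output values are made up for the sketch.

```python
import numpy as np

# Two examples (rows) and two features (columns); values taken from the card.
X = np.array([[1, 2],
              [3, 4]])

# One output per example. These numbers are hypothetical placeholders.
y = np.array([10.0, 20.0])

# The number of rows in X must match the number of entries in y.
n_examples, n_features = X.shape
```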

Question 8

Q

What is a loss function?

Answer

A

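A function that assigns a penalty to a prediction according to how far it is from the true output: the worse the prediction, the larger the loss.
Loss functions take many forms, such as squared error for regression and 0-1 loss for classification.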
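
Two common forms can be sketched as plain Python functions. The function names are chosen for this illustration only and do not come from the cards.

```python
def squared_loss(prediction, outcome):
    """Penalty grows with the square of the error (typical for regression)."""
    return (prediction - outcome) ** 2


def zero_one_loss(prediction, outcome):
    """Penalty is 0 for a correct class and 1 for any wrong class."""
    return 0 if prediction == outcome else 1
```

For example, `squared_loss(3.0, 1.0)` penalises an error of 2 with a loss of 4, while `zero_one_loss("rock", "jazz")` gives a loss of 1 regardless of how the two classes differ.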

Question 9

Q

What is risk?

What is a risk functional?

Answer

A

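Risk is a way of gauging the average loss that a prediction function gives: it is the expected value of the loss function.
The risk functional is the mapping that takes a prediction function and returns its risk.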
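
In symbols, one common way to write this is shown below. The notation is an assumption of this sketch rather than something the cards define: f is the prediction function, \ell is the loss function, R is the risk, and the expectation is taken over the joint distribution of inputs x and outputs y.

```latex
R(f) = \mathbb{E}_{(x, y)}\big[\, \ell(f(x), y) \,\big]
```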

Question 10

Q

How do you calculate the expected value?

Answer

A

Discrete: The sum of x times f(x) over all possible values of x, where f(x) is the probability of x.
Continuous: The integral of x times f(x) dx, where f(x) is the probability density of x.
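
A small worked example of the discrete case, using a fair six-sided die (an example chosen for this sketch, not taken from the cards). Exact fractions are used to avoid rounding.

```python
from fractions import Fraction

# Possible outcomes of a fair die and the probability f(x) of each one.
values = [1, 2, 3, 4, 5, 6]
pmf = {x: Fraction(1, 6) for x in values}

# Expected value: sum of x * f(x) over all possible values of x.
expected = sum(x * pmf[x] for x in values)
```

Here the sum is 21/6, so `expected` equals 7/2, or 3.5.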

Question 11

Q

What is the Bayes function?

Answer

A

The prediction function which minimises the risk over all possible functions.

Denoted f*
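
Written out, with R(f) assumed to denote the risk of a function f (notation chosen for this sketch), the definition is the first line below. The second line is the standard special case for squared loss, where the minimiser is the conditional mean of y given x.

```latex
f^* = \arg\min_{f} R(f)

f^*(x) = \mathbb{E}[\, y \mid x \,] \quad \text{(squared loss)}
```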

Question 12

Q

What is the empirical risk functional and why do we use it?

Answer

A

The average of the loss function on all datapoints.

We use it because we need some risk function that we can actually compute.
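
A minimal sketch of that average, assuming squared loss and a tiny made-up dataset. The names `empirical_risk`, `squared_loss` and `double` are hypothetical helpers for this illustration.

```python
def squared_loss(prediction, outcome):
    return (prediction - outcome) ** 2


def empirical_risk(f, loss, xs, ys):
    """Average loss of the function f over the observed (x, y) pairs."""
    losses = [loss(f(x), y) for x, y in zip(xs, ys)]
    return sum(losses) / len(losses)


def double(x):
    """A candidate prediction function: f(x) = 2x."""
    return 2 * x


# Made-up data generated by y = 2x + 1, so `double` is off by 1 everywhere.
xs = [0.0, 1.0, 2.0]
ys = [1.0, 3.0, 5.0]

risk = empirical_risk(double, squared_loss, xs, ys)
```

Every prediction misses by exactly 1, so each squared loss is 1 and the empirical risk is 1.0.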

Question 13

Q

Describe the Empirical Risk Minimiser

Answer

A

The function which minimises the value of the empirical risk (analogous to the Bayes function).

Question 14

Q

What does it mean to constrain a function?

Answer

A

Constraining means reducing the set of possible functions that are allowed. An example might be only allowing linear functions.

Question 15

Q

What is the constrained empirical risk minimiser?

Answer

A

Risk is a way to gauge the performance of a function with reference to its loss.
The empirical risk is the average loss over the data.
The set of possible functions is constrained to a certain set.
The constrained empirical risk minimiser is the function within that set which achieves the minimum empirical risk.
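
As one possible illustration, the sketch below constrains the set to linear functions f(x) = slope * x + intercept and uses squared loss, in which case the minimiser can be found by ordinary least squares. NumPy, the data and the variable names are all assumptions of this example.

```python
import numpy as np

# Made-up training data that lies exactly on the line y = 2x + 1.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

# Append a column of ones so the linear function can have an intercept.
A = np.hstack([X, np.ones((X.shape[0], 1))])

# Least squares picks the linear function with the smallest
# average squared loss on the data.
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
slope, intercept = coef

# Empirical risk of the chosen function under squared loss.
emp_risk = np.mean((A @ coef - y) ** 2)
```

Because the data is exactly linear, the fit recovers a slope close to 2 and an intercept close to 1, with an empirical risk that is zero up to floating-point error.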

Question 16

Q

What are approximation and estimation error?

Answer

A

Approximation error is the gap in risk between the best function in the chosen function space and the ideal (Bayes) function.

Estimation error is the gap in risk between the function actually learned from the available data and the best function in the chosen function space. It arises because the data is finite.

You can increase the size of the function space so as to reduce the approximation error, but this makes it much harder to find the right function within that space.
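
The trade-off can be written as a decomposition of the excess risk. The notation is assumed for this sketch: R is the risk, f^* is the Bayes function, \mathcal{F} is the chosen function space, f_{\mathcal{F}} is the best function in that space, and \hat{f}_n is the function learned from n data points.

```latex
f_{\mathcal{F}} = \arg\min_{f \in \mathcal{F}} R(f)

R(\hat{f}_n) - R(f^*)
  = \underbrace{R(\hat{f}_n) - R(f_{\mathcal{F}})}_{\text{estimation error}}
  + \underbrace{R(f_{\mathcal{F}}) - R(f^*)}_{\text{approximation error}}
```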

Question 17

Q

What is bias and what is variance?

Answer

A

Bias: When functions are too inflexible to fit the dataset and are biased towards whatever functions you chose in your hypothesis space.

Variance: When functions are so well fit to one set of data that they yield a completely different answer at the same point whenever different data is used.
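
A small simulation can make both ideas concrete. Everything in it is an assumption chosen for illustration: the true function is a sine wave, the noise is Gaussian, the inflexible model is a constant (degree 0 polynomial) and the flexible model is a degree 5 polynomial. Each model is refitted on many noisy datasets and its predictions at one point are compared.

```python
import numpy as np


def true_f(t):
    """Hypothetical ground-truth function used to generate the data."""
    return np.sin(2 * np.pi * t)


rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)   # fixed inputs; only the noise is resampled
x0 = 0.25                       # the point at which predictions are compared
n_datasets = 500

# Polynomial degree -> predictions at x0, one per resampled dataset.
preds = {0: [], 5: []}
for _ in range(n_datasets):
    y = true_f(x) + rng.normal(0.0, 0.3, size=x.shape)
    for degree in preds:
        coefs = np.polyfit(x, y, degree)
        preds[degree].append(np.polyval(coefs, x0))

# Bias: how far the average prediction is from the truth.
# Variance: how much the prediction moves from dataset to dataset.
bias = {d: np.mean(p) - true_f(x0) for d, p in preds.items()}
variance = {d: np.var(p) for d, p in preds.items()}
```

In this setup the constant model should show the larger bias in absolute value, because it cannot follow the sine wave, while the degree 5 model should show the larger variance, because it reacts more strongly to the noise in each dataset.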

Question 18

Q

What are bias and variance linked to?

Answer

A

Bias is linked to underfitting and high approximation error.

Variance is linked to overfitting and high estimation error.

(18 cards)