Statistical Learning and Intro to ML Flashcards
What is machine learning?
The field of study that gives a computer the ability to learn without being explicitly programmed.
A program is said to learn a task T from experience E if its performance on T, as measured by P, improves with experience E.
Why is machine learning taking off now?
At present there is more data, more computing power, and better models available.
What are some applications for the following ML models:
- Predictions
- Recommender systems
- Image Processing
- Natural Language
- Generation
- Predictions
a. Weather/temperature
b. Traffic
c. Consumer Interest
d. Stocks
e. Diagnosis
f. Loans/Insurance
- Recommender systems
a. Purchasing Behaviour
b. Viewing Behaviour
- Image Processing
a. Classification
b. Object Identification
- Natural Language
a. Processing
b. Sentiment Analysis
c. Translation
- Generation
a. Music
b. Natural Language
What are the three types of machine learning and how do they differ?
Supervised
These are algorithms that learn a mapping from inputs to outputs using labelled examples.
• Regression: Predict a numeric output. (House prices, football scores)
• Classification: Predict membership of a class. (What genre a song is)
Unsupervised
These are algorithms that learn structure from data without labels, so there is no correct answer to compare against. This is more like looking for patterns. For example, they can be used to look for clusters of data points.
Reinforcement
How software learns to take actions that maximise some notion of cumulative reward. You might give a positive reward for winning a game and a negative reward for losing, but you won't tell it how to get from one to the other.
What are some common problems and obstacles to machine learning?
- Insufficient quantity of data
- A non-representative training set
- Poor data quality
- Bad or missing features
- Non-stationarity (the correct answer changes over time)
- Covariate drift (the distribution of the inputs changes over time)
- Overfitting or underfitting
- Applying the wrong algorithm for the problem
- Having insufficient computing power available
What are the three spaces in machine learning? Describe what they do.
The inputs are in the input space.
The true outputs are in the output space.
The predictions (actions) that the model produces are in the action space. A prediction function maps the input space to the action space.
Why is the input X and the output y?
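X is a matrix of inputs (upper case because it is a matrix), with one row per example and one column per feature:
Feature 1   Feature 2
1           2
3           4
...         ...
y is the vector of outputs (lower case because it is a vector), with one entry per row of X:
a b c d ...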
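As a minimal sketch of this layout, the snippet below stores X as a list of rows and y as a flat list. The feature values and outputs are made up for illustration, and the variable names are assumptions rather than anything fixed by the notes.

```python
# Hypothetical toy data: each row of X is one example, each column one feature.
X = [
    [1.0, 2.0],
    [3.0, 4.0],
    [5.0, 6.0],
]
# y holds one output per row of X.
y = [10.0, 20.0, 30.0]

n_examples = len(X)
n_features = len(X[0])

# Every example must have a matching output.
assert len(y) == n_examples

# Column 0 of X is "Feature 1" across all examples.
feature_1 = [row[0] for row in X]
print(n_examples, n_features, feature_1)
```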
What is a loss function?
A function that determines the penalty for a machine learning algorithm's prediction, based on how close it is to the true output.
They take many forms.
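Three common forms can be sketched as below. The function names are chosen for this example; squared and absolute loss are typical for regression, and 0-1 loss is typical for classification.

```python
def squared_loss(prediction, target):
    """Penalty grows with the square of the error."""
    return (prediction - target) ** 2


def absolute_loss(prediction, target):
    """Penalty grows linearly with the size of the error."""
    return abs(prediction - target)


def zero_one_loss(prediction, target):
    """Penalty is 1 for a wrong class and 0 for the correct class."""
    return 0 if prediction == target else 1


print(squared_loss(3.0, 1.0), absolute_loss(3.0, 1.0))
print(zero_one_loss("rock", "jazz"), zero_one_loss("rock", "rock"))
```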
What is risk?
What is a risk functional?
Risk is a way of gauging the average loss that a function gives.
The risk functional takes a function and returns its risk: the expected value of the loss function.
How do you calculate the expected value?
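Discrete: The sum of x times f(x) over all possible values of x, where f(x) is the probability of x.
Continuous: The integral of x times f(x) dx over the range of x, where f(x) is the probability density of x.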
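A small worked example of both cases, under assumed distributions: a fair six-sided die for the discrete case, and a uniform density on [0, 1] for the continuous case. The integral is approximated with a midpoint sum, which is only a sketch of numerical integration.

```python
from fractions import Fraction

# Discrete case: a fair six-sided die, f(x) = 1/6 for x in 1..6.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}
expected_discrete = sum(x * p for x, p in pmf.items())


# Continuous case: uniform density on [0, 1], so f(x) = 1 on that interval.
def density(x):
    return 1.0


def expected_continuous(f, lo, hi, steps=10_000):
    """Approximate the integral of x * f(x) dx on [lo, hi] with a midpoint sum."""
    width = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        mid = lo + (i + 0.5) * width
        total += mid * f(mid) * width
    return total


print(expected_discrete)  # 7/2, i.e. 3.5
print(expected_continuous(density, 0.0, 1.0))
```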
What is the Bayes function?
The function which minimises the risk over all possible functions.
Denoted f*
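When the distribution is known, f* can be found directly. The sketch below assumes a made-up joint distribution over a binary input and a binary output, uses 0-1 loss, and simply tries all four possible functions. In practice the distribution is unknown, so this is illustrative only.

```python
from fractions import Fraction
from itertools import product

# Assumed joint distribution P(x, y), invented for this example.
joint = {
    (0, 0): Fraction(4, 10),
    (0, 1): Fraction(1, 10),
    (1, 0): Fraction(2, 10),
    (1, 1): Fraction(3, 10),
}


def risk(f):
    """Expected 0-1 loss of f, where f is a dict mapping x to a prediction."""
    return sum(p for (x, y), p in joint.items() if f[x] != y)


# Every function from {0, 1} to {0, 1}.
candidates = [{0: a, 1: b} for a, b in product([0, 1], repeat=2)]

# The Bayes function is the candidate with the lowest risk.
bayes_function = min(candidates, key=risk)
print(bayes_function, risk(bayes_function))
```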
What is the empirical risk functional and why do we use it?
The average of the loss function over all data points in the sample.
We use it because the true risk depends on the unknown distribution of the data, so we need some risk function that we can actually compute.
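A minimal sketch of the computation, assuming squared loss and a tiny made-up data set. The candidate function `double` is hypothetical and only there to have something to evaluate.

```python
def squared_loss(prediction, target):
    return (prediction - target) ** 2


def empirical_risk(f, xs, ys, loss):
    """Average loss of f over the observed (x, y) pairs."""
    return sum(loss(f(x), y) for x, y in zip(xs, ys)) / len(xs)


# Invented sample of three data points.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 7.0]


def double(x):
    """Hypothetical candidate function: predict twice the input."""
    return 2.0 * x


# Losses are 0, 0 and 1, so the average is one third.
print(empirical_risk(double, xs, ys, squared_loss))
```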
Describe the empirical risk minimiser
The function which minimises the empirical risk (analogous to the Bayes function, which minimises the true risk).
What does it mean to constrain a function?
Constraining means reducing the set of possible functions that are allowed. An example might be only allowing linear functions.
What is the constrained empirical risk minimiser?
Risk is a way to gauge the performance of a function with reference to its loss.
The empirical risk is the average loss over the data.
The set of possible functions is constrained to a certain set.
The constrained empirical risk minimiser is the function, within the allowed set, that attains the lowest empirical risk.
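One way to picture this is to constrain the set to linear functions f(x) = slope * x + intercept and use squared loss, in which case the minimiser has a closed form (ordinary least squares). The data below is made up, and the choice of loss and function class is an assumption for the sake of the example.

```python
def fit_linear(xs, ys):
    """Return (slope, intercept) minimising the average squared loss."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def empirical_risk(slope, intercept, xs, ys):
    """Average squared loss of the line over the data."""
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Invented sample that is roughly, but not exactly, linear.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.5, 4.5, 7.0]

slope, intercept = fit_linear(xs, ys)
best = empirical_risk(slope, intercept, xs, ys)

# Any other line in the constrained set should do no better on this data.
other = empirical_risk(2.0, 1.0, xs, ys)
print(slope, intercept, best, other)
```

A richer set of functions could reach a lower empirical risk on the same data, which is the trade-off the constraint controls.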