Lecture 4 Flashcards
What is the difference between classification and regression tasks?
Classification is when the output is one of a finite set of values. Regression is when the output is a number (an integer or real value).
- Classification = sunny/cloudy/rainy or true/false
- Regression = tomorrow’s temperature
How do we choose a GOOD hypothesis space?
Choose the space most likely to contain a hypothesis that fits the data well, based on what you know about the problem.
What is induction?
Going from a specific set of observations to a general rule. We assume that we can apply our model to future cases (e.g., image recognition). NOTE: Inductive conclusions can be incorrect.
What is a deductive conclusion?
Conclusions that are guaranteed to be correct if the premises are correct.
What are the 3 types of Learning?
- Supervised Learning
- Unsupervised Learning
- Reinforcement Learning
How do we choose a hypothesis space?
If you don’t have some prior knowledge about the process that generated the data, you perform exploratory data analysis to determine which hypothesis space is appropriate. (Or use trial and error.)
How do we choose a good hypothesis from within the hypothesis space?
Look for a best-fit function h for which each h(xi) is close to yi; h is a good hypothesis if it also generalizes well to unseen data (the test set).
How can we perform exploratory data analysis?
Examine the data with statistical tests and visualizations (such as histograms, scatter plots, box plots, etc.)
What is the true measure of a hypothesis?
How well it handles input it has not yet seen (e.g., test set), not how it does on the training set.
Define bias.
The tendency of a predictive hypothesis to deviate from the expected value when averaged over different training sets.
// or //
A model’s tendency to resist change. High bias == highly resistant to change (e.g., a linear model).
Define variance.
The amount of change in the hypothesis due to fluctuation in the training data (the model’s magnitude of change).
When is a hypothesis underfitting?
When it fails to find a pattern in the data.
When is a hypothesis overfitting?
When it performs poorly on unseen data because it pays too much attention to the particular data set it was trained on.
Bias-variance tradeoff
A choice between:
1. more complex, low-bias hypotheses that fit the training data well
2. simpler, low-variance hypotheses that may generalize better
What is Ockham’s Razor Principle?
Choose the simplest hypothesis that matches the data because there is often a bias-variance tradeoff.
Regarding decision trees, what is the most important attribute?
The one that makes the most difference to the classification of an example.
What is entropy?
A measure of the uncertainty of a random variable. (e.g., a loaded coin that always lands on heads has entropy of 0)
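A minimal sketch of this formula in Python (the function name is my own, not from the lecture): entropy of a discrete distribution is H = -Σ p·log2(p).

```python
import math

def entropy(probs):
    """Shannon entropy H = -sum(p * log2 p) of a discrete distribution.

    Terms with p == 0 are skipped, since lim p->0 of p*log2(p) is 0.
    """
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A fair coin has maximal uncertainty (1 bit); the loaded coin that
# always lands on heads has none.
# entropy([0.5, 0.5]) -> 1.0, entropy([1.0, 0.0]) -> 0.0
```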
What if we don’t have enough data for all three of the data set splits (training, validation, and test)?
You can use k-fold cross-validation.
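A rough sketch of the splitting idea (helper name is my own): partition the data into k folds, and let each fold serve as the validation set exactly once while the rest is used for training.

```python
def k_fold_splits(data, k):
    """Partition data into k folds; yield (train, validation) pairs
    so that each fold is the validation set exactly once."""
    folds = [data[i::k] for i in range(k)]  # round-robin partition
    for i in range(k):
        validation = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, validation
```

Every example is used for both training and validation, which is why this works when data is too scarce for a fixed three-way split.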
In your own words, explain how the Gradient Descent algo works.
Start with initial weights, then loop: compute the gradient of the loss with respect to the weights and nudge each weight slightly in the direction that decreases the loss (opposite the gradient), repeating until you reach a point where the loss stops improving (a local optimum).
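The loop above can be sketched as follows (a toy example, not the lecture's code; the gradient function and learning rate are assumptions):

```python
def gradient_descent(grad, w, lr=0.1, steps=100):
    """Repeatedly step the weights a small amount opposite the
    gradient of the loss, which locally decreases the loss."""
    for _ in range(steps):
        w = [wi - lr * gi for wi, gi in zip(w, grad(w))]
    return w

# Minimize f(w) = w0^2 + w1^2, whose gradient is 2w; optimum at (0, 0).
w = gradient_descent(lambda w: [2 * wi for wi in w], [3.0, -4.0])
```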
Why would we want a machine to learn?
- The designers cannot anticipate all possible future situations
- Sometimes the designers have no idea how to program a solution
Classification learning problem
When the output is one of a finite set of values (e.g., boolean, sunny/rainy)
Regression learning problem
When the output is a number (integer or real number)
Supervised Learning
The agent observes input-output pairs and learns a function that maps from input to output (aka Labels). There is always an expected label for each processed input.
Unsupervised Learning
The agent learns patterns in the input without any explicit feedback.
(Unlike supervised learning, there is no expected label for each input)
Reinforcement Learning
The agent learns from a series of reinforcements known as rewards and punishments. There are no explicit labels; instead, the reward signal acts as a critic.
(e.g., chess game: agent won = reward, agent lost = punishment)
What is the Ground Truth?
The true output yi: the answer we are asking our model to predict.
Why not let H be the class of all computer programs, or all Turing machines?
- There is a tradeoff between the expressiveness of a hypothesis space and the computational complexity of finding a good hypothesis within that space
- Simpler hypothesis spaces should be preferred because we want to use h after it’s learned
Decision Boundary
A line (or a surface, in higher dimensions) that separates two classes.
(For a linear classifier, this is a straight line referred to as a linear separator)
When is data linearly separable?
When it admits a linear decision boundary (linear separator)
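As an illustration (names are my own), a linear separator classifies a point by which side of the hyperplane w·x + b = 0 it falls on:

```python
def linear_classifier(w, b):
    """Return a classifier that labels x by the sign of w . x + b,
    i.e., by which side of the linear decision boundary x lies on."""
    return lambda x: 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else 0

# Boundary x0 + x1 = 1: points above/on the line get class 1.
clf = linear_classifier([1.0, 1.0], -1.0)
```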
What is a parametric model?
A learning model that summarizes data with a set of parameters of fixed size.
What is a non-parametric model?
A learning model that cannot be characterized by a bounded set of parameters.
(e.g., instance-based or memory-based learning)