Lecture 3 Flashcards
Regression
Using supervised learning to predict continuous outputs
Cost Function
Tells us how well our model approximates the training examples
A Model
A function that represents the relationship between x and y
Optimization
A way of finding model parameters that minimize the loss function
Linear Regression
aka ordinary least squares;
Given an input feature x, we predict an output y;
Follows the basic equation y = mx + e, with m as the parameter and e the measurement or other noise;
supervised learning
Why use Least Squares Linear Regression?
It minimizes the squared distance between measurements and regression line and is easy to compute (even by hand)
What do you do when your least squares minimization line doesn’t pass through the origin?
Introduce a bias term (intercept labeled as ‘b’).
This gets you y = mx+b+e
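A minimal NumPy sketch (with made-up data) of a least-squares fit that includes an intercept, i.e., y = mx + b; the true values 2.0 and 1.0 are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.shape)  # y = m*x + b + noise

# Design matrix with a column of ones so the intercept b is estimated as well.
X = np.column_stack([x, np.ones_like(x)])
(m, b), *_ = np.linalg.lstsq(X, y, rcond=None)
print(m, b)  # close to the true 2.0 and 1.0
```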
Curve Fitting
Finding a mathematical function (constructing a curve) that best fits a series of data points
Smoothing
When you don’t look for an exact fit but for a curve that fits the data approximately
Perfect Fit
Fits the data precisely; goes through all data points
Best Fit
May not be the perfect fit; should give the best predictive value
Fit
The accuracy of a predictive model;
the extent to which predicted values of a target variable are close to the observed values of that variable
R^2
How the fit is expressed for regression models;
The percentage of variance explained by the model
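A minimal sketch of how R^2 can be computed from assumed arrays y_true (observed values) and y_pred (model predictions): one minus the ratio of the residual sum of squares to the total sum of squares.

```python
import numpy as np

y_true = np.array([3.0, 5.0, 7.0, 9.0])   # observed values (illustrative)
y_pred = np.array([2.8, 5.1, 7.3, 8.9])   # model predictions (illustrative)

ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
r2 = 1.0 - ss_res / ss_tot                       # fraction of variance explained
print(r2)
```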
Underfitting Performance
Performs badly on both training and validation sets
Overfitting Performance
Performs better on the training set than on the validation set
How do you tune hyperparameters?
- Divide training examples into training and validation sets.
- Use the training set to estimate the coefficients m for each candidate value of the hyperparameter (e.g., the degree of the polynomial)
- Use the validation set to pick the best degree of the polynomial by evaluating how well each fitted model does on it
- Finally, test how well the chosen model generalizes to unseen data
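A minimal sketch of the procedure above: fit polynomials of several candidate degrees on a training split and pick the degree with the lowest error on a validation split. The data, split sizes, and candidate degrees are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-3, 3, 100)
y = 0.5 * x**3 - x + rng.normal(scale=1.0, size=x.shape)

idx = rng.permutation(len(x))
train, val = idx[:70], idx[70:]                        # training / validation split

best_degree, best_err = None, np.inf
for degree in range(1, 8):                             # candidate hyperparameter values
    coeffs = np.polyfit(x[train], y[train], degree)    # estimate coefficients on the training set
    val_pred = np.polyval(coeffs, x[val])
    err = np.mean((y[val] - val_pred) ** 2)            # evaluate on the validation set
    if err < best_err:
        best_degree, best_err = degree, err
print(best_degree)
```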
Regularization
The reduction of the magnitude of the coefficients;
Helps to prevent overfitting by limiting the variation of the parameters which prevents extreme fits to the training data
Ridge Regression
- Reduces model complexity by coefficient shrinkage.
- The penalty term is controlled by alpha
- The higher the value of alpha, the bigger the penalty and the more the coefficient magnitudes are shrunk
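A minimal scikit-learn sketch (assuming scikit-learn is available) showing the shrinkage effect: as alpha grows, the ridge penalty grows and the coefficient magnitudes shrink. The data and alpha values are illustrative.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 5))
y = X @ np.array([3.0, -2.0, 0.5, 0.0, 1.0]) + rng.normal(scale=0.1, size=50)

for alpha in (0.01, 1.0, 100.0):
    model = Ridge(alpha=alpha).fit(X, y)
    print(alpha, np.round(model.coef_, 2))  # coefficient magnitudes shrink as alpha grows
```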
Least Absolute Shrinkage Selector Operator (Lasso) Regression
The magnitudes of the coefficients shrink even at small values of alpha;
Lasso reduces some of the coefficients to zero (known as feature selection) which is absent in ridge regression
Feature Selection
Reducing some of the coefficients to zero.
Present in Lasso but not Ridge regression
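A minimal scikit-learn sketch of lasso's feature selection: coefficients of irrelevant features are driven exactly to zero, which ridge does not do. The data and alpha value are illustrative.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(3)
X = rng.normal(size=(50, 5))
# Only features 0 and 3 actually influence y in this synthetic example.
y = X @ np.array([3.0, 0.0, 0.0, -2.0, 0.0]) + rng.normal(scale=0.1, size=50)

model = Lasso(alpha=0.1).fit(X, y)
print(np.round(model.coef_, 2))  # the irrelevant features end up with coefficient 0
```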
Gradient Descent
An iterative optimization algorithm for finding a local minimum of a function. To find a local minimum using gradient descent, we take steps proportional to the negative of the gradient of the function at the current point (i.e., we move in the direction of steepest descent).
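A minimal sketch of gradient descent on the mean-squared-error cost for y = mx + b; the learning rate, iteration count, and data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
x = np.linspace(0, 1, 100)
y = 2.0 * x + 1.0 + rng.normal(scale=0.1, size=x.shape)

m, b, lr = 0.0, 0.0, 0.1
for _ in range(2000):
    y_pred = m * x + b
    # Gradients of the mean squared error with respect to m and b.
    grad_m = -2.0 * np.mean(x * (y - y_pred))
    grad_b = -2.0 * np.mean(y - y_pred)
    # Step proportional to the negative gradient.
    m -= lr * grad_m
    b -= lr * grad_b
print(m, b)  # close to the true 2.0 and 1.0
```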
The coefficients of the least squares regression line are determined by minimizing the sum of the squares of ____?
residuals (observed y-coordinates minus predicted y-coordinates, i.e., y − (m·x + b))
What do the outputs of a linear regression look like if we have 1, 2, or more than two inputs?
The output of a one-dimensional input is a line (y = w1x1+ b).
The output of a two-dimensional input is a plane (y = w1x1 + w2x2 + b).
The output of multi-dimensional inputs (more than two) is a hyperplane
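A minimal sketch of the two-input case: the least-squares solution is a plane y = w1x1 + w2x2 + b. The data and true weights are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(100, 2))                      # two input features x1, x2
y = X @ np.array([1.5, -0.7]) + 2.0 + rng.normal(scale=0.1, size=100)

A = np.column_stack([X, np.ones(len(X))])          # extra column of ones for the bias b
(w1, w2, b), *_ = np.linalg.lstsq(A, y, rcond=None)
print(w1, w2, b)  # close to the true 1.5, -0.7, 2.0
```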
How is the residual variance computed?
The residual variance is computed as the sum of the squared differences between the observed y-coordinates and the y-coordinates predicted by the regression line.
See slide 15 of the lecture notes
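A minimal sketch of that computation, with illustrative data and an assumed fitted slope and intercept.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])
m, b = 1.95, 0.15                           # assumed fitted slope and intercept

residuals = y - (m * x + b)                 # observed minus predicted y-coordinates
residual_variance = np.sum(residuals ** 2)  # sum of squared residuals
print(residual_variance)
```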
Is linear regression sensitive to outliers?
Yes, in most cases the slope of the regression line will change because of outliers. Therefore, linear regression is sensitive to outliers.
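A minimal sketch of that sensitivity: adding a single outlier to otherwise perfectly linear data noticeably changes the fitted slope. The data and outlier value are illustrative assumptions.

```python
import numpy as np

x = np.arange(10, dtype=float)
y = 2.0 * x + 1.0                          # perfectly linear data, slope 2

m_clean = np.polyfit(x, y, 1)[0]
m_outlier = np.polyfit(np.append(x, 10.0), np.append(y, 100.0), 1)[0]
print(m_clean, m_outlier)                  # the single outlier shifts the slope
```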