Lecture 3 Flashcards

1
Q

Regression

A

Using supervised learning to predict continuous outputs

2
Q

Cost Function

A

Tells us how well our model approximates the training examples

3
Q

A Model

A

A function that represents the relationship between x and y

4
Q

Optimization

A

A way of finding parameters for the model by minimizing the loss function

5
Q

Linear Regression

A

Also known as ordinary least squares;

Given an input feature x, we predict an output y;

Follows the basic equation y = mx + e, with m as the parameter and e as the measurement or other noise;

A form of supervised learning
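
A minimal sketch of fitting the no-intercept model y = mx + e, assuming NumPy and made-up data (neither comes from the lecture):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.uniform(0, 10, size=50)
    y = 2.5 * x + rng.normal(scale=1.0, size=50)  # true slope 2.5 plus noise e

    # Least-squares slope for y = m*x (no intercept): m = sum(x*y) / sum(x*x)
    m = np.dot(x, y) / np.dot(x, x)
    print(m)  # close to 2.5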

6
Q

Why use Least Squares Linear Regression?

A

It minimizes the sum of squared distances between the measurements and the regression line, and it is easy to compute (even by hand)

7
Q

What do you do when your least squares minimization line doesn’t pass through the origin?

A

Introduce a bias term (an intercept, labeled 'b').

This gives y = mx + b + e
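
As a hedged sketch, the with-intercept fit can still be computed by hand using the standard OLS formulas (the data below are invented for illustration):

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0])
    y = np.array([2.1, 3.9, 6.2, 8.1])

    # Standard OLS formulas with an intercept:
    # m = sum((x - x_bar) * (y - y_bar)) / sum((x - x_bar)**2),  b = y_bar - m * x_bar
    x_bar, y_bar = x.mean(), y.mean()
    m = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    b = y_bar - m * x_bar
    print(m, b)  # slope and intercept of the least-squares line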

8
Q

Curve Fitting

A

Finding a mathematical function (constructing a curve) that best fits a series of data points

9
Q

Smoothing

A

When you don’t look for an exact fit but for a curve that fits the data approximately

10
Q

Perfect Fit

A

Fits the data precisely; goes through all data points

11
Q

Best Fit

A

May not be the perfect fit; should give the best predictive value

12
Q

Fit

A

The accuracy of a predictive model;

the extent to which predicted values of a target variable are close to the observed values of that variable

13
Q

R^2

A

How the fit is expressed for regression models;

The percentage of variance explained by the model
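
A sketch of how this is computed, using the standard definition R^2 = 1 - SS_res/SS_tot (the numbers below are invented):

    import numpy as np

    y_true = np.array([3.0, 5.0, 7.0, 9.0])
    y_pred = np.array([2.8, 5.1, 7.3, 8.9])

    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variance around the mean
    r2 = 1 - ss_res / ss_tot
    print(r2)  # fraction of variance explained by the model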

14
Q

Underfitting Performance

A

Performs badly on both training and validation sets

15
Q

Overfitting Performance

A

Performs better on the training set than on the validation set

16
Q

How do you tune hyperparameters?

A
  1. Divide the training examples into a training set and a validation set.
  2. Use the training set to estimate the coefficients m for each candidate hyperparameter value (e.g., the degree of the polynomial).
  3. Use the validation set to pick the best hyperparameter value (e.g., the polynomial degree) by evaluating how well each fitted model does on it.
  4. Test how well the chosen model generalizes on unseen data (see the sketch after this list).
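
A minimal sketch of this tuning loop, assuming NumPy, np.polyfit for the polynomial fits, and a synthetic cubic dataset (none of these specifics come from the lecture):

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.uniform(-3, 3, size=100)
    y = x**3 - 2 * x + rng.normal(scale=1.0, size=100)  # cubic signal plus noise

    # 1. Split into training and validation sets.
    x_train, y_train = x[:70], y[:70]
    x_val, y_val = x[70:], y[70:]

    best_degree, best_err = None, np.inf
    for degree in range(1, 8):
        # 2. Fit coefficients on the training set for this hyperparameter value.
        coeffs = np.polyfit(x_train, y_train, degree)
        # 3. Score each fit on the validation set.
        err = np.mean((np.polyval(coeffs, x_val) - y_val) ** 2)
        if err < best_err:
            best_degree, best_err = degree, err

    print(best_degree)  # 4. the winner would then be checked on a held-out test set
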
17
Q

Regularization

A

The reduction of the magnitude of the coefficients;

Helps to prevent overfitting by limiting the variation of the parameters, which prevents extreme fits to the training data

18
Q

Ridge Regression

A
  • Reduces model complexity by coefficient shrinkage.
  • The penalty term is controlled by alpha.
  • The higher the value of alpha, the bigger the penalty, and therefore the more the magnitudes of the coefficients are reduced (see the sketch below).
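
A hedged sketch using scikit-learn's Ridge (assuming scikit-learn is available; the data are synthetic):

    import numpy as np
    from sklearn.linear_model import Ridge

    rng = np.random.default_rng(2)
    X = rng.normal(size=(100, 5))
    y = X @ np.array([3.0, -2.0, 0.5, 0.0, 0.0]) + rng.normal(scale=0.5, size=100)

    # Larger alpha -> bigger penalty -> smaller coefficient magnitudes.
    for alpha in (0.01, 1.0, 100.0):
        print(alpha, np.round(Ridge(alpha=alpha).fit(X, y).coef_, 3))
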
19
Q

Least Absolute Shrinkage and Selection Operator (Lasso) Regression

A

The magnitudes of the coefficients are reduced even at small values of alpha;

Lasso reduces some of the coefficients to zero (known as feature selection), which is absent in ridge regression (see the sketch below)
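
A companion sketch with scikit-learn's Lasso on the same kind of synthetic data (same assumptions as the ridge sketch):

    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(3)
    X = rng.normal(size=(100, 5))
    y = X @ np.array([3.0, -2.0, 0.5, 0.0, 0.0]) + rng.normal(scale=0.5, size=100)

    # Even a modest alpha drives the weakest coefficients exactly to zero.
    print(np.round(Lasso(alpha=0.1).fit(X, y).coef_, 3))  # expect some 0.0 entries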

20
Q

Feature Selection

A

Reducing some of the coefficients to zero.

Present in Lasso but not Ridge regression

21
Q

Gradient Descent

A

An iterative optimization algorithm for finding a local minimum of a function. To find a local minimum with gradient descent, we take steps proportional to the negative of the gradient (i.e., we step in the direction opposite to the gradient) of the function at the current point.
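
A minimal sketch on the one-dimensional quadratic f(m) = (m - 4)^2, with a hand-picked learning rate (neither comes from the lecture):

    # Minimize f(m) = (m - 4)**2; its gradient is f'(m) = 2 * (m - 4).
    def grad(m):
        return 2.0 * (m - 4.0)

    m = 0.0              # starting point
    learning_rate = 0.1
    for _ in range(100):
        m -= learning_rate * grad(m)  # step opposite to the gradient

    print(m)  # converges toward the minimizer m = 4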

22
Q

The coefficients of the least squares regression line are determined by minimizing the sum of the squares of ____?

A

residuals (the observed y-coordinates minus the y-coordinates predicted by the line, i.e., y - (mx + b))

23
Q

What do the outputs of a linear regression look like if we have 1, 2, or more than two inputs?

A

The output for a one-dimensional input is a line (y = w1x1 + b).
The output for a two-dimensional input is a plane (y = w1x1 + w2x2 + b).
The output for a multi-dimensional input (more than two features) is a hyperplane.
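
A hedged sketch of the two-input (plane) case with scikit-learn's LinearRegression; the data are synthetic, and with more columns in X the same call fits a hyperplane:

    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(4)
    X = rng.normal(size=(100, 2))            # two input features
    y = 1.5 * X[:, 0] - 3.0 * X[:, 1] + 0.5  # plane y = w1*x1 + w2*x2 + b

    model = LinearRegression().fit(X, y)
    print(model.coef_, model.intercept_)  # recovers w1, w2 and b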

24
Q

How is the residual variance computed?

A

The residual variance is computed as the sum of the squared differences between the y-coordinates from the data and the y-coordinates predicted by the linear regression.

See slide 15 of the lecture notes

25
Q

Is linear regression sensitive to outliers?

A

Yes. In most cases the slope of the regression line will change due to outliers, so linear regression is sensitive to them.
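
A small demonstration of this sensitivity (the data and the single outlier are invented):

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = 2.0 * x  # perfectly linear data with slope 2

    # Slope with and without one extreme outlier appended.
    slope_clean = np.polyfit(x, y, 1)[0]
    slope_outlier = np.polyfit(np.append(x, 6.0), np.append(y, 60.0), 1)[0]
    print(slope_clean, slope_outlier)  # the outlier pulls the slope well above 2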