Lecture 3 Flashcards

1
Q

Regression

A

Using supervised learning to predict continuous outputs

2
Q

Cost Function

A

Tells us how well our model approximates the training examples

3
Q

A Model

A

A function that represents the relationship between x and y

4
Q

Optimization

A

A way of finding parameters for the model by minimizing the loss function

5
Q

Linear Regression

A

Also known as ordinary least squares;

Given an input feature x, we predict an output y;

Follows the basic equation y = mx + e, with m as the parameter and e as the measurement or other noise;

A form of supervised learning
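
A minimal sketch of fitting the no-intercept model y = mx + e, assuming NumPy and made-up data (neither comes from the lecture):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.uniform(0, 10, size=50)
    y = 2.5 * x + rng.normal(scale=1.0, size=50)  # true slope 2.5 plus noise e

    # Least-squares slope for y = m*x (no intercept): m = sum(x*y) / sum(x*x)
    m = np.dot(x, y) / np.dot(x, x)
    print(m)  # close to 2.5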

6
Q

Why use Least Squares Linear Regression?

A

It minimizes the sum of squared distances between the measurements and the regression line, and it is easy to compute (even by hand)

7
Q

What do you do when your least squares minimization line doesn’t pass through the origin?

A

Introduce a bias term (an intercept, labeled 'b').

This gives y = mx + b + e
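
As a hedged sketch, the with-intercept fit can still be computed by hand using the standard OLS formulas (the data below are invented for illustration):

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0])
    y = np.array([2.1, 3.9, 6.2, 8.1])

    # Standard OLS formulas with an intercept:
    # m = sum((x - x_bar) * (y - y_bar)) / sum((x - x_bar)**2),  b = y_bar - m * x_bar
    x_bar, y_bar = x.mean(), y.mean()
    m = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    b = y_bar - m * x_bar
    print(m, b)  # slope and intercept of the least-squares line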

8
Q

Curve Fitting

A

Finding a mathematical function (constructing a curve) that best fits a series of data points

9
Q

Smoothing

A

When you don’t look for an exact fit but for a curve that fits the data approximately

10
Q

Perfect Fit

A

Fits the data precisely; goes through all data points

11
Q

Best Fit

A

May not be the perfect fit; should give the best predictive value

12
Q

Fit

A

The accuracy of a predictive model;

the extent to which predicted values of a target variable are close to the observed values of that variable

13
Q

R^2

A

How the fit is expressed for regression models;

The percentage of variance explained by the model
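
A sketch of how this is computed, using the standard definition R^2 = 1 - SS_res/SS_tot (the numbers below are invented):

    import numpy as np

    y_true = np.array([3.0, 5.0, 7.0, 9.0])
    y_pred = np.array([2.8, 5.1, 7.3, 8.9])

    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variance around the mean
    r2 = 1 - ss_res / ss_tot
    print(r2)  # fraction of variance explained by the model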

14
Q

Underfitting Performance

A

Performs badly on both training and validation sets

15
Q

Overfitting Performance

A

Performs better on the training set than on the validation set

16
Q

How do you tune hyperparameters?

A
  1. Divide the training examples into a training set and a validation set.
  2. Use the training set to estimate the coefficients m for each candidate hyperparameter value (e.g., the degree of the polynomial).
  3. Use the validation set to pick the best hyperparameter value (e.g., the polynomial degree) by evaluating how well each fitted model does on it.
  4. Test how well the chosen model generalizes on unseen data (see the sketch after this list).
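
A minimal sketch of this tuning loop, assuming NumPy, np.polyfit for the polynomial fits, and a synthetic cubic dataset (none of these specifics come from the lecture):

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.uniform(-3, 3, size=100)
    y = x**3 - 2 * x + rng.normal(scale=1.0, size=100)  # cubic signal plus noise

    # 1. Split into training and validation sets.
    x_train, y_train = x[:70], y[:70]
    x_val, y_val = x[70:], y[70:]

    best_degree, best_err = None, np.inf
    for degree in range(1, 8):
        # 2. Fit coefficients on the training set for this hyperparameter value.
        coeffs = np.polyfit(x_train, y_train, degree)
        # 3. Score each fit on the validation set.
        err = np.mean((np.polyval(coeffs, x_val) - y_val) ** 2)
        if err < best_err:
            best_degree, best_err = degree, err

    print(best_degree)  # 4. the winner would then be checked on a held-out test set
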
17
Q

Regularization

A

The reduction of the magnitude of the coefficients;

Helps to prevent overfitting by limiting the variation of the parameters, which prevents extreme fits to the training data

18
Q

Ridge Regression

A
  • Reduces model complexity by coefficient shrinkage.
  • The penalty term is controlled by alpha.
  • The higher the value of alpha, the bigger the penalty, and therefore the more the magnitudes of the coefficients are reduced (see the sketch below).
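
A hedged sketch using scikit-learn's Ridge (assuming scikit-learn is available; the data are synthetic):

    import numpy as np
    from sklearn.linear_model import Ridge

    rng = np.random.default_rng(2)
    X = rng.normal(size=(100, 5))
    y = X @ np.array([3.0, -2.0, 0.5, 0.0, 0.0]) + rng.normal(scale=0.5, size=100)

    # Larger alpha -> bigger penalty -> smaller coefficient magnitudes.
    for alpha in (0.01, 1.0, 100.0):
        print(alpha, np.round(Ridge(alpha=alpha).fit(X, y).coef_, 3))
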
19
Q

Least Absolute Shrinkage and Selection Operator (Lasso) Regression

A

The magnitudes of the coefficients are reduced even at small values of alpha;

Lasso reduces some of the coefficients to zero (known as feature selection), which is absent in ridge regression (see the sketch below)
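
A companion sketch with scikit-learn's Lasso on the same kind of synthetic data (same assumptions as the ridge sketch):

    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(3)
    X = rng.normal(size=(100, 5))
    y = X @ np.array([3.0, -2.0, 0.5, 0.0, 0.0]) + rng.normal(scale=0.5, size=100)

    # Even a modest alpha drives the weakest coefficients exactly to zero.
    print(np.round(Lasso(alpha=0.1).fit(X, y).coef_, 3))  # expect some 0.0 entries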

20
Q

Feature Selection

A

Reducing some of the coefficients to zero.

Present in Lasso but not Ridge regression

21
Q

Gradient Descent

A

An iterative optimization algorithm for finding a local minimum of a function. To find a local minimum with gradient descent, we take steps proportional to the negative of the gradient (i.e., we step in the direction opposite to the gradient) of the function at the current point.
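
A minimal sketch on the one-dimensional quadratic f(m) = (m - 4)^2, with a hand-picked learning rate (neither comes from the lecture):

    # Minimize f(m) = (m - 4)**2; its gradient is f'(m) = 2 * (m - 4).
    def grad(m):
        return 2.0 * (m - 4.0)

    m = 0.0              # starting point
    learning_rate = 0.1
    for _ in range(100):
        m -= learning_rate * grad(m)  # step opposite to the gradient

    print(m)  # converges toward the minimizer m = 4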

22
Q

The coefficients of the least squares regression line are determined by minimizing the sum of the squares of ____?

A

residuals (the observed y-coordinates minus the y-coordinates predicted by the line, i.e., y - (mx + b))

23
Q

What do the outputs of a linear regression look like if we have 1, 2, or more than two inputs?

A

The output for a one-dimensional input is a line (y = w1x1 + b).
The output for a two-dimensional input is a plane (y = w1x1 + w2x2 + b).
The output for a multi-dimensional input (more than two features) is a hyperplane.
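
A hedged sketch of the two-input (plane) case with scikit-learn's LinearRegression; the data are synthetic, and with more columns in X the same call fits a hyperplane:

    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(4)
    X = rng.normal(size=(100, 2))            # two input features
    y = 1.5 * X[:, 0] - 3.0 * X[:, 1] + 0.5  # plane y = w1*x1 + w2*x2 + b

    model = LinearRegression().fit(X, y)
    print(model.coef_, model.intercept_)  # recovers w1, w2 and b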

24
Q

How is the residual variance computed?

A

The residual variance is computed as the sum of the squared differences between the y-coordinates from the data and the y-coordinates predicted by the linear regression.

See slide 15 of the lecture notes

25
Q

Is linear regression sensitive to outliers?

A

Yes. In most cases the slope of the regression line will change due to outliers, so linear regression is sensitive to them.
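
A small demonstration of this sensitivity (the data and the single outlier are invented):

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = 2.0 * x  # perfectly linear data with slope 2

    # Slope with and without one extreme outlier appended.
    slope_clean = np.polyfit(x, y, 1)[0]
    slope_outlier = np.polyfit(np.append(x, 6.0), np.append(y, 60.0), 1)[0]
    print(slope_clean, slope_outlier)  # the outlier pulls the slope well above 2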