Part 2: Regression Analysis Flashcards

1
Q

Regression analysis

A

The analysis of the statistical relationship among variables. In the simplest form there are only two variables:

  • Dependent/response variable (Y)
  • Independent/predictor variable (X)
2
Q

Simple linear regression

A
Y = a + bX + e
a = the intercept: the model's value for Y when X = 0.
b = the slope of the linear equation that specifies the model.
e = the error term, representing the deviations of the observations from the model.
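A minimal sketch of how the coefficients are read; the values a = 2.0 and b = 0.5 are illustrative, not estimates from data:

```python
# Evaluate a simple linear regression model Y = a + b*X (error term e omitted).
# The coefficients a and b here are illustrative values, not fitted estimates.
def predict(x, a=2.0, b=0.5):
    """Return the model's prediction a + b*x."""
    return a + b * x

# a is the predicted Y at X = 0; b is the change in Y per one-unit change in X.
y_at_zero = predict(0)   # equals the intercept a
y_at_one = predict(1)    # intercept plus one slope step
```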
3
Q

R^2

A

The coefficient indicating goodness of fit (with maximum 1), defined as R^2 = 1 − SSE/SST. As R^2 increases, the fit of the model improves, but the model may also pick up more noise. It is the proportion of variation in Y ‘explained’ by all X variables in the model.

  • R^2 = 0 when the second term (SSE/SST) is equal to 1, which means the estimated value is no better than the average Ȳ.
  • R^2 = 1 when the second term is equal to 0, which means the estimated Ŷ always equals Y (Y − Ŷ = 0). In this case the model does not have any error at all.
  • Can R^2 be negative? Yes, when the model's errors are larger than those of simply predicting the average.
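The three cases above can be checked directly with a small helper (illustrative data, plain Python):

```python
def r_squared(y, y_hat):
    """R^2 = 1 - SSE/SST: proportion of variation in y explained by the model."""
    y_mean = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # residual sum of squares
    sst = sum((yi - y_mean) ** 2 for yi in y)              # total sum of squares
    return 1 - sse / sst

# Perfect predictions: the second term (SSE/SST) is 0, so R^2 = 1.
perfect = r_squared([1, 2, 3], [1, 2, 3])
# Predicting the mean for every point: SSE = SST, so R^2 = 0.
mean_only = r_squared([1, 2, 3], [2, 2, 2])
# Very bad predictions: SSE > SST, so R^2 goes negative.
terrible = r_squared([1, 2, 3], [10, 10, 10])
```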
4
Q

Ordinary least squares (OLS)

A

Method for finding the model with the best fit. It minimizes the sum of squared errors in predicting the values for Y. It uses a least-squares criterion because without squaring, positive and negative deviations from the model would cancel each other out.
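For the simple two-variable case, the OLS estimates have a closed form: b = cov(X, Y) / var(X) and a = Ȳ − bX̄. A small sketch with illustrative noise-free data:

```python
def ols_fit(x, y):
    """Estimate intercept a and slope b by minimizing the sum of squared errors."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # b = cov(x, y) / var(x); squaring is what stops +/- deviations cancelling.
    b = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / \
        sum((xi - x_mean) ** 2 for xi in x)
    a = y_mean - b * x_mean
    return a, b

# Noise-free data on the line y = 1 + 2x is recovered exactly.
a, b = ols_fit([0, 1, 2, 3], [1, 3, 5, 7])
```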

5
Q

What is OLS often used for?

A

Hedonic price models, which regress the price of a good (e.g. a house) on its individual attributes.

6
Q

Collinearity

A

An independent variable that depends on (is correlated with) another independent variable in the model.
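One simple way to flag collinearity is to compute the pairwise correlation between predictors; values near ±1 signal that one variable is close to a linear function of the other. A sketch with illustrative data where one predictor is exactly twice the other:

```python
def correlation(x1, x2):
    """Pearson correlation between two predictors; near +/-1 signals collinearity."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    var1 = sum((a - m1) ** 2 for a in x1)
    var2 = sum((b - m2) ** 2 for b in x2)
    return cov / (var1 * var2) ** 0.5

# x2 = 2 * x1, so the two predictors are perfectly collinear.
r = correlation([1, 2, 3, 4], [2, 4, 6, 8])
```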

7
Q

Multiple regression model

A

Y = b0X0 + b1X1 + b2X2 + … + bNXN + e, where X0 = 1 so that b0 is the intercept.
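A multiple regression can be fitted by least squares on a design matrix whose first column is all ones (the X0 = 1 convention for the intercept). A sketch with hypothetical noise-free data generated from Y = 1 + 2·X1 + 3·X2:

```python
import numpy as np

# Hypothetical data generated from Y = 1 + 2*X1 + 3*X2 (no noise).
X1 = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
X2 = np.array([1.0, 0.0, 1.0, 0.0, 1.0])
Y = 1 + 2 * X1 + 3 * X2

# Design matrix with a column of ones (X0 = 1) for the intercept b0.
X = np.column_stack([np.ones_like(X1), X1, X2])
b, *_ = np.linalg.lstsq(X, Y, rcond=None)  # least-squares estimates b0, b1, b2
```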

8
Q

Neural network

A

Non-linear multiple regression model

9
Q

Adjusted R^2

A

Compensates for the number of explanatory variables by adding a penalty for each extra variable. R^2 never decreases when a new X variable is added to the model, which may cause overfitting. To avoid overfitting you can also use two data sets: a training set and a validation set.
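The standard formula is adjusted R^2 = 1 − (1 − R^2)(n − 1)/(n − k − 1), with n observations and k explanatory variables. A sketch showing how the same R^2 is penalised more as k grows (the 0.90 and n = 20 values are illustrative):

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1),
    with n observations and k explanatory variables.
    Unlike R^2, it can fall when a variable adds little explanatory power."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Same R^2 = 0.90 on n = 20 observations: more predictors -> bigger penalty.
few = adjusted_r_squared(0.90, n=20, k=2)
many = adjusted_r_squared(0.90, n=20, k=10)
```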

10
Q

Model selection

A

Two ways:

  1. Forward selection: start with one variable and keep adding variables until you reach the proper R^2 without noise.
  2. Backward selection: start with a large set of variables and keep deleting variables that harm your model.
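Forward selection can be sketched as a greedy loop that, at each step, adds whichever remaining variable most improves adjusted R^2 and stops when no candidate improves it. The data below is hypothetical: y is generated from columns 0 and 2 of X plus small noise, so those two should be picked first.

```python
import numpy as np

def adj_r2(y, y_hat, k):
    """Adjusted R^2 of predictions y_hat with k explanatory variables."""
    n = len(y)
    sse = np.sum((y - y_hat) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    return 1 - (sse / sst) * (n - 1) / (n - k - 1)

def forward_selection(X, y):
    """Greedy forward selection: repeatedly add the column that most improves
    adjusted R^2; stop when no remaining column improves it."""
    n, p = X.shape
    chosen, best_score = [], -np.inf
    while True:
        best_j, best_j_score = None, best_score
        for j in (j for j in range(p) if j not in chosen):
            cols = chosen + [j]
            A = np.column_stack([np.ones(n), X[:, cols]])
            b, *_ = np.linalg.lstsq(A, y, rcond=None)
            score = adj_r2(y, A @ b, len(cols))
            if score > best_j_score:
                best_j, best_j_score = j, score
        if best_j is None:
            return chosen
        chosen.append(best_j)
        best_score = best_j_score

# Hypothetical data: y depends only on columns 0 and 2 of X.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
y = 2 * X[:, 0] + 3 * X[:, 2] + rng.normal(scale=0.1, size=50)
selected = forward_selection(X, y)
```

Backward selection is the mirror image: start from all columns and repeatedly drop the one whose removal most improves (or least harms) adjusted R^2.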