Part 2: Regression Analysis Flashcards
Regression analysis
The analysis of the statistical relationship among variables. In the simplest form there are only 2 variables:
- Dependent/response variable (Y)
- Independent/predictor variable (X)
Simple linear regression
Y = a + bX + e, where:
- a = intercept: the model's predicted value of Y when X = 0
- b = slope of the linear equation that specifies the model
- e = error term: the errors associated with the model
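A minimal sketch (in Python with NumPy, on made-up toy data) of how a and b can be estimated with the closed-form least-squares formulas:

```python
import numpy as np

# Toy data: Y roughly follows 2 + 3*X plus noise (numbers invented for illustration).
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([5.1, 7.9, 11.2, 13.8, 17.1])

# Closed-form OLS estimates: b = cov(X, Y) / var(X), a = mean(Y) - b * mean(X)
b = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
a = Y.mean() - b * X.mean()

Y_hat = a + b * X   # fitted values
e = Y - Y_hat       # residuals, the realised error term
print(f"a = {a:.3f}, b = {b:.3f}")
```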
R^2
The coefficient indicating goodness of fit (with max = 1): the proportion of variation in Y 'explained' by all X variables in the model. A higher R^2 means a better fit, but it can also mean the model is fitting noise. It is computed as

R^2 = 1 - Σ(y - ŷ)^2 / Σ(y - ȳ)^2

where the second term is the ratio of the model's squared errors to the total variation around the mean ȳ.
- R^2 = 0 when the second term is equal to 1, which means the estimated values are no better than simply using the average ȳ.
- R^2 = 1 when the second term is equal to 0, which means the estimated ŷ always equals y (y - ŷ = 0). In this case the model does not have any error at all.
- Can R^2 be negative? Yes, when the errors are so big that the model fits worse than just predicting the mean (second term > 1).
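A short sketch of this formula in Python, using hypothetical observed values and predictions:

```python
import numpy as np

Y = np.array([5.1, 7.9, 11.2, 13.8, 17.1])      # observed values (toy data)
Y_hat = np.array([5.0, 8.0, 11.0, 14.0, 17.0])  # model predictions (toy data)

ss_res = np.sum((Y - Y_hat) ** 2)     # squared model errors
ss_tot = np.sum((Y - Y.mean()) ** 2)  # total variation around the mean
r2 = 1 - ss_res / ss_tot              # the 'second term' is ss_res / ss_tot

print(f"R^2 = {r2:.4f}")
# R^2 goes negative whenever ss_res > ss_tot, i.e. the model is worse than the mean.
```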
Ordinary least squares (OLS)
Method for finding the model with the best fit. It minimizes the sum of squared errors made when predicting the values for Y. It uses a least-squares criterion because without squaring we would allow positive and negative deviations from the model to cancel each other out.
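A tiny numeric illustration of why the squaring is needed, with invented residuals:

```python
import numpy as np

# Residuals from a hypothetical model: positive and negative deviations.
e = np.array([2.0, -2.0, 1.5, -1.5])

print(np.sum(e))       # 0.0  -> raw deviations cancel out, useless as a fit criterion
print(np.sum(e ** 2))  # 12.5 -> squared deviations cannot cancel; OLS minimizes this sum
```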
What is OLS often used for?
Hedonic price models, e.g. estimating house prices from attributes such as size, age, and location.
Collinearity
When an independent variable depends on (or is highly correlated with) another independent variable in the model.
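One simple way to spot collinearity is to inspect the correlation matrix of the predictors; a sketch on toy data where one variable is nearly a multiple of another:

```python
import numpy as np

x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = 2 * x1 + np.array([0.01, -0.02, 0.00, 0.01, -0.01])  # near-duplicate of x1
x3 = np.array([3.0, 1.0, 4.0, 1.0, 5.0])                  # unrelated predictor

X = np.column_stack([x1, x2, x3])
# Off-diagonal entries close to +/-1 flag collinear pairs (here x1 and x2).
print(np.corrcoef(X, rowvar=False))
```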
Multiple regression model
Y = b0x0 + b1x1 + b2x2 + … + bNxN + e, where x0 = 1 so that b0 acts as the intercept.
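A minimal sketch of fitting such a model with NumPy's least-squares solver, on simulated toy data:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
Y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(scale=0.1, size=n)  # invented true model

# x0 is a column of ones, so b0 * x0 plays the role of the intercept.
X = np.column_stack([np.ones(n), x1, x2])
b, *_ = np.linalg.lstsq(X, Y, rcond=None)  # OLS estimates of b0, b1, b2
print(b)  # roughly [1.0, 2.0, -0.5]
```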
Neural network
A non-linear multiple regression model: linear combinations of the inputs are passed through non-linear activation functions.
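To illustrate the idea, a toy forward pass (hypothetical, untrained weights) showing how a one-hidden-layer network composes linear regressions with a non-linearity:

```python
import numpy as np

def neural_net(x, W1, b1, W2, b2):
    """One hidden layer: multiple regressions wrapped in a non-linearity."""
    h = np.tanh(W1 @ x + b1)  # linear combination + non-linear activation
    return W2 @ h + b2        # output layer: another linear combination

rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(4, 2)), np.zeros(4)  # 2 inputs -> 4 hidden units
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)  # 4 hidden units -> 1 output

print(neural_net(np.array([0.5, -1.0]), W1, b1, W2, b2))
```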
Adjusted R^2
Compensates for the number of explanatory variables by adding a penalty for each extra variable:

Adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1), for n observations and k explanatory variables.

Plain R^2 never decreases when a new X variable is added to the model, which may cause overfitting. To guard against overfitting you can use 2 sets: a training set for fitting and a validation set for checking the fit on unseen data.
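A small sketch of the penalty, using the formula above with hypothetical values:

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k explanatory variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Same raw R^2, but more variables -> larger penalty, lower adjusted value.
print(adjusted_r2(0.90, n=50, k=2))   # ~0.896
print(adjusted_r2(0.90, n=50, k=20))  # ~0.831
```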
Model selection
2 ways:
- Forward selection: start with one variable and keep adding variables as long as they improve the model (e.g. raise the adjusted R^2) rather than just fitting noise; a sketch follows below.
- Backward selection: start with a large set of variables and keep deleting variables that harm your model.
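A greedy forward-selection sketch, assuming a helper fit_r2 (defined here) that fits OLS with an intercept and returns R^2; min_gain is a hypothetical stopping threshold, and in practice the gains would be evaluated on a validation set to guard against the overfitting discussed above:

```python
import numpy as np

def fit_r2(X, Y):
    """OLS fit (with intercept) returning R^2."""
    A = np.column_stack([np.ones(len(Y)), X])
    b, *_ = np.linalg.lstsq(A, Y, rcond=None)
    Y_hat = A @ b
    return 1 - np.sum((Y - Y_hat) ** 2) / np.sum((Y - Y.mean()) ** 2)

def forward_select(X, Y, min_gain=0.01):
    """Greedily add the column that improves R^2 most; stop below min_gain."""
    selected, best_r2 = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining:
        r2, j = max((fit_r2(X[:, selected + [j]], Y), j) for j in remaining)
        if r2 - best_r2 < min_gain:
            break
        selected.append(j)
        remaining.remove(j)
        best_r2 = r2
    return selected, best_r2
```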