Regression and Multiple Regression Flashcards
Regression
Regression estimates the relationship between one dependent variable and one or more explanatory variables. It is used for both description and prescription.
Steps
Scatterplots and correlation analysis: get a feel for the direction and strength of the relationship
Model estimation: fit the model using software
Diagnostic evaluation: evaluate the model's validity and usefulness
Simple Linear Regression
Tests relationships between 2 variables in a linear model
y = a + b x
Residual Scatter
Residual (e): vertical distance of a point from the line, i.e. the difference between the observed and predicted values of the dependent variable.
Choose the line so that residual scatter is minimised (least squares does this by minimising the sum of squared residuals).
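A minimal sketch of how such a line could be chosen, assuming the usual least-squares criterion. The function name fit_line and the data values are hypothetical, made up for illustration only.

```python
def fit_line(xs, ys):
    """Least-squares estimates (a, b) for the line y = a + b x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and sum of cross-deviations of x and y
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    return a, b

# Hypothetical data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_line(xs, ys)

# Residual e = observed y minus predicted y
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
print(a, b)
print(residuals)
```

With an intercept in the model, the least-squares residuals sum to zero (up to rounding), which is a quick sanity check on the fit.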
R-Squared
In simple terms, R-squared tells you how well the independent variables in your model explain the variation in the dependent variable. However, it doesn’t tell you whether the coefficient estimates and predictions are biased, or whether the model is a good fit for the underlying data.
R-squared = 1 - S/Sy
It is a measure of how well the line fits the data
Standard error
The adjusted R-squared penalises a low number of observations and a high number of explanatory variables
S: the residual sum of squares (the sum of squared differences between the actual outcomes and the predicted outcomes).
Sy: the total sum of squares (the sum of squared differences between the actual outcomes and their mean).
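The definitions of S and Sy can be turned into a short calculation. This is a sketch only: the function names and the data are hypothetical, and the adjusted R-squared uses the standard textbook formula, which the cards themselves do not state.

```python
def r_squared(actual, predicted):
    """R-squared = 1 - S/Sy."""
    mean_y = sum(actual) / len(actual)
    s = sum((y - p) ** 2 for y, p in zip(actual, predicted))  # residual sum of squares
    sy = sum((y - mean_y) ** 2 for y in actual)               # total sum of squares
    return 1 - s / sy

def adjusted_r_squared(r2, n_obs, n_explanatory):
    """Standard adjustment (an assumption here): penalises few observations
    and many explanatory variables."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_explanatory - 1)

# Hypothetical actual outcomes and model predictions, for illustration only
actual = [2.1, 3.9, 6.2, 7.8, 10.1]
predicted = [2.04, 4.03, 6.02, 8.01, 10.0]

r2 = r_squared(actual, predicted)
adj_r2 = adjusted_r_squared(r2, n_obs=len(actual), n_explanatory=1)
print(r2, adj_r2)
```

For a model with at least one explanatory variable and R-squared below 1, the adjusted value comes out lower than the plain R-squared.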
Residuals
Residuals should be simple randomness remaining after the deterministic part of the variation in y has been modelled.
A non-random pattern would indicate there is something the model is not capturing that is leaking into the residuals.
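One way to see a non-random pattern is to fit a straight line to data that is actually curved. The sketch below uses made-up data (y = x squared) and a hypothetical helper, fit_line. The sign string is only a crude illustration, not a formal test.

```python
def fit_line(xs, ys):
    """Least-squares estimates (a, b) for y = a + b x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b

# Hypothetical curved data: the true relationship is y = x^2, not a line
xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [x ** 2 for x in xs]

a, b = fit_line(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

# Signs of the residuals in order of x; long runs of one sign suggest
# structure that the straight line is not capturing
signs = "".join("+" if e > 0 else "-" for e in residuals)
print(residuals)
print(signs)
```

Here the residuals are positive at both ends and negative in the middle, a U-shape that points to a missing curved term rather than simple randomness.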
Testing Significance of variables
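If |t-stat| > 2 (more precisely 1.96 at the 5% significance level), reject the null hypothesis that the coefficient is zero
t-stat = coefficient estimate / standard error
P-value (p-val):
The probability of finding a relationship at least as strong as the one observed if the null hypothesis were true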
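A rough sketch of the calculation, assuming a large sample so that the t-stat can be compared against the standard normal distribution (that approximation is where the 1.96 cutoff comes from). The coefficient and standard error are hypothetical numbers, and the function names are made up for illustration.

```python
import math

def t_stat(coefficient, std_error):
    """t-stat = coefficient estimate / standard error."""
    return coefficient / std_error

def two_sided_p_value(t):
    """Two-sided p-value under a large-sample normal approximation
    (an assumption; small samples need the t distribution instead)."""
    return math.erfc(abs(t) / math.sqrt(2))

# Hypothetical regression output, for illustration only
coefficient = 1.5
std_error = 0.5

t = t_stat(coefficient, std_error)
p = two_sided_p_value(t)
reject_null = abs(t) > 1.96  # equivalent to p < 0.05 under this approximation
print(t, p, reject_null)
```

Under this approximation the two rules agree: |t-stat| above 1.96 corresponds to a p-value below 0.05.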
Good model
All coefficients: |t-stat| > 2, p-val < 0.05
High adjusted R-squared
Satisfactory residuals
Equation makes sense
Multiple Regression
a is the baseline (the predicted value of y when every x is zero)
b is the change in y resulting from a unit rise in x, with the other explanatory variables held constant
Multicollinearity: A high degree of correlation between 2 or more explanatory variables
Example: Cost is highly correlated with both 1/Capacity and Years, but these two explanatory variables are also highly correlated with each other.
Including Years therefore adds little information, so the model fit is no better.
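A simple check for multicollinearity is the correlation between pairs of explanatory variables. The sketch below uses two made-up variables, x1 and x2, not the Cost, Capacity and Years data, which the cards do not include. The variance inflation factor (VIF) shown is a common additional diagnostic and is an addition here, not something the cards mention.

```python
import math

def correlation(u, v):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)

# Hypothetical explanatory variables: x2 is nearly a multiple of x1
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]

r = correlation(x1, x2)

# With exactly two explanatory variables, the VIF of each is 1 / (1 - r^2)
vif = 1 / (1 - r ** 2)
print(r, vif)
```

A correlation close to 1 (or -1) between two explanatory variables means the second carries little information beyond the first, which is why adding it barely improves the fit.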