7.1. Regression Analysis I Flashcards
Regression analysis…
Can be used to examine the linear relationship between two or more variables.
A direction of causality is asserted (this assertion must be backed by research; the causal relationship is assumed from its theoretical underpinning).
The direction is from the explanatory variable to the dependent variable.
The influence of each explanatory variable on the dependent variable is calculated, and we can test whether this influence is significantly different from zero.
We change X and measure the response in Y.
The regression equation…
Has two parts, Yi = a + bXi + ei:
- Explained component, a + bXi.
- Unexplained component, ei (the error term).
The line of best fit is found by calculating the intercept and slope that minimise the sum of the squared errors.
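A minimal sketch of that calculation in Python, using the standard least-squares formulas (the data values are invented for illustration):

```python
import numpy as np

# Invented example data (illustration only).
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

# Least-squares estimates: b = cov(X, Y) / var(X), a = mean(Y) - b * mean(X).
b = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
a = Y.mean() - b * X.mean()

Y_hat = a + b * X            # explained component
e = Y - Y_hat                # unexplained component (error term)
print(a, b, np.sum(e ** 2))  # this choice of a and b minimises the sum of squared errors
```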
Goodness of fit (regression)…
Goodness of fit measures how well the regression line fits the observed data.
A non-zero slope indicates a relationship between X and Y (the causal direction itself is assumed from theory, as above).
A slope can always be calculated for any set of data, so it is important to consider how far each observation lies from the regression line.
We can calculate the goodness of fit by comparing two lines:
- The regression line.
- The mean line: a horizontal line drawn at the mean value of Y.
This comparison gives the coefficient of determination (R^2), as in the sketch below.
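A minimal sketch of that comparison (reusing the least-squares formulas above; the data are invented):

```python
import numpy as np

# Invented example data.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

b = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
a = Y.mean() - b * X.mean()

sse_regression = np.sum((Y - (a + b * X)) ** 2)  # scatter around the regression line
sse_mean = np.sum((Y - Y.mean()) ** 2)           # scatter around the horizontal mean line
print(1 - sse_regression / sse_mean)             # coefficient of determination, R^2
```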
Interpreting the slope and intercept…
The slope coefficient: shows by how many units Y changes for a one-unit increase in X.
The intercept: this is the predicted value of Y when X equals zero.
The regression line is therefore Y hat = a + bXi.
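A tiny illustration (the fitted values 2.0 and 0.5 below are invented, not taken from any data set):

```python
# Assumed fitted line for illustration: Y hat = 2.0 + 0.5 * X
a, b = 2.0, 0.5
print(a + b * 0)   # 2.0 -> the intercept: predicted Y when X equals zero
print(a + b * 10)  # 7.0
print(a + b * 11)  # 7.5 -> a one-unit increase in X changes the prediction by b = 0.5 units
```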
Coefficient of determination…
R^2.
The difference between Yi and Y bar (the total variability) splits into two parts:
A part explained by the regression line (Y hat - Y bar).
A part left unexplained, the error term (Yi - Y hat).
Total variability = variability explained by the model + unexplained variability.
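Written out as sums of squares (a standard identity, stated here for reference), the decomposition and the resulting R^2 are:

$$\sum_i (Y_i - \bar{Y})^2 = \sum_i (\hat{Y}_i - \bar{Y})^2 + \sum_i (Y_i - \hat{Y}_i)^2, \qquad R^2 = \frac{\sum_i (\hat{Y}_i - \bar{Y})^2}{\sum_i (Y_i - \bar{Y})^2} = 1 - \frac{\sum_i (Y_i - \hat{Y}_i)^2}{\sum_i (Y_i - \bar{Y})^2}$$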
R^2 interpretation…
0 ≤ R^2 ≤ 1:
R^2 = 1: all the observations lie exactly on the regression line.
R^2 = 0: the regression line is no better than the mean.
The error term…
ei ~ N(0, σ^2).
The error has a normal distribution.
The error is a random variable with mean zero; for a given X value, a and b are constants (so the expected value of Y is a + bXi).
The variance of the error is denoted σ^2 and is the same for all values of X.
The values of the error are independent; when autocorrelation is present, this assumption is violated. Under these assumptions, a and b are unbiased estimates of the population parameters.
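A minimal sketch of how these assumptions might be checked on the residuals, assuming scipy and statsmodels are available (the data are invented):

```python
import numpy as np
from scipy import stats
from statsmodels.stats.stattools import durbin_watson

# Invented example data.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
Y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.8, 8.3, 8.9])

b = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
a = Y.mean() - b * X.mean()
residuals = Y - (a + b * X)

print(residuals.mean())          # should be (numerically) zero
print(stats.shapiro(residuals))  # rough check of the normality assumption
print(durbin_watson(residuals))  # values near 2 suggest no autocorrelation
```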
Statistical inference…
Confidence intervals can be used.
Uses a two-tailed critical value from the t-distribution at the chosen confidence level.
The degrees of freedom are n - (number of explanatory variables) - 1.
To calculate the confidence interval, first calculate the standard error and the critical value, then combine them with the estimated coefficient: estimate ± critical value × standard error (see the sketch below).
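A minimal sketch of that recipe for the slope coefficient, assuming a 95% confidence level (the data are invented):

```python
import numpy as np
from scipy import stats

# Invented example data.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
Y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.7])

n = len(X)
b = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
a = Y.mean() - b * X.mean()
residuals = Y - (a + b * X)

# Standard error of the slope: s / sqrt(sum of (X - X bar)^2), with s^2 = SSE / (n - 2).
s = np.sqrt(np.sum(residuals ** 2) / (n - 2))
se_b = s / np.sqrt(np.sum((X - X.mean()) ** 2))

# Two-tailed critical value from the t-distribution, df = n - 1 explanatory variable - 1 = n - 2.
t_crit = stats.t.ppf(0.975, df=n - 2)
print(b - t_crit * se_b, b + t_crit * se_b)  # 95% confidence interval for the slope
```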
Testing out regression…
After this round, complete the example on Google Docs 7.1. Regression Analysis Part I.