Quantitative Methods Flashcards
error term (residual)
The portion of the dependent variable that can’t be explained by the independent variable
dependent variable
Y - the variable we’re seeking to explain
independent variable
X - the explanatory variable
Cross-sectional
many observations on X & Y for the same time period
Time series
many observations on Y (and sometimes X) from different time periods
Assumptions underlying linear regression
- The relationship between X & Y is linear in the parameters b0 and b1 (meaning both are raised to the 1st power only and neither is multiplied/divided by another regression parameter)
- X is not random (its values are treated as fixed)
- Expected value of error terms = 0
- Variance of error terms is the same for all observations
- Error term is uncorrelated across observations (in other words, no serial correlation) -> this is needed to correctly estimate the variances of b0 and b1
- Error term is normally distributed
Assumptions 2 & 3 -> ensure that correct estimates of b0 and b1 are produced
Assumptions 4, 5 & 6 -> determine the distribution of b0 & b1 so we can test hypotheses about the values of the coefficients
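A minimal NumPy sketch of how the OLS estimates of b0 and b1 are computed; the data are made up for illustration:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # independent variable X
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])  # dependent variable Y

# OLS slope: covariance of X and Y divided by variance of X
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
# OLS intercept: the fitted line passes through the point of means
b0 = y.mean() - b1 * x.mean()

residuals = y - (b0 + b1 * x)  # sample estimates of the error terms
```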
Standard error of estimate
Measures the standard deviation of the error terms (residuals)
SEE = (SSE/(n-2))^0.5 = (MSE)^0.5
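A short sketch of the SEE calculation, using hypothetical residuals; n - 2 degrees of freedom because two parameters (b0, b1) are estimated:

```python
import numpy as np

residuals = np.array([0.15, -0.25, 0.35, -0.15, 0.10])  # hypothetical residuals
n = residuals.size
sse = np.sum(residuals ** 2)    # sum of squared errors
mse = sse / (n - 2)             # mean squared error, df = n - 2
see = np.sqrt(mse)              # standard error of estimate
```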
Coefficient of determination
Measures the fraction of the total variation in the dependent variable that's explained by the independent variable
R^2 = EXPLAINED VARIATION/TOTAL VARIATION
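A sketch of R^2 as explained variation over total variation, fitting by OLS on made-up data:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
b1, b0 = np.polyfit(x, y, 1)              # OLS fit
y_hat = b0 + b1 * x

tss = np.sum((y - y.mean()) ** 2)         # total variation
rss = np.sum((y_hat - y.mean()) ** 2)     # explained variation
r2 = rss / tss
```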
Confidence interval for a regression coefficient
An interval of values that is believed to include the true parameter value of b1 with a given degree of confidence
^b1 +/- (critical t-value) * (standard error of ^b1)
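A sketch of a 95% confidence interval, assuming a hypothetical slope estimate, standard error, and sample size:

```python
from scipy import stats

b1_hat, s_b1, n = 0.76, 0.12, 30          # hypothetical estimate, SE, sample size
t_crit = stats.t.ppf(0.975, df=n - 2)     # two-tailed critical t, df = n - 2
ci = (b1_hat - t_crit * s_b1, b1_hat + t_crit * s_b1)
```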
Hypothesis testing
t = (^b1 - b1) / (standard error of ^b1), where ^b1 is the estimated coefficient and b1 is the hypothesized value (df = n - 2)
if |t test statistic| > |critical t value| -> reject H0 -> conclude statistical significance
Usually H0: regression coefficient = 0
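A sketch of the t-test for H0: b1 = 0, reusing the hypothetical estimate and standard error from above:

```python
from scipy import stats

b1_hat, s_b1, n = 0.76, 0.12, 30        # hypothetical values
t_stat = (b1_hat - 0) / s_b1            # hypothesized value under H0 is 0
t_crit = stats.t.ppf(0.975, df=n - 2)   # 5% significance, two-tailed
reject_h0 = abs(t_stat) > t_crit        # True -> statistically significant
```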
P-value
Smallest level of significance at which H0 can be rejected (2-sided test)
If p-value < significance level -> reject H0 -> conclude statistical significance
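The corresponding two-sided p-value, sketched with a hypothetical t statistic:

```python
from scipy import stats

t_stat, n = 6.33, 30                              # hypothetical t statistic
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)   # sf = 1 - cdf, two-sided
reject_h0 = p_value < 0.05                        # compare to significance level
```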
ANOVA
ANalysis Of VAriance - determine the usefulness of the independent variables in explaining the variance in the dependent variable
SSE = sum of squared errors -> unexplained variation in Y
RSS = regression sum of squares -> total variation in Y that's explained by the regression equation
TSS = SSE + RSS
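A sketch verifying the decomposition on made-up data; the identity TSS = RSS + SSE holds exactly for an OLS fit with an intercept:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
b1, b0 = np.polyfit(x, y, 1)            # OLS fit
y_hat = b0 + b1 * x

tss = np.sum((y - y.mean()) ** 2)       # total variation
rss = np.sum((y_hat - y.mean()) ** 2)   # explained variation
sse = np.sum((y - y_hat) ** 2)          # unexplained variation
assert np.isclose(tss, rss + sse)
```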
Limitations of regression analysis
- Regression relations can change over time (Parameter instability)
- Public knowledge of the relationship may negate its future usefulness
- Results are valid only if the regression assumptions hold -> tests on the error terms (residuals) can check whether they do
Multiple regression
Differs from simple linear regression in having more than one independent variable
e.g. Y = -23 + 0.3X1 - 0.225X2
0.3 represents the expected effect on Y of a 1-unit increase in X1 after removing the part of X1 that is correlated with X2
If X1 and X2 are uncorrelated, then a regression with just X1 would produce the same coefficient
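A minimal sketch of a two-variable regression using NumPy's least-squares solver, with made-up data:

```python
import numpy as np

x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y  = np.array([1.1, 0.5, 2.3, 1.8, 3.9, 3.1])

X = np.column_stack([np.ones_like(x1), x1, x2])   # prepend intercept column
coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b1, b2 = coefs   # b1 = effect on Y of a 1-unit rise in X1, holding X2 fixed
```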