Chapter 7 - Intro to Linear Regression Flashcards
Simple Linear Regression
Explains the variation in a dependent variable in terms of the variation in a single independent variable.
Variation in Y (linear regression)
Sum of (Yi - Ybar)^2 over all observations. That is, variation in Y = the sum of squared deviations of each Y value from the mean of Y.
Dependent variable
Y. Its variation is explained by the independent variable, X. Also known as the explained, endogenous, or predicted variable.
Independent Variable
The variable that explains the variation of the dependent variable, Y. Also known as the explanatory, exogenous, or predicting variable.
Linear Regression Model: Yi = b0 + b1Xi + ei, i = 1, …, n
Yi = ith observation of the dependent variable, Y; Xi = ith observation of the independent variable, X; b0 = regression intercept; b1 = regression slope coefficient; ei = residual for the ith observation (also called the disturbance or error term).
b0
regression intercept term
b1
regression slope term
ei
Residual for the ith observation (also called the disturbance or error term).
^Yi = ^b0 + ^b1Xi, i = 1, 2, 3, …, n
Linear Equation/Regression line
Regression line: ^Yi, ^b0, ^b1
Estimated value of Yi given Xi; estimated intercept term; estimated slope term.
Sum of Squared Errors SSE
Sum of squared vertical distances between actual Y values and predicted Y values; the regression line is the line that minimizes this quantity.
^b1
Slope coefficient; the change in Y for a one-unit change in X. ^b1 = Cov(X,Y) / Var(X).
^b0
^b0 = Ybar - ^b1·Xbar. The intercept: the estimate of the dependent variable when the independent variable (X) equals zero.
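The two estimator cards above can be sketched in a few lines of Python. The data below are hypothetical, chosen only to illustrate ^b1 = Cov(X,Y)/Var(X) and ^b0 = Ybar - ^b1·Xbar:

```python
# Hypothetical sample data (not from the text), chosen for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# b1_hat = Cov(X, Y) / Var(X); the 1/(n - 1) normalizers cancel,
# so the raw sums of cross-products and squared deviations suffice.
cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
var_x = sum((x - x_bar) ** 2 for x in xs)
b1_hat = cov_xy / var_x

# b0_hat = y_bar - b1_hat * x_bar (the line passes through the means).
b0_hat = y_bar - b1_hat * x_bar
```

For this sample the fit works out to roughly ^b1 = 1.99 and ^b0 = 0.05.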
Linear Regression Assumptions
1. A linear relationship exists between the dependent and independent variables. 2. The variance of the residual term (e) is constant for all observations (homoskedasticity). 3. The residual term is independently distributed; that is, the residual for one observation is not correlated with that of another observation. 4. The residual term is normally distributed.
Homoskedasticity
Prediction errors all have the same variance.
Heteroskedasticity
The variance of the residuals is not constant across observations; the assumption of homoskedasticity is violated.
ANOVA (analysis of variance)
Analyzes the total variability of the dependent variable by partitioning it into explained and unexplained components.
Total sum of squares SST
SST = sum of (Yi - Ybar)^2, for i = 1 to n. Measures the total variation in the dependent variable: the sum of squared differences between each actual Y value and the mean of Y.
Sum of squares regression (SSR)
Measures the variation in the dependent variable explained by the INDEPENDENT variable: the sum of squared differences between each PREDICTED Y value and the mean of Y. SSR = sum of (^Yi - Ybar)^2.
Sum of squared errors (SSE)
Measures the unexplained variation in the dependent variable: the sum of squared vertical distances between each actual Y and its PREDICTED Y. SSE = sum of (Yi - ^Yi)^2 = SST - SSR.
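The SST/SSR/SSE cards can be checked numerically. This sketch uses the same hypothetical data as before, with the OLS estimates ^b0 = 0.05 and ^b1 = 1.99 for that sample, and confirms the decomposition SST = SSR + SSE:

```python
# Hypothetical data; b0_hat and b1_hat are the OLS estimates for this sample.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
b0_hat, b1_hat = 0.05, 1.99

y_bar = sum(ys) / len(ys)
y_hat = [b0_hat + b1_hat * x for x in xs]  # predicted Y values

sst = sum((y - y_bar) ** 2 for y in ys)               # total variation
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)          # explained variation
sse = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))  # unexplained variation

# For an OLS fit, SST = SSR + SSE (up to floating-point rounding).
```

Because the line passes through (Xbar, Ybar) and the residuals are uncorrelated with X, the cross-term vanishes and the decomposition holds exactly.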
MSR (mean regression sum of squares)
MSR = SSR/k, where k is the number of independent variables. In simple linear regression k = 1, so MSR = SSR.
MSE (mean squared error)
MSE = SSE/(n - 2).
SEE
Standard error of estimate: the square root of MSE, i.e., the standard deviation of the residuals. A lower SEE indicates better model fit.
R^2
Coefficient of determination: the percentage of total variation in the dependent variable explained by the independent variable. R^2 = SSR/SST.
F test
F = MSR/MSE; a one-tailed test of how well the independent variable(s) explain the variation in the dependent variable. For simple regression, df = 1 and n - 2. Reject H0 if F > Fc (the critical value).
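The MSR, MSE, SEE, R^2, and F-test cards chain together from the ANOVA sums of squares. A sketch, using hypothetical ANOVA figures from a simple regression with n = 5 observations:

```python
import math

# Hypothetical ANOVA figures (n = 5 observations, one independent variable).
n = 5
sst, ssr = 39.708, 39.601
sse = sst - ssr

msr = ssr / 1            # k = 1 independent variable, so MSR = SSR
mse = sse / (n - 2)      # SSE divided by n - 2 degrees of freedom
see = math.sqrt(mse)     # standard error of estimate (std. dev. of residuals)
r_squared = ssr / sst    # fraction of total variation explained
f_stat = msr / mse       # one-tailed F test, df = (1, n - 2)
```

With almost all variation explained, R^2 is close to 1 and the F statistic is very large, so H0 (slope = 0) would be rejected at any conventional significance level.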
Predicted Value ^Y
^Y = ^b0 + ^b1Xp, where Xp is the forecast value of the independent variable.
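Prediction is a single plug-in step. A minimal sketch, assuming the hypothetical estimates ^b0 = 0.05 and ^b1 = 1.99 from the earlier example:

```python
# Hypothetical estimates; x_p is the forecast value of the independent variable.
b0_hat, b1_hat = 0.05, 1.99
x_p = 6.0

y_hat_p = b0_hat + b1_hat * x_p  # predicted Y given Xp
```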
Log - lin model
The dependent variable is logarithmic while the independent variable is linear. Relative change in the dependent variable for an absolute change in the independent variable.
Lin- log model
The dependent variable is linear while the independent variable is logarithmic. Absolute change in the dependent variable for a relative change in the independent variable.
Log-log
Both the dependent and independent variables are logarithmic. Relative change in the dependent variable for a relative change in the independent variable; the slope is interpreted as an elasticity.
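The log-log card can be illustrated by fitting a line to log-transformed data. The data below are hypothetical, generated so that Y = 2·X^1.5 exactly, meaning the true elasticity is 1.5; the log-log slope should recover it:

```python
import math

# Hypothetical data generated from Y = 2 * X ** 1.5, so elasticity = 1.5.
xs = [1.0, 2.0, 4.0, 8.0]
ys = [2.0 * x ** 1.5 for x in xs]

# Log-log model: regress ln(Y) on ln(X).
lx = [math.log(x) for x in xs]
ly = [math.log(y) for y in ys]

n = len(lx)
lx_bar = sum(lx) / n
ly_bar = sum(ly) / n

# Same OLS slope formula as before, applied to the transformed data.
cov_lxy = sum((a - lx_bar) * (b - ly_bar) for a, b in zip(lx, ly))
var_lx = sum((a - lx_bar) ** 2 for a in lx)
b1_hat = cov_lxy / var_lx  # recovers the elasticity, 1.5
```

Because ln Y = ln 2 + 1.5·ln X holds exactly for this constructed sample, the estimated slope equals the elasticity with no residual error.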