Book 1_Quan_Simple Linear Regression Flashcards
Linear regression definition
provides an estimate of the linear relationship between an
independent variable (the explanatory variable) and a dependent variable (the
predicted variable)
The general form of a simple linear regression model
- Yi = b0 + b1·Xi + εi
+ b1 = fitted slope coefficient = Cov(X, Y)/Var(X) = COVxy/s_x²
+ b0 = fitted intercept = Ȳ − b1·X̄ (computed from the sample means, so the fitted line passes through the point of means)
+ Dependent variable: Y
+ Independent variable: X
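The fitted coefficients can be computed directly from sample moments. Below is a minimal sketch on made-up data (the x and y arrays are purely hypothetical):

```python
# Minimal sketch (assumed data): estimating b1 and b0 by OLS with NumPy.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical independent variable
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])  # hypothetical dependent variable

# Slope: sample covariance of X and Y divided by sample variance of X
b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
# Intercept: the regression line passes through the point of means
b0 = y.mean() - b1 * x.mean()

print(f"b1 = {b1:.4f}, b0 = {b0:.4f}")
```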
The estimated intercept, b0
represents the predicted value of the dependent variable when the independent variable
equals zero, i.e., the point of intersection of the regression line and the axis of the
dependent variable (usually, the vertical axis)
The estimated slope coefficient, b1
is interpreted as the
change in the dependent variable for a one-unit change in the independent variable.
Assumptions made regarding simple linear regression include the following:
- A linear relationship exists between the dependent and the independent variable.
- The variance of the residual term is constant (homoskedasticity).
- The residual term is independently distributed (residuals are uncorrelated).
- The residual term is normally distributed.
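These assumptions can be checked informally from the residuals. The sketch below (same hypothetical data as above) uses a Shapiro-Wilk test for normality and a hand-computed Durbin-Watson statistic for independence; these particular tests and thresholds are illustrative choices, not prescribed by these cards:

```python
# A hedged sketch of basic residual checks (data and tests are assumptions).
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)

# Normality: Shapiro-Wilk test on the residuals
w_stat, p_norm = stats.shapiro(resid)

# Independence: Durbin-Watson statistic (values near 2 suggest no autocorrelation)
dw = np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

print(f"Shapiro-Wilk p-value: {p_norm:.3f}, Durbin-Watson: {dw:.2f}")
```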
Linear Relationship
A linear regression model is not appropriate when the underlying relationship
between X and Y is nonlinear
Homoskedasticity
refers to the case where prediction errors all have the same variance
Normality
When the residuals (prediction errors) are normally distributed, we can conduct hypothesis tests to evaluate the model's goodness of fit
Outliers
observations (one or a few) that are far from the regression line (i.e., have
large prediction errors) or have X values that are far from the other observations
Analysis of variance (ANOVA)
a statistical procedure for analyzing the total
variability of the dependent variable.
The total sum of squares (SST)
measures the total variation in the dependent
variable
SST = Σ(Yi − Ȳ)²
The mean square regression (MSR)
the SSR divided by the number of independent variables (k)
MSR = SSR/k
In simple linear regression, k = 1, so MSR = SSR
The sum of squares regression (SSR)
measures the variation in the dependent variable that is explained by the independent variable
SSR = Σ(Ŷi − Ȳ)²
The sum of squared errors (SSE)
measures the unexplained variation in the
dependent variable
SSE = Σ(Yi − Ŷi)²
The mean squared error (MSE)
is the SSE divided by the degrees of freedom, which is n − k − 1
(n minus one, minus the number of independent variables)
MSE = SSE/(n − k − 1)
Total variation formula
total variation = explained variation + unexplained variation
or:
SST = SSR + SSE
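The decomposition can be verified numerically. A sketch on the same hypothetical data, computing SST, SSR, SSE, MSR, and MSE exactly as defined above:

```python
# Sketch of the ANOVA decomposition (SST = SSR + SSE) on assumed data.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

sst = np.sum((y - y.mean()) ** 2)      # total variation
ssr = np.sum((y_hat - y.mean()) ** 2)  # explained variation
sse = np.sum((y - y_hat) ** 2)         # unexplained variation

k, n = 1, len(y)
msr = ssr / k
mse = sse / (n - k - 1)

print(f"SST={sst:.4f}  SSR+SSE={ssr + sse:.4f}  MSR={msr:.4f}  MSE={mse:.4f}")
```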
Standard Error of Estimate (SEE)
is the standard deviation of the regression's residuals. The lower the SEE, the better the model fit:
SEE = √MSE
It measures the degree of variability of the actual Y values relative to the estimated Y values from the regression equation
Coefficient of Determination (R2)
The percentage of the variation of the dependent variable that is explained by the independent variable
R² = SSR/SST; the higher the R², the better the regression line fits the data. In simple linear regression, R² equals the squared correlation between X and Y (r²)
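A short sketch tying SEE and R² to the ANOVA quantities (same hypothetical data); it also checks the simple-regression identity R² = r²:

```python
# Sketch: computing SEE and R² on assumed data; R² should equal r² here.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

sst = np.sum((y - y.mean()) ** 2)
ssr = np.sum((y_hat - y.mean()) ** 2)
sse = np.sum((y - y_hat) ** 2)

see = np.sqrt(sse / (len(y) - 2))  # SEE = sqrt(MSE), with n - k - 1 = n - 2 df
r2 = ssr / sst

# In simple linear regression, R² equals the squared sample correlation
assert np.isclose(r2, np.corrcoef(x, y)[0, 1] ** 2)
print(f"SEE = {see:.4f}, R² = {r2:.4f}")
```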
An F-test meaning
- assesses how well a set of independent variables, as a group, explains the
variation in the dependent variable
or - evaluates whether the independent variable explains the variation in the dependent variable
o H0: b1 = 0; Ha: b1 ≠ 0
o One-tailed test
o F = MSR/MSE
o Critical value: depends on the significance level and two degrees of freedom, k and n − k − 1
o Reject H0 if F-statistic > critical value
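A sketch of the F-test mechanics (hypothetical data, an assumed 5% significance level), using scipy.stats.f.ppf for the critical value:

```python
# Sketch of the F-test for a simple regression (assumed data, alpha = 0.05).
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

k, n = 1, len(y)
msr = np.sum((y_hat - y.mean()) ** 2) / k
mse = np.sum((y - y_hat) ** 2) / (n - k - 1)

f_stat = msr / mse
f_crit = stats.f.ppf(0.95, k, n - k - 1)  # one-tailed critical value

print(f"F = {f_stat:.2f}, critical = {f_crit:.2f}, reject H0: {f_stat > f_crit}")
```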
Hypothesis Test of a Regression Coefficient
o Slope coefficient (b1^): t = (b1^ − b1)/s_b1, where s_b1 is the standard error of the slope
o Pairwise correlation (ρ): tests H0: ρ = 0 using t = r·√(n − 2)/√(1 − r²)
o Intercept (b0^): t = (b0^ − b0)/s_b0
o These are two-tailed tests with df = n − 2
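A sketch of the slope t-test under H0: b1 = 0 (hypothetical data, assumed 5% significance level); s_b1 uses the standard formula √(MSE / Σ(xi − x̄)²):

```python
# Sketch of the t-test on the slope (H0: b1 = 0), on the same assumed data.
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)

n = len(y)
mse = np.sum(resid ** 2) / (n - 2)
s_b1 = np.sqrt(mse / np.sum((x - x.mean()) ** 2))  # standard error of the slope

t_stat = (b1 - 0) / s_b1
t_crit = stats.t.ppf(0.975, n - 2)  # two-tailed, alpha = 0.05

print(f"t = {t_stat:.2f}, critical = ±{t_crit:.2f}, reject H0: {abs(t_stat) > t_crit}")
```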
Confidence Intervals for Predicted Values
o Ŷ ± (tc × sf)
o tc = two-tailed critical t-value at the desired level of significance, with df = n − 2
o sf = standard error of the forecast
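A sketch of a prediction interval at a new X value (hypothetical data and forecast point). The forecast-variance expression used here is the standard one, sf² = SEE²·(1 + 1/n + (X − X̄)²/((n − 1)·s_x²)):

```python
# Sketch of a prediction interval for Y at a new X (Y_hat ± t_c * s_f).
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()

n = len(y)
see2 = np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2)   # SEE squared = MSE

x_new = 6.0                                          # hypothetical forecast point
y_hat = b0 + b1 * x_new
# Forecast variance: sf² = SEE² * (1 + 1/n + (X − X̄)² / ((n − 1)·s_x²))
s_f = np.sqrt(see2 * (1 + 1/n + (x_new - x.mean())**2 / ((n - 1) * np.var(x, ddof=1))))

t_c = stats.t.ppf(0.975, n - 2)
print(f"95% interval: {y_hat - t_c * s_f:.2f} to {y_hat + t_c * s_f:.2f}")
```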
Log-lin model
This applies when the dependent variable is logarithmic while the independent variable is linear: ln(Yi) = b0 + b1·Xi
Lin-log model
This applies when the dependent variable is linear while the independent
variable is logarithmic: Yi = b0 + b1·ln(Xi)
Log-log model
Both the dependent variable and the independent variable are
logarithmic: ln(Yi) = b0 + b1·ln(Xi). The slope b1 is interpreted as the relative change in Y for a relative change in X
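Each functional form can be fitted with ordinary OLS after transforming the variables. A sketch on hypothetical positive-valued data (so the logs are defined):

```python
# Sketch: fitting the three functional forms by transforming variables first.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

def ols(u, v):
    """Intercept and slope of v regressed on u by ordinary least squares."""
    b1 = np.cov(u, v, ddof=1)[0, 1] / np.var(u, ddof=1)
    return v.mean() - b1 * u.mean(), b1

print("log-lin:", ols(x, np.log(y)))          # ln(Y) = b0 + b1*X
print("lin-log:", ols(np.log(x), y))          # Y = b0 + b1*ln(X)
print("log-log:", ols(np.log(x), np.log(y)))  # ln(Y) = b0 + b1*ln(X)
```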