Book 1_Quan_Simple Linear Regression Flashcards
Linear regression definition
provides an estimate of the linear relationship between an
independent variable (the explanatory variable) and a dependent variable (the
predicted variable)
The general form of a simple linear regression model
- Yi = b0 + b1 Xi + ei
+ b1 = fitted slope coefficient = Cov(X,Y)/Var(X) (computed in the sketch after this list)
+ b0 = fitted intercept = Ȳ − b1 X̄
+ Dependent variable: Y
+ Independent variable: X
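To make the slope and intercept formulas concrete, here is a minimal numpy sketch; the data values and variable names are illustrative assumptions, not from the flashcards.

```python
import numpy as np

# Illustrative data (assumed, not from the flashcards)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # independent variable X
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])   # dependent variable Y

# b1 = Cov(X, Y) / Var(X); np.cov returns the 2x2 covariance matrix,
# and the ddof=1 factors cancel in the ratio
b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

# b0 = Ybar - b1 * Xbar
b0 = y.mean() - b1 * x.mean()

print(f"fitted line: Yhat = {b0:.4f} + {b1:.4f} X")
```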
The estimated intercept, b0
represents the predicted value of the dependent variable when the independent
variable equals zero; graphically, it is the point where the regression line
intersects the axis of the dependent variable (usually, the vertical axis)
The estimated slope coefficient, b1
is interpreted as the
change in the dependent variable for a one-unit change in the independent variable.
Assumptions made regarding simple linear regression include the following:
- A linear relationship exists between the dependent and the independent variable.
- The variance of the residual term is constant (homoskedasticity).
- The residual term is independently distributed (residuals are uncorrelated).
- The residual term is normally distributed.
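These assumptions can be checked against the fitted residuals. The sketch below reuses the illustrative data from the earlier example and applies scipy's Shapiro-Wilk test as one possible normality check; the data and the choice of test are assumptions, not from the flashcards.

```python
import numpy as np
from scipy import stats

# Same illustrative data and fit as in the earlier sketch
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()

resid = y - (b0 + b1 * x)              # residual ei = Yi - Yhat_i

# Normality: Shapiro-Wilk tests H0 that the residuals are normal
stat, p = stats.shapiro(resid)
print(f"Shapiro-Wilk p-value: {p:.3f}")

# Homoskedasticity and independence are usually checked informally,
# e.g., by plotting resid against x and looking for fanning or trends
```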
Linear Relationship
A linear regression model is not appropriate when the underlying relationship
between X and Y is nonlinear
Homoskedasticity
refers to the case where prediction errors all have the same variance
Normality
When the residuals (prediction errors) are normally distributed, we can conduct hypothesis tests to evaluate the goodness of fit of the model
Outliers
observations (one or a few) that are far from the regression line, i.e., that have
large prediction errors, or that have X values far from the others
Analysis of variance (ANOVA)
a statistical procedure for analyzing the total
variability of the dependent variable.
The total sum of squares (SST)
measures the total variation in the dependent
variable
SST = Σ(Yi − Ȳ)^2, where Ȳ is the mean of Y
The mean square regression (MSR)
is the SSR divided by the number of independent variables, k
MSR = SSR/k (for simple linear regression, k = 1, so MSR = SSR)
The sum of squares regression (SSR)
measures the variation in the dependent variable that is explained by the independent variable
SSR = Σ(Ŷi − Ȳ)^2, where Ŷi is the predicted value of Yi
The sum of squared errors (SSE)
measures the unexplained variation in the
dependent variable
SSE = Σ(Yi − Ŷi)^2
The mean squared error (MSE)
is the SSE divided by the degrees of freedom,
which is n − 1 minus the number of independent variables
MSE = SSE/(n − k − 1); for simple linear regression, n − k − 1 = n − 2
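Putting the ANOVA pieces together, a minimal sketch (same illustrative data as above, with k = 1 for simple regression) computes SST, SSR, SSE, MSR, and MSE and verifies the decomposition SST = SSR + SSE.

```python
import numpy as np

# Same illustrative data as above; k = 1 independent variable
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n, k = len(y), 1

b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x                    # predicted values Yhat_i

sst = np.sum((y - y.mean()) ** 2)      # total variation
ssr = np.sum((y_hat - y.mean()) ** 2)  # explained variation
sse = np.sum((y - y_hat) ** 2)         # unexplained variation

msr = ssr / k                          # mean square regression
mse = sse / (n - k - 1)                # mean squared error

assert np.isclose(sst, ssr + sse)      # SST = SSR + SSE for an OLS fit
print(f"SST={sst:.3f} SSR={ssr:.3f} SSE={sse:.3f} MSR={msr:.3f} MSE={mse:.3f}")
```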