1.7 Linear Regression Flashcards
the dependent variable or the explained variable
refers to the variable whose variation is being explained. It is typically denoted by Y
the independent variable or the explanatory variable
refers to the variable whose variation is being used to explain the variation of the dependent variable
Denoted by X
If there is only one independent variable, then the regression is known as
Simple Linear Regression
If there are two or more independent variables, the regression is known as
multiple regression
The four assumptions underlying the simple linear regression model are:
- Linearity
- Homoskedasticity
- Independence
- Normality
Linearity
The relationship between the dependent variable and the independent variable is linear
Homoskedasticity
The variance of the residuals is constant for all observations
Independence
The pairs (X, Y) are independent of each other. This implies the residuals are uncorrelated across observations
Normality
The residuals (error terms) are normally distributed
Before running a regression model, an analyst states two of the underlying assumptions for linear regression analysis:
Assumption 1: The variance of the error term has an expected value of zero
Assumption 2: The independent variable is not random
What is the most accurate assessment of the analyst’s description of the assumptions that must be satisfied to draw valid conclusions from a simple linear regression model?
A
Assumption 1 and Assumption 2 are both correct
B
Assumption 1 is correct and Assumption 2 is incorrect
C
Assumption 1 is incorrect and Assumption 2 is correct
Answer: C
Assumption 1 is incorrect and Assumption 2 is correct. The error term is assumed to have an expected value of zero and a constant variance; it is the mean of the error term, not its variance, that is zero.
The sum of squares total (SST), which is the total variation in Y, can be broken into two components:
- Sum of squares error (SSE), which is the unexplained variation in Y
- Sum of squares regression (SSR), which is the explained variation in Y
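As a concrete check, here is a minimal Python sketch (the five-point sample is invented for illustration) that fits a line by ordinary least squares and verifies the decomposition SST = SSR + SSE:

```python
# Hypothetical five-point sample; fit Y = b0 + b1*X by ordinary least squares,
# then verify the decomposition SST = SSR + SSE.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n

# OLS estimates via the usual closed-form expressions
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / \
     sum((x - x_bar) ** 2 for x in xs)
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * x for x in xs]                     # estimated values of Y

sst = sum((y - y_bar) ** 2 for y in ys)               # total variation in Y
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)          # explained variation
sse = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))  # unexplained variation

print(round(sst, 6), round(ssr, 6), round(sse, 6))    # 10.0 6.4 3.6
```

For this data, SST (10.0) splits exactly into SSR (6.4) plus SSE (3.6).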
Measures of Goodness of Fit
Measures to evaluate how well the regression model fits the data include:
- The coefficient of determination
- The F-statistic for the test of fit
- The standard error of the regression
The coefficient of determination (a.k.a. R-squared or R^2)
measures the fraction of the total variation in the dependent variable that is explained by the independent variable
F-Statistic
Used to test whether the regression model is statistically meaningful, i.e., whether the slope coefficient differs significantly from zero
The objective of linear regression
to explain the variation of the dependent variable Y; the total variation in Y is known as the sum of squares total (SST), or the total sum of squares
how to find the variation of Y, also known as the sum of squares total (SST)
Σ(Yi − Ȳ)²
how to find the variation of X
Σ(Xi − X̄)²
The regression equation
expresses the linear relationship between X and Y
Y = b0 + b1 * X + e
b0 = the intercept
b1 = the slope coefficient of the regression line.
e = the error term, which represents the difference between the observed value of Y and its expected value from the true underlying population relationship
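A small sketch of the equation in code (sample data invented for illustration): the OLS estimates of b0 and b1, and the property that the fitted residuals sum to zero:

```python
# Hypothetical sample; estimate b0 (intercept) and b1 (slope) in Y = b0 + b1*X + e.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n

# Closed-form OLS: b1 = cov(X, Y) / var(X); b0 makes the line pass through (x_bar, y_bar)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / \
     sum((x - x_bar) ** 2 for x in xs)
b0 = y_bar - b1 * x_bar

# Residuals: observed Y minus the value predicted by the fitted line
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
print(round(b0, 6), round(b1, 6))   # 0.6 0.8
print(sum(residuals))               # effectively zero: OLS residuals sum to zero
```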
The line plotted by the regression equation represents
the average relationship between the dependent variable and the independent variable
The difference between the observed and estimated values of the dependent variable
the residual
Observed value of Y - estimated value of Y
coefficient of determination (R^2) formula
SSR / SST
In a simple linear regression with only one independent variable, the coefficient of determination is equal to
the square of the correlation between X and Y.
R^2 is a descriptive measure, not a statistical test.
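A quick numerical check (on an invented sample) that R² = SSR/SST and, in simple regression, equals the squared correlation between X and Y:

```python
# Hypothetical sample; compute R^2 two ways: SSR/SST and corr(X, Y)^2.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / \
     sum((x - x_bar) ** 2 for x in xs)
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * x for x in xs]

sst = sum((y - y_bar) ** 2 for y in ys)
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
r_squared = ssr / sst

# Pearson correlation between X and Y
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)
corr = sxy / (sxx * syy) ** 0.5

print(round(r_squared, 6), round(corr ** 2, 6))  # 0.64 0.64
```

Here corr(X, Y) = 0.8, so both routes give R² = 0.64.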
the F test statistic for a simple linear regression model is:
F = (SSR/1) / (SSE/(n − 2)) = MSR/MSE, with 1 and n − 2 degrees of freedom
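Computed on an invented sample, the F-statistic is simply the mean square regression divided by the mean square error:

```python
# Hypothetical sample; F = (SSR/1) / (SSE/(n-2)) for a simple linear regression.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / \
     sum((x - x_bar) ** 2 for x in xs)
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * x for x in xs]

ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
sse = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))

msr = ssr / 1          # one slope coefficient -> 1 df in the numerator
mse = sse / (n - 2)    # n - 2 df: intercept and slope are both estimated
f_stat = msr / mse
print(round(f_stat, 4))  # 5.3333
```

A larger F relative to its critical value indicates a model that explains a significant share of the variation in Y.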
The standard error of the estimate (se)
the square root of MSE
se = sqrt(MSE)
also known as the standard error of the regression or the root mean square error.
A smaller value indicates a more accurate regression.
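On an invented sample, the standard error of the estimate follows directly from the MSE:

```python
# Hypothetical sample; se = sqrt(MSE) = sqrt(SSE / (n - 2)).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / \
     sum((x - x_bar) ** 2 for x in xs)
b0 = y_bar - b1 * x_bar

sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
mse = sse / (n - 2)
se = mse ** 0.5
print(round(se, 4))  # 1.0954: typical size of a residual, in units of Y
```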
The estimated variance of a regression model’s prediction error is determined by several factors:
The squared standard error of the estimate, S^2e
The number of observations, n
The value of the independent variable X relative to X_
The variance of the independent variable, S^2x
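These factors combine in the standard prediction-interval formula; a sketch on an invented sample (the forecast point x_f = 6 is also invented):

```python
# Hypothetical sample; estimated variance of the prediction error at a new X value:
#   s_f^2 = s_e^2 * (1 + 1/n + (x_f - x_bar)^2 / sum((x_i - x_bar)^2))
# Note sum((x_i - x_bar)^2) = (n - 1) * s_x^2, the variance of X from the card above.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / \
     sum((x - x_bar) ** 2 for x in xs)
b0 = y_bar - b1 * x_bar

sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
s2_e = sse / (n - 2)                       # squared standard error of estimate
sxx = sum((x - x_bar) ** 2 for x in xs)    # = (n - 1) * sample variance of X

x_f = 6.0                                  # hypothetical forecast point
s2_f = s2_e * (1 + 1 / n + (x_f - x_bar) ** 2 / sxx)
print(round(s2_f, 4))  # 2.52: grows as x_f moves away from x_bar
```

The farther the forecast point lies from the mean of X, the wider the prediction interval.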