general 2 Flashcards
regression is very sensitive to
rounding errors in intermediate calculations.
dependent variable mean
the regression line passes through it: the fitted OLS line always goes through the point of means (x̄, ȳ).
SSR=
SST-SSE
when is SST equal to SSE?
when we have no independent variable, i.e., an intercept-only model.
This is the case where we estimate the line using only the average of the dependent variable, so
we get a horizontal line with zero slope.
SSE is related to
the regression model: the sum of squared differences between the observed values and the predicted values.
SST is
the sum of squared differences between the observed values and the mean of the dependent variable (the case where we have just a horizontal line).
SSR is
the difference between the two lines: the horizontal line at the mean of the dependent variable (no independent variable) and the regression line (with the independent variable).
SSR is
the variation explained by the regression line; the total residual variation is SSE, not SSR.
r squared =
SSR / SST
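To make the decomposition concrete, here is a minimal NumPy sketch (toy data, made up for illustration) that computes SSE, SSR, and SST for a simple regression and checks that SST = SSR + SSE and r squared = SSR / SST:

```python
import numpy as np

# Toy data (illustrative values only).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Fit a simple linear regression; polyfit returns [slope, intercept].
b1, b0 = np.polyfit(x, y, deg=1)
y_hat = b0 + b1 * x

sse = np.sum((y - y_hat) ** 2)         # observed vs. predicted
sst = np.sum((y - y.mean()) ** 2)      # observed vs. horizontal mean line
ssr = np.sum((y_hat - y.mean()) ** 2)  # regression line vs. mean line

assert np.isclose(sst, ssr + sse)      # SST = SSR + SSE
print(f"r^2 = {ssr / sst:.3f}")
```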
the number of parameters estimated in simple regression is always
two (the slope and the intercept), so the error degrees of freedom are n - 2.
MSE =
SSE / degrees of freedom = SSE / (n - 2)
MSE, or mean squared error, is
the estimated variance of the error term; it shows how spread out the data points are around the regression line.
why do we divide by n - 2 and not just by n?
because we are dealing with a sample, not the population: dividing by n would give a simple average, while n - 2 accounts for the two estimated parameters.
the standard error is
the standard deviation of the error term: roughly the average distance that observations fall from the regression line, in the units of the dependent variable.
the standard error formula is
the square root of the MSE
Computation order
first: SSE; second: MSE; third: s, the standard error.
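A minimal sketch of this computation order in NumPy (toy data, illustrative only):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
n = len(x)

b1, b0 = np.polyfit(x, y, deg=1)
residuals = y - (b0 + b1 * x)

sse = np.sum(residuals ** 2)  # first: sum of squared errors
mse = sse / (n - 2)           # second: divide by n - 2 (two estimated parameters)
s = np.sqrt(mse)              # third: standard error of the estimate
print(f"SSE={sse:.4f}, MSE={mse:.4f}, s={s:.4f}")
```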
when the estimated beta is significantly different from zero,
the null hypothesis H0: beta = 0 will be rejected, meaning there is a significant linear relationship.
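As a hedged sketch of how this test is typically run, scipy.stats.linregress reports the slope, its standard error, and the two-sided p-value for H0: slope = 0 (the data below is made up):

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8, 12.1])

res = stats.linregress(x, y)
t_stat = res.slope / res.stderr  # t = b1 / SE(b1)
print(f"b1={res.slope:.3f}, t={t_stat:.2f}, p={res.pvalue:.4f}")
# A small p-value means we reject H0: beta1 = 0, i.e. the slope is significant.
```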
adding more independent variables will lead to
overfitting and other problems, so it is not always a good procedure. One of these problems is multicollinearity, which means the independent variables are correlated with each other.
when two independent variables are correlated with each other,
we cannot be sure which of them explains the variation in the dependent variable.
regression types
the multiple regression model, the multiple regression equation, and the estimated multiple regression equation.
in multiple regression, each coefficient is interpreted as
an estimate of the change in y corresponding to a one-unit change in one independent variable while all other variables are held constant.
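A minimal sketch of this interpretation using NumPy least squares (the true coefficients 2.0 and -1.5 below are assumptions for the demo):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 3.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(scale=0.5, size=n)

X = np.column_stack([np.ones(n), x1, x2])     # intercept column plus predictors
beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # returns [b0, b1, b2]
print(f"b0={beta[0]:.2f}, b1={beta[1]:.2f}, b2={beta[2]:.2f}")
# b1 estimates the change in y for a one-unit change in x1
# holding x2 constant (it should come out near 2.0 here).
```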
the primary purpose of log in regression:
it is a way to rescale skewed data, bringing its distribution closer to the shape of a normal distribution.
price = 1079 ln(x), interpret:
a 1 percent increase in x leads to about a 1079/100 = 10.79 unit increase in price (see the numeric check after the next card).
ln(price) = 0.197 ln(x), interpret:
a 1 percent increase in x leads to a 0.197 percent increase in price. When both sides have ln, the coefficient is an elasticity: no dividing or multiplying by 100.
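A quick numeric check of both interpretations, reusing the coefficients from the two cards above (the starting value of x is arbitrary):

```python
import numpy as np

# Level-log: price = b0 + 1079 * ln(x).
b = 1079
x = 50.0
delta = b * np.log(1.01 * x) - b * np.log(x)  # effect of a 1% increase in x
print(f"level-log: +1% in x -> price up by {delta:.2f} (~ {b / 100:.2f} units)")

# Log-log: ln(price) = b0 + 0.197 * ln(x).
b = 0.197
pct = (1.01 ** b - 1) * 100  # exact percent change in price for +1% in x
print(f"log-log: +1% in x -> price up by {pct:.3f}% (~ {b:.3f}%)")
```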
polynomial regression adds
extra independent variables that are powers of the original variable.
quadratic model:
it adds the square of the independent variable, which allows the model to capture curvature seen in the original scatter plot.
nonlinear quadratic model:
it explains more variance, gives a tighter fit of the observations around the regression line, and reduces the model error.
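A minimal sketch contrasting the linear and quadratic fits on curved toy data (the true curve here is an assumption for the demo):

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 10, 60)
y = 5 + 2 * x - 0.3 * x**2 + rng.normal(scale=1.0, size=x.size)

for deg in (1, 2):  # straight line vs. quadratic
    y_hat = np.polyval(np.polyfit(x, y, deg=deg), x)
    sse = np.sum((y - y_hat) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    print(f"degree {deg}: r^2 = {1 - sse / sst:.3f}")
# The quadratic model explains more variance (higher r^2) because
# the extra x^2 term captures the curvature in the data.
```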
Thus, the change in y is simply
b1 multiplied by the change in x. This means that b1 is the slope parameter in the relationship between y and x, holding the other factors in u fixed.
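For example, if b1 = 2 and x increases by 3 units (with the factors in u held fixed), then the change in y is b1 times the change in x: 2 × 3 = 6.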
How can we hope to learn in general about the ceteris paribus effect of x on y, holding other factors fixed, when we are ignoring all those other factors?
As long as the intercept b0 is included in the equation, nothing is lost by assuming that the average value of u in the population is zero.