Exam 2 Cumulative Review Flashcards
dependent variable
-the variable being explained or measured; its value depends on the independent variable
-y-variable
-left hand/left side variable
-regressand
independent variable
-the variable that affects the dependent variable
-right hand/right side variable
-regressor
7 steps for calculating the slope on X (beta-hat) in a regression equation
- calculate E(X) and E(Y)
- calculate “actual minus expected” for each variable, for each observation
- square “actual minus expected” JUST FOR X
- DO NOT square “actual minus expected” for Y
- instead, take (actual - expected for X) x (actual - expected for Y)
- add up what you got in steps 3 and 5
- take the ratio of the two (with XY on the top)
calculating the intercept for the regression equation
take the slope you calculated…
- multiply it by your E(x)
- subtract that from your E(y)
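A minimal Python sketch of these two cards (the data values here are made up for illustration):

    import numpy as np

    # hypothetical data: X = hourly income, Y = meals eaten out per month
    x = np.array([10.0, 15.0, 20.0, 25.0, 30.0])
    y = np.array([2.0, 3.0, 3.5, 4.0, 4.5])

    # step 1: E(X) and E(Y)
    ex, ey = x.mean(), y.mean()

    # steps 2-5: "actual minus expected" for each variable;
    # square it for X only, and take the cross-product for X and Y
    dev_x = x - ex
    dev_y = y - ey

    # steps 6-7: add up each piece and take the ratio (XY sum on top)
    slope = (dev_x * dev_y).sum() / (dev_x ** 2).sum()

    # intercept: multiply the slope by E(X) and subtract from E(Y)
    intercept = ey - slope * ex

    print(slope, intercept)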
how to interpret regression equation (use following example):
Y = 1.28 + .10443X
when the value of our X variable increases by 1, the value of our Y variable increases by .104
for every $10 increase in hourly income, the number of meals eaten out in a month increases by about 1
residual of a regression
the difference between the actual Y in the data and the predicted Y from the regression (actual minus predicted)
total sum of squares
the actual variation in Y (dependent variable)
explained sum of squares
the modeled variation in Y (dependent variable)
R^2
ratio of ESS/TSS
-tells how much of the variance in the DV is explained by the IV(s)
7 steps for calculating R^2
- take the actual values of Y, get E(y)
- take (act - exp)Y; square it, add it up
- that’s the actual variation in Y
- you have already run the regression
- use the results to calculate predicted values of Y for every X
- take (Predicted - Expected)Y; square it; add it up
- take the ratio of #6 over #3
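A minimal Python sketch of the TSS/ESS/R^2 steps (data values are made up; the slope and intercept are recomputed so the snippet stands on its own):

    import numpy as np

    # hypothetical data and a fitted line (same steps as the slope/intercept cards)
    x = np.array([10.0, 15.0, 20.0, 25.0, 30.0])
    y = np.array([2.0, 3.0, 3.5, 4.0, 4.5])
    slope = (x - x.mean()).dot(y - y.mean()) / ((x - x.mean()) ** 2).sum()
    intercept = y.mean() - slope * x.mean()

    # total sum of squares: the actual variation in Y
    tss = ((y - y.mean()) ** 2).sum()

    # predicted Y for every X, then explained sum of squares: the modeled variation in Y
    y_hat = intercept + slope * x
    ess = ((y_hat - y.mean()) ** 2).sum()

    # R^2 = ESS / TSS
    r_squared = ess / tss
    print(r_squared)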
classic linear regression model assumptions
- the model is linear in its parameters and has an additive error term
- the values for the IV's are derived from a random sample of the population and contain variability
- no IV is a perfect linear function of any other IV (no perfect collinearity)
- the error term has an expected value of zero
- the error term has a constant variance (homoskedasticity)
if the error term has an expected value of zero and also has a constant variance…we can conclude what about the error term? and then what about the estimate for b-hat?
- error term has a normal distribution
- our estimate B-hat is a linear function of the error term
- therefore, B-hat is normally distributed
- this means we can test our individual estimators for significance
equation for testing (tstat) individual estimator’s significance (assuming homoskedasticity)
(Betahat - 0) / standard error of Betahat
it is minus zero because zero is the hypothesized value for beta under the null
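A quick sketch of this t-stat formula in Python (the beta-hat and standard error values are placeholders, not from any real printout):

    beta_hat = 0.10443        # estimated coefficient
    se_beta_hat = 0.03        # standard error of the estimate (hypothetical)

    # t-stat: (beta-hat minus the hypothesized value, zero) over its standard error
    t_stat = (beta_hat - 0) / se_beta_hat
    print(t_stat)             # roughly 3.48 with these placeholder numbers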
what does the 1.96 critical value mean exactly for our beta-hats?
this means I could create an interval of a certain width (1.96 standard errors above and below our estimate) and if I repeated my sampling 100 times, about 95 out of the 100 intervals would contain the true population value of beta
-so if the interval for betahats on the printout from SAS contained zero, we cannot reject the null (beta = 0)
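A sketch of the 1.96-standard-error interval and the "does it contain zero?" check (values are placeholders):

    beta_hat = 0.10443        # hypothetical estimate
    se_beta_hat = 0.03        # hypothetical standard error

    # 95% confidence interval: estimate plus/minus 1.96 standard errors
    lower = beta_hat - 1.96 * se_beta_hat
    upper = beta_hat + 1.96 * se_beta_hat

    # if zero falls inside the interval, we cannot reject the null that beta = 0
    contains_zero = (lower <= 0.0 <= upper)
    print(lower, upper, contains_zero)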
if our r^2 and SER aren't great, what else can we check to find significance in our regression model?
we can check the t-stats for each individual beta-hat…we can't conclude they are key drivers, but it does indicate a real and positive relationship
what is the standard error of beta hat?
-it is the standard deviation of the sampling distribution…it measures the spread
-remember that beta hat is a random variable, therefore it has an expected value and a distribution
-that distribution has a spread, measured by the variance and the SD
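The cards don't give a formula for this standard error; for the one-regressor case under homoskedasticity, a common textbook version is sketched below (data values are made up):

    import numpy as np

    x = np.array([10.0, 15.0, 20.0, 25.0, 30.0])
    y = np.array([2.0, 3.0, 3.5, 4.0, 4.5])

    # fit the line, then get the residuals
    slope = (x - x.mean()).dot(y - y.mean()) / ((x - x.mean()) ** 2).sum()
    intercept = y.mean() - slope * x.mean()
    resid = y - (intercept + slope * x)

    # s^2: residual variance with n - 2 degrees of freedom (one-regressor model)
    n = len(x)
    s_squared = (resid ** 2).sum() / (n - 2)

    # spread of the sampling distribution of beta-hat
    se_beta_hat = np.sqrt(s_squared / ((x - x.mean()) ** 2).sum())
    print(se_beta_hat)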
homoskedastic
variance of the error term is constant
heteroskedastic
variance of the error term changes
the smaller the standard error of our beta hat, the larger our t-stat. AND the larger our t-stat, the more likely we will reject the hypothesis that b = 0…why?
when our standard error gets small, this means the distribution is getting less and less wide
- if our distribution of beta hat is very tight, our distribution is not very spread out
- this means there’s less and less of a chance that values will vary much from our expected value…which means there is less of a chance that zero will be in that interval
f-statistic
used to test a joint hypothesis
- the hypothesis that ALL of our betas are really zero
- B1=0 AND B2=0, not B1=0 OR B2=0
formula for f-stat
F = (R^2 / k) / ((1 - R^2) / (n - k - 1))
k is the number of IV's
n is the number of observations
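A sketch of the F-stat formula with made-up numbers:

    r_squared = 0.45   # hypothetical R^2 from the regression
    k = 3              # number of IV's
    n = 120            # number of observations

    # F-stat for the joint hypothesis that all the betas are really zero
    f_stat = (r_squared / k) / ((1 - r_squared) / (n - k - 1))
    print(f_stat)      # about 31.6 with these placeholder numbers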
formula for the F-stat comparing restricted and unrestricted models
F = ((RSSrestricted - RSSunrestricted) / q) / (RSSunrestricted / (n - k - 1))
q is the number of restrictions
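A sketch of the restricted-vs-unrestricted F-stat with placeholder values:

    rss_restricted = 250.0     # residual sum of squares, restricted model (hypothetical)
    rss_unrestricted = 200.0   # residual sum of squares, unrestricted model (hypothetical)
    q = 2                      # number of restrictions being tested
    n = 120                    # number of observations
    k = 5                      # number of IV's in the unrestricted model

    f_stat = ((rss_restricted - rss_unrestricted) / q) / (rss_unrestricted / (n - k - 1))
    print(f_stat)              # about 14.3 with these placeholder numbers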
explained sum of squares (regression sum of squares)
sum of squares of the deviations of the predicted values from the mean value of a response variable, in a standard regression model
standard error of the regression (SER)
the standard deviation of the error
-large values imply that actual values are far away from our fitted line
p-value
the probability of obtaining a test statistic at least as extreme as the one that was actually observed, assuming that the null hypothesis is true
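A sketch of turning a t-stat into a two-sided p-value, using scipy's normal CDF as a large-sample approximation (the t-stat value is hypothetical):

    from scipy import stats

    t_stat = 3.48   # hypothetical t-stat

    # probability of a statistic at least this extreme in either tail,
    # assuming the null hypothesis (beta = 0) is true
    p_value = 2 * (1 - stats.norm.cdf(abs(t_stat)))
    print(p_value)  # about 0.0005 with this placeholder t-stat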
binary variable
a variable that takes one of two values, used to represent an either/or (discrete) outcome
-in STATA, you identify one outcome as 1 and the other as 0
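Outside of STATA, the same 0/1 coding can be sketched in Python (the example data is made up):

    import numpy as np

    # hypothetical either/or data: did the person eat out this week?
    ate_out = np.array(["yes", "no", "yes", "yes", "no"])

    # code one outcome as 1 and the other as 0
    ate_out_dummy = (ate_out == "yes").astype(int)
    print(ate_out_dummy)   # [1 0 1 1 0]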