Final Exam Review Flashcards
steps to calculating a basic regression equation
- (1) calculate E(x) and E(y)
- (2) calculate actual minus expected for each variable, for each observation
- (3) square "actual minus expected" for X only
- (4) for each observation, take (actual(x) - expected(x))*(actual(y) - expected(y))
- (5) take the sums of (3) and (4)
- (6) take the ratio of these sums with the XY sum (4) on top
this is your slope estimator
intercept = E(y) - E(x)*slope
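The steps above can be sketched in Python; the square footage and price numbers here are invented for illustration, not from the card:

```python
import numpy as np

# hypothetical data: square footage (x) and selling price (y, in $1000s)
x = np.array([1000.0, 1500.0, 2000.0, 2500.0, 3000.0])
y = np.array([180.0, 195.0, 205.0, 215.0, 235.0])

# step 1: calculate E(x) and E(y) (the sample means)
x_bar, y_bar = x.mean(), y.mean()

# step 2: actual minus expected for each variable, each observation
dx, dy = x - x_bar, y - y_bar

# steps 3-5: square the X deviations, cross-multiply, and sum each
sxx = np.sum(dx ** 2)   # sum of squared X deviations
sxy = np.sum(dx * dy)   # sum of XY cross products

# step 6: ratio with the XY sum on top -> the slope estimator
slope = sxy / sxx                    # 0.026 for this data
intercept = y_bar - x_bar * slope    # 154.0 for this data
```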
interpret:
selling price = 150 + .027(square footage)
for every increase of 1 square foot, we can expect to see the selling price of the house increase by .027
Total sum of squares
total actual variation in Y
sum of (Actual Y - expected Y)^2
Explained sum of squares
the modeled variation in Y
sum of (estimated y - expected Y)^2
steps for an R^2
- (1) take the actual values of Y, get E(y)
- (2) take (ACTy - EXPy), square it, add it up
- (3) that's the actual variation in Y (TSS)
- (4) use the results of the regression to calculate predicted values of Y for every X
- (5) take (PREDICTEDy - EXPECTEDy); square it; add it up (ESS)
- (6) take the ratio of (5) over (3) (ESS/TSS)
r^2
how much of the variance in our data is explained by the regression model
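The R^2 steps can be checked with a short sketch (invented data); in the single-regressor case the ratio should match the squared correlation:

```python
import numpy as np

# hypothetical data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# fit the line, then get predicted values of Y for every X
slope, intercept = np.polyfit(x, y, 1)
y_hat = intercept + slope * x

tss = np.sum((y - y.mean()) ** 2)       # total actual variation in Y
ess = np.sum((y_hat - y.mean()) ** 2)   # modeled (explained) variation
r_squared = ess / tss                   # ESS/TSS
```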
Standard Error of Observations
the standard deviation of the error term
-how far are the actual values from our fitted line
What does a t-stat of 1.96 actually mean?
this means you could create an interval of a certain width (1.96 SE above and below our hypothesized mean) and if you repeated your sampling 100 times, 95/100 times that interval would contain the true population mean
standard error of the beta hat
remember beta hat (the estimator) is a random variable
- therefore, it has an expected value and distribution
- that distribution, the sampling distribution, has a spread, measured by the variance and the SD
- the standard error of beta hat is the SD of the sampling distribution…it measures the spread
heteroskedastic
variance of the error term changes
homoskedastic
variance of the error term is constant
why does a smaller standard error of the beta hat usually correlate with a larger t stat?
because as SE decreases, the width of the distribution gets smaller and smaller…decreasing our chances of having an insignificant value…spread is TIGHT
F-statistic
-used to test a joint hypothesis that ALL of our betas are really zero
formula for an f-statistic
(R^2/k) / ((1-R^2)/(n-k-1))
k is # of IVs
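A quick numeric sketch of that formula (the R^2, n, and k values are invented):

```python
# hypothetical regression summary: R^2 = .65, n = 100 observations, k = 3 IVs
r2, n, k = 0.65, 100, 3

# F = (R^2/k) / ((1 - R^2)/(n - k - 1))
f_stat = (r2 / k) / ((1 - r2) / (n - k - 1))
```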
whats the formula for testing a subset of variables?
(RSSrestricted - RSSunrestricted)/q
______________________________
RSSunrestricted/(n-k-1)
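The subset test can be sketched the same way (the RSS values, q, n, and k are made up):

```python
# hypothetical RSS from a restricted and an unrestricted model
rss_r, rss_u = 250.0, 210.0
q = 2             # number of coefficients restricted to zero
n, k = 100, 5     # observations and IVs in the unrestricted model

f_stat = ((rss_r - rss_u) / q) / (rss_u / (n - k - 1))
```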
after running a simple p and q demanded regression, how do we find the elasticity of a product at a certain point?
ex: q = 1147 - 7.93*p, p = 70
(p/q)*coefficient = elasticity at that point
7.93*70 ≈ 555
1147 - 555 = 592
elasticity = -7.93*(70/592) ≈ -.94
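The card's arithmetic, as a sketch:

```python
# regression from the card: q = 1147 - 7.93*p, evaluated at p = 70
intercept, coef, p = 1147.0, -7.93, 70.0

q = intercept + coef * p        # predicted quantity at p = 70 (about 592)
elasticity = coef * (p / q)     # about -.94 at this point
```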
what’s the most basic way to estimate a curvilinear regression?
convert the data to natural logs, dummy
how do we convert an equation BACK from a log?
- take the exponential of your intercept term
- take your coefficient (elasticity) and raise your P to it
- multiply these things
ex: Q demanded = 23,156*p^(-.873)
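A sketch of the back-conversion; the log-scale intercept of 10.05 is an assumption chosen so that its exponential comes out near the card's 23,156:

```python
import math

# hypothetical log-log fit: ln(Q) = 10.05 - .873*ln(P)
ln_intercept, elasticity = 10.05, -0.873

a = math.exp(ln_intercept)      # step 1: exponentiate the intercept (~23,156)
p = 70.0
q_hat = a * p ** elasticity     # steps 2-3: raise P to the elasticity, multiply
```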
log-log (def and interpretation)
- take the log of both the variables
- changes interpretation to percentages
ex: a certain % change in X is associated with a certain % change in Y
log-linear (def and interpretations)
- convert the DV to logs
- leave IVs just as they are
- changes to the DV are interpreted as percentages, NOT units
interpret: Ln(selling price) = 6.21 - .08*(age)
log linear, so:
a one year increase in the age of the house translates to roughly an 8% decrease in the selling price of the house
can you compare the R^2 of a linear regression and a log linear regression?
NO, the DV's are two very different things
linear-log (def and interpretation)
convert the IVs to logs
-changes in the IV are interpreted as percentages, not units
interpret: # of dining out experiences = -37.9 + 11.33*(LnIncome)
a 1% increase in income translates to a .11 increase in dining out experiences per month
-OR a 9% increase in income translates to a 1 time increase in dining out per month
panel data
combination of cross sectional AND time series
-data varies across entities and across time
fixed effects regression (time and entity)
- holding constant the effects of different locations or places or times
- done using entity- and time-specific binary variables and dummy coding
how to test for significance of time fixed effects and entity fixed effects
(RSSrestricted - RSSunrestricted)/q
______________________________
RSSunrestricted/(n-k-1)
how do we write in fixed effects on STATA?
xi: regress y x1 x2 x3 i.x4 i.x5
- Stata will then create dummy variables for all but one of the state or time categories
if you ran a regression with a binary variable and your coefficient is -.00377, how do you interpret this?
in states and years where the legal drinking age was 21, the data shows that seat belt usage was, on average, .37% lower than in states and years where the legal drinking age was not 21
probit
in a probit model, the model plays the role of z in a cumulative standard normal distribution
-estimated beta is the change in the z value associated with a unit change in X
it spits out the chance that something is or isn’t occurring
how does a probit work?
when Stata runs a probit, it is choosing values for the parameters that maximize a function
-this function is called a likelihood function
the likelihood we speak of is the likelihood that we observe the given Y values based on our given X values
time series
used for forecasting; data varies across time periods…not markets or people
static or contemporaneous time series variables
an IV that is a statistic from the same time…such as Real GDP and Unemployment for each respective year
dynamic time series variable
a statistic from the previous year…ex: REAL GDP and GDP from previous year
pth-order Autoregressive Model (AR(p))
uses Y(t-1) to forecast Y(t)
- there is no rule that says you can only use one lag
- you can run a regression where the IVs are Y(t-1), Y(t-2), Y(t-3)…and so on
how do you decide on how many lags to include?
Bayes Information Criterion Test
-need RSS, total number of time periods, number of lags
BIC method
- take ln(RSS/periods)
- take (number of lags + 1)*(ln(periods)/periods)
- add the two numbers together
- the number of lags with the smallest BIC value is the one you use!
why do we want BIC to be the smallest?
- the BIC has two parts, one part is a function of RSS
- we know this gets smaller as you add more variables
- the second part is a function of lags
- if you add a lag and the additional lag doesn't do much to reduce RSS, the first part gets only a little smaller while the second part gets bigger…so overall BIC gets bigger
- but if you add a lag and it does a lot to make RSS smaller, then the first part gets a lot smaller, the second part gets a little bigger but the overall bic went down
- lower is better; the benefit you get from adding another lag outweighs the addition of the lag
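The lag-selection recipe above can be sketched with invented RSS values for AR(1) through AR(4) fits:

```python
import math

T = 100                                                   # number of time periods
rss_by_lags = {1: 520.0, 2: 480.0, 3: 478.0, 4: 477.5}    # hypothetical RSS values

def bic(rss, p, T):
    # ln(RSS/periods) plus the (lags + 1)*ln(periods)/periods penalty
    return math.log(rss / T) + (p + 1) * (math.log(T) / T)

bics = {p: bic(rss, p, T) for p, rss in rss_by_lags.items()}
best_p = min(bics, key=bics.get)   # smallest BIC wins
```

Here going from one lag to two cuts RSS enough to beat the penalty, but lags 3 and 4 do not, so two lags win.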
autoregressive distributed lag model
adding an additional variable
- ex: past changes in GDP PLUS, say, unemployment
- written like: ADL(p,q)
BIC for an ADL(p,q) model
ln(RSS(k)/T) + k*(ln(T)/T)
where k is the total number of coefficients across all RHS variables
granger causality
- a test to see whether an entire series of a RHS variable has useful predictive content
- tests whether the estimated coefficients are all distinguishable from zero
granger causality formula
(RSSrestricted - RSSunrestricted)/q
______________________________
RSSunrestricted/(n-k-1)
(the same F-test used for any subset of variables)
how to interpret a LOGIT
e^(result)/(1+e^result)
result is a z-score that you can use to find out the likelihood of something occurring
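A minimal sketch of turning a logit "result" into a probability (the index value of .7 is made up):

```python
import math

z = 0.7                                  # hypothetical logit index (the "result")
prob = math.exp(z) / (1 + math.exp(z))   # about .668 for this index
```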
what are probits and logits interpreted as?
model is interpreted as probabilities that something occurred!
- used for prediction/identification: based on certain factors, what is the probability that a person/observation falls into this group or that
ex: the higher a person’s income, the greater likelihood he/she owns a car and drives themselves to work
what does your beta coefficient represent in a probit regression?
- estimated beta is the change in the z-value associated with a unit change in x
- if b is positive, a higher level of X is associated with a higher level of the z-value, therefore associated with a higher probability that y = 1
- if b is negative, a higher level of X is associated with a lower level of the z-value, and associated with a lower probability that y = 1
does STATA use a linear model when coming up with a regression equation for the data?
NO, it uses a maximum likelihood estimation
-likelihood that we observe the given Y values based on our given X values
the impact of a change in x on the z score is linear, but what about the impact on the probability?
IT IS NOT LINEAR.
-marginal effects will tell us the average, but this is misleading
why are the marginal effects of a probit misleading?
the average change in probability if a variable changed by x units is your MFX…however, this is not representative of the whole group
-the increase in probability is drastically affected by where along the curve the change starts
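That nonlinearity can be seen with the standard normal CDF built from Python's math.erf (the beta and the starting z-values are invented):

```python
import math

def phi(z):
    # standard normal cumulative distribution function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

beta = 0.5  # hypothetical probit coefficient: a one-unit change in X moves z by .5

# the same one-unit change in X, starting from two different z-values
near_middle = phi(0.0 + beta) - phi(0.0)   # big jump near the middle (~.19)
in_tail = phi(2.5 + beta) - phi(2.5)       # tiny jump out in the tail (~.005)
```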
what two different functions do a probit and a logit use?
probit: standard normal cumulative distribution function
logit: logistic cumulative distribution function
do probits and logits have R^2?
they have pseudo R^2
-no sums of squares, can’t have R^2