Bivariate Analysis Flashcards
Bivariate relationship
Evaluates the relationship between 2 variables
Line of best fit
Distance between line and scatterplot points should be as small as possible
^y = ^B(0) + ^B(1)x
predicted y equals the estimated y-intercept plus the estimated slope times x; the actual y also includes the error term: y = B(0) + B(1)x + u
Residuals
- difference between actual and estimated values
- minimizing sum of squared residuals suggests actual and estimated are close
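A minimal sketch of the fit and the residuals, assuming NumPy; the data are made-up toy numbers, not from these cards:

```python
import numpy as np

# Hypothetical toy data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Closed-form OLS estimates:
# ^B(1) = cov(x, y) / var(x),  ^B(0) = mean(y) - ^B(1) * mean(x)
b1_hat = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0_hat = y.mean() - b1_hat * x.mean()

y_hat = b0_hat + b1_hat * x   # fitted values on the line of best fit
residuals = y - y_hat         # actual minus estimated
ssr = np.sum(residuals ** 2)  # the quantity OLS minimizes

print(b0_hat, b1_hat, ssr)
```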
u (error term)
omitted variables and other unobserved factors that impact y
Regression
At the end of the day, a regression is an estimate / prediction, not the true relationship
B(0)
estimated y-intercept, predicted value of y when x = 0
B(1)
estimated slope, predicted change in y, when x changes by 1 unit
Correlation Coefficient
measures how x and y move together / the tightness of the relationship
corr(x,y) = cov(x,y) / (sd(x) * sd(y)): covariance divided by the product of the standard deviations (not the variances)
always between -1 and 1
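A quick check of the formula with NumPy (same made-up data as above): compute r by hand, then compare to NumPy's built-in correlation:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# r = cov(x, y) / (sd(x) * sd(y)); always lands in [-1, 1]
r = np.cov(x, y, ddof=1)[0, 1] / (np.std(x, ddof=1) * np.std(y, ddof=1))

# Should agree with NumPy's built-in correlation matrix
assert np.isclose(r, np.corrcoef(x, y)[0, 1])
print(r)
```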
Derivative
taking the derivative of the sum of squared residuals with respect to ^B(0) and ^B(1), then setting each equal to 0, minimizes it (the first-order conditions; see the sketch below)
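A sketch of that minimization in standard notation (β̂ stands for the ^B's used on these cards):

```latex
% Minimize the sum of squared residuals:
\min_{\hat\beta_0,\,\hat\beta_1}\ \sum_{i=1}^{n}\left(y_i-\hat\beta_0-\hat\beta_1 x_i\right)^2

% First-order conditions: derivative w.r.t. each parameter, set to 0
-2\sum_{i}\left(y_i-\hat\beta_0-\hat\beta_1 x_i\right)=0,
\qquad
-2\sum_{i}x_i\left(y_i-\hat\beta_0-\hat\beta_1 x_i\right)=0

% Solving the two conditions gives the familiar estimators:
\hat\beta_1=\frac{\sum_i(x_i-\bar x)(y_i-\bar y)}{\sum_i(x_i-\bar x)^2}
           =\frac{\widehat{\mathrm{cov}}(x,y)}{\widehat{\mathrm{var}}(x)},
\qquad
\hat\beta_0=\bar y-\hat\beta_1\bar x
```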
Percentage Point
- an absolute change in a percentage (of a share): going from 10% to 12% is a rise of 2 percentage points
- contrast with a % change, which is relative to what you had before: 10% to 12% is a 20% increase
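A tiny worked example with hypothetical numbers, showing the two measures side by side:

```python
# A share rises from 10% to 12% (hypothetical numbers)
old, new = 0.10, 0.12

pp_change = (new - old) * 100         # 2 percentage points (absolute)
pct_change = (new - old) / old * 100  # 20% change (relative to before)

print(pp_change, pct_change)
```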
covariance(x,u)
- measures whether x is correlated with any omitted variables (u) that influence y
- cov(x,u) must = 0 or the regression is biased
- biased because we can't establish a causal effect of x on y if y also depends on another variable that moves with x
R-squared
AKA the coefficient of determination; measures whether the line through a scatter is a good fit
The percent of the variation in y that is explained by the model
The higher the better, but only valuable if we want to precisely predict y
Not systematically affected by a higher sample size, but adding new regressors (k) can never decrease R-squared
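A minimal sketch of the computation, reusing the toy fit from above (NumPy assumed):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

ssr = np.sum((y - y_hat) ** 2)     # variation the model leaves unexplained
sst = np.sum((y - y.mean()) ** 2)  # total variation in y
r_squared = 1 - ssr / sst          # share of variation in y explained
print(r_squared)
```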
Ceteris Paribus
- rule out the possibility that other factors drive the causal relationship, by holding the other factors affecting the dependent variable constant
E[^B(1)] formula
- E[^B(1)] = B(1) + cov(x,u) / var(x)
- the direction of the bias depends on the sign of cov(x,u)
- if cov(x,u) is positive and ignored, ^B(1) is biased up (overstates B(1)); if negative, biased down
- for the expected value of ^B(1) to equal B(1), cov(x,u) must be 0 (see the simulation sketch below)
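A Monte Carlo sketch of the formula, assuming NumPy; the data-generating process and every number here are made up. With cov(x,u) > 0, the average slope estimate lands near B(1) + cov(x,u)/var(x), above the true B(1):

```python
import numpy as np

rng = np.random.default_rng(0)
true_b1 = 2.0
estimates = []
for _ in range(2000):
    x = rng.normal(size=500)
    u = 0.8 * x + rng.normal(size=500)  # builds in cov(x, u) = 0.8 > 0
    y = 1.0 + true_b1 * x + u
    b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    estimates.append(b1)

# var(x) = 1, so E[^B(1)] = 2.0 + 0.8 / 1 = 2.8: biased up, not down
print(np.mean(estimates))
```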
4 Conditions for OLS to Yield an Unbiased Estimator
- random sample
- y and x relationship must be linear in parameters
- variation in x
- cov(x,u) = 0
If these four conditions are met, E[^B(1)] = B(1)
P-value Approach
Alternative to finding t-critical
Probability, if the null hypothesis is true, of obtaining ^B(1) or a more extreme slope estimate
P-value final answer
p-value > sig-level ⇔ |t-actual| < t-critical → fail to reject the null
p-value < sig-level ⇔ |t-actual| > t-critical → reject the null
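A sketch of the equivalence, assuming SciPy; the t-statistic and degrees of freedom are hypothetical:

```python
from scipy import stats

t_actual = 2.40  # hypothetical t-statistic for ^B(1)
df = 48          # hypothetical degrees of freedom (n - 2)
alpha = 0.05

# Two-sided p-value: chance of a slope estimate at least this extreme under the null
p_value = 2 * (1 - stats.t.cdf(abs(t_actual), df))
t_critical = stats.t.ppf(1 - alpha / 2, df)

# The two decision rules always give the same answer
print(p_value < alpha, abs(t_actual) > t_critical)
```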
Statistical Significance
Rejecting the null hypothesis that the slope = 0 means x has some non-zero effect on y
cov(x,u) = 0
NO correlation between x and all the other (omitted) variables that affect y, i.e., the error term u
Interpreting slope / y-int
Don't forget these are predicted / estimated values and impacts, not the true ones
Omitted Variables Bias (^B1)
if cov(x,u) > 0: ^B(1) is biased up, because ignoring the positive correlation means ^B(1) also picks up the omitted variable's effect, so E[^B(1)] is bigger than B(1) (classic example: omitting ability, which is positively correlated with schooling, overstates the return to schooling)
if cov(x,u) < 0: ^B(1) is biased down; ignoring the negative correlation means E[^B(1)] is smaller than B(1)
Linear in Parameters
In order for us to run a regression, the relationship has to be a straight line in the parameters (the B's); the variables themselves can be transformed (e.g., logged)
Variation in X
To tell the relationship between x and y, you need different x-values: the scatter cannot be a single vertical line
No variation in x would make var(x) = 0, and the slope formula cov(x,y)/var(x) would divide by zero
Homoscedastic Assumption
- var(u|x) = sigma-squared; if this condition of constant error variance holds, we can estimate the variance of ^B(1)
- needs to hold for OLS to be BLUE (best linear unbiased estimator)
- the spread of the errors is the same as x changes
var(u|x)
variance of u conditional on x
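A simulated illustration (NumPy assumed, numbers made up): homoscedastic errors keep the same spread as x changes, heteroscedastic errors do not:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(1, 10, 200)

u_homo = rng.normal(scale=2.0, size=x.size)  # var(u|x) constant: assumption holds
u_hetero = rng.normal(scale=0.5 * x)         # spread grows with x: assumption fails

print(np.std(u_homo[:50]), np.std(u_homo[-50:]))      # roughly equal
print(np.std(u_hetero[:50]), np.std(u_hetero[-50:]))  # second is much larger
```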
Multivariate Unbiasedness Conditions
- random sample
- linear in-parameters
- no perfect collinearity between the regressors
- cov(x,u) = 0
perfect collinearity only occurs if you include the same variable twice, or if some included variables are an exact linear combination of another (e.g., two variables that sum to a third included variable); see the sketch below
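A sketch of the second case, assuming NumPy: x3 is built as an exact sum of x1 and x2, so the design matrix loses a rank and the OLS coefficients are no longer identified:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + x2  # exact linear combination: perfect collinearity

# Design matrix with an intercept column
X = np.column_stack([np.ones(n), x1, x2, x3])

# Rank is 3, not 4, so X'X is singular and cannot be inverted
print(np.linalg.matrix_rank(X))
```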