Bi-variate Analysis Flashcards
Bi-variate relationship
Evaluate relationship between 2 variables
Line of best fit
Distance between line and scatterplot points should be as small as possible
y = B(0) + B(1)x + u (model); ^y = ^B(0) + ^B(1)x (fitted line)
y equals the y-intercept plus the slope times x, plus the error (u); the fitted value ^y uses the estimated intercept and slope and has no error term
Residuals
- difference between actual and estimated values
- minimizing the sum of squared residuals picks the line whose estimated values are as close as possible to the actual values
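
As a rough illustration of the line-of-best-fit and residual cards, the sketch below fits a one-regressor line with the standard closed-form OLS formulas and computes the residuals. The data values and the names `b0_hat`, `b1_hat`, and `ssr` are made up for this example and do not come from the cards.

```python
import numpy as np

# Hypothetical illustrative data (not from the flashcards).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])

# Closed-form OLS estimates for a single regressor.
b1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0_hat = y.mean() - b1_hat * x.mean()

# Fitted values, residuals (actual minus estimated), and their squared sum.
y_hat = b0_hat + b1_hat * x
residuals = y - y_hat
ssr = np.sum(residuals ** 2)

print("slope:", b1_hat, "intercept:", b0_hat, "SSR:", ssr)
```

Any other choice of intercept and slope would give a sum of squared residuals at least as large as `ssr`.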
U
error term; captures omitted variables that impact y
Regression
At the end of the day this is an estimate / prediction
B(0)
estimated y-intercept, predicted value of y when x = 0
B(1)
estimated slope, predicted change in y, when x changes by 1 unit
Correlation Coefficient
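measures how two variables move together / tightness of the linear relationship
covariance(x,y) / (standard deviation(x) * standard deviation(y)), i.e. covariance(x,y) / sqrt(variance(x) * variance(y))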
always between -1 and 1
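
A minimal sketch of the correlation coefficient, assuming made-up data: it divides the sample covariance by the product of the two sample standard deviations and can be compared against NumPy's built-in `np.corrcoef`.

```python
import numpy as np

# Hypothetical illustrative data (not from the flashcards).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])

# Sample covariance of x and y (off-diagonal entry of the covariance matrix).
cov_xy = np.cov(x, y, ddof=1)[0, 1]

# Correlation = covariance scaled by both standard deviations.
r = cov_xy / (np.std(x, ddof=1) * np.std(y, ddof=1))

print("correlation:", r)
print("np.corrcoef:", np.corrcoef(x, y)[0, 1])
```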
Derivative
taking the derivative of the sum of squared residuals with respect to each coefficient and setting it equal to 0 gives the estimates that minimize the sum of squared residuals
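
The derivative step can be written out as follows. This is the standard textbook derivation for one regressor, with S, b_0, and b_1 used here as working symbols that are not in the cards.

```latex
S(b_0, b_1) = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2

\frac{\partial S}{\partial b_0} = -2 \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i) = 0
\qquad
\frac{\partial S}{\partial b_1} = -2 \sum_{i=1}^{n} x_i (y_i - b_0 - b_1 x_i) = 0

\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x},
\qquad
\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
```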
Percentage Point
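- the arithmetic difference between two percentages, i.e. the change of a share (10% to 12% is 2 percentage points)
- not the same as a % change, which is relative to what you had before (10% to 12% is a 20% change)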
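
A small arithmetic sketch of the distinction, using a hypothetical share that moves from 10% to 12%; the variable names are chosen for this example only.

```python
# Hypothetical shares, expressed as fractions.
old_share = 0.10
new_share = 0.12

# Percentage-point change: simple difference between the two shares.
pp_change = (new_share - old_share) * 100

# Percent change: the difference relative to the starting share.
pct_change = (new_share - old_share) / old_share * 100

print(f"{pp_change:.1f} percentage points, {pct_change:.1f}% change")
```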
covariance(x,u)
- Measures whether x is correlated with any omitted variables that influence y
- cov(x,u) must = 0 or the regression is biased
- biased because we can't establish a causal effect of x on y if x has a relationship with another variable that also affects y
R-squared
AKA coefficient of determination, measures if line through a scatter is a good fit
The percent of the variation in y that is explained by the model
The higher the better, but only valuable if we want to precisely predict y
Does not systematically rise with a larger sample size, but adding new regressors (k) never lowers R-squared and usually raises it
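
The sketch below computes R-squared as 1 minus (residual sum of squares / total sum of squares) on simulated data, then adds a regressor that is pure noise to show that R-squared does not fall. The helper `r_squared`, the simulated data, and the seed are assumptions made for this illustration.

```python
import numpy as np

def r_squared(X, y):
    """R-squared from an OLS fit of y on X (X must include a constant column)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    ss_res = np.sum(resid ** 2)            # unexplained variation
    ss_tot = np.sum((y - y.mean()) ** 2)   # total variation in y
    return 1.0 - ss_res / ss_tot

# Simulated data (hypothetical): y depends on x only.
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
noise_regressor = rng.normal(size=n)  # unrelated to y by construction
y = 1.0 + 2.0 * x + rng.normal(size=n)

const = np.ones(n)
r2_one = r_squared(np.column_stack([const, x]), y)
r2_two = r_squared(np.column_stack([const, x, noise_regressor]), y)

print("R-squared with x only:", r2_one)
print("R-squared with x and a noise regressor:", r2_two)
```

Because the extra regressor can only help the in-sample fit, `r2_two` is never below `r2_one`, even though the added variable carries no real information about y.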
Ceteris Paribus
- Rule out the possibility that other factors are driving the causal relationship, by holding the other factors that affect the dependent variable constant
E[^B(1)] formula
- B(1) + covariance(x,u) / variance(x)
- the direction of the bias depends on the sign of cov(x,u)
- if it is positive and ignored, then ^B(1) is biased upward; if it is negative, ^B(1) is biased downward
- for the expected value of B1-hat to equal B1, the cov(x,u) must be 0
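
A simulation sketch of this formula, under assumed values: a true slope of 2, an omitted variable `z` that moves with x and also affects y, and a fixed random seed. All names and numbers are hypothetical. Since `z` is left in the error term, cov(x,u) is positive and the estimated slope lands above the true slope.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000
beta0, beta1 = 1.0, 2.0  # assumed true intercept and slope

z = rng.normal(size=n)            # omitted variable
x = 0.5 * z + rng.normal(size=n)  # x is correlated with z
u = 1.0 * z + rng.normal(size=n)  # z sits in the error term, so cov(x,u) > 0
y = beta0 + beta1 * x + u

def slope(x, y):
    """OLS slope for one regressor: cov(x,y) / var(x)."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

b1_hat = slope(x, y)

# The slope implied by the formula: true slope plus cov(x,u) / var(x).
predicted = beta1 + np.cov(x, u)[0, 1] / np.var(x, ddof=1)

print("estimated slope:", b1_hat)
print("true slope + cov(x,u)/var(x):", predicted)
```

With these assumed values, cov(x,u) is about 0.5 and var(x) about 1.25, so the estimate should sit near 2.4 rather than 2.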