Week 2 - Intro to Univariate Regression Flashcards
What is covariance?
The covariance between two variables, x and y, is defined as the
average of the product of each variable's deviation from its mean:
Cov(x, y) = (1/n) Σ (xᵢ - x̄)(yᵢ - ȳ).
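A minimal Python sketch of the definition above, using small hypothetical data. It divides by n to match "average"; note that many tools (for example Python's `statistics.covariance`) divide by n - 1 for the sample covariance instead.

```python
def covariance(x, y):
    """Covariance as the average product of deviations from the means (divides by n)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / n

x = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical data
y = [2.0, 4.0, 5.0, 4.0, 5.0]
print(covariance(x, y))  # 1.2
```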
What do positive and negative covariance mean?
The covariance can be positive or negative. Its sign depends on which
quadrants (formed by the lines x = x̄ and y = ȳ) contain most of the observations
(strictly, what matters is the sum of the products, so points far from the means weigh more):
If most of the observations are located in the top-left and bottom-right
quadrants ⇒ negative covariance: the variables tend to move in opposite directions (as one increases, the other decreases)
If most of the observations are located in the bottom-left and top-right
quadrants ⇒ positive covariance: the variables tend to increase together
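A small sketch, assuming hypothetical data in which y tends to fall as x rises, that counts how many observations land in each quadrant around (x̄, ȳ). The helper name `quadrant_counts` is illustrative only.

```python
def quadrant_counts(x, y):
    """Count observations in each quadrant defined by the lines x = x_bar and y = y_bar."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    counts = {"top_right": 0, "bottom_left": 0, "top_left": 0, "bottom_right": 0}
    for xi, yi in zip(x, y):
        dx, dy = xi - x_bar, yi - y_bar
        if dx > 0 and dy > 0:
            counts["top_right"] += 1      # product dx * dy > 0
        elif dx < 0 and dy < 0:
            counts["bottom_left"] += 1    # product dx * dy > 0
        elif dx < 0 and dy > 0:
            counts["top_left"] += 1       # product dx * dy < 0
        elif dx > 0 and dy < 0:
            counts["bottom_right"] += 1   # product dx * dy < 0
        # points lying exactly on a mean line add 0 to the covariance and are not counted
    return counts

x = [1, 2, 3, 4, 5]   # hypothetical data
y = [9, 7, 6, 4, 2]
print(quadrant_counts(x, y))
# {'top_right': 0, 'bottom_left': 0, 'top_left': 2, 'bottom_right': 2}
```

All counted points sit in the top-left and bottom-right quadrants, so every non-zero product is negative and the covariance is negative.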
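What is Pearson's correlation coefficient?
The covariance is hard to interpret on its own: its value depends on the units in which
x and y are measured and has no fixed range, so it cannot be compared across samples.
These limitations are overcome by a different statistic: Pearson's
correlation coefficient, r = Cov(x, y) / (s_x · s_y), the covariance divided by the product of the two standard deviations.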
The correlation coefficient summarises the strength and the
direction of the relationship between two continuous variables using a
range of values that are comparable across samples.
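A minimal sketch of r as covariance divided by the product of standard deviations, assuming hypothetical height and weight data. Rescaling height from metres to centimetres multiplies the covariance by 100 but leaves r unchanged, which is what makes r comparable across samples.

```python
import math

def pearson_r(x, y):
    """Pearson's r: covariance divided by the product of the standard deviations."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    cov = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / n
    sd_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / n)
    sd_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / n)
    return cov / (sd_x * sd_y)

# Hypothetical heights (metres) and weights (kg)
height_m = [1.60, 1.70, 1.75, 1.80, 1.90]
weight_kg = [55.0, 65.0, 68.0, 72.0, 85.0]
height_cm = [h * 100 for h in height_m]

print(pearson_r(height_m, weight_kg))
print(pearson_r(height_cm, weight_kg))  # same value: r does not depend on units
```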
What does the correlation coefficient indicate?
The values of the correlation coefficient r range from -1 to 1.
Direction: if r > 0, the relationship is positive; if r < 0, the relationship is negative.
Strength: if r = 1 or r = -1, the relationship is perfectly linear; the closer r is to 0, the weaker the linear relationship.
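A short illustration, assuming hypothetical data, that exact linear relationships give r = 1 or r = -1 while scattered data give a value closer to 0. Here r is computed as the average product of z-scores (an equivalent form of the covariance over standard deviations), using `statistics.mean` and `statistics.pstdev` from Python's standard library.

```python
import statistics

def r_from_z_scores(x, y):
    """Pearson's r as the average product of z-scores (population SDs, divisor n)."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sx, sy = statistics.pstdev(x), statistics.pstdev(y)
    return sum(((xi - mx) / sx) * ((yi - my) / sy) for xi, yi in zip(x, y)) / len(x)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
up = [2 * xi + 1 for xi in x]      # exact positive linear relationship
down = [10 - 3 * xi for xi in x]   # exact negative linear relationship
noisy = [3.0, 1.0, 4.0, 1.0, 5.0]  # hypothetical, weakly related data

for label, y in [("up", up), ("down", down), ("noisy", noisy)]:
    print(label, round(r_from_z_scores(x, y), 3))
# up 1.0
# down -1.0
# noisy 0.354
```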
Give the generic regression equation
y = β0 + β1x + u
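where:
1. y is the dependent variable
2. x is the independent variable
3. u is the error term, capturing all factors other than x that affect y
4. β0 is the intercept and β1 is the slope parameter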
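A tiny sketch of what the error term is, assuming hypothetical values for β0 and β1 (in practice these population parameters are unknown, which is why u cannot be observed): u is whatever part of y the population line β0 + β1x does not account for.

```python
# Hypothetical population parameters (unknown in real applications)
beta0, beta1 = 1.0, 0.5

x = [2.0, 4.0, 6.0]   # hypothetical data
y = [2.5, 2.5, 4.5]

# u = y - (beta0 + beta1 * x): everything in y not captured by the line
u = [yi - (beta0 + beta1 * xi) for xi, yi in zip(x, y)]
print(u)  # [0.5, -0.5, 0.5]
```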
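What are we assuming about u, and what does this imply?
E(u) = 0: on average, the error term is zero in the population.
E(u | x) = E(u) = 0: the average value of u does not depend on x (zero conditional mean).
Implication: E(y | x) = β0 + β1x, so the population regression function (the expected value of y given x) is a linear function of x.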
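A Monte Carlo sketch, assuming hypothetical parameters β0 = 1 and β1 = 0.5 and errors drawn as u ~ Normal(0, 1) independently of x (so E(u | x) = 0 holds by construction). Averaging many draws of y at a fixed x should come out close to β0 + β1x; the exact averages depend on the random draws.

```python
import random

random.seed(0)            # fixed seed so the run is reproducible
beta0, beta1 = 1.0, 0.5   # hypothetical population parameters

def draw_y(x):
    """Draw one y for a given x; u ~ Normal(0, 1) is independent of x, so E(u | x) = 0."""
    u = random.gauss(0.0, 1.0)
    return beta0 + beta1 * x + u

averages = {}
for x in (0.0, 2.0, 4.0):
    draws = [draw_y(x) for _ in range(100_000)]
    averages[x] = sum(draws) / len(draws)
    # The simulated average of y at this x should be close to beta0 + beta1 * x
    print(x, round(averages[x], 2), beta0 + beta1 * x)
```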
Explain residuals
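Sum of residuals equals zero: in Ordinary Least Squares (OLS) regression with an intercept, the residuals ûᵢ = yᵢ - β̂0 - β̂1xᵢ sum to zero: Σ ûᵢ = 0.
This follows from the first-order condition for β̂0 when OLS minimizes the sum of squared residuals; the same condition is why the regression line passes through the “center” of the data, (x̄, ȳ).
Orthogonality to predictors: the residuals are uncorrelated with the predictor: Σ xᵢûᵢ = 0.
This is the first-order condition for β̂1: if the residuals were still correlated with x, changing the slope would reduce the sum of squared residuals, so no other line fits better in the least-squares sense.
Estimator of the error term: residuals approximate the true error term u, but they are not identical. Residuals are sample-specific and depend on the estimated regression coefficients β̂0 and β̂1, while u = y - β0 - β1x is the unobservable error defined with the true population parameters.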
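A simulation sketch of the difference between residuals and errors, assuming hypothetical true parameters β0 = 2 and β1 = 3 and normally distributed errors. Because the data are simulated, the true u is known here (it never is in practice), so it can be compared with the OLS residuals: the residuals sum to zero up to rounding, the true errors generally do not, and the two differ observation by observation.

```python
import random

random.seed(1)
beta0, beta1 = 2.0, 3.0                              # hypothetical true parameters
n = 50
x = [random.uniform(0, 10) for _ in range(n)]
u = [random.gauss(0, 2) for _ in range(n)]           # true errors (unobservable in practice)
y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, u)]

# OLS estimates of the intercept and slope
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

print(sum(residuals))      # ~0 up to floating-point rounding
print(sum(u))              # generally not 0
print(residuals[0], u[0])  # close, but not identical
```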
How do you estimate the intercept?
β̂0 = ȳ - β̂1x̄
This formula ensures that the regression line passes through the “center” of the data, the point of sample means (x̄, ȳ).
It is a consequence of minimizing the sum of squared residuals in OLS regression (the first-order condition with respect to β̂0).
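A minimal sketch of the intercept formula, assuming hypothetical data whose OLS slope is 0.6 (Cov = 1.2, Var = 2.0 with divisor n). Evaluating the fitted line at x̄ returns ȳ, confirming that the line passes through the point of means.

```python
def ols_intercept(x, y, b1):
    """Intercept from the slope and the sample means: b0 = y_bar - b1 * x_bar."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    return y_bar - b1 * x_bar

x = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical data
y = [2.0, 4.0, 5.0, 4.0, 5.0]
b1 = 0.6                        # OLS slope for this data
b0 = ols_intercept(x, y, b1)
x_bar = sum(x) / len(x)

print(b0)               # approximately 2.2
print(b0 + b1 * x_bar)  # approximately 4.0, which is y_bar
```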
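How do you estimate the OLS slope?
β̂1 = Cov(x, y) / Var(x): the sample covariance between x and y divided by the sample variance of x (equivalently, Σ (xᵢ - x̄)(yᵢ - ȳ) / Σ (xᵢ - x̄)²).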
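A sketch of the slope as Cov/Var, assuming hypothetical data. Because the same divisor appears in the covariance and the variance, it cancels, so using n or n - 1 gives the same slope. The loop checks the least-squares property: nudging the slope (and re-centering the intercept through the means) raises the sum of squared residuals. The helpers `ols_slope` and `ssr` are illustrative names.

```python
def ols_slope(x, y):
    """OLS slope as Cov(x, y) / Var(x); the 1/n divisors cancel."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / n
    var_x = sum((xi - x_bar) ** 2 for xi in x) / n
    return cov_xy / var_x

def ssr(x, y, b0, b1):
    """Sum of squared residuals for a candidate line b0 + b1 * x."""
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))

x = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical data
y = [2.0, 4.0, 5.0, 4.0, 5.0]
b1 = ols_slope(x, y)
x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
b0 = y_bar - b1 * x_bar

print(b1)  # 0.6
for delta in (-0.1, 0.1):
    alt_b1 = b1 + delta
    # True: any other slope gives a larger SSR than the OLS slope
    print(ssr(x, y, b0, b1) < ssr(x, y, y_bar - alt_b1 * x_bar, alt_b1))
```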
What are some algebraic properties of OLS?
1. The sum of the OLS residuals is 0.
2. The sample mean of the OLS residuals is also 0.
3. The sample covariance between x and the OLS residuals is 0.
4. The OLS regression line always passes through the point of sample means (x̄, ȳ).
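A quick numerical check of the four properties on hypothetical data, fitting OLS by hand. A small tolerance absorbs floating-point rounding; each line should print True.

```python
x = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical data
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

tol = 1e-9  # allow for floating-point rounding
print(abs(sum(resid)) < tol)                                               # 1. residuals sum to 0
print(abs(sum(resid) / n) < tol)                                           # 2. mean residual is 0
print(abs(sum((xi - x_bar) * ri for xi, ri in zip(x, resid)) / n) < tol)   # 3. Cov(x, residuals) = 0
print(abs((b0 + b1 * x_bar) - y_bar) < tol)                                # 4. line passes through (x_bar, y_bar)
```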
Explain R-squared
In simple regression, R-squared equals the square of the correlation coefficient r between x and y.
This value indicates the fraction of the variation in the values of y
that is explained by the OLS regression of y on x.
R² = explained variation / total variation = Σ (ŷᵢ - ȳ)² / Σ (yᵢ - ȳ)²
R² lies between 0 and 1; the closer it is to 1, the larger the share of the variation in y explained by the OLS regression (R² = 1 means every observation lies exactly on the fitted line).
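A sketch computing R² as explained variation over total variation for hypothetical data, and comparing it with the squared correlation coefficient; in simple regression the two should match up to rounding.

```python
import math

def r_squared(x, y):
    """R^2 = explained variation / total variation for the OLS fit of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]                     # fitted values
    explained = sum((yh - y_bar) ** 2 for yh in y_hat)     # variation of estimated y
    total = sum((yi - y_bar) ** 2 for yi in y)             # total variation of observed y
    return explained / total

x = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical data
y = [2.0, 4.0, 5.0, 4.0, 5.0]
x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
r = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
     / math.sqrt(sum((xi - x_bar) ** 2 for xi in x) * sum((yi - y_bar) ** 2 for yi in y)))

print(r_squared(x, y))  # approximately 0.6
print(r ** 2)           # approximately 0.6 as well
```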