L3-4 Linear Regression Flashcards
What is the symbol for sample correlation
r_xy
What is the symbol for sample covariance
s_xy
What is the symbol for sample standard deviation
s
Linear regression allows us to estimate, and make
inferences about, population slope coefficients.
true
What does X stand for in regression
The independent/explanatory variable, also called the regressor. It is used to explain changes in the dependent variable Y
What does Y stand for in regression
The dependent/target variable
What does B0 stand for in regression
The intercept, i.e. the predicted value of Y when X is zero: where the regression line crosses the Y-axis
What does B1 stand for in regression
The slope, i.e. the amount Y changes for each one-unit change in X
What does ui stand for in regression
The regression error. It captures omitted factors that influence Y besides X. In a scatter plot, ui is the vertical distance of the point (xi, yi) from the regression line
What is the OLS estimator
The ordinary least squares estimator. It is a method of finding the coefficients of a linear regression model, picking the line that fits the given data points best out of all possible lines. It does this by minimizing the sum of the squared differences between the observed values and the values the line predicts.
What are residuals in the ordinary least squares estimator
The differences between the actual data points and what the line predicts them to be. The OLS estimator minimizes the sum of their squares.
How do you estimate a slope using the OLS estimator
You divide the sample covariance of X and Y by the sample variance of X
How do you estimate the intercept using the OLS estimator
You subtract the slope times the mean X from the mean Y
How do you calculate the residuals ui when you have estimated the regression function using the OLS estimator
You subtract the predicted Yi from the actual Yi
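The slope, intercept, and residual recipes on the last three cards can be sketched in NumPy (the data here is made up purely for illustration):

```python
import numpy as np

# Hypothetical toy data, just to exercise the formulas
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Slope b1: sample covariance of X and Y over sample variance of X
b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

# Intercept b0: mean of Y minus the slope times the mean of X
b0 = y.mean() - b1 * x.mean()

# Residuals: actual Yi minus predicted Yi
residuals = y - (b0 + b1 * x)
```

One consequence of these formulas is that the residuals always sum to zero in the sample.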
What is the difference between B1 and b1 in linear regression
B1 is the true slope of the population regression function, while b1 is the estimated slope we get from the OLS estimator. b1 can also be written as ^B1 (imagine the hat on the beta); hats in general signify estimates
What two measures tell us how well the regression line fits the data provided.
The R², which states what fraction of the variance of Y is explained by the regression line, and the standard error of the regression (SER), which measures the magnitude of a typical regression residual in the sample.
Is R² unit free
Yes, it is unit free and ranges from 0 to 1, where 0 means none and 1 means all of the variance of Y is explained by the X’s.
How is R² calculated
By dividing the sum of squared deviations of the predicted Y values from the mean Y by the sum of squared deviations of the actual Y values from the mean Y. If R² is 1, every prediction is exactly right; if it is 0, the regression explains none of the variation.
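The R² recipe on this card, sketched in NumPy with made-up data; with a single regressor it indeed equals the squared correlation between X and Y:

```python
import numpy as np

# Hypothetical toy data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Fit the line by OLS first
b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

# R²: explained sum of squares over total sum of squares
ess = np.sum((y_hat - y.mean()) ** 2)
tss = np.sum((y - y.mean()) ** 2)
r_squared = ess / tss
```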
If R² is 0, all of the variance in Y is NOT left in the residuals
False, it is. An R² of 0 means the regression explains nothing, so all of the variance of Y sits in the residuals
If there is only one X, R² is the square of the correlation between X and Y
True
How do you calculate the SER
It is the square root of the sum of the squared residuals divided by (n − 2), where n is the number of observations
What is the root mean square error of the regression
The RMSE is the square root of the sum of squared residuals divided by their number n. It is the same as the SER except that it divides by n instead of n − 2.
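The SER and RMSE on the last two cards differ only in the divisor, which a short NumPy sketch (with made-up data) makes concrete:

```python
import numpy as np

# Hypothetical toy data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
n = len(y)

# Fit by OLS and collect the sum of squared residuals
b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()
ssr = np.sum((y - (b0 + b1 * x)) ** 2)

ser = np.sqrt(ssr / (n - 2))  # divides by n - 2 (two coefficients estimated)
rmse = np.sqrt(ssr / n)       # divides by n
```

Because n − 2 < n, the SER is always a bit larger than the RMSE, and the gap vanishes as n grows.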
Why divide by n-2 in SER
Because two coefficients, B0 and B1, have been estimated, two degrees of freedom are used up. Dividing by n − 2 corrects for the resulting bias; the correction becomes insignificant when the sample size is large.
What are the assumptions of the OLS estimator
1. The model is linear in its parameters.
2. No endogeneity: the error does not depend on the independent variable X, E(u|X=x) = 0.
3. No perfect multicollinearity, i.e. the independent variables are not perfectly correlated.
4. Large outliers are rare.
5. The sampling is independent and identically distributed (i.i.d.).
6. Homoskedasticity: the variance of the error is constant.
7. The error is normally distributed.
What does no endogeneity mean
That the error does not depend on the independent variable: E(u|X=x) = 0.
If there is no correlation between the independent variable and the error, there is no endogeneity
No. Their relationship might be nonlinear, or a third factor might influence both the dependent and independent variable, so E(u|X=x) = 0 can still be false even with zero correlation
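The "zero correlation does not rule out endogeneity" point can be demonstrated by simulation (the construction below is a hypothetical example, not from the cards): take u = X² − 1 with X standard normal, so u is uncorrelated with X, yet E(u|X=x) = x² − 1 is clearly not zero.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=100_000)
u = x ** 2 - 1  # mean zero, and uncorrelated with x since E[x^3] = 0

# Sample correlation is close to zero...
corr = np.corrcoef(x, u)[0, 1]

# ...yet the conditional mean of u given x is far from zero in the tails
tail_mean = u[np.abs(x) > 2].mean()
```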
If n is large, the first three assumptions of OLS are enough for it to be unbiased
True
Give an example of data that is not i.i.d
Time series data
Why is timeseries data not i.i.d
Because a variable's change over time is often dependent on its previous value.
Does the sampling distribution of the OLS estimator become normal in large samples
Yes, because of the Central Limit Theorem (CLT)
When is the OLS estimator unbiased
when E(b1) = B1
The variance of sample b1 is a measure of uncertainty
Yes
Are the values generated by the OLS estimator NOT normally distributed around B1
False, at least when the sample size is large: by the Central Limit Theorem (CLT) the estimates are approximately normally distributed around B1
^B1 is NOT an unbiased estimator of B1
False
the variance of b1 is NOT inversely proportional to the sample size
False: the larger the sample, the smaller the variance of the OLS slope estimates
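The inverse relationship between sample size and the variance of b1 can be checked with a small Monte Carlo sketch (all numbers below are hypothetical choices for the simulation):

```python
import numpy as np

rng = np.random.default_rng(0)

def ols_slope(x, y):
    """Slope estimate: sample covariance over sample variance of x."""
    return np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

def slope_variance(n, reps=2000, beta0=1.0, beta1=2.0):
    """Simulated sampling variance of b1 for samples of size n."""
    slopes = np.empty(reps)
    for i in range(reps):
        x = rng.normal(size=n)
        u = rng.normal(size=n)  # homoskedastic error
        y = beta0 + beta1 * x + u
        slopes[i] = ols_slope(x, y)
    return slopes.var()

# Increasing n shrinks the spread of the slope estimates
var_small = slope_variance(20)
var_large = slope_variance(200)
```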
The sampling distribution of ^B1 is simple
False: in general it is complicated, but it becomes simpler when n is large, i.e. it becomes approximately normal
The larger the variance of X the smaller the variance of the estimations of B1
True. Intuitively, a wider spread in X gives the line more leverage to pin down the slope, so the slope estimates vary less from sample to sample.