Week 7 | Linear regression: specification, estimation and assessment Flashcards
What are the two classifications that regression is subject to?
a) Based on the number of independent variables, k, regression is referred to as simple or multiple regression:
Simple regression: k = 1, i.e. there is only one independent variable (e.g. GDP is regressed on employment only)
Multiple regression: k ≥ 2, i.e. there are at least 2 independent variables (e.g. GDP is regressed on employment and capital stock)
b) Based on the functional form, f, regression can be linear or nonlinear in the parameters:
Linear regression: linear in the parameters (each parameter is raised to the power of one, parameters are not multiplied together and do not appear as exponents)
Non-linear regression: non-linear in the parameters (e.g. a parameter appears with a power of 2)
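To make the distinction concrete, here are two illustrative equations (my own examples, not from the course). Note that a model can be nonlinear in X and still count as linear regression, because linearity refers to the parameters:

```latex
% Linear in the parameters (even though quadratic in X):
\[ Y = \beta_0 + \beta_1 X + \beta_2 X^2 + u \]
% Non-linear in the parameters (\beta_2 appears as an exponent):
\[ Y = \beta_0 + \beta_1 X^{\beta_2} + u \]
```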
Explain what simple linear regression and multiple linear regression are. What does the population model look like when it is also linear in the independent variables?
Simple linear regression: linear in the parameters and has only one independent variable (k = 1)
Multiple linear regression: a generalization of simple linear regression, i.e. it is linear in the parameters and has more than one independent variable (k > 1)
If the model is also linear in the independent variables (X1, X2, …, Xk), then the population multiple linear regression model is:
Y = Beta0 + Beta1*X1 + Beta2*X2 + … + Betak*Xk + error
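As a minimal sketch of this population model, the snippet below simulates it for k = 2 with made-up parameter values (Beta0 = 1, Beta1 = 0.5, Beta2 = 2 are illustrative assumptions, not from the course):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Hypothetical parameter values (illustrative only)
beta0, beta1, beta2 = 1.0, 0.5, 2.0

# Two independent variables and a zero-mean error term
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
error = rng.normal(size=n)  # net effect of all omitted variables

# Population multiple linear regression model with k = 2
y = beta0 + beta1 * x1 + beta2 * x2 + error
```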
Explain each of the components in the model Y = Beta0 + Beta1*X1 + Beta2*X2 + … + Betak*Xk + error. What effect do the left-out variables have on Y?
What is the conditional expected value of the error term? How can this be expressed in terms of the population regression model?
Y: dependent variable (quantitative; also called the explained variable or regressand)
Xj: jth independent variable (quantitative; also called an explanatory variable or regressor)
Error: Unobservable random error or disturbance term which represents the net effect of all the variables other than X1, X2, … Xk that influence Y
It is assumed that the left-out variables are of minor importance: some have positive and others negative effects on Y, and on average their combined net effect is zero
-> The conditional expected value of the error term is E(error | X1, X2, …, Xk) = 0
-> Population regression function:
E(Y | X1, X2, …, Xk) = Beta0 + Beta1*X1 + Beta2*X2 + … + Betak*Xk
This is the conditional expected value of Y, i.e. the mean value of the Y sub-population associated with a given set of values of the independent variables
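The step from the model to the population regression function is just taking conditional expectations of both sides and applying the zero conditional mean assumption; written out in standard notation (with u denoting the error term):

```latex
\[
E(Y \mid X_1,\dots,X_k)
  = \beta_0 + \beta_1 X_1 + \dots + \beta_k X_k + E(u \mid X_1,\dots,X_k)
  = \beta_0 + \beta_1 X_1 + \dots + \beta_k X_k
\]
```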
What are Beta0 and Betaj? What do Beta1 and Beta2 measure (give examples)?
Beta0: y-intercept parameter - the expected value of Y when all Xj = 0
Betaj (j = 1, …, k): slope parameter of Xj - it measures the impact of a one-unit increase in Xj on the conditional expected value of Y, given that all other independent variables in the model are kept constant
By keeping the other independent variables constant, we aim to separate the individual effect of Xj on E(Y | X1, X2, … Xk) from the combined effect of all other included independent variables
For example, Beta1 measures the change in E(Y | X1, X2, …, Xk) associated with a one-unit increase in X1, holding X2, …, Xk constant
For example, Beta2 measures the change in E(Y | X1, X2, …, Xk) associated with a one-unit increase in X2, holding X1, X3, …, Xk constant
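A quick numerical check of this ceteris paribus interpretation, reusing the made-up parameters from the sketch above (beta0 = 1, beta1 = 0.5, beta2 = 2 are illustrative assumptions):

```python
def expected_y(x1, x2, beta0=1.0, beta1=0.5, beta2=2.0):
    """Population regression function E(Y | X1, X2) with illustrative parameters."""
    return beta0 + beta1 * x1 + beta2 * x2

# Raise X1 by one unit while holding X2 fixed at 3:
change = expected_y(x1=4.0, x2=3.0) - expected_y(x1=3.0, x2=3.0)
print(change)  # 0.5, i.e. exactly beta1
```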
What three possibilities are there regarding Betaj (hint: the relationship between E(Y) and Xj)? What happens if the population parameters are unknown?
If Betaj = 0, E(Y) and Xj are not related to each other
If Betaj > 0, there is a positive linear relationship between E(Y) and Xj
If Betaj < 0, there is a negative linear relationship between E(Y) and Xj
If the population parameters are unknown, then the error term cannot be observed and we know at most a few elements (maybe just one, often none) of each sub-population of Y
Beta0, Beta1,…, Betak must be estimated from a sample of corresponding observations of the independent and dependent variables
(y1, x11, x21, …, xk1), (y2, x12, x22, …, xk2), …, (yn, x1n, x2n, …, xkn)
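A minimal sketch of how such a sample can be used to estimate Beta0, Beta1, …, Betak by ordinary least squares (k = 2 and the data-generating values below are illustrative assumptions; numpy only):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# Simulate a sample of corresponding observations (the true parameters
# 1.0, 0.5 and 2.0 would be unknown in practice)
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 0.5 * x1 + 2.0 * x2 + rng.normal(size=n)

# Design matrix with a column of ones for the intercept Beta0
X = np.column_stack([np.ones(n), x1, x2])

# Least-squares estimates of (Beta0, Beta1, Beta2)
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)  # should be close to [1.0, 0.5, 2.0]
```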