Lecture 1 - Introduction to Modelling Flashcards
What is a simple linear regression and its features?
A regression which models the relationship between a dependent variable y and an independent variable x
A straight line fitted through a set of n points so that the sum of squared residuals of the model, Σ ûi^2 (summed over all n points), is as small as possible
What is the equation of the simple linear regression and the estimated model?
yi = b0 + b1xi + ui
Under the assumption that the expected value of u given x is 0, i.e. E(u | x) = 0, the estimated model (fitted by minimising the sum of squared residuals) becomes:
ŷi = b0 + b1xi
Note that both b’s should also have hats on top of them to show that they are estimates; the estimated model relies on the assumption that E(u | x) = 0 (the expected value of u given a value of x is equal to 0). The x does not have a hat on top of it
The estimated model is the same as the initial model apart from the addition of hats (to show the estimated values) and the lack of u (the error term), because its expected value is assumed to be 0
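A rough Python (NumPy) sketch of the idea above, not the Stata workflow from the lecture: it fits a simple regression with the standard closed-form least squares formulas and checks that a different line gives a larger sum of squared residuals. The data values are made up purely for illustration.

```python
import numpy as np

# Made-up data for illustration only (not from the lecture).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Closed-form least squares estimates for a simple regression.
b1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0_hat = y.mean() - b1_hat * x.mean()

# Fitted values use the hatted b's and have no u term.
y_hat = b0_hat + b1_hat * x

# Residuals are what the fitted line leaves unexplained.
residuals = y - y_hat
ssr = np.sum(residuals ** 2)

# A line with a slightly different slope gives a larger sum of squared residuals.
ssr_other = np.sum((y - (b0_hat + (b1_hat + 0.1) * x)) ** 2)

print("b0_hat:", b0_hat, "b1_hat:", b1_hat)
print("SSR of fitted line:", ssr, "SSR of other line:", ssr_other)
```

The residuals of the fitted line add up to (approximately) 0, which is the sample counterpart of the assumption E(u | x) = 0.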
How does a multiple regression differ and why is it used?
It is used when you have multiple factors affecting your dependent variable y - but the same logic applies
Instead of a straight line, a plane (or, with more than two explanatory variables, a higher-dimensional surface) is fitted so that the sum of the squared residuals is minimised - to picture a multiple regression with two explanatory variables, think of a graph with 3 dimensions
State an example of the equation of an estimated multiple regression model and the similarities to the simple regression model
ŷi = b0 + b1xi + b2zi
Note that all the b’s should have a hat on top of them to show that the value is estimated and that each variable should be multiplied by its own b value
As in the simple regression model, the marginal effect of x is b1 and the marginal effect of z is b2 - the constant (b0) is also still the value of y when x and z both equal 0
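A minimal Python (NumPy) sketch of the marginal effect interpretation, assuming made-up data constructed so that y = 1 + 2*x + 3*z exactly (these numbers are hypothetical, chosen only so the estimates are easy to check). The helper name predict is also just for illustration.

```python
import numpy as np

# Made-up data built so that y = 1 + 2*x + 3*z exactly (illustration only).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
z = np.array([1.0, 0.0, 2.0, 1.0, 3.0, 0.0])
y = 1.0 + 2.0 * x + 3.0 * z

# Design matrix: a column of ones for the constant, then x and z.
X = np.column_stack([np.ones_like(x), x, z])

# Least squares estimates of b0, b1 and b2.
coef = np.linalg.lstsq(X, y, rcond=None)[0]
b0_hat, b1_hat, b2_hat = coef


def predict(x_value, z_value):
    """Fitted y for given x and z (hypothetical helper)."""
    return b0_hat + b1_hat * x_value + b2_hat * z_value


# Raising x by 1 while holding z fixed changes the fitted y by b1_hat.
effect_of_x = predict(3.0, 2.0) - predict(2.0, 2.0)
# Raising z by 1 while holding x fixed changes the fitted y by b2_hat.
effect_of_z = predict(2.0, 3.0) - predict(2.0, 2.0)

print("estimates:", coef)
print("effect of x:", effect_of_x, "effect of z:", effect_of_z)
```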
Go over how to run a regression in Stata to get the values for your regression equation
Couldn’t make out exactly in the lecture - heard something like "red", which is most likely Stata's regress command (abbreviated reg), e.g. reg y x z - check whether we need to know this
How can you find your values of b0, b1 and b2 … using a set of Stata results and how do you interpret b0, b1 and b2?
The column of coefficients (the first column of figures, furthest left) holds your values of b0, b1 and b2 - the figure in the first row, labelled x, is your b1; the figure in the second row, labelled z, is your b2; and the figure in the third row, labelled constant, is your b0 - follow the same pattern for any further variables
y = b0 + b1x + b2z + u
b0 is your y when your x and z are both equal to 0
when x increases by 1 unit, holding z fixed, y changes by b1
when z increases by 1 unit, holding x fixed, y changes by b2
What is a dummy variable?
Also known as a binary variable - it takes the value of either 1 or 0 - so 1 means that some condition is satisfied and 0 means otherwise
The condition that is associated with the 0 is called the base or reference level because it is the level against which the other level (1) is compared
How would you make a dummy variable, where 0 is associated with domestic cars and 1 with foreign cars, part of a linear regression model for car prices?
How would you interpret b1?
Price (y) = b0 + b1 * Foreign + u
b1 is the difference in average price (an increase or a decrease) of foreign cars compared to domestic cars
How can you see the interpretation that b1 is the difference in average price of foreign cars compared to domestic cars?
For foreign cars, the value is given as 1 so by substitution into the equation:
E(Price) = b0 + b1 * Foreign = b0 + b1*1 = b0 + b1
For domestic cars, Foreign = 0 (domestic is the base/reference level, so the b1 term drops out of the model; only the level assigned the value 1 adds its b to the equation):
E(Price) = b0 + b1 * Foreign = b0 + b1*0 = b0
The change in price is therefore the difference between these 2 equations:
Change in price (Δ Price) = (b0 + b1) - b0 = b1
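A small Python (NumPy) sketch of the derivation above, assuming made-up car prices (hypothetical numbers, not real data): regressing price on the Foreign dummy gives a b0 equal to the average domestic price and a b1 equal to the difference in average price between foreign and domestic cars.

```python
import numpy as np

# Made-up prices for illustration only: 3 domestic cars, then 2 foreign cars.
price = np.array([4000.0, 5000.0, 6000.0, 7000.0, 9000.0])
foreign = np.array([0.0, 0.0, 0.0, 1.0, 1.0])  # 1 = foreign, 0 = domestic (base level)

# Design matrix: constant plus the Foreign dummy.
X = np.column_stack([np.ones_like(foreign), foreign])
b0_hat, b1_hat = np.linalg.lstsq(X, price, rcond=None)[0]

# Group averages computed directly from the data.
mean_domestic = price[foreign == 0].mean()
mean_foreign = price[foreign == 1].mean()

print("b0_hat:", b0_hat, "average domestic price:", mean_domestic)
print("b1_hat:", b1_hat, "difference in averages:", mean_foreign - mean_domestic)
```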
How would you include multiple dummy variables into a regression model?
State the equation for a model with the following two dummy variables, showing the effect on wage of being single and of being married respectively, and interpret your b’s:
Single - 1
Otherwise - 0
Married - 1
Otherwise - 0
Very similar to having just one:
Here dependent variable is Wage …
Wage = b0 + b1 * Single + b2 * Married + u
b1 is the difference in average wage of single individuals compared to the base level (other)
b2 is the difference in average wage of married individuals compared to the base level (other)
u is your residual (error) - what the model leaves unexplained, with an expected value of 0
b0 is the average wage of an individual in the base level (other), i.e. neither single nor married
How can you use the model Wage = b0 + b1 * Single + b2 * Married + u to see interpretations of b0, b1 and b2?
For an individual in the other category, both Single = 0 and Married = 0 …:
E(Wage) = b0 + 0 + 0 = b0
For a single individual Single = 1 and Married = 0:
E(Wage) = b0 + b1 + 0 = b0 + b1
For a married individual Single = 0 and Married = 1:
E(Wage) = b0 + 0 + b2 = b0 + b2
Change in wage (Δ Wage) for single individuals compared to other: (b0 + b1) - b0 = b1
Change in wage (Δ Wage) for married individuals compared to other: (b0 + b2) - b0 = b2
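A short Python (NumPy) sketch of the same check with two dummies, assuming made-up wages for three groups (other, single, married; the numbers are hypothetical). The estimated b0 should match the average wage of the base level, and b1 and b2 the differences from it.

```python
import numpy as np

# Made-up wages for illustration only: 2 other, 2 single, 2 married individuals.
wage = np.array([10.0, 12.0, 14.0, 16.0, 20.0, 22.0])
single = np.array([0.0, 0.0, 1.0, 1.0, 0.0, 0.0])
married = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0])

# Design matrix: constant, Single dummy, Married dummy.
X = np.column_stack([np.ones_like(wage), single, married])
b0_hat, b1_hat, b2_hat = np.linalg.lstsq(X, wage, rcond=None)[0]

# Group averages computed directly from the data.
mean_other = wage[(single == 0) & (married == 0)].mean()
mean_single = wage[single == 1].mean()
mean_married = wage[married == 1].mean()

print("b0_hat:", b0_hat, "average wage, other:", mean_other)
print("b1_hat:", b1_hat, "single minus other:", mean_single - mean_other)
print("b2_hat:", b2_hat, "married minus other:", mean_married - mean_other)
```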
What do you need to remember when writing down the general regression model with dummy variables and when interpreting the values of the b’s by manipulating the model?
You must include a ‘u’ in the general model for the residual (error), but not in the actual calculation when mathematically deriving the interpretation of the b’s from the model (the calculation uses expected values, and the expected value of u is 0)
What are interaction effects and are they important in economic analysis?
Yes they are important - they capture the effect of 2 variables working in combination
Give an example of interaction effects
Drinking is fun and driving is fun however drinking and driving together is not a good combination - this is what interaction effects capture
What can you have interactions between?
Between continuous variables, dummy variables and between continuous and dummy variables
State an example of an interaction between 2 continuous variables
Age and number of years of schooling are two continuous variables (both are measured in time, which can be divided into ever smaller units), and their interaction can affect the wage earned
How can we form an equation with two continuous variables (age and education) and wage showing the effect age and education together on wage as well as their individual effect on wage?
Note that age and education are both continuous because both are measured in time (years)
Wage = b0 + b1*age + b2*education + b3*age*education + u
What does b3 show in the equation: Wage = b0 + b1*age + b2*education + b3*age*education + u where age and education are two continuous variables?
b3 shows whether the combination of age and education produces higher wages than each of the two factors alone
How would you interpret/find b3 quantitatively in the equation: Wage = b0 + b1*age + b2*education + b3*age*education + u when you assume age to be fixed and education to be variable - find the difference in wage when age is fixed and education = 12 in one instance and 13 in another
Wage = b0 + b1*age + b2*education + b3*age*education + u
Hold age constant e.g. at its mean (show by drawing a horizontal line above the word age) and find the effect on wage of having 1 extra year of education
Subtract the wage earned with 12 years of education from the wage earned with 13 years so that:
(b0 + b1*age + b2*13 + b3*age*13) - (b0 + b1*age + b2*12 + b3*age*12) = b2 + b3*age
Working:
13*b2 - 12*b2 = b2
13*b3*age - 12*b3*age = b3*age
Note that age above should have horizontal line above it to indicate that it’s fixed
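A quick numerical check of the working above, sketched in Python: with age held fixed, one extra year of education changes the expected wage by b2 + b3*age. The coefficient values, the fixed age of 40 and the function name expected_wage are all hypothetical, chosen only to illustrate the algebra.

```python
import math

# Hypothetical coefficient values, for illustration only.
b0, b1, b2, b3 = 5.0, 0.2, 1.0, 0.05


def expected_wage(age, education):
    """Expected wage from the interaction model (u has expected value 0, so it is left out)."""
    return b0 + b1 * age + b2 * education + b3 * age * education


age_fixed = 40.0  # age held constant, e.g. at its mean (assumed value)

# Difference in expected wage between 13 and 12 years of education.
difference = expected_wage(age_fixed, 13) - expected_wage(age_fixed, 12)

# The formula derived in the working: b2 + b3 * age.
formula = b2 + b3 * age_fixed

print("difference:", difference, "b2 + b3*age:", formula)
```

Because the effect of education is b2 + b3*age, it depends on the level at which age is held fixed: trying a different age_fixed changes the difference whenever b3 is not 0.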