Lecture 1 - Introduction to Modelling Flashcards by Aneesh Singh

What is a simple linear regression and its features?

A regression which models the relationship between a dependent variable y and an independent variable x

A straight line fitted through a set of n points in such a way that it makes the sum of squared residuals of the model, Σ(ûi)^2 (Σ is the summation sign over the n points), as small as possible
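To make the "smallest sum of squared residuals" idea concrete, here is a minimal Python sketch. The five data points are invented for illustration, and the closed-form formulas are the standard textbook OLS estimates rather than anything specific to this lecture.

```python
import numpy as np

# Hypothetical data: five (x, y) points, invented purely for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Closed-form OLS estimates for a simple linear regression.
b1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0_hat = y.mean() - b1_hat * x.mean()


def ssr(b0, b1):
    """Sum of squared residuals for the line y = b0 + b1 * x."""
    residuals = y - (b0 + b1 * x)
    return float(np.sum(residuals ** 2))


print("b0_hat:", b0_hat, "b1_hat:", b1_hat)
print("SSR at the OLS line:", ssr(b0_hat, b1_hat))
print("SSR at a nearby line:", ssr(b0_hat + 0.1, b1_hat))
```

Any other line, such as the nearby one in the last print, gives a larger sum of squared residuals than the fitted line.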

What is the equation of the simple linear regression and the estimated model?

yi = b0 + b1xi + ui

Under the assumption that the expected value of u given x is 0, taking expectations removes u; replacing b0 and b1 with their estimates (chosen to minimise the sum of squared residuals) gives the estimated model:

ŷi = b0 + b1xi

Note that both b’s should also have hats on top of them to show that they are estimates, obtained under the assumption that E(u | x) = 0 (expected value of u given a value of x is equal to 0) - the x does not have a hat on top of it

The new estimated model is the same as the initial model apart from the addition of hats (to show the estimated values) and the lack of u, because u is assumed to be 0 on average - the individual residuals ûi = yi - ŷi are generally not 0

How does a multiple regression differ and why is it used?

It is used when you have multiple factors affecting your dependent variable y - but the same logic applies

Instead of a straight line, it is a plane that is fitted (with multiple dimensions) so that the sum of the squared residuals is minimised - (to picture what a multiple regression might look like think of a graph with 3 dimensions)

State an example of the equation of an estimated multiple regression model and the similarities to the simple regression model

ŷi = b0 + b1xi + b2zi

Note that all the b’s should have a hat on top of them to show that the value is estimated and that each variable should be multiplied by its own b value

As in the simple regression model, the marginal effect of x is b1 and the marginal effect of z is b2 - the constant (b0) is also still the value of y when x and z both equal 0

Go over how to run a regression on stata to get the values for your regression equation

Couldn’t make out exactly in the lecture - heard something like "red", which is probably Stata’s reg (short for the regress command, e.g. reg y x z) - check whether we need to know this

How can you find your values of b0, b1 and b2 … using a set of stata results and how do you interpret b0, b1 and b2?

The column called coefficients (the first column of numbers) will have your values of b0, b1 and b2 - with x and z as regressors, the figure in the first row labelled x is your b1, the figure in the second row labelled z is your b2 and the figure in the third row labelled constant (_cons) is your b0 - follow the same pattern for any additional variables

y = b0 + b1x + b2z + u
b0 is the value of y when x and z are both equal to 0
when x increases by 1 unit (holding z fixed), y changes by b1
when z increases by 1 unit (holding x fixed), y changes by b2
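As a rough illustration of where b0, b1 and b2 come from, the sketch below fits y on x and z by least squares with NumPy rather than Stata. The data are invented and generated without noise from y = 1 + 2x + 3z (an assumption made only so the recovered coefficients are easy to check).

```python
import numpy as np

# Hypothetical regressors, invented for illustration.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
z = np.array([1.0, 0.0, 2.0, 1.0, 3.0, 0.0])

# Assumed "true" relationship with no error term: b0 = 1, b1 = 2, b2 = 3.
y = 1.0 + 2.0 * x + 3.0 * z

# Design matrix: a column of ones for the constant, then x, then z.
X = np.column_stack([np.ones_like(x), x, z])

# Least-squares fit: minimises the sum of squared residuals.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
b0_hat, b1_hat, b2_hat = coef

print("b0_hat (constant):", b0_hat)
print("b1_hat (effect of x, z held fixed):", b1_hat)
print("b2_hat (effect of z, x held fixed):", b2_hat)
```

With real data the fit would not be exact, and the estimates would sit in the coefficients column of the regression output.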

What is a dummy variable?

Also known as a binary variable - it takes the value of either 1 or 0 - so 1 means that some condition is satisfied and 0 means otherwise

The condition that is associated with the 0 is called the base or reference level because it is the level against which the other level (1) is compared

How would you make a dummy variable, where 0 is associated with the price of domestic cars and 1 with the price of foreign cars, part of a linear regression model?

How would you interpret b1?

Price (y) = b0 + b1 * Foreign + u

b1 is the increase/decrease (difference or change in average price) of foreign cars compared to domestic cars

How can you see the interpretation that b1 is the difference in average price of foreign cars compared to domestic cars?

For foreign cars, the value is given as 1 so by substitution into the equation:
E(Price) = b0 + b1 * Foreign = b0 + b1*1 = b0 + b1

For domestic cars, Foreign = 0 (domestic is the base/reference level - the dummy only switches on for the level assigned 1, so for the base level the b1 term drops out):
E(Price) = b0 + b1 * Foreign = b0 + b1*0 = b0

The change in price is therefore the difference between these 2 equations:
Change in price (Δ price) = b0 + b1 - b0 = b1
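The same result can be checked numerically. The sketch below uses six invented car prices (three domestic, three foreign) and shows that regressing price on the Foreign dummy gives b0 equal to the domestic mean and b1 equal to the gap between the two group means. The numbers are hypothetical and not taken from any real dataset.

```python
import numpy as np

# Hypothetical prices: three domestic cars (Foreign = 0), three foreign (Foreign = 1).
price = np.array([4000.0, 5000.0, 6000.0, 6000.0, 7000.0, 8000.0])
foreign = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

# Regress price on a constant and the dummy.
X = np.column_stack([np.ones_like(foreign), foreign])
coef, *_ = np.linalg.lstsq(X, price, rcond=None)
b0_hat, b1_hat = coef

# Group averages computed directly, for comparison.
mean_domestic = price[foreign == 0].mean()
mean_foreign = price[foreign == 1].mean()

print("b0_hat:", b0_hat, "domestic mean:", mean_domestic)
print("b1_hat:", b1_hat, "foreign minus domestic:", mean_foreign - mean_domestic)
```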

How would you include multiple dummy variables into a regression model?

State the equation using the following two dummy variables to see the effect on wage of being single and married respectively, and interpret your b’s:
Single - 1
Otherwise - 0

Married - 1
Otherwise - 0

Very similar to having just one:

Here dependent variable is Wage …

Wage = b0 + b1 * Single + b2 * Married + u

b1 is the change in average wage of single individuals compared to the base level (other)
b2 is the change in the average wage of married individuals compared to the base level (other)
u is your residual (error) - what the model leaves unexplained - assumed to be 0 on average
b0 is the average wage of an individual in the base level (other)

How can you use the model Wage = b0 + b1 * Single + b2 * Married + u to see interpretations of b0, b1 and b2?

For an other individual both Single = 0 and Married = 0 …:
E(Wage) = b0 + 0 + 0 = b0

For a single individual Single = 1 and Married = 0:
E(Wage) = b0 + b1 + 0 = b0 + b1

For a married individual Single = 0 and Married = 1:
E(Wage) = b0 + 0 + b2 = b0 + b2

Change in wage (Δ wage) for single individuals: b0 + b1 - b0 = b1

Change in wage (Δ wage) for married individuals: b0 + b2 - b0 = b2
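A small numerical check of these interpretations, using invented wages for two "other", two single and two married individuals. All figures are hypothetical; the point is only that the fitted b's line up with group averages.

```python
import numpy as np

# Hypothetical wages: two "other", two single, two married individuals.
wage = np.array([10.0, 12.0, 14.0, 16.0, 18.0, 22.0])
single = np.array([0.0, 0.0, 1.0, 1.0, 0.0, 0.0])
married = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0])

# Constant plus the two dummies; "other" is the base level (both dummies 0).
X = np.column_stack([np.ones_like(wage), single, married])
coef, *_ = np.linalg.lstsq(X, wage, rcond=None)
b0_hat, b1_hat, b2_hat = coef

print("b0_hat (average wage, other):", b0_hat)
print("b1_hat (single minus other):", b1_hat)
print("b2_hat (married minus other):", b2_hat)
```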

What do you need to remember when writing down the regression model with dummy variables and when interpreting the values of the b’s by manipulating the model?

You must include a ‘u’ in the general model for the residual (error) but not in the calculations used to mathematically derive the interpretation of the b’s - those use E(...), and u is assumed to be 0 on average

What are interaction effects and are they important in economic analysis?

Yes they are important - they capture the effect of 2 variables working in combination

Give an example of interaction effects

Drinking is fun and driving is fun however drinking and driving together is not a good combination - this is what interaction effects capture

What can you have interactions between?

Between continuous variables, dummy variables and between continuous and dummy variables

State an example of an interaction between 2 continuous variables

Age and number of years of schooling are two continuous variables (both measure time, which can be broken down an infinite number of times) whose interaction has an effect on the wage earned

How can we form an equation with two continuous variables (age and education) and wage showing the effect of age and education together on wage as well as their individual effects on wage?

Note that age is continuous because of time and education also because of time (showing age and education as time)

Wage = b0 + b1*age + b2*education + b3*age*education + u

What does b3 show in the equation: Wage = b0 + b1*age + b2*education + b3*age*education + u where age and education are two continuous variables?

b3 shows whether the combination of age and education produces higher wages than each of the two factors alone

How would you interpret/find b3 quantitatively in the equation: Wage = b0 + b1*age + b2*education + b3*age*education + u when you assume age to be fixed and education to be variable - find the difference in wage when age is fixed and education = 12 in one instance and 13 in another

Wage = b0 + b1*age + b2*education + b3*age*education + u

Hold age constant e.g. at its mean (show by drawing horizontal line above the word age) and find the effect of 1 extra year of education on wage

Subtract the average wage with 12 years of education from the average wage with 13 years so that:

(b0 + b1*age + b2*13 + b3*age*13) - (b0 + b1*age + b2*12 + b3*age*12) = b2 + b3*age

Working:
13*b2 - 12*b2 = b2
13*b3*age - 12*b3*age = b3*age

So b3 is the amount by which the effect of an extra year of education changes for each additional year of (fixed) age

Note that age above should have horizontal line above it to indicate that it’s fixed

In the equation: Wage = b0 + b1*age + b2*education + b3*age*education + u when you assume education to be fixed and age to be variable - find the difference in wage when education is fixed and age = 50 in one instance and 51 in another

Wage = b0 + b1*age + b2*education + b3*age*education + u

Hold education constant e.g. at its mean (show by drawing horizontal line above the word education) and find the effect of being 1 year older on wage

Subtract the average wage at 50 years old from the average wage at 51 years old so that:

(b0 + b1*51 + b2*education + b3*51*education) - (b0 + b1*50 + b2*education + b3*50*education) = b1 + b3*education

Working:
51*b1 - 50*b1 = b1
51*b3*education - 50*b3*education = b3*education

Note that education above should have horizontal line above it to indicate that it’s fixed
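The two subtractions (one extra year of education with age fixed, one extra year of age with education fixed) can be checked with a short Python sketch. The coefficient values and the two means are made up purely to test the algebra; they are not estimates from any data.

```python
# Hypothetical coefficients, chosen only to illustrate the algebra.
b0, b1, b2, b3 = 5.0, 0.2, 0.8, 0.01

# Assumed sample means at which the other variable is held fixed.
age_bar = 40.0
education_bar = 12.0


def expected_wage(age, education):
    """E(Wage) from the interaction model (u drops out on average)."""
    return b0 + b1 * age + b2 * education + b3 * age * education


# One extra year of education (12 -> 13) with age held at its mean.
effect_of_education = expected_wage(age_bar, 13) - expected_wage(age_bar, 12)

# One extra year of age (50 -> 51) with education held at its mean.
effect_of_age = expected_wage(51, education_bar) - expected_wage(50, education_bar)

print("education effect:", effect_of_education, "vs b2 + b3*age_bar:", b2 + b3 * age_bar)
print("age effect:", effect_of_age, "vs b1 + b3*education_bar:", b1 + b3 * education_bar)
```

Changing age_bar changes the education effect by b3 per year of age, which is the sense in which b3 captures the interaction.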

What are the 2 main groups within which variables are split into?

Quantitative and Qualitative (Categorical) variables - quantitative variables can be measured and assigned a numerical value (describes quantity) and qualitative variables take on a limited, and usually fixed, number of possible values (groupings) and are not described as numbers

Quantitative examples - arithmetic operations make sense (the numbers mean something, i.e. a 2 year old is twice as old as a 1 year old) - height, weight, age, time (continuous) and no. of books, dice roll, no. of people, country population, pulse rate (discrete) - quantitative variable is an umbrella term under which continuous and discrete variables fall
Qualitative examples - education level, race, sex, colour etc

What can quantitative variable be split into? Explain.

Into continuous and discrete variables

Continuous variables - can take an infinite number of values because each continuous variable can be broken down an infinite number of times e.g. year then month then day then hour then minute then second then millisecond etc - the same cannot be said for all quantitative variables, as some are discrete and don’t follow this description; therefore remember that quantitative variables are the umbrella term for all numeric variables (variables which can be described numerically) - examples of continuous variables: height, weight, temperature, length, time

Discrete variable - takes on distinct, countable, specific values e.g. country population, no of people in hospital, number of books (previous 3 all examples of count variable), dice roll result etc

What can categorical (qualitative) variables be split into? Explain.

Nominal and ordinal variables

Nominal variables - a categorical/qualitative variable in which the data cannot be ordered - can be numeric in nature but cannot have any numeric properties (a number may be used to represent a category - the category itself is not numeric and has no numeric properties) e.g. blood type, gender, race, eye colour etc

Ordinal variables - variables which have natural, ordered categories where the distances between the categories are not known e.g. income level: low income, middle income, high income; education level: high school, Bachelors, Masters, PhD - order matters here

What is a count variable?

The most typical representation/type of discrete variable (falls under discrete variables) - takes specific (usually integer) values e.g. country population, no of kids, no of visits to library etc

What can nominal variables be split into? Explain.

Binary (dummy) variables and many categories

Binary/dummy variables - variable which takes on only two values e.g. smoke status: smoker or non-smoker or ethnicity: white or non-white

Many categories - variable takes on more than two values e.g. ethnicity: white, black, asian etc

What can ordinal variables be split into? Explain.

Binary and many categories

Binary/dummy variable - here a variable can take only two values but the order matters e.g. pass or fail (pass ranks above fail) OR undergraduate or postgraduate (undergraduate comes before postgraduate)

Many categories - more than 2 values, again where order matters as still ordinal - health status: poor, fair, good, very good, excellent or strongly disagree, disagree, neutral, agree, strongly agree or ratings 1-5

State an example of an interaction between a continuous variable and a dummy variable

The interaction between age (continuous variable) and sex (dummy variable) and their effect both together and individually on wage

How can we form an equation with a continuous variable (age), a dummy variable (sex) and wage showing the effect of age and sex together on wage as well as their individual effects on wage?

Note that age is continuous because of time and sex is a dummy because it can only take the values of 0 or 1 (0 for female and 1 for male)

Wage = b0 + b1*sex + b2*age + b3*sex*age + u

What does b3 show in the equation: Wage = b0 + b1*sex + b2*age + b3*sex*age + u where age is a continuous variable and sex is a dummy variable?

b3 shows the difference between males and females in the change in average wage from one extra year of age: for females (sex = 0) one extra year changes average wage by b2, for males (sex = 1) by b2 + b3

REMEMBER that b3 shows a change/difference in average wage (the extra effect of increasing age by 1 for males) and not the average wage itself

Find the average wage for a male of average age and the average wage for a female of average age

Wage = b0 + b1*sex + b2*age + b3*sex*age + u

Wage for male of average age:
Sex = 1 - male
E(Wage) = b0 + b1 + b2*age + b3*age

Wage for female of average age:
Sex = 0 - female
E(Wage) = b0 + b2*age

Age should have horizontal line on top of it to indicate the average, for both males and females
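A short sketch of these substitutions in Python. The coefficients and the average age are hypothetical values picked only to show how the pieces fit together: the male-female gap at average age is b1 + b3*age, and the effect of one extra year of age is b2 for females and b2 + b3 for males.

```python
# Hypothetical coefficients, for illustration only.
b0, b1, b2, b3 = 8.0, 1.5, 0.10, 0.05

age_bar = 40.0  # assumed average age


def expected_wage(sex, age):
    """E(Wage) with sex = 1 for male, 0 for female."""
    return b0 + b1 * sex + b2 * age + b3 * sex * age


male_wage = expected_wage(1, age_bar)      # b0 + b1 + b2*age_bar + b3*age_bar
female_wage = expected_wage(0, age_bar)    # b0 + b2*age_bar
gap = male_wage - female_wage              # b1 + b3*age_bar

# Effect of one extra year of age for each group.
male_slope = expected_wage(1, age_bar + 1) - male_wage      # b2 + b3
female_slope = expected_wage(0, age_bar + 1) - female_wage  # b2

print("male:", male_wage, "female:", female_wage, "gap:", gap)
print("male slope:", male_slope, "female slope:", female_slope)
```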

State an example of an interaction between two dummy variables

The interaction between sex (dummy variable) and degree (dummy variable) and their effect both together and individually on wage

Sex is simply male = 1 and female = 0

Degree is have a degree = 1 and no degree = 0

How can we form an equation with two dummy variables (sex and degree) and wage showing the effect of sex and degree together on wage as well as their individual effects on wage?

Wage = b0 + b1*degree + b2*sex + b3*degree*sex + u

What does b3 show in the equation: Wage = b0 + b1*degree + b2*sex + b3*degree*sex + u where degree and sex are both dummy variables?

b3 shows the additional difference in the average wage for males with a degree (i.e. both degree and sex equal to 1), on top of the separate degree effect b1 and male effect b2

REMEMBER that b3 shows a change/difference in average wage (how much more, or less, a degree changes average wage for males than for females) and not the average wage itself

What is important to remember about the characteristics of the base individual?

When a model has only dummies, the base individual has the base level characteristics of all the dummies in the model (i.e. all the 0 values)

When a model has a continuous variable included (potentially with dummies) then the continuous variable is taken at its average when describing the characteristics of the base individual

Find the average wage for a male with and without a degree and the average wage for a female with and without a degree. State what you can do with this information

By substituting either 1 or 0:

Male with degree: E(Wage) = b0 + b1 + b2 + b3
Male without degree: E(Wage) = b0 + b2
Female with degree: E(Wage) = b0 + b1
Female without degree: E(Wage) = b0

By taking the differences you can find the change in the average wage of a male with and without a degree, a female with and without a degree and even the difference in average wage between males and females both with and without a degree
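The four averages and their differences can be tabulated in a few lines of Python. The coefficient values are invented for illustration; the comments show which combination of b's each difference corresponds to.

```python
# Hypothetical coefficients, for illustration only.
b0, b1, b2, b3 = 10.0, 4.0, 2.0, 1.0


def expected_wage(degree, sex):
    """E(Wage) with degree = 1 for a degree, sex = 1 for male."""
    return b0 + b1 * degree + b2 * sex + b3 * degree * sex


male_degree = expected_wage(1, 1)       # b0 + b1 + b2 + b3
male_no_degree = expected_wage(0, 1)    # b0 + b2
female_degree = expected_wage(1, 0)     # b0 + b1
female_no_degree = expected_wage(0, 0)  # b0

# Effect of a degree within each sex.
degree_effect_male = male_degree - male_no_degree        # b1 + b3
degree_effect_female = female_degree - female_no_degree  # b1

# The difference between those two effects isolates b3.
extra_for_males = degree_effect_male - degree_effect_female

print(male_degree, male_no_degree, female_degree, female_no_degree)
print("degree effect (male):", degree_effect_male)
print("degree effect (female):", degree_effect_female)
print("difference between the two:", extra_for_males)
```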

How would you interpret the model: y = b0 + b1*x + u?

b1 is the slope of the function and is interpreted as the change in y for a unitary change in x

How can the basic model: y = b0 + b1*x + u change?

It can be logarithmically transformed in 4 ways

How many ways can the model y = b0 + b1*x + u be transformed logarithmically?

4 ways including its current form known as level-level

What are the ways in which the model y = b0 + b1*x + u can be transformed?

Level-level - what the model currently is - the base model
Log-level - y is logged and x is normal
Level-log - y is normal and x is logged
Log-log - both y and x are logged

What is a log-level transformation?

The y becomes logged so that the equation is now log(y) = b0 + b1*x + u

Here a unitary change in x causes an (approximately) 100*b1 percent change in y

This is called a semi-elasticity as percentages only exist on one side of the equation (y side)

What is a log-log transformation?

Here both the y and x are logged so that the base equation becomes log(y) = b0 + b1*log(x) + u

Here a 1% change in x causes a b1% change in y

Known as a constant elasticity where percentages exist on both sides

What is a level-log transformation?

Here only x is logarithmically transformed so that the equation would be something like y = b0 + b1*log(x) + u

Here a 1% change in x causes a b1/100 unit change in y
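The percentage interpretations for the three logged models are approximations that work well for small changes. The sketch below compares each rule of thumb with the exact change, using a hypothetical b1 = 0.05 and natural logs; both are assumptions made for illustration.

```python
import math

b1 = 0.05  # hypothetical slope coefficient, chosen only for illustration

# Log-level: log(y) = b0 + b1*x. A 1 unit rise in x multiplies y by exp(b1).
log_level_rule = 100 * b1                    # rule of thumb, in percent
log_level_exact = 100 * (math.exp(b1) - 1)   # exact, in percent

# Log-log: log(y) = b0 + b1*log(x). A 1% rise in x multiplies y by 1.01**b1.
log_log_rule = b1                            # rule of thumb, in percent
log_log_exact = 100 * (1.01 ** b1 - 1)       # exact, in percent

# Level-log: y = b0 + b1*log(x). A 1% rise in x adds b1*log(1.01) to y.
level_log_rule = b1 / 100                    # rule of thumb, in units of y
level_log_exact = b1 * math.log(1.01)        # exact, in units of y

print("log-level:", log_level_rule, "vs exact", log_level_exact)
print("log-log:  ", log_log_rule, "vs exact", log_log_exact)
print("level-log:", level_log_rule, "vs exact", level_log_exact)
```

The gap between rule and exact value grows as b1 (or the change in x) gets larger, so the rules are best read as "approximately".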

How do you denote the average value of something e.g. the average wage and what do you need to remember about this?

E(Wage)

Always use it when doing any sort of calculation/manipulation of the equation to find the average wage or the difference in it

State whether the following statement is true or false: The nature of variables that we have as covariates (continuous variables which affect the dependent variable e.g. age) determine the type of model that is used

False, the nature of the variables that you have as covariates determines the interpretation of the coefficients but does not influence the model used - rather, the nature of the dependent variable determines the model used

State whether the following statement is true or false: Binary, ordinal and count variables are not usually used as y variables in regressions

False, they can be used - we just haven’t come across any regressions that use them yet, but apparently will do in the future - some models use binary variables as the dependent variable - an example is a logit model

In the regression Y = 2 - 3*X, what is the effect on E(Y) of an increase in X by 2 units?

Note that in a level-level model with no logs a unitary change in x causes a b1 change in y, therefore a unit increase in X causes a change of -3 in Y - because as X gets bigger a larger number is taken away from 2, an increase in X will cause Y to fall

As a one unit increase in X causes a fall in E(Y) of 3, a 2 unit increase in X causes a 6 unit decrease in E(Y)

In the regression Wage = 2 + 3*D_1st + 1.5*D_2_1 + 1.2*D_2_2, where D’s are categorical variables for degree classification, what is the effect on mean/average wage of having achieved a 2.1 compared to a 1st and 3rd respectively?

Compared to achieving a 1st, someone who has achieved a 2.1 will earn £1.5 less on average (1.5 - 3 = -1.5)

Compared to achieving a 3rd, someone who has achieved a 2.1 will earn £1.5 more on average - because a 3rd is the base level, so someone with a 3rd simply earns the base b0 = 2, which is added to all degree classifications
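The comparison can be laid out as a small lookup in Python. The coefficients come from the question; treating a 3rd as the omitted base category (dummy coefficient 0) is the assumption the answer above relies on.

```python
# Coefficients from the regression in the question.
b0 = 2.0
dummy_coef = {
    "1st": 3.0,
    "2.1": 1.5,
    "2.2": 1.2,
    "3rd": 0.0,  # assumed base category: no dummy in the model, so no addition
}

# Average wage for each classification: the constant plus that group's coefficient.
mean_wage = {grade: b0 + coef for grade, coef in dummy_coef.items()}

# Effect of a 2.1 relative to a 1st and relative to a 3rd.
diff_vs_first = mean_wage["2.1"] - mean_wage["1st"]
diff_vs_third = mean_wage["2.1"] - mean_wage["3rd"]

print(mean_wage)
print("2.1 vs 1st:", diff_vs_first)
print("2.1 vs 3rd:", diff_vs_third)
```

Differences between two non-base categories are just differences between their coefficients, since b0 cancels.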

In the regression ln(Y) = 0.5 + 0.2*ln(X), where X is a continuous variable, how do you interpret the coefficient of ln(X)?

Remember this is a log-log transformation (a constant elasticity) with percentages on both sides:

A 1% increase in X will cause a 0.2% increase in Y

Lecture 1 - Introduction to Modelling Flashcards

(48 cards)