Week 9 - Population Linear Regression Model and Point Estimation Flashcards
What is Conditional Probability? (simple definition)
Probability of a certain event occurring, given that another event has already occurred
Pr(E|F) reads "the (conditional) probability of E given F"
eg. the chance of rain or snow depends on the current weather condition
- eg. the conditional probability of rain, given that we already observe gray skies
eg. what is the conditional probability of drawing an individual who cast a ballot for the Liberals, given that this individual voted in this election? (you can calculate it from the number of voters who voted for each party)
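A minimal Python sketch of Pr(E|F) = Pr(E and F) / Pr(F) applied to the voting example. The head counts below are hypothetical, made up only to show the arithmetic, and are not real election data.

```python
# Hypothetical head counts for a small population (made up for illustration)
counts = {
    ("voted", "Liberal"): 300,
    ("voted", "Conservative"): 250,
    ("voted", "Other party"): 150,
    ("did not vote", None): 300,
}

def prob_liberal_given_voted(counts):
    """Pr(Liberal | voted) = Pr(Liberal and voted) / Pr(voted)."""
    total = sum(counts.values())
    p_liberal_and_voted = counts[("voted", "Liberal")] / total
    p_voted = sum(n for (status, _), n in counts.items() if status == "voted") / total
    return p_liberal_and_voted / p_voted

print(prob_liberal_given_voted(counts))  # 300 / 700, about 0.43
```

Dividing by Pr(voted) restricts attention to voters only, so the answer is the same as taking the Liberal voters (300) out of all voters (700).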
What is a Joint Probability Distribution?
It tells you the probability of drawing a certain combination of X and Y
eg. what is the probability of drawing a certain combination of Vote Choice (Y) and Province (X)?
denoted by: p(X, Y) = Joint probability distribution of X and Y
BASICALLY, IT IS THE PROBABILITY OF TWO EVENTS HAPPENING TOGETHER, BEFORE WE HAVE CONDITIONED ON A SPECIFIC VALUE OF X (it looks like a large 3D histogram)
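A minimal Python sketch of a joint probability distribution p(X, Y) for Province (X) and Vote Choice (Y). The counts are hypothetical and only illustrate how each (X, Y) cell becomes a probability.

```python
# Hypothetical counts of (Province, Vote Choice) combinations (illustrative only)
counts = {
    ("Ontario", "Liberal"): 40, ("Ontario", "Conservative"): 30, ("Ontario", "NDP"): 10,
    ("Prairies", "Liberal"): 10, ("Prairies", "Conservative"): 35, ("Prairies", "NDP"): 15,
    ("Quebec", "Liberal"): 30, ("Quebec", "Conservative"): 10, ("Quebec", "NDP"): 20,
}

def joint_distribution(counts):
    """Turn counts into p(X, Y): the share of the whole population in each (X, Y) cell."""
    total = sum(counts.values())
    return {cell: n / total for cell, n in counts.items()}

p_xy = joint_distribution(counts)
# Probability of drawing someone who lives in the Prairies AND votes Conservative
print(p_xy[("Prairies", "Conservative")])  # 35 / 200 = 0.175
# All cells together form one distribution, so they sum to 1 (up to floating-point rounding)
print(sum(p_xy.values()))
```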
What is a Conditional Probability Distribution?
the conditional probability distribution tells you the probability of drawing a certain value of Y (or a certain range of values of Y) when X is fixed at a specific value
denoted by: p(Y|X)
conditional probability distribution of Y given X
a conditional probability distribution of Y given X may be considered as a slice of a joint probability distribution
**MUST REMEMBER THAT PROBABILITIES SUM TO ONE
eg. what is the conditional probability of vote choice, given that the individual lives in the Prairies?
(remember that there are multiple possible values of Y at a given X, and their probabilities should all sum to one)
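A minimal Python sketch of taking a "slice" of the joint distribution to get p(Y | X = Prairies). The counts are the same kind of hypothetical table used for illustration; rescaling the slice by its own total is equivalent to dividing p(X, Y) by p(X = Prairies).

```python
# Hypothetical counts of (Province, Vote Choice) combinations (illustrative only)
counts = {
    ("Ontario", "Liberal"): 40, ("Ontario", "Conservative"): 30, ("Ontario", "NDP"): 10,
    ("Prairies", "Liberal"): 10, ("Prairies", "Conservative"): 35, ("Prairies", "NDP"): 15,
    ("Quebec", "Liberal"): 30, ("Quebec", "Conservative"): 10, ("Quebec", "NDP"): 20,
}

def conditional_distribution(counts, x):
    """p(Y | X = x): keep only the slice where X = x, then rescale it to sum to one."""
    slice_counts = {y: n for (province, y), n in counts.items() if province == x}
    slice_total = sum(slice_counts.values())
    return {y: n / slice_total for y, n in slice_counts.items()}

p_y_given_prairies = conditional_distribution(counts, "Prairies")
print(p_y_given_prairies)  # Conservative is 35 / 60, about 0.58
# The probabilities within the slice sum to one
print(sum(p_y_given_prairies.values()))
```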
What is the difference between Discrete and Continuous variables?
Discrete variables can only take specific separate values
eg. number of students in a classroom (1, 2, 3 etc..)
Continuous variables can take any value in a range
eg. a person’s height (172.3cm, 172.4cm, etc.)
what is the method of statistical inference for Linear Regression?
the Population Linear Regression Model
Once a random sample is drawn from the population, the variables Y and X become random variables - True or false?
True. If we draw a random sample, the population conditional distribution of Y given X becomes the conditional probability distribution of Y given X
**note that the key difference is that the distribution is now described in terms of probabilities, which can be visualized using a density
The mean of the conditional probability distribution of Y given X is called the…
Conditional expectation or conditional mean, E(Y|X)
which is the value of Y we would expect on average given a particular value of X
we model these conditional means by linear regression
what is the population linear regression model?
E(Y|X) = α + βX
α is the intercept (the value of E(Y|X) when X = 0); β is the slope, i.e., the amount of change in E(Y|X) when X increases by one unit
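A minimal Python sketch of the population linear regression model as a function of X. The values α = 2.0 and β = 0.5 are hypothetical, chosen only to show that E(Y|X) rises by exactly β for every one-unit increase in X.

```python
# Hypothetical population parameters (assumed for illustration only)
ALPHA = 2.0
BETA = 0.5

def conditional_mean(x, alpha=ALPHA, beta=BETA):
    """E(Y|X = x) under the population linear regression model alpha + beta * x."""
    return alpha + beta * x

# The change in E(Y|X) from x to x + 1 is always the slope BETA
for x in [0, 1, 2, 3]:
    print(x, conditional_mean(x), conditional_mean(x + 1) - conditional_mean(x))
```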
What is the error term/disturbance term?
denoted by u_i, it is the difference between the actual value of Y for each individual in the population (denoted by Y_i) and the conditional expectation of Y given that individual’s value of X (denoted by E(Y|X_i)): u_i = Y_i – E(Y|X_i)
why is u_i a random variable?
it is a random variable because Y_i is a random variable, while E(Y|X_i) is a fixed population quantity
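A minimal Python simulation of why u_i is random: at the same value of X, repeated draws of individuals give different Y_i, so u_i = Y_i – E(Y|X_i) changes from draw to draw. The parameters and the normal spread of Y around the line (sd = 1) are hypothetical assumptions.

```python
import random

ALPHA, BETA = 2.0, 0.5  # hypothetical population parameters

def draw_individual(x, rng):
    """Draw one individual's Y at X = x; the normal spread (sd = 1) is an assumption."""
    return ALPHA + BETA * x + rng.gauss(0, 1)

def error_term(y, x):
    """u_i = Y_i - E(Y|X_i)."""
    return y - (ALPHA + BETA * x)

rng = random.Random(42)
x = 4
errors = [error_term(draw_individual(x, rng), x) for _ in range(5)]
print(errors)  # a different u_i in each draw, because Y_i is random
```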
Why are OLS estimators appropriate to use to construct the linear relationship between Y and X in the population?
Under appropriate assumptions the OLS estimators are unbiased and consistent estimators of the population parameters alpha and beta
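A minimal Python sketch of the OLS estimators for one predictor, using the standard formulas β-hat = Σ(x – x̄)(y – ȳ) / Σ(x – x̄)² and α-hat = ȳ – β-hat·x̄. The sample values are hypothetical.

```python
def ols(xs, ys):
    """Return (alpha_hat, beta_hat) from ordinary least squares with one predictor."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    beta_hat = sxy / sxx                # estimated slope
    alpha_hat = y_bar - beta_hat * x_bar  # line passes through (x_bar, y_bar)
    return alpha_hat, beta_hat

# Hypothetical sample
xs = [0, 2, 4, 6, 8, 10]
ys = [2.3, 2.8, 4.1, 4.9, 6.2, 6.8]
print(ols(xs, ys))
```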
what are the four assumptions of regression?
1) The population linear regression E(Y|X) = α + βX is a correctly specified model for E(Y|X)
- it’s a good approximation
2) Shape of p(u|X)
- if there are a relatively small number of observations, we have to assume that the conditional probability distribution of the error term is a normal distribution
- when the number of observations is large, no assumption about its shape is needed
3) Homoscedasticity
- the standard deviation (variance) of p(u|X) is the same across all values of X, i.e., the errors are equally spread out at every value of X, which is what makes our standard errors and confidence intervals trustworthy
4) Random sample, or IID sample
- everyone in the population has the same probability of being included in the sample
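A minimal Python simulation contrasting homoscedastic and heteroscedastic errors. The error standard deviations (2 at every X, versus 0.5 + X) are hypothetical choices; the point is only that under homoscedasticity the spread of u looks the same at every X.

```python
import random
import statistics

def simulate_errors(xs, sd_of_x, n_per_x, seed=0):
    """Draw n_per_x errors u at each X value from a normal with sd = sd_of_x(x)."""
    rng = random.Random(seed)
    return {x: [rng.gauss(0, sd_of_x(x)) for _ in range(n_per_x)] for x in xs}

xs = [0, 5, 10]
homoscedastic = simulate_errors(xs, lambda x: 2.0, 2000)      # same spread everywhere
heteroscedastic = simulate_errors(xs, lambda x: 0.5 + x, 2000)  # spread grows with X

for label, errors in [("homoscedastic", homoscedastic), ("heteroscedastic", heteroscedastic)]:
    spreads = {x: round(statistics.stdev(u), 2) for x, u in errors.items()}
    print(label, spreads)
```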
the mean of the sampling distribution of β-hat is the same as the population linear regression coefficient β. true or false?
True! β-hat is unbiased: if we repeatedly drew random samples and computed β-hat in each one, the average of all those β-hats would equal β. Any single β-hat can miss β, but it is right on average
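A minimal Python simulation of repeated sampling. It assumes a hypothetical population with E(Y|X) = 2 + 0.5X, X uniform on 0 to 10, and normal errors with sd 2; averaging β-hat over 2000 samples lands close to the true β = 0.5.

```python
import random
import statistics

ALPHA, BETA = 2.0, 0.5  # hypothetical population parameters

def ols_slope(xs, ys):
    """OLS estimate of the slope, beta_hat."""
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

def draw_sample(n, rng):
    """Draw a random sample of size n from the hypothetical population."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [ALPHA + BETA * x + rng.gauss(0, 2) for x in xs]
    return xs, ys

rng = random.Random(1)
beta_hats = [ols_slope(*draw_sample(50, rng)) for _ in range(2000)]
print(statistics.fmean(beta_hats))  # close to BETA = 0.5
```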
as the sample size (N) increases, the variability (standard error) of β-hat decreases. true or false?
true! as N increases, β-hat varies less from sample to sample, so it gets closer and closer to β in the population (this is what it means for β-hat to be consistent)
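A minimal Python simulation of the shrinking standard error, under the same hypothetical population as a sketch (E(Y|X) = 2 + 0.5X, X uniform on 0 to 10, normal errors with sd 2). The spread of β-hat across 500 repeated samples is measured for N = 25, 100 and 400.

```python
import random
import statistics

ALPHA, BETA = 2.0, 0.5  # hypothetical population parameters

def ols_slope(xs, ys):
    """OLS estimate of the slope, beta_hat."""
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

def draw_sample(n, rng):
    """Draw a random sample of size n from the hypothetical population."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [ALPHA + BETA * x + rng.gauss(0, 2) for x in xs]
    return xs, ys

rng = random.Random(2)
spread = {}
for n in [25, 100, 400]:
    beta_hats = [ols_slope(*draw_sample(n, rng)) for _ in range(500)]
    spread[n] = statistics.stdev(beta_hats)  # empirical standard error of beta_hat
print(spread)  # gets smaller as N grows
```

Quadrupling N roughly halves the spread, which matches the usual result that the standard error shrinks in proportion to 1/√N.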
explain why we can derive the confidence interval from a normal distribution (two points)
- when the sample size N is large, the sampling distribution of β-hat can be approximated by a normal distribution
- when the sample size N is small, we need to assume that the shape of p(u|X) in the population is a normal distribution
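A minimal Python sketch of a normal-approximation 95% confidence interval, β-hat ± z × SE(β-hat), with z ≈ 1.96 taken from `statistics.NormalDist`. The SE formula assumes homoscedastic errors, and the sample is simulated from a hypothetical population; with a small sample, a t critical value is usually used in place of z.

```python
import math
import random
import statistics

def slope_confidence_interval(xs, ys, level=0.95):
    """Normal-approximation CI for beta: beta_hat +/- z * SE(beta_hat), homoscedastic SE."""
    n = len(xs)
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    beta_hat = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    alpha_hat = y_bar - beta_hat * x_bar
    residuals = [y - (alpha_hat + beta_hat * x) for x, y in zip(xs, ys)]
    s2 = sum(r ** 2 for r in residuals) / (n - 2)  # estimated variance of u
    se = math.sqrt(s2 / sxx)
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    return beta_hat - z * se, beta_hat + z * se

# Hypothetical large sample from E(Y|X) = 2 + 0.5X with normal errors (assumed)
rng = random.Random(7)
xs = [rng.uniform(0, 10) for _ in range(500)]
ys = [2.0 + 0.5 * x + rng.gauss(0, 2) for x in xs]
low, high = slope_confidence_interval(xs, ys)
print(low, high)
```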
what are the two components of the population linear regression model of Y on X?
E(Y_i|X_i) = α + βX_i is a systematic component in the sense that this component models the variation of Y systematically explained by the variation of X (= how the value of Y changes, on average, as the value of X changes) in the population.
u_i = Y_i – E(Y_i|X_i) models the variation of Y that is not systematically explained by the variation of X in the population. From the viewpoint of drawing a random sample from the population, this value is a random draw from the conditional probability distribution p(u|X), and therefore may be considered the “random” component of the model.
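A minimal Python sketch splitting each simulated Y_i into its systematic part E(Y_i|X_i) and its random part u_i. The parameters (α = 2, β = 0.5) and the normal errors with sd 2 are hypothetical; because u_i is defined as Y_i minus its conditional mean, it averages out to about zero at a given X.

```python
import random
import statistics

ALPHA, BETA = 2.0, 0.5  # hypothetical population parameters

def decompose(y, x):
    """Split Y_i into (systematic part E(Y_i|X_i), random part u_i)."""
    systematic = ALPHA + BETA * x  # explained by X
    random_part = y - systematic   # not explained by X
    return systematic, random_part

rng = random.Random(3)
x = 6
draws = [ALPHA + BETA * x + rng.gauss(0, 2) for _ in range(10000)]
parts = [decompose(y, x) for y in draws]
# Each Y_i is exactly the sum of its two components...
assert all(abs(s + u - y) < 1e-9 for (s, u), y in zip(parts, draws))
# ...and the random component averages out to about zero at a given X
print(statistics.fmean(u for _, u in parts))
```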
when you are asked to write “a population linear regression model”, what are the components to remember?
The population linear regression model is:
E(Y_i|X_i) = β0 + β1X_i
eg. when X is ideology and Y is the feeling thermometer score for Trudeau, it would be:
E(Trudeau_i | Ideology_i) = β0 + β1 Ideology_i
1) Define and explain the variables used in place of X and Y
eg. Trudeau = the feeling thermometer about Justin Trudeau (0 = least favorable feeling, 100 = most favorable feeling), and
Ideology = the political ideology score (0 = most liberal, 10 = most conservative).
2) Define and explain what subscript i refers to (it will most likely refer to each individual in the population, or whatever the unit of analysis is)
eg. subscript i refers to each individual in the Canadian population
3) How do you substantively read the left side?
eg. E(Trudeau_i | Ideology_i) is the value of the feeling thermometer about Trudeau that we expect, on average, in the population given a particular ideological position (i.e., a specific value of Ideology_i).
4) Break down the components of the right side of the equation (what it is altogether, what β0 is, what β1X_i is)
eg. the conditional mean is modelled by a straight line, β0 + β1 Ideology_i
β0 is the intercept of the linear regression, which represents the value of E(Trudeau_i | Ideology_i) when Ideology_i = 0. That is, β0 is the score of the feeling thermometer we expect, on average, for the most liberal individuals (Ideology_i = 0).
β1 is the coefficient on the ideology variable (= Ideology), which represents the change in E(Trudeau_i | Ideology_i) with respect to a one unit increase in Ideology_i. In other words, β1 models how the value of the feeling thermometer changes, on average, in the population when an individual’s political ideology becomes more conservative by one unit on the 11-point ideological scale.
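A minimal Python sketch of reading β0 and β1 in the Trudeau example. The values β0 = 70 and β1 = –4 are hypothetical, made up for illustration and not estimates from real data; the function just evaluates the straight line over the 0 to 10 ideology scale.

```python
# Hypothetical coefficient values (made up for illustration, not real estimates)
BETA0 = 70.0  # expected thermometer score for the most liberal individuals (Ideology = 0)
BETA1 = -4.0  # change in expected score per one-unit move toward conservative

def expected_trudeau(ideology):
    """E(Trudeau_i | Ideology_i) = BETA0 + BETA1 * Ideology_i on the 0-10 ideology scale."""
    if not 0 <= ideology <= 10:
        raise ValueError("Ideology is measured on a 0-10 scale")
    return BETA0 + BETA1 * ideology

for ideology in range(11):
    print(ideology, expected_trudeau(ideology))
```

Under these made-up values, expected_trudeau(0) returns β0 itself, and each step up the scale changes the expected score by β1.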
what is the difference between a population linear regression model and a population linear regression model using an error term?
using the error term u_i, we can rewrite the population linear regression model in terms of the actual value of Y (Y_i), instead of its conditional mean (E(Y_i|X_i)).
eg. Trudeau_i = β0 + β1 Ideology_i + u_i.
where u_i = Trudeau_i – E(Trudeau_i | Ideology_i) = Trudeau_i – (β0 + β1 Ideology_i).
(imagine Trudeau_i as the data point, the blue circle, and β0 + β1 Ideology_i as the point on the regression line at the same ideology value; u_i is the vertical distance between them)
explanation:
The idea is that by tracing E(Trudeau | Ideology), which is the typical value of the thermometer given a specific ideological position, across different values of the ideology, we can identify how the value of thermometer systematically varies as the ideological position changes.
In a given random draw of an individual from the population, the value of Trudeau_i would differ from this expected value E(Trudeau_i | Ideology_i) by chance. This difference is represented by an error term u_i and is considered the component of Trudeau_i that cannot be systematically explained by this individual’s ideology.
This second expression of the linear regression model, shown below, is the specification you usually see in actual research articles.
Trudeau_i = β0 + β1 Ideology_i + u_i.
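A minimal Python sketch of the error-term form Trudeau_i = β0 + β1 Ideology_i + u_i. The coefficients (β0 = 70, β1 = –4) and the three individuals are hypothetical; for each one, u_i is the vertical gap between the observed score and the point on the line, and adding it back recovers Trudeau_i.

```python
# Hypothetical population coefficients and individuals (made up for illustration)
BETA0, BETA1 = 70.0, -4.0
individuals = [(2, 75.0), (5, 40.0), (8, 38.0)]  # (Ideology_i, Trudeau_i)

def error_terms(individuals, beta0, beta1):
    """u_i = Trudeau_i - (beta0 + beta1 * Ideology_i) for each individual."""
    return [trudeau - (beta0 + beta1 * ideology) for ideology, trudeau in individuals]

for (ideology, trudeau), u in zip(individuals, error_terms(individuals, BETA0, BETA1)):
    on_line = BETA0 + BETA1 * ideology  # E(Trudeau_i | Ideology_i), the point on the line
    # Error-term form: observed value = point on the line + u_i
    print(ideology, trudeau, on_line, u, on_line + u == trudeau)
```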