Week 3 Flashcards
What does Ceteris Paribus mean? (3)
- This is an important assumption about the coefficient of independent variables.
- The coefficient of an independent variable (especially in multiple regression equations) gives the change in the dependent variable when that independent variable changes by one unit.
- Assumption: this holds only when all other independent variables are held constant.
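A minimal sketch of the ceteris paribus reading. The coefficients B0, B1, B2 and the function name are made up for illustration, not estimated from any data set: raising x1 by one unit while x2 stays fixed changes the prediction by exactly B1.

```python
# Hypothetical fitted model: y = B0 + B1*x1 + B2*x2 (coefficients are made up)
B0, B1, B2 = 10.0, 2.5, -1.0

def predict_y(x1, x2):
    """Predicted y for the illustrative two-variable model."""
    return B0 + B1 * x1 + B2 * x2

# Raise x1 by one unit while holding x2 constant at 4
before = predict_y(3, 4)
after = predict_y(4, 4)
effect_of_x1 = after - before  # equals B1 because x2 did not change
print(effect_of_x1)
```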
What is a control variable?
A variable that is not of interest in itself but could influence the outcome, so we must account for it.
It is held constant in the model to better identify the key independent variable’s effect.
What is the coefficient of determination?
R^2
What does the coefficient of determination (R^2) explain? (2)
- Proportion of the variation in y that is explained by the linear combination of the x variables.
- Ranges between 0 (no explanatory power) and 1 (perfect prediction, 100% of the variation explained).
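A small sketch of how R^2 could be computed from actual and predicted values, using the standard definition R^2 = 1 - SSR/SST. The function name and the toy numbers are assumptions for illustration only.

```python
def r_squared(y, y_hat):
    """R^2 = 1 - SSR/SST: share of the variation in y explained by the model."""
    y_mean = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained variation
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)              # total variation
    return 1 - ss_res / ss_tot

y = [2.0, 4.0, 6.0, 8.0]
perfect = r_squared(y, y)                        # predictions match exactly
mean_only = r_squared(y, [5.0, 5.0, 5.0, 5.0])   # predicting the mean explains nothing
print(perfect, mean_only)
```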
How would you increase the value of R^2?
By adding more relevant independent variables.
- The more independent variables you have, or the more explanatory the independent variables are, the higher R^2 is likely to be. R^2 never falls when a variable is added, even an irrelevant one.
What is the adjusted coefficient of determination (adjusted-R^2)?
- A modified measure of R^2 that takes into account the number of independent variables and the sample size.
- When comparing models, prefer the model with the highest adjusted-R^2.
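A sketch of the usual adjusted-R^2 formula, 1 - (1 - R^2)(n - 1)/(n - k - 1), with n observations and k independent variables. The two models compared here are invented numbers, chosen to show that a slightly higher R^2 bought with many extra variables can give a lower adjusted-R^2.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 for n observations and k independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical models fitted on the same 50 observations
small_model = adjusted_r_squared(0.50, n=50, k=2)
large_model = adjusted_r_squared(0.51, n=50, k=10)
print(small_model, large_model)  # the smaller model scores higher here
```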
What is an F-Test? (4)
- This is an assessment of the model’s goodness of fit.
- It tests for the overall significance of a model.
- It tests whether all coefficients jointly equal zero (the null hypothesis) or at least one differs from zero (the alternative hypothesis).
- Indicates whether your model (with all independent variables) provides a better fit than a model without your independent variables.
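How do you determine the significance of an F-Test?
Look at the F-Statistic’s p-value (significance). A p-value below the chosen significance level (e.g. 0.05) indicates the model is significant overall.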
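A sketch of how the overall F-statistic relates to R^2 for an OLS model with an intercept: F = (R^2 / k) / ((1 - R^2) / (n - k - 1)). The inputs are invented. The p-value itself comes from an F distribution with k and n - k - 1 degrees of freedom, which statistical software normally reports, so it is not computed here.

```python
def f_statistic(r2, n, k):
    """Overall F-statistic from R^2, n observations and k independent variables."""
    explained_per_variable = r2 / k
    unexplained_per_df = (1 - r2) / (n - k - 1)
    return explained_per_variable / unexplained_per_df

# Hypothetical model: R^2 = 0.50 with 2 independent variables and 50 observations
f_value = f_statistic(0.50, n=50, k=2)
print(f_value)
```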
How do you detect outliers using the Quick and Dirty standard deviation rule? (2)
- If the value is outside of the range of +/- 2 or 3 standard deviations from the mean, this is likely an outlier.
- The quick and dirty method is only appropriate for (approximately) normal distributions.
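The standard deviation rule can be sketched as follows. The data and the function name are made up, and the sample standard deviation is used as an assumption about which version is intended.

```python
from statistics import mean, stdev

def sd_outliers(values, k=2):
    """Return values lying more than k sample standard deviations from the mean."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation
    return [v for v in values if abs(v - m) > k * s]

data = [10, 12, 11, 13, 12, 11, 10, 12, 11, 40]  # illustrative values
flagged = sd_outliers(data, k=2)
print(flagged)
```

Note that the outlier itself inflates the mean and standard deviation, so a stricter cut-off such as k=3 may fail to flag it.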
How do you detect outliers using the Interquartile Range Rule? (5)
- Firstly, find Q1 and Q3.
- Then, find the Interquartile Range: IQR = Q3 - Q1.
- Calculate 1.5 x IQR.
- Calculate the bounds: Q3 + 1.5(IQR) and Q1 - 1.5(IQR).
- Does the value fall outside these bounds?
If yes = Outlier
If no = Not an Outlier
You can also use the value 3 instead of 1.5 if you want to be less stringent in identifying outliers (only extreme values are flagged).
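The steps above can be sketched as below. The data are invented, and the quartile method ("inclusive") is an assumption: different textbooks and software use slightly different quartile conventions, which can shift the bounds a little.

```python
from statistics import quantiles

def iqr_outliers(values, multiplier=1.5):
    """Return (outliers, (lower, upper)) using the interquartile range rule."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")  # Q1, median, Q3
    iqr = q3 - q1
    lower = q1 - multiplier * iqr
    upper = q3 + multiplier * iqr
    outliers = [v for v in values if v < lower or v > upper]
    return outliers, (lower, upper)

data = [10, 12, 11, 13, 12, 11, 10, 12, 11, 40]  # illustrative values
outliers, bounds = iqr_outliers(data)            # default multiplier of 1.5
extreme, wide_bounds = iqr_outliers(data, multiplier=3)
print(outliers, bounds)
```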
How should you know when to keep/remove an outlier? (3)
- What is the reason for the outlier?
- How sensitive are your results in the presence of the outlier?
- Is it a mismeasurement or error in observation?
1. When the outlier has an essential impact on your results and is relevant to the study, it should be kept in the data set.
2. However, if the outlier is the result of a mismeasurement or incorrect data entry, or is unlikely to occur generally in a study like this, it should be removed.
3. If unsure, run the regression with and without the outlier: is the relationship found only because of the outlier? This can be elaborated on depending on the situation. Generally, if the outlier alone creates the relationship, remove it.
Never remove an outlier for convenience!
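A sketch of the "with vs. without" check for a simple regression with one independent variable. The data are invented so that the last point is the outlier, and the helper name is hypothetical.

```python
def ols_slope(xs, ys):
    """Slope of a simple least-squares regression of y on x."""
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return sxy / sxx

# Illustrative data: the last observation (20, 30) is the suspected outlier
xs = [1, 2, 3, 4, 5, 20]
ys = [5, 3, 6, 4, 5, 30]

slope_with = ols_slope(xs, ys)
slope_without = ols_slope(xs[:-1], ys[:-1])
print(slope_with, slope_without)
```

In this made-up case the slope is steep with the outlier and close to zero without it, which suggests the relationship is driven by that single observation.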
How do you interpret a log-transformed independent variable?
Level-log model:
- A 1% increase in X leads to a change in Y of approximately [b(1)/100] units.
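A quick numerical check of the level-log rule, assuming a model of the form Y = b(0) + b(1) x ln(X). The coefficient values below are made up. The exact change for a 1% increase in X is b(1) x ln(1.01), which is close to b(1)/100.

```python
import math

# Hypothetical level-log model: y = B0 + B1 * ln(x) (coefficients are made up)
B0, B1 = 5.0, 200.0

def predict_level_log(x):
    """Predicted y for the illustrative level-log model."""
    return B0 + B1 * math.log(x)

x = 50.0
exact_change = predict_level_log(x * 1.01) - predict_level_log(x)  # B1 * ln(1.01)
approx_change = B1 / 100                                           # rule of thumb
print(exact_change, approx_change)
```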
What do you look for when comparing and explaining models and model statistics? (1, 2a, 2b, 3a, 3b, 3c, 3d, 4, 5)
1. What is the dependent variable?
2. a. How many models are there?
   b. What is changed between each model?
3. Are the following included?
   a. R^2 values
   b. F-statistic
   c. Adjusted R^2
   d. Number of Observations
4. Does the model explain the variation in DV well? (i.e. 20-30% variation).
5. What conclusion could we make from this / best model?