Multiple regression Flashcards
What does multiple regression tell us?
“How does the mean of the dependent variable (DV) change as a function of the independent variables (IVs)?” It lets us partial out the effect of each IV.
- Ceteris paribus: “keeping everything else constant, the effect of this IV on the DV is xxx”. We search for a correlation between the variables rather than causality → this should be reflected in the wording of the hypothesis.
What does multiple regression do?
Tells us how the mean of the DV changes as a function of the IVs: we can partial out the effect of each IV.
What type of scale does the DV need to be in multiple regression?
Dependent variable: Y in the equation needs to be on an interval or ratio scale (a continuous variable). The model describes how the mean of the DV changes with the values on the right-hand side (both the DV and the IVs can be constructs of scales, e.g. anxiety).
What is ceteris paribus?
• Ceteris paribus: being able to isolate the specific effect of one independent variable, i.e. the effect of that IV when everything else in the regression is held constant. Note that this depends on which other variables are included in the regression.
What estimation method do we use in multiple regression?
Estimation:
• Ordinary least squares (OLS): the method of fitting our regression slope, where we estimate the betas so that the sum of squared residuals (SSR) is minimized, i.e. the error term is as small as possible.
o Residuals are the distances between the predicted line and the observed values. We want the residuals to be as small as possible, as that indicates the model represents the data well (a good fit). If the SSR is large, the model does not represent the data well.
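A minimal numpy sketch of the idea: OLS picks the betas that minimize the SSR. The data here (salary predicted from experience and age) is entirely made up for illustration.

```python
import numpy as np

# Hypothetical data: predict salary (DV) from years of experience and age (IVs)
rng = np.random.default_rng(0)
n = 100
X = np.column_stack([np.ones(n),                 # intercept column
                     rng.uniform(0, 20, n),      # years of experience
                     rng.uniform(25, 60, n)])    # age
y = 30000 + 1500 * X[:, 1] + 200 * X[:, 2] + rng.normal(0, 5000, n)

# OLS: choose the betas that minimize the sum of squared residuals (SSR)
betas, *_ = np.linalg.lstsq(X, y, rcond=None)

residuals = y - X @ betas
ssr = np.sum(residuals ** 2)   # the quantity OLS minimizes
print(betas)                   # estimated intercept and slopes
print(ssr)
```

Because the model includes an intercept, the residuals sum to (numerically) zero, and the estimated slopes land close to the true values used to generate the data.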
What is the principle of mean centering?
Mean centering:
Is just a transformation of the data: the slope remains the same and the betas are unchanged. The idea is that it is sometimes easier to compare with the mean/average. We are therefore interested in how e.g. our salary changes if we are one year older or younger than the average employee.
- Moves the point of reference to the mean.
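A small sketch (with made-up age/salary numbers) showing that centering leaves the slope untouched and moves the intercept to the prediction at the average:

```python
import numpy as np

# Hypothetical example: age (IV) and salary (DV)
age = np.array([25., 30., 35., 40., 50.])
salary = np.array([30000., 34000., 39000., 43000., 52000.])

age_centered = age - age.mean()   # point of reference is now the average age

# The slope is unchanged by centering; only the intercept moves
slope_raw, _ = np.polyfit(age, salary, 1)
slope_centered, intercept_centered = np.polyfit(age_centered, salary, 1)
print(slope_raw, slope_centered)  # identical slopes

# With a centered IV, the intercept is the predicted salary at the average age
print(intercept_centered)
```

Note that with a centered predictor the intercept equals the mean of the DV, which is exactly the "compare with the average employee" interpretation above.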
What can we use standardized coefficients for?
Standardization of coefficients: we subtract the mean and divide by the standard deviation.
o We do this if we want to compare the coefficients to each other.
o Interpretation: a one standard deviation increase in an IV changes the DV by the standardized beta, measured in standard deviations of the DV.
o When standardized, the effects of the IVs can be compared to each other, so we can see which one has the largest effect.
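A sketch of the comparison, using hypothetical IVs on very different scales; after z-scoring everything, the betas are directly comparable:

```python
import numpy as np

# Hypothetical data: two IVs on very different scales
rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(40, 10, n)       # e.g. age in years
x2 = rng.normal(5000, 1000, n)   # e.g. income in some currency
y = 2.0 * x1 + 0.01 * x2 + rng.normal(0, 1, n)

def zscore(v):
    # subtract the mean, divide by the standard deviation
    return (v - v.mean()) / v.std(ddof=1)

Xz = np.column_stack([np.ones(n), zscore(x1), zscore(x2)])
betas_std, *_ = np.linalg.lstsq(Xz, zscore(y), rcond=None)
print(betas_std[1:])   # standardized betas: comparable effect sizes
```

Here a one-SD increase in x1 moves y by more SDs than a one-SD increase in x2 does, so x1 has the larger effect, even though its raw coefficient (2.0 vs 0.01) tells you nothing comparable on its own.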
How do we interpret the p-value?
The p-value is the smallest significance level at which the null hypothesis would be rejected. We compare it to a chosen significance level (commonly 0.05): if the p-value is below that level, we reject the null hypothesis; otherwise we fail to reject it.
What is the principle of goodness-of-fit ?(Model summary table in a regression)
Model summary table
o R^2 (R squared) = goodness of fit = SSM / SST → judges the overall quality of the model: how much of the DV variance is explained by the model's independent variables. It is a proportion between 0 and 1 (0% and 100%).
- Small catch: because of how it is calculated, R^2 tends to increase every time we add another IV. Therefore we also calculate the adjusted R^2.
o Adjusted R^2 = penalizes the model for each IV added. This is used to judge the actual difference between blocks/models.
- Like R^2, it judges the overall quality of our model and tells us how much variance is explained by it.
- Tells us whether the variables added last increase the explanatory power of the model.
- Better than plain R^2, as it penalizes the model when one more IV is added.
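The two fit measures can be computed by hand from the residuals; a sketch with hypothetical simulated data (k is the number of IVs, excluding the intercept):

```python
import numpy as np

# Hypothetical fit: compute R^2 and adjusted R^2 by hand
rng = np.random.default_rng(2)
n, k = 50, 2                      # k = number of IVs (excluding intercept)
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(0, 1, n)

betas, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ betas

ssr = np.sum(resid ** 2)              # sum of squared residuals
sst = np.sum((y - y.mean()) ** 2)     # total sum of squares
r2 = 1 - ssr / sst                    # share of DV variance explained
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)   # penalizes each extra IV
print(r2, adj_r2)
```

(1 - SSR/SST is algebraically the same as SSM/SST.) Adjusted R^2 is always below R^2 whenever at least one IV is in the model, which is exactly the penalty described above.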
o Durbin-Watson: a measure that indicates whether we have problems with correlated residuals. Values below 1.5 or above 2.5 indicate problems.
What is a confidence interval?
Confidence interval
- Puts the coefficient into perspective.
- Calculated as the coefficient +/- 1.96 * standard error = the upper and lower boundary around the coefficient.
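A minimal sketch of the 95% interval, with hypothetical values for the estimate and its standard error:

```python
# Hypothetical coefficient and standard error
beta = 1500.0          # hypothetical unstandardized coefficient
se = 300.0             # hypothetical standard error

lower = beta - 1.96 * se
upper = beta + 1.96 * se
print(lower, upper)    # 912.0 2088.0
```

Since 0 lies outside the interval (912.0, 2088.0), this hypothetical coefficient would be significantly different from zero at the 5% level.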
How can we test for significance and what are the principles behind t-statistics and p-values?
Significance
• Is our coefficient significantly different from 0? H0: Beta_1 = 0. We test this with a t-test: the t-statistic follows a Student's t-distribution and is calculated as the unstandardized Beta_1 divided by the standard error of Beta_1.
- Standard error = how sure we are about our data. The standard error tells us how accurate the mean of any given sample from a population is likely to be compared to the true population mean. When the standard error increases, i.e. the sample means are more spread out, it becomes more likely that any given mean is an inaccurate representation of the true population mean. Usually, standard errors decrease as we increase the amount of data.
o S.E. = the standard deviation divided by the square root of the sample size.
The t-statistic in regression is based on the variance of the coefficient (beta) estimate. How do we know if the t-statistic is big enough? We look at the p-value (significance).
o P-value: given the observed t-statistic, the smallest significance level at which H0 would be rejected.
o *** p-value < 0.001 (0.1%), ** p-value < 0.01 (1%), * p-value < 0.05 (5%)
o The significance level tells us the probability of being wrong (rejecting a true H0) – this probability should be low.
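The whole test fits in a few lines; a sketch with made-up values for the estimate and its standard error (for large samples the t-distribution is close to the standard normal, so the p-value is approximated here with the normal CDF via `math.erf`):

```python
import math

# Hypothetical coefficient test of H0: Beta_1 = 0
beta = 1500.0   # hypothetical unstandardized Beta_1
se = 300.0      # hypothetical standard error of Beta_1

t = beta / se   # t-statistic: coefficient divided by its standard error
print(t)        # 5.0

# Two-sided p-value, normal approximation to the t-distribution
p_approx = 2 * (1 - 0.5 * (1 + math.erf(t / math.sqrt(2))))
print(p_approx)  # far below 0.05, so we reject H0
```

In real output you would read this off the regression table, where the exact t-distribution (with n - k - 1 degrees of freedom) is used instead of the normal approximation.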
Significance level and types of errors
• Type 1 error: we reject H0 although it is true → with a 95% confidence level we should only do this 5% of the time.