04_Survey research - analysis regression Flashcards
What is bivariate/multivariate regression analysis?
Bivariate: 1 IV and 1 DV
Multivariate: n IVs and 1 DV
What is the goal of a bivariate regression analysis?
Goal: to find a straight line that best represents a linear association between the two variables
What is the OLS method?
Ordinary least squares (OLS) method:
- Minimization of the sum of squared vertical distances between the regression line and the observed y values
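A minimal sketch of the OLS idea in plain Python, assuming one IV and a small made-up data set. The data values and the helper names ols_fit and sum_squared_residuals are illustrative assumptions, not part of the course material. It computes the least-squares line and compares its sum of squared residuals with that of an arbitrary alternative line.

```python
def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: co-variation of x and y divided by the variation of x
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    # The OLS line always passes through the point (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return intercept, slope


def sum_squared_residuals(x, y, intercept, slope):
    """Sum of squared vertical distances between the line and the y values."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


x = [1, 2, 3, 4, 5]  # hypothetical IV values
y = [2, 4, 5, 4, 5]  # hypothetical DV values

intercept, slope = ols_fit(x, y)
ssr_ols = sum_squared_residuals(x, y, intercept, slope)
ssr_other = sum_squared_residuals(x, y, 2.0, 0.7)  # an arbitrary other line

print(intercept, slope, ssr_ols, ssr_other)
```

Any other straight line, such as the arbitrary one above, yields a larger sum of squared residuals than the OLS line.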
What are the explained and unexplained deviations?
Explained deviation:
= predicted value - average of the DV
–>amount by which the model improves on simply using the average
Unexplained deviation:
= actual value - predicted value (the residual)
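A small worked example for a single observation, assuming a hypothetical fitted line (intercept 2.2, slope 0.6) and a hypothetical DV average of 4. All numbers are made up for illustration. It shows that the explained and the unexplained deviation add up to the total deviation from the average.

```python
mean_y = 4.0                 # average of the DV (hypothetical)
intercept, slope = 2.2, 0.6  # hypothetical fitted regression line

x_i, y_i = 5, 5              # one observation (hypothetical)
predicted = intercept + slope * x_i

explained = predicted - mean_y  # what the model adds over the average
unexplained = y_i - predicted   # residual: what the model still misses
total = y_i - mean_y            # total deviation from the average

print(explained, unexplained, total)
```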
What is the interpretation of the central goodness-of-fit measure, the coefficient of determination R^2?
R^2: % of the variation of the dependent variable that can be explained through the independent variable(s) (“variance accounted for”)
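A minimal sketch of how R^2 can be computed for a bivariate model, assuming made-up data; the function name r_squared is an illustrative choice. R^2 is taken here as the explained sum of squares divided by the total sum of squares.

```python
def r_squared(x, y):
    """Share of the variation in y explained by a bivariate OLS line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
             / sum((xi - mean_x) ** 2 for xi in x))
    intercept = mean_y - slope * mean_x
    predicted = [intercept + slope * xi for xi in x]
    # Explained variation: predicted values around the average
    ss_explained = sum((p - mean_y) ** 2 for p in predicted)
    # Total variation: actual values around the average
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    return ss_explained / ss_total


x = [1, 2, 3, 4, 5]  # hypothetical IV values
y = [2, 4, 5, 4, 5]  # hypothetical DV values
r2 = r_squared(x, y)
print(r2)
```

For these made-up values the model accounts for about 60% of the variation in the DV.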
What are the parallels between the F-test (R^2) and ANOVA?
- Variance explained by the model = variance between groups
- Variance not explained by the model = Variance within groups
What are the two levels of goodness of fit?
Goodness of fit:
- Global goodness of fit: overall model
- Local goodness of fit: single model parameters
What is the global level of fit?
Global level: Fit of the whole regression model
- F-test: does the overall model explain any variance of the DV?
- R^2: how much variance of the DV is explained by the model?
- Adjusted R^2: to what degree does the proportion of explained variance depend on the model size?
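A minimal sketch of the global F-test statistic expressed through R^2, assuming the usual form F = (R^2 / p) / ((1 - R^2) / (n - p - 1)) with n observations and p IVs. The function name and the input values are illustrative assumptions. The resulting value would be compared against a critical value of the F distribution with p and n - p - 1 degrees of freedom.

```python
def f_statistic(r2, n, p):
    """F value for H0: the model explains no variance (all p slopes are 0)."""
    explained_per_parameter = r2 / p
    unexplained_per_df = (1 - r2) / (n - p - 1)
    return explained_per_parameter / unexplained_per_df


# Hypothetical model: R^2 = 0.6, 5 observations, 1 IV
f_value = f_statistic(r2=0.6, n=5, p=1)
print(f_value)
```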
What is the local level of fit?
Local level:
- t-test: is there any relationship between the IV and DV in the population?
- Interpretation of the sign of the regression coefficients
- Comparison of coefficients by means of standardized regression coefficients
Why do we need an adjusted R^2?
Key disadvantage of the coefficient of determination R^2: it increases even if irrelevant independent variables are added to the model
–>Desired: a simple model
–>Adjusted R^2: accounts for the number of parameters –>penalizes more complex models
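A minimal sketch of the penalty, assuming the common formula adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1). The two models and their R^2 values are made up for illustration.

```python
def adjusted_r_squared(r2, n, p):
    """R^2 corrected for the number of IVs p, given n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Hypothetical: four extra IVs raise R^2 only from 0.60 to 0.61
small_model = adjusted_r_squared(0.60, n=50, p=1)
large_model = adjusted_r_squared(0.61, n=50, p=5)

print(small_model, large_model)
```

Although the larger model has the higher R^2, its adjusted R^2 is lower, which favors the simpler model.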
Why do we have to standardize regression coefficients?
Reason: the size of a regression coefficient depends on the scale of its variable
–>Problem: difficult to compare the influence of different IVs
–>Solution: standardized regression coefficients (beta)
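A minimal sketch of the scale problem, assuming a bivariate model and the usual standardization beta = b * sd(IV) / sd(DV). The data are made up: the same IV is recorded once in meters and once in centimeters.

```python
from statistics import mean, stdev


def slope(x, y):
    """Unstandardized OLS slope b of y on x."""
    mx, my = mean(x), mean(y)
    return (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
            / sum((xi - mx) ** 2 for xi in x))


def standardized_slope(x, y):
    """Standardized coefficient: b rescaled by the standard deviations."""
    return slope(x, y) * stdev(x) / stdev(y)


x_m = [1, 2, 3, 4, 5]           # hypothetical IV in meters
x_cm = [100 * v for v in x_m]   # same IV in centimeters
y = [2, 4, 5, 4, 5]             # hypothetical DV

b_m, b_cm = slope(x_m, y), slope(x_cm, y)
beta_m, beta_cm = standardized_slope(x_m, y), standardized_slope(x_cm, y)

print(b_m, b_cm)        # differ by a factor of 100
print(beta_m, beta_cm)  # identical up to rounding
```

The unstandardized coefficient changes with the unit of measurement, while the standardized coefficient stays the same, which is what makes it comparable across IVs.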
What are the key implications of the relationship between F-test, R^2, and sample size?
Key implications:
- The larger the sample, the smaller the minimum R^2 (coefficient of determination) of a significant regression model
- The more regression coefficients p, the higher the minimum R^2, or the larger the sample, needed to achieve a significant regression model
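A minimal sketch of these implications, assuming the F statistic F = (R^2 / p) / ((1 - R^2) / (n - p - 1)) solved for R^2. The critical F values passed in are approximate 5% table values and are an assumption of this example; look up the exact value for your own degrees of freedom.

```python
def minimum_r_squared(f_crit, n, p):
    """Smallest R^2 that reaches the critical F value for given n and p."""
    return p * f_crit / (p * f_crit + n - p - 1)


# Approximate 5% critical values (assumed, taken from an F table):
# F(1, 18) ~ 4.41, F(1, 98) ~ 3.94, F(5, 94) ~ 2.31
small_sample = minimum_r_squared(4.41, n=20, p=1)
large_sample = minimum_r_squared(3.94, n=100, p=1)
more_ivs = minimum_r_squared(2.31, n=100, p=5)

print(small_sample, large_sample, more_ivs)
```

With one IV, the required R^2 drops from roughly 0.20 at n = 20 to roughly 0.04 at n = 100; with five IVs at n = 100 it rises again to roughly 0.11.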
What are the 3 most important OLS-method assumptions?
(A assumptions)
1. No relevant IVs are missing, and no included IV is irrelevant
–>Omitted variables: variables that are relevant for the model but are not included
2. The true relationship between the IVs and the DV is linear
3. The estimated parameters are constant over all observations
What is the issue in the third case of omitted variables?
Case 3: The variable is related to both the DV and other IVs
We should account for these variables –>otherwise we have an omitted variable bias
–>Simpson's paradox
Implications:
- Unpredictable bias of the model parameters and the corresponding hypothesis tests
- Serious problem when the researcher is interested in making conclusions about causality
- Less problematic when prediction is the primary goal
- Extent of the problem depends on how the omitted variable relates to the DV and the other IVs
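A minimal sketch of omitted variable bias, assuming a made-up, noise-free data set in which the DV is built as y = 1 * x1 + 2 * x2 and x2 is correlated with x1. The variable names, the data, and the closed-form two-IV solution used here are illustrative assumptions.

```python
def cross(a, b):
    """Sum of cross-products of deviations from the means."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))


x1 = [1, 2, 3, 4, 5]                         # IV of interest (hypothetical)
x2 = [2, 1, 4, 3, 6]                         # second IV, correlated with x1
y = [1 * a + 2 * b for a, b in zip(x1, x2)]  # true effects: 1 and 2

# Model that omits x2: a bivariate regression of y on x1 only
b1_omitted = cross(x1, y) / cross(x1, x1)

# Model that includes x2: closed-form OLS solution for two IVs
s11, s22, s12 = cross(x1, x1), cross(x2, x2), cross(x1, x2)
s1y, s2y = cross(x1, y), cross(x2, y)
det = s11 * s22 - s12 ** 2
b1_full = (s22 * s1y - s12 * s2y) / det
b2_full = (s11 * s2y - s12 * s1y) / det

print(b1_omitted)        # x1 also picks up the effect of the omitted x2
print(b1_full, b2_full)  # effects as constructed in the data
```

When x2 is left out, the coefficient of x1 is about 3 instead of the constructed 1, because x1 absorbs part of the effect of x2. Including x2 recovers both constructed effects.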
What is the first case of omitted variables?
(Omitted variable)
Case 1: The variable has no relationship with the other IVs or the DV
- Implications of omitting such variables tend to be positive, since irrelevant IVs reduce the precision of hypothesis tests
- Can be excluded –>including it brings no benefit