04_Survey research - regression analysis Flashcards

1
Q

What is bivariate/multivariate regression analysis?

A

Bivariate: 1 IV and 1 DV

Multivariate: n IVs and 1 DV

2
Q

What is the goal of a bivariate regression analysis?

A

Goal: to find a straight line that best represents a linear association between the two variables

3
Q

What is the OLS method?

A

Ordinary least squares method:
- Minimizes the sum of squared distances between the regression line and the observed y values
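
A minimal sketch (my own illustration, not from the deck; made-up numbers, assuming NumPy) of the closed-form bivariate OLS solution:

```python
import numpy as np

# Hypothetical data: one IV (x) and one DV (y)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Bivariate OLS closed form: slope = cov(x, y) / var(x),
# intercept = mean(y) - slope * mean(x); this line minimizes
# the sum of squared vertical distances to the data points
slope = np.cov(x, y, bias=True)[0, 1] / np.var(x)
intercept = y.mean() - slope * x.mean()
print(f"y_hat = {intercept:.2f} + {slope:.2f} * x")
```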

4
Q

What are the explained and unexplained deviations?

A

Deviation explained
= predicted - average
–>amount by which the model improves on simply using the average

Deviation unexplained:
= actual - predicted
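
For example (made-up numbers): if the average of y is 5, the model predicts 8, and the actual value is 10, then the explained deviation is 8 - 5 = 3, the unexplained deviation is 10 - 8 = 2, and the total deviation is 10 - 5 = 5.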

5
Q

What is the interpretation of the central goodness-of-fit measure, the coefficient of determination R^2?

A

R^2: % of the variation of the dependent variable that can be explained by the independent variable(s) (“variance accounted for”)
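
A minimal sketch (my own, made-up numbers, assuming NumPy) computing R^2 as the share of total variation the model explains:

```python
import numpy as np

# Hypothetical actual values and model predictions
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
y_hat = np.array([2.0, 4.0, 6.0, 8.0, 10.0])

ss_total = np.sum((y - y.mean()) ** 2)    # total variation of the DV
ss_residual = np.sum((y - y_hat) ** 2)    # unexplained variation
r_squared = 1 - ss_residual / ss_total    # "variance accounted for"
print(f"R^2 = {r_squared:.3f}")
```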

6
Q

What are the parallels between the F-test (R^2) and ANOVA?

A
  • Variance explained by the model = variance between groups
  • Variance not explained by the model = Variance within groups
7
Q

What are the two levels of goodness of fit?

A

Goodness of fit:
- Global goodness of fit: overall model
- Local goodness of fit: individual model parameters

8
Q

What is the global level of fit?

A

Global level: Fit of the whole regression model
- F-test: does the overall model explain any variance of the DV?
- R^2: how much variance of the DV is explained by the model?
- Adjusted R^2: to what degree does the proportion of explained variance depend on the model size?
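
A sketch (my own illustration with simulated data, assuming the statsmodels package) of where these global fit statistics are reported:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 2))                  # two hypothetical IVs
y = 1.0 + 0.5 * X[:, 0] + rng.normal(size=100)

model = sm.OLS(y, sm.add_constant(X)).fit()
print(model.fvalue, model.f_pvalue)        # global F-test
print(model.rsquared, model.rsquared_adj)  # R^2 and adjusted R^2
```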

9
Q

What is the local level of fit?

A

Local level:
- t-test: is there any relationship between the IV and the DV in the population?
- Interpretation of the sign of the regression coefficients
- Comparison of coefficients by means of standardized regression coefficients

10
Q

Why do we need an adjusted R^2?

A

Key disadvantage of the coefficient of determination R^2: it increases even if irrelevant independent variables are added to the model

–>Desired: a simple model

–>Adjusted R^2: accounts for the number of parameters –>penalizes more complex models
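
The usual adjustment formula (stated for illustration; n = sample size, p = number of IVs):

```python
def adjusted_r_squared(r2: float, n: int, p: int) -> float:
    """Penalizes R^2 for the number of predictors p at sample size n."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(adjusted_r_squared(0.30, n=50, p=1))   # ~0.285: small penalty
print(adjusted_r_squared(0.30, n=50, p=10))  # ~0.121: more IVs, bigger penalty
```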

11
Q

Why do we have to standardize regression coefficients?

A

Reason: the size of a regression coefficient depends on the scale of the chosen variable

–>Problem: difficult to compare the influence of the IVs

–>Solution: standardized regression coefficients (beta)
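
A minimal sketch (my own, made-up scales, assuming NumPy) of the rescaling beta = b * sd(IV) / sd(DV):

```python
import numpy as np

def standardized_beta(b: float, x: np.ndarray, y: np.ndarray) -> float:
    """Rescale an unstandardized coefficient b by sd(IV) / sd(DV)."""
    return b * x.std(ddof=1) / y.std(ddof=1)

rng = np.random.default_rng(3)
income = rng.normal(50_000, 10_000, size=100)  # IV on a large scale
y = rng.normal(5, 1, size=100)                 # DV on a small scale
# An unstandardized b of 0.0001 looks tiny, but the beta is comparable
# across IVs regardless of their measurement units:
print(standardized_beta(0.0001, income, y))    # ~1.0
```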

12
Q

What are the key implications of the relationship between F-test, R^2, and sample size?

A

Key implications:

  • The larger the sample, the smaller the minimum R^2 (coefficient of determination) of a significant regression model
  • The more regression coefficients p, the higher the minimum coefficient of determination R^2, or the larger the sample to achieve a significant regression model
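
A sketch (my own, assuming SciPy) of the standard relation F = (R^2 / p) / ((1 - R^2) / (n - p - 1)), showing that the same small R^2 is significant only with a larger sample:

```python
from scipy import stats

def f_statistic(r2: float, n: int, p: int) -> float:
    """Global F-test statistic as a function of R^2, sample size n, and p IVs."""
    return (r2 / p) / ((1 - r2) / (n - p - 1))

for n in (30, 300):
    f = f_statistic(0.05, n=n, p=2)
    p_value = stats.f.sf(f, 2, n - 2 - 1)
    print(n, round(f, 2), round(p_value, 4))  # R^2 = 0.05 is significant only at n = 300
```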
13
Q

What are the 3 most important OLS-method assumptions?
(A assumptions)

A

1. No relevant IVs are missing, and the considered IVs are not irrelevant

–>Omitted variables: variables that are relevant for the model but are not included

2. The true relationship between the IVs and the DV is linear

3. The estimated parameters are constant over all observations

14
Q

What is the issue in the third case of omitted variables?

A

Case 3: the variable has a relationship with both the DV and the other IVs:
we should account for these variables –>otherwise we get an omitted variable bias
–>Simpson's paradox

Implications:
–>unpredictable bias of model parameters and corresponding hypothesis tests

  • Serious problem when the researcher is interested in making conclusions about causality
  • Less problematic when prediction is the primary goal
  • Extent of the problem depends on the relationships between the variables
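
A minimal simulation sketch (my own, made-up coefficients, assuming NumPy and statsmodels) showing how omitting a confounder z biases the coefficient of x:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 5000
z = rng.normal(size=n)              # confounder related to both x and y
x = 0.8 * z + rng.normal(size=n)    # IV of interest
y = 1.0 * x + 2.0 * z + rng.normal(size=n)

full = sm.OLS(y, sm.add_constant(np.column_stack([x, z]))).fit()
omitted = sm.OLS(y, sm.add_constant(x)).fit()
print(full.params[1])     # ~1.0: unbiased when z is included
print(omitted.params[1])  # ~2.0: biased upward when z is omitted
```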
15
Q

What is the first case of omitted variables?

(Omitted variable)

A

Case 1: the variable has no relationship with the other IVs or the DV
- Implications of omitting the variable: tend to be positive, since irrelevant IVs reduce the precision of hypothesis tests
- Can be excluded –>no benefit from including it

16
Q

What is the second case of omitted variables?

(Omitted variable)

A

Case 2:
the variable only has a relationship with the DV

  • Implications of omitting the variable: parameters are unbiased; it can be excluded –>the regression coefficients stay the same
  • If included –>contributes to a larger R^2 and indirectly to a lower standard error –>more conservative estimates
17
Q

What is the fourth case of omitted variables?

(Omitted variable)

A

Case 4: the variable only has a relationship with the other IVs

Implications of omitting the variable:

  • Can be excluded; omitting it decreases the regression coefficients (such a variable is referred to as a suppressor), which leads to more conservative estimates
  • Possibility that the estimated relationships between the other IVs and the DV are too small
18
Q

What are the B assumptions (4-7) in regression analysis?

A

4. The expected value of the residuals is zero
5. The residual variance is constant across all observations (if there is heteroskedasticity, we need to account for it)
6. Residuals are uncorrelated
7. Residuals are normally distributed

–>Possible problems: omitted variables and outliers

19
Q

What are the C assumptions (8-10) in regression analysis?

A

8. Residuals are uncorrelated with the IVs
9. There is no perfect linear relationship between the IVs
–>Problem: multicollinearity

10. Measurement is error-free

Possible problems: omitted variables and multicollinearity

20
Q

What is the problem with multicollinearity (assumption 9 in regression analysis)?

A

Assumption: there is no perfect linear relationship between the IVs

Problem: it also arises in the case of strong, but not perfect, relationships:
- Very problematic if correlations are higher than 0.9
- A test should always be conducted if correlations are higher than 0.7

21
Q

What are the consequences of the Multicollinearity problem?

A

Consequences of multicollinearity:
- Reduced statistical power
- Reduced precision of parameter estimation
- Results are highly sensitive to small changes in the underlying data

22
Q

What are the 3 methods to detect multicollinearity?

A

1. Rules of thumb:
- Varying and implausible signs of the regression coefficients
- High R^2 despite many non-significant regression coefficients
- Standardized coefficients > 1
- Bivariate correlation > 0.9

2. Methods relying on the decomposition of variance (see the sketch after this list):
- Tolerance < 0.10 (proportion of the variance of an IV that is not explained by the other IVs)
- Variance inflation factor (VIF): VIF > 10 (or > 2.5)

3. Factor-analytic methods
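
A detection sketch (my own, assuming statsmodels) computing VIF and tolerance for two nearly collinear IVs:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
x2 = x1 + 0.1 * rng.normal(size=200)   # nearly collinear with x1
X = sm.add_constant(np.column_stack([x1, x2]))

for j in (1, 2):  # skip the constant column
    vif = variance_inflation_factor(X, j)
    print(f"VIF x{j} = {vif:.1f}, tolerance = {1 / vif:.3f}")  # far above 10 / below 0.10
```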

23
Q

What are potential strategies for dealing with multicollinearity?

A

1. Do nothing –>rules of thumb:
- t-values of the coefficients > 2
- R^2 of the model higher than the R^2 of alternative models

2. Use additional information:
- Collect new data
- Model the relationship between the IVs
- Analyze the impact of the sum of the problematic variables

3. Transform the data:
- Ridge regression (a constant is added to the diagonal elements of the covariance matrix; see the sketch after this list)

4. Use other methods to assess the relative importance of the IVs:
- Methods that allow researchers to determine the relative importance of the IVs even if multicollinearity is present
–>more robust methods

5. Use stability tests:
- Repeated model estimation based on slightly smaller subsamples of the original data: check whether the parameter estimates remain stable
- Leave out single IVs
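
A minimal closed-form ridge sketch (my own illustration, assuming NumPy; lam is a hypothetical penalty constant):

```python
import numpy as np

def ridge(X: np.ndarray, y: np.ndarray, lam: float) -> np.ndarray:
    """Ridge estimate: lam is added to the diagonal of X'X before solving,
    which stabilizes the estimates when the IVs are nearly collinear."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
```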

24
Q

What are outliers and influential observations?

A

Outliers: an outlier is defined with respect to the distribution; that is, it ‘lies out’ of the usual range of the data

Influential observations: a case may be judged influential if important features of the analysis are altered substantially when it is deleted from the data
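
One standard diagnostic for influential observations (not named in the deck) is Cook's distance; a sketch assuming statsmodels:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.normal(size=50)
y = 2 * x + rng.normal(size=50)
x[0], y[0] = 5.0, -10.0   # plant one influential observation

fit = sm.OLS(y, sm.add_constant(x)).fit()
cooks_d = fit.get_influence().cooks_distance[0]
print(np.argmax(cooks_d), cooks_d.max())  # observation 0 dominates
```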

25
Q

What are the consequences of outliers and influential observations?

A

Consequences:
- Unpredictable bias of parameter estimates
- Unpredictable bias of test statistics
- Overestimation of the proportion of the explained variance of the dependent variable

26
Q

You conducted a regression and want to include a new IV, age, with a mean of 25 and a standard deviation of 0 –>is this useful in this case?

A

Not useful

  • Because age has a standard deviation of 0, it is only a constant
  • A constant is perfectly collinear with the intercept, so it cannot explain any variation in the DV and adds no predictive value to the model