04_Survey research - analysis regression Flashcards by Lukas Fahrländer

What is bivariate/multivariate regression analysis?

Bivariate: 1 IV and 1 DV

Multivariate: n IVs and 1 DV

What is the goal of a bivariate regression analysis?

Goal: to find a straight line that best represents a linear association between the two variables

What is the OLS method?

Ordinary least squares (OLS) method:
- Minimization of the sum of squared vertical distances between the regression line and the observed y values
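
A minimal Python sketch of this idea for the bivariate case, assuming the closed-form OLS solution (slope = sum of cross-deviations divided by the sum of squared x-deviations). The function names `fit_ols` and `sum_squared_residuals` and the toy data are illustrative assumptions, not part of the course material.

```python
def fit_ols(x, y):
    """Return (intercept, slope) of the least-squares line for paired data."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: sum of cross-deviations divided by sum of squared x-deviations.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    # The OLS line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return intercept, slope


def sum_squared_residuals(x, y, intercept, slope):
    """Sum of squared vertical distances between the line and the y values."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical toy data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
intercept, slope = fit_ols(x, y)
best = sum_squared_residuals(x, y, intercept, slope)
# Any other line, e.g. one with a slightly steeper slope, fits worse.
worse = sum_squared_residuals(x, y, intercept, slope + 0.1)
```

Comparing `best` with `worse` illustrates the "least squares" property: moving away from the fitted line can only increase the sum of squared residuals.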

What are the explained and unexplained deviations?

Deviation explained:
= predicted - average
-> the amount by which the model is better than simply using the average

Deviation unexplained:
= actual - predicted
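
Written as a formula, the two deviations add up to the total deviation of each observation from the average. The notation below is an assumption for illustration: $y_i$ is the actual value, $\hat{y}_i$ the predicted value, and $\bar{y}$ the average. For an OLS model with an intercept, the same split also holds for the sums of squares.

```latex
\underbrace{y_i - \bar{y}}_{\text{total deviation}}
  = \underbrace{(\hat{y}_i - \bar{y})}_{\text{explained}}
  + \underbrace{(y_i - \hat{y}_i)}_{\text{unexplained}}
\qquad
\sum_i (y_i - \bar{y})^2
  = \sum_i (\hat{y}_i - \bar{y})^2
  + \sum_i (y_i - \hat{y}_i)^2
```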

What is the interpretation of the central goodness-of-fit measure, the coefficient of determination R^2?

R^2: % of the variation of the dependent variable that can be explained by the independent variable(s) (“variance accounted for”)
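
As a rough illustration, R^2 can be computed as 1 minus the ratio of unexplained to total variation. The sketch below assumes the predictions come from an OLS model with an intercept; the function name `r_squared` and the numbers are hypothetical.

```python
def r_squared(y, y_pred):
    """Share of the variation in y that is accounted for by the predictions."""
    mean_y = sum(y) / len(y)
    # Total variation: squared deviations of the actual values from the average.
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    # Unexplained variation: squared deviations of actual from predicted values.
    ss_unexplained = sum((yi - pi) ** 2 for yi, pi in zip(y, y_pred))
    return 1 - ss_unexplained / ss_total


# Hypothetical data: predictions from the line 2.2 + 0.6 * x for x = 1..5
y = [2, 4, 5, 4, 5]
y_pred = [2.8, 3.4, 4.0, 4.6, 5.2]
r2 = r_squared(y, y_pred)
```

Here `r2` comes out at 0.6, which would be read as "60% of the variation of the DV is accounted for by the model".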

What are the parallels between the F-test (R^2) and ANOVA?

Variance explained by the model = variance between groups
Variance not explained by the model = Variance within groups

What are the two levels of goodness of fit?

Goodness of fit:
- Global goodness of fit: overall model
- Local goodness of fit: single model parameters

What is the global level of fit?

Global level: Fit of the whole regression model
- F-test: does the overall model explain any variance of the DV?
- R^2: how much variance of the DV is explained by the model?
- Adjusted R^2: to what degree does the proportion of explained variance depend on the model size?

What is the local level of fit?

Local level:
- t-test: is there any relationship between the IV and DV in the population?
- Interpretation of the sign of the regression coefficients
- Comparison of coefficients by means of standardized regression coefficients

Why do we need an adjusted R^2?

Key disadvantage of the coefficient of determination R^2: it increases (or at least never decreases) even if irrelevant independent variables are added to the model

-> Desired: a simple model

-> Adjusted R^2: accounts for the number of parameters -> penalizes more complex models
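
A small sketch of the penalty, assuming the common textbook formula adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - p - 1), with n observations and p independent variables. The function name and the example values are hypothetical.

```python
def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 for n observations and p independent variables."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than parameters (n > p + 1)")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Hypothetical case: adding an irrelevant third IV lifts R^2 only marginally,
# from 0.500 to 0.505, in a sample of 30 observations.
before = adjusted_r_squared(0.500, n=30, p=2)
after = adjusted_r_squared(0.505, n=30, p=3)
```

In this example R^2 goes up while the adjusted R^2 goes down (`after < before`), which is the intended penalty for the extra parameter.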

Why do we have to standardize regression coefficients?

Reason: the size of a regression coefficient depends on the scale of the chosen variable

-> Problem: difficult to compare the influence of the IVs

-> Solution: standardize the regression coefficients (beta)
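
The scale problem can be sketched in Python, assuming the usual standardization beta = b * sd(x) / sd(y). The variable names and data are hypothetical: the same income is measured once in euros and once in thousands of euros.

```python
from statistics import stdev


def slope(x, y):
    """Unstandardized bivariate OLS slope b."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def standardized_coefficient(x, y):
    """Standardized coefficient: b rescaled by the standard deviations."""
    return slope(x, y) * stdev(x) / stdev(y)


# Hypothetical data: identical information on two different scales
income_eur = [20000, 30000, 40000, 50000, 60000]
income_k = [v / 1000 for v in income_eur]
satisfaction = [2, 4, 5, 4, 5]

b_eur = slope(income_eur, satisfaction)   # tiny, because the unit is 1 euro
b_k = slope(income_k, satisfaction)       # 1000 times larger
beta_eur = standardized_coefficient(income_eur, satisfaction)
beta_k = standardized_coefficient(income_k, satisfaction)
```

The unstandardized slopes differ by a factor of 1000, while the standardized coefficients are identical, so only the latter can be compared across IVs measured in different units.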

What are the key implications of the relationship between F-test, R^2, and sample size?

Key implications:

- The larger the sample, the smaller the minimum R^2 (coefficient of determination) needed for a significant regression model
- The more regression coefficients p, the higher the minimum R^2, or the larger the sample, needed to achieve a significant regression model
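
Both implications can be traced back to the F statistic, sketched here under the assumption of the standard formula F = (R^2 / p) / ((1 - R^2) / (n - p - 1)) with n observations and p IVs. The function name and the numbers are hypothetical, and the sketch does not compute critical values, which also depend on the degrees of freedom.

```python
def f_statistic(r2, n, p):
    """F value of the overall model test for a given R^2, n and p."""
    explained_per_parameter = r2 / p
    unexplained_per_df = (1 - r2) / (n - p - 1)
    return explained_per_parameter / unexplained_per_df


# Same modest R^2 of 0.10 in different settings
f_small_sample = f_statistic(0.10, n=30, p=2)
f_large_sample = f_statistic(0.10, n=300, p=2)
f_more_ivs = f_statistic(0.10, n=30, p=5)
```

With the same R^2, the F value grows with the sample size and shrinks as more coefficients are estimated, which is why a small R^2 can be significant in a large sample but not in a small or heavily parameterized one.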

What are the 3 most important OLS-method assumptions?
(A assumptions)

1. No relevant IVs are missing and the considered IVs are not irrelevant

-> Omitted variables: variables that are relevant for the model but are not included

2. The true relationship between the IVs and the DV is linear

3. The estimated parameters are constant over all observations

What is the issue with the third case of omitted variables?

Case 3: The variable has a relationship with both the DV and other IVs
- We should account for these variables -> otherwise we have an omitted variable bias
-> Simpson's paradox

Implications:
- Unpredictable bias of model parameters and corresponding hypothesis tests
- Serious problem when the researcher is interested in making conclusions about causality
- Less problematic when prediction is the primary goal
- The extent of the problem depends on the relationship between the variables
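
A noise-free toy example can make the bias visible. It assumes the standard omitted-variable result for one included and one omitted IV: the slope of the short regression equals the true effect of the included IV plus the effect of the omitted IV times the slope of the omitted IV on the included one. All names and numbers are hypothetical.

```python
def slope(x, y):
    """Bivariate OLS slope of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical data: x2 is related to x1 AND to the DV (case 3)
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
true_b1, true_b2 = 1, 2
y = [true_b1 * a + true_b2 * b for a, b in zip(x1, x2)]  # no noise added

# Regression of y on x1 alone, i.e. with x2 omitted
short_slope = slope(x1, y)
# What the omitted-variable formula predicts for that slope
implied_slope = true_b1 + true_b2 * slope(x1, x2)
```

Here the short regression reports a slope of 3 for `x1` although its true effect is 1, because `x1` also picks up the effect of the omitted `x2`.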

What is the first case of omitted variables?

(Omitted variable)

Case 1: The variable has no relationship with other IVs or the DV
- Implications of omitting the variable: tend to be positive, since irrelevant IVs reduce the precision of hypothesis tests
- Can be excluded -> including it brings no benefit

What is the second case of omitted variables?

(Omitted variable)

Case 2: The variable only has a relationship with the DV

Implications of omitting the variable:
- Parameters are unbiased; the variable can be excluded -> the regression coefficients of the other IVs stay the same
- If included -> contributes to a larger R^2 and, indirectly, to lower standard errors -> omitting it makes the hypothesis tests more conservative

What is the fourth case of omitted variables?

(Omitted variable)

Case 4: The variable only has a relationship with other IVs

Implications of omitting the variable:
- Can be excluded, but omitting it decreases the regression coefficients of the other IVs; such a variable is referred to as a suppressor, and omitting it leads to more conservative estimates
- Possibility that the estimated relationships between the other IVs and the DV are too small

What are the B assumptions (4-7) in regression analysis?

4. The expected value of the residuals is zero
5. The residual variance is constant across all observations (homoskedasticity; if there is heteroskedasticity, we need to account for it)
6. Residuals are uncorrelated
7. Residuals are normally distributed

-> Possible problems: omitted variables and outliers

What are the C assumptions (8-10) in regression analysis?

8. Residuals are uncorrelated with the IVs
9. There is no perfect linear relationship between the IVs
-> Problem: multicollinearity

10. Measurement is error-free

Possible problems: omitted variables and multicollinearity

What is the problem with multicollinearity (assumption no. 9 in regression analysis)?

Assumption: There is no perfect linear relationship between the IVs

Problem: issues also arise in the case of strong, but not perfect, relationships:
- Very problematic if correlations are higher than 0.9
- Tests should always be conducted if correlations are higher than 0.7

What are the consequences of the Multicollinearity problem?

Consequences of multicollinearity:
- Reduced statistical power
- Reduced precision of parameter estimation
- Results are highly sensitive to small changes in the underlying data

What are the 3 methods to detect multicollinearity?

1. Rules of thumb:
- Varying and implausible signs of the regression coefficients
- High R^2 despite many non-significant regression coefficients
- Standardized coefficients > 1
- Bivariate correlation > 0.9

2. Methods relying on the decomposition of variance:
- Tolerance < 0.10 (tolerance: proportion of the variance of an IV that is not explained by the other IVs)
- Variance inflation factor (VIF): VIF > 10 (or, more strictly, > 2.5)

3. Factor-analytic methods
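
For the variance-decomposition measures, a sketch restricted to two IVs: in that special case the share of one IV's variance explained by the other equals their squared Pearson correlation, so tolerance = 1 - r^2 and VIF = 1 / tolerance. With more than two IVs, each IV would instead have to be regressed on all the others. Function names and data are hypothetical.

```python
def pearson_r(x, y):
    """Pearson correlation between two equally long lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / (sxx * syy) ** 0.5


def tolerance_and_vif(x1, x2):
    """Tolerance and VIF for a model with exactly two IVs."""
    r = pearson_r(x1, x2)
    tolerance = 1 - r ** 2   # variance of one IV not explained by the other
    vif = 1 / tolerance
    return tolerance, vif


# Hypothetical IVs with a fairly strong correlation (about 0.82)
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
tolerance, vif = tolerance_and_vif(x1, x2)
```

In this example the VIF is roughly 3.1: above the strict threshold of 2.5 but well below 10, so whether it counts as a problem depends on which rule of thumb is applied.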

Potential strategies for dealing with multicollinearity

1. Do nothing -> rules of thumb:
- t-values of the coefficients > 2
- R^2 of the model higher than R^2 of alternative models

2. Use additional information:
- Collect new data
- Model the relationship between the IVs
- Analyze the impact of the sum of the problematic variables

3. Transform the data:
- Ridge regression (a constant is added to the diagonal elements of the covariance matrix)

4. Use other methods to assess the relative importance of the IVs:
- Methods that allow researchers to determine the relative importance of the IVs even if multicollinearity is present
-> more robust methods

5. Use stability tests:
- Repeated model estimation based on slightly smaller subsamples of the original data: check whether the parameter estimates remain stable
- Leave out single IVs

What are outliers and influential observations?

Outliers: An outlier is defined with respect to the distribution; that is, it “lies out” of the usual range of the data

Influential observations: A case may be judged influential, if important features of the analysis are altered substantially when it is deleted from the data
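
The deletion idea behind influential observations can be sketched as a leave-one-out check: refit the model without each case and see how much the slope moves. This is similar in spirit to deletion diagnostics such as DFBETA, but it is only an illustrative sketch with hypothetical names and data, not a standard implementation.

```python
def slope(x, y):
    """Bivariate OLS slope of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def slope_changes_when_deleted(x, y):
    """Change in the slope when each observation is deleted in turn."""
    full_slope = slope(x, y)
    changes = []
    for i in range(len(x)):
        x_rest = x[:i] + x[i + 1:]
        y_rest = y[:i] + y[i + 1:]
        changes.append(slope(x_rest, y_rest) - full_slope)
    return changes


# Hypothetical data: the last case lies far from the pattern of the others
x = [1, 2, 3, 4, 10]
y = [1, 2, 3, 4, 0]
changes = slope_changes_when_deleted(x, y)
most_influential = max(range(len(x)), key=lambda i: abs(changes[i]))
```

Deleting the last case flips the slope from -0.2 to 1.0, so it would be judged influential, while deleting any of the other cases changes the slope far less.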

What are the consequences of outliers and influential observations?

**Consequences**:
- **Unpredictable bias** of parameter estimates
- **Unpredictable bias of test statistics**
- **Overestimation of the proportion** of the **explained variance of the dependent variable**

You conducted a regression and want to include a new IV, Age, with a mean of 25 and a standard deviation of 0 -> **is this useful in this case**?

**Not useful**
- Because age has **0 standard deviation**, it is only a constant
- **Adding a constant** cannot explain any variation in the DV (it is perfectly collinear with the intercept), so it does **not add predictive value** to the model
