4A Flashcards

1
Q

Why do we need assumptions for regression analysis?

A

Any estimation method (like OLS) requires some assumptions to calculate regression coefficients and standard errors

Before you conduct a regression analysis, you should check if these assumptions are satisfied

If an assumption is violated, either the regression coefficients or the standard errors (or both) can be biased (i.e., systematically underestimated or overestimated)

Fortunately, there is usually a solution to adjust your estimates

2
Q

Four assumptions for OLS regression

A
  1. homoscedasticity
  2. independent observations
  3. no large outliers
  4. normally distributed residuals
3
Q

What is homoscedasticity?

A

Variance of the Y-variable does not depend on X
OLS assumes that the variance of the Y-variable is not predicted by X
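
A rough way to see this in data is to compare the spread of the residuals for low and high values of X. The sketch below is only an illustration: the data are made up, and the helper names `ols_fit` and `residual_variance_by_half` are hypothetical.

```python
import statistics

def ols_fit(x, y):
    """Return (intercept, slope) of a simple OLS regression of y on x."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def residual_variance_by_half(x, y):
    """Variance of the residuals in the lower and the upper half of X."""
    intercept, slope = ols_fit(x, y)
    # Pair each X with its residual and sort by X.
    pairs = sorted((xi, yi - (intercept + slope * xi)) for xi, yi in zip(x, y))
    half = len(pairs) // 2
    low = [e for _, e in pairs[:half]]
    high = [e for _, e in pairs[half:]]
    return statistics.pvariance(low), statistics.pvariance(high)

# Made-up data in which Y scatters more widely around the line as X grows.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.1, 1.9, 3.2, 3.8, 6.0, 4.5, 9.0, 6.0]
var_low, var_high = residual_variance_by_half(x, y)
print(var_low, var_high)
```

Under homoscedasticity the two variances would be roughly equal; here the upper half is clearly more spread out.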

4
Q

Why is homoscedasticity a problematic assumption?

A

It is a problematic assumption because almost anything that could plausibly change Y could also change the variation in Y

5
Q

What happens when this assumption is violated?

A

… error in the slide; ask whether he wants to correct it

6
Q

Solution for the violation of homoscedasticity

A

Heteroscedasticity-robust standard errors can be obtained (with SPSS syntax)
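
To show what such a correction does, here is a minimal Python sketch (not the SPSS syntax from the course) for a regression with one X. It assumes the simplest robust variant, usually called HC0; which variant the SPSS syntax uses is not stated on this card. The data and the function name `slope_standard_errors` are made up.

```python
import math
import statistics

def slope_standard_errors(x, y):
    """Return (slope, regular SE, HC0 robust SE) for a simple OLS regression."""
    n = len(x)
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    dx = [xi - mean_x for xi in x]
    sxx = sum(d * d for d in dx)
    slope = sum(d * (yi - mean_y) for d, yi in zip(dx, y)) / sxx
    intercept = mean_y - slope * mean_x
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Regular SE: assumes one common residual variance for all observations.
    se_regular = math.sqrt(sum(e * e for e in resid) / (n - 2) / sxx)
    # HC0 robust SE: each observation contributes its own squared residual.
    se_robust = math.sqrt(sum((d * e) ** 2 for d, e in zip(dx, resid))) / sxx
    return slope, se_regular, se_robust

# Made-up data in which the spread of Y grows with X.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.1, 1.9, 3.2, 3.8, 6.0, 4.5, 9.0, 6.0]
slope, se_regular, se_robust = slope_standard_errors(x, y)
print(slope, se_regular, se_robust)
```

The slope is the same in both cases; only the standard error changes.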

7
Q

The White-test

A

The White test tests the null hypothesis that there is homoscedasticity.

If the White test gives a p-value lower than 0.05, the null hypothesis is rejected and we conclude that the assumption of homoscedasticity has been violated.
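
The logic of the test can be sketched in Python for a regression with one X. This is an illustration under assumptions, not the SPSS procedure: the squared residuals are regressed on X and X squared, and n times the R-squared of that auxiliary regression is compared with a chi-square distribution with 2 degrees of freedom. The data and the function name `white_test` are made up.

```python
import math
import statistics

def centered(values):
    """Subtract the mean from every value."""
    mean = statistics.fmean(values)
    return [v - mean for v in values]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def white_test(x, y):
    """Return (LM statistic, p-value, auxiliary R-squared) for one X."""
    n = len(x)
    dx, dy = centered(x), centered(y)
    slope = dot(dx, dy) / dot(dx, dx)
    # Residuals of the main regression (the intercept drops out after centering).
    resid = [b - slope * a for a, b in zip(dx, dy)]
    u = centered([e * e for e in resid])
    # Auxiliary regressors: X and X squared, made orthogonal to each other.
    z1 = dx
    z2 = centered([xi * xi for xi in x])
    coef = dot(z2, z1) / dot(z1, z1)
    z2 = [b - coef * a for a, b in zip(z1, z2)]
    explained = dot(u, z1) ** 2 / dot(z1, z1) + dot(u, z2) ** 2 / dot(z2, z2)
    r_squared = explained / dot(u, u)
    lm = n * r_squared
    # For 2 degrees of freedom the chi-square upper tail is exp(-x / 2).
    p_value = math.exp(-lm / 2)
    return lm, p_value, r_squared

# Made-up data in which the spread of Y grows with X.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.1, 1.9, 3.2, 3.8, 6.0, 4.5, 9.0, 6.0]
lm, p_value, r_squared = white_test(x, y)
print(lm, p_value)
```

With only eight cases the test has little power, so a p-value above 0.05 here would not show that the assumption holds.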

8
Q

SPSS output with heteroscedasticity-robust standard errors

A

This table gives you your estimates with standard errors (and p-values) that are corrected for heteroscedasticity
The regression coefficients (both unstandardized and standardized) are unchanged
The standardized coefficients (Beta) are not in this table, but you can just obtain them from the regular regression menu/syntax

9
Q

Heteroscedasticity-robust standard errors

A

If there is homoscedasticity, the robust standard errors are just as good as the regular standard errors. If there is heteroscedasticity, they are better

10
Q

What are independent observations?

A

The sample size plays an important role in the calculation of the standard errors. The reasoning is that the larger your sample is, the less likely it is to differ from the population by chance.
This reasoning only holds if each observation truly provides a new piece of information.
This means that there should not be clusters of observations that all have the same or similar values.
All observations should be truly independent.

11
Q

What happens when the assumption of independent observations is violated?

A

When this assumption is violated, the estimated standard errors will be too small
This implies that the p-value is too small and that you may incorrectly reject the null-hypothesis
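
A small Python illustration of why this happens, using the standard error of a mean instead of a regression coefficient to keep it short. The scores are made up. Entering each respondent four times adds no new information, yet the formula treats the sample as four times larger.

```python
import math
import statistics

def standard_error_of_mean(values):
    """Standard error of the mean, assuming independent observations."""
    return statistics.stdev(values) / math.sqrt(len(values))

# Made-up scores from five truly independent respondents.
independent = [4.0, 6.0, 5.0, 7.0, 3.0]
# The same five respondents, each entered four times (one cluster per person).
clustered = [v for v in independent for _ in range(4)]

se_independent = standard_error_of_mean(independent)
se_clustered = standard_error_of_mean(clustered)
print(se_independent, se_clustered)
```

The second standard error is less than half the first, although both datasets contain exactly the same information.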

12
Q

How to check the assumption of independent observations?

A

Think about how the data was collected
Errors are usually independent when these two conditions are satisfied:
1. There is only one observation for each case (e.g., person/country)
2. The cases were all sampled from the same population with the same procedure

The first condition is, for example, violated in panel data where the same respondents are interviewed every year
The second condition is, for example, violated when a dataset consists of several surveys from several countries that were put together

13
Q

How to solve the violation of independent observations?

A

This problem can be solved by using a combination of cluster robust standard errors and control variables

Cluster robust standard errors adjust for the clustering within a unit that you specify (e.g., within countries)

In the example of the ESS, you would do two things:
Estimate cluster robust standard errors with clustering “within countries”
Add “country” as a control variable

  • Unfortunately, cluster robust standard errors are not readily available in SPSS (which is why we will not use them in this course)
    If you ever need them, you can download an add-on for SPSS or switch to another program (e.g., Stata or R)
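
For readers who switch to another program, the idea can be sketched in plain Python for a regression with one X. This is an assumption-laden illustration: it uses the basic cluster-robust formula without any small-sample correction, the data and country labels are made up, and the function name `cluster_robust_slope_se` is hypothetical. Real analyses need many more clusters than three.

```python
import math
import statistics
from collections import defaultdict

def cluster_robust_slope_se(x, y, cluster):
    """Return (slope, regular SE, cluster-robust SE) for a simple OLS regression."""
    n = len(x)
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    dx = [xi - mean_x for xi in x]
    sxx = sum(d * d for d in dx)
    slope = sum(d * (yi - mean_y) for d, yi in zip(dx, y)) / sxx
    intercept = mean_y - slope * mean_x
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    se_regular = math.sqrt(sum(e * e for e in resid) / (n - 2) / sxx)
    # Add up (x - mean) * residual within each cluster before squaring,
    # so that similar errors inside a cluster do not count as separate evidence.
    score = defaultdict(float)
    for d, e, g in zip(dx, resid, cluster):
        score[g] += d * e
    se_cluster = math.sqrt(sum(s * s for s in score.values())) / sxx
    return slope, se_regular, se_cluster

# Made-up data: X is a country-level variable and respondents within
# a country have similar Y values.
x = [1, 1, 1, 2, 2, 2, 3, 3, 3]
y = [2.0, 2.2, 2.1, 4.0, 4.1, 3.9, 3.0, 3.2, 3.1]
country = ["NL", "NL", "NL", "BE", "BE", "BE", "DE", "DE", "DE"]
slope, se_regular, se_cluster = cluster_robust_slope_se(x, y, country)
print(slope, se_regular, se_cluster)
```

In this example the cluster-robust standard error is larger than the regular one, which is the usual direction when observations within a cluster resemble each other.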
14
Q

What are outliers?

The absence of large outliers

A

Outliers are observations that deviate extremely from the mean
Outliers can be detected by the z-score of an observation (i.e., how many standard deviations it is removed from the mean)

Observations with a z-score smaller than -3 or larger than +3 are commonly considered outliers
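
The rule can be written out in a few lines of Python. The values are made up and the function name `find_outliers` is hypothetical; the cut-off of 3 is the one from this card.

```python
import statistics

def find_outliers(values, threshold=3.0):
    """Return (value, z-score) pairs whose absolute z-score exceeds the threshold."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [
        (v, round((v - mean) / sd, 2))
        for v in values
        if abs((v - mean) / sd) > threshold
    ]

# Made-up variable: nineteen values close to 50 and one extreme value.
values = [48, 52, 50, 49, 51, 47, 53, 50, 49, 51,
          50, 48, 52, 50, 49, 51, 50, 50, 49, 95]
outliers = find_outliers(values)
print(outliers)
```

Note that in very small samples no observation can reach a z-score of 3, so the rule only works with enough cases.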

15
Q

What happens when the assumption of no large outliers is violated?

A

When there are outliers in the data, these observations can have a disproportional effect on the estimates, such that the results mainly reflect the outliers and not the other cases.

This problem manifests itself both in the regression coefficients and the standard errors

This problem is especially severe when the outliers are extreme and/or if they constitute a large share of the sample

16
Q

How to check for an outlier?

A

This assumption can be checked by simply calculating z-scores for all the variables (both X and Y) and checking if some z-scores are smaller than -3 or larger than +3

17
Q

How to solve the outliers problem?

A

If you have outliers, you typically run two analyses:
1. An analysis with the outliers included
2. An analysis with the outliers removed

If the results are (substantively) similar, you have no problem
If the results differ, you should consider carefully why the outliers exist and how you would interpret the results with or without them

Unfortunately, there is no clear answer on how to act in this situation that works in all situations
Always motivate clearly what you did
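
The two analyses can be sketched in Python as follows. The data are made up, and the sketch assumes that the last case has already been flagged as an outlier; `ols_slope` is a hypothetical helper.

```python
import statistics

def ols_slope(x, y):
    """Slope of a simple OLS regression of y on x."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx

# Made-up data: Y is roughly 2 * X, except for the last case.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 13.9, 40.0]
flagged = {7}  # index of the case assumed to be an outlier

# Analysis 1: outliers included.
slope_with = ols_slope(x, y)

# Analysis 2: outliers removed.
x_kept = [xi for i, xi in enumerate(x) if i not in flagged]
y_kept = [yi for i, yi in enumerate(y) if i not in flagged]
slope_without = ols_slope(x_kept, y_kept)

print(slope_with, slope_without)
```

Here one case roughly doubles the slope, so the results differ substantively and the outlier needs a closer look.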

18
Q

What are normally distributed residuals?

The assumption of normally distributed residuals

A

If the sample is not normally distributed AND too small, the sampling distribution is not normally distributed

More specifically, the residuals of the regression analysis have to be normally distributed when the sample is small

This is almost the same as saying that the Y-variable must be normally distributed, but not quite

19
Q

What happens when this assumption is violated?

The assumption of normally distributed residuals. Only in small samples

A

When this assumption is violated, the sampling distribution is not normally distributed and the t-values, p-values, and confidence intervals do not make sense anymore.

20
Q

How do you check this assumption?

The assumption of normally distributed residuals. Only in small samples

A

We can test this with SPSS syntax that does three things:
  • Saves the residuals of the regression analysis
  • Makes a histogram of them
  • Asks for a formal test of normality
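
The SPSS syntax itself is not on this card. As a stand-in, the three steps can be sketched in Python. Everything here is an assumption made for illustration: the data are made up, the histogram is a plain text one, and the formal test is the Jarque-Bera test, swapped in because it needs only the standard library; it is not necessarily one of the tests that SPSS reports.

```python
import math
import statistics

def ols_residuals(x, y):
    """Step 1: residuals of a simple OLS regression of y on x."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

def text_histogram(values, bins=5):
    """Step 2: one text line per bin, with one '#' per observation."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    return ["{:7.2f} | {}".format(lo + i * width, "#" * c)
            for i, c in enumerate(counts)]

def jarque_bera(values):
    """Step 3: Jarque-Bera statistic and its large-sample p-value."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)
    # Chi-square with 2 degrees of freedom: upper tail is exp(-x / 2).
    return jb, math.exp(-jb / 2)

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.1, 1.9, 3.2, 3.8, 6.0, 4.5, 9.0, 6.0]
residuals = ols_residuals(x, y)
print("\n".join(text_histogram(residuals)))
jb, p_value = jarque_bera(residuals)
print(jb, p_value)
```

The Jarque-Bera p-value relies on a large-sample approximation, so in the small samples where this assumption matters most it is only a rough indication.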

21
Q

The histogram

A

The histogram compares the distribution of the data (the bars) to the normal distribution (the curve)

22
Q

The test

the assumption of normally distributed residuals

A

Table: Tests of Normality
This table provides two tests of the null hypothesis that the data is normally distributed. If the p-values are smaller than 0.05, we reject the null hypothesis and conclude that the assumption of normality is violated.

23
Q

How to solve a violation of this assumption?

The assumption of normally distributed residuals

A

You can either increase your sample size or move to other estimation methods, which we will not discuss in this course