4A Flashcards

Question 1

Q

Why do we need assumptions for regression analysis?

Answer

A

Any estimation method (like OLS) requires some assumptions to calculate regression coefficients and standard errors

Before you conduct a regression analysis, you should check if these assumptions are satisfied

If an assumption is violated, either the regression coefficients or the standard errors (or both) can be biased (i.e., structurally underestimated or overestimated)

Fortunately, there is usually a solution to adjust your estimates

Question 2

Q

Four assumptions for OLS regression

Answer

A

homoscedasticity
independent observations
no large outliers
normally distributed residuals

Question 3

Q

What is homoscedasticity?

Answer

A

Homoscedasticity means that the variance of the Y-variable does not depend on X
In other words, OLS assumes that the spread of Y around the regression line is the same at every value of X

Question 4

Q

Why is homoscedasticity a problematic assumption?

Answer

A

It is a problematic assumption because almost anything that could plausibly change Y could also change the variation in Y

Question 5

Q

What happens when this assumption is violated?

Answer

A

… error in the slide; ask whether he wants to correct it

Question 6

Q

Solution for the violation of homoscedasticity

Answer

A

Heteroscedasticity robust standard errors can be obtained (with SPSS syntax)
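To make the idea concrete outside SPSS, here is a minimal Python sketch for a regression with one predictor. It compares the regular standard error of the slope with the simplest heteroscedasticity robust version (often called HC0 or White standard errors). The function names and the data are made up for illustration, and software packages may apply small-sample corrections, so their numbers can differ slightly.

```python
import math

def ols_simple(x, y):
    """Fit y = a + b*x by OLS; return intercept, slope, and residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    return intercept, slope, residuals

def slope_standard_errors(x, y):
    """Return (slope, regular SE, HC0 robust SE) for the slope."""
    n = len(x)
    _, slope, e = ols_simple(x, y)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    # Regular SE: assumes one common residual variance for all cases
    s2 = sum(ei ** 2 for ei in e) / (n - 2)
    se_regular = math.sqrt(s2 / sxx)
    # HC0 SE: every squared residual is weighted by its own (x - mean)^2
    meat = sum((xi - x_bar) ** 2 * ei ** 2 for xi, ei in zip(x, e))
    se_robust = math.sqrt(meat / sxx ** 2)
    return slope, se_regular, se_robust

# Made-up data: the spread of y is larger at the extreme values of x
x = [-2, -2, -1, -1, 1, 1, 2, 2]
y = [0, -4, 0, -2, 2, 0, 4, 0]

slope, se_regular, se_robust = slope_standard_errors(x, y)
print(f"slope={slope:.3f} regular SE={se_regular:.4f} robust SE={se_robust:.4f}")
```

The slope is identical in both cases; only the standard error (and therefore the p-value) changes.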

Question 7

Q

The White-test

Answer

A

The White-test tests the null-hypothesis that there is homoscedasticity.

If the White-test gives a p-value lower than 0.05, the null-hypothesis is rejected and we conclude that the assumption of homoscedasticity has been violated.
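The logic of the test can be sketched in Python for the case of a single predictor. This is an illustration under stated assumptions, not the SPSS procedure: the squared residuals are regressed on X and X squared, and the statistic n times R-squared is compared with a chi-square distribution with 2 degrees of freedom. All function names and the data are hypothetical, and the chi-square approximation is only reliable in reasonably large samples.

```python
import math

def solve_linear_system(a, b):
    """Solve a @ coef = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (m[r][n] - tail) / m[r][r]
    return coef

def ols_fit(columns, y):
    """OLS with an intercept; return (coefficients, residuals, R-squared)."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    coef = solve_linear_system(xtx, xty)
    resid = [yi - sum(c * v for c, v in zip(coef, r)) for r, yi in zip(rows, y)]
    y_bar = sum(y) / len(y)
    sst = sum((yi - y_bar) ** 2 for yi in y)
    ssr = sum(e ** 2 for e in resid)
    return coef, resid, 1.0 - ssr / sst

def white_test(x, y):
    """White test with one predictor: return (LM statistic, p-value)."""
    _, resid, _ = ols_fit([x], y)
    squared_resid = [e ** 2 for e in resid]
    x_squared = [xi ** 2 for xi in x]
    # Auxiliary regression: can X and X^2 predict the squared residuals?
    _, _, r2_aux = ols_fit([x, x_squared], squared_resid)
    lm = len(y) * r2_aux
    p_value = math.exp(-lm / 2.0)  # chi-square upper tail for 2 df
    return lm, p_value

# Made-up data in which the residual spread grows with the distance from x = 0
x = [-2, -2, -1, -1, 1, 1, 2, 2]
y = [0, -4, 0, -2, 2, 0, 4, 0]

lm, p_value = white_test(x, y)
print(f"LM={lm:.3f} p={p_value:.4f}")
```

In this constructed example the p-value is below 0.05, so the null-hypothesis of homoscedasticity would be rejected.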

Question 8

Q

SPSS-output with heteroscedasticity robust standard errors

Answer

A

This table gives you your estimates with standard errors (and p-values) that are corrected for heteroscedasticity
The regression coefficients (both unstandardized and standardized) are unchanged
The standardized coefficients (Beta) are not in this table, but you can just obtain them from the regular regression menu/syntax

Question 9

Q

heteroscedasticity robust standard errors

Answer

A

If there is homoscedasticity, the robust standard errors are just as good as the regular standard errors. If there is heteroscedasticity, they are better

Question 10

Q

What are independent observations?

Answer

A

The sample size plays an important role in the calculation of the standard errors. The reasoning is that the larger your sample is, the less likely it is to differ from the population by chance.
This reasoning only holds if each observation truly provides a new piece of information.
This means that there should not be clusters of observations that all have the same or similar values.
All observations should be truly independent.

Question 11

Q

What happens when the assumption of independent observations is violated?

Answer

A

When this assumption is violated, the estimated standard errors will be too small
This implies that the p-value is too small and that you may incorrectly reject the null-hypothesis

Question 12

Q

How to check the assumption of independent observations?

Answer

A

Think about how the data was collected
Errors are usually independent when these two conditions are satisfied:
1. There is only one observation for each case (e.g., person/country)
2. The cases were all sampled from the same population with the same procedure

The first condition is, for example, violated in panel data where the same respondents are interviewed every year
The second condition is, for example, violated when a dataset consists of several surveys from several countries that were put together

Question 13

Q

How to solve the violation of independent observations?

Answer

A

This problem can be solved by using a combination of cluster robust standard errors and control variables

Cluster robust standard errors adjust for the clustering within a unit that you specify (e.g., within countries)

In the example of the ESS, you would do two things:
Estimate cluster robust standard errors with clustering “within countries”
Add “country” as a control variable

Unfortunately, cluster robust standard errors are not readily available in SPSS (which is why we will not use them in this course)
If you ever need them, you can download an add-on for SPSS or switch to another program (e.g., Stata or R)
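As a rough illustration of what such an add-on or another program computes, the following Python sketch calculates a cluster robust standard error for the slope in a regression with one predictor. It is a simplified version under stated assumptions: it uses the basic formula without the small-sample corrections that Stata or R apply, it leaves out the "country" control variable, and the cluster labels and data are made up.

```python
import math
from collections import defaultdict

def cluster_robust_slope_se(x, y, clusters):
    """Return (slope, regular SE, cluster robust SE) for y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

    # Regular SE: treats all n observations as independent
    s2 = sum(e ** 2 for e in resid) / (n - 2)
    se_regular = math.sqrt(s2 / sxx)

    # Cluster robust SE: first add up (x - mean) * residual within each
    # cluster, then square the cluster totals instead of the single cases
    totals = defaultdict(float)
    for xi, ei, g in zip(x, resid, clusters):
        totals[g] += (xi - x_bar) * ei
    se_cluster = math.sqrt(sum(t ** 2 for t in totals.values()) / sxx ** 2)
    return slope, se_regular, se_cluster

# Made-up data: two respondents per country, with identical residuals
# within each country (hypothetical country labels A to D)
x = [-2, -2, -1, -1, 1, 1, 2, 2]
y = [-1, -1, -2, -2, 0, 0, 3, 3]
country = ["A", "A", "B", "B", "C", "C", "D", "D"]

slope, se_regular, se_cluster = cluster_robust_slope_se(x, y, country)
print(f"slope={slope:.3f} regular SE={se_regular:.4f} cluster SE={se_cluster:.4f}")
```

In this example the regular standard error is smaller than the cluster robust one, which matches the point that ignoring clustering makes standard errors too small.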

Question 14

Q

What are outliers?

The absence of large outliers

Answer

A

Outliers are observations that deviate extremely from the mean
Outliers can be detected by the z-score of an observation (i.e., how many standard deviations it is removed from the mean)

Observations with a z-score smaller than -3 or larger than +3 are commonly considered outliers

Question 15

Q

What happens when the assumption of no large outliers is violated?

Answer

A

When there are outliers in the data, these observations can have a disproportional effect on the estimates, such that the results mainly reflect the outliers and not the other cases.

This problem manifests itself both in the regression coefficients and the standard errors

This problem is especially severe when the outliers are extreme and/or if they constitute a large share of the sample

Question 16

Q

How to check for an outlier?

Answer

A

This assumption can be checked by simply calculating z-scores for all the variables (both X and Y) and checking if some z-scores are smaller than -3 or larger than +3
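A minimal Python sketch of this check for one variable is shown below. The function name `find_outliers` and the data are made up for illustration. Note that in very small samples a z-score can never reach 3, so the rule only works with enough observations.

```python
import statistics

def find_outliers(values, threshold=3.0):
    """Return (index, value, z-score) for every case with |z| > threshold."""
    mean = sum(values) / len(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    flagged = []
    for i, v in enumerate(values):
        z = (v - mean) / sd
        if abs(z) > threshold:
            flagged.append((i, v, z))
    return flagged

# Made-up variable: 15 ordinary scores around 50 and one extreme score
scores = [48, 50, 52, 49, 51, 50, 47, 53, 50, 49, 51, 50, 52, 48, 50, 95]

outliers = find_outliers(scores)
for index, value, z in outliers:
    print(f"case {index}: value={value}, z={z:.2f}")
```

The same function would be applied to every variable in the model, both X and Y.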

Question 17

Q

How to solve the outliers problem?

Answer

A

If you have outliers, you typically run two analyses:
1. An analysis with the outliers included
2. An analysis with the outliers removed

If the results are (substantively) similar, you have no problem
If the results differ, you should consider carefully why the outliers exist and how you would interpret the results with or without them

Unfortunately, there is no clear rule for how to act that works in all situations
Always motivate clearly what you did
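The two analyses can be sketched in Python as follows. This is only an illustration with one predictor and made-up data: cases are treated as outliers when the z-score of X or Y is smaller than -3 or larger than +3, and the slope is estimated with and without those cases. The helper names are hypothetical.

```python
import statistics

def z_scores(values):
    """Return the z-score of every value (using the sample SD)."""
    mean = sum(values) / len(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

def ols_slope(x, y):
    """Slope of the OLS regression of y on x (with an intercept)."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / sxx

# Made-up data: y = 2x for 15 cases, plus one case with an extreme y-value
x = list(range(1, 16)) + [15]
y = [2 * xi for xi in range(1, 16)] + [200]

# Analysis 1: outliers included
slope_with = ols_slope(x, y)

# Analysis 2: outliers removed (|z| > 3 on X or on Y)
zx, zy = z_scores(x), z_scores(y)
kept = [(xi, yi) for xi, yi, a, b in zip(x, y, zx, zy)
        if abs(a) <= 3 and abs(b) <= 3]
slope_without = ols_slope([k[0] for k in kept], [k[1] for k in kept])

print(f"slope with outliers:    {slope_with:.2f}")
print(f"slope without outliers: {slope_without:.2f}")
```

Here the two slopes differ substantially, which is the situation in which you need to think carefully about why the outlier exists.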

Question 18

Q

What are normally distributed residuals?

The assumption of normally distributed residuals

Answer

A

If the sample is not normally distributed AND too small, the sampling distribution is not normally distributed

More specifically, the residuals of the regression analysis have to be normally distributed when the sample is small

This is almost the same as saying that the Y-variable must be normally distributed, but not quite

Question 19

Q

What happens when this assumption is violated?

The assumption of normally distributed residuals. Only in small samples

Answer

A

When this assumption is violated, the sampling distribution is not normally distributed and the t-values, p-values, and confidence intervals do not make sense anymore.

Question 20

Q

How do you check for the assumption?

The assumption of normally distributed residuals. Only in small samples

Answer

A

We can test this with syntax that does the following:
o Saves the residuals of the regression analysis
o Makes a histogram of them
o Asks for a formal test of normality
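The first two steps can be imitated in Python with the sketch below (the SPSS syntax itself is not reproduced here). It saves the residuals of a regression with one predictor and prints a simple text histogram of them. The function names, the number of bins, and the data are assumptions made for illustration.

```python
def regression_residuals(x, y):
    """Return the residuals of the OLS regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

def histogram_counts(values, n_bins=5):
    """Split the range of the values into equal bins and count the cases."""
    lowest, highest = min(values), max(values)
    width = (highest - lowest) / n_bins
    counts = [0] * n_bins
    for v in values:
        if width == 0:
            index = 0  # all values are identical
        else:
            index = min(int((v - lowest) / width), n_bins - 1)
        counts[index] += 1
    return lowest, width, counts

# Made-up data for illustration
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.3, 13.8, 16.1, 18.0, 19.7, 22.2, 24.1]

residuals = regression_residuals(x, y)            # step 1: save the residuals
lowest, width, counts = histogram_counts(residuals)  # step 2: histogram
for i, count in enumerate(counts):
    start = lowest + i * width
    print(f"{start:6.2f} to {start + width:6.2f} | {'#' * count}")
```

With so few cases the histogram is very coarse; it is meant to show the steps, not to judge normality.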

Question 21

Q

The histogram

Answer

A

The histogram compares the distribution of the data (the bars) to the normal distribution (the curve)

Question 22

Q

The test

the assumption of normally distributed residuals

Answer

A

Table: Tests of Normality
This table provides two tests of the null-hypothesis that the data is normally distributed. If the p-values are smaller than 0.05, we reject the null-hypothesis and conclude that the assumption of normality is violated.
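The same decision rule can be illustrated in Python with a different, simpler test. The sketch below uses the Jarque-Bera test, which is not one of the tests in the SPSS table; it was chosen here only because it can be written in a few lines. It checks whether the skewness and kurtosis of the residuals match those of a normal distribution. Its p-value relies on a large-sample approximation, so treat it as a sketch of the logic rather than a recommendation for small samples. The function name and data are made up.

```python
import math

def jarque_bera(residuals):
    """Jarque-Bera normality test: return (test statistic, p-value)."""
    n = len(residuals)
    mean = sum(residuals) / n
    m2 = sum((r - mean) ** 2 for r in residuals) / n
    m3 = sum((r - mean) ** 3 for r in residuals) / n
    m4 = sum((r - mean) ** 4 for r in residuals) / n
    skewness = m3 / m2 ** 1.5      # 0 for a normal distribution
    kurtosis = m4 / m2 ** 2        # 3 for a normal distribution
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    p_value = math.exp(-jb / 2.0)  # chi-square upper tail for 2 df
    return jb, p_value

# Made-up residuals: one symmetric set and one strongly skewed set
symmetric = [-2, -1, 0, 1, 2]
skewed = [0] * 19 + [10]

jb_sym, p_sym = jarque_bera(symmetric)
jb_skew, p_skew = jarque_bera(skewed)
print(f"symmetric: JB={jb_sym:.3f}, p={p_sym:.3f}")
print(f"skewed:    JB={jb_skew:.3f}, p={p_skew:.6f}")
```

As with the SPSS table, a p-value below 0.05 leads to rejecting the null-hypothesis of normality, which happens here only for the skewed residuals.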

Question 23

Q

How to solve a violation of this assumption?

The assumption of normally distributed residuals

Answer

A

You can either increase your sample size or move to other estimation methods, which we will not discuss in this course

4A Flashcards

(23 cards)