Clustering and bootstrap Flashcards

1
Q

What is meant by iid and why is it important?

A

The most important assumption we make in cross-sectional inference is that observations are iid: each observation $i$ is treated as an independent random draw from the same population, and is therefore uncorrelated with every other observation. However, in most cases the iid assumption is too restrictive, as there is usually some form of dependence in the data.

2
Q

What will happen if we assume iid when we have correlations within clusters?

A

If we assume that observations are iid but, in reality, they are not, we get wrong standard errors (usually too low). This affects the t-statistic.

Thus, we will tend to reject the null too often, i.e., make Type I errors.

Often dependence arises when we have grouped data. That is, our sampled individuals belong to some group which makes them similar, so their unobservable variables are likely correlated and they are not iid. Examples include:

  • Wages for workers in the same firm
  • Observation for the same individual over time (panel data)

This creates a clustering problem.

3
Q

How do cluster-robust standard errors change our SEs and point estimates?

A

Using heteroskedasticity-robust or cluster-robust SEs does not change the point estimates, only the SEs and hence the significance.

4
Q

When does the cluster-robust variance (SE) become higher than the heteroskedasticity-robust variance (SE)?

A

When we have correlations within clusters. The gap grows:
  • The higher the correlation within each cluster
  • The more observations we have in the same cluster

5
Q

Show the cluster robust variance formula

A

$$
\widehat{Var}_{clu} = (\mathbf{X'X})^{-1}\mathbf{\hat B}_{clu}(\mathbf{X'X})^{-1}
$$

where

$$
\mathbf{\hat B}_{clu} = \sum_{g=1}^G \mathbf{X'_g \hat u_g \hat u_g' X_g}
$$
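As a sketch, the sandwich formula can be implemented directly in NumPy; the simulated data, sizes, and all variable names below are hypothetical, for illustration only:

```python
import numpy as np

# Simulate clustered data: 50 clusters of 20 observations each (hypothetical).
rng = np.random.default_rng(0)
G, n_g = 50, 20
n = G * n_g
cluster = np.repeat(np.arange(G), n_g)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
u = rng.normal(size=G)[cluster] + rng.normal(size=n)   # cluster-correlated errors
y = X @ np.array([1.0, 0.5]) + u

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)           # OLS point estimates
resid = y - X @ beta_hat

# B_clu = sum_g X_g' u_g u_g' X_g  (the "meat" of the sandwich)
B = np.zeros((2, 2))
for g in range(G):
    s = X[cluster == g].T @ resid[cluster == g]        # X_g' u_g, a k-vector
    B += np.outer(s, s)

XtX_inv = np.linalg.inv(X.T @ X)
V_clu = XtX_inv @ B @ XtX_inv                          # cluster-robust variance
se_clu = np.sqrt(np.diag(V_clu))
```

Note that only the "meat" $\mathbf{\hat B}_{clu}$ changes relative to the heteroskedasticity-robust estimator; the "bread" $(\mathbf{X'X})^{-1}$ is the same.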

6
Q

When does asymptotic theory apply with cluster-robust errors?

A

For the asymptotic theory to apply, we need the number of clusters G to be large.

7
Q

Explain the Moulton factor.

A

The Moulton factor tells us how much we over-estimate precision by ignoring intra-class correlation. This is given by

$$
\frac{\widehat{Var}_{clu}[\hat\beta]}{\widehat {Var}[\hat \beta]}
$$

If we get, e.g., 7 as a result, we should take $\widehat{SE} \times \sqrt 7$ to get the correct standard errors.
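A minimal numeric sketch of this correction, using hypothetical variance estimates chosen so the Moulton factor equals 7:

```python
import numpy as np

# Hypothetical variance estimates (not from real data).
var_default = 0.04                       # naive OLS variance, ignoring clustering
var_clu = 0.28                           # cluster-robust variance
moulton = var_clu / var_default          # Moulton factor: precision overstated 7x

# Scaling the naive SE by sqrt(Moulton) recovers the cluster-robust SE.
se_corrected = np.sqrt(var_default) * np.sqrt(moulton)
assert np.isclose(se_corrected, np.sqrt(var_clu))
```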

8
Q

What to cluster over? When should and shouldn’t we use it?

A

One should cluster at the level where one thinks both regressors and errors might be correlated within cluster. However, we also need to keep in mind that we only approach the true variance as $G \to \infty$.

Hence, if we define very large clusters, there will be few clusters to average over, and the resulting estimated clustered variance will be a poor estimate of the true variance.

As a general rule, cluster at progressively higher (broader) levels and see if the standard errors change significantly. Be conservative and report the largest standard errors. However, we need a clear argument for why we choose to cluster at the level we do.

If clustering at a higher level does not yield significantly larger standard errors, we should use the errors from the lower level.

→ The bottom line is: use the cluster estimator if you can.

9
Q

How should one think regarding clustering and fixed effects?

A

If you include cluster-specific fixed effects $\alpha_g$ and you believe they capture all the cluster correlation, you need not cluster. But often some within-cluster correlation remains, and the error may still be heteroskedastic.
→ The bottom line is: use the cluster estimator if you can.

10
Q

What is multiway clustering and what do we need to think about?

A

Imagine you have state-year panel of $N$ individuals and you worry about clustering across states and over time. A solution is to use a two-way clustering estimator.

In the example before, we need both the number of states $S$ and of time periods $T$ to be large.

11
Q

What is the issue if we have too few clusters?

A

When $G$ is small, the CRVE underestimates the true variance. Then the t-statistic is too large and the t-test over-rejects the null. So clustering at a higher level could in fact decrease our standard errors compared to clustering at a lower level.

The solution is to cluster at the higher, correct level, but use the bootstrap.

12
Q

What is bootstrap and how does it work?

A

The idea behind the bootstrap is to consider the sample we have as if it were the population of interest. Instead of drawing more samples of size N from the population distribution, the bootstrap draws with replacement from the original sample itself, using the empirical c.d.f. of the data as if it were the population distribution. For each bootstrap “sample” $s$ we will obtain a different estimate of the parameter of interest, $\hat \theta_s$. After several resamples, we get a distribution of $\hat \theta$.

13
Q

How does bootstrap work?

A

In bootstrap sampling, we draw samples with replacement. In each bootstrap sample some original data points appear more than once while others do not appear at all. For each sample we obtain an estimate of our parameter of interest. We then estimate the variance of our parameter as the empirical variance of the sample estimates.
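A minimal sketch of this procedure for the sample mean, on hypothetical simulated data; the sample size, number of resamples, and variable names are all illustrative:

```python
import numpy as np

# iid bootstrap of the sample mean (toy example).
rng = np.random.default_rng(1)
x = rng.normal(loc=2.0, scale=3.0, size=200)   # original sample, N = 200
S = 2000                                       # number of bootstrap resamples

theta_s = np.empty(S)
for s in range(S):
    resample = rng.choice(x, size=x.size, replace=True)  # draw N obs with replacement
    theta_s[s] = resample.mean()               # estimate on resample s

var_boot = theta_s.var(ddof=1)                 # empirical variance across resamples
var_analytic = x.var(ddof=1) / x.size          # compare with sigma^2 / N
```

For the mean the analytic variance $\hat\sigma^2/N$ is available, so the bootstrap is unnecessary here; the point is that the same resampling recipe works for estimators without a closed-form variance.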

14
Q

What is the bootstrapped variance formula?

A

With $S$ bootstrap estimates $\hat\theta_s$, the bootstrapped variance is their empirical variance:

$$
\widehat{Var}_{boot}[\hat\theta] = \frac{1}{S-1}\sum_{s=1}^S \left(\hat\theta_s - \bar{\hat\theta}\right)^2, \qquad \bar{\hat\theta} = \frac{1}{S}\sum_{s=1}^S \hat\theta_s
$$

15
Q

What do we need for bootstrap to work?

A

What we sample from needs to be iid!

16
Q

What is clustered bootstrap and what do we need to think about?

A

When the number of clusters $G$ is small, the asymptotic approximation that the cluster-robust variance estimator relies on doesn’t work well.

Instead of using the asymptotic approximation, we can use the bootstrap.

The bootstrap distribution of $\hat\beta$ will be a better approximation to the sampling distribution of $\hat\beta$ than the asymptotic approximation when $G$ is small.

We thus compute the cluster-robust standard errors using the bootstrapped variance.

Important! Since the bootstrap needs to be done over iid observations, we cannot bootstrap over individual observations when we have clustering. We should instead treat the clusters as our observations and bootstrap over the clusters. That is, we should sample the clusters. In this way, we allow for correlation within clusters but rule out correlation between clusters.
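As a sketch, a cluster (block) bootstrap resamples whole clusters with replacement; the toy clustered data and all names below are hypothetical:

```python
import numpy as np

# Cluster bootstrap: treat clusters, not observations, as the sampling units.
rng = np.random.default_rng(2)
G, n_g = 30, 10
# Toy clustered data: each cluster shares a common shock, so observations
# within a cluster are correlated.
data = [rng.normal() + rng.normal(size=n_g) for _ in range(G)]

S = 500
theta_s = np.empty(S)
for s in range(S):
    drawn = rng.integers(0, G, size=G)                 # sample G clusters w/ replacement
    sample = np.concatenate([data[g] for g in drawn])  # keep each cluster intact
    theta_s[s] = sample.mean()

se_boot = theta_s.std(ddof=1)   # cluster-bootstrap SE of the mean
```

Because each drawn cluster is kept intact, the within-cluster correlation is preserved in every resample, while independence across the drawn clusters is imposed by construction.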

17
Q

When does and doesn’t bootstrap work?

A

Bootstrap works well with iid data and smooth estimators (with continuous and differentiable objective function).

When data are not iid, or with non-smooth estimators (such as the median, or nonparametric estimators), the bootstrap does not work.