Nonparametric and semiparametric estimation Flashcards
Basic. What is a non-parametric model?
In a non-parametric model, we assume as little as possible. E.g.,
$$
y =m(x)+\epsilon
$$
Where $m(.)$ is an unspecified function of $x$. This is thus a non-parametric regression model (of the conditional mean of $y$ given $x$).
We need a lot of data to use non-parametric estimation.
A histogram is actually a nonparametric estimator of the density of our variable of interest. If we would like something smoother, we can use a kernel density estimator.
How does estimation of a non-parametric model work?
When estimating a non-parametric regression model, a locally weighted regression line is fitted at each point $x$, using a centered subset that includes the closest $h \times N$ observations, where $h$ is the bandwidth and $N$ is the sample size. The weights decline as we move away from $x$.
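A minimal sketch of this idea (my own illustration, not from the notes): it assumes tricube weights and a locally fitted weighted least-squares line, and all names are hypothetical.

```python
import numpy as np

def local_linear_fit(x, y, x0, h=0.3):
    """Fit a weighted regression line around x0 using the nearest h*N observations."""
    n = len(x)
    k = max(int(np.ceil(h * n)), 2)           # span: number of local observations
    dist = np.abs(x - x0)
    idx = np.argsort(dist)[:k]                # the k observations closest to x0
    d = dist[idx] / dist[idx].max()           # scaled distances in [0, 1]
    w = (1 - d**3) ** 3                       # tricube weights: decline away from x0
    X = np.column_stack([np.ones(k), x[idx] - x0])
    W = np.diag(w)
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y[idx])  # weighted least squares
    return beta[0]                            # intercept = fitted value m_hat(x0)

# Example: recover a smooth curve from noisy data
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 200))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, 200)
grid = np.linspace(0.05, 0.95, 10)
m_hat = [local_linear_fit(x, y, x0) for x0 in grid]
```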
What is a kernel?
The kernel estimate is a weighted average of observations within the bandwidth at the current point of evaluation. Data closest to the current point of evaluation are given more weight, as specified by a function called the kernel.
This works like a moving average over our data set (for the density).
How do we think about bandwidth in the context of kernels?
Bandwidth = h
The bandwidth decides how much data we use in our moving average. Using more data will create a smoother estimate.
Choosing a very small bandwidth leads to a jagged density estimate, while a very large bandwidth over-smooths the data.
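A small illustration of the bandwidth effect (my own sketch, using scipy's `gaussian_kde`, where a scalar `bw_method` scales the smoothing bandwidth):

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)
data = rng.normal(0, 1, 500)          # sample from a standard normal
grid = np.linspace(-4, 4, 200)

f_small = gaussian_kde(data, bw_method=0.05)(grid)  # tiny bandwidth: jagged estimate
f_large = gaussian_kde(data, bw_method=1.0)(grid)   # huge bandwidth: over-smoothed
f_auto  = gaussian_kde(data)(grid)                  # default (Scott's rule) bandwidth
```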
What is the point of estimating a density function? And what estimators can we use?
The point of these estimators is to estimate the density $f(x_0)$ of $x$ evaluated at some point $x_0$. For this we can use:
- The histogram estimator (like a uniform kernel)
- The Kernel density estimator
Formulate the histogram estimator and describe its parts
$$
\hat f_{\mathrm{HIST}}(x_0) = \frac{1}{2hN} \sum_{i=1}^{N} \mathbf{1}\left(x_0 - h < x_i < x_0 + h\right)
$$
Where $h$ is the bandwidth and $N$ is the sample size. This estimator gives all observations in $x_0 \pm h$ equal weight. This leads to a density estimate that is a step function, even if the underlying density is continuous.
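A direct implementation of this estimator (my own sketch; names are illustrative):

```python
import numpy as np

def hist_estimator(x, x0, h):
    """Histogram (uniform-kernel) density estimate at x0: equal weight to all x_i in x0 +/- h."""
    N = len(x)
    return np.sum(np.abs(x - x0) <= h) / (2 * h * N)

rng = np.random.default_rng(3)
x = rng.normal(0, 1, 1000)
print(hist_estimator(x, x0=0.0, h=0.2))   # should be near the N(0,1) density at 0 (~0.399)
```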
Formulate the general kernel estimator.
$$
\hat f(x_0) = \frac{1}{Nh} \sum_{i=1}^{N} K\!\left(\frac{x_i - x_0}{h}\right)
$$
Where $K(·)$ is the kernel function, $h$ the bandwidth, and $N$ the sample size.
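A corresponding sketch of this estimator (my own illustration, assuming a Gaussian kernel):

```python
import numpy as np

def kernel_density(x, x0, h, kernel=lambda z: np.exp(-z**2 / 2) / np.sqrt(2 * np.pi)):
    """Kernel density estimate at x0: (1 / Nh) * sum_i K((x_i - x0) / h)."""
    return np.mean(kernel((x - x0) / h)) / h

rng = np.random.default_rng(4)
x = rng.normal(0, 1, 1000)
print(kernel_density(x, x0=0.0, h=0.3))   # smooth estimate of the density at 0
```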
What needs to be true for a kernel function?
The kernel function $K(·)$ must be continuous, symmetric around zero, and integrate to unity.
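These properties can be checked numerically; a small sketch (my own example) using the Epanechnikov kernel:

```python
import numpy as np

def epanechnikov(z):
    """Epanechnikov kernel: continuous, symmetric around zero, integrates to one."""
    return 0.75 * (1 - z**2) * (np.abs(z) <= 1)

z = np.linspace(-2, 2, 100001)
dz = z[1] - z[0]
print(epanechnikov(z).sum() * dz)                        # ~1.0: integrates to unity
print(np.allclose(epanechnikov(z), epanechnikov(-z)))    # True: symmetric around zero
```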
What is most important in non-parametric estimation: the kernel or the bandwidth (BW)?
In practice the choice of kernel is not a huge deal; the choice of BW is more important.
What can be said about the mean obtained from the kernel density estimator?
The kernel density estimator is biased, with a bias term $b(x_0)$ that depends on the bandwidth, the curvature of the true density, and the kernel used. The bias disappears asymptotically if $h \to 0$ as $N \to \infty$.
What is the kernel density bias?
To leading order in $h$,
$$
b(x_0) = \frac{h^2}{2} f''(x_0) \int z^2 K(z)\, dz
$$
so the bias grows with the bandwidth $h$ and with the curvature $f''(x_0)$ of the true density, and depends on the kernel through $\int z^2 K(z)\,dz$.
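A quick Monte Carlo illustration (my own sketch, assuming a Gaussian kernel and a standard normal true density) showing that the bias at $x_0 = 0$ shrinks as $h$ shrinks:

```python
import numpy as np
from scipy.stats import norm

x0, N, reps = 0.0, 500, 2000
rng = np.random.default_rng(2)
for h in (1.0, 0.5, 0.1):
    est = np.empty(reps)
    for r in range(reps):
        x = rng.normal(0, 1, N)
        est[r] = np.mean(norm.pdf((x - x0) / h)) / h   # Gaussian-kernel density estimate at x0
    print(h, est.mean() - norm.pdf(x0))                # estimated bias: shrinks with h
```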
What can be said about the variance obtained from the kernel density estimator?
The variance disappears if $Nh \to \infty$: $h$ may go to $0$, but slowly enough that $Nh$ still diverges as $N \to \infty$.
What is the bias-variance trade-off when using kernel density?
The choice of bandwidth $h$ is much more important than the choice of kernel function $K(·)$.
There is a tension between setting $h$ small to reduce bias and setting $h$ large to ensure smoothness. A natural metric to use is some form of the mean-squared error (MSE).
We therefore want a way to choose the bandwidth optimally. This is done by minimizing some function of the integrated squared error (ISE), e.g., $E[\mathrm{ISE}(h)]$, the mean integrated squared error (MISE).
The optimal BW ($h$) goes to zero as $N \to \infty$.
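One concrete example of such a data-driven choice (not necessarily the rule used in the notes) is Silverman's rule of thumb, which minimizes an approximation to the MISE for a Gaussian kernel under a normal reference density, and indeed shrinks as $N$ grows:

```python
import numpy as np

def silverman_bandwidth(x):
    """Silverman's rule of thumb: h = 0.9 * min(std, IQR / 1.349) * N**(-1/5)."""
    N = len(x)
    iqr = np.subtract(*np.percentile(x, [75, 25]))
    return 0.9 * min(x.std(ddof=1), iqr / 1.349) * N ** (-1 / 5)

rng = np.random.default_rng(5)
for N in (100, 1000, 10000):
    print(N, silverman_bandwidth(rng.normal(0, 1, N)))   # optimal h shrinks as N grows
```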
What is non-parametric regression?
Another interesting application of nonparametric methods is the estimation of a regression function:
$$
y_i = m(x_i) + \epsilon_i
$$
Since the functional form of $m(x_i)$ is unspecified, we cannot use OLS. Instead we use a locally weighted sample average:
$$
\hat m(x_0) = \sum_i w_{i0,h}\, y_i
$$
where $w_{i0,h}$ are local weights.
The estimator is unbiased, but for consistency we need $N_0 \to \infty$ (the number of observations local to $x_0$) as $N \to \infty$, so that the variance goes to zero.
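A minimal sketch of such a locally weighted average (my own illustration, assuming Gaussian kernel weights, i.e., a Nadaraya-Watson type estimator):

```python
import numpy as np

def kernel_regression(x, y, x0, h):
    """Locally weighted average m_hat(x0) = sum_i w_i0 * y_i, with kernel weights summing to one."""
    k = np.exp(-0.5 * ((x - x0) / h) ** 2)    # Gaussian kernel weights, declining away from x0
    w = k / k.sum()                           # normalize the local weights
    return np.sum(w * y)

rng = np.random.default_rng(6)
x = np.sort(rng.uniform(0, 1, 300))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, 300)
grid = np.linspace(0.1, 0.9, 9)
m_hat = [kernel_regression(x, y, x0, h=0.05) for x0 in grid]
print(np.round(m_hat, 2))
```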
What can be said about the bias-variance trade-off in nonparametric regression?
Here we also have a bias-variance tradeoff. As $h$ becomes smaller $\hat m(x_0)$ becomes less biased, as only observations close to $x_0$ are being used, but more variable, as fewer observations are being used.