Topic 4: Non-parametric methods and bootstrap Flashcards
Describe the jackknife principle
We can use the jackknife to assess the uncertainty of a real-valued statistic, such as the mean.
We define the real-valued statistic as $\hat\theta = s(\mathbf{x})$, i.e., just a function of the sample $\mathbf{x}$.
The core idea is to consider how much a single element in the sample affects the estimate $\hat\theta$ (analogous to leave-one-out cross-validation).
We can write the jackknife estimate of the standard error as follows:
$$\widehat{\mathrm{se}}_{\mathrm{jack}} = \left[\frac{n-1}{n}\sum_{i=1}^{n}\left(\hat\theta_{(i)} - \hat\theta_{(\cdot)}\right)^{2}\right]^{1/2}, \qquad \hat\theta_{(\cdot)} = \frac{1}{n}\sum_{i=1}^{n}\hat\theta_{(i)},$$
where $\hat\theta_{(i)} = s(\mathbf{x}_{(i)})$ is the estimate recomputed on the sample with the $i$-th observation left out.
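A minimal sketch of this recipe in Python (assuming NumPy; the normal toy data and the choice of the mean as the statistic are illustrative, not from the notes):

```python
import numpy as np

def jackknife_se(x, s=np.mean):
    """Jackknife estimate of the standard error of the statistic s."""
    x = np.asarray(x)
    n = len(x)
    # theta_(i): the statistic recomputed with the i-th observation left out
    theta_loo = np.array([s(np.delete(x, i)) for i in range(n)])
    theta_dot = theta_loo.mean()  # average of the leave-one-out estimates
    return np.sqrt((n - 1) / n * np.sum((theta_loo - theta_dot) ** 2))

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=50)
print(jackknife_se(x))                  # jackknife SE of the mean
print(x.std(ddof=1) / np.sqrt(len(x)))  # classical SE, for comparison
```

For the mean, the jackknife standard error coincides with the classical $s/\sqrt{n}$ formula, which makes a handy sanity check.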
Features of the jackknife:
- It is nonparametric: it doesn't assume your data follow any specific distribution (normal, Poisson, etc.)
- You don't need to make choices about parameters or settings
- Hidden assumption of smooth behaviour across sample sizes: it assumes that small changes in your data lead to small changes in your results
- Upwardly biased estimate of the standard error: it tends to overestimate the uncertainty in your calculations
- Closely connected to the Taylor series method, with the difference that the directional derivatives are computed numerically: instead of analytical derivatives, it uses the data itself to estimate how the statistic changes
Describe the bootstrap principle
The bootstrap principle is a statistical method for estimating the distribution of a sample statistic by resampling with replacement from the original dataset. It is particularly useful when making inferences about population parameters or estimating the variability of an estimator when the theoretical distribution is unknown or hard to derive.
$$\widehat{\mathrm{se}}_{\mathrm{boot}} = \left[\frac{1}{B-1}\sum_{b=1}^{B}\left(\hat\theta^{*b} - \hat\theta^{*(\cdot)}\right)^{2}\right]^{1/2}, \qquad \hat\theta^{*(\cdot)} = \frac{1}{B}\sum_{b=1}^{B}\hat\theta^{*b},$$
where $\hat\theta^{*b} = s(\mathbf{x}^{*b})$ is the statistic evaluated on the $b$-th of $B$ bootstrap samples.
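A minimal sketch of this recipe (the number of replications $B$, the median as the statistic, and the exponential toy data are all assumptions for illustration):

```python
import numpy as np

def bootstrap_se(x, s=np.median, B=2000, seed=0):
    """Bootstrap estimate of the standard error of the statistic s."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x)
    # theta*_b: the statistic on a sample drawn with replacement from x
    theta_star = np.array([s(rng.choice(x, size=len(x), replace=True))
                           for _ in range(B)])
    return theta_star.std(ddof=1)  # sd of the bootstrap replications

rng = np.random.default_rng(1)
x = rng.exponential(scale=3.0, size=100)
print(bootstrap_se(x))  # SE of the median, hard to derive analytically
```

The median is a natural example here: its standard error is awkward to derive analytically, which is exactly the situation where the bootstrap shines.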
Describe empirical bootstrapping
We have no knowledge of the distribution $F$ from which the random sample is drawn.
We know that the Empirical Distribution Function (EDF) is a good approximation of the true distribution function.
We use the EDF to define $\hat F$. In practice this means that we sample uniformly at random, WITH REPLACEMENT, from the dataset.
The original estimate is obtained in two steps:
$$F \xrightarrow{\text{iid}} \mathbf{x} = (x_1,\dots,x_n) \xrightarrow{\ s\ } \hat\theta,$$
and the bootstrap repeats the same two steps with $\hat F$ in place of the unknown $F$:
$$\hat F \xrightarrow{\text{iid}} \mathbf{x}^{*} \xrightarrow{\ s\ } \hat\theta^{*}.$$
Application of the empirical bootstrap: probabilities involving $\hat\theta$ that are hard to derive analytically can be approximated by creating the bootstrapped samples and evaluating the statistic on each of them, as in the sketch below.
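A small sketch of estimating such a probability (and a percentile interval) with the empirical bootstrap; the toy data, the threshold 1.0, and $B$ are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(loc=10.0, scale=4.0, size=80)  # observed sample; F is unknown
theta_hat = x.mean()                          # the original estimate

B = 5000
# sample uniformly at random, WITH replacement, from the dataset (the EDF)
theta_star = np.array([rng.choice(x, size=len(x), replace=True).mean()
                       for _ in range(B)])

# bootstrap approximation of P(|theta_hat - theta| > 1)
print(np.mean(np.abs(theta_star - theta_hat) > 1.0))
# percentile interval from the same replications
print(np.percentile(theta_star, [2.5, 97.5]))
```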
Describe parametric bootstrapping
Suppose we are willing to assume that the observed data vector $\mathbf{x}$ comes from a parametric family $\mathcal{F}$. The parametric bootstrap is a way to estimate uncertainty by simulating datasets from the model fitted to the original data's estimated parameters.
$$F_{\hat\mu} \xrightarrow{\text{iid}} \mathbf{x}^{*} \xrightarrow{\ s\ } \hat\theta^{*},$$
where $\hat\mu$ is the parameter estimate (e.g., the MLE) fitted to the observed data and $F_{\hat\mu}$ is the corresponding member of the parametric family.
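A minimal sketch, assuming a normal family purely for illustration (the toy data and $B$ are assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(loc=2.0, scale=1.5, size=60)  # observed data, assumed normal

# step 1: fit the parametric family to the data (MLE of a normal)
mu_hat, sigma_hat = x.mean(), x.std()

# step 2: simulate replicate datasets from the FITTED model, not from x
B = 2000
theta_star = np.array([rng.normal(loc=mu_hat, scale=sigma_hat, size=len(x)).mean()
                       for _ in range(B)])
print(theta_star.std(ddof=1))  # parametric-bootstrap SE of the mean
```

The only difference from the empirical bootstrap is step 2: the replicate datasets are drawn from the fitted model $F_{\hat\mu}$ rather than resampled with replacement from the data.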
Describe the influence function and its relation to robust estimation
The influence function measures the sensitivity of an estimator to small changes in the data. It assesses how much a single observation affects the estimator.
$$\mathrm{IF}(x;\,T,F) = \lim_{\epsilon \to 0} \frac{T\big((1-\epsilon)F + \epsilon\,\delta_x\big) - T(F)}{\epsilon},$$
where $\delta_x$ denotes a point mass at $x$.
The influence function for the mean shows that the influence of a single point $x$ on the mean is simply the deviation of $x$ from the mean $(x-\theta)$.
If $x$ is very far from $\theta$ (e.g., an outlier), the influence is large, indicating that the mean is sensitive to outliers (this is because $T(F) = \theta$)
$$F_{\epsilon} = (1-\epsilon)F + \epsilon\,\delta_x$$
So, when a small contamination is introduced to the data, we add weight $\epsilon$ at a point $x$. The influence function tells us how much the estimator $T$ changes when this small contamination is introduced.
- Large influence: This means it’s highly sensitive to observations at $x$, and it’s therefore non-robust (unbounded influence)
- Small influence/Bounded influence: This means it’s less sensitive to extreme values, meaning that the estimator is more robust.
The sample mean suffers from an unbounded influence function, which grows as $x$ moves farther away from $\theta$ (the sample mean).
The influence function helps in designing estimators that are less sensitive to outliers. Robust estimation theory seeks estimators $\hat\theta$ of bounded influence (that can deal with heavy-tailed densities).
The influence function is used in outlier detection to assess the impact of individual points, $x$.
It helps in evaluating the robustness of the models.
It can be applied in regression models, to assess the impact of influential data points (e.g. leverage in linear regression)
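A small numerical illustration of these two regimes, using the finite-sample sensitivity curve $(n+1)\big(s(x_1,\dots,x_n,x) - s(x_1,\dots,x_n)\big)$ as a stand-in for the influence function (the data and probe points are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(size=99)

def sensitivity(stat, x, point):
    """Finite-sample analogue of the influence function: how much the
    statistic moves when one extra observation is placed at `point`."""
    n = len(x)
    return (n + 1) * (stat(np.append(x, point)) - stat(x))

for point in [0.0, 5.0, 50.0]:
    print(point,
          sensitivity(np.mean, x, point),    # grows linearly: unbounded
          sensitivity(np.median, x, point))  # flattens out: bounded
```

The mean's sensitivity grows linearly with the probe point, while the median's flattens out: unbounded versus bounded influence.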
Describe robust estimation in regression
Regression is very sensitive to outliers. Using robust estimation, which produces parameter estimates less influenced by outliers, can therefore be beneficial for regression models.
The idea of robust regression is to weight the observations differently based on how well-behaved they are, as in the sketch below.
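A minimal sketch of this reweighting idea, implemented as iteratively reweighted least squares with Huber weights (the tuning constant $k = 1.345$, the MAD-based scale, and the toy data are standard choices assumed here, not taken from the notes):

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.uniform(0, 10, size=100)
y = 1.0 + 2.0 * x + rng.normal(scale=1.0, size=100)
y[:5] += 30.0  # a few gross outliers

X = np.column_stack([np.ones_like(x), x])  # design matrix with intercept

def huber_irls(X, y, k=1.345, n_iter=20):
    """Iteratively reweighted least squares with Huber weights:
    well-behaved residuals keep weight 1, outliers get down-weighted."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # start from plain OLS
    for _ in range(n_iter):
        r = y - X @ beta
        scale = np.median(np.abs(r)) / 0.6745 + 1e-12  # robust scale (MAD)
        u = np.abs(r) / scale
        w = np.where(u <= k, 1.0, k / u)  # Huber weight function
        sw = np.sqrt(w)
        beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
    return beta

print(np.linalg.lstsq(X, y, rcond=None)[0])  # OLS: dragged by the outliers
print(huber_irls(X, y))                      # robust fit: close to (1, 2)
```

Observations with small residuals keep full weight, while outlying observations are progressively down-weighted, which is exactly the bounded-influence behaviour the influence function motivates.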