Sampling and Statistical Inference Flashcards

1
Q

The Goal of Statistical Inference

A

1. Learn a quantity of interest about a particular group (population parameters).
2. Often, information on all members of the group (the population) will not be available.
3. We use sampling to collect a limited amount of information and use it to infer
population properties (parameters).
4. Since we only have information on a subset of the population, we are uncertain
about our inference (but there are other sources of uncertainty as well, even if we
observe the entire population).
5. All inferences are inherently uncertain.

The goal of statistical inference is to estimate population parameters and to summarize our
uncertainty about these estimates.

2
Q

What does a parameter describe?

A

A parameter describes a feature of the population. The parameter is fixed at some
value, but we will never be able to know it for sure.

3
Q

What is the random sample that we observe?

A

What we observe is a random sample drawn from the population. A random sample
is a proper subset of the population in which each member has an
equal probability of being selected.

4
Q

What is an estimator of the population parameter?

A

A sample statistic is a value computed from the observed sample to learn about a
population parameter. It is obtained by applying a function to the observed sample, and this
function is called the estimator of the population parameter.

We can calculate the mean of a random sample. The mean is then a sample statistic, and
the function that maps the observations of the random sample to this sample statistic is the
estimator.
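A minimal Python sketch of the distinction, assuming NumPy and a small made-up sample of five incomes (the values are purely illustrative): the function `sample_mean` (a hypothetical name) plays the role of the estimator, and the number it returns for this particular sample is the sample statistic.

```python
import numpy as np

def sample_mean(observations: np.ndarray) -> float:
    """Estimator: maps the observations of a sample to a single number."""
    return float(np.sum(observations) / len(observations))

# Hypothetical observed sample of five incomes (illustrative values only).
sample = np.array([1200.0, 900.0, 1500.0, 2100.0, 1300.0])

# Sample statistic: the estimator applied to this particular sample.
estimate = sample_mean(sample)
print(estimate)  # 1400.0
```

Applying the same function to a different sample would give a different sample statistic; the estimator itself stays the same.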

5
Q

Example of estimator, estimate and estimand

A

We rely on the observations in our sample and use a linear (regression) function (the estimator) to estimate the causal effect of education on income in our sample. This result is our estimate of the population-level causal effect (the estimand).
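A hedged sketch of the three terms, assuming NumPy and a simulated population in which the true effect of one extra year of education on income is set, by assumption, to 100. The data-generating process, variable names, and effect size are hypothetical. Because education is generated independently of the noise here, the regression slope has a causal interpretation in this simulated world only. The estimand is the value built into the population, the estimator is the least-squares slope formula, and the estimate is what that formula returns for one random sample.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical population: income depends linearly on years of education.
N = 100_000
TRUE_EFFECT = 100.0  # estimand: population-level effect of one year of education (assumed)
education = rng.integers(8, 21, size=N)  # 8 to 20 years of education
income = 500.0 + TRUE_EFFECT * education + rng.normal(0.0, 300.0, size=N)

def ols_slope(x: np.ndarray, y: np.ndarray) -> float:
    """Estimator: least-squares slope of y on x (simple linear regression)."""
    x_centered = x - x.mean()
    return float(np.sum(x_centered * (y - y.mean())) / np.sum(x_centered ** 2))

# Draw one random sample of n = 1000 people without replacement.
idx = rng.choice(N, size=1_000, replace=False)
estimate = ols_slope(education[idx], income[idx])  # estimate from this one sample

print(f"estimand = {TRUE_EFFECT}, estimate = {estimate:.1f}")
```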

6
Q

The Principle of Sampling

A

Probability sampling: From a population of size N, select a number of individuals n
(usually n ≪ N) such that each individual has a non-zero probability of being chosen.
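One common form of probability sampling is simple random sampling without replacement, in which every individual has the same non-zero chance n/N of being chosen. A minimal sketch, assuming NumPy; the population size, sample size, and seed are arbitrary illustrative values.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 100_000  # population size
n = 5_000    # sample size, n << N

population_ids = np.arange(N)
# Simple random sampling without replacement: each individual is included with probability n / N.
sample_ids = rng.choice(population_ids, size=n, replace=False)

print(len(sample_ids), len(np.unique(sample_ids)))  # 5000 5000
print(n / N)  # inclusion probability: 0.05
```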

7
Q

Sources of variation across samples

A

Sampling variability: Means and standard deviations of repeated samples will not be
identical.
Sampling error: An estimate from a sample will not be identical to the value in the
population.
The required sample size increases with the desired precision of the estimate: larger samples yield more precise estimates.
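A small simulation can make both ideas concrete. The sketch below assumes NumPy and a hypothetical, right-skewed income population (drawn from a log-normal distribution, an arbitrary choice). It draws repeated samples, shows that their means differ (sampling variability), and measures the average distance to the population mean (sampling error) for a small and a large sample size.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical population of 100,000 incomes (log-normal, purely illustrative).
population = rng.lognormal(mean=7.0, sigma=0.8, size=100_000)
pop_mean = population.mean()

def mean_abs_sampling_error(n: int, repetitions: int = 500) -> float:
    """Average |sample mean - population mean| over repeated samples of size n."""
    errors = [
        abs(rng.choice(population, size=n, replace=False).mean() - pop_mean)
        for _ in range(repetitions)
    ]
    return float(np.mean(errors))

# Sampling variability: five samples of size 100 give five different means.
means_n100 = [rng.choice(population, size=100, replace=False).mean() for _ in range(5)]
print(np.round(means_n100, 1))

# Sampling error shrinks as the sample size grows.
error_small = mean_abs_sampling_error(100)
error_large = mean_abs_sampling_error(2_500)
print(f"n = 100: {error_small:.1f}, n = 2500: {error_large:.1f}")
```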

8
Q

Conducting Statistical Inferences

A

We have: (1) a population, (2) a sample from this population, and (3) an estimate of a
population parameter.
* How uncertain are we about that estimate?
* Alternatively, how precisely can we estimate the population parameter?
* In a different sample, our estimate would be slightly different. Hence, estimates vary
over repeated samples.
* Applying an estimator to repeated samples yields the sampling distribution of the resulting
statistic.
* Calculating the spread of this sampling distribution yields a measure of uncertainty.
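The logic of repeated sampling can be sketched directly when, as in a simulation, the whole population is known. This assumes NumPy and a hypothetical, normally distributed population of values; `n` and the number of repeated samples `s` are arbitrary choices. The standard deviation of the collected estimates is the spread of the sampling distribution, i.e., the measure of uncertainty.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical population (known here only because we simulate it).
population = rng.normal(loc=1400.0, scale=2000.0, size=100_000)

n = 500    # size of each sample
s = 2_000  # number of repeated samples

# Apply the estimator (the sample mean) to s independent samples.
estimates = np.array([
    rng.choice(population, size=n, replace=False).mean() for _ in range(s)
])

# Spread of the sampling distribution = measure of uncertainty of the estimator.
spread = estimates.std(ddof=1)
print(f"sampling distribution: mean = {estimates.mean():.1f}, sd = {spread:.1f}")
```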

9
Q

Example of Statistical Inference and the Formula for the Standard Error

A
  • Let there be a country with 100,000 inhabitants.
  • We want to know the mean income in this country.
  • We randomly sample 5000 individuals from the population.
  • The mean of the obtained sample is 1400 (= θ_hat), with a standard deviation of 2000 (= σ_hat).
  • The standard error of the estimate of the population mean (θ_hat) is
    σ_pop/√n.
  • For a large sample, the sample standard deviation (σ_hat) can be used as an
    approximation of the population standard deviation (σ_pop):

SE(θ_hat) = σ_hat/√n = 2000/√5000 ≈ 28.3

The mean income of our population is estimated to be 1400, with a standard error of 28.3 (1400 ± 28.3).
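The arithmetic of this example as a short Python sketch using only the standard library; the inputs are the sample mean, sample standard deviation, and sample size given above.

```python
import math

theta_hat = 1400.0  # sample mean
sigma_hat = 2000.0  # sample standard deviation
n = 5000            # sample size

se = sigma_hat / math.sqrt(n)  # SE(theta_hat) = sigma_hat / sqrt(n)
print(f"{theta_hat} ± {se:.1f}")  # 1400.0 ± 28.3
```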

10
Q

Digression: Derivation of Standard Error

A
  • Let’s assume that we have a random sample from a population, i.e., we have n
    random variables θ1, …, θn that come from the same population, represented by a distribution with mean µ and variance σ^2.
  • Such random variables are called independent and identically distributed (iid).
  • Hence, we know that Var(θi) = σ^2
    for all such random variables.
  • Denote their mean, our estimate of the population mean, by θ_hat.
  • Then, the sampling variance (i.e., the variance of the estimator) is

Var(θ_hat) = (1/n)·σ^2 = σ^2/n
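The step from Var(θi) = σ^2 to the sampling variance can be written out using two facts: θ_hat is the average of the θi, and independence makes all covariance terms zero. Taking the square root then gives the standard error formula used in the income example.

```latex
\begin{aligned}
\operatorname{Var}(\hat\theta)
  &= \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n}\theta_i\right)
   = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(\theta_i)
   = \frac{1}{n^2}\, n\,\sigma^2
   = \frac{\sigma^2}{n},\\
\operatorname{SE}(\hat\theta) &= \sqrt{\operatorname{Var}(\hat\theta)} = \frac{\sigma}{\sqrt{n}}.
\end{aligned}
```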

11
Q

What is a q% confidence interval?

A

We call a confidence interval a q% confidence interval if it is constructed such that it
contains the true parameter at least q% of the time when we repeat the experiment
a large number of times.
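The repeated-experiment interpretation can be checked by simulation. A minimal sketch, assuming NumPy and a hypothetical normal population whose true mean (1400) is fixed in advance: in each repetition it builds a normal-approximation 95% interval (θ_hat ± 1.96·σ_hat/√n) and counts how often the interval contains the true mean. The sample size, number of repetitions, and seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(3)

TRUE_MEAN = 1400.0  # the fixed (normally unknown) parameter in this simulation
SIGMA = 2000.0
n = 200             # sample size per experiment
repetitions = 5_000

covered = 0
for _ in range(repetitions):
    sample = rng.normal(TRUE_MEAN, SIGMA, size=n)
    theta_hat = sample.mean()
    se = sample.std(ddof=1) / np.sqrt(n)
    lower, upper = theta_hat - 1.96 * se, theta_hat + 1.96 * se
    covered += int(lower <= TRUE_MEAN <= upper)  # count intervals that contain the truth

coverage = covered / repetitions
print(f"empirical coverage: {coverage:.3f}")  # roughly 0.95
```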

12
Q

Three Different Approaches to Assess Uncertainty

A
  1. Analytical
  2. Bootstrapping (resampling)
  3. Simulation (parametric)
13
Q

Analytical Approach: How do we construct CIs via the normal approximation?

A
  • We have a sample statistic θ_hat estimated for a parameter θ.
  • If the sample is large enough, we can approximate the sampling distribution by a normal
    distribution with mean θ_hat and variance Var(θ_hat).
  • We can then construct a 95% confidence interval using the quantiles of the
    standard normal distribution (±1.96):
    θ_hat ± 1.96·√Var(θ_hat) = θ_hat ± 1.96·σ_hat/√n
    (the last expression holds when θ_hat is a sample mean).
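A short sketch of the normal-approximation interval for a sample mean, using only the Python standard library (`statistics.NormalDist` supplies the standard normal quantile, about 1.96 for 95%). The inputs reuse the numbers from the income example (θ_hat = 1400, σ_hat = 2000, n = 5000); the function name `normal_ci` is hypothetical.

```python
import math
from statistics import NormalDist

def normal_ci(theta_hat: float, sigma_hat: float, n: int, level: float = 0.95):
    """Normal-approximation CI for a sample mean: theta_hat ± z * sigma_hat / sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level = 0.95
    half_width = z * sigma_hat / math.sqrt(n)
    return theta_hat - half_width, theta_hat + half_width

lower, upper = normal_ci(1400.0, 2000.0, 5000)
print(f"95% CI: [{lower:.1f}, {upper:.1f}]")  # 95% CI: [1344.6, 1455.4]
```

Using `inv_cdf` rather than a hard-coded 1.96 lets the same function produce, for example, a 90% or 99% interval.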
14
Q

What is bootstrapping?

A

Bootstrapping estimates the sampling distribution of θ_hat by repeatedly sampling (with
replacement) from the original sample.

  1. Draw s samples of size n (the size of the original sample), with replacement, from your data.
  2. Calculate the quantity of interest ((θ_hat)i, e.g., the mean) for each of your s samples, which
    yields a vector of length s.
  3. A simple confidence interval for your quantity can be obtained by calculating quantiles
    (e.g., the 2.5th and 97.5th percentiles for a 95% CI) of this vector.
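A minimal percentile-bootstrap sketch of the three steps, assuming NumPy. The data are a hypothetical sample generated inside the example, and `s = 5000` resamples is an arbitrary choice. All resamples are drawn at once as an s × n array, which is equivalent to drawing them one at a time.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical observed sample (in practice: your data).
data = rng.lognormal(mean=7.0, sigma=0.8, size=300)
n = len(data)
s = 5_000  # number of bootstrap samples

# Steps 1 and 2: draw s resamples of size n with replacement (one per row)
# and compute the quantity of interest (here the mean) for each resample.
boot_means = rng.choice(data, size=(s, n), replace=True).mean(axis=1)

# Step 3: percentile confidence interval from the vector of length s.
lower, upper = np.percentile(boot_means, [2.5, 97.5])
print(f"mean = {data.mean():.1f}, 95% bootstrap CI = [{lower:.1f}, {upper:.1f}]")
```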
15
Q

How does simulation (the parametric approach) work?

A
  1. Create a (normal) sampling distribution from the estimate (e.g., the mean) and the standard error of your
    sample.
  2. Take s draws from that distribution, N(θ_hat, Var(θ_hat)).
  3. Calculate your quantity of interest for each of the s draws. This simulates its sampling distribution.
  4. Calculate summaries, such as means and standard errors, for the resulting vector of
    length s. To construct a 95% CI, take the 2.5th and 97.5th percentiles.
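A minimal sketch of the simulation approach, assuming NumPy and reusing the income example (θ_hat = 1400, SE(θ_hat) = 2000/√5000 ≈ 28.3). The draws therefore come from a normal distribution centered at θ_hat whose standard deviation is the standard error. The quantity of interest here is simply the mean itself; in practice it could be any function of the drawn value. The number of draws `s` is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(5)

theta_hat = 1400.0
se = 2000.0 / np.sqrt(5000)  # SE(theta_hat), about 28.3
s = 10_000                   # number of simulation draws

# Steps 1-2: draw s values from the normal approximation N(theta_hat, se^2).
draws = rng.normal(loc=theta_hat, scale=se, size=s)

# Step 3: compute the quantity of interest for each draw (identity here).
qoi = draws

# Step 4: summarize the simulated sampling distribution.
sim_mean = qoi.mean()
sim_se = qoi.std(ddof=1)
lower, upper = np.percentile(qoi, [2.5, 97.5])
print(f"mean = {sim_mean:.1f}, se = {sim_se:.1f}, 95% CI = [{lower:.1f}, {upper:.1f}]")
```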