Chapter 6/7/8 Flashcards
The Central Limit Theorem
If X₁, X₂, …, Xₙ is a sequence of independent, identically distributed (iid) random variables with finite mean μ and finite (non-zero) variance σ², then the standardized sample mean
Z = (X̄ - μ) / (σ/√n)
approaches the standard normal distribution N(0, 1) as n → ∞, where
X̄ = (X₁ + X₂ + … + Xₙ) / n
This means that for large n, the sample mean is approximately normally distributed regardless of the original distribution of the Xᵢ.
Alternatively, unstandardised (approximately, for large n):
X̄ ~ N(μ, σ²/n)
ΣXᵢ ~ N(nμ, nσ²)
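A simulation makes this concrete. Below is a minimal sketch in Python (the Exponential(1) population, n = 30 and the numpy calls are arbitrary illustration choices) comparing the simulated behaviour of X̄ with the CLT prediction:

```python
import numpy as np

# Population: Exponential(1), so mu = 1 and sigma = 1 (an arbitrary non-normal choice)
rng = np.random.default_rng(0)
n, reps = 30, 100_000

# Simulate 'reps' sample means, each based on a sample of size n
sample_means = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)

# CLT: X-bar is approximately N(mu, sigma^2/n)
print("simulated mean of X-bar:", sample_means.mean())       # ~ mu = 1
print("simulated SD of X-bar:  ", sample_means.std(ddof=1))  # ~ sigma/sqrt(n) ≈ 0.183
```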
How large?
n ≥ 30 is the usual rule of thumb.
But it depends on the shape of the population, that is, the distribution of X, and in particular how skewed it is.
If this population distribution is fairly symmetric even though non-normal, then n=10 may be large enough; whereas if the distribution is very skewed, n=50 may be necessary.
For Bin(n, p): when np > 5 and n(1 − p) > 5 (the closer p is to 0.5 the better; for small p, use the Poisson approximation instead).
For Poi(λ): when λ > 5.
If the distribution is already a sum of iid random variables (eg Gamma as a sum of Exponentials, Bin as a sum of Bernoullis), apply the normal approximation directly using its mean and variance, provided n is large enough.
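As a quick check of the Bin rule of thumb, here is a minimal sketch using scipy.stats (the values n = 50, p = 0.4 are arbitrary illustration choices), comparing an exact binomial probability with its continuity-corrected normal approximation:

```python
from scipy.stats import binom, norm

# Normal approximation to Bin(n, p), valid roughly when np > 5 and n(1 - p) > 5
n, p = 50, 0.4                         # np = 20, n(1 - p) = 30: rule of thumb satisfied
mu, sigma = n * p, (n * p * (1 - p)) ** 0.5

# P(X <= 25): exact binomial vs normal approximation with continuity correction
exact = binom.cdf(25, n, p)
approx = norm.cdf((25 + 0.5 - mu) / sigma)
print(exact, approx)                   # the two values should be close
```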
Random Samples
A set of items selected from a parent population is a random sample if:
- The probability that any item in the population is included in the sample is proportional to its frequency in the parent population.
- The inclusion or exclusion of any item in the sample is independent of the inclusion or exclusion of any other item.
A random sample consists of independent, identically distributed (iid) random variables, which are denoted by capital X’s. We use the shorthand notation:
𝑋̲ = (X₁, X₂, …, Xₙ)
An observed sample is denoted by lowercase x’s:
x̲ = (x₁, x₂, …, xₙ)
The population distribution is specified by a probability (density) function, denoted by f(x; θ), where θ represents the parameter(s) of the distribution.
Due to the Central Limit Theorem, inference concerning a population mean can be made based on the approximate normality of the sample mean for large sample sizes.
Statistic
- A statistic is a function of 𝑋̲ only and does not involve any unknown parameters.
- For example, X̄ and S².
- A statistic is generally denoted by g(X̲), and because it is a function of random variables, it is itself a random variable with its own sampling distribution.
Standard Error
SE(X̄) = SD(X̄) = σ/√n, estimated by s/√n when σ is unknown.
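A minimal simulation sketch (σ = 2 and n = 25 are arbitrary choices) confirming that the sampling distribution of X̄ has standard deviation σ/√n:

```python
import numpy as np

rng = np.random.default_rng(1)
sigma, n, reps = 2.0, 25, 200_000

# Simulate the sampling distribution of X-bar for a N(0, sigma^2) population
means = rng.normal(0.0, sigma, size=(reps, n)).mean(axis=1)

print("empirical SD of X-bar:", means.std(ddof=1))  # ~ sigma/sqrt(n) = 2/5 = 0.4
```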
t distribution
- t₁ is the Cauchy distribution; none of its moments exist.
- t_k → Z as k → ∞.
- For k > 2: mean = 0, variance = k/(k − 2) (see the numerical check below).
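These facts can be verified numerically; a minimal sketch using scipy.stats (the values of k are arbitrary):

```python
from scipy.stats import norm, t

# Variance of t_k is k/(k - 2) for k > 2
for k in (3, 5, 30):
    print(k, t.var(df=k), k / (k - 2))

# t_k -> Z: the upper 2.5% point approaches the normal value 1.96 as k grows
for k in (5, 30, 1000):
    print(k, t.ppf(0.975, df=k))
print("N(0, 1):", norm.ppf(0.975))
```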
Estimate vs Estimator
Estimate: Refers to a specific numerical value obtained from applying a formula to a given sample.
Estimator: Refers to the random variable or function that represents the rule or formula for estimating a parameter based on the sample.
Invariance property
Applies to MLEs
If θ̂ is the MLE of θ, then g(θ̂) is the MLE of g(θ).
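A standard illustration: for a Poisson(λ) sample the MLE of λ is X̄, so by invariance the MLE of P(X = 0) = e^(−λ) is e^(−X̄). A minimal sketch with simulated data (λ = 3 is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.poisson(lam=3.0, size=1_000)  # simulated Poisson(3) sample

lam_hat = x.mean()                    # MLE of lambda for a Poisson sample is X-bar
p0_hat = np.exp(-lam_hat)             # by invariance, the MLE of P(X = 0) = e^(-lambda)
print(lam_hat, p0_hat)                # ~ 3.0 and ~ e^(-3) ≈ 0.0498
```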
Censored vs Truncated Data
- Censored data arise when we have information about the full range of possible values but that information is not complete (eg when we only know that there are, say, 6 values greater than 500).
- Truncated data arise when we have no information about part of the range of possible values (eg when we have no information at all about values greater than 500).
Bias
Bias(θ̂) = E[θ̂] − θ
This is a measure of the difference between the expected value of the estimator and the true parameter being estimated.
- If the bias is greater than zero, the estimator is said to be positively biased, meaning it tends to overestimate the true value.
- If the bias is less than zero, the estimator is said to be negatively biased, meaning it tends to underestimate the true value.
- If the bias is zero, the estimator is said to be unbiased.
The property of unbiasedness is not preserved under non-linear transformations of the estimator/parameter.
So, for example, the fact that S² is an unbiased estimator of the population variance does not mean that S is an unbiased estimator of the population standard deviation (see the sketch below).
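This can be seen by simulation; a minimal sketch (normal samples with σ = 1 and n = 5, both arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)
sigma, n, reps = 1.0, 5, 200_000

samples = rng.normal(0.0, sigma, size=(reps, n))
s2 = samples.var(axis=1, ddof=1)  # S^2: divides by n - 1
s = np.sqrt(s2)                   # S: sample standard deviation

print("E[S^2] ~", s2.mean())  # ~ sigma^2 = 1: S^2 is unbiased
print("E[S]   ~", s.mean())   # < sigma = 1: S is negatively biased
```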
Bias on its own matters less than MSE, which combines bias and variance.
MSE
MSE(g(X̲)) = E[(g(X̲) − θ)²]
= Var(g(X̲)) + Bias(g(X̲))²
If MSE → 0 as n → ∞, the estimator is consistent.
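The decomposition is easy to check by simulation, and it also shows why bias alone can mislead: dividing by n instead of n − 1 gives a biased variance estimator that here has the smaller MSE. A minimal sketch (normal data, n = 5 arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
sigma2, n, reps = 1.0, 5, 200_000

samples = rng.normal(0.0, 1.0, size=(reps, n))
estimators = {
    "S^2 (divide by n-1)": samples.var(axis=1, ddof=1),  # unbiased
    "MLE (divide by n)":   samples.var(axis=1, ddof=0),  # biased, lower variance
}

for name, est in estimators.items():
    bias2 = (est.mean() - sigma2) ** 2
    var = est.var(ddof=1)
    mse = ((est - sigma2) ** 2).mean()
    print(name, "var + bias^2 =", var + bias2, "; MSE =", mse)
# The biased estimator has the smaller MSE here, despite its non-zero bias.
```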
Cramér-Rao Lower Bound
The Cramér-Rao Lower Bound gives the minimum variance attainable by an unbiased estimator of θ: Var(θ̂) ≥ CRLB. Given a random sample of size n from a distribution with likelihood function L(θ; X̲), the maximum likelihood estimator θ̂ is, for large n, approximately normal and unbiased with variance given by the bound, that is:
θ̂ ~ N(θ, CRLB)
where CRLB = 1 / I(θ)
and I(θ), the Fisher information, is defined as:
I(θ) = −E[d²/dθ² log L(θ; X̲)]
= E[(d/dθ log L(θ; X̲))²]
= n·E[(d/dθ log f(X; θ))²]
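A worked example, assuming a Poisson(λ) sample: d²/dλ² log f(x; λ) = −x/λ², so I(λ) = n·E[X]/λ² = n/λ and CRLB = λ/n. This equals Var(X̄), so the MLE X̄ attains the bound exactly. A simulation check (λ = 4, n = 50 arbitrary):

```python
import numpy as np

# Poisson(lambda): log f(x; lam) = x log(lam) - lam - log(x!)
# d^2/dlam^2 log f = -x/lam^2, so I(lam) = n E[X]/lam^2 = n/lam and CRLB = lam/n.
rng = np.random.default_rng(5)
lam, n, reps = 4.0, 50, 100_000

mles = rng.poisson(lam, size=(reps, n)).mean(axis=1)  # MLE = X-bar for each sample
print("Var(MLE) ~", mles.var(ddof=1))                 # ~ CRLB = lam/n = 0.08
```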
MLE v MoM
- MLE is generally regarded as the superior method.
- For one-parameter cases, the MoM estimator is always a function of the sample mean X̄, which can limit its usefulness. For example, in the uniform distribution on (0, θ), the method of moments estimator is 2X̄, which can produce inadmissible estimates, eg a value smaller than the largest observation (see the sketch after this list).
- However, in many common cases, such as the binomial, Poisson, exponential, and normal distributions, both methods yield the same estimator.
- In some cases, such as the gamma distribution with two unknown parameters, the method of moments offers an advantage in simplicity, as maximum likelihood estimation may require complex numerical methods due to the presence of functions like Γ(α).
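The Uniform(0, θ) comparison above is easy to demonstrate; a minimal sketch (θ = 10 and a sample of size 20 are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(6)
theta = 10.0
x = rng.uniform(0.0, theta, size=20)

mom = 2 * x.mean()  # MoM: E[X] = theta/2, so theta-hat = 2 X-bar
mle = x.max()       # MLE: the likelihood is maximised at the sample maximum

print("MoM:", mom, "MLE:", mle, "max(x):", x.max())
# The MoM estimate can fall below max(x), an impossible value for theta;
# the MLE never can.
```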
The pivotal method
The pivotal method requires a pivotal quantity of the form g(X̲, θ) with the following properties:
1. It is a function of the sample values and the unknown parameter θ.
2. Its distribution is completely known and does not depend on θ.
3. It is monotonic in θ, ie it either never decreases or never increases as θ increases.
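A standard example of a pivot is (X̄ − μ)/(S/√n) ~ t_{n−1}: its distribution is free of μ and it is monotonic in μ, so it can be inverted into a confidence interval. A minimal sketch (the data values are made up for illustration):

```python
import numpy as np
from scipy.stats import t

# Pivot: (X-bar - mu) / (S/sqrt(n)) ~ t_{n-1}; its distribution is free of mu
# and it is monotonic (decreasing) in mu, so it inverts into a confidence interval.
x = np.array([12.1, 9.8, 11.4, 10.3, 12.9, 10.7, 11.8, 9.5])  # made-up data
n = len(x)

xbar, s = x.mean(), x.std(ddof=1)
tcrit = t.ppf(0.975, df=n - 1)
half_width = tcrit * s / np.sqrt(n)

print(f"95% CI for mu: ({xbar - half_width:.2f}, {xbar + half_width:.2f})")
```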
How large a sample is needed
This question cannot be answered without further information on:
1. the accuracy of estimation required
2. an indication of the size of the population standard deviation
The latter may not be readily available, in which case a small pilot sample may be needed, or a rough guess made based on previous studies of similar populations.
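For estimating a mean to within ±E with 95% confidence, the usual calculation is n ≥ (z₀.₉₇₅·σ/E)². A minimal sketch (σ = 15 and E = 2 are made-up illustration values):

```python
import math
from scipy.stats import norm

sigma = 15.0  # rough guess at the population SD (eg from a pilot sample)
E = 2.0       # required accuracy: estimate mu to within +/- 2

z = norm.ppf(0.975)                  # 1.96 for 95% confidence
n = math.ceil((z * sigma / E) ** 2)
print(n)                             # 217
```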