Statistical Inference Flashcards
Sample mean
x¯ = 1/n ∑(i=1, n) xᵢ
Sample variance
s² = 1/(n-1) ∑(i=1 ,n) (xᵢ - x¯)²
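A minimal Python check of these two formulas, on a made-up sample (the numbers are illustrative only):

```python
import statistics

# Hypothetical sample, used only for illustration.
xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(xs)

x_bar = sum(xs) / n                               # x̄ = (1/n) ∑ xᵢ
s2 = sum((x - x_bar) ** 2 for x in xs) / (n - 1)  # s² with the n-1 divisor

print(x_bar)                    # 5.0
print(s2)                       # 32/7 ≈ 4.5714
print(statistics.variance(xs))  # agrees: statistics.variance also divides by n-1
```

Note that `statistics.variance` is the sample (n-1) version; `statistics.pvariance` divides by n instead.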
Expected value
The expected value of a random variable X is a measure of the centre or average value that one would expect to occur if the random experiment or process were repeated many times. E(X) = ∑ xᵢ · P(X = xᵢ)
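A one-line worked example of the discrete formula, for a fair six-sided die:

```python
# Expected value of a fair six-sided die: E(X) = ∑ xᵢ · P(X = xᵢ).
outcomes = [1, 2, 3, 4, 5, 6]
p = 1 / 6  # each face equally likely
e_x = sum(x * p for x in outcomes)
print(e_x)  # ≈ 3.5
```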
Expected value of the sample mean
E[X¯] = µ
Sampling distribution
A probability distribution for a sample statistic
Variance
Var[X] = E[(X - µ)²] or Var[X] = E[X²] - (E[X])²
Probability density function
A probability density function assigns probability to intervals of values, whereas a probability mass function assigns probabilities to discrete values
PDF is usually denoted f(x)
Cumulative distribution function
CDF is denoted F(x) and is the antiderivative of the PDF f(x)
F(x) = P(X ≤ x)
PDF of sample maximum
for Z = max(X₁, X₂, …, Xₙ)
g(z) = n f(z) (F(z))^(n-1)
PDF of sample minimum
for W = min(X₁, X₂, …, Xₙ)
h(w) = n f(w) (1 - F(w))^(n-1)
CDF of sample maximum
G(z) = (F(z))ⁿ or the integral of g(z)
CDF of sample minimum
H(w) = 1 - (1 - F(w))ⁿ or the integral of h(w)
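A quick simulation sketch of the sample-maximum CDF, using Uniform(0, 1) draws so that F(z) = z and the theoretical answer is simply zⁿ (sample size and z value are arbitrary choices):

```python
import random

random.seed(42)

# For Uniform(0,1), F(z) = z, so G(z) = P(max ≤ z) should equal zⁿ.
n, trials, z = 5, 200_000, 0.8
hits = sum(
    max(random.random() for _ in range(n)) <= z
    for _ in range(trials)
)
empirical = hits / trials
theoretical = z ** n  # 0.8⁵ = 0.32768
print(empirical, theoretical)  # the two values should be close
```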
Normal distribution
Let X be a normally distributed random variable with mean µ and variance 𝜎².
then we have X ∼ N[ µ, 𝜎²]
Standardized normal random variable
Let X be a normally distributed random variable with mean µ and variance 𝜎².
Then Z = (X - µ)/ 𝜎
and we have mean = 0 and variance = 1
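A simulation sketch of standardization, with an arbitrary µ and 𝜎 chosen purely for illustration:

```python
import random
import statistics

random.seed(0)

# Illustrative µ and 𝜎; standardising should give mean ≈ 0, variance ≈ 1.
mu, sigma = 10.0, 2.0
xs = [random.gauss(mu, sigma) for _ in range(100_000)]
zs = [(x - mu) / sigma for x in xs]  # Z = (X - µ)/𝜎

print(statistics.mean(zs))       # ≈ 0
print(statistics.pvariance(zs))  # ≈ 1
```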
What is zᵧ? Used to determine the value x such that P(X < x) = γ.
we denote zᵧ to be the scalar such that P[ Z < zᵧ] = γ. where Z is a random variable that has a standard normal distribution, Z ∼ N [0, 1].
When should you use the Student's t-distribution?
- When the sample is small, i.e. n ≤ 30,
- or when the sample standard deviation is given instead of the population standard deviation.
Student’s t-distribution
If X₁, X₂, …, Xₙ is a random sample from a normal distribution with mean µ and variance 𝜎² then the random variable T = (X¯ - µ)/(S/√n) has the t-distribution with (n-1) degrees of freedom
T∼ t (n-1)
What is tᵧ(n-1)?
We denote tᵧ (n-1) to be the scalar such that P[ T < tᵧ(n-1)] = γ where T ∼ t(n-1)
For example,
with n = 11, t₀.₉₅(11-1) = t₀.₉₅(10) = 1.812, so P[T < 1.812] = 0.95
Chi-squared distribution
The chi-squared distribution is the distribution of the random variable Z₁² + Z₂² + … + Zₙ², where the Zᵢ are independent standard normal random variables
If Z is a random variable with a chi-squared distribution we write Z ∼ χ²(ν), where ν is the degrees of freedom
For a random sample from a normal population, (n-1)S²/𝜎² ∼ χ²(n-1)
µ = ν and 𝜎² = 2ν
How can we tell it's a chi-squared distribution?
When the random variable X is the sum of the squares of independent standard normal random variables.
X = Z₁² + Z₂² + … + Zₙ²
How can we tell it's an F-distribution?
The F-distribution is characterised by 2 different sets of degrees of freedom, ν₁ and ν₂.
Use the F-distribution when we have two independent random variables X and Y such that X ∼ χ²(ν₁) and Y ∼ χ²(ν₂),
then F = (X/ν₁)/(Y/ν₂) has the F-distribution: F ∼ F(ν₁, ν₂)
What is the z score of the sample mean?
Z = (X¯ - µ)/√(𝜎²/n)
Central limit theorem
Let X₁, X₂, …, Xₙ be iid random variables with mean µ and variance 𝜎²
then we have
(a) ∑(i=1, n) Xᵢ ∼ N [nµ , n𝜎²], approximately
(b) X¯ ∼ N[µ , 𝜎²/n], approximately
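A simulation sketch of part (a), summing Uniform(0, 1) draws (which have µ = 0.5 and 𝜎² = 1/12), so the sum should have mean nµ and variance n𝜎²:

```python
import random
import statistics

random.seed(1)

# n iid Uniform(0,1) draws per trial: µ = 0.5, 𝜎² = 1/12 each.
n, trials = 30, 20_000
sums = [sum(random.random() for _ in range(n)) for _ in range(trials)]

print(statistics.mean(sums))       # close to n·µ  = 15
print(statistics.pvariance(sums))  # close to n·𝜎² = 2.5
```

Plotting a histogram of `sums` would also show the characteristic bell shape, even though each individual draw is uniform.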
What is an estimator?
An estimator, denoted θ̂, is a statistic used to estimate a parameter θ.
What is an unbiased estimator
An estimator θ̂ of θ is said to be unbiased if E[θ̂] = θ. Otherwise it is biased, with bias[θ̂] = E[θ̂] - θ.
What is the mean square error of an estimator?
The mean square error (MSE) of an estimator θ̂ of θ is MSE[θ̂] = E[ (θ̂ - θ)²]
or
MSE[θ̂ ] = Var[ θ̂ ] + {bias[ θ̂ ]}²
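A simulation sketch of this decomposition, using the biased variance estimator that divides by n instead of n-1 (all numbers here are hypothetical, chosen for illustration):

```python
import random
import statistics

random.seed(7)

# Hypothetical setup: normal data with known 𝜎² = 4, sample size n = 10.
mu, sigma2, n, trials = 0.0, 4.0, 10, 50_000
sigma = sigma2 ** 0.5

def var_hat(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)  # divides by n: biased

estimates = [
    var_hat([random.gauss(mu, sigma) for _ in range(n)])
    for _ in range(trials)
]

mse_direct = statistics.fmean((e - sigma2) ** 2 for e in estimates)
bias = statistics.fmean(estimates) - sigma2           # theory: -𝜎²/n = -0.4
decomposed = statistics.pvariance(estimates) + bias ** 2

print(bias)                    # close to -0.4
print(mse_direct, decomposed)  # the two MSE computations agree
```

The direct MSE and the Var + bias² route agree exactly up to floating-point error, since the identity is algebraic.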
Better estimator
Let θ̂₁ and θ̂₂ be two estimators of a parameter θ,
θ̂₁ is said to be a better estimator in MSE than θ̂₂ if MSE[ θ̂₁ ] < MSE [ θ̂₂ ].
Determine the z score of a confidence interval
For a 100(1 - 𝛼)% confidence level we take z(1-(𝛼/2))
i.e. for a 95% confidence level, 𝛼 = 0.05,
z(1-(0.05/2)) = z₀.₉₇₅
then from the z distribution tables find the z value with area 0.975 under the normal curve to its left,
so z(1-(0.05/2)) = z₀.₉₇₅ = 1.96.
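The same quantile can be looked up without tables via Python's standard-library `statistics.NormalDist`:

```python
from statistics import NormalDist

# z_{1-𝛼/2} for a 95% confidence level (𝛼 = 0.05).
z = NormalDist().inv_cdf(1 - 0.05 / 2)
print(round(z, 3))                    # 1.96
print(round(NormalDist().cdf(z), 3))  # 0.975: area to the left of z
```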
Confidence intervals for means, with normally distributed population and known variance
P[¯X - Z(1-(𝛼/2)) (𝜎/√ n) < µ < ¯X + Z(1-(𝛼/2)) (𝜎/√ n)] = 1 - 𝛼
where 𝛼 is the risk level, the probability that the interval does not contain µ
or
C = ¯X ± Z(1-(𝛼/2)) (𝜎/√ n)
Confidence intervals for means, with normally distributed population and unknown variance
P[¯X - t(1-(𝛼/2))(n-1) (s/√n) < µ < ¯X + t(1-(𝛼/2))(n-1) (s/√n)] = 1 - 𝛼
(where s = the sample standard deviation)
or
C = ¯X ± t(1-(𝛼/2))(n-1) (s/√n)
Poisson distribution
The probability mass function for the Poisson distribution is given by P(X = x) = (𝜆^x e^(-𝜆))/x!
The mean and variance of the Poisson distribution are both 𝜆.
E[ X ] = 𝜆 Var[ X ] = 𝜆
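A small sketch that evaluates the pmf directly and checks that it sums to 1 and that ∑ x·P(X = x) recovers 𝜆 (the value 𝜆 = 3 is arbitrary):

```python
from math import exp, factorial

# Poisson pmf: P(X = x) = 𝜆^x e^(-𝜆) / x!
def poisson_pmf(x, lam):
    return lam ** x * exp(-lam) / factorial(x)

lam = 3.0
total = sum(poisson_pmf(x, lam) for x in range(100))     # pmf sums to ≈ 1
mean = sum(x * poisson_pmf(x, lam) for x in range(100))  # E[X] recovers 𝜆
print(total, mean)
```

Truncating the sum at x = 100 is safe here because the tail probability beyond it is negligible for 𝜆 = 3.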
Confidence intervals for variance
P[(n-1)S²/χ²(1-(𝛼/2))(n-1) < 𝜎² < (n-1)S²/χ²(𝛼/2)(n-1)] = 1 - 𝛼
(where χ²ᵧ(n-1) is the γ quantile of the chi-squared distribution with n-1 degrees of freedom)