MTH2006 STATISTICAL MODELLING AND INFERENCE Flashcards
cumulative distribution function (cdf) of a random variable Y
F_Y(y) = Pr(Y ≤ y) where y belongs to the range space of Y
probability mass function (pmf) [if Y is discrete]
f_Y(y) = Pr(Y = y) and F_Y(y) = sum(x : x ≤ y) f_Y(x)
probability density function (pdf) [if Y is continuous]
f_Y(y) = d/dy F_Y(y) and F_Y(y) = integral(y -> −∞)(f_Y(x)) dx
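A minimal numerical check of this pdf/cdf relationship (a sketch assuming numpy and scipy are available; the standard normal is used purely as an illustration):

```python
import numpy as np
from scipy import stats, integrate

# F_Y(y) should equal the integral of the pdf f_Y from -infinity up to y
y = 1.3
cdf_from_pdf, _ = integrate.quad(stats.norm.pdf, -np.inf, y)  # numerical integral of the pdf
print(cdf_from_pdf)       # approximately 0.9032
print(stats.norm.cdf(y))  # the cdf evaluated directly agrees
```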
p-quantile of a random variable Y
the value y_p for which Pr(Y ≤ y_p) = p
Pr(Y > y)
1 − Pr(Y ≤ y)
joint cumulative distribution function (cdf) of a vector Y1,…Yn
F_Y(y1, . . . , yn) = Pr(Y1 ≤ y1, . . . , Yn ≤ yn)
If Y1, . . . , Yn are discrete then their joint pmf is defined by
f_Y(y1, . . . , yn) = Pr(Y1 = y1, . . . , Yn = yn)
If Y1, . . . , Yn are continuous then their joint pdf is defined by
fY (y1, . . . , yn) = ∂^n/∂y_1 . . . ∂y_n F_Y (y1, . . . , yn)
Y1, . . . , Yn are independent if
f_Y(y1, . . . , yn) = f_Y1(y1) . . . f_Yn(yn) for all y1,…,yn
Y1, . . . , Yn are identically distributed if
f_Y1(y) = . . . = f_Yn(y) for all y
if Y1, . . . , Yn are independent and identically distributed (iid) then their joint pdf or pmf is
f_Y(y1, . . . , yn) = f_Y1(y1). . . f_Y1(yn)
explanatory variable
plotted on the x-axis and is the variable manipulated by the researcher
response variable
plotted on the y-axis and depends on the other variables
if Y has a Poisson distribution with parameter µ, then we write Y~Poi(µ) and Y has pmf
f_Y(y) = µ^y e^-µ / y! for y = 0, 1, 2, …
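A quick check of this pmf formula against scipy (a sketch; the values of µ and y are illustrative):

```python
from math import exp, factorial
from scipy import stats

mu, y = 2.5, 3
pmf_by_formula = mu**y * exp(-mu) / factorial(y)  # µ^y e^(-µ) / y!
print(pmf_by_formula)            # approximately 0.2138
print(stats.poisson.pmf(y, mu))  # scipy's Poisson pmf agrees
```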
if Y has an exponential distribution with parameter θ, then we write Y~Exp(θ) and Y has cdf
F_Y(y; θ) = 1 - e^-θy for y > 0
if Y has an exponential distribution with parameter θ, then we write Y~Exp(θ) and Y has pdf
f_Y(y; θ) = d/dy F_Y(y; θ) = θe^-θy for y > 0
for p-quantile cdf is
F_Y(y_p) = p and y_p = F_Y^-1(p)
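For example, inverting the Exp(θ) cdf 1 − e^-θy_p = p gives y_p = −log(1 − p)/θ; a minimal check assuming scipy is available:

```python
import numpy as np
from scipy import stats

theta, p = 2.0, 0.9
y_p = -np.log(1 - p) / theta              # solve F_Y(y_p) = 1 - exp(-theta * y_p) = p
print(y_p)                                # approximately 1.1513
print(stats.expon.ppf(p, scale=1/theta))  # scipy's quantile function (ppf) agrees
```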
expectation is
E(g(Y)) = sum(x) Pr(Y = x) g(x) = sum(x) f(x) g(x), where f(x) is the pmf
variance of random variable Y is
Var(Y) = E[(Y − E(Y))^2] = E(Y^2) − [E(Y)]^2
empirical probability r/n is
the empirical estimate of Pr(X ≤ x_(r)), where x_(r) is the rth smallest of the n observations
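A sketch of the idea, assuming numpy; the Exp(1) sample is purely illustrative, and x_(r) denotes the rth smallest observation:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
x = np.sort(rng.exponential(scale=1.0, size=n))  # x[r - 1] is the rth smallest value x_(r)
r = 600
empirical = r / n                  # empirical probability r/n
true_prob = 1 - np.exp(-x[r - 1])  # Exp(1) cdf evaluated at x_(r)
print(empirical, true_prob)        # close for large n
```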
simple linear model means
one explanatory variable
an example of a joint distribution for two variables is the…
bivariate normal distribution
if X and Y are independent
f(x,y) = f_x(x)f_y(y)
covariance formula:
Cov(X, Y) = E[(X - E(X))(Y - E(Y))] = E(XY) − E(X)E(Y)
if independent, covariance formula:
Cov(X, Y) = 0
E(XY) = E(X)E(Y)
covariance with correlation/variance formula:
Cov(X, Y) = ρ sqrt[Var(X)Var(Y)], where ρ is the correlation
an example of a joint distribution for two variables, each with a normal distribution is called
the bivariate normal distribution
the joint pdf of the bivariate normal distribution
f(x, y; θ) = 1 / (2π σ_X σ_Y sqrt(1 − ρ^2)) * exp{ −1/(2(1 − ρ^2)) [ (x − µ_X)^2/σ_X^2 + (y − µ_Y)^2/σ_Y^2 − 2ρ(x − µ_X)(y − µ_Y)/(σ_X σ_Y) ] }
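A sketch evaluating this density directly and checking it against scipy.stats.multivariate_normal (all parameter values are illustrative):

```python
import numpy as np
from scipy import stats

mu_x, mu_y, sd_x, sd_y, rho = 1.0, -0.5, 2.0, 1.5, 0.3
x, y = 0.7, 0.2

# the quadratic form inside the exponent of the bivariate normal pdf
q = ((x - mu_x)**2 / sd_x**2 + (y - mu_y)**2 / sd_y**2
     - 2 * rho * (x - mu_x) * (y - mu_y) / (sd_x * sd_y))
pdf_formula = np.exp(-q / (2 * (1 - rho**2))) / (2 * np.pi * sd_x * sd_y * np.sqrt(1 - rho**2))

cov = np.array([[sd_x**2, rho * sd_x * sd_y],
                [rho * sd_x * sd_y, sd_y**2]])
pdf_scipy = stats.multivariate_normal(mean=[mu_x, mu_y], cov=cov).pdf([x, y])
print(pdf_formula, pdf_scipy)  # identical up to rounding
```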
a continuous random variable Y defined on
(−∞,∞) with pdf f(y; θ) has expectation denoted by E(Y) and defined as..
E(Y ) = integral(∞ -> −∞) y * f(y;θ) dy
a discrete random variable with range space R and pmf f(y; θ), E(Y) is defined as…
E(Y ) = sum(y∈R) [y * f(y; θ)]
for a real valued function g(Y), when continuous, E[g(Y)] is
integral(∞ -> −∞) g(y) * f(y; θ)dy
for a real valued function g(Y), when discrete, E[g(Y)] is
sum(y ∈ R) g(y) * f(y; θ)
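For instance, with Y ~ Poi(µ) and g(y) = y^2, summing g(y) f(y; µ) over the range space recovers E(Y^2) = µ + µ^2 (a sketch assuming scipy; the sum is truncated where the pmf is negligible):

```python
from scipy import stats

mu = 2.5
# E[g(Y)] = sum over the range space of g(y) * f(y; mu), truncated at y = 100
Eg = sum(y**2 * stats.poisson.pmf(y, mu) for y in range(100))
print(Eg)          # approximately 8.75
print(mu + mu**2)  # known value of E(Y^2) for a Poisson
```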
α-confidence interval is …
an interval estimator that contains the true parameter value θ with probability α for every θ
null hypothesis
H_0 : θ = x (this is also a simple hypothesis = it completely specifies a probability model by specifying a single parameter value)
alternative hypothesis
H_1 : θ ≠ x (this is also a composite hypothesis = it does not completely specify a probability model)
if H_1 : θ ≠ x, which specifies values either side of the null value, it is called
a two-sided alternative
if H_1 : θ < x it is called a
one-sided alternative
for a null hypothesis H_0 : θ = θ_0, the null distribution is the distribution of T(Y) when …
θ = θ_0
let f_Y(y; θ) be continuous and denote the joint pdf of Y then Pr(Y ∈ C)
integral(C) f_Y(y; θ) dy, if f_Y(y; θ) is continuous
let f_Y(y; θ) be discrete and denote the joint pmf of Y then Pr(Y ∈ C)
sum(y ∈ C) f_Y (y; θ), if f(y; θ) is discrete
the size of the test α
α = Pr(Y ∈ C; θ_0)
probability of a type I error
the probability of rejecting H_0 when it is true
probability of a type II error
the probability of failing to reject H_0 when it is false
if the alternative hypothesis is simple, then the power of the test is
Pr(Y ∈ C; θ_1) = the probability of not making a type II error/detecting that H_0 is false
for a set y1, …, yn, the sample moments are
ˆm_r = 1/n sum(i = 1 -> n) y_i^r
for a continuous or discrete random variable Y, the moment generating function (mgf) of Y is
M_Y(t) = E(e^(tY))
the kth moment of Y is
E(Y^k) = m_k
the central moments of Y are
E[{Y − E(Y)}^r]
the method of moments estimate, ˆθ, is such that …
m_r(ˆθ) = ˆm_r for r = 1, . . . , d
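A sketch of the method of moments for Exp(θ), where the first model moment is m_1(θ) = E(Y) = 1/θ, so matching it to the first sample moment ˆm_1 = ȳ gives ˆθ = 1/ȳ (assuming numpy; the data are simulated purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
theta_true = 2.0
y = rng.exponential(scale=1 / theta_true, size=500)  # simulated Exp(theta) sample

m1_hat = y.mean()       # first sample moment
theta_mom = 1 / m1_hat  # solve m_1(theta) = 1/theta = m1_hat
print(theta_mom)        # close to the true value 2.0
```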
for sample variance we will use
s^2 = (n − 1)^(−1) sum(i = 1 -> n) (y_i − ȳ)^2
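A quick check, assuming numpy, that this formula matches numpy's variance with the n − 1 divisor (the data are arbitrary):

```python
import numpy as np

y = np.array([2.1, 3.4, 1.9, 4.2, 2.8])
n = len(y)
s2_formula = ((y - y.mean())**2).sum() / (n - 1)  # (n - 1)^(-1) * sum of (y_i - ybar)^2
print(s2_formula)
print(y.var(ddof=1))                              # same value (ddof=1 divides by n - 1)
```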
the distribution of the different estimates is called the
sampling distribution
the standard deviation of the estimated sampling distribution gives the
estimated standard error
for the data y = (y1,…,yn) we have an estimate
ˆθ(y)
for the random variables Y = (Y1,…,Yn) we have an estimator
ˆθ(Y)
for critical region C = {y : T(y) < c}, the p-value is
p = Pr[T(Y ) ≤ t; θ0]
for critical region C = {y : T(y) > c}, the p-value is
p = Pr[T(Y ) ≥ t; θ0]
the null distribution of the t-statistic is defined as
the t-distribution with n − 1 degrees of freedom, where T(Y) = (Ȳ − µ_0) / (s/√n)
the cdf of the t-statistic's null distribution is denoted
Φ_n−1(y)
the sample variance used in the t-statistic is
s^2 = (n − 1)^(−1) sum(i = 1 -> n) (Y_i − Ȳ)^2, which is an unbiased estimator of σ^2
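A sketch of the one-sample t-statistic and a two-sided p-value, checked against scipy.stats.ttest_1samp (the data and µ_0 are illustrative):

```python
import numpy as np
from scipy import stats

y = np.array([5.1, 4.8, 5.6, 5.0, 4.7, 5.3, 5.2, 4.9])
mu0 = 5.0
n = len(y)

t_stat = (y.mean() - mu0) / (y.std(ddof=1) / np.sqrt(n))  # (ybar - mu0) / (s / sqrt(n))
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 1)           # two-sided p-value, t with n - 1 df
print(t_stat, p_value)
print(stats.ttest_1samp(y, mu0))                          # scipy agrees
```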
critical region
C = {y : T(y) > t_c}, where t_c is the critical value
power of the test is when the alternative hypothesis is simple and is the probability of …
not making a type II error or the probability of detecting that H_0 is false
the p-value is the probability that …
when H_0 is true, the test statistic is at least as unfavourable to H_0 as the value actually observed
prod(i = m -> n) (c x_i) =
c^(n−m+1) prod(i = m -> n) x_i
prod(i = m -> n) x_i^c =
[prod(i = m -> n) x_i]^c
prod(i = m -> n) c^(x_i) =
c^(sum(i = m -> n) x_i)
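A quick numerical check of these three product identities (assuming numpy; the values are arbitrary):

```python
import numpy as np

x = np.array([1.5, 2.0, 0.5, 3.0])  # plays the role of x_m, ..., x_n
c = 2.0
k = len(x)                          # k = n - m + 1 terms

print(np.prod(c * x), c**k * np.prod(x))  # prod(c * x_i) = c^(n-m+1) * prod(x_i)
print(np.prod(x**c), np.prod(x)**c)       # prod(x_i^c) = [prod(x_i)]^c
print(np.prod(c**x), c**np.sum(x))        # prod(c^(x_i)) = c^(sum(x_i))
```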
a sample, y = (y1, . . . , yn), is modelled as a realisation of independent random variables, Y = (Y1, . . . , Yn). For i = 1, . . . , n, let f_Yi(y; θ) denote the pmf or pdf of Yi, where θ is the model parameter. The joint pmf or pdf of Y evaluated at our sample y is then
f_Y(y; θ) = f_Y1(y1; θ) . . . f_Yn(yn; θ) = prod(i = 1 -> n) f_Yi(y_i; θ) by independence
viewed as a function of θ, the joint pmf or pdf is referred to as the likelihood function and is denoted …
L(θ; y) = f_Y (y; θ)
the parameter value that maximizes the likelihood is called the …
maximum likelihood estimate (mle)
it is usually simpler to maximize the logarithm of the likelihood instead - the log-likelihood is denoted …
l(θ; y) = log L(θ; y) = l(θ)
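A sketch of maximising a log-likelihood numerically, using iid Exp(θ) data for which l(θ; y) = n log θ − θ sum(y_i) and the mle is known to be ˆθ = 1/ȳ (assuming numpy and scipy; the data are simulated for illustration):

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(2)
y = rng.exponential(scale=0.5, size=200)  # Exp(theta) sample with theta = 2

def neg_loglik(theta):
    # l(theta; y) = n * log(theta) - theta * sum(y); minimise the negative
    return -(len(y) * np.log(theta) - theta * y.sum())

result = minimize_scalar(neg_loglik, bounds=(1e-6, 50), method="bounded")
print(result.x)      # numerical mle
print(1 / y.mean())  # closed-form mle 1/ybar, essentially the same
```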
a multiplicative constant will not affect the shape of the likelihood - true or false..
true (and therefore it will not affect the location of the maximum; we can ignore multiplicative constants when we calculate the mle)
mean squared error
Let ˆθ be an estimator for θ. The mean squared error of
ˆθ is mse(ˆθ) = E{(ˆθ −θ)^2} and the bias of ˆθ is Bias(ˆθ) = E(ˆθ) − θ.
If the bias is zero then the estimator is unbiased.
mean squared error can be written in terms of bias and variance:
mse(ˆθ) = Var(ˆθ) + Bias(ˆθ)^2
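A simulation sketch of this decomposition, using the biased variance estimator (divisor n) for normal data (assuming numpy; all settings are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
sigma2, n, reps = 4.0, 10, 200_000

samples = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
est = samples.var(axis=1, ddof=0)  # biased estimator of sigma^2 (divisor n)

mse = np.mean((est - sigma2)**2)   # Monte Carlo mean squared error
var = est.var()                    # Monte Carlo variance of the estimator
bias = est.mean() - sigma2         # Monte Carlo bias
print(mse, var + bias**2)          # the two agree up to simulation noise
```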
a consistent estimator
The estimator ˆθ is consistent for θ if, for all e > 0, lim(n→∞) Pr(|ˆθ − θ| > e) = 0 (this is the asymptotic limit).
Only one parameter in the model, the mle is …
scalar and its approximate sampling distribution will be a univariate normal distribution
More than one parameter in the model, the mle is …
a vector and its sampling distribution will be a multivariate normal distribution
expectation of vector of random variables with ith element Yi
Let Y be a vector of random variables with ith element Yi. The expectation of Y is the vector with ith element E(Yi).
variance of vector of random variables with ith element Yi
The variance of Y is the matrix with (i, j)th element Cov(Yi, Yj ). This can be written as
Var(Y ) = E[{Y − E(Y )}{Y − E(Y )}^T]
= E(YY^T) − E(Y)E(Y)^T.
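A simulation sketch, assuming numpy, checking Var(Y) = E(YY^T) − E(Y)E(Y)^T against the covariance matrix used to generate the data:

```python
import numpy as np

rng = np.random.default_rng(4)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])

Y = rng.multivariate_normal(mu, Sigma, size=200_000)  # each row is a draw of the vector Y
EY = Y.mean(axis=0)
EYYt = (Y[:, :, None] * Y[:, None, :]).mean(axis=0)   # Monte Carlo estimate of E(Y Y^T)
print(EYYt - np.outer(EY, EY))                        # approximately Sigma
print(np.cov(Y, rowvar=False))                        # numpy's covariance estimate agrees
```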
observed information J(θ)
Let l(θ) be a log-likelihood function. The observed information is J(θ) = −∂^2 l(θ) / ∂θ ∂θ^T.
If θ is a scalar then, J(θ) is
-d^2 l(θ) / dθ^2
If θ is a vector with ith element θi then, J(θ) is a matrix with (i, j)th element,
-∂^2 l(θ) / ∂θi∂θj
expected information is I(θ) = E{J(θ)}, that is the matrix with (i, j)th element,
E{−∂^2 l(θ) / ∂θi∂θj}
multivariate normal distribution
The random variable Y = (Y1, . . . , Yd) has a multivariate normal distribution with expectation E(Y) = µ and variance Var(Y) = Σ
if the pdf of Y is
f_Y (y; µ, Σ) = (2π)^(−d/2)det(Σ)^(−1/2)exp[−(1/2)(y − µ)^(T)Σ^(−1)(y − µ)],
in which case we write Y∼N(µ,Σ).
useful properties of multivariate normal distribution:
If two random variables are independent then they are uncorrelated, because independence implies E(Y1Y2) = E(Y1)E(Y2) and so Cov(Y1, Y2) = E(Y1Y2) − E(Y1)E(Y2) = 0. Also, linear transformations of multivariate normal random variables are multivariate normal.
THEOREM: When n is large, the sampling distribution of the mle is approximately N(θ, I(θ)^(−1)). This is called the asymptotic distribution of the mle.
If n is large then the square roots of the diagonal elements of I(θ)^−1
approximate the standard errors of the mles in ˆθ. These standard errors can
be estimated by replacing θ with ˆθ in I(θ)^−1.
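A sketch for iid Exp(θ) data, where l(θ) = n log θ − θ sum(y_i), so J(θ) = I(θ) = n/θ^2 and the approximate standard error of ˆθ = 1/ȳ is ˆθ/√n (assuming numpy; the data are simulated for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
theta_true, n = 2.0, 400
y = rng.exponential(scale=1 / theta_true, size=n)

theta_hat = 1 / y.mean()  # mle for Exp(theta)
info = n / theta_hat**2   # expected information I(theta) evaluated at the mle
se = np.sqrt(1 / info)    # sqrt of I(theta_hat)^(-1), here equal to theta_hat / sqrt(n)
print(theta_hat, se)      # estimate near 2 with standard error near 0.1
```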
likelihood ratio test statistic
T = 2 { l(ˆθ; y) − l(θ_0; y) }
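A sketch computing T for the Exp(θ) model with H_0: θ = θ_0, using l(θ; y) = n log θ − θ sum(y_i) and ˆθ = 1/ȳ (assuming numpy; the data and θ_0 are illustrative). T is non-negative, and large values count as evidence against H_0:

```python
import numpy as np

rng = np.random.default_rng(6)
y = rng.exponential(scale=0.5, size=100)  # data generated with theta = 2
theta0 = 1.5                              # value specified by the null hypothesis

def loglik(theta):
    # log-likelihood for iid Exp(theta): n * log(theta) - theta * sum(y)
    return len(y) * np.log(theta) - theta * y.sum()

theta_hat = 1 / y.mean()                      # mle
T = 2 * (loglik(theta_hat) - loglik(theta0))  # likelihood ratio test statistic
print(theta_hat, T)
```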