CHAPTER 3: Useful ideas and methods for inference Flashcards
Theorem 10 (CLT for iid variables).
If random variables X_1, . . . , X_n are independent and identically distributed with mean µ and variance σ² < ∞, then
(Σ_{i=1}^n X_i − nµ) / (σ√n) → Z ∼ N(0, 1), as n → ∞
Theorem 11 (CLT for iid random vectors).
If random vectors X_1, . . . , X_n are independent and identically distributed with mean vector µ and finite variance-covariance matrix Σ, then
(Σ_{i=1}^n X_i − nµ) / √n → Z ∼ N(0, Σ), as n → ∞,
where Z is a random vector with the multivariate Normal distribution N(0, Σ).
CLT notes:
- The → in these theorems denotes convergence in distribution.
- A sequence that converges in distribution to a Normal distribution is often said to be asymptotically Normal as n → ∞.
- The Central Limit Theorem remains true for dependent and/or non-identically distributed random variables/vectors under suitable conditions.
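The theorem can be checked empirically. A minimal sketch (my own illustration, not from the notes): draw n iid Exponential variables with mean 1 (so µ = σ = 1), standardize their sum as in Theorem 10, and check that the standardized sums behave like N(0, 1).

```python
import math
import random

# Empirical check of the CLT for iid variables (illustration only).
# X_i ~ Exponential with mean 1, so mu = 1 and sigma = 1.
random.seed(0)
n, reps = 50, 5000
mu = sigma = 1.0

zs = []
for _ in range(reps):
    s = sum(random.expovariate(1.0) for _ in range(n))
    # standardize: (sum - n*mu) / (sigma * sqrt(n))
    zs.append((s - n * mu) / (sigma * math.sqrt(n)))

# For N(0, 1), about 95% of values fall in (-1.96, 1.96).
frac = sum(1 for z in zs if abs(z) < 1.96) / reps
print(round(frac, 3))
```

The printed fraction should be close to 0.95 even though the underlying variables are heavily skewed, which is exactly the content of the theorem.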
Likelihood and inference
We draw conclusions about an unknown parameter θ (one- or multi-dimensional) on the basis of the data x and the model f_X.
The sample observations x_1, . . . , x_n = x are modelled as the values of random variables X_1, . . . , X_n = X, whose joint probability density function (probability function in the discrete case) f_X depends on the unknown parameter θ.
Definition 12:
LIKELIHOOD
The likelihood of θ based on observed data x is defined to be the function
of θ:
L(θ) = L(θ; x) = f_X(x; θ).
*In the discrete case, for each θ, L(θ) gives the probability of observing the data x if θ is the true parameter (provided f is from the correct family of distributions).
- L(θ) is a measure of how plausible θ is as the value that generated the observed data x.
- In the continuous case, measurements are made only to bounded precision, and the probability density function is proportional to the probability of finding the random variable in a small interval around the observed value.
Ratio of likelihoods
The ratio L(θ_1)/L(θ_2) measures how plausible θ_1 is relative to θ_2 as the value generating
the data
maximum likelihood
The maximum likelihood estimate is the most plausible value θ̂; that is, the value of θ for which
L(θ̂) = max_θ L(θ).
Relative Likelihood
All values of θ for which the relative likelihood
RL(θ) = L(θ)/L(θ̂)
is not too far below 1 are plausible in the light of the observed x.
(This is the ratio L(θ_1)/L(θ_2) with θ_2 = θ̂, the value maximizing the likelihood.)
log-likelihood
It is convenient to plot the likelihood on a log scale. The log-likelihood is defined to be
l(θ) = log L(θ).
- Under independence the likelihood is a product of terms; taking logs turns the product into a sum.
- Statements about relative likelihoods become statements about differences of log-likelihoods.
- Distributions involving exponential terms become easier to handle.
likelihood regions
Thus values of θ plausible in the light of the data (or consistent with the data) are those
contained in sets of the form
{θ : l(θ) > l(θˆ) − c}
for suitable constants c.
*In the one-dimensional case such a set is typically an interval.
- The value θ̂ is the maximum likelihood estimator (mle) of θ: the value within the parameter space (the set of permissible values of the parameter) maximizing L(θ). Its dependence on the data x may be made explicit by writing θ̂(x).
*For inferences about θ, only relative values of the likelihood matter, so we can neglect constants (factors not depending on θ) and use whatever version of L or l is convenient.
*If we re-parametrize to φ = g(θ), where g is a continuous invertible function, then the likelihood changes in the obvious way: if L1 denotes the likelihood with respect to φ, then L1(φ) = L(g^−1(φ)). Also, most usefully, φ̂ = g(θ̂): the maximum likelihood estimate transforms along with the parameter.
Indep and log
For independent X_i,
L(θ) = ∏_{i=1}^n f_{Xi}(x_i; θ)
and
l(θ) = Σ_{i=1}^n log f_{Xi}(x_i; θ),
where f_{Xi} denotes the density function of X_i.
likelihood equation(s)
θˆ may be found as the solution of the likelihood equation(s)
∂L(θ)/∂θ= 0
or equivalently,
∂l(θ)/∂θ = 0
i.e. the usual calculus condition for an interior maximum of a function.
EXAMPLE:
Random sample of observations x_1, . . . , x_n from an exponential distribution with unknown mean θ > 0. (For example, we could observe a Poisson process until we have n occurrences, and let x_i be the ith inter-occurrence time.)
The probability density function for each observation is
f_{Xi}(x; θ) =
{(1/θ)e^{−x/θ} x ≥ 0
{0 x < 0
so that
l(θ) =
{−n(log θ + x̄/θ) if min x_i ≥ 0
{−∞ otherwise
Since
∂l/∂θ = n(−1/θ + x̄/θ²),
the maximum likelihood estimator is θ̂ = x̄.
Recall that the usual parametrization of the exponential distribution uses the rate parameter λ = 1/θ, so that the density is replaced by
f_{Xi}(x; λ) =
{λ e^{−λx} x ≥ 0
{0 x < 0
If we write down the log-likelihood for λ, we get
l(λ) = n(log λ − λx̄),
and maximizing this gives λ̂ = 1/x̄ = 1/θ̂, as expected.
A likelihood interval would be found in this case by finding the values of θ for which
l(θ̂) − l(θ) = n(x̄/θ − 1 − log(x̄/θ)) < c.
Evidently a numerical or graphical solution would be needed.
An example plot of l(θ), based on a sample of size n = 10 with x̄ = 2.3, shows a skewed hump with its maximum at 2.3; values of θ whose log-likelihood is within about 2 of the maximum are plausible estimates of the parameter given such a small sample.
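The endpoints of that likelihood interval can be found numerically. A sketch (my own illustration) using bisection on the log-likelihood drop n(x̄/θ − 1 − log(x̄/θ)) = c, with n = 10 and x̄ = 2.3 as in the example and cutoff c = 2:

```python
import math

# Numerically locate the likelihood interval
# { theta : l(theta_hat) - l(theta) < c } for the exponential example.
n, xbar, c = 10, 2.3, 2.0

def drop(theta):
    # l(theta_hat) - l(theta) = n * (xbar/theta - 1 - log(xbar/theta))
    r = xbar / theta
    return n * (r - 1 - math.log(r))

def bisect(lo, hi, tol=1e-10):
    # find theta in [lo, hi] with drop(theta) = c;
    # drop is monotone on each side of theta_hat = xbar
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if (drop(lo) - c) * (drop(mid) - c) <= 0:
            hi = mid  # sign change in [lo, mid]: root is there
        else:
            lo = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

lower = bisect(0.5, xbar)   # drop decreases to 0 as theta rises to xbar
upper = bisect(xbar, 20.0)  # drop increases again beyond xbar
print(round(lower, 3), round(upper, 3))
```

Both endpoints satisfy drop(θ) = 2, and the interval contains the mle x̄ = 2.3; its asymmetry reflects the skewed hump in the plot.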
Example 9. Markov chain
We consider a two state Markov chain (Xn), as in Example 4 but with state space S = {1, 2},
with transition matrix
[1 − θ θ ]
[ φ 1 − φ]
We assume that the chain is in equilibrium, and we consider finding the likelihood for the
parameters θ = (θ, φ).
The stationary distribution here is (φ/(θ+φ), θ/(θ+φ)).
Imagine we observe X_0 = 2, X_1 = 1. Because we assume the chain is in equilibrium, we have
P(X_0 = 2) = θ/(θ+φ)
so
P(X_0 = 2, X_1 = 1) = [θ/(θ + φ)]·φ.
Hence this expression also gives us the likelihood of (θ, φ) given our observation, and we can write
L(θ, φ; x) = θφ/(θ + φ).
Or imagine instead that we observe the sequence of states 2, 1, 1, 2, 2, 2. Then our likelihood becomes
L(θ, φ; x) = [θ/(θ + φ)]·φ·(1 − θ)·θ·(1 − φ)·(1 − φ) = θ²φ(1 − θ)(1 − φ)²/(θ + φ).
plotting
* Plotting the likelihood against θ and φ for the two-state observation shows it increasing as θ increases and as φ increases: starting in state 2 and then moving to state 1 with high probability suggests a large θ (high stationary probability of state 2) and a large φ; the contours resemble a reciprocal curve.
** A contour plot for the six-state sequence carries more information: the likelihood now has an interior maximum, at approximately θ̂ ≈ 0.57 and φ̂ ≈ 0.26.
(In each case the likelihood is found as the stationary probability of the initial state times the probabilities of the successive transitions.)
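The six-state likelihood can be maximized by direct evaluation on a grid. A sketch (my own illustration; the notes read the maximum off a contour plot as roughly θ̂ ≈ 0.57, φ̂ ≈ 0.26):

```python
# Grid search over the Markov-chain likelihood from Example 9.
def lik(theta, phi):
    # L(theta, phi) = theta^2 * phi * (1 - theta) * (1 - phi)^2 / (theta + phi)
    return theta**2 * phi * (1 - theta) * (1 - phi)**2 / (theta + phi)

# evaluate on a 0.01 grid over (0, 1) x (0, 1) and keep the best point
best = max(
    ((t / 100, p / 100) for t in range(1, 100) for p in range(1, 100)),
    key=lambda tp: lik(*tp),
)
print(best)
```

The grid maximum lands near (0.57, 0.26), matching the contour-plot reading.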
Approximating the log likelihood
Consider a Taylor series expansion of l(θ) about its maximum θ̂; at the maximum the first derivative vanishes. It turns out that in many cases the log-likelihood can usefully be approximated by a quadratic function of θ, and so can be summarized by the position of the maximum and the curvature there.
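For the exponential example (n = 10, x̄ = 2.3, θ̂ = x̄) this quadratic approximation is l(θ) ≈ l(θ̂) − ½J(θ − θ̂)², where J = −l''(θ̂) = n/x̄² is the curvature at the maximum. A sketch comparing it with the exact log-likelihood:

```python
import math

# Quadratic (Taylor) approximation of the exponential log-likelihood
# about its maximum; n = 10, xbar = 2.3 as in the notes' example.
n, xbar = 10, 2.3

def l(theta):
    # exact log-likelihood: -n * (log theta + xbar/theta)
    return -n * (math.log(theta) + xbar / theta)

theta_hat = xbar        # mle
J = n / xbar**2         # curvature -l''(theta_hat)

def l_quad(theta):
    # second-order Taylor expansion; the linear term vanishes at the maximum
    return l(theta_hat) - 0.5 * J * (theta - theta_hat)**2

for theta in (2.0, 2.3, 2.6):
    print(theta, round(l(theta), 4), round(l_quad(theta), 4))
```

Near θ̂ the two curves are nearly indistinguishable; the approximation degrades further from the maximum, reflecting the skewness of the exact log-likelihood.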