MTH2006 STATISTICAL MODELLING AND INFERENCE Flashcards

1
Q

cumulative distribution function (cdf) of a random variable Y

A

F_Y(y) = Pr(Y ≤ y), where y belongs to the range space of Y

2
Q

probability mass function (pmf) [if Y is discrete]

A

f_Y(y) = Pr(Y = y) and F_Y(y) = sum(x : x ≤ y) f_Y(x)

3
Q

probability density function (pdf) [if Y is continuous]

A

f_Y(y) = d/dy F_Y(y) and F_Y(y) = integral(−∞ -> y) f_Y(x) dx

4
Q

p-quantile of a random variable Y

A

the value y_p for which Pr(Y ≤ y_p) = p

5
Q

Pr(Y > y)

A

1 − Pr(Y ≤ y)

6
Q

joint cumulative distribution function (cdf) of a vector Y1,…Yn

A

F_Y(y1, . . . , yn) = Pr(Y1 ≤ y1, . . . , Yn ≤ yn)

7
Q

If Y1, . . . , Yn are discrete then their joint pmf is defined by

A

f_Y(y1, . . . , yn) = Pr(Y1 = y1, . . . , Yn = yn)

8
Q

If Y1, . . . , Yn are continuous then their joint pdf is defined by

A

fY (y1, . . . , yn) = ∂^n/∂y_1 . . . ∂y_n F_Y (y1, . . . , yn)

9
Q

Y1, . . . , Yn are independent if

A

f_Y(y1, . . . , yn) = f_Y1(y1) . . . f_Yn(yn) for all y1, …, yn

10
Q

Y1, . . . , Yn are identically distributed if

A

f_Y1(y) = . . . = f_Yn(y) for all y

11
Q

if Y1, . . . , Yn are independent and identically distributed (iid) then their joint pdf or pmf is

A

f_Y(y1, . . . , yn) = f_Y1(y1). . . f_Y1(yn)

12
Q

explanatory variable

A

plotted on the x-axis and is the variable manipulated by the researcher

13
Q

response variable

A

plotted on the y-axis and depends on the other variables

14
Q

if Y has a Poisson distribution with parameter µ, then we write Y~Poi(µ) and Y has pmf

A

f_Y(y) = µ^y e^(−µ) / y! for y = 0, 1, 2, …
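
A quick numerical sanity check of this pmf (a minimal Python sketch with an assumed value µ = 3, not part of the course notes):

```python
from math import exp, factorial

def poisson_pmf(y, mu):
    """Pr(Y = y) for Y ~ Poi(mu), straight from the formula mu^y e^(-mu) / y!."""
    return mu**y * exp(-mu) / factorial(y)

mu = 3.0
probs = [poisson_pmf(y, mu) for y in range(50)]
print(probs[:4])    # Pr(Y = 0), Pr(Y = 1), ...
print(sum(probs))   # should be very close to 1
```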

15
Q

if Y has an exponential distribution with parameter θ, then we write Y~Exp(θ) and Y has cdf

A

F_Y(y; θ) = 1 − e^(−θy) for y > 0

16
Q

if Y has an exponential distribution with parameter θ, then we write Y~Exp(θ) and Y has pdf

A

f_Y(y; θ) = d/dy F_Y(y; θ) = θe^(−θy) for y > 0

17
Q

the p-quantile in terms of the cdf satisfies

A

F_Y(y_p) = p and y_p = F_Y^-1(p)
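
For example, combining this with the Exp(θ) cdf from the earlier cards gives y_p = −log(1 − p)/θ. A small illustrative check (assumed values, not from the notes):

```python
from math import log, exp

def exp_cdf(y, theta):
    """F_Y(y; theta) = 1 - e^(-theta*y) for y > 0."""
    return 1 - exp(-theta * y)

def exp_quantile(p, theta):
    """Invert the cdf: y_p = F_Y^{-1}(p) = -log(1 - p) / theta."""
    return -log(1 - p) / theta

theta, p = 2.0, 0.95
y_p = exp_quantile(p, theta)
print(y_p, exp_cdf(y_p, theta))   # the second value should equal p (0.95)
```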

18
Q

expectation of g(Y) [if Y is discrete] is

A

E(g(Y)) = sum(x ∈ R) Pr(Y = x) g(x) = sum(x ∈ R) f(x) g(x), where f(x) is the pmf and R is the range space

19
Q

variance of random variable Y is

A

Var(Y) = E[(Y − E(Y))^2] = E(Y^2) − [E(Y)]^2
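
Both forms can be checked on a toy discrete distribution (illustrative sketch only):

```python
# Toy pmf on {0, 1, 2} -- any values and probabilities summing to 1 would do.
values = [0, 1, 2]
probs  = [0.2, 0.5, 0.3]

E_Y  = sum(p * y for y, p in zip(values, probs))
E_Y2 = sum(p * y**2 for y, p in zip(values, probs))

var_def      = sum(p * (y - E_Y)**2 for y, p in zip(values, probs))  # E[(Y - E(Y))^2]
var_shortcut = E_Y2 - E_Y**2                                         # E(Y^2) - E(Y)^2
print(var_def, var_shortcut)   # both 0.49
```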

20
Q

empirical probability r/n is

A

r/n is an estimate of Pr(X ≤ x_(r)), where x_(r) denotes the rth smallest observation

21
Q

simple linear model means

A

one explanatory variable

22
Q

an example of a joint distribution for two variables is the…

A

bivariate normal distribution

23
Q

if X and Y are independent

A

f(x,y) = f_x(x)f_y(y)

24
Q

covariance formula:

A

Cov(X, Y) = E[(X − E(X))(Y − E(Y))] = E(XY) − E(X)E(Y)
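
A sketch that checks both forms on simulated data, replacing expectations by sample averages (the data-generating model here is an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=100_000)
y = 0.6 * x + rng.normal(size=100_000)   # correlated with x by construction

# Sample versions of the two covariance formulas
cov_centred  = np.mean((x - x.mean()) * (y - y.mean()))   # E[(X - EX)(Y - EY)]
cov_shortcut = np.mean(x * y) - x.mean() * y.mean()       # E(XY) - E(X)E(Y)
print(cov_centred, cov_shortcut)   # both close to 0.6
```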

25
Q

if independent, covariance formula:

A

Cov(X, Y) = 0, since independence gives E(XY) = E(X)E(Y)

26
Q

covariance with correlation/variance formula:

A

Cov(X, Y) = ρ * sqrt[Var(X)Var(Y)], where ρ is the correlation between X and Y

27
Q

an example of a joint distribution for two variables, each with a normal distribution is called

A

the bivariate normal distribution

28
Q

the joint pdf of the bivariate normal distribution

A

f(x, y; θ) = 1 / [2πσ_X σ_Y sqrt(1 − ρ^2)] * exp{ −1 / [2(1 − ρ^2)] * [ (x − µ_X)^2/σ_X^2 + (y − µ_Y)^2/σ_Y^2 − 2ρ(x − µ_X)(y − µ_Y)/(σ_X σ_Y) ] }
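
A direct transcription of this density into Python (a sketch; the SciPy cross-check is optional and assumes SciPy is installed):

```python
import math

def bvn_pdf(x, y, mu_x, mu_y, sd_x, sd_y, rho):
    """Bivariate normal pdf, written exactly as on the card."""
    z = ((x - mu_x)**2 / sd_x**2
         + (y - mu_y)**2 / sd_y**2
         - 2 * rho * (x - mu_x) * (y - mu_y) / (sd_x * sd_y))
    norm = 2 * math.pi * sd_x * sd_y * math.sqrt(1 - rho**2)
    return math.exp(-z / (2 * (1 - rho**2))) / norm

print(bvn_pdf(0.5, -0.2, 0.0, 0.0, 1.0, 1.0, 0.3))

# Optional cross-check against SciPy, if available:
# from scipy.stats import multivariate_normal
# cov = [[1.0, 0.3], [0.3, 1.0]]
# print(multivariate_normal(mean=[0.0, 0.0], cov=cov).pdf([0.5, -0.2]))
```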

29
Q

a continuous random variable Y defined on (−∞, ∞) with pdf f(y; θ) has expectation denoted by E(Y) and defined as…

A

E(Y) = integral(−∞ -> ∞) y * f(y; θ) dy

30
Q

a discrete random variable with range space R and pmf f(y; θ), E(Y) is defined as…

A

E(Y ) = sum(y∈R) [y * f(y; θ)]

31
Q

for a real valued function g(Y), when continuous, E[g(Y)] is

A

integral(−∞ -> ∞) g(y) * f(y; θ) dy

32
Q

for a real valued function g(Y), when discrete, E[g(Y)] is

A

sum(y ∈ R) g(y) * f(y; θ)

33
Q

α-confidence interval is …

A

an interval estimator that contains the true parameter value θ with probability α for every θ

34
Q

null hypothesis

A

H_0 : θ = x (this is also a simple hypothesis: it completely specifies a probability model by giving a single value for θ)

35
Q

alternative hypothesis

A

H_1 : θ ≠ x

this is also a composite hypothesis: it does not completely specify a probability model

36
Q

if H_1 : θ ≠ x, which specifies values on either side of the value given by H_0, it is called

A

a two-sided alternative

37
Q

if H_1 : θ < x it is called a

A

one-sided alternative

38
Q

for a null hypothesis H_0 : θ = θ_0, the null distribution is the distribution of T(Y) when …

A

θ = θ_0

39
Q

let Y be continuous and let f_Y(y; θ) denote the joint pdf of Y; then Pr(Y ∈ C) =

A

integral(C) f_Y(y; θ) dy, if f_Y(y; θ) is continuous

40
Q

let Y be discrete and let f_Y(y; θ) denote the joint pmf of Y; then Pr(Y ∈ C) =

A

sum(y ∈ C) f_Y (y; θ), if f(y; θ) is discrete

41
Q

the size of the test α

A

α = Pr(Y ∈ C; θ_0)

42
Q

probability of a type I error

A

the probability of rejecting H_0 when it is true (i.e. the probability of making a type I error), which equals the size α of the test

43
Q

probability of a type II error

A

the probability of failing to reject H_0 when it is false (i.e. the probability of making a type II error)

44
Q

if the alternative hypothesis is simple, then the power of the test is

A

Pr(Y ∈ C; θ_1) = the probability of not making a type II error/detecting that H_0 is false
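
As an illustration (a hypothetical setup, not taken from the notes): for Y1, …, Yn iid N(θ, 1), H_0 : θ = 0 against the simple alternative θ_1 = 0.5, and critical region C = {y : ȳ > c}, the size and power can be estimated by simulation:

```python
import numpy as np

rng = np.random.default_rng(0)
n, c, reps = 25, 1.645 / np.sqrt(25), 100_000   # c chosen so the size is about 0.05

def reject_rate(theta):
    """Monte Carlo estimate of Pr(Ybar > c; theta)."""
    ybars = rng.normal(loc=theta, scale=1.0, size=(reps, n)).mean(axis=1)
    return np.mean(ybars > c)

print("size  ~", reject_rate(0.0))   # Pr(Y in C; theta_0), about 0.05
print("power ~", reject_rate(0.5))   # Pr(Y in C; theta_1), about 0.8
```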

45
Q

for a set y1, …, yn, the sample moments are

A

ˆm_r = (1/n) sum(i = 1 -> n) y_i^r

46
Q

for a continuous or discrete random variable Y, the moment generating function (mgf) of Y is

A

M_Y(t) = E(e^(tY))

47
Q

the kth moment of Y is

A

E(Y^k) = m_k

48
Q

the central moments of Y are

A

E[{Y − E(Y)}^r]

49
Q

the method of moments estimate, ˆθ, is such that …

A

m_r(ˆθ) = ˆm_r for r = 1, . . . , d
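
For example (an assumed illustration): if Y ~ Exp(θ) then m_1(θ) = E(Y) = 1/θ, so matching it to ˆm_1 = ȳ gives the method of moments estimate ˆθ = 1/ȳ:

```python
import numpy as np

rng = np.random.default_rng(42)
theta_true = 2.0
y = rng.exponential(scale=1 / theta_true, size=1_000)   # Exp(theta) has mean 1/theta

m1_hat = y.mean()          # first sample moment
theta_mom = 1 / m1_hat     # solve m_1(theta) = 1/theta = m1_hat
print(theta_mom)           # close to 2.0
```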

50
Q

for sample variance we will use

A

s^2 = (n − 1)^(−1) * sum(i = 1 -> n) (y_i − yBAR)^2

51
Q

the distribution of the different estimates is called the

A

sampling distribution

52
Q

the standard deviation of the estimated sampling distribution gives the

A

estimated standard error

53
Q

for the data y = (y1,…,yn) we have an estimate

A

ˆθ(y)

54
Q

for the data y = (y1,…,yn) we have an estimator

A

ˆθ(Y)

55
Q

for critical region C = {y : T(y) < c}, the p-value is

A

p = Pr[T(Y ) ≤ t; θ0]

56
Q

for critical region C = {y : T(y) > c}, the p-value is

A

p = Pr[T(Y ) ≥ t; θ0]

57
Q

the null distribution of the t-statistic is defined as

A

T(Y) = (YBAR − µ_0) / (s/√n), whose null distribution is the t-distribution with n − 1 degrees of freedom
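
Computed on data, with the p-value taken from the t-distribution (a sketch using a made-up sample; assumes SciPy):

```python
import numpy as np
from scipy import stats

y = np.array([5.1, 4.8, 5.6, 5.0, 4.7, 5.3, 5.2, 4.9])   # made-up sample
mu0 = 5.0                                                  # H_0: mu = mu0

n = len(y)
t = (y.mean() - mu0) / (y.std(ddof=1) / np.sqrt(n))        # (Ybar - mu0) / (s / sqrt(n))
p_two_sided = 2 * stats.t.sf(abs(t), df=n - 1)             # upper tail of t_{n-1}, doubled
print(t, p_two_sided)

# Cross-check with SciPy's built-in one-sample t-test:
print(stats.ttest_1samp(y, mu0))
```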

58
Q

the null distribution of the t-statistic has cdf denoted

A

Φ_n−1(y)

59
Q

the sample variance s^2 used in the t-statistic is

A

s^2 = 1/(n − 1) * sum(i = 1 -> n) (Y_i − YBAR)^2, which is an unbiased estimator of σ^2

60
Q

critical region

A

the region C = {y : T(y) > t_c}, where t_c is the critical value

61
Q

power of the test is when the alternative hypothesis is simple and is the probability of …

A

not making a type II error or the probability of detecting that H_0 is false

62
Q

the p-value is the probability that …

A

under H_0, the test statistic is at least as extreme (at least as unfavourable to H_0) as the value we observed

63
Q

prod(i = m -> n) (c * x_i) =

A

c^(n−m+1) * prod(i = m -> n) x_i

64
Q

prod(i = m -> n) x_i^c =

A

[prod(i = m -> n) x_i]^c

65
Q

prod(i = m -> n) c^(x_i) =

A

c^(sum(i = m -> n) x_i)

66
Q

a sample, y = (y1, . . . , yn), is modelled as a realisation of independent random variables, Y = (Y1, . . . , Yn). For i = 1, . . . , n, let f_Yi(y; θ) denote the pmf or pdf of Yi, where θ is the model parameter. The joint pmf or pdf of Y evaluated at our sample y is then f_Y(y; θ) =

A

= f_Y1(y1; θ) . . . f_Yn(yn; θ) = prod(i = 1 -> n) f_Yi(y_i; θ), by independence

67
Q

when the joint pmf or pdf is regarded as a function of θ it is referred to as the likelihood function and is denoted …

A

L(θ; y) = f_Y (y; θ)

68
Q

the parameter value that maximizes the likelihood is called the …

A

maximum likelihood estimate (mle)

69
Q

it is usually simpler to maximize the logarithm of the likelihood instead - the log-likelihood is denoted …

A

l(θ; y) = log L(θ; y) = l(θ)
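
A concrete sketch (assumed exponential example): for an iid Exp(θ) sample, l(θ; y) = n log θ − θ Σ y_i, which is maximized at ˆθ = 1/ȳ; a numerical maximization should agree:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(7)
y = rng.exponential(scale=1 / 2.0, size=500)   # iid Exp(theta) data with theta = 2

def neg_loglik(theta):
    # l(theta; y) = n*log(theta) - theta*sum(y); minimise the negative
    return -(len(y) * np.log(theta) - theta * y.sum())

res = minimize_scalar(neg_loglik, bounds=(1e-6, 50.0), method="bounded")
print(res.x, 1 / y.mean())   # numerical mle vs analytic mle 1/ybar, essentially equal
```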

70
Q

a multiplicative constant will not affect the shape of the likelihood - true or false?

A

true (and therefore will not affect the location of the maximum - we can ignore multiplicative constants when we calculate the mle)

71
Q

mean squared error

A

Let ˆθ be an estimator for θ. The mean squared error of
ˆθ is mse(ˆθ) = E{(ˆθ −θ)^2} and the bias of ˆθ is Bias(ˆθ) = E(ˆθ) − θ.
If the bias is zero then the estimator is unbiased.

72
Q

mean squared error can be written in terms of bias and variance:

A

mse(ˆθ) = Var(ˆθ) + Bias(ˆθ)^2
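
A simulation check of this identity (illustrative, not from the notes), using the biased 1/n variance estimator as ˆθ and σ² as θ:

```python
import numpy as np

rng = np.random.default_rng(3)
n, sigma2, reps = 10, 4.0, 200_000

samples = rng.normal(loc=0.0, scale=np.sqrt(sigma2), size=(reps, n))
est = samples.var(axis=1, ddof=0)        # the biased 1/n variance estimator

mse  = np.mean((est - sigma2) ** 2)      # E{(theta_hat - theta)^2}
bias = est.mean() - sigma2               # E(theta_hat) - theta
var  = est.var()
print(mse, var + bias**2)                # the two values agree, verifying the identity
```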

73
Q

a consistent estimator

A

The estimator ˆθ is consistent for θ if, for all ε > 0, lim(n→∞) Pr(|ˆθ − θ| > ε) = 0. (This is the asymptotic limit.)

74
Q

Only one parameter in the model, the mle is …

A

scalar and its approximate sampling distribution will be a univariate normal distribution

75
Q

More than one parameter in the model, the mle is …

A

a vector and its sampling distribution will be a multivariate normal distribution

76
Q

expectation of vector of random variables with ith element Yi

A

Let Y be a vector of random variables with ith element Yi. The expectation of Y is the vector with ith element E(Yi).

77
Q

variance of vector of random variables with ith element Yi

A

The variance of Y is the matrix with (i, j)th element Cov(Yi, Yj ). This can be written as
Var(Y ) = E[{Y − E(Y )}{Y − E(Y )}^T]
= E(YY^T) − E(Y)E(Y)^T.

78
Q

observed information J(θ)

A
Let l(θ) be a log-likelihood function. The observed information is
J(θ) = −∂^2 l(θ) / ∂θ ∂θ^T.
79
Q

If θ is a scalar then, J(θ) is

A

-d^2 l(θ) / dθ^2

80
Q

If θ is a vector with ith element θi then, J(θ) is a matrix with (i, j)th element,

A

-∂^2 l(θ) / ∂θi∂θj

81
Q

expected information is I(θ) = E{J(θ)}, that is the matrix with (i, j)th element,

A

E{−∂^2 l(θ) / ∂θi∂θj}
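
For instance (assumed exponential example): with l(θ) = n log θ − θ Σ y_i, J(θ) = −d²l/dθ² = n/θ², and since this does not depend on the data, I(θ) = E{J(θ)} = n/θ² too. A finite-difference check:

```python
import numpy as np

rng = np.random.default_rng(11)
y = rng.exponential(scale=1 / 2.0, size=200)
n = len(y)

def loglik(theta):
    return n * np.log(theta) - theta * y.sum()

theta, h = 1.7, 1e-4
# numerical -d^2 l / d theta^2 via central differences
J_numeric  = -(loglik(theta + h) - 2 * loglik(theta) + loglik(theta - h)) / h**2
J_analytic = n / theta**2
print(J_numeric, J_analytic)   # should match closely
```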

82
Q

multivariate normal distribution

A

The random variable Y = (Y1, . . . , Yd) has a multivariate normal distribution with expectation E(Y) = µ and variance Var(Y) = Σ
if the pdf of Y is
f_Y (y; µ, Σ) = (2π)^(−d/2)det(Σ)^(−1/2)exp[−(1/2)(y − µ)^(T)Σ^(−1)(y − µ)],
in which case we write Y∼N(µ,Σ).

83
Q

useful properties of multivariate normal distribution:

A
if two random variables are independent then they are uncorrelated because independence implies that  E(Y1Y2) = E(Y1)E(Y2) and
so Cov(Y1, Y2) = E(Y1Y2) − E(Y1)E(Y2) = 0.
Also, linear transformations of multivariate normal random variables are multivariate normal.
84
Q

THEOREM: When n is large, the sampling distribution of the mle is approximately N(θ, I(θ)^(−1)). This is called the asymptotic distribution of the mle.

A

If n is large then the square roots of the diagonal elements of I(θ)^−1
approximate the standard errors of the mles in ˆθ. These standard errors can
be estimated by replacing θ with ˆθ in I(θ)^−1.
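
Continuing the exponential sketch (an assumed example): I(θ) = n/θ², so I(ˆθ)^(−1) = ˆθ²/n and the estimated standard error of ˆθ is ˆθ/√n:

```python
import numpy as np

rng = np.random.default_rng(5)
y = rng.exponential(scale=1 / 2.0, size=400)   # iid Exp(theta), theta = 2

theta_hat = 1 / y.mean()                       # mle for the exponential model
se_hat = theta_hat / np.sqrt(len(y))           # sqrt of I(theta_hat)^{-1} = theta_hat^2 / n
print(theta_hat, se_hat)

# An approximate 95% confidence interval based on the asymptotic normality of the mle:
print(theta_hat - 1.96 * se_hat, theta_hat + 1.96 * se_hat)
```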

85
Q

likelihood ratio test statistic

A

T = 2 { l(ˆθ; y) − l(θ_0; y) }
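
Computed for the same exponential model (assumed illustration), testing H_0 : θ = θ_0:

```python
import numpy as np

rng = np.random.default_rng(9)
y = rng.exponential(scale=1 / 2.0, size=300)   # data generated with theta = 2
theta0 = 1.5                                    # hypothesised value under H_0

def loglik(theta):
    return len(y) * np.log(theta) - theta * y.sum()

theta_hat = 1 / y.mean()                        # mle maximises l(theta; y)
T = 2 * (loglik(theta_hat) - loglik(theta0))    # likelihood ratio test statistic
print(T)   # large values cast doubt on H_0
```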