Probability & Statistics Flashcards

1
Q

Define the Binomial distribution

A

Let X1,…,Xn be i.i.d. Ber(p) for some fixed p in (0,1).

Define Sn := X1+…+Xn. Then Sn is called a Binomial random variable with parameters n and p. We write Sn ~ Bin(n,p).
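
For quick recall, the pmf that results from this definition is:

```latex
P(S_n = k) \;=\; \binom{n}{k}\, p^{k} (1-p)^{n-k}, \qquad k \in \{0,1,\dots,n\}.
```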

2
Q

Define the Geometric distribution

A

Let (Xn)n be i.i.d. Ber(p) with p in (0,1). Define X as the first integer i>=1 for which Xi=1. Then X is called a geometric random variable with success probability p. We write X~Geo(p) or X~G(p).
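
Under this definition (X = trial of the first success), the pmf is:

```latex
P(X = k) \;=\; (1-p)^{k-1}\, p, \qquad k \in \{1,2,\dots\}.
```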

3
Q

Define the Negative Binomial distribution

A

Let (Xn)n be i.i.d. Bernoulli random variables with success probability p in (0,1), and let r >= 1 be an integer. Define X as the first integer i for which Sum(Xj : j in {1,…,i}) = r, i.e. the trial on which the r-th success occurs. Then X is called a Negative Binomial random variable with parameters r and p. We write X~NB(r,p). Note X >= r.
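
Under this definition (X = trial of the r-th success), the pmf is:

```latex
P(X = k) \;=\; \binom{k-1}{r-1}\, p^{r} (1-p)^{k-r}, \qquad k \in \{r, r+1, \dots\}.
```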

4
Q

Define the Hypergeometric distribution

A

Consider a population with N distinct individuals and composed exactly of D individuals of type I and N-D individuals of type II. Draw from this population n individuals at random and without replacement (an individual cannot be selected more than once). Define X= number of individuals of type I among the n selected ones.
Then, X is called a Hypergeometric random variable with parameters n, D and N. We write X ~ Hype(n,D,N).
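
The resulting pmf is:

```latex
P(X = k) \;=\; \frac{\binom{D}{k}\binom{N-D}{n-k}}{\binom{N}{n}}, \qquad \max(0,\, n-(N-D)) \le k \le \min(n, D).
```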

5
Q

Formulate Bayes' Theorem

A

Let A1,…,Am be a partition of some sample space s.t. P(Ai) in (0,1).
Then for j in {1,…,m} and event B s.t. P(B)>0 it holds that:
P(Aj | B) = P(B | Aj) P(Aj) / Sum( P(B | Ai) P(Ai) : i in {1,…,m} )
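
A minimal numerical sketch of the formula, using an illustrative two-event partition and made-up probabilities:

```python
# Bayes' theorem with a two-event partition A1, A2 (illustrative numbers only).
prior = {"A1": 0.3, "A2": 0.7}   # P(Ai), a partition of the sample space
lik   = {"A1": 0.9, "A2": 0.2}   # P(B | Ai)

evidence = sum(lik[a] * prior[a] for a in prior)               # P(B) by total probability
posterior = {a: lik[a] * prior[a] / evidence for a in prior}   # P(Ai | B)
print(posterior)   # e.g. P(A1 | B) = 0.27 / 0.41 ≈ 0.659
```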

6
Q

If X ~ f, what is aX ~ ?

For f=U([0,1]),Beta(a,b),Exp(lam),Gamma(a,b), N(0,1)

A

X~Exp(lam) => lam*X~Exp(1)

X~N(0,1) => sig*X~N(0,sig^2)

7
Q

If X1,…,Xn ~ f, what is Sum(Xi) ~ ?

For f=U([0,1]),Beta(a,b),Exp(lam),Gamma(a,b), N(mü,sig^2), N(0,1)

A

Xi~Gamma(a_i,b) independent => X1+…+Xn ~ Gamma(Sum(a_i), b)
Xi~N(0,1) => X1+…+Xn ~ N(0, n)
Xi~N(mü_i, sig_i^2) independent => Sum(Xi) ~ N(Sum(mü_i), Sum(sig_i^2))

8
Q
  1. If X~N(mü,sig^2) what is Z s.t. Z~N(0,1)?
  2. If Z~N(0,1) what is X s.t. X~N(mü,sig^2)?
  3. If X~N(mü,sig^2) then F_X(x) = …. ? F_X(mü)=..?
A
  1. Z=(X-mü)/sig
  2. X=mü+sig*Z
  3. Phi((x-mü)/sig); 1/2
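
A minimal check of identities 1–3 with scipy.stats.norm and illustrative values mü = 2, sig = 3:

```python
# Standardizing a normal: F_X(x) = Phi((x - mu) / sigma); illustrative values only.
from scipy.stats import norm

mu, sig = 2.0, 3.0
x = 5.0
z = (x - mu) / sig                       # Z = (X - mu)/sigma ~ N(0,1)
print(norm.cdf(x, loc=mu, scale=sig))    # F_X(x)
print(norm.cdf(z))                       # Phi((x - mu)/sigma), same value
print(norm.cdf(mu, loc=mu, scale=sig))   # F_X(mu) = 0.5
```
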
9
Q

Define quantile

A

Given a cdf F, the quantile t_a of order a in (0,1) is defined as t_a := inf{ t | F(t) >= a } =: F^-1(a), where F^-1 denotes the generalized inverse of F. When F is bijective (at least in a neighbourhood of t_a), F^-1 coincides with the inverse of F in the classical sense.

10
Q

Define convergence in distribution

A

Let (Zn) be a sequence of random variables (not necessarily defined on the same probability space). We say that (Zn) converges in distribution to a random variable Z if F_Zn(x) -> F_Z(x) at every continuity point x of F_Z; in particular, for Z ~ N(0,1) this means F_Zn(x) -> Phi(x) for all x.

11
Q

State the Central Limit Theorem

A

Let X1,…,Xn be i.i.d. random variables with expectation mü in R and variance sig^2 in (0,inf). Then Zn := sqrt(n)(bar(Xn)-mü)/sig [or equivalently (Sn - n*mü)/(sqrt(n)*sig)] converges in distribution to Z~N(0,1).
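
A minimal simulation sketch of the statement, with the illustrative choice Xi ~ Exp(1) (so mü = 1 and sig = 1):

```python
# CLT sketch: standardized means of i.i.d. Exp(1) draws should look approximately N(0,1).
import numpy as np

rng = np.random.default_rng(0)
n, reps = 200, 10_000
mu, sig = 1.0, 1.0                      # Exp(1): mean 1, standard deviation 1

samples = rng.exponential(scale=1.0, size=(reps, n))
z = np.sqrt(n) * (samples.mean(axis=1) - mu) / sig   # Z_n = sqrt(n)(bar(Xn) - mu)/sig

print(z.mean(), z.std())                # ≈ 0 and ≈ 1
print(np.mean(z <= 1.96))               # ≈ Phi(1.96) ≈ 0.975
```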

12
Q

What is Var(Sum(v_i*Xi)) = ?

A

Var(Sum(v_i*Xi)) = Sum( v_i*v_j*Cov(Xi,Xj) : i,j in {1,…,n} )

13
Q

Law of iterated expectation?

A

E[X] = E[E[X|Y]] (whenever E[|X|] < inf)

14
Q

State the Jacobian formula

A

Let X ~ f_X, and let g be in C^1(O) for some open set O of R, with g strictly monotone, g'(x) != 0 for all x in O, and P(X in O) = 1. Then the random variable Y = g(X) is absolutely continuous with density a.e. equal to f_Y(y) = f_X(g^-1(y)) / |g'(g^-1(y))| * 1_{g(O)}(y).
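
A quick worked instance of the formula, with the illustrative choice X ~ Exp(1), so f_X(x) = e^{-x} on O = (0, inf), and g(x) = x^2:

```latex
g^{-1}(y) = \sqrt{y}, \quad g'(x) = 2x
\;\;\Longrightarrow\;\;
f_Y(y) \;=\; \frac{f_X(\sqrt{y})}{|2\sqrt{y}|}\,\mathbf{1}_{(0,\infty)}(y)
\;=\; \frac{e^{-\sqrt{y}}}{2\sqrt{y}}, \qquad y > 0.
```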

15
Q

Define estimator

A

Let X1,…,Xn be i.i.d. f( . | theta_0) random variables for theta_0 in Theta, a subset of R^d, d >= 1.
hat(theta) is an estimator for theta_0, based on X1,…,Xn, if hat(theta) is a statistic of X1,…,Xn, that is, any quantity of the form T(X1,…,Xn), where T is a measurable map on (R^n, B(R^n)).

16
Q

Define moment estimator

A

Let X1,…,Xn be i.i.d. f( . | theta_0) with theta_0 = (theta_01,…,theta_0d)^T in Theta, a subset of R^d, d >= 1. Also suppose that if X ~ f( . | theta) the moments E[X],…,E[X^d] exist and
theta_1 = Psi_1(E[X],…,E[X^d])
…
theta_d = Psi_d(E[X],…,E[X^d])
with Psi_1,…,Psi_d some measurable functions. The moment estimator, hat(theta), of theta_0 is obtained by replacing E[X^j] by its empirical estimator 1/n * Sum(Xi^j : 1 =< i =< n) for j in {1,…,d}, i.e.
hat(theta)_1 = Psi_1(Sum(Xi)/n,…,Sum(Xi^d)/n)
…
hat(theta)_d = Psi_d(Sum(Xi)/n,…,Sum(Xi^d)/n)
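
A minimal sketch for the illustrative one-dimensional case X ~ Exp(lam) in the rate parametrization (so E[X] = 1/lam and Psi_1(m1) = 1/m1):

```python
# Method-of-moments sketch for Exp(lam), rate parametrization (illustrative choice):
# Psi_1(m1) = 1/m1, hence hat(lam) = 1 / mean(X).
import numpy as np

rng = np.random.default_rng(1)
lam_true = 2.5
x = rng.exponential(scale=1.0 / lam_true, size=5_000)   # i.i.d. sample from Exp(lam_true)

m1 = x.mean()          # empirical estimator of E[X]
lam_hat = 1.0 / m1     # moment estimator
print(lam_hat)         # ≈ 2.5
```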

17
Q

Define the likelihood function.

A

Let X1,…,Xn be i.i.d. f( . | theta_0) with some unknown theta_0 in Theta. The likelihood function, defined on Theta, is given by L(theta) = Prod( f(Xi | theta) : 1 =< i =< n ) for theta in Theta.

18
Q

Define MLE

A

The maximum likelihood estimator (MLE) is defined by hat(theta) = argmax_{theta in Theta} L(theta), provided it exists, is unique, and is a measurable map of X1,…,Xn.
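
A minimal numerical sketch: maximizing the log-likelihood for an illustrative Exp(lam) sample (rate parametrization), cross-checked against the closed form hat(lam) = 1/bar(Xn):

```python
# MLE sketch: numerically maximize the log-likelihood for an Exp(lam) sample.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(2)
x = rng.exponential(scale=1.0 / 3.0, size=2_000)   # i.i.d. Exp(lam_0 = 3)

def neg_log_lik(lam):
    # -log L(lam) = -sum log(lam * exp(-lam * x_i)) = -n*log(lam) + lam*sum(x_i)
    return -len(x) * np.log(lam) + lam * x.sum()

res = minimize_scalar(neg_log_lik, bounds=(1e-6, 100.0), method="bounded")
print(res.x, 1.0 / x.mean())   # the two should agree
```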

19
Q

Define the MSE and bias

A

The mean square error (MSE) of hat(theta_n) is defined as MSE(hat(theta_n)) := E[(hat(theta_n) - theta_0)^2] (provided that E[hat(theta_n)^2] < inf).
bias(hat(theta_n)) := E[hat(theta_n)] - theta_0

20
Q

Define efficiency

A

We say that hat(theta) is more efficient than theta^~ if MSE(hat(theta))=< MSE(theta^~).
We define the efficiency of hat(theta) relative to theta^~ as
eff(hat(theta), theta^~) := MSE(theta^~) / MSE(hat(theta)).

21
Q

Define sufficiency

A

A statistic T(X1,…,Xn) is said to be sufficient for theta if the conditional distribution of (X1,…,Xn)^T given T(X1,…,Xn) = t does not depend on theta, whenever X1,…,Xn are i.i.d. f( . | theta).

22
Q

State the factorization theorem

Corollary: cT?

A

A statistic T(X1,…,Xn) is sufficient if and only if there exist non-negative functions g and h s.t. L(theta) = Prod( f(Xi | theta) ) = g(T(X1,…,Xn), theta) * h(X1,…,Xn).
Corollary: If T is sufficient, then cT is sufficient for all c != 0.

23
Q

State the Rao-Blackwell theorem

A

Let hat(theta_n) be an estimator and T = T(X1,…,Xn) a sufficient statistic. Define
theta_n^~ := E[hat(theta_n) | T] = E[hat(theta_n) | T(X1,…,Xn)].
Then MSE(theta_n^~) =< MSE(hat(theta_n)),
with equality iff hat(theta_n) = theta_n^~ with probability 1.

24
Q

Define decision function

A

A decision function d is any measurable function defined on (U,B) s.t. 0 =< d(x) =< 1 for all x in U,
where X: (Omega,A,P) -> (U,B) is the random variable/vector on which the decision will be based.

25
Q

Define test function

A

A test function d (for testing some H0 versus H1) is any decision function s.t. when X = x in U is observed:
we reject H0 with probability d(x);
we accept H0 with probability 1 - d(x).

26
Q

Define Type I, Type II errors and power

A

Let d be a test function for testing
H0: theta in Theta0, versus
H1: theta in Theta1
Let X~f( . | theta) be the random variable/vector on which the decision will be based.
(i) If theta in Theta0, then E_theta[d(X)] is called the Type I error for theta.
(ii) If theta in Theta1, then beta(theta) := E_theta[d(X)] is called the power of the test for theta.
(iii) If theta in Theta1, then E_theta[1 - d(X)] = 1 - beta(theta) is called the Type II error for theta.

27
Q

How should we construct a test function d?

A

Ideally we want to find a test function d s.t.
C1) sup { E_theta[d(X)] | theta in Theta0 } =< alpha for some fixed alpha in (0,1)
C2) E_theta[d(X)] >= E_theta[d~(X)] for all theta in Theta1 and any other test function d~ satisfying C1, that is,
sup { E_theta[d~(X)] | theta in Theta0 } =< alpha.
d is then said to be a uniformly most powerful test of level alpha (UMP test of level alpha, for short).

28
Q

State the Neyman-Pearson Lemma

A

Let P0 and P1 be two probability measures on (U,B) with densities f0 and f1 respectively w.r.t. some sig-finite dominating measure mü on (U,B). Consider the testing problem:
H0: f=f0, versus H1:f=f1
where f is the (unknown) density of X, a random variable/vector taking its values in (U,B).
The Neyman-Pearson test of level alpha is given by:
d_NP(x) := 1 if f1(x) > k_alpha*f0(x),
d_NP(x) := gamma_alpha if f1(x) = k_alpha*f0(x),
d_NP(x) := 0 if f1(x) < k_alpha*f0(x),
where k_alpha in (0,inf) and gamma_alpha in [0,1] satisfy:
E_0[d_NP(X)] := E_f0[d_NP(X)] = alpha.
Moreover, d_NP is a UMP test of level alpha (it has the largest power among all tests of level alpha).

29
Q

a) Define the Cramér-Rao lower bound

b) What is it a lower bound of?

A

a) 1/(n*I(theta_0)), where I(theta_0) is the Fisher information (under some regularity conditions).
b) Let X1,…,Xn be i.i.d. f( . | theta_0) for some theta_0 in Theta. If hat(theta_n) = T(X1,…,Xn), with T some measurable map, is an unbiased estimator of theta_0, then Var(hat(theta_n)) >= 1/(n*I(theta_0)), where I(theta_0) is the Fisher information (under some regularity conditions).
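
For reference, I(theta_0) here is the Fisher information of a single observation; under the usual regularity conditions it can be written as:

```latex
I(\theta_0) \;=\; \mathbb{E}_{\theta_0}\!\left[\left(\frac{\partial}{\partial\theta}\,\log f(X\mid\theta)\Big|_{\theta=\theta_0}\right)^{2}\right]
\;=\; -\,\mathbb{E}_{\theta_0}\!\left[\frac{\partial^{2}}{\partial\theta^{2}}\,\log f(X\mid\theta)\Big|_{\theta=\theta_0}\right].
```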

30
Q

Give the theorem on consistency of an estimator

A

Let X1,…,Xn be i.i.d. random variables ~ f( . | theta_0), theta_0 in Theta. Let hat(theta) (=hat(theta_n)) be the MLE (maximum likelihood estimator) based on X1,…,Xn. Under some regularity conditions (on f(x|theta) in x and theta) the MLE hat(theta_n) is consistent, that is
For all eps > 0: lim_{n->inf} P( |hat(theta_n) - theta_0| > eps ) = 0
(i.e. hat(theta_n) -P-> theta_0 as n -> inf; this convergence in probability holds for any theta_0 in Theta).

31
Q

Explicitly give the result on asymptotic normality for an estimator

A

Let X1,…,Xn be i.i.d. f( . | theta_0), theta_0 in Theta.
Let hat(theta) (=hat(theta_n)) be the MLE based on X1,…,Xn. Under additional regularity conditions, it holds that:
sqrt(n)*(hat(theta_n) - theta_0) -d-> N(0, 1/I(theta_0)),
or equivalently sqrt(n*I(theta_0))*(hat(theta_n) - theta_0) -d-> N(0,1).

32
Q

Define conditional variance

State the iterated variance formula

A
If X, Y are continuous, then
Var(X|Y) = h(Y),
where
h(y) := Int( (x - E[X|Y=y])^2 * f(x|y) dx ).
Iterated variance formula: Var(X) = E[Var(X|Y)] + Var(E[X|Y]), whenever Var(X) < inf.
33
Q

State the de Moivre-Laplace theorem

A

Let Xn~Bin(n,p) with p in (0,1).
Then (X_n - np)/sqrt(np(1-p)) -d-> N(0,1),
i.e. P( (X_n - np)/sqrt(np(1-p)) =< x ) -> Phi(x) as n -> inf, for all x in R.

34
Q

If X1,…,Xn~ Distr, what is Sum(Xi)~ ?

For Distr = U([0,1]), Ber(p), Bin(n,p), Geo(p), NB(r,p), Hypergeo(n,D,N), Poi(lam)

A

Xi~Bin(n_i,p) independent, then Sum(Xi) ~ Bin(Sum(n_i), p)
Xi~NB(r_i,p) independent, then Sum(Xi) ~ NB(Sum(r_i), p)
Xi~Poi(lam_i) independent, then Sum(Xi) ~ Poi(Sum(lam_i))

35
Q

State the Weak Law of Large numbers (WLLN)

A
Let X1,...,Xn be i.i.d. random variables with E(X1)=mü and Var(X1)=sigma^2 (both finite).
Then for every eps>0:
lim P( |bar(Xn)-mü| > eps)=0
36
Q

Find a confidence interval of level (1-alpha) for theta_0,
if Zn := c*hat(theta) - d*theta_0 ~ N(0,1) (at least approximately) for known constants c and d,
e.g. 1-alpha = 0.95

A

(Let z_a := Phi^-1(a) for a in (0,1).)
Set a := 1 - alpha/2, e.g. a ~ 0.975 when 1-alpha = 0.95.
P( -z_a =< Zn =< z_a ) ~ 1-alpha
P( -z_a =< c*hat(theta) - d*theta_0 =< z_a ) ~ 1-alpha
P( (c*hat(theta) - z_a)/d =< theta_0 =< (c*hat(theta) + z_a)/d ) ~ 1-alpha
P( theta_0 in I ) ~ 1-alpha,
where
I = [ (c*hat(theta) - z_a)/d , (c*hat(theta) + z_a)/d ]
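
A minimal sketch of this recipe for the common special case Zn = sqrt(n)(bar(Xn) - theta_0)/sig, i.e. hat(theta) = bar(Xn) and c = d = sqrt(n)/sig, with illustrative data and sig assumed known:

```python
# Normal-approximation confidence interval for a mean (illustrative data, known sig).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
sig, theta_0 = 2.0, 5.0
x = rng.normal(loc=theta_0, scale=sig, size=100)

alpha = 0.05
z_a = norm.ppf(1 - alpha / 2)             # Phi^{-1}(0.975) ≈ 1.96
half = z_a * sig / np.sqrt(len(x))
ci = (x.mean() - half, x.mean() + half)   # covers theta_0 with prob ≈ 1 - alpha
print(ci)
```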