Prob & Statistics Flashcards
Define Binomial distr
Let X1,…,Xn be i.i.d. Ber(p) for some fixed p in (0,1).
Define Sn:=X1+…+Xn, then Sn is called a Binomial random variable with parameters n and p. We write Sn~Bin(n,p).
Define Geometric distr
Let (Xn)n be i.i.d. Ber(p) with p in (0,1). Define X as the first integer i>=1 for which Xi=1. Then X is called a geometric random variable with success probability p. We write X~Geo(p) or X~G(p).
Define Negative Binomial distr
Let (Xn)n be i.i.d. Bernoulli random variables with success probability p in (0,1), and for r>=1 define X as the first integer i for which Sum(Xj : j in {1,…,i})=r. Then X is called a negative binomial random variable with parameters r and p; we write X~NB(r,p). Note X>=r.
Define Hypergeometric distr
Consider a population with N distinct individuals and composed exactly of D individuals of type I and N-D individuals of type II. Draw from this population n individuals at random and without replacement (an individual cannot be selected more than once). Define X= number of individuals of type I among the n selected ones.
Then, X is called a Hypergeometric random variable with parameters n, D and N. We write X ~ Hype(n,D,N).
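As a quick sanity check, the hypergeometric pmf P(X=k)=C(D,k)C(N-D,n-k)/C(N,n) can be evaluated directly; a sketch with illustrative numbers (not from the card):

```python
from math import comb

def hypergeom_pmf(k, n, D, N):
    # P(X = k): choose k of the D type-I individuals and n-k of the N-D type-II ones
    return comb(D, k) * comb(N - D, n - k) / comb(N, n)

# Illustrative population: N = 20 individuals, D = 5 of type I, draw n = 4.
p1 = hypergeom_pmf(1, n=4, D=5, N=20)
total = sum(hypergeom_pmf(k, 4, 5, 20) for k in range(5))  # pmf sums to 1
```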
Formulate Bayes Thm
Let A1,…,Am be a partition of some sample space s.t. P(Ai) in (0,1).
Then for j in {1,…,m} and event B s.t. P(B)>0 it holds that:
P(Aj | B) = P(B | Aj)P(Aj) / Sum( P(B|Ai)P(Ai) : i in {1,…,m} )
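A worked numerical instance of the formula, with made-up numbers (not from the card):

```python
# Two-part partition: A1 = "has condition", A2 = "does not"; B = "test positive".
priors = [0.01, 0.99]        # P(A1), P(A2)
likelihoods = [0.95, 0.05]   # P(B | A1), P(B | A2)

denom = sum(pb * pa for pb, pa in zip(likelihoods, priors))   # P(B), the denominator sum
posterior_A1 = likelihoods[0] * priors[0] / denom             # P(A1 | B) by Bayes
```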
If X~ f , what is aX~ ?
For f=U([0,1]),Beta(a,b),Exp(lam),Gamma(a,b), N(0,1)
X~Exp(lam) => lam*X~Exp(1)
X~N(0,1) => sig*X~N(0,sig^2)
If X1,…,Xn~ f, what is Sum(Xi)~ ?
For f=U([0,1]),Beta(a,b),Exp(lam),Gamma(a,b), N(mü,sig^2), N(0,1)
Xi~Gamma(a_i,b) => X1+…+Xn~Gamma(Sum(a_i),b)
X~N(0,1) => X1+…+Xn~N(0,n)
Xi~N(mü_i,sig_i^2) => Sum(Xi)~N(Sum(mü_i),Sum(sig_i^2))
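The normal case can be checked by a seeded simulation; the parameters below are my own illustrative choice:

```python
import random
from statistics import fmean, pvariance

# X1 ~ N(1, 4), X2 ~ N(2, 9); the card predicts X1 + X2 ~ N(3, 13).
random.seed(1)
draws = [random.gauss(1, 2) + random.gauss(2, 3) for _ in range(20000)]
mean_hat, var_hat = fmean(draws), pvariance(draws)   # should be near 3 and 13
```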
- If X~N(mü,sig^2) what is Z s.t. Z~N(0,1)?
- If Z~N(0,1) what is X s.t. X~N(mü,sig^2)?
- If X~N(mü,sig^2) then F_X(x) = …. ? F_X(mü)=..?
- Z=(X-mü)/sig
- X=mü+sig*Z
- Phi((x-mü)/sig); 1/2
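Both identities can be verified numerically with the standard library; mu, sig, x below are arbitrary choices:

```python
from statistics import NormalDist

mu, sig = 2.0, 3.0
X = NormalDist(mu, sig)
Z = NormalDist(0.0, 1.0)

x = 4.5
fx_direct = X.cdf(x)                 # F_X(x)
fx_via_phi = Z.cdf((x - mu) / sig)   # Phi((x - mu) / sig), should match
fx_at_mu = X.cdf(mu)                 # should be 1/2
```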
Define quantile
Given a cdf F, the quantile ta of order a in (0,1) is defined as ta:=inf{t | F(t)>=a}=:F^-1(a), where F^-1 denotes the generalized inverse of F. When the latter is bijective (at least in a neighborhood of ta), F^-1 is the inverse of F in the classical sense.
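For a cdf with jumps the infimum matters; a tiny sketch using a Ber(0.4) cdf (my own example), where the classical inverse does not exist:

```python
# Generalized inverse t_a = inf{t : F(t) >= a}, searched over an ordered grid of candidates.
def quantile(F, a, grid):
    return next(t for t in grid if F(t) >= a)

def ber_cdf(t, p=0.4):
    # cdf of Ber(0.4): 0 below 0, then 1-p on [0,1), then 1 from 1 on
    if t < 0:
        return 0.0
    if t < 1:
        return 1 - p
    return 1.0

grid = [-1, 0, 1]
t_05 = quantile(ber_cdf, 0.5, grid)   # F(0) = 0.6 >= 0.5, so t_0.5 = 0
t_07 = quantile(ber_cdf, 0.7, grid)   # F(0) = 0.6 < 0.7, so t_0.7 = 1
```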
Define converges in distribution
Let (Zn) be a sequence of random variables (not necessarily defined on the same prob space). We say that (Zn) converges in distr towards Z if F_Zn(x)–>F_Z(x) at every continuity point x of F_Z. For Z~N(0,1), F_Z=Phi is continuous everywhere, so the requirement is F_Zn(x)–>Phi(x) for all x.
State the Central Limit Theorem
Let X1,..,Xn be i.i.d. r.v. with expectation mü in R and variance sig^2 in (0,inf). Then Zn:=sqrt(n)(bar(Xn)-mü)/sig [or equivalently (Sn-n*mü)/(sqrt(n)*sig)] converges in distribution towards Z~N(0,1).
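A seeded simulation sketch of the statement, with Xi ~ U([0,1]) as my choice of f (mü = 1/2, sig = sqrt(1/12)):

```python
import random
from statistics import NormalDist

random.seed(0)
n, reps = 500, 2000
mu, sig = 0.5, (1 / 12) ** 0.5

# Fraction of standardized sample means falling below 1.0; the CLT predicts Phi(1) ~ 0.841.
below_one = 0
for _ in range(reps):
    xbar = sum(random.random() for _ in range(n)) / n
    zn = n ** 0.5 * (xbar - mu) / sig
    if zn <= 1.0:
        below_one += 1
frac = below_one / reps
```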
What is Var(Sum(vi*Xi))= ?
Var(Sum(vi*Xi : 1=< i =< n))=Sum(vi*vj*Cov(Xi,Xj) : i,j in {1,…,n})
Law of iterated expectation?
E[X]=E[E[X|Y]] (whenever E[|X|] < inf)
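A discrete check of the tower rule with made-up numbers (all values are my own example):

```python
# Y ~ Ber(0.3); E[X | Y=1] = 10, E[X | Y=0] = 2.
cases = [(0.3, 10.0), (0.7, 2.0)]   # (P(Y=y), E[X | Y=y])

# E[E[X|Y]] = sum over y of P(Y=y) * E[X | Y=y], which equals E[X].
ex = sum(py * ex_given for py, ex_given in cases)
```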
State the Jacobian formula
Let X ~ f and g in C^1(O) for some open O of R, with g strictly monotone, g'(x)!=0 for x in O, and P(X in O)=1. Then the r.v. Y=g(X) is absolutely continuous with density a.e. equal to f_Y(y)=(f_X o g^-1)(y)/|g' o g^-1(y)|*1_g(O)(y).
Define estimator
Let X1,…,Xn be i.i.d. f( . | theta_0) random variables for theta_0 in Theta, a subset of R^d, d>=1.
hat(theta) is an estimator for theta_0, based on X1,…,Xn, if hat(theta) is a statistic of X1,…,Xn, that is any quantity of the form T(X1,…,Xn), where T is a measurable map on (R^n,B(R^n)).
Define moment estimator
Let X1,…,Xn i.i.d. f( . | theta_0) with theta_0=(theta_01,…,theta_0d)^T in Theta, a subset of R^d, d>=1. Also suppose that if X~f( . |theta) the moments E[X],…,E[X^d] exist and
theta_1=Psi_1(E[X],…,E[X^d])
…
theta_d=Psi_d(E[X],…,E[X^d])
with Psi_1,…,Psi_d some measurable functions. The moment estimator, hat(theta), of theta_0 is obtained by replacing E[X^j] by its empirical estimator: 1/n*Sum(Xi^j : 1=< i =< n) for j in {1,…,d}, i.e.
hat(theta)_1=Psi_1(Sum(Xi)/n,…,Sum(Xi^d)/n)
…
hat(theta)_d=Psi_d(Sum(Xi)/n,…,Sum(Xi^d)/n)
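A sketch of the recipe for f = N(mu, sigma^2) with d = 2 (my own choice of model): E[X] = mu and E[X^2] = sigma^2 + mu^2, so Psi_1(m1,m2) = m1 and Psi_2(m1,m2) = m2 - m1^2.

```python
def moment_estimator_normal(xs):
    n = len(xs)
    m1 = sum(xs) / n                  # empirical E[X]
    m2 = sum(x * x for x in xs) / n   # empirical E[X^2]
    return m1, m2 - m1 * m1           # (mu_hat, sigma^2_hat) via Psi_1, Psi_2

mu_hat, var_hat = moment_estimator_normal([1.0, 2.0, 3.0, 4.0])
```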
Define the likelihood function.
Let X1,…,Xn be i.i.d. f( . | theta_0) with some unknown theta_0 in Theta. The likelihood function, defined on Theta, is given by L(theta)=Prod(f(Xi|theta) : i in {1,…,n}) for theta in Theta.
Define MLE
The maximum likelihood estimator (MLE) is defined by hat(theta)=argmax{L(theta) : theta in Theta}, provided it exists, is unique, and is a measurable map of X1,…,Xn.
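A concrete MLE sketch for Xi ~ Exp(lam) (my own example): log L(lam) = n*log(lam) - lam*Sum(xi), which is maximized at the closed form lam_hat = n/Sum(xi) = 1/xbar.

```python
from math import log

def exp_mle(xs):
    # closed-form maximizer of the exponential log-likelihood
    return len(xs) / sum(xs)

def loglik(lam, xs):
    # log L(lam) = sum of log(lam) - lam * xi
    return sum(log(lam) - lam * x for x in xs)

xs = [0.5, 1.2, 0.8, 2.0]   # illustrative data
lam_hat = exp_mle(xs)
# the closed form should beat nearby candidates:
beats_neighbors = (loglik(lam_hat, xs) >= loglik(lam_hat * 1.1, xs)
                   and loglik(lam_hat, xs) >= loglik(lam_hat * 0.9, xs))
```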
Define the MSE and bias
The mean square error (MSE) of hat(theta_n) is defined as MSE(hat(theta_n)):=E[(hat(theta_n)-theta_0)^2] (provided that E[hat(theta_n)^2] < inf).
bias(hat(theta_n)):=E[hat(theta_n)]-theta_0
Define efficiency
We say that hat(theta) is more efficient than theta^~ if MSE(hat(theta))=< MSE(theta^~).
We define the efficiency of hat(theta) relative to theta^~ as
eff(hat(theta),theta^~):=MSE(theta^~)/MSE(hat(theta)).
Define sufficiency
A statistic T(X1,…,Xn) is said to be sufficient for theta if the conditional distribution of (X1,…,Xn)^T given T(X1,…,Xn)=t does not depend on theta, whenever X1,…,Xn ~ i.i.d. f( . |theta).
State the factorization theorem
Corollary: cT?
A statistic T(X1,…,Xn) is sufficient if and only if there exist non-negative functions g and h s.t. L(theta)=Prod(f(Xi|theta))=g(T(X1,…,Xn),theta)h(X1,..,Xn)
Corollary: If T suff then cT suff for all c in R\{0} (apply the factorization theorem with g~(s,theta):=g(s/c,theta)).
State the Rao-Blackwell theorem
Let hat(theta) be an estimator and T=T(X1,…,Xn) a sufficient statistic.
theta_n^~:=E[hat(theta_n)|T]=E[hat(theta_n)|T(X1,…,Xn)].
MSE(theta_n^~) =< MSE(hat(theta_n))
with equality iff hat(theta_n)=theta_n^~ with probability 1.
Define decision function
A decision function d is any measurable function defined on (U,B) s.t. 0=< d(x) =< 1 for all x in U.
Where X:(Omega,A,P)–>(U,B)
Define test function
A test function d (for testing some H0 versus H1) is any decision function s.t. for all x in U:
We reject H0 with probability d(x)
We accept H0 with probability 1-d(x)
Define Type I, Type II errors and power
Let d be a test function for testing
H0: theta in Theta0, versus
H1: theta in Theta1
Let X~f( . | theta) be the random variable/vector on which the decision will be based.
(i) If theta in Theta0, then E_theta[d(X)] is called the Type I error for theta.
(ii) If theta in Theta1, then beta(theta):=E_theta[d(X)] is called the power of the test for theta.
(iii) If theta in Theta1, then E_theta[1-d(X)]=1-beta(theta) is called the Type II error for theta.
How should we construct a test function d?
Ideally we want to find a test function d s.t.
C1) sup { E_theta[d(X)] | theta in Theta0 } =< alpha for some fixed alpha in (0,1)
C2) E_theta[d(X)] >= E_theta[d~(X)] for all theta in Theta1 and any other test function d~ satisfying C1, that is
sup { E_theta[d~(X)] | theta in Theta0 } =< alpha.
d is then said to be a uniformly most powerful test of level alpha (UMP of level alpha, for short).
State the Neyman-Pearson Lemma
Let P0 and P1 be two probability measures on (U,B) with densities f0 and f1 respectively w.r.t. some sig-finite dominating measure mü on (U,B). Consider the testing problem:
H0: f=f0, versus H1:f=f1
where f is the (unknown) density of X, a random variable/vector taking its values in (U,B).
The Neyman-Pearson test of level alpha is given by:
d_NP(x):= 1 if f1(x) > k_alpha*f0(x),
d_NP(x):= gamma_alpha if f1(x) = k_alpha*f0(x),
d_NP(x):= 0 if f1(x) < k_alpha*f0(x),
where k_alpha in (0,inf) and gamma_alpha in [0,1] satisfy:
E_0[d_NP(X)]:=E_f0[d_NP(X)]=alpha.
Moreover, d_NP is a UMP test of level alpha (it has the largest power among all tests of level alpha).
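A sketch for the simple problem H0: X~N(0,1) versus H1: X~N(1,1) (my own example). Here f1(x)/f0(x) = exp(x - 1/2) is increasing in x, so the NP test rejects for large x, and the level condition fixes the cutoff c = Phi^-1(1-alpha):

```python
from statistics import NormalDist

alpha = 0.05
Z = NormalDist()
c = Z.inv_cdf(1 - alpha)   # rejection cutoff: P0(X > c) = alpha

def d_np(x):
    # continuous case: P0(f1(X) = k*f0(X)) = 0, so gamma_alpha plays no role
    return 1.0 if x > c else 0.0

level = 1 - Z.cdf(c)       # E_0[d_NP(X)], equals alpha by construction
power = 1 - Z.cdf(c - 1)   # E_1[d_NP(X)], the largest possible at this level
```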
a) Define the Cramér-Rao lower bound
b) What is it a lower bound of?
a) 1/(nI(theta_0)), where I(theta_0) is the Fisher information (under some regularity conditions).
b) Let X1,…,Xn be i.i.d. f( . |theta_0) for some theta_0 in Theta. If hat(theta_n)=T(X1,…,Xn), with T some measurable map, is an unbiased estimator of theta_0, then Var(hat(theta_n)) >= 1/(nI(theta_0)), where I(theta_0) is the Fisher information (under some regularity conditions).
Give the theorem on consistency of an estimator
Let X1,…,Xn be i.i.d. random variables ~ f( . | theta_0), theta_0 in Theta. Let hat(theta) (=hat(theta_n)) be the MLE (maximum likelihood estimator) based on X1,…,Xn. Under some regularity conditions (on f(x|theta) in x and theta) the MLE hat(theta_n) is consistent, that is
For all eps>0: lim P(|hat(theta_n)-theta_0| > eps)=0
(i.e. hat(theta_n)–P->theta_0 as n–>inf; convergence in probability holds for any theta_0 in Theta)
Explicitly give the result on asymptotic normality for an estimator
Let X1,…,Xn be i.i.d. f( . | theta_0), theta_0 in Theta.
Let hat(theta) (=hat(theta_n)) be the MLE based on X1,…,Xn. Under additional regularity conditions, it holds that:
sqrt(n)(hat(theta_n)-theta_0)–d->N(0,1/I(theta_0))
or equivalently sqrt(nI(theta_0))(hat(theta_n)-theta_0)–d->N(0,1) (e.g. for f=Ber(theta_0): I(theta_0)=1/(theta_0(1-theta_0)), so sqrt(n/(theta_0(1-theta_0)))(hat(theta_n)-theta_0)–d->N(0,1))
Define conditional variance
State the iterated variance formula
If X,Y are continuous then Var(X|Y)=h(Y), where h(y):=Int( (x-E[X|Y=y])^2*f(x|y) dx ).
Var(X)=E[Var(X|Y)]+Var(E[X|Y]), whenever Var(X) < inf.
State the de Moivre-Laplace theorem
Let Xn~Bin(n,p) with p in (0,1).
Then (X_n-np)/sqrt(np(1-p))–d->N(0,1)
that is, P((X_n-np)/sqrt(np(1-p)) =< x) –> Phi(x) as n–>inf for all x in R.
If X1,…,Xn~ Distr, what is Sum(Xi)~ ?
For Distr = U([0,1]), Ber(p), Bin(n,p), Geo(p), NB(r,p), Hypergeo(n,D,N), Poi(lam)
Xi~Ber(p) then Sum(Xi)~Bin(n,p)
Xi~Bin(ni,p) then Sum(Xi)~Bin(Sum(ni),p)
Xi~Geo(p) then Sum(Xi)~NB(n,p)
Xi~NB(ri,p) then Sum(Xi)~NB(Sum(ri),p)
Xi~Poi(lam_i) then Sum(Xi)~Poi(Sum(lam_i))
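The Poisson case can be checked exactly by convolving two pmfs; the rates below are illustrative:

```python
from math import exp, factorial

def poi_pmf(k, lam):
    # P(X = k) for X ~ Poi(lam)
    return lam ** k * exp(-lam) / factorial(k)

# Convolve Poi(1.0) with Poi(1.5) at k = 4 and compare to Poi(2.5) directly.
lam1, lam2, k = 1.0, 1.5, 4
conv = sum(poi_pmf(j, lam1) * poi_pmf(k - j, lam2) for j in range(k + 1))
direct = poi_pmf(k, lam1 + lam2)
```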
State the Weak Law of Large numbers (WLLN)
Let X1,...,Xn be i.i.d. random variables with E(X1)=mü and Var(X1)=sig^2 (both finite). Then for every eps>0: lim P( |bar(Xn)-mü| > eps)=0
Find a confidence interval of level (1-alpha) for theta_0,
if Zn := c*hat(theta) - d*theta_0 ~ N(0,1) (at least approximately)
e.g. 1-alpha=0.95
(Let z_alpha:=Phi^-1(alpha))
Let a:=1-alpha/2, e.g. a=0.975
P(-z_a =< Zn =< z_a) ~ 1-alpha
P(-z_a =< c*hat(theta) - d*theta_0 =< z_a) ~ 1-alpha
P( (c*hat(theta)-z_a)/d =< theta_0 =< (c*hat(theta)+z_a)/d ) ~ 1-alpha
P(theta_0 in I)~1-alpha
where
I=[(c*hat(theta)-z_a)/d, (c*hat(theta)+z_a)/d]
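A numerical instance of the recipe, under my reading of the card: for an i.i.d. sample with known sig, Zn = sqrt(n)(xbar - mü)/sig ~ N(0,1), i.e. c = d = sqrt(n)/sig, and the interval reduces to xbar ± z_a*sig/sqrt(n). The data values are illustrative.

```python
from statistics import NormalDist

alpha = 0.05
z_a = NormalDist().inv_cdf(1 - alpha / 2)   # z_0.975 ~ 1.96

n, xbar, sig = 100, 5.2, 2.0                # illustrative sample summary
half_width = z_a * sig / n ** 0.5
interval = (xbar - half_width, xbar + half_width)
```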