Probability and Statistics Flashcards
Unbiased Estimator
If E(Tn) = θ
Tn is an unbiased estimator
Bias
b(Tn, θ) = E(Tn) - θ
The bias of Tn as an estimator of θ
Sampling Distribution
The distribution of the estimator
Sampling Error
The standard deviation of the sampling distribution
Also called the standard error; estimated in practice using the sample standard deviation S
Consistent Estimator
If Tn is unbiased for θ and Var(Tn) → 0 as n → ∞, then Tn is consistent
Equivalently (definition): ∀ε > 0, P(|Tn - θ| > ε) → 0 as n → ∞
Mean Square Error
MSE(T) = Var(T) + b²(T, θ)
MSE(T) = E[(T-θ)²]
Proof: MSE = Var(T) + b²(T, θ)
E[(T - θ)²] = E[(T - E(T) + E(T) - θ)²]
= E[(T - E(T))² + 2(T - E(T))(E(T) - θ) + (E(T) - θ)²]
= Var(T) + 2(E(T) - θ)E(T - E(T)) + (E(T) - θ)²
= Var(T) + b²(T, θ), since E(T - E(T)) = 0
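A quick numerical check of the decomposition (not from the cards): a minimal Python sketch, assuming we estimate the variance of N(0,1) data with the biased estimator that divides by n.
```python
import numpy as np

rng = np.random.default_rng(0)
n, reps, true_var = 10, 200_000, 1.0

# Biased variance estimator T = (1/n) * sum((x - xbar)^2) on N(0,1) samples
samples = rng.normal(0.0, 1.0, size=(reps, n))
T = samples.var(axis=1)          # ddof=0 divides by n, so T is biased

mse_direct = np.mean((T - true_var) ** 2)               # E[(T - theta)^2]
decomposed = np.var(T) + (np.mean(T) - true_var) ** 2   # Var(T) + bias^2

print(mse_direct, decomposed)    # the two values agree up to simulation error
```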
Coefficient of Variation
S.d. / E(x)
Null Hypothesis
An assumption about a parameter which we wish to test on the basis of available data - H₀
Alternative Hypothesis
If the data are not deemed to support H₀, then we conclude that an alternative hypothesis H₁ is supported
P-Value
Observed significance level
The observed significance level of a test (p-Value)
The probability of obtaining a value of the test statistic at least as extreme as that observed under H₀
Type 1 Error
If H₀ is true and we reject it
Type 2 Error
If H₀ is false and we accept it
Power
The quantity 1-β is called the power of a statistical test
It measures the test’s ability to detect a departure from H₀ when it exists
Critical value
The value of the test statistic separating the acceptance region from the rejection region (for an upper-tail test, the value below which we accept H₀ and above which we reject it)
Kolmogorov's Axioms of Probability 1-3
A probability function P is a mapping P: F → ℝ s.t.
- ∀ E∈ F, P(E) ≥ 0
- P(Ω) = 1
- If E ∩ F = ∅ then P(E ∪ F) = P(E) + P(F)
Deductions from the axioms 1-4
- P(Eᶜ) = 1 - P(E)
- P(E) ≤ 1
- If E ⊆ F, then P(F \ E) = P(F) - P(E)
- For any events E and F, (not necessarily disjoint); P(E∪F) = P(E) + P(F) - P(E∩F)
Independence
Events E and F are independent if P(E∩F) = P(E)P(F)
E and F are unrelated - E doesn’t affect F
Mutually Independent
Events E1…En are mutually independent if, for any collection of the events, the independence relation holds
Pairwise Independence
All pairs of events Ei and Ej are independent - weaker than mutual independence
Conditional Probability
The conditional probability of F given E is the probability of F occurring when E is known to have occurred
P(F|E) = P(E ∩ F) / P(E), provided P(E) > 0
Sample space changes from Ω to E
For independent events, P(F|E) is just P(F)
Law of Total probability
Let {Ei} be a partition of Ω, s.t. each outcome belongs to exactly one of the Ei
Then P(F) = ∑ P(F|Ei)P(Ei)
Proof:
F = F ∩ Ω = F ∩ (∪ᵢEi) = ∪ᵢ(F ∩ Ei), a disjoint union
So P(F) = ∑ P(F ∩ Ei) = ∑ P(F|Ei)P(Ei)
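A small numerical illustration (hypothetical numbers, not from the cards): items come from one of two machines (a partition of Ω) and F is the event "defective".
```python
# Law of total probability: P(F) = sum_i P(F | E_i) P(E_i)
# Hypothetical example: machine 1 or machine 2 produced the item.
p_machine = {1: 0.6, 2: 0.4}            # P(E_i), sums to 1
p_defect_given = {1: 0.02, 2: 0.05}     # P(F | E_i)

p_defect = sum(p_defect_given[i] * p_machine[i] for i in p_machine)
print(p_defect)  # 0.6*0.02 + 0.4*0.05 = 0.032
```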
Random variable
A function from a sample space Ω to ℝ
Discrete: a function from a countable sample space Ω to ℝ; X: Ω → ℝ is a d.r.v.
Continuous: X is a c.r.v. if Fx is continuous and differentiable
Probability Mass Function - drv
If X is a d.r.v. taking values in the set {xi}, then the function Px(x) = P(X = x) is the pmf of X
Properties:
- Px(xi) ≥ 0 for all i, since these are probabilities of events (axiom 1)
- ∑ Px(xi) = 1 (axiom 3)
Probability Density Function - CRV
The pdf is fx - the derivative of the distribution function Fx
Properties:
- fx(x) ≥ 0
- the integral of fx(x) over ℝ is 1
Cumulative distribution Function
For any random variable X, the function Fx(x) = P(X ≤ x); for a d.r.v., Fx(x) = ∑ Px(xi) over xi ≤ x
d.r.v.: Fx is a step function with discontinuities at the xi
c.r.v.: Fx is continuous and differentiable
Properties:
- Fx(x) → 1 as x → ∞
- Fx(x) → 0 as x → -∞
- Fx is monotonic increasing: x1 ≤ x2 ⇒ Fx(x1) ≤ Fx(x2)
Expectation Function
E(X) is the idealised long-run average of X
Discrete: E(X) = ∑ xi Px(xi)
Continuous: E(X) = ∫ x fx(x) dx over ℝ
(the sum / integral of x times the pmf / pdf)
E(X) exists if:
- the sample space is finite, or
- the sum / integral converges absolutely
Properties of E(x)
- If X is constant, i.e. P(X = c) = 1, then E(X) = c
- Y = aX +b
E(Y) = aE(X) + b
Proof - summation / integral and compute
Symmetry of E(x):
If X has a symmetric pmf/pdf and E(X) exists, then E(X) is the central point of the pmf/pdf
If symmetric about μ, let Y = X - μ; the pmf/pdf of Y is symmetric about 0, so E(Y) = 0, i.e. E(X) - μ = 0; rearranging gives the result
Properties of variance
- Var(x) ≥ 0
Sum of positive terms - all squared and real
- Var(X) = 0 if X is constant, i.e. P(X = c) = 1
Compute
- If Y = aX + b, Var(Y) = a²Var(X)
Compute
Coefficient of Variation
σ/μ
The ratio of s.d. to mean
Bernoulli
An experiment with 2 outcomes: success and failure
X - only takes values 0 or 1
Binomial
Sum of n independent Bernoulli trials
X - no. of successes in n independent Bernoulli trials with probability of success p
Geometric
X - no. of independent Bernoulli trials until a success
The waiting time between successes in binomial
Negative Binomial
X - no. of Bernoulli trials until rth success
The sum of r independent geometric random variables
Hypergeometric
X - no. of type 1 objects when n objects are drawn from a population of size N containing M type 1 objects
Sampling without replacement - with replacement is the binomial!!
Poisson Process
X - no. of events (e.g. accidents) in a fixed time period, occurring at a constant rate
The Poisson process assumes independence between non-overlapping time intervals - the Poisson model is only appropriate when this independence holds
Limit of binomial with n large and p small
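A sketch of the "n large, p small" limit, comparing the Bin(n, p) and Poi(np) pmfs; the values n = 1000, p = 0.003 are illustrative, not from the cards.
```python
from math import comb, exp, factorial

n, p = 1000, 0.003          # illustrative values: n large, p small, np = 3
lam = n * p

for k in range(8):
    binom = comb(n, k) * p**k * (1 - p)**(n - k)      # Bin(n, p) pmf
    poisson = exp(-lam) * lam**k / factorial(k)       # Poi(np) pmf
    print(k, round(binom, 5), round(poisson, 5))      # the two columns are very close
```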
Exponential
The time between events in a Poisson process
Gamma
X - time until the rth event in a Poisson process
The sum of r independent exponential variables
Beta
A continuous distribution on [0, 1], related to sequences of binomial distribution variables
Joint density for independent variables
If X1 and X2 are independent, the joint density function must factorise into a product of the form f1(x1)f2(x2)
Joint Distribution Function
Suppose X1…Xn are r.v.s defined on the same sample space.
The joint distribution function of X1…Xn is the function
F(x1, …, xn) = P(X1 ≤ x1, …, Xn ≤ xn)
Marginal Distribution Function
A distribution of a single random variable
F₁(x₁) = P(X₁ ≤ x₁) = P(X₁ ≤ x₁, X₂ < ∞, …, Xn < ∞)
Iterated Expectation Law
E(X1) = E_X2[ E_X1|X2(X1 | X2) ]
where E_X2[•] denotes the expectation over the marginal distribution of X2 and E_X1|X2[•] denotes the expectation over the conditional distribution of X1 given the value taken by X2
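A minimal discrete check of the law, using a small made-up joint pmf (the numbers are illustrative only).
```python
import numpy as np

# Made-up joint pmf p(x1, x2): rows index values of X1, columns values of X2
x1_vals = np.array([0.0, 1.0, 2.0])
p = np.array([[0.10, 0.20],
              [0.30, 0.10],
              [0.05, 0.25]])                          # entries sum to 1

p_x2 = p.sum(axis=0)                                  # marginal pmf of X2
e_x1_given_x2 = (x1_vals[:, None] * p).sum(axis=0) / p_x2   # E(X1 | X2 = x2)

lhs = (x1_vals[:, None] * p).sum()                    # E(X1) from the joint pmf
rhs = (e_x1_given_x2 * p_x2).sum()                    # E_X2[ E(X1 | X2) ]
print(lhs, rhs)                                       # identical
```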
Central Limit Theorem
If X1, X2, … are independent random variables having a common distribution with mean μ and variance σ², then the distribution of the standardised sample mean (X̄n - μ)/(σ/√n) tends to N(0, 1) as n → ∞
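A simulation sketch (an illustration, not from the cards): standardised means of i.i.d. Exponential(1) samples look approximately N(0, 1) for moderately large n.
```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 50, 100_000
mu, sigma = 1.0, 1.0                      # mean and s.d. of Exponential(1)

samples = rng.exponential(1.0, size=(reps, n))
z = (samples.mean(axis=1) - mu) / (sigma / np.sqrt(n))   # standardised sample means

# Compare a few empirical quantiles with the standard normal values
print(np.quantile(z, [0.025, 0.5, 0.975]))   # roughly [-1.96, 0, 1.96]
```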
Estimator
A statistic Tn = Tn(X1…Xn) is an estimator of a parameter θ if its value tn = Tn(x1…xn) is used as an estimate of θ
e.g. X1…Xn i.i.d. r.v.s with unknown mean E(X) = μ
An estimator of μ is the sample mean
rth moment of X about α
E((X - α)^r)
The variance is the 2nd moment of X about the mean
Covariance
Cov(x₁, x₂) = E((x₁-μ₁)(x₂-μ₂))
If continuous: integrate over one variable over ℝ and then the other variable over ℝ
If discrete: sum over the values of the different variables
Cov(x₁, x₂)
Independence → Cov(x₁, x₂) = 0
Cov(x₁, x₂) = 0 does not imply independence
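A standard counterexample for the last point, sketched in Python: X uniform on {-1, 0, 1} and Y = X² have zero covariance but are clearly dependent.
```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.choice([-1, 0, 1], size=1_000_000)     # X uniform on {-1, 0, 1}
y = x ** 2                                     # Y is a function of X, so dependent

cov = np.mean(x * y) - np.mean(x) * np.mean(y)
print(cov)                                     # approximately 0
print(np.mean(y[x == 0]), np.mean(y[x != 0]))  # 0 vs 1: knowing X changes Y completely
```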
Sum of the expectations is the expectation of the sum
∑ E(Xi) = E( ∑ Xi ) = E(X1 + X2 + … Xn)
= ∑_x1 ∑_x2 … ∑_xn (x1 + x2 + … + xn) P(x1, x2, …, xn)
= ∑_{x1…xn} x1 P(x1, …, xn) + … + ∑_{x1…xn} xn P(x1, …, xn) = E(X1) + … + E(Xn)
Variance of the sum = sum of the variances for mutually independent variables
Summation and compute
Continuous joint distribution
X1, X2 are independent if joint pdf factorises
ρ - in bivariate normal
Correlation coefficient: -1 ≤ ρ ≤ 1
Correlation parameter - measures the strength of linear association between the two variables x1, x2
ρ = Corr(x₁, x₂) = Cov(x₁, x₂) / (σ₁σ₂)
Features of conditional joint distributions
Conditional mean: if E(x₁|x₂) is a linear function of x₂, this suggests x₁ and x₂ are linearly associated
If x₂ > μ₂ (and ρ > 0): conditional mean E(x₁|x₂) > marginal mean E(x₁)
If x₂ is greater than average, we expect x₁ to be greater than average
Conditional variance of x₁ given x₂:
σ₁²(1 - ρ²)
Since -1 < ρ < 1, this is no larger than the marginal variance σ₁²
Observations of conditional joint distributions
Conditional mean > marginal mean:
If the observed x₁ exceeds its expectation and Corr(x₁, x₂) > 0,
then X₂ is likely to exceed its expectation
Expectation of a product of independent variables is the product of the expectations
Use PGF to show!!
Sums of independent variables - PGF
If x1… Xn are independent discrete random variables taking non-negative integer values, the pgf of their sum is the product of their PGFs
If all the Xs are IID : pgf to power of n
Pgf only defined for discrete!!!!
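A numeric check, assuming two independent fair dice (an illustration, not from the cards): the pgf of the sum evaluated at a point equals the product of the individual pgfs.
```python
import numpy as np

def pgf(pmf, s):
    """G_X(s) = E[s^X] = sum_k P(X = k) s^k for a pmf indexed from 0."""
    return sum(p * s**k for k, p in enumerate(pmf))

# Two independent fair dice, values 1..6 (index 0 has probability 0)
die = np.array([0] + [1/6] * 6)
total = np.convolve(die, die)      # pmf of X1 + X2 (convolution of the pmfs)

s = 0.7
print(pgf(total, s), pgf(die, s) * pgf(die, s))   # the two values agree
```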
Mgf - sums of independent variables
If x1… xn are independent random variables - the mgf of their sum is the product of their mgfs
Sample Mean
X bar is the sample mean
The average of all the observations in a sample
X bar n = Sn/n - its distribution is the sampling distribution of the mean
Strong law of large numbers
P( lim (x bar n) = μ ) = 1 as n tends to infinity
Almost every possible sequence of sample means tends to μ as n → ∞
Chi-Squared Distribution
If z1…zn are independent N(0,1) random variables, the distribution of the sum of squares: ∑ Zi ² is the chi-squared distribution with n degrees of freedom
- the same as the Γ(n/2, 1/2) distribution, with expectation n and variance 2n
T-distribution
Z ~ N(0,1) and U ~ χn²
Z and U are independent
The distribution of the ratio T = Z/√(U/n) is the t-distribution with n degrees of freedom
F-distribution
If U and V are independent r.v.s distributed as χm² and χn² respectively
The distribution of the ratio W = (U/m) / (V/n)
is the F-distribution with m, n degrees of freedom
W ~ Fm,n => 1/W ~ Fn,m
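A simulation sketch of the t-distribution construction (Z and U as above), compared against scipy's t quantiles; scipy.stats is an assumption here, not part of the cards.
```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, reps = 5, 200_000

z = rng.standard_normal(reps)            # Z ~ N(0, 1)
u = rng.chisquare(df=n, size=reps)       # U ~ chi-squared with n degrees of freedom
t_sim = z / np.sqrt(u / n)               # T = Z / sqrt(U/n)

qs = [0.05, 0.5, 0.95]
print(np.quantile(t_sim, qs))            # simulated quantiles
print(stats.t.ppf(qs, df=n))             # theoretical t_n quantiles
```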
Regression line
Y = b0 + b1x
Data points satisfy the n equations: yi = b0 + b1xi + ei
- ei = prediction error
- choose b0, b1 to minimise ∑ ei²
Explanatory variable
Xi values
Independent values
Response variables
yi are the observed (dependent) values of a random variable Yi, whose distribution depends on xi
Regression Curve
The conditional mean E(Y|X = x), as a function of x, is the regression curve of y on x
Linear statistical model
One in which the regression curve is a linear function of the parameters of the model
{ei}
Independent random variables with mean 0 and common variance σ²
Residual sum of squares (RSS)
S(β0, β1) = ∑ (yi - β0 - β1xi)²
Least squares method
A way to estimate β0, β1 by minimising the sum of squared errors
Data must be:
Homoscedastic - variation in y is same for all x - variance is constant
Independent
Least squares error method
To minimise S(β0, β1)
Differentiate w.r.t. β0, β1 and set to zero
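A minimal sketch of the closed-form solution obtained by setting the derivatives to zero (β̂1 = Sxy/Sxx, β̂0 = ȳ - β̂1x̄), on made-up data and checked against numpy.polyfit.
```python
import numpy as np

# Made-up data for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Least squares estimates from the normal equations
sxx = np.sum((x - x.mean()) ** 2)
sxy = np.sum((x - x.mean()) * (y - y.mean()))
b1 = sxy / sxx                      # slope estimate
b0 = y.mean() - b1 * x.mean()       # intercept estimate

print(b0, b1)
print(np.polyfit(x, y, 1))          # same values, returned as [slope, intercept]
```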
Least squares estimate of μ
The least squares estimate of μ minimises the squared errors S(μ) = ∑(yi - μ)²; the minimiser is the sample mean ȳ
Paired test
For data that are not independent - the observations come in pairs!!
Hypothesis testing
1 sample:
- Known variance: Z-test (standard normal)
- Unknown variance: t-test, using the sample variance s²
2 sample:
- Known variance: Z-test (standard normal)
- Unknown variance: t-test with pooled sample variance
- Testing for variances: F-test
2 paired samples:
- Testing for the mean: t-test, reduced to a 1-sample problem by taking the differences of the results as the new tested variable data
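Hedged scipy sketches of the unknown-variance cases above, on made-up samples; scipy.stats is an assumption, not part of the cards.
```python
import numpy as np
from scipy import stats

# Made-up samples for illustration
a = np.array([5.1, 4.9, 5.4, 5.0, 5.3, 4.8])
b = np.array([5.6, 5.4, 5.8, 5.5, 5.9, 5.3])

# Two-sample t-test with pooled sample variance (unknown but equal variances)
t2, p2 = stats.ttest_ind(a, b, equal_var=True)

# Paired test: reduce to a one-sample t-test on the differences
t_paired, p_paired = stats.ttest_rel(a, b)   # equivalent to ttest_1samp(a - b, 0)

print(t2, p2)
print(t_paired, p_paired)
```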
Assumptions for paired sample t-test
Assume that the differences are independent, identically distributed and normally distributed
Transformation formula
Any monotonic increasing or decreasing function!!!
fy(y) = fx(x) |dx/dy| = fx(g⁻¹(y)) |dx/dy|, where y = g(x)
Note the derivative is dx/dy (the derivative of the inverse), not dy/dx
Insert the change of variables
Don’t forget the modulus
Monotonic!!
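A numeric sketch of the formula for a monotone transformation (illustrative choice, not from the cards): if X ~ Exp(1) and Y = X² (monotone on x > 0), then fy(y) = fx(√y)·|dx/dy| = e^(-√y)/(2√y); the simulation agrees with this density.
```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.exponential(1.0, size=500_000)   # X ~ Exp(1), support x > 0
y = x ** 2                               # Y = g(X) = X^2, monotone on the support

# Transformation formula: f_Y(y) = f_X(g^{-1}(y)) * |dx/dy| = exp(-sqrt(y)) / (2*sqrt(y))
ys = np.array([0.25, 1.0, 4.0])
formula = np.exp(-np.sqrt(ys)) / (2 * np.sqrt(ys))

# Empirical density estimates from the simulation (small windows around each y)
h = 0.05
empirical = [np.mean((y > v - h) & (y < v + h)) / (2 * h) for v in ys]
print(formula)
print(empirical)       # close to the formula values
```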
Pgf of the sum is the product of the PGFs
If Sn is a function of Xi's that are mutually independent, any functions of the individual Xi's (such as s^Xi) are also mutually independent.
Thus the expectation of the product is the product of the expectations, so the pgf of the sum is the product of the PGFs
Y has a negative binomial distribution with parameters r and p
Explain how it arises in the context of sequences of Bernoulli trials and explain how y can be regarded as a sum of independent geometrically distributed random variables
The NB(r, p) distribution arises as the distribution of the number of trials required to obtain r successes in a sequence of independent Bernoulli trials, each with success probability p
The Geo(p) dist. is the distribution of the number of trials required to obtain one success. Because the trials are independent, the distributions of the numbers of trials between successes are independent geometric r.v.s, and the total number of trials until the rth success is therefore the sum of r independent geometric r.v.s
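A simulation sketch, assuming r = 3 and p = 0.25 for illustration: summing r independent Geometric(p) "trials until first success" variables reproduces the NB(r, p) "trials until rth success" behaviour (mean r/p).
```python
import numpy as np

rng = np.random.default_rng(5)
r, p, reps = 3, 0.25, 200_000

# numpy's geometric gives the number of trials up to and including the first success
geoms = rng.geometric(p, size=(reps, r))
y = geoms.sum(axis=1)                 # sum of r independent Geometric(p) variables

print(y.mean(), r / p)                # simulated mean vs theoretical NB(r, p) mean r/p
print(y.min())                        # at least r trials are always needed
```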
Suppose X1 and X2 are discrete random variables with means μ1 and μ2
What is meant by saying X1 and X2 are independent
If X1 and X2 are independent, then for all pairs of values (x1, x2): P(X1 = x1, X2 = x2) = P(X1 = x1)P(X2 = x2)
A sum of independent Poisson random variables is another Poisson
If X1…Xn are i.i.d. Poi(μ), then Sn ~ Poi(nμ)
Choosing between estimators
The main things to consider when choosing between estimators are bias and variance
If the bias and the standard error (s.d.) both tend to zero as n tends to infinity, the estimator is consistent
Can also compare estimators on the basis of MSE = variance + bias squared
The estimator with the smaller mean squared error would be preferred as this is likely to yield an estimate that is closer to the true value of the parameter
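A sketch comparing two variance estimators by simulated MSE (dividing by n versus n - 1), assuming N(0,1) data for illustration; the n - 1 version is unbiased, but the n version can have the smaller MSE.
```python
import numpy as np

rng = np.random.default_rng(6)
n, reps, true_var = 10, 300_000, 1.0

x = rng.normal(0.0, 1.0, size=(reps, n))
t_biased = x.var(axis=1, ddof=0)      # divides by n     (biased)
t_unbiased = x.var(axis=1, ddof=1)    # divides by n - 1 (unbiased)

for name, t in [("divide by n", t_biased), ("divide by n-1", t_unbiased)]:
    bias = t.mean() - true_var
    mse = np.mean((t - true_var) ** 2)
    print(name, round(bias, 4), round(mse, 4))   # MSE = Var + bias^2 in each case
```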
Assumptions
Observations are normally distributed
There is no particular a priori reason for this to be the case, in terms of situations in which the normal distribution is known to arise
The measurements have obviously been rounded to a discrete set of values, which suggests that the assumption of normality is probably not realistic here
Probability Generating Function
The pgf of X is Gx(s) = E(s^X); it is defined for discrete random variables taking non-negative integer values