Probability and statistics Flashcards

1
Q

Unbiased Estimator

A

If E(Tn) = θ, then Tn is an unbiased estimator of θ

2
Q

Bias

A

b(Tn, θ) = E(Tn) - θ

The bias of Tn as an estimator of θ

3
Q

Sampling Distribution

A

The distribution of the estimator Tn over repeated samples

4
Q

Sampling Error

A

The standard deviation of the sampling distribution

This standard deviation is also called the standard error of the estimator

5
Q

Consistent Estimator

A

Tn is a consistent estimator of θ if ∀ε > 0, P(|Tn − θ| > ε) → 0 as n → ∞

A sufficient condition: if Tn is unbiased for θ and Var(Tn) → 0 as n → ∞, then Tn is consistent

6
Q

Mean Square Error

MSE(T) = Var(T) + bias squared

A

MSE(T) = E[(T-θ)²]

Proof that MSE(T) = Var(T) + b²(T,θ):

E[(T−θ)²] = E[(T−E(T) + E(T)−θ)²]

= E[(T−E(T))² + 2(T−E(T))(E(T)−θ) + (E(T)−θ)²]

= Var(T) + 2(E(T)−θ) E[T−E(T)] + (E(T)−θ)²

= Var(T) + b²(T,θ), since E[T−E(T)] = 0
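
A minimal Python sketch checking the decomposition numerically. The estimator T = X̄ + 0.5, the target θ = 2 and the sample size are all arbitrary illustrative choices, not from the card.

```python
import numpy as np

# Estimate theta = 2 with the deliberately biased estimator T = Xbar + 0.5 and
# check that MSE(T) = Var(T) + bias(T)^2 (values chosen purely for illustration).
rng = np.random.default_rng(0)
theta, n, reps = 2.0, 20, 100_000

samples = rng.normal(theta, 1.0, size=(reps, n))
T = samples.mean(axis=1) + 0.5              # biased estimator of theta

mse = np.mean((T - theta) ** 2)             # E[(T - theta)^2]
var_plus_bias_sq = T.var() + (T.mean() - theta) ** 2

print(mse, var_plus_bias_sq)                # both close to 1/20 + 0.25 = 0.30
```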

7
Q

Coefficient of Variation

A

s.d.(X) / E(X)

8
Q

Null Hypothesis

A

An assumption about a parameter which we wish to test on the basis of available data - H₀

9
Q

Alternative Hypothesis

A

If the data are not deemed to support H₀ then we will conclude that an alternative hypothesis H₁ is supported

10
Q

P-Value

Observed significance level

A

The observed significance level of a test (p-Value)

The probability of obtaining a value of the test statistic at least as extreme as that observed under H₀

11
Q

Type 1 Error

A

If H₀ is true and we reject it

12
Q

Type 2 Error

A

If H₀ is false and we accept it

13
Q

Power

A

The quantity 1-β is called the power of a statistical test

It measures the test’s ability to detect a departure from H₀ when it exists

14
Q

Critical value

A

The value below which we accept H₀ and above which we reject it

15
Q

Kolmogorov’s Axioms of Probability 1-3

A

A probability function P is a mapping P: F → ℝ s.t.

  1. ∀ E ∈ F, P(E) ≥ 0
  2. P(Ω) = 1
  3. If E ∩ F = ∅ then P(E ∪ F) = P(E) + P(F)
16
Q

Deductions from the axioms 1-4

A
  1. P(Eᶜ) = 1 - P(E)
  2. P(E) ≤ 1
  3. If E ⊆ F, then P(F \ E) = P(F) - P(E), so P(E) ≤ P(F)
  4. For any events E and F (not necessarily disjoint): P(E∪F) = P(E) + P(F) - P(E∩F)
17
Q

Independence

A

Events E and F are independent if P(E∩F) = P(E)P(F)

E and F are unrelated - knowing that E has occurred does not change the probability of F

18
Q

Mutually Independent

Pairwise independence

A

Events E1…En are mutually independent if, for any collection of the events, the independence relation holds

Pairwise independence: all pairs of events Ei and Ej are independent (a weaker condition than mutual independence)

19
Q

Conditional Probability

A

The conditional probability of F given E is the probability of F occurring when E is known to have occurred

Sample space changes from Ω to E

For independent events, P(F|E) = P(F)

20
Q

Law of Total probability

A

Let {Ei} be a partition of Ω s.t. each outcome belongs to exactly one of the Ei

Then P(F) = ∑ P(F|Ei) P(Ei)

Proof:
F = F∩Ω = F∩(∪iEi) = ∪i(F∩Ei), and the F∩Ei are disjoint, so
P(F) = ∑ P(F∩Ei) = ∑ P(F|Ei) P(Ei)
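
A tiny Python check of the law, using a made-up two-event partition (all numbers are illustrative, not from the card).

```python
# Numerical check of P(F) = sum_i P(F|E_i) P(E_i) for a two-event partition.
P_E = [0.3, 0.7]            # partition {E1, E2} of the sample space
P_F_given_E = [0.9, 0.2]    # conditional probabilities P(F | E_i)

P_F = sum(pf * pe for pf, pe in zip(P_F_given_E, P_E))
print(P_F)                  # 0.3*0.9 + 0.7*0.2 = 0.41
```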

21
Q

Random variable

A

A function from a sample space Ω to ℝ

Discrete: a function from a countable sample space Ω to ℝ: X: Ω → ℝ is a d.r.v.

Continuous: X is a c.r.v. if its distribution function Fx is continuous and differentiable

22
Q

Probability Mass Function - drv

A

If X is a d.r.v. taking values in the set {xi}, then the function px(x) = P(X = x) is the pmf of X

Properties:
  1. px(xi) ≥ 0 for all i, since these are probabilities of events (axiom 1)
  2. ∑ px(xi) = 1 (axioms 2 and 3)

23
Q

Probability Density Function - CRV

A

The pdf fx is the derivative of the distribution function Fx

Properties:
  1. f(x) ≥ 0
  2. The integral of f(x) over ℝ is 1
24
Q

Cumulative distribution Function

A

For any random variable X, the function Fx(x) = P(X≤x)

d.r.v.: Fx(x) = ∑ (over xi ≤ x) px(xi) is a step function with discontinuities at the xi

c.r.v.: Fx is continuous and differentiable

Properties:

  1. F(x) → 1 as x → ∞
  2. F(x) → 0 as x → −∞
  3. F(x) is monotonic increasing: x1 ≤ x2 ⟹ F(x1) ≤ F(x2)
25
Q

Expectation Function

Properties:

E(x) exists if:

  • sample space is finite
  • the sum / integral converges absolutely
A
  • Idealised long-run average

Discrete:
E(X) = ∑ xi px(xi)

Continuous:
E(X) = ∫ℝ x f(x) dx

E(X) is the sum / integral of x weighted by the pmf / pdf

26
Q

Properties of E(x)

A
  1. If X is constant, i.e. P(X = c) = 1, then E(X) = c
  2. If Y = aX + b, then E(Y) = aE(X) + b

Proof - write out the sum / integral and compute

Symmetry of E(X):
If X has a symmetric pmf/pdf and E(X) exists, it is the central point of the pmf/pdf
If symmetric about μ, let Y = X − μ; the pmf/pdf of Y is symmetric about 0, so E(Y) = 0, i.e. E(X) − μ = 0; rearranging gives the result

27
Q

Properties of variance

A
  1. Var(X) ≥ 0

Sum of non-negative terms - all squared and real

  2. Var(X) = 0 if X is constant, i.e. P(X = c) = 1

Compute

  3. If Y = aX + b, then Var(Y) = a²Var(X)

Compute

28
Q

Coefficient of Variation

A

σ/μ

The ratio of s.d. to mean

29
Q

Bernoulli

A

An experiment with 2 outcomes: success and failure

X - only takes values 0 or 1

30
Q

Binomial

A

The sum of n independent Bernoulli random variables

X - no. of successes in n independent Bernoulli trials with probability of success p
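
A short Python sketch of this card: summing n Bernoulli(p) trials behaves like a direct Binomial(n, p) draw (n, p and the number of repetitions are arbitrary illustrative values).

```python
import numpy as np

# Compare the sum of n Bernoulli(p) trials with a Binomial(n, p) draw.
rng = np.random.default_rng(1)
n, p, reps = 10, 0.3, 200_000

bernoulli_sums = rng.binomial(1, p, size=(reps, n)).sum(axis=1)
binomial_draws = rng.binomial(n, p, size=reps)

print(bernoulli_sums.mean(), binomial_draws.mean())  # both close to n*p = 3
print(bernoulli_sums.var(), binomial_draws.var())    # both close to n*p*(1-p) = 2.1
```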

31
Q

Geometric

A

X - no. of independent Bernoulli trials until a success

The waiting time between successes in binomial

32
Q

Negative Binomial

A

X - no. of Bernoulli trials until rth success

The sum of r independent geometric

33
Q

Hypergeometric

A

X - no. of type 1 objects when n objects are drawn from population N containing M type 1 objects

Sampling without replacement - with replacement is the binomial!!

34
Q

Poisson Process

A

X - no. of events (e.g. accidents) in a fixed time period - the rate of events in the time period

The Poisson process assumes independence between non-overlapping time intervals - the Poisson is only appropriate when this independence is satisfied

Limit of the binomial with n large and p small

35
Q

Exponential

A

The time between events in a Poisson process

36
Q

Gamma

A

X - time until the rth event (accident) in a Poisson process

The sum of r independent exponential variables

37
Q

Beta

A

A continuous distribution on [0, 1]; arises in connection with sequences of Bernoulli/binomial trials (e.g. as the distribution of the success probability p)

38
Q

Joint density for independent variables

A

If X1 and X2 are independent, the joint density function must factorise into a product of the form f1(x1)f2(x2)

39
Q

Joint Distribution Function

A

Suppose X1…Xn are r.v.s defined on the same sample space.

The joint distribution function of X1…Xn is the function

F(x1, …, xn) = P(X1 ≤ x1, …, Xn ≤ xn)

40
Q

Marginal Distribution Function

A

A distribution of a single random variable

F₁(x) = P(X₁≤x) = P(X₁≤x, X₂<∞, …, Xn<∞)

41
Q

Iterated Expectation Law

A

E(X1) = E_X2[ E_X1|X2(X1 | X2) ]

where E_X[•] denotes the expectation over the marginal distribution of X and E_Y|X[•] denotes the expectation over the conditional distribution of Y given the value taken by X

42
Q

Central Limit Theorem

A

If X1, X2, … are independent random variables having a common distribution with mean μ and variance σ², then (X̄n − μ) / (σ/√n) converges in distribution to N(0, 1) as n → ∞
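
An illustrative Python sketch of the CLT: standardised means of Exponential(1) samples (mean 1, variance 1) look approximately N(0, 1) for moderate n; the distribution and sample size are arbitrary choices.

```python
import numpy as np

# CLT sketch: standardised sample means of a non-normal distribution.
rng = np.random.default_rng(2)
n, reps = 50, 100_000

samples = rng.exponential(1.0, size=(reps, n))
z = (samples.mean(axis=1) - 1.0) / (1.0 / np.sqrt(n))  # (Xbar - mu) / (sigma/sqrt(n))

print(z.mean(), z.std())   # close to 0 and 1
print(np.mean(z < 1.96))   # close to Phi(1.96) ≈ 0.975
```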

43
Q

Estimator

A

A statistic Tn = Tn(X1, …, Xn) is an estimator of a parameter θ if its value tn = Tn(x1, …, xn) is used as an estimate of θ

X1, …, Xn i.i.d. r.v.s with unknown mean E(X) = μ
An estimator of μ is the sample mean

44
Q

rth moment of X about α

A

E((x-α)^r)

Variance is the 2nd moment of x about the mean

45
Q

Covariance

A

Cov(x₁,x₂) = E[(x₁-μ₁)(x₂-μ₂)]

If continuous: evaluate the expectation as a double integral over ℝ² (integrate over one variable, then the other)

If discrete: sum over the possible values of both variables
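
A small Python sketch checking the definition by simulation, using X2 = X1 + independent noise (an arbitrary, illustrative construction).

```python
import numpy as np

# Check Cov(X1, X2) = E[(X1 - mu1)(X2 - mu2)] against numpy's built-in covariance.
rng = np.random.default_rng(3)
x1 = rng.normal(0.0, 1.0, size=500_000)
x2 = x1 + rng.normal(0.0, 1.0, size=500_000)

cov_by_definition = np.mean((x1 - x1.mean()) * (x2 - x2.mean()))
print(cov_by_definition, np.cov(x1, x2)[0, 1])  # both close to Var(X1) = 1
```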

46
Q

Cov(x₁,x₂) and independence

A

Independence ⟹ Cov(x₁,x₂) = 0

Cov(x₁,x₂) = 0 does not imply independence

47
Q

Sum of the expectations is the expectation of the sum

A

∑ E(Xi) = E( ∑ Xi ) = E(X1 + X2 + … + Xn)

= ∑x1 ∑x2 … ∑xn (x1 + x2 + … + xn) P(x1, x2, …, xn)

= ∑ x1 P(x1, …, xn) + … + ∑ xn P(x1, …, xn), summing each term over all (x1, …, xn)

= E(X1) + … + E(Xn)

48
Q

Variance of the sum = sum of the variances (for mutually independent variables)

A

Summation and compute

49
Q

Continuous joint distribution

A

X1, X2 are independent if joint pdf factorises

50
Q

ρ - in bivariate normal

Correlation coefficient, −1 ≤ ρ ≤ 1

A

Correlation parameter - measures the strength of linear association between the two variables x1, x2

ρ = Corr(x,y) = Cov(x,y) / (σx σy)

51
Q

Features of conditional joint distributions

A

Conditional mean: if E(x1|x2) is a linear function of x2, this suggests x1 and x2 are linearly related

If ρ > 0 and x2 exceeds its mean: conditional mean E(x1|x2) > marginal mean E(x1)
If x2 is greater than average, expect x1 to be greater than average

52
Q

Conditional variance of x1 given x2

A

σ₁²(1-ρ²)

Since −1 < ρ < 1, the conditional variance is no larger than the marginal variance σ₁²

53
Q

Observations of conditional joint distributions

A

The conditional mean can exceed the marginal mean:

If the observed x1 exceeds its expectation and Corr(x1,x2) > 0, then X2 is likely to exceed its expectation

54
Q

Expectation of the product is the product of the expectations (independent variables)

A

Use PGF to show!!

55
Q

Sums of independent variables - PGF

A

If x1… Xn are independent discrete random variables taking non-negative integer values, the pgf of their sum is the product of their PGFs

If all the Xs are IID : pgf to power of n

Pgf only defined for discrete!!!!

56
Q

Mgf - sums of independent variables

A

If x1… xn are independent random variables - the mgf of their sum is the product of their mgfs

57
Q

Sample Mean

A

X bar is the sample mean
The average of all the observations in a sample
X bar n = Sn/n - its distribution is the sampling distribution of the mean

58
Q

Strong law of large numbers

A

P( lim (n→∞) X bar n = μ ) = 1

Almost every possible sequence of sample means converges to μ as n → ∞

59
Q

Chi-Squared Distribution

A

If z1…zn are independent N(0,1) random variables, the distribution of the sum of squares: ∑ Zi ² is the chi-squared distribution with n degrees of freedom

  • same as the Γ(n/2,1/2) distribution with expectation n variance 2n
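
A hedged Python sketch of this card: the sum of squares of n independent N(0,1) variables matches the chi-squared(n) distribution (n and the repetition count are arbitrary illustrative values).

```python
import numpy as np
from scipy import stats

# Sum of squares of n standard normals vs. the chi-squared(n) distribution.
rng = np.random.default_rng(4)
n, reps = 5, 200_000

chi2_samples = (rng.standard_normal(size=(reps, n)) ** 2).sum(axis=1)

print(chi2_samples.mean(), chi2_samples.var())        # close to n = 5 and 2n = 10
print(stats.kstest(chi2_samples, "chi2", args=(n,)))  # small KS statistic
```
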
60
Q

T-distribution

A

Z ~ N(0,1) and U ~ χn²

Z and U are independent

The distribution of the ratio T = Z / √(U/n) is the t-distribution with n degrees of freedom

61
Q

F-distribution

A

If U and V are independent r.v.s distributed as χm² and χn² respectively

The distribution of the ratio W = (U/m) / (V/n)
is the F-distribution with m, n degrees of freedom

W ~ Fm,n ⟹ 1/W ~ Fn,m

62
Q

Regression line

A

Y = b0 + b1x

Data points satisfy the n equations: yi = b0 + b1xi + ei

  • ei = prediction error
  • choose b0, b1 to minimise ∑ ei²
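
A minimal Python least-squares sketch: fitting y = b0 + b1 x to simulated data; the true values b0 = 1, b1 = 2 and the noise level are arbitrary illustrative choices.

```python
import numpy as np

# Fit y = b0 + b1*x by minimising the sum of squared prediction errors.
rng = np.random.default_rng(5)
x = rng.uniform(0, 10, size=200)
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, size=200)   # y_i = b0 + b1*x_i + e_i

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
print(b0, b1)   # close to 1 and 2
```
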
63
Q

Explanatory variable

A

The xi values

The independent (predictor) values

64
Q

Response variables

A

yi are the observed (dependent) values of a random variable Yi, whose distribution depends on xi

65
Q

Regression Curve

A

The conditional mean E(Y | X = x), regarded as a function of x, is the regression curve of y on x

66
Q

Linear statistical model

A

One in which the regression curve is a linear function of the parameters of the model

67
Q

{ei}

A

Independent random variables with mean 0 and common variance σ²

68
Q

Residual sum of squares (RSS)

A

S(β0, β1) = ∑ (yi − β0 − β1xi)²

69
Q

Least squares method

A

A way to estimate β0, β1 by minimising the sum of squared errors

Data must be:
Homoscedastic - the variation in y is the same for all x - the error variance is constant

Independent

70
Q

Least squares error method

A

To minimise S(β0, β1):

differentiate with respect to β0 and β1 and set the derivatives to zero
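
For completeness, a worked version of this step (a standard result, not stated on the card): the two derivative conditions are

∂S/∂β0 = −2 ∑ (yi − β0 − β1xi) = 0
∂S/∂β1 = −2 ∑ xi (yi − β0 − β1xi) = 0

and solving these normal equations gives

β̂1 = ∑ (xi − x̄)(yi − ȳ) / ∑ (xi − x̄)²
β̂0 = ȳ − β̂1 x̄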

71
Q

Least squares estimate of μ

A

The least squares estimate of μ minimises the sum of squared errors S(μ) = ∑(yi - μ)²

72
Q

Paired test

A

For data that are not independent (paired observations)!

73
Q

Hypothesis testing

A
1 sample: 
Known variance:
Z-test 
Unknown variance:
T-test - sample variance s^2

2 sample:
Known variance
Z - test (standard normal)

Unknown variance
T - test with pooled sample variance

Testing for variance:
F - test

2-paired:
Testing for mean

T-test: reduce to a one-sample problem by taking the differences of the paired results as the new data
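
A hedged Python sketch of the t-test variants above using scipy (the data are simulated and purely illustrative; in practice the inputs would be your own samples).

```python
import numpy as np
from scipy import stats

# One-sample, two-sample (pooled variance) and paired t-tests on simulated data.
rng = np.random.default_rng(6)
a = rng.normal(5.0, 2.0, size=30)
b = rng.normal(5.5, 2.0, size=30)

print(stats.ttest_1samp(a, popmean=5.0))   # one-sample t-test (unknown variance)
print(stats.ttest_ind(a, b))               # two-sample t-test with pooled variance
print(stats.ttest_rel(a, b))               # paired t-test on the differences
```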

74
Q

Assumptions for paired sample t-test

A

Assume that the differences are independent, identically distributed and normally distributed

75
Q

Transformation formula

A

Any monotonic increasing or decreasing function!!!

fy(y) = fx(x) |dx/dy| = fx(g⁻¹(y)) |dx/dy|, where y = g(x)

Reminders:
The derivative is dx/dy, not dy/dx
Insert the change of variables x = g⁻¹(y)
Don’t forget the modulus
g must be monotonic!!
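
A Python sketch of the formula for the (arbitrarily chosen) monotonic transformation Y = exp(X) with X ~ N(0,1): the formula gives f_Y(y) = f_X(ln y) / y, which is compared with a histogram of simulated values.

```python
import numpy as np

# Change of variables: f_Y(y) = f_X(g^{-1}(y)) * |dx/dy| with g(x) = exp(x).
rng = np.random.default_rng(7)
y = np.exp(rng.standard_normal(1_000_000))

grid = np.linspace(0.5, 3.0, 6)
f_x = lambda t: np.exp(-t ** 2 / 2) / np.sqrt(2 * np.pi)   # N(0,1) density
density_formula = f_x(np.log(grid)) / grid                  # f_X(ln y) * |d(ln y)/dy|

hist, edges = np.histogram(y, bins=200, range=(0, 5), density=True)
centers = (edges[:-1] + edges[1:]) / 2
nearest = [hist[np.argmin(np.abs(centers - g))] for g in grid]
print(np.round(density_formula, 3))
print(np.round(nearest, 3))                                  # roughly matching values
```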

76
Q

Pgf of the sum is the product of the PGFs

A

If Sn is the sum of Xi’s that are mutually independent, then functions of the individual Xi’s (in particular the terms s^Xi) are also mutually independent.
Thus the expectation of the product is the product of the expectations, so the pgf of the sum is the product of the pgfs: E(s^Sn) = E(s^X1) ⋯ E(s^Xn)

77
Q

Y has a negative binomial distribution with parameters r and p

Explain how it arises in the context of sequences of Bernoulli trials and explain how y can be regarded as a sum of independent geometrically distributed random variables

A

The NB(r, p) distribution arises as the distribution of the number of trials required to obtain r successes in a sequence of independent Bernoulli trials, each with success probability p

The Geo(p) distribution is the distribution of the number of trials required to obtain one success
Because the trials are independent, the numbers of trials between successes are independent geometric r.v.s, and the total number of trials until the rth success is therefore the sum of r independent geometric r.v.s
78
Q

Suppose X1 and X2 are discrete random variables with means μ1 and μ2
What is meant by saying X1 and X2 are independent

A

If X1 and X2 are independent, then for all pairs of values (x1, x2): P(X1 = x1, X2 = x2) = P(X1 = x1) P(X2 = x2)

79
Q

A sum of Poisson random variables is another Poisson

A

If X1, …, Xn are independent Poi(μ) random variables, then Sn = X1 + … + Xn ~ Poi(nμ)
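
A quick Python sketch of this card (n, μ and the repetition count are arbitrary illustrative values).

```python
import numpy as np

# Sum of n independent Poisson(mu) variables vs. a single Poisson(n*mu) draw.
rng = np.random.default_rng(8)
n, mu, reps = 4, 1.5, 200_000

summed = rng.poisson(mu, size=(reps, n)).sum(axis=1)
direct = rng.poisson(n * mu, size=reps)

print(summed.mean(), direct.mean())   # both close to n*mu = 6
print(summed.var(), direct.var())     # both close to n*mu = 6
```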

80
Q

Choosing between estimators

A

The main things to consider when choosing between estimators are bias and variance
If the estimator is unbiased and its standard error (s.d.) tends to zero as n tends to infinity, the estimator is consistent

Estimators can also be compared on the basis of MSE = variance + bias squared
The estimator with the smaller mean squared error would be preferred, as it is likely to yield an estimate that is closer to the true value of the parameter

81
Q

Assumptions

A

Observations are normally distributed

There is no particular a priori reason for this to be the case, in terms of the situations in which the normal distribution is known to arise

The measurements have obviously been rounded to a discrete set of values, which suggests that the assumption of normality is probably not realistic here

82
Q

Probability Generating Function

A

The pgf is defined for discrete random variables taking non-negative integer values