Stats Flashcards
Binomial distribution conditions
- A fixed number of trials, n, is carried out
- The trials are independent
- The outcome of each trial is either a success or a failure
- The probability of a successful outcome is the same for each trial
Equation for binomial distribution
P(X=r) = nCr p^r q^(n-r), where q = 1 - p
Expectation and variance for binomial distribution
E(X)=np
Var(X)=npq
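As a quick numerical check of the formulas above, this Python sketch evaluates P(X=r) for every r and confirms that the mean and variance computed directly from those probabilities match np and npq. The function names and the example B(10, 0.3) are illustrative assumptions, not part of the original notes.

```python
from math import comb


def binomial_pmf(r: int, n: int, p: float) -> float:
    """P(X = r) for X ~ B(n, p): nCr * p^r * q^(n-r), with q = 1 - p."""
    q = 1 - p
    return comb(n, r) * p**r * q ** (n - r)


def binomial_mean_var(n: int, p: float) -> tuple[float, float]:
    """Mean and variance computed from the pmf itself (no shortcut formulas)."""
    probs = [binomial_pmf(r, n, p) for r in range(n + 1)]
    mean = sum(r * pr for r, pr in enumerate(probs))
    var = sum(r * r * pr for r, pr in enumerate(probs)) - mean**2
    return mean, var


if __name__ == "__main__":
    n, p = 10, 0.3
    mean, var = binomial_mean_var(n, p)
    print(f"P(X=3) = {binomial_pmf(3, n, p):.4f}")
    print(f"mean from pmf = {mean:.6f}, np = {n * p:.6f}")
    print(f"var from pmf  = {var:.6f}, npq = {n * p * (1 - p):.6f}")
```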
Mode of binomial distribution
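For 0 < p < 1, the mode is floor((n+1)p). If (n+1)p is an integer, there are two modes, (n+1)p - 1 and (n+1)p; e.g. if p=0.5 and n is odd, there are two modes.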
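To check the mode rule, this sketch finds the mode(s) by brute force and compares them with floor((n+1)p). It uses Python's exact `Fraction` type for p so that tied modes compare equal without rounding trouble; the helper names are hypothetical.

```python
from fractions import Fraction
from math import comb, floor


def binomial_modes(n: int, p: Fraction) -> list[int]:
    """Brute force: every r whose probability P(X = r) is the largest."""
    probs = [comb(n, r) * p**r * (1 - p) ** (n - r) for r in range(n + 1)]
    best = max(probs)
    return [r for r, pr in enumerate(probs) if pr == best]  # exact, since p is a Fraction


def mode_rule(n: int, p: Fraction) -> list[int]:
    """floor((n+1)p), plus (n+1)p - 1 when (n+1)p is an integer in 1..n."""
    m = (n + 1) * p
    if m.denominator == 1 and 1 <= m <= n:
        return [int(m) - 1, int(m)]
    return [min(floor(m), n)]  # min() only matters for the edge case p = 1


if __name__ == "__main__":
    print(binomial_modes(5, Fraction(1, 2)), mode_rule(5, Fraction(1, 2)))  # [2, 3] [2, 3]
```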
Type I error
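H0 is rejected when it is true
P(Type I error) = significance level (for a discrete test statistic, the actual significance level: the probability of landing in the critical region when H0 is true)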
Type II error
H0 is accepted (not rejected) when it is false
P(Type II error) = P(H0 is accepted), calculated using the parameter value specified by H1
Least squares regression line of y on x
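Minimises the sum of the squares of the vertical distances (residuals) from the points to the line
y = a + bx
b = Sxy/Sxx and a = ȳ - b x̄, so the line passes through (x̄, ȳ)
Used when x is the independent (explanatory) variable and/or you want to predict y from a given value of x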
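A minimal Python sketch of the least squares calculation, assuming the data arrive as two equal-length lists. It computes Sxx and Sxy from deviations about the means, then b = Sxy/Sxx and a = ȳ - b x̄. The data in the usage line are made up for illustration.

```python
def least_squares(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Return (a, b) for the regression line of y on x, y = a + bx."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar  # forces the line through (x̄, ȳ)
    return a, b


if __name__ == "__main__":
    # Made-up data: x̄ = 3, ȳ = 4, Sxx = 10, Sxy = 6
    print(least_squares([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))
```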
What is the difference between interpolating and extrapolating
Using the regression line to make predictions within the range of data is interpolating, going outside of the range is extrapolating and is unreliable
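What is a pdf?
For a discrete variable, the probability function allocates a probability to each value of X. For a continuous variable, the probability density function f(x) allocates probability to intervals as areas: P(a <= X <= b) is the area under f(x) between a and b.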
E(g(X))
Sum g(x)*P(X=x)
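A small sketch of E(g(X)) for a discrete variable stored as a dictionary from values to probabilities (a representation chosen here for illustration), with a fair die as the hypothetical example.

```python
from fractions import Fraction
from typing import Callable


def expectation(dist: dict[int, Fraction], g: Callable[[int], int] = lambda x: x) -> Fraction:
    """E(g(X)): the sum of g(x) * P(X = x) over every value x of X."""
    return sum((g(x) * p for x, p in dist.items()), Fraction(0))


# Hypothetical example: a fair six-sided die.
die = {x: Fraction(1, 6) for x in range(1, 7)}

if __name__ == "__main__":
    print(expectation(die))                  # E(X)   = 7/2
    print(expectation(die, lambda x: x**2))  # E(X^2) = 91/6
```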
Expectation rules
E(kX) = kE(X)
E(k) = k
E(kX +c) = kE(X) + c
E(f(X) + g(X)) = E(f(X)) + E(g(X))
Variance rules
Var(X) = E(X^2) - [E(X)]^2
Var(k) = 0
Var(kX) = k^2 Var(X)
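The variance rules can be checked exactly on a hypothetical fair die. This sketch computes Var(X) = E(X^2) - [E(X)]^2 and confirms that kX + c has variance k^2 Var(X), so a constant (k = 0) has variance 0. The dictionary representation and function names are assumptions for illustration.

```python
from fractions import Fraction


def mean(dist):
    """E(X) for a distribution stored as {value: probability}."""
    return sum((x * p for x, p in dist.items()), Fraction(0))


def var(dist):
    """Var(X) = E(X^2) - [E(X)]^2."""
    return sum((x * x * p for x, p in dist.items()), Fraction(0)) - mean(dist) ** 2


def transform(dist, k, c=0):
    """Distribution of kX + c; probabilities of values that coincide are added."""
    out = {}
    for x, p in dist.items():
        y = k * x + c
        out[y] = out.get(y, Fraction(0)) + p
    return out


die = {x: Fraction(1, 6) for x in range(1, 7)}  # hypothetical example: a fair die

if __name__ == "__main__":
    print(var(die))                   # 35/12
    print(var(transform(die, 3, 2)))  # 9 * 35/12 = 105/4
    print(var(transform(die, 0, 5)))  # a constant: 0
```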
CDF
Cumulative distribution function
F(x) = P(X <= x)
Multiple independent random variables
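E(X+Y) = E(X) + E(Y)
E(X-Y) = E(X) - E(Y)
E(X1+X2+…+Xn) = nE(X), where each Xi has the same distribution as X
Variance rules require independence:
Var(X±Y) = Var(X) + Var(Y)
Var(X1+X2+…+Xn) = nVar(X) for independent copies of X (compare Var(nX) = n^2 Var(X))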
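The difference between adding independent copies and multiplying one variable is easy to lose track of. This sketch builds the exact distribution of X1 + X2 for two independent fair dice (a hypothetical example) by multiplying probabilities, then compares Var(X1 + X2) = 2Var(X) with Var(2X) = 4Var(X). Names and representation are illustrative.

```python
from fractions import Fraction
from itertools import product


def mean(dist: dict[int, Fraction]) -> Fraction:
    return sum((x * p for x, p in dist.items()), Fraction(0))


def var(dist: dict[int, Fraction]) -> Fraction:
    """Var(X) = E(X^2) - [E(X)]^2."""
    return sum((x * x * p for x, p in dist.items()), Fraction(0)) - mean(dist) ** 2


def add_independent(dx: dict[int, Fraction], dy: dict[int, Fraction]) -> dict[int, Fraction]:
    """Distribution of X + Y for independent X, Y: P(X=x and Y=y) = P(X=x) P(Y=y)."""
    out: dict[int, Fraction] = {}
    for (x, px), (y, py) in product(dx.items(), dy.items()):
        out[x + y] = out.get(x + y, Fraction(0)) + px * py
    return out


die = {x: Fraction(1, 6) for x in range(1, 7)}  # hypothetical example: a fair die

if __name__ == "__main__":
    two_dice = add_independent(die, die)          # X1 + X2, independent copies
    doubled = {2 * x: p for x, p in die.items()}  # 2X, one die doubled
    print(var(die), var(two_dice), var(doubled))  # 35/12 35/6 35/3
```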
Discrete uniform distribution requirements and equation
The discrete uniform distribution (DUD) is used when X is defined over a set of n distinct values.
Each value is equally likely to occur: P(X=x)=1/n
Geometric distribution
Independent trials are carried out.
The outcome of each trial is deemed a success or a failure.
The probability of success is the same for each trial.
The discrete random variable X is the number of trials needed to obtain the first successful outcome.
X~Geo(p)
P(X=r)=q^(r-1)*p
Expectation, variance and others for geometric distribution
E(X)=1/p
Var(X)=q/p^2
P(X<=x)=1-q^x
P(X>x)=q^x
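A quick sketch to check the geometric formulas: exact fractions confirm that summing P(X=r) for r = 1 to x gives 1 - q^x, and a long truncated sum approximates E(X) = 1/p and Var(X) = q/p^2. The function names and the choice p = 1/4 are illustrative.

```python
from fractions import Fraction


def geo_pmf(r: int, p):
    """P(X = r) = q^(r-1) * p for X ~ Geo(p), r = 1, 2, 3, ..."""
    return (1 - p) ** (r - 1) * p


def geo_cdf(x: int, p):
    """P(X <= x) = 1 - q^x."""
    return 1 - (1 - p) ** x


def truncated_moments(p: float, terms: int = 1000) -> tuple[float, float]:
    """Approximate E(X) and Var(X) by summing the first `terms` probabilities."""
    ex = sum(r * geo_pmf(r, p) for r in range(1, terms + 1))
    ex2 = sum(r * r * geo_pmf(r, p) for r in range(1, terms + 1))
    return ex, ex2 - ex**2


if __name__ == "__main__":
    p = Fraction(1, 4)
    print(sum(geo_pmf(r, p) for r in range(1, 11)) == geo_cdf(10, p))  # True
    print(truncated_moments(0.25))  # close to (1/p, q/p^2) = (4, 12)
```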
Poisson distribution
Events occur singly, independently and at random in a given time or space interval.
λ is the mean number of occurrences in the interval and is known and finite.
X~Po(λ)
P(X=x) = e^(-λ) λ^x / x!
Mode, expectation and variance of Poisson
If λ is an integer, mode = λ-1 and λ
If not, then mode = floor(λ)
E(X)=Var(X)=λ
Also, if X~Po(m) and Y~Po(n) are independent, then X + Y~Po(m+n)
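The Poisson facts above can be checked in a short sketch: the mode is found by comparing λ^x/x! exactly (e^(-λ) is a common factor, so it can be ignored), and the distribution of X + Y for independent Poisson variables is compared term by term with Po(m+n). Names and example values are hypothetical.

```python
from fractions import Fraction
from math import exp, factorial, floor


def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) = e^(-λ) λ^x / x! for X ~ Po(λ)."""
    return exp(-lam) * lam**x / factorial(x)


def poisson_modes(lam: Fraction) -> list[int]:
    """Mode(s) of Po(λ), comparing λ^x / x! exactly for x = 0 .. floor(λ) + 1."""
    weights = [lam**x / factorial(x) for x in range(floor(lam) + 2)]
    best = max(weights)
    return [x for x, w in enumerate(weights) if w == best]


def pmf_of_sum(k: int, m: float, n: float) -> float:
    """P(X + Y = k) for independent X ~ Po(m), Y ~ Po(n), summing over X = i."""
    return sum(poisson_pmf(i, m) * poisson_pmf(k - i, n) for i in range(k + 1))


if __name__ == "__main__":
    print(poisson_modes(Fraction(3)), poisson_modes(Fraction(5, 2)))  # [2, 3] [2]
    print(pmf_of_sum(4, 1.5, 2.0), poisson_pmf(4, 3.5))  # should agree
```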
When to use approximations
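B(n, p) can be approximated by Po(np) when n>50, p<0.1 and np<10
B(n, p) can be approximated by N(np, npq) when np>5 and nq>5
Po(λ) can be approximated by N(λ, λ) when λ>15
Remember to apply a continuity correction when approximating a discrete distribution by a normal one, e.g. P(X <= 23) becomes P(Y < 23.5).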
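To see why the continuity correction matters, this sketch compares the exact binomial P(X <= 23) for a hypothetical B(40, 0.5) with the N(np, npq) approximation, with and without the correction. Φ is computed from `math.erf`; the helper names are illustrative.

```python
from math import comb, erf, sqrt


def phi(z: float) -> float:
    """Standard normal cdf Φ(z) = (1 + erf(z/√2)) / 2."""
    return 0.5 * (1 + erf(z / sqrt(2)))


def binom_cdf(k: int, n: int, p: float) -> float:
    """Exact P(X <= k) for X ~ B(n, p)."""
    return sum(comb(n, r) * p**r * (1 - p) ** (n - r) for r in range(k + 1))


def normal_approx_cdf(k: int, n: int, p: float, continuity: bool = True) -> float:
    """Approximate P(X <= k) by N(np, npq); the correction uses k + 0.5 as the boundary."""
    mu, sigma = n * p, sqrt(n * p * (1 - p))
    boundary = k + 0.5 if continuity else k
    return phi((boundary - mu) / sigma)


if __name__ == "__main__":
    n, p, k = 40, 0.5, 23  # np = nq = 20 > 5
    print(binom_cdf(k, n, p))
    print(normal_approx_cdf(k, n, p), normal_approx_cdf(k, n, p, continuity=False))
```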
Expectation and variance of a continuous variable
Use integration over the range of X: E(X) = ∫ x f(x) dx and E(X^2) = ∫ x^2 f(x) dx. The formula Var(X) = E(X^2) - [E(X)]^2 is the same as in the discrete case.
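A rough numerical sketch for a hypothetical pdf f(x) = 2x on 0 <= x <= 1, using a simple midpoint rule in place of exact integration. By hand, E(X) = 2/3 and Var(X) = 1/2 - 4/9 = 1/18, and the numerical values should agree closely.

```python
from typing import Callable


def integrate(g: Callable[[float], float], a: float, b: float, n: int = 10_000) -> float:
    """Midpoint-rule approximation of the integral of g from a to b."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))


def continuous_mean_var(f: Callable[[float], float], a: float, b: float) -> tuple[float, float]:
    """E(X) = ∫ x f(x) dx and Var(X) = ∫ x^2 f(x) dx - [E(X)]^2 over a <= x <= b."""
    ex = integrate(lambda x: x * f(x), a, b)
    ex2 = integrate(lambda x: x * x * f(x), a, b)
    return ex, ex2 - ex**2


if __name__ == "__main__":
    f = lambda x: 2 * x  # hypothetical pdf on 0 <= x <= 1
    print(continuous_mean_var(f, 0.0, 1.0))  # close to (2/3, 1/18)
```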
CDF for continuous
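F(x) = ∫ f(t) dt, integrating from a to x, where a is the lower limit of the range of X.
When defining F(x), remember to define it outside the range as well: F(x) = 0 below the range and F(x) = 1 above it.
To get the pdf from the cdf, differentiate: f(x) = F'(x).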
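For a hypothetical pdf f(x) = 2x on 0 <= x <= 1, integrating gives F(x) = x^2 inside the range. This sketch defines F for every real x (0 below the range, 1 above it) and recovers f by numerical differentiation, as a stand-in for differentiating by hand.

```python
from typing import Callable


def F(x: float) -> float:
    """CDF of the hypothetical pdf f(x) = 2x on 0 <= x <= 1, defined for every real x."""
    if x < 0:
        return 0.0
    if x <= 1:
        return x * x  # integral of 2t dt from 0 to x
    return 1.0


def pdf_from_cdf(cdf: Callable[[float], float], x: float, h: float = 1e-6) -> float:
    """Recover f(x) = F'(x) with a central difference (valid away from the endpoints)."""
    return (cdf(x + h) - cdf(x - h)) / (2 * h)


if __name__ == "__main__":
    print(F(-1.0), F(0.5), F(2.0))  # 0.0 0.25 1.0
    print(pdf_from_cdf(F, 0.3))     # close to f(0.3) = 0.6
```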
Rectangular distribution equations
E(X)=1/2 (a+b)
Var(X) = 1/12 (b-a)^2
F(x)=(x-a)/(b-a) for a<=x<=b
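A sketch of the rectangular distribution formulas, with a midpoint-rule cross-check of E(X) and Var(X) from the pdf f(x) = 1/(b-a). The helper names and the example a = 2, b = 8 are assumptions for illustration.

```python
from typing import Callable


def rect_mean(a: float, b: float) -> float:
    return (a + b) / 2


def rect_var(a: float, b: float) -> float:
    return (b - a) ** 2 / 12


def rect_cdf(x: float, a: float, b: float) -> float:
    """F(x), including the parts outside a <= x <= b."""
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)


def midpoint_integral(g: Callable[[float], float], a: float, b: float, n: int = 10_000) -> float:
    """Midpoint-rule approximation of the integral of g from a to b."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))


if __name__ == "__main__":
    a, b = 2.0, 8.0
    print(rect_mean(a, b), rect_var(a, b), rect_cdf(5.0, a, b))  # 5.0 3.0 0.5
```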
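Normal distribution
Extends from -infinity to infinity
About 95% lies within 2σ of μ (more precisely, 95% lies within 1.96σ)
X~N(μ, σ^2)
The cdf of the standard normal Z~N(0, 1) is written Φ(z) (φ(z) is its pdf); for X~N(μ, σ^2), F(x) = Φ((x-μ)/σ)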
How to standardise a normal variable
Z=(X-μ)/σ
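A minimal sketch of standardising and then applying Φ. Python's `math` module has no Φ function, so it is written here via the error function, Φ(z) = (1 + erf(z/√2))/2; the standard library's `statistics.NormalDist(mu, sigma).cdf(x)` gives the same values. Function names are illustrative.

```python
from math import erf, sqrt


def phi(z: float) -> float:
    """Standard normal cdf Φ(z), written in terms of the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """P(X <= x) for X ~ N(μ, σ^2): standardise with Z = (X - μ)/σ, then apply Φ."""
    return phi((x - mu) / sigma)


if __name__ == "__main__":
    print(normal_cdf(110, 100, 5))  # same as Φ(2), about 0.977
    print(phi(2) - phi(-2))         # about 0.954, the "95% within 2σ" rule
```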
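What is the formula for a confidence interval for the population mean?
x̄ ± k*σ/sqrt(n), where k is the critical value from N(0, 1) for the confidence level (e.g. k = 1.96 for 95%)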
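A sketch of the interval calculation, assuming σ is known. The critical value k is taken from `statistics.NormalDist` rather than from tables, and the sample figures in the example are made up.

```python
from math import sqrt
from statistics import NormalDist


def critical_value(level: float) -> float:
    """Two-sided z value k with P(-k < Z < k) = level, e.g. about 1.96 for 0.95."""
    return NormalDist().inv_cdf(0.5 + level / 2)


def confidence_interval(x_bar: float, sigma: float, n: int, level: float = 0.95) -> tuple[float, float]:
    """x̄ ± k*σ/sqrt(n), assuming σ is known."""
    half_width = critical_value(level) * sigma / sqrt(n)
    return x_bar - half_width, x_bar + half_width


if __name__ == "__main__":
    # Made-up sample: x̄ = 50, σ = 4, n = 64, 95% level
    print(confidence_interval(50, 4, 64))
```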
What are the requirements to use PMCC?
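Both variables are random (neither is controlled); for a hypothesis test on the PMCC, the data should also come from a bivariate normal distribution.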
Define non-parametric test
A test where there is no underlying assumption that the data are from a normal distribution.
What are the assumptions for the Wilcoxon signed-rank test (single-sample and paired-sample)?
The sample is random, and the distribution (of the differences, in the paired case) is symmetric about its median.
When is the Wilcoxon signed-rank statistic approximated by a normal distribution?
When n is greater than 50; under H0, W has mean n(n+1)/4 and variance n(n+1)(2n+1)/24.
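A sketch of the signed-rank calculation with the normal approximation. It assumes the common conventions of dropping zero differences and giving tied absolute differences the average rank, and it applies no continuity correction; courses differ on these details, so treat it as illustrative. Function names are hypothetical.

```python
from math import sqrt


def signed_rank_statistics(data: list[float], m0: float) -> tuple[float, float, int]:
    """Return (W+, W-, n) for a Wilcoxon signed-rank test of H0: median = m0.

    Zero differences are dropped; tied |differences| share the average of their ranks.
    """
    diffs = [x - m0 for x in data if x != m0]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of equal absolute differences.
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        shared = (i + j) / 2 + 1  # mean of ranks i+1, ..., j+1
        for t in range(i, j + 1):
            ranks[order[t]] = shared
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus, len(diffs)


def signed_rank_z(w: float, n: int) -> float:
    """Standardise W using its null mean n(n+1)/4 and variance n(n+1)(2n+1)/24."""
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24
    return (w - mean) / sqrt(var)


if __name__ == "__main__":
    print(signed_rank_statistics([5, 7, 9, 12, 3], 6))  # (10.0, 5.0, 5)
```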
What is the purpose of the Wilcoxon Rank-Sum test?
It checks whether two samples are drawn from the same distribution by seeing if their medians are the same.
What are the assumptions of the Wilcoxon rank-sum test?
The samples of X and Y are random and independent of each other, and both distributions have the same shape