Definitions Flashcards
CRV E(X) = Var(X) =
0.5(a+b) for U[a, b], or in general integrate x f(x) between the boundaries
1/12(b-a)^2 for U[a, b], or in general integrate x^2 f(x) and subtract mean^2
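A quick numerical check that both routes agree for a continuous uniform distribution. This is a minimal sketch: the midpoint-rule integrator and the example U[2, 8] are illustrative choices, not part of the syllabus method.

```python
def integrate(g, lo, hi, steps=100_000):
    """Approximate the integral of g from lo to hi using the midpoint rule."""
    h = (hi - lo) / steps
    return sum(g(lo + (i + 0.5) * h) for i in range(steps)) * h


def uniform_mean_var(a, b):
    """Return (E(X), Var(X)) for X ~ U[a, b] by integrating with the pdf."""
    f = lambda x: 1 / (b - a)                  # constant pdf on [a, b]
    mean = integrate(lambda x: x * f(x), a, b)
    e_x2 = integrate(lambda x: x**2 * f(x), a, b)
    return mean, e_x2 - mean**2                # Var(X) = E(X^2) - E(X)^2


# Hypothetical example: X ~ U[2, 8]; formulas give 0.5(2+8) = 5 and (8-2)^2/12 = 3
mean, var = uniform_mean_var(2, 8)
print(round(mean, 4), round(var, 4))
```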
Conditions of binomial
fixed number of trials
constant probability, independent trials, two outcomes
Conditions of poisson
events occur singly, at a constant rate, independently
Census =
investigation of every member of the population
Sampling unit =
individual member/element of population
Sampling frame =
list of all the sampling units in the population, e.g. by name or unique ID
Sampling distribution =
set of all possible values of the statistic together with their individual probabilities
Why are samples better than a census
quicker, and if testing destroys the item a census would use up every member of the population
mode =
value of x at which f(x) is a maximum, e.g. where f'(x) = 0
Median =
F(median) = 0.5
integrate the pdf to get F(x), then solve F(m) = 0.5
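When F(m) = 0.5 is awkward to solve by hand, the median can be found numerically. A minimal sketch, assuming F is increasing on the range; the pdf f(x) = x/2 on [0, 2] is a hypothetical example, whose cdf is F(x) = x^2/4 (C = 0 since F(0) = 0), giving median sqrt(2).

```python
import math

def median_from_cdf(F, lo, hi, tol=1e-10):
    """Solve F(m) = 0.5 by bisection, assuming F is increasing on [lo, hi]."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if F(mid) < 0.5:
            lo = mid        # median lies to the right
        else:
            hi = mid        # median lies at or to the left
    return (lo + hi) / 2

# Hypothetical pdf f(x) = x/2 on [0, 2], integrated to F(x) = x^2/4
F = lambda x: x**2 / 4
m = median_from_cdf(F, 0, 2)
print(round(m, 4), round(math.sqrt(2), 4))
```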
-ve skew means
mean less than median less than mode
Q3-Q2 less than Q2-Q1
quadratic up straight line down
When sketching a pdf
mark the points where f(x) = 0
Population =
collection of all items
Sample =
subset of population intended to represent the population
B → Po
n large (n > 50), p small (p < 0.2)
B → N
Continuity correction (as discrete to continuous)
n large (n>50)
p close to 0.5
(np>5, nq>5)
Po → N
lambda large (>20)
Po → B
if only 3ish marks
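X ~ Po(3) per period
P(X=2) = 0.2240
P(X=2 in each of two separate periods) =
0.2240^2 = 0.0502 (NOT P(Y=2) for Y ~ Po(6), which is 0.0446 for 2 in total)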
Poisson P(1<=X<=4) =
P(X<=4) - P(X<1) = P(X<=4) - P(X<=0)
Statistic =
a random variable that is a function of the sample only and does not depend on any unknown parameters, e.g. not mu or sigma, but X bar is fine
Sampling distribution =
probability distribution of all possible values of the statistic
Hypothesis test =
mathematical procedure to examine a value of a population parameter proposed by the null hypothesis compared with an alternative hypothesis
Critical region =
range of values of a test statistic which would provide enough evidence to reject the null hypothesis
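If for the upper tail at 5% P(X<=9) >= 0.95 (and P(X<=8) < 0.95) then
critical region is X>=10
Actual significance level means
the probability of being in the critical region (add up the probabilities in it); it should be near the original significance level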
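The two cards above can be checked together: find the smallest c with P(X <= c - 1) >= 0.95, then the actual significance level is P(X >= c). A minimal sketch, assuming a binomial test statistic; the helper names and the example B(20, 0.3) are hypothetical.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ B(n, p)."""
    return sum(comb(n, r) * p**r * (1 - p)**(n - r) for r in range(k + 1))

def upper_critical_region(n, p, alpha):
    """Return (c, actual_level) for an upper-tail test: reject H0 when X >= c.

    c is the smallest value with P(X >= c) <= alpha, i.e. P(X <= c - 1) >= 1 - alpha.
    """
    for c in range(1, n + 1):
        tail = 1 - binom_cdf(c - 1, n, p)   # P(X >= c), the actual significance level
        if tail <= alpha:
            return c, tail
    return n + 1, 0.0                        # no value is extreme enough

# Hypothetical test: X ~ B(20, 0.3) under H0, upper tail, 5% level
c, level = upper_critical_region(20, 0.3, 0.05)
print(c, round(level, 4))
```

Here P(X <= 9) is about 0.9520, so the critical region is X >= 10 with an actual significance level of about 0.0480.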
When drawing a ‘suitable pdf’
think about skewness and the context; it is not always an exact normal bell curve
In hypothesis testing can be more than or less than significance level but should be
as close as possible to it
Why use a continuity correction
because we are going from discrete to continuous, so it makes up for the gaps between whole numbers
P(|X|<1.5) =
P(-1.5 < X < 1.5)
2-tailed: bottom or top tail?
np < observed value: use the top tail
np > observed value: use the bottom tail
Significance level in two tail is
halved in each tail (e.g. 5% gives 2.5% in each tail)
+skew means
mean>median>mode
Q3-Q2>Q2-Q1
Majority of data on left
Right tail longer
Finite population
A finite population is one in which each individual member can be given a number (a population might be so large that it is difficult or impossible to give each member a number, e.g. grains of sand on the beach).
Infinite population
An infinite population is one in which each individual member cannot be given a number.
Simple random sample
A simple random sample of size n is one taken so that every possible sample of size n has an equal chance of being selected.
The members of the sample are independent random variables, X1, X2, … , Xn, and each Xi has the same distribution as the population.
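In code, a simple random sample from a numbered sampling frame can be drawn with Python's `random.sample`, which selects without replacement so that every subset of the requested size is equally likely. A minimal sketch: the frame of units 1 to 50 mirrors the RanInt#(1,50) calculator idea, and the seed and sample size are hypothetical.

```python
import random

# Hypothetical sampling frame: sampling units numbered 1 to 50
frame = list(range(1, 51))

rng = random.Random(2024)        # fixed seed only so the example is repeatable
sample = rng.sample(frame, 5)    # 5 distinct units; every 5-unit subset is equally likely
print(sample)
```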
Sample
A selection of sampling units from the sampling frame
Sample survey
An investigation using a sample
Advantages of a census
Every member of the population is used.
It is unbiased.
It gives an accurate answer.
Disadvantages of a census
It takes a long time.
It is costly.
It is often difficult to ensure that the whole population is surveyed.
Advantages of sampling
Sample will be representative if population large and well mixed.
Usually cheaper.
Essential if testing involves destruction (life of a light bulb, etc.).
Data usually more easily available
Disadvantages of sampling
Uncertainty, due to the natural variation – two samples are unlikely to give the same result.
Uncertainty due to bias prevents the sample from giving a representative picture of the population
Bias comes from
subjective choice
incomplete sampling frame
*bias cannot be removed by increasing the size of the sample
Always remember
Define random variable
Give in context
+C when integrating to find F(x)
Continuity correction as discrete to continuous
Square root the variance to get sigma in N(mu, sigma^2)
Properties of normal
Median = mean = mode
Area under curve = 1
Bell shaped
Symmetrical
Parameters are the mean and standard deviation
mean =
variance =
sum of x * P(X=x), or n*p for binomial
sum of x^2 * P(X=x) - mean^2, or np(1-p) for binomial
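The general formulas and the binomial shortcuts should agree. A minimal sketch that applies the sum formulas to a hypothetical X ~ B(10, 0.3) and compares with np = 3 and np(1-p) = 2.1; the function name `mean_var` is illustrative.

```python
from math import comb

def mean_var(dist):
    """dist maps x -> P(X = x); return (E(X), Var(X))."""
    mean = sum(x * p for x, p in dist.items())
    e_x2 = sum(x**2 * p for x, p in dist.items())
    return mean, e_x2 - mean**2          # Var(X) = E(X^2) - E(X)^2

# Hypothetical example: X ~ B(10, 0.3)
n, p = 10, 0.3
binom = {r: comb(n, r) * p**r * (1 - p)**(n - r) for r in range(n + 1)}
mean, var = mean_var(binom)
print(round(mean, 6), round(var, 6))     # compare with np and np(1 - p)
```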
P(5<=X<7)
P(X<=6) - P(X<=4)
Var(X+Y)
E(X+Y)
Var(X) + Var(Y) (only if X and Y are independent)
E(X) + E(Y)
conditions for Binomial to Normal
n large, p close to 0.5
np > 5
nq = n(1-p) > 5
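To see why the continuity correction matters, compare an exact binomial probability with its normal approximation. A minimal sketch using the standard-library `statistics.NormalDist`; the values n = 60, p = 0.5 and P(X <= 25) are hypothetical, chosen to satisfy the conditions above. Note that `NormalDist` takes the standard deviation, so the variance np(1-p) is square-rooted.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 60, 0.5          # hypothetical: n large, p = 0.5, np = nq = 30 > 5
k = 25                  # want P(X <= 25)

exact = sum(comb(n, r) * p**r * (1 - p)**(n - r) for r in range(k + 1))

# Y ~ N(np, np(1 - p)); sigma is the square root of the variance
Y = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
with_cc = Y.cdf(k + 0.5)       # continuity correction: P(X <= 25) -> P(Y < 25.5)
without_cc = Y.cdf(k)          # no correction, for comparison
print(round(exact, 4), round(with_cc, 4), round(without_cc, 4))
```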
Critical regions and critical values are
A critical value is the first value that falls in the critical region (two of them for a two-tailed test), while the critical region is every value that leads to rejecting H0
Countably infinite population =
infinite size but each member can be given an individual number
Sample mean is, on average, … the population mean
Sample variance (dividing by n) is, on average, … the population variance
equal to
less than
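A small simulation illustrates this card. It is a minimal sketch under stated assumptions: a hypothetical population of the numbers 1 to 10, samples of size 4 drawn independently with replacement, and sample variance computed with divisor n (`statistics.pvariance`). On average the divisor-n variance comes out near (n-1)/n times the population variance.

```python
import random
from statistics import fmean, pvariance

population = list(range(1, 11))   # hypothetical population: the numbers 1 to 10
mu = fmean(population)             # population mean, 5.5
sigma2 = pvariance(population)     # population variance, 8.25

rng = random.Random(1)             # fixed seed so the run is repeatable
n, trials = 4, 10_000
sample_means, sample_vars = [], []
for _ in range(trials):
    s = rng.choices(population, k=n)     # n independent draws (with replacement)
    sample_means.append(fmean(s))
    sample_vars.append(pvariance(s))     # divides by n, not n - 1

avg_mean = fmean(sample_means)     # should be close to mu
avg_var = fmean(sample_vars)       # should be close to (n - 1)/n * sigma2, below sigma2
print(round(avg_mean, 2), round(avg_var, 2), sigma2)
```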
Random sample =
every possible sample has an equal chance of being chosen
Randint#(1,50)
Coin = Binomial =
B(n, 0.5)
What is the most accurate p for B → N
p = 0.5, so the binomial distribution is symmetrical
Y = Xbar =
sum of x / n, which is a statistic (X bar is allowed but mu is not)
Population does not always equal the sampling frame because
it is not always possible to keep this list up to date
Most accurate approximation is one which
most closely meets the requirements of the approximation, e.g. how close p is to 0.5
CRV P(X=a) =
0, because a single point has zero area under the pdf
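Poisson events in non-overlapping periods are independent and occur at the same rate in each period of the same length
So for the same outcome in 2 (or 3) separate periods, just square (or cube) the single-period probability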
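A numerical check of the squaring idea, and of why it is not the same as using the combined period. A minimal sketch with a hypothetical X ~ Po(3) per period: "exactly 2 in each of two separate periods" is P(X=2)^2, whereas Y ~ Po(6) gives "exactly 2 in total", a different event.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Po(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

p_one_period = poisson_pmf(2, 3)     # X ~ Po(3): exactly 2 in one period
p_each_of_two = p_one_period**2      # exactly 2 in EACH of two separate periods
p_total_two = poisson_pmf(2, 6)      # Y ~ Po(6): exactly 2 in total over both periods

print(round(p_one_period, 4), round(p_each_of_two, 4), round(p_total_two, 4))
```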
n! =
number of ways of ordering a collection of n objects
nCr =
n!/((n-r)! r!) = number of ways of selecting r objects from n
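Both counts are available in Python's `math` module, so the formula can be checked directly. A minimal sketch with hypothetical n = 5, r = 2.

```python
from math import comb, factorial

n, r = 5, 2
ordering = factorial(n)                                      # ways to order 5 objects
choose = factorial(n) // (factorial(n - r) * factorial(r))   # nCr from the formula
assert choose == comb(n, r)                                  # matches the built-in
print(ordering, choose)
```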
Why can Poisson be approximated by normal when lambda is large
because the distribution is then fairly symmetrical
CRV = pdf = DRV = pd =
probability density function
probability distribution
Always show that x = 0 on a
Var(X) =
E(X^2) - E(X)^2
E(X^2) =
Var(X) + E(X)^2
P(X=4)
P(X<=4) - P(X<=3) = F(4) - F(3)
P(X>E(X)) > 0.5 therefore
mean less than median therefore negative skew
For a two-tailed test when not asked for the critical region, compare the observed value with the mean to decide which tail to test, or find the critical region for both tails
H1 uses does not equal
halve the significance level for a two-tailed test, but do not halve it for a one-tailed test
E(X) =
integrate x f(x), NOT F(x); it must be f(x), so differentiate F(x) if need be
E(X^2) =
integrate f(x)*x^2 NOT F(x)
E(-X^2 + 9X) =
E(-X^2) + E(9X) = 9E(X) - E(X^2)
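Linearity of expectation can be checked on any distribution. A minimal sketch using exact fractions and a hypothetical fair six-sided die: computing E(-X^2 + 9X) directly gives the same answer as 9E(X) - E(X^2).

```python
from fractions import Fraction

die = {x: Fraction(1, 6) for x in range(1, 7)}   # hypothetical: fair six-sided die

def E(g):
    """E(g(X)) = sum of g(x) * P(X = x)."""
    return sum(g(x) * p for x, p in die.items())

direct = E(lambda x: -x**2 + 9 * x)
by_linearity = 9 * E(lambda x: x) - E(lambda x: x**2)
print(direct, by_linearity)
```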
P(X>k) for X ~ B(n, p), using Y = n - X ~ B(n, 1-p) =
P(Y<n-k)
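Switching to Y = n - X is useful when p is large (e.g. above 0.5) and tables only go up to 0.5. A minimal sketch checking P(X > k) = P(Y < n - k) numerically; n = 10, p = 0.8 and k = 6 are hypothetical values.

```python
from math import comb

def pmf(r, n, p):
    """P(X = r) for X ~ B(n, p)."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

n, p, k = 10, 0.8, 6                                     # hypothetical values
p_x = sum(pmf(r, n, p) for r in range(k + 1, n + 1))     # P(X > k), X ~ B(n, p)
p_y = sum(pmf(r, n, 1 - p) for r in range(0, n - k))     # P(Y < n - k), Y ~ B(n, 1 - p)
print(round(p_x, 6), round(p_y, 6))
```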
P(4<=X<=8) =
P(X<=8) - P(X<=3) draw number line with squares at midpoints of numbers rather than lines
J ~ U[a, b]
Uniform
Discrete to continuous P(X=5) =
P(4.5<=X<=5.5)
X ~ B(100, 0.975) approximation
Poisson, using Y = 100 - X ~ B(100, 0.025) ≈ Po(2.5)
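How close is the Poisson approximation after switching to Y = 100 - X? A minimal sketch comparing the exact binomial with Po(2.5); the event P(X >= 98), which equals P(Y <= 2), is a hypothetical example.

```python
import math
from math import comb

n, p = 100, 0.975
q = 1 - p                    # Y = n - X ~ B(100, 0.025)
lam = n * q                  # approximating Po(2.5)

# P(X >= 98) = P(Y <= 2)
exact = sum(comb(n, r) * q**r * p**(n - r) for r in range(3))
poisson = sum(math.exp(-lam) * lam**r / math.factorial(r) for r in range(3))
print(round(exact, 4), round(poisson, 4))
```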
Significance level =
probability of incorrectly rejecting H0
Read the question: is the critical region meant to be as close as possible to the
significance level?
Sampling frame =
identifier
Statistic does not equal
mu or sigma
P(mu - ksigma less than X less than mu + ksigma) = 0.5
P(Q1 < X < Q3) = 0.5 therefore mu + k sigma = Q3 etc., so P(X < mu + k sigma) = 0.75, i.e. P(Z < k) = 0.75 and k = 0.6745
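The value of k comes from the inverse normal. A minimal sketch using the standard-library `statistics.NormalDist`; the distribution X ~ N(50, 10^2) is hypothetical, used only to show that mu ± k sigma captures the middle 50%.

```python
from statistics import NormalDist

k = NormalDist().inv_cdf(0.75)          # P(Z < k) = 0.75, k is about 0.6745
X = NormalDist(mu=50, sigma=10)         # hypothetical X ~ N(50, 10^2)
q1, q3 = X.mean - k * X.stdev, X.mean + k * X.stdev
print(round(k, 4), round(q1, 2), round(q3, 2))
```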
key words for poisson
rate and randomly scattered
CRV:
P(X=5)=
P(X>=10) =
0
1-P(X<=10)
DRV P(X>=10) =
1-P(X<=9)
P(X…) is very low then
unlikely that the assumed value of the parameter (e.g. p) is correct
P(not both of two independent trials give X) =
1 - P(X)^2
NOT (1-P(X))^2, which is P(neither gives X)
Always check F(x) =
0 at the lower limit, 1 at the upper limit, and equal to the previous piece where the pieces join
and remember +C
F(x) put (0,1) at
maximum of sketch
2 minutes added to all values: find new median and variance
new median = median + 2; E(X+2) = E(X) + 2; Var(X+2) = Var(X)
F(min x value) =
0
Mode =
minimum x value if f(x) is decreasing throughout (otherwise where f'(x) = 0)
Justify mode is maximum via
second derivative: f''(x) < 0 at the mode
E(5x^2) =
5E(X^2)
Given that X=4 over 2 days
Find P(X=1 on day 1 and X=3 on day 2)
X ~ Po(1) per day
P(X=1) * P(X=3) when X ~ Po(1)
divided by P(X=4) when X ~ Po(2)
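A numerical check of this conditional probability, using the rate of 1 per day from the card. The two days are independent, so the joint probability is a product, and dividing by P(4 in 2 days) conditions on the total. The answer also equals the binomial B(4, 0.5) probability of 1, which is 0.25.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Po(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

rate = 1                                               # X ~ Po(1) per day
joint = poisson_pmf(1, rate) * poisson_pmf(3, rate)    # 1 on day 1 AND 3 on day 2
total = poisson_pmf(4, 2 * rate)                       # 4 in total over 2 days, Po(2)
answer = joint / total
print(round(answer, 4))
```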
Find the probability that, after one car, another car arrives in less than 2 days
X ~ Po(mean number of cars in 2 days)
P(at least 1 car) = P(X>=1) = 1 - P(X=0)
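The complement step in code. A minimal sketch assuming a rate of 1 car per day, so X ~ Po(2) over 2 days; that rate is an assumption for illustration.

```python
import math

lam = 2                                  # assumed: 1 car per day, so Po(2) over 2 days
p_at_least_one = 1 - math.exp(-lam)      # P(X >= 1) = 1 - P(X = 0)
print(round(p_at_least_one, 4))
```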
When going from discrete to normal with a continuity correction, make sure the discrete value is a whole number first
Round to 0 decimal places, then add or subtract 0.5
Diagram for binomial or Poisson is
probability on the y-axis
number of events on the x-axis
vertical lines with spaces