stats AS Flashcards
sample
set of data values for a random variable
population
a group that you want to sample information about.
e.g. year 7 students in a school
sampling frame
a collection of the items available to be sampled
e.g. a list of all the year 7 students in a school
sample survey
when information is collected from a small part of the population chosen to represent it
sampling unit
the person/object to be sampled
sampling fraction
the proportion of available sampling units that are actually sampled
census
when information is collected from every member of the population
simple random sampling
every item of the population has an equal chance of being picked
opportunity/convenience sampling
sampling whatever/whenever it’s easiest
stratified sampling
the population is divided into categories then a random sample is chosen from each category.
the sample size taken from each category is proportional to that category’s size in the population.
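As a rough illustration of proportional allocation, the sketch below works out how many items to take from each category. The function name `stratum_sample_sizes` and the school figures are made up for the example; rounding can occasionally make the parts add up to one more or one fewer than the intended total.

```python
def stratum_sample_sizes(strata_sizes, sample_size):
    """Give each category a share of the sample equal to its share of the population."""
    population = sum(strata_sizes.values())
    return {
        name: round(sample_size * size / population)
        for name, size in strata_sizes.items()
    }

# Made-up example: 600 pupils in three year groups, sample of 50
sizes = stratum_sample_sizes({"year 7": 240, "year 8": 180, "year 9": 180}, 50)
print(sizes)  # {'year 7': 20, 'year 8': 15, 'year 9': 15}
```

A random sample of the stated size would then be drawn from within each category.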
cluster sampling
the population is divided into clusters, each ideally representative of the population. a random sample of clusters is chosen and every item in the chosen clusters is sampled.
a large number of small clusters is most accurate.
systematic sampling
every nth member is selected
quota sampling
the population is divided into groups and a given number from each group is sampled.
self-selecting sample
where people volunteer to take part or are given a choice to participate.
may be biased, as people with strong opinions on a matter are more likely to choose to take part.
unimodal
one bump
bimodal
two bumps
positively skewed
bump near beginning
negatively skewed
bump near end
median
the ((n+1)/2)th value when the data are in order
frequency density
= frequency/ class width
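A small sketch of the frequency density calculation for one histogram class. The function name and the class boundaries are made-up example values.

```python
def frequency_density(frequency, lower, upper):
    """Frequency divided by class width (upper boundary minus lower boundary)."""
    return frequency / (upper - lower)

# Made-up class: 12 values in the class 10 <= x < 25 (width 15)
fd = frequency_density(12, 10, 25)
print(fd)  # 0.8
```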
discrete Variables
can only take certain values but not those in between
Bivariate data
two variables are assigned to each item
Mean =
(Σxf)/n
Spearman’s rank
a measure of association based on the ranks of the data
Pearson’s product moment correlation coefficient
a measure of correlation, r
standard deviation
σ = sqrt (( (Σx^2)/n)- μ^2)
standard deviation with frequency
σ = sqrt (( (Σx^2f)/Σf)- μ^2)
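The mean and standard deviation formulas for a frequency table can be checked with a short sketch like the one below. The function name `mean_and_sd` and the data are made up for illustration.

```python
from math import sqrt

def mean_and_sd(values, freqs):
    """Mean = Σxf / Σf and σ = sqrt(Σx²f / Σf - mean²) for a frequency table."""
    n = sum(freqs)
    mean = sum(x * f for x, f in zip(values, freqs)) / n
    variance = sum(x * x * f for x, f in zip(values, freqs)) / n - mean ** 2
    return mean, sqrt(variance)

# Made-up table: value x occurs f times
values = [1, 2, 3, 4]
freqs = [2, 3, 3, 2]
mean, sd = mean_and_sd(values, freqs)
print(mean, round(sd, 4))
```

Here Σf = 10, Σxf = 25 and Σx²f = 73, so the mean is 2.5 and the variance is 7.3 - 6.25 = 1.05.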
outlier
a point more than two standard deviations away from the mean
variance
σ^2
P(A∨B) =
for mutually exclusive events
P(A)+P(B)
P(A∨B) =
for not mutually exclusive events
P(A) + P(B) - P(A∧B)
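The addition rule can be checked by counting outcomes. The sketch below assumes one roll of a fair six-sided die, with A = "even" and B = "greater than 3"; the events are chosen only for illustration.

```python
from fractions import Fraction

outcomes = set(range(1, 7))                 # one roll of a fair die
A = {x for x in outcomes if x % 2 == 0}     # even: {2, 4, 6}
B = {x for x in outcomes if x > 3}          # greater than 3: {4, 5, 6}

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(outcomes))

lhs = prob(A | B)                           # P(A∨B) counted directly
rhs = prob(A) + prob(B) - prob(A & B)       # P(A) + P(B) - P(A∧B)
print(lhs, rhs)  # 2/3 2/3
```

Without subtracting P(A∧B), the outcomes 4 and 6 would be counted twice.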
Independent events
events that have no effect on each other
prove events are not independent
P(A∧B) =/= P(A) x P(B)
P(B|A)
probability of event B given that event A has happened
if events are independent
P(B|A) = P(B|A’)
P(A∧B) =
P(A) x P(B|A)
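The independence and conditional probability cards can be tried out on a small example. The sketch below assumes one roll of a fair die with A = {2, 4, 6} and B = {1, 2, 3, 4}, events picked only because they happen to be independent.

```python
from fractions import Fraction

outcomes = set(range(1, 7))   # one roll of a fair die
A = {2, 4, 6}
B = {1, 2, 3, 4}

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(outcomes))

def cond(b, a):
    """P(b | a): the share of the outcomes in a that are also in b."""
    return Fraction(len(b & a), len(a))

p_b_given_a = cond(B, A)                  # 2/3
p_b_given_not_a = cond(B, outcomes - A)   # 2/3, the same, so A does not affect B
independent = prob(A & B) == prob(A) * prob(B)
print(p_b_given_a, p_b_given_not_a, independent)  # 2/3 2/3 True
```

The multiplication rule also holds here: P(A) x P(B|A) = 1/2 x 2/3 = 1/3 = P(A∧B).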
how to run a simple random sample
- give a number to each population member
- Generate a list of random numbers
- Match these numbers to the population members to select the samples
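The three steps above can be sketched in Python as follows. The function name `simple_random_sample`, the `seed` parameter and the pupil list are assumptions made for the example.

```python
import random

def simple_random_sample(population, k, seed=None):
    """Number every member, draw k distinct random numbers, look the members up."""
    rng = random.Random(seed)
    numbered = dict(enumerate(population, start=1))    # give each member a number
    chosen_numbers = rng.sample(sorted(numbered), k)   # random numbers, no repeats
    return [numbered[i] for i in chosen_numbers]       # match numbers to members

# Made-up population of 30 pupils
pupils = [f"pupil {i}" for i in range(1, 31)]
sample = simple_random_sample(pupils, 5, seed=1)
print(sample)
```

`random.sample` draws without replacement, so nobody is picked twice.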
simple random sample advantage
every member of the population has an equal chance of being selected
simple random sample disadvantage
it can be inconvenient if the population is spread over a large area
how is a systematic sample carried out
- Give a number to each population member from a list of the full population
- calculate a regular interval to use by dividing the population size by the sample size
- generate a random starting point then follow the pattern
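A minimal sketch of these steps, assuming the population is held in an ordered list. The function name `systematic_sample` and the numbers used are made up for illustration.

```python
import random

def systematic_sample(population, sample_size, seed=None):
    """Pick a random start within the first interval, then take every nth member."""
    rng = random.Random(seed)
    interval = len(population) // sample_size   # population size / sample size
    start = rng.randrange(interval)             # random starting point
    return [population[start + i * interval] for i in range(sample_size)]

# Made-up population: items numbered 1 to 100, sample of 10 (interval 10)
items = list(range(1, 101))
sample = systematic_sample(items, 10, seed=0)
print(sample)
```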
systematic sample advantage
it can be used for quality control on a production line. it should also give an unbiased sample.
it is relatively easy.
systematic sample disadvantage
The regular interval could coincide with a pattern, giving a biased/unrepresentative distribution.
opportunity/convenience advantage
data can be gathered very quickly and easily
opportunity/convenience disadvantage
it isn’t random and can be very biased
Stratified sampling advantage
if the population can be divided up into distinct categories, it’s likely to give a representative sample.
different categories may differ and can be looked at independently.
Stratified sampling disadvantage
it’s not useful when there aren’t any obvious categories
it can be expensive because of the extra detail involved
Quota sampling advantage
it can be done when there isn’t a full list of the population.
The sampler continues to sample people until they have enough in each group
Quota sampling disadvantage
can be easily biased by the sampler
Cluster sampling advantage
more practical
can incorporate other sampling techniques
Cluster sampling disadvantage
less representative of the population
two-stage cluster sample
randomly choose the clusters then randomly select people from each chosen cluster
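A two-stage cluster sample might be sketched like this. The function name `two_stage_cluster_sample`, the `seed` parameter and the class lists are all made up for the example.

```python
import random

def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: randomly pick clusters. Stage 2: randomly pick members within each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: rng.sample(clusters[name], n_per_cluster) for name in chosen}

# Made-up population: five classes (the clusters) of 20 pupils each
classes = {f"class {c}": [f"{c}{i}" for i in range(1, 21)] for c in "ABCDE"}
sample = two_stage_cluster_sample(classes, 2, 5, seed=3)
print(sample)
```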
Normal distribution
X~N(μ, σ^2)
σ
standard deviation
σ^2
variance
Standard normal distribution
Z~N(0,1)
standard normal distribution formula
Z = (X-μ)/σ
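To see the standardising formula at work, the sketch below converts a value of X to a z-value and checks that both routes give the same probability. The values μ = 100, σ = 15 and x = 130 are made up; `NormalDist` is from Python's standard `statistics` module.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Standardise x: Z = (X - μ) / σ."""
    return (x - mu) / sigma

mu, sigma = 100, 15                          # made-up parameters
z = z_score(130, mu, sigma)                  # (130 - 100) / 15 = 2.0

p_direct = NormalDist(mu, sigma).cdf(130)    # P(X < 130) using X ~ N(100, 15^2)
p_standard = NormalDist(0, 1).cdf(z)         # P(Z < 2) using Z ~ N(0, 1)
print(z, round(p_direct, 4), round(p_standard, 4))
```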
test for normal approximation to binomial
np>5
nq>5
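The two conditions can be wrapped in a small check, taking q = 1 - p. The function name `normal_approx_ok` and the example values are made up.

```python
def normal_approx_ok(n, p):
    """Check both conditions from the card: np > 5 and nq > 5, with q = 1 - p."""
    q = 1 - p
    return n * p > 5 and n * q > 5

ok_large = normal_approx_ok(40, 0.3)   # np = 12, nq = 28
ok_small = normal_approx_ok(20, 0.1)   # np = 2, so the test fails
print(ok_large, ok_small)  # True False
```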
distribution of the sample mean
X̄ ~ N(μ, σ^2/n)
about 2/3 of a normal distribution
μ ± σ
95% of a normal distribution
μ ± 2σ
99.7% of a normal distribution
μ ± 3σ
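These three proportions can be checked numerically with the standard normal distribution, as in the sketch below. `NormalDist` is from Python's standard `statistics` module; the variable names are only illustrative.

```python
from statistics import NormalDist

Z = NormalDist(0, 1)

# Proportion of a normal distribution within k standard deviations of the mean
coverage = {k: Z.cdf(k) - Z.cdf(-k) for k in (1, 2, 3)}

for k, p in coverage.items():
    print(k, round(p, 4))
```

The results are roughly 0.6827, 0.9545 and 0.9973, so "2/3", "95%" and "99.7%" are convenient roundings.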
critical value for normal hypothesis test
μ ± k(σ/√n)
where k = Φ⁻¹(1 − significance level) for a one-tailed test (use half the significance level in each tail for a two-tailed test)
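A hedged sketch of the critical value calculation, assuming k is found from the inverse of the standard normal cumulative distribution function and that a two-tailed test splits the significance level equally between the tails. The function name `critical_values` and the numbers are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

def critical_values(mu, sigma, n, significance, two_tailed=True):
    """Return (lower, upper) critical values for the sample mean: μ ± k(σ/√n)."""
    # Assumption: a two-tailed test puts half the significance level in each tail
    tail = significance / 2 if two_tailed else significance
    k = NormalDist().inv_cdf(1 - tail)
    half_width = k * sigma / sqrt(n)
    return mu - half_width, mu + half_width

# Made-up test: μ = 50, σ = 8, n = 16, 5% two-tailed, so k is about 1.96
lower, upper = critical_values(mu=50, sigma=8, n=16, significance=0.05)
print(round(lower, 2), round(upper, 2))
```

For a one-tailed test, only the critical value on the side of the alternative hypothesis is used.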