statistics Flashcards
what is a census
measures every member of a population
advantage of a census
accurate results
disadvantage of a census
expensive / testing may destroy the population
what is a sampling unit
individuals of a population
what is a sampling frame
a list of sampling units
what is simple random sampling
every possible sample of the given size has the same chance of being selected.
select units from the sampling frame using a random number generator
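A minimal sketch of simple random sampling, assuming a made-up sampling frame of 30 numbered units (the frame, the sample size and the seed are illustrative, not from the cards):

```python
import random

# Hypothetical sampling frame: a numbered list of 30 sampling units.
frame = [f"unit {i}" for i in range(1, 31)]

rng = random.Random(0)          # seeded only so the sketch is repeatable
sample = rng.sample(frame, 5)   # every unit has the same chance; no repeats
print(sample)
```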
advantage of simple random sampling
bias free
disadvantage of simple random sampling
need a sampling frame
what is systematic sampling
take every kth unit, where k = population size / sample size
pick a random number between 1 and k to start
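The steps for systematic sampling can be sketched as follows. The function name and the 100-unit frame are hypothetical, and the sketch assumes the population size divides exactly by the sample size:

```python
import random

def systematic_sample(frame, sample_size, rng=None):
    """Take every k-th unit from the frame, starting at a random point."""
    rng = rng or random.Random()
    k = len(frame) // sample_size      # interval k = population / sample
    start = rng.randint(1, k)          # random start between 1 and k inclusive
    # Convert the 1-based start to a 0-based index, then step through by k.
    return frame[start - 1::k][:sample_size]

frame = list(range(1, 101))            # hypothetical frame of 100 numbered units
sample = systematic_sample(frame, 10, random.Random(1))
print(sample)
```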
advantage of systematic sampling
quick to use
disadvantage of systematic sampling
need a sampling frame
what is stratified sampling
sample represents the groups (strata) of a population.
(sample size / population size) x size of stratum
gives how many people you need from each group
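A small worked example of the stratified calculation, assuming two invented strata (120 and 80 people) and a sample of 50. In general the answers need rounding to whole people:

```python
def stratum_allocation(strata_sizes, sample_size):
    """Number to sample from each stratum: (sample / population) x stratum size."""
    population = sum(strata_sizes.values())
    return {name: sample_size / population * size
            for name, size in strata_sizes.items()}

strata = {"year 12": 120, "year 13": 80}   # hypothetical strata sizes
allocation = stratum_allocation(strata, 50)
print(allocation)
```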
advantage of stratified sampling
reflects population
disadvantage of stratified sampling
population must be classified in strata
what are strata
groups
what is quota sampling
like stratified sampling, but the interviewer/researcher chooses who fills the quota for each stratum
advantages of quota sampling
no sampling frame
disadvantages of quota sampling
non random, potential bias
what is opportunity sampling
sample is taken from those available at the time
advantages of opportunity sampling
easy/cheap
disadvantages of opportunity sampling
unlikely to be representative
how to calculate the variance
(sum of x^2 / n) - mean ^2
mean of the squares minus the square of the mean
how to get from variance to standard deviation
square root variance = standard deviation
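A quick check of the variance and standard deviation formulas on a made-up data set (the numbers are illustrative only):

```python
import math

def variance(data):
    """Mean of the squares minus the square of the mean."""
    n = len(data)
    mean = sum(data) / n
    mean_of_squares = sum(x * x for x in data) / n
    return mean_of_squares - mean ** 2

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data, mean 5
var = variance(data)               # 232/8 - 5^2
sd = math.sqrt(var)                # standard deviation = square root of variance
print(var, sd)
```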
why are histograms used
for continuous data
what to compare on histograms
measure of location and measure of spread
what does PMCC mean
product moment correlation coefficient
what does PMCC measure
strength and direction (+/-) of linear correlation
what is a regression line
the least squares line of best fit
what is interpolation
estimating inside the data range
more reliable
what is extrapolation
estimating outside the data range
less reliable
what is U on venn diagrams
union (A or B or both): shade all of both
what is n on venn diagrams
intersection (A and B): shade the overlap
what does B| A mean
probability of B given that A has occurred
what is a mutually exclusive event and its properties
the venn diagram does not overlap
P(AnB) = 0
P (AUB) = P(A) + P(B)
probabilities of independent events on venn diagrams
P(AnB)= P(A) x P(B)
P(A|B) = P(A)
conditional probability formula
P(B|A) =
P(AnB)/P(A)
Addition law for probability
P(AUB) =
P(A) + P(B) - P(AnB)
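Both the conditional probability formula and the addition law can be checked by counting outcomes. This sketch assumes a fair six-sided die, with A = "even" and B = "greater than 3" chosen purely for illustration:

```python
from fractions import Fraction

outcomes = set(range(1, 7))   # fair six-sided die (illustrative assumption)
A = {2, 4, 6}                 # even number
B = {4, 5, 6}                 # greater than 3

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(outcomes))

p_union = prob(A) + prob(B) - prob(A & B)   # addition law: P(AUB)
p_b_given_a = prob(A & B) / prob(A)         # conditional: P(B|A) = P(AnB)/P(A)
print(p_union, p_b_given_a)
```

Counting the union directly, {2, 4, 5, 6} is 4 outcomes out of 6, which agrees with the addition law.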
what is a discrete uniform distribution
probabilities of all outcomes are equal
Binomial distribution notation
X~B(n,p)
X is binomially distributed
n = number of trials
p = probability of success
when to use a binomial distribution
F - fixed number of trials
F - fixed probability of success
I - independent
T - Two outcomes
cumulative probability of P(X<5)
P(X ≤ 4)
cumulative probability of P(X>3)
1 - P(X ≤ 3)
cumulative probability of P(6<X ≤ 10)
P(X ≤ 10) - P(X ≤ 6)
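The three cumulative rewrites can be checked with a short sketch. The values n = 12 and p = 0.4 are hypothetical, and binom_cdf is a helper written for this example rather than a library function:

```python
from math import comb

def binom_cdf(x, n, p):
    """P(X <= x) for X ~ B(n, p), summing the individual probabilities."""
    return sum(comb(n, r) * p**r * (1 - p)**(n - r) for r in range(0, x + 1))

n, p = 12, 0.4                                          # hypothetical X ~ B(12, 0.4)
p_less_than_5 = binom_cdf(4, n, p)                      # P(X < 5)
p_more_than_3 = 1 - binom_cdf(3, n, p)                  # P(X > 3)
p_between = binom_cdf(10, n, p) - binom_cdf(6, n, p)    # P(6 < X <= 10)
print(p_less_than_5, p_more_than_3, p_between)
```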
what is normal distribution
used for continuous random variables
normal distribution notation
Y ~ N (mean , variance)
Y is normally distributed
what are the points of inflection on normal distribution
mean +/- standard deviation
what is the notation of standard normal distribution
Z~N(0, 1^2)
coding Y to Z
Z = (Y - mean) / standard deviation
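A sketch of the coding to Z, assuming a made-up distribution Y ~ N(50, 4^2) and the value y = 56. Note that Python's NormalDist takes the standard deviation, not the variance:

```python
from statistics import NormalDist

mean, sd = 50, 4                 # hypothetical Y ~ N(50, 4^2)
y = 56

z = (y - mean) / sd              # subtract the mean first, then divide
p_direct = NormalDist(mean, sd).cdf(y)    # P(Y < 56) from Y itself
p_standard = NormalDist(0, 1).cdf(z)      # P(Z < z) from the standard normal
print(z, p_direct, p_standard)
```

The two probabilities agree, which is the point of the coding.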
when can you approximate binomial distributions as normal distributions
if n is large
if p is approximately 0.5
mean = np
variance = np(1-p)
if you are approximating binomial as normal you are going from discrete to continuous values, so you must apply a continuity correction to the values in the probability.
e.g. P(X>5) = P(X ≥ 6)
≈ P(Y>5.5)
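The normal approximation with a continuity correction can be compared against the exact binomial answer. This sketch assumes the invented case X ~ B(40, 0.5) and the probability P(X > 22), which is P(X ≥ 23) and so becomes P(Y > 22.5):

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 40, 0.5                       # hypothetical: n large, p close to 0.5
mean = n * p                         # mean = np
variance = n * p * (1 - p)           # variance = np(1-p)
Y = NormalDist(mean, sqrt(variance)) # NormalDist takes the standard deviation

# Exact binomial: P(X > 22) = P(X >= 23)
exact = sum(comb(n, r) * p**r * (1 - p)**(n - r) for r in range(23, n + 1))
# Normal approximation with continuity correction: P(Y > 22.5)
approximate = 1 - Y.cdf(22.5)
print(exact, approximate)
```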
what is the null hypothesis
H0 what we assume to be true
what is alternative hypothesis
H1 what would be true if H0 is wrong
what is significance level
the given probability threshold: if the result is less likely than this under H0, reject H0
what is a one tailed test
H1 : p>k or p<k
what is a two tailed test
when H1 : p does not equal k
half significance level for each end
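for correlation testing what are H0 and H1, and what do you conclude when r is more extreme or less extreme than the table value
H0: ρ = 0
H1: ρ > 0 or ρ < 0 (one tailed), or ρ ≠ 0 (two tailed)
compare r with the critical value from the table
if more extreme reject H0
if less extreme no evidence to reject H0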
for binomial testing what are the test statistic, H0 and H1, and what do you assume under H0
test statistic: number of successes observed
H0: p = k
H1: p > k or p < k (one tailed), or p ≠ k (two tailed)
assume H0 is true, so X~B(n,k)
find P(X ≤ or ≥ the observed value)
if this probability < significance level - reject H0
if this probability > significance level - no evidence to reject H0
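A sketch of a one tailed binomial test, with every number invented for illustration: H0: p = 0.5, H1: p > 0.5, 20 trials, 15 successes observed, 5% significance level.

```python
from math import comb

def binom_upper_tail(x, n, p):
    """P(X >= x) for X ~ B(n, p)."""
    return sum(comb(n, r) * p**r * (1 - p)**(n - r) for r in range(x, n + 1))

# Hypothetical test: H0: p = 0.5, H1: p > 0.5
n, k, observed, significance = 20, 0.5, 15, 0.05

tail_probability = binom_upper_tail(observed, n, k)   # P(X >= 15) assuming H0
reject_h0 = tail_probability < significance
print(tail_probability, reject_h0)
```

Here the tail probability is about 0.021, below 0.05, so H0 is rejected in this made-up case.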
for normal testing what are the distribution of the sample mean, H0 and H1, and what do you assume under H0
sample mean ~ N(mean, variance/n)
H0: mean = k
H1: mean > k or mean < k (one tailed), or mean ≠ k (two tailed)
assume H0 is true, so sample mean ~ N(k, variance/n)
find P(sample mean > or < the observed sample mean)
if this probability < significance level - reject H0
if this probability > significance level - no evidence to reject H0
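A sketch of a one tailed test on a normal mean, again with invented numbers: H0: mean = 100, H1: mean > 100, population variance 225, sample size 36, observed sample mean 105, 5% significance level.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical test: H0: mean = 100, H1: mean > 100
k, variance, n = 100, 225, 36
observed_mean, significance = 105, 0.05

# Under H0 the sample mean ~ N(k, variance/n); NormalDist takes the standard deviation.
sample_mean_dist = NormalDist(k, sqrt(variance / n))

tail_probability = 1 - sample_mean_dist.cdf(observed_mean)   # P(sample mean > 105)
reject_h0 = tail_probability < significance
print(tail_probability, reject_h0)
```

The observed mean is 2 standard deviations above k here, giving a tail probability of roughly 0.023.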