Maths stats Flashcards
What is a census
Measures or observes every member
What is a sample
A selection of observations taken from a subset of the population, used to find out about the whole population
Advantages of census
Results should be completely accurate
Disadvantages of census
Time consuming, expensive, cannot be used when the testing process destroys the item, and hard to process a large quantity of data
Advantages of sample
Less time consuming and cheaper, fewer people have to respond, less data needs to be processed
What is random sampling
Each member of pop has equal chance of being selected
What is simple random sampling
Every possible sample of size n has an equal chance of being selected
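A minimal sketch of simple random sampling, assuming a hypothetical sampling frame of units numbered 1 to 100. The function name `simple_random_sample` and the seed are illustrative choices, not part of the original notes.

```python
import random

def simple_random_sample(sampling_frame, n, seed=None):
    """Draw n distinct units; every possible sample of size n is equally likely."""
    rng = random.Random(seed)  # seed is only there to make the example repeatable
    return rng.sample(sampling_frame, n)

frame = list(range(1, 101))  # hypothetical sampling frame: units numbered 1 to 100
sample = simple_random_sample(frame, 10, seed=1)
print(sample)
```

Sampling is without replacement here, so no unit can appear twice.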
Advantages of simple random sampling
Free of bias, easy and cheap for small samples and pops and each sampling unit has known and equal chance of selection
Disadvantages of simple random sampling
Sampling frame needed and not suitable for large samples and populations
What is systematic sampling
The required elements are chosen at regular intervals from an ordered list
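A rough sketch of systematic sampling, assuming the list is already ordered and the sample size is no larger than the list. Choosing the starting point at random within the first interval is a common convention and an assumption here; `systematic_sample` is a hypothetical name.

```python
import random

def systematic_sample(ordered_list, n, seed=None):
    """Take every k-th element, where k = population size // sample size."""
    k = len(ordered_list) // n           # sampling interval
    rng = random.Random(seed)
    start = rng.randrange(k)             # random start within the first interval
    return [ordered_list[start + i * k] for i in range(n)]

frame = list(range(1, 101))              # hypothetical ordered list of 100 units
picked = systematic_sample(frame, 10, seed=3)
print(picked)
```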
Advantages of systematic sampling
Simple and quick to use, suitable for large samples and large populations
Disadvantages of systematic sampling
A sampling frame is needed, and bias can be introduced if the sampling frame is not random
What is stratified sampling
The population is divided into mutually exclusive strata and a random sample is taken from each stratum in proportion to the size of the stratum
Equation for stratified sampling
(number in stratum x overall sample size) / number in population
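The formula above can be sketched as a small function. The strata names and sizes are made up for illustration, and in practice the results would need rounding to whole numbers.

```python
def stratum_sample_sizes(stratum_sizes, overall_sample_size):
    """Apply (number in stratum x overall sample size) / number in population."""
    population = sum(stratum_sizes.values())
    return {
        name: size * overall_sample_size / population
        for name, size in stratum_sizes.items()
    }

strata = {"Year 12": 120, "Year 13": 80}   # hypothetical strata
sizes = stratum_sample_sizes(strata, 50)
print(sizes)  # {'Year 12': 30.0, 'Year 13': 20.0}
```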
Advantages of stratified sampling
Sample accurately reflects population structure, proportional representation of group within population
Disadvantages of stratified sampling
Population must be clearly classified into distinct strata, same disadvantages as simple random within each stratum
Two types of non random sampling
Quota sampling and opportunity sampling
What is quota sampling
An interviewer selects a sample that reflects the characteristics of the whole population
Advantages of quota sampling
Allows small sample to still be representative of whole pop, no sampling frame required, quick and cheap and easy comparison between different groups within a population
Disadvantages of quota sampling
Non random sampling can introduce bias, population must be divided into groups which can be costly or inaccurate, increasing scope of study increases number of groups which adds time and money, non responses not recorded
What is opportunity sampling
Sample is taken from people who are available at the time and who fit the criteria
Advantages of opportunity sampling
Easy and inexpensive
Disadvantages of opportunity sampling
Unlikely to provide a representative result and highly dependent on the individual researcher
Criteria for a binomial dist
The number of observations n is fixed.
Each observation is independent.
Each observation represents one of two outcomes (“success” or “failure”).
The probability of success p is the same for each observation.
To make binomial suitable what would be ideal
Larger n
Conditions for normal approximation of binomial
Large n and p close to 0.5
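To see why large n and p close to 0.5 matter, the sketch below compares an exact binomial probability with its normal approximation. The values n = 100, p = 0.5 and the cut-off 55 are illustrative, and the continuity correction (using 55.5) is the standard adjustment rather than something stated on the card.

```python
from math import comb, erf, sqrt

def binomial_cdf(k, n, p):
    """Exact P(X <= k) for X ~ B(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def normal_cdf(x, mu, sigma):
    """P(Y <= x) for Y ~ N(mu, sigma^2), via the error function."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

n, p = 100, 0.5                      # illustrative values
mu = n * p                           # mean np
sigma = sqrt(n * p * (1 - p))        # standard deviation sqrt(np(1-p))

exact = binomial_cdf(55, n, p)
approx = normal_cdf(55.5, mu, sigma)  # continuity correction: 55 becomes 55.5
print(exact, approx)
```

With these values the two results should agree closely; a small n or a p far from 0.5 would widen the gap.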
State one disadvantage of using quota sampling compared with simple random sampling.
Not random so cannot be used reliably for inferences
Mutually exclusive
Both cannot happen at once
P(A or B) if mutually exclusive
P(A) + P(B)
P(A given B) if independent
P(A)
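Both rules can be checked on a small example. The sketch below assumes one roll of a fair six-sided die; the events chosen are illustrative only.

```python
from fractions import Fraction

outcomes = set(range(1, 7))          # one roll of a fair die (illustrative)

def prob(event):
    """Probability of an event given equally likely outcomes."""
    return Fraction(len(event & outcomes), len(outcomes))

# Mutually exclusive events: they share no outcomes.
low, high = {1, 2}, {5, 6}
p_either = prob(low | high)          # set union means "low or high"

# Independent events: knowing one happened does not change the other.
even, at_most_4 = {2, 4, 6}, {1, 2, 3, 4}
p_even_given = prob(even & at_most_4) / prob(at_most_4)

print(p_either, prob(low) + prob(high))   # 2/3 2/3
print(p_even_given, prob(even))           # 1/2 1/2
```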
Reason to include outliers
It is a piece of data and we should include all pieces of data
Reasons to not include outliers
It is extreme and could unduly influence analysis or could be a mistake
Do you include NA at all when calculating mean
No
State the assumption involved with using class midpoints to calculate an estimate of a mean from a grouped frequency table.
Assumes values are uniformly distributed within the classes
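A short sketch of the midpoint estimate, using a made-up grouped frequency table. Each class is treated as if all of its values sat at the midpoint, which is where the uniform-distribution assumption comes in.

```python
def estimated_mean(classes):
    """classes is a list of (lower bound, upper bound, frequency) tuples."""
    total_f = sum(f for _, _, f in classes)
    total_fx = sum((lo + hi) / 2 * f for lo, hi, f in classes)  # midpoint x frequency
    return total_fx / total_f

table = [(0, 10, 5), (10, 20, 10), (20, 30, 5)]   # hypothetical data
print(estimated_mean(table))  # (25 + 150 + 125) / 20 = 15.0
```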
Why could random sampling not be used
It is not possible to have a sampling frame
Conditions for normal dist
Variable has to be continuous
P(x=5) for continuous
0, as the variable is continuous
Mean ± one standard deviation on a normal curve
Points of inflection
Numbers not in table
3sf
Numbers in table
4 dp
For normal mean=
Mode = median
z=
(x - μ) / σ
Standard deviation binomial
√(np(1-p))
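The two formulas above can be sketched together. The numbers n = 100, p = 0.5 and x = 60 are illustrative only.

```python
from math import sqrt

def z_score(x, mu, sigma):
    """Standardise x: how many standard deviations it lies from the mean."""
    return (x - mu) / sigma

def binomial_sd(n, p):
    """Standard deviation of B(n, p): the square root of np(1-p)."""
    return sqrt(n * p * (1 - p))

sd = binomial_sd(100, 0.5)          # sqrt(25) = 5.0
z = z_score(60, 100 * 0.5, sd)      # (60 - 50) / 5 = 2.0
print(sd, z)
```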
Why would you have to multiply the p-value by 2
For a two-tailed test; the normal distribution is symmetric, so both tails are equal
For cumulative frequency what value do you plot
Top value of each class (upper class boundary)
Histogram height
Frequency density = frequency / class width; area is proportional to frequency, so work out the area scale factor relating area to frequency
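A small sketch of the histogram calculation, assuming the usual convention that bar height is frequency density (frequency divided by class width), so that bar area matches frequency. The table is made up.

```python
def frequency_densities(classes):
    """classes is a list of (lower bound, upper bound, frequency) tuples."""
    return [f / (hi - lo) for lo, hi, f in classes]

table = [(0, 10, 20), (10, 15, 15), (15, 35, 10)]   # hypothetical data
densities = frequency_densities(table)

# Area of each bar = height x width, which recovers the frequency.
areas = [d * (hi - lo) for d, (lo, hi, _) in zip(densities, table)]
print(densities)  # [2.0, 3.0, 0.5]
print(areas)      # [20.0, 15.0, 10.0]
```

If a question uses a different scale, the areas are proportional to the frequencies rather than equal to them.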
What is extrapolating
Estimate outside range (unreliable)
What is interpolation
Estimate inside range
What is the explanatory variable
The one that explains the other and that causes change
What does a correlation coefficient close to 1 mean
Strong positive correlation
Normal approx
Model as a binomial first, then approximate it with a normal using mean np and variance np(1-p)
Median position in a list
Add 1/2 for median: position is n/2 + 1/2 = (n + 1)/2
Conditions for Poisson approx of binomial
Large n, small p, np < 10
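A quick check of these conditions, comparing an exact binomial probability with the Poisson approximation using λ = np. The values n = 200, p = 0.02 and the cut-off 3 are illustrative only.

```python
from math import comb, exp, factorial

def binomial_cdf(k, n, p):
    """Exact P(X <= k) for X ~ B(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Po(lam)."""
    return sum(exp(-lam) * lam**i / factorial(i) for i in range(k + 1))

n, p = 200, 0.02          # large n, small p, np = 4 < 10 (illustrative)
lam = n * p

exact = binomial_cdf(3, n, p)
approx = poisson_cdf(3, lam)
print(exact, approx)
```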
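Variance of (3X - 1)
Square the 3: 9 Var(X); the -1 has no effect
Variance of Poisson
Equal to the mean λ; when approximating a binomial, λ = np, which is close to np(1-p) because p is small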
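The rule can be checked numerically. The sketch below uses a tiny made-up data set and the population variance from the standard library.

```python
from statistics import pvariance

data = [1, 2, 3, 4]                          # hypothetical values of X
transformed = [3 * x - 1 for x in data]      # values of 3X - 1

var_x = pvariance(data)                      # 1.25
var_t = pvariance(transformed)               # 11.25, i.e. 3 squared times 1.25
print(var_x, var_t)
```

Subtracting 1 shifts every value equally, so only the multiplier 3 affects the spread.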
Conditions for poisson
Events must occur independently, events must occur singly (cannot occur at the same time), events occur at a constant average rate
H0 for chi squared test
No difference between theoretical frequency and observed frequency
What does binomial need
Each trial must either succeed or fail
What is area
Significance level
DOF
(rows-1)(columns-1)
EF
Row total times column total over grand total
What is needed for central limit theorem
Large sample size unless data is already normally distributed
mean for nb
r/p
variance for nb
r(1-p)/p^2
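The two negative binomial formulas can be sketched as below, assuming the convention where X counts the number of trials needed to reach the r-th success (the convention for which the mean is r/p). The values r = 3 and p = 0.5 are illustrative.

```python
def negative_binomial_mean(r, p):
    """Mean number of trials needed for r successes."""
    return r / p

def negative_binomial_variance(r, p):
    """Variance of the number of trials needed for r successes."""
    return r * (1 - p) / p**2

print(negative_binomial_mean(3, 0.5))      # 6.0
print(negative_binomial_variance(3, 0.5))  # 6.0
```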
What is a type 1 error
Rejecting H0 when H0 is true; its probability is the actual significance level
What is type 2 error
Incorrectly accepting H0 when H1 is true: P(not in critical region given H1 is true)
What will reducing significance level (type 1 error) do
Increase type 2 error
What does increasing n do to errors
Reduces type 2
What is size
Probability of a type 1 error
What is power
1 - P(type 2 error)
Size
P(rejecting H0 given H0 is true)
Power
P(rejecting H0 given H1 is true)
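Size and power can be worked out directly for a small binomial test. Everything specific here is a made-up example: n = 10 trials, H0: p = 0.5, a particular alternative p = 0.8, and a critical region of X >= 9.

```python
from math import comb

def prob_in_critical_region(n, p, lower):
    """P(X >= lower) for X ~ B(n, p), i.e. the chance of rejecting H0."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(lower, n + 1))

n, critical_lower = 10, 9      # reject H0 when X >= 9 (illustrative)

size = prob_in_critical_region(n, 0.5, critical_lower)    # H0 true: p = 0.5
power = prob_in_critical_region(n, 0.8, critical_lower)   # H1 true: p = 0.8
type_2 = 1 - power                                        # P(type 2 error)
print(size, power, type_2)
```

Here size is 11/1024, which is the actual significance level of the test.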
variance
Don't divide by n if it's a discrete uniform distribution
If chi squared greater than critical value
Reject H0, so the model is not a suitable fit
g(1) always =
1
Remember you can rearrange the variance formula Var(X) = E(X^2) - (E(X))^2 to find E(X^2)
ok
What happens if proportion of defective things stays same and DOF stays same but proportions can be reallocated
No change in test statistic
Explain the relevance of the Central Limit Theorem in part (a)
CLT applies since the sample size is large. CLT states that the sample mean is (approximately) normally distributed
formula for chi squared
Sum of (O - E)^2 / E
E in 2 way
Row total times column total / grand total
Dof 2 way
(r-1)(c-1)
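The last three cards fit together in one calculation. The sketch below uses a made-up 2 by 3 table of observed frequencies; the function name `chi_squared_two_way` is hypothetical.

```python
def chi_squared_two_way(observed):
    """Return (test statistic, degrees of freedom) for a two-way table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total  # expected frequency
            statistic += (o - e) ** 2 / e                    # (O - E)^2 / E

    dof = (len(observed) - 1) * (len(observed[0]) - 1)       # (r-1)(c-1)
    return statistic, dof

observed = [[10, 20, 30],
            [20, 20, 20]]      # hypothetical observed frequencies
statistic, dof = chi_squared_two_way(observed)
print(statistic, dof)
```

The statistic would then be compared with the critical value for the given degrees of freedom and significance level.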