Statistics Flashcards
What is a population
The whole set of items that are of interest
What is a census
Measures every member of a population
What is a sample
A selection of observations taken from a subset of the population to find out information about the population as a whole
What are the advantages of a census
Represents total population
Provides all relevant data
What are the advantages of sampling
Quicker
Easier
Cheaper
What are the disadvantages of a census
Time consuming
Difficult
Expensive
May be impossible to get everyone
What are the disadvantages of sampling
May be incomplete or may not be representative
How does convenience/ opportunity sampling work
Taking a sample of people who are available at the time and fit the criteria
What is a random sample without replacement called
Simple random sampling
What is a random sample with replacement called?
Unrestricted random sampling
How does stratified random sampling work (basically)?
The population is divided into strata, and a random sample is taken from each stratum in proportion to the size of that stratum
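The proportional split described above can be sketched in a few lines. This is only an illustration: the function name `proportional_allocation`, the year-group names and the sizes are all made up for the example, and simple rounding is assumed (so the counts may occasionally add up to one more or one fewer than the target).

```python
def proportional_allocation(strata_sizes, sample_size):
    """Return how many units to sample from each stratum, in proportion to its size."""
    total = sum(strata_sizes.values())
    return {
        name: round(sample_size * size / total)
        for name, size in strata_sizes.items()
    }

# Hypothetical school: 120 students in Year 12, 80 in Year 13, sample of 50.
allocation = proportional_allocation({"Year 12": 120, "Year 13": 80}, 50)
print(allocation)
```

A random sample of the stated size would then be taken within each stratum.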
What is quota sampling?
Similar to stratified but sample is not random
Population is divided into groups with a given characteristic and the size of the groups determines the proportion of the sample that should have that characteristic. The most convenient people with that characteristic are chosen until the quota is filled
What must a sampling method be for it to be random?
Each unit must have an equal chance of being chosen
Is systematic sampling random, and why?
No
It is impossible for consecutive names in the sampling frame to both be in the same sample
How do you take a systematic sample
Work out the ‘skip size’ by dividing the total population by the desired sample size, rounding to the nearest integer
Use a random number generator (RNG) to select a starting point, which will be the first sampling unit
Add the ‘skip size’ to this number repeatedly, taking the members of the population who correspond to the numbers generated
Continue until the required sample size has been obtained
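The steps above can be sketched as follows. This is a rough illustration, not a definitive method: the function name `systematic_sample` and the list of names are hypothetical, the random start is assumed to be one of the first ‘skip size’ members, and wrapping round to the start of the list when rounding pushes a position past the end is also an assumption.

```python
import random

def systematic_sample(population, sample_size, rng):
    """Take every k-th member after a random start, where k is the skip size."""
    skip = max(1, round(len(population) / sample_size))
    if (sample_size - 1) * skip >= len(population):
        raise ValueError("sample too large for this skip size")
    # Assumption: the random start is one of the first `skip` members.
    start = rng.randrange(skip)
    # Assumption: wrap around if a position falls past the end of the list.
    indices = [(start + i * skip) % len(population) for i in range(sample_size)]
    return [population[i] for i in indices]

rng = random.Random(1)  # seeded only so the example is repeatable
names = [f"person_{i}" for i in range(100)]  # hypothetical sampling frame
sample = systematic_sample(names, 10, rng)
print(sample)
```

With 100 names and a sample of 10 the skip size is 10, so two consecutive names can never appear together in the sample.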
Give the strengths and weaknesses of random sampling
Strengths:
Free of bias
Cheap/easy for small samples
Each sampling unit has equal chance of being chosen
Weaknesses:
Not suitable for larger populations
Sampling frame needed
Strengths and weaknesses of stratified sampling
Strengths:
Accurately reflects structure of population
Guarantees proportional representation of groups within a population
Weaknesses:
Population must be clearly classified into distinct strata
Random selection within strata suffers same disadvantages as random sampling
Strengths and weaknesses of quota sampling
Advantages:
Allows small sample to be representative
No sampling frame needed
Quick/easy/cheap
Allows comparison between different groups
Disadvantages:
Can be biased
Division of population can be costly & inaccurate
Increasing scope of study increases number of groups
Disadvantages and advantages of systematic sampling
Advantages:
Simple/ quick
Suitable for large populations
Advantages and disadvantages of opportunity sampling
Advantages:
Inexpensive
Easy
Quick
Disadvantages:
Unlikely to be representative
Highly dependent on individual researcher
What is qualitative data
Non-numerical, e.g. colour
What are the different kinds of quantitative data
Discrete: only takes specific values, e.g. shoe size, number of people (NB the set of possible values can still be infinite)
Continuous: can take any value within a range, including decimals
3 measures of centre
Mean, median, mode
What is the mean
The sum of the data divided by the number of values
Define median
The middle value when data is ordered from smallest to largest
If there are an even number of values, the median is halfway between the two central values
Define mode
Most common value
There can be one mode, two modes (bi-modal) or no mode
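As a quick check of the three definitions above, here is a minimal sketch using Python's standard `statistics` module. The data values are invented for the illustration.

```python
import statistics

data = [2, 3, 3, 5, 7, 10]  # illustrative values, already in order

mean = statistics.mean(data)        # sum of the data / number of values
median = statistics.median(data)    # even count: halfway between 3 and 5
modes = statistics.multimode(data)  # list of the most common value(s)

bimodal = statistics.multimode([1, 1, 2, 2, 3])  # a data set with two modes

print(mean, median, modes, bimodal)
```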
Advantages and disadvantages of mean
Advantages:
Includes all data
Disadvantages:
Susceptible to outliers
When data is grouped, only an estimate of the mean can be calculated
Advantages and disadvantages of median
Advantages:
Less sensitive to outliers
Disadvantages:
Positional only
Grouped data requires interpolation
Strengths and weaknesses of mode
Strengths:
Can be used for qualitative
Weaknesses:
Only relevant if there are high frequencies
Can be misleading
Doesn’t consider the numerical value of the data
Name the types of measures of spread
Standard deviation
Interquartile range
Range
Formulas for standard deviation
$\sigma = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n}}$
Or
$\sigma = \sqrt{\dfrac{\sum x^2}{n} - \bar{x}^2}$
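The two formulas above should always give the same answer. A small sketch, using invented data, that works out both by hand and compares them with `statistics.pstdev` (the population standard deviation, which divides by n):

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values
n = len(data)
mean = sum(data) / n

# First formula: root of the mean squared deviation from the mean.
sd_definition = math.sqrt(sum((x - mean) ** 2 for x in data) / n)

# Second formula: root of (mean of the squares minus square of the mean).
sd_shortcut = math.sqrt(sum(x ** 2 for x in data) / n - mean ** 2)

sd_library = statistics.pstdev(data)

print(sd_definition, sd_shortcut, sd_library)
```

The second formula is usually quicker by hand because it only needs the totals of x and x squared.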
IQR method
Upper quartile - lower quartile
Positives and negatives of standard deviation
Advantages:
Includes all data
Disadvantages:
Susceptible to outliers
Advantages and disadvantages of IQR
Advantages:
Less sensitive to outliers
Disadvantages:
Positional only, and the choice of the middle 50% is arbitrary
Grouped data requires interpolation
Disadvantages of range
Highly susceptible to outliers
What is variance
Standard deviation squared
How does adding/ subtracting affect the mean
Increases/ decreases by that amount
How does multiplying/dividing affect mean
Multiplied/ divided by that factor
How does adding/subtracting affect the standard deviation
No effect
How does multiplying/dividing affect the standard deviation
Multiplied/ divided by that factor
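The four coding rules above can be checked numerically. A minimal sketch with invented data, assuming a positive multiplier:

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]    # mean 5, standard deviation 2
shifted = [x + 10 for x in data]   # add 10 to every value
scaled = [3 * x for x in data]     # multiply every value by 3

shifted_mean = statistics.mean(shifted)
shifted_sd = statistics.pstdev(shifted)
scaled_mean = statistics.mean(scaled)
scaled_sd = statistics.pstdev(scaled)

print(shifted_mean, shifted_sd)  # mean moves by 10, spread unchanged
print(scaled_mean, scaled_sd)    # mean and spread both multiplied by 3
```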
What are the first 13 square numbers
1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169
What is r for PMCC
Sample PMCC
What is rho
Population PMCC
When comparing median/ means what should you reference
Compare size with reference to actual values and context
A larger mean/median suggests the values in that sample are larger on average
% difference should be calculated if >2 marks
If mean/ median and IQR/ standard deviation are close what does this suggest
The samples may come from the same population
Define population
All the data of a given group
Define Sample
A selection of some parts of the population
If a sampling method is random what does this mean
Each member has an equal and fair chance of being selected
What does independence mean
The probability of one outcome is unaffected by whether another outcome occurs
What is a discrete uniform distribution
A random variable with an equal chance for each outcome: P(X = x) = k, where k = 1 / (number of possible outcomes)
What are the requirements for a binomial distribution
Fixed number of trials
Two possible outcomes per trial
Constant probability of success in each trial
Independence
What is the area under a normal distribution curve
1 or 100%
3 standard deviations stat
About 99.7% of the data lies within 3 standard deviations of the mean
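The 99.7% figure can be checked with `NormalDist` from Python's standard `statistics` module. This is a small sketch using the standard normal distribution (mean 0, standard deviation 1):

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Area under the curve between 3 sd below and 3 sd above the mean.
within_3_sd = z.cdf(3) - z.cdf(-3)

print(round(within_3_sd, 4))
```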
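What does it mean if a sample is truncated, and what can we do with this?
Zero lies less than 2 standard deviations below the mean, so a normal model would give noticeable probability to values below zero (which are impossible if the data cannot be negative)
Reject the normal distribution as a model for the data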
What must you remember for normal approximations
p is close to 0.5
n is large
CONTINUITY CORRECTIONS
Why do we do continuity corrections
To adjust for using a continuous distribution (normal) to approximate a discrete one (binomial)
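A rough numerical illustration of why the correction matters. The values n = 100, p = 0.5 and the cut-off 55 are invented for the example. The sketch compares the exact binomial probability P(X ≤ 55) with the normal approximation, with and without the half-unit correction.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p, k = 100, 0.5, 55  # illustrative values: n large, p close to 0.5

# Exact binomial probability P(X <= k).
exact = sum(comb(n, r) * p**r * (1 - p) ** (n - r) for r in range(k + 1))

# Normal approximation with the same mean and standard deviation.
approx_model = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
with_correction = approx_model.cdf(k + 0.5)  # P(X <= 55) becomes P(Y < 55.5)
without_correction = approx_model.cdf(k)

print(round(exact, 4), round(with_correction, 4), round(without_correction, 4))
```

The corrected value should land much closer to the exact probability than the uncorrected one.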
Define H0
The null hypothesis: the assumed value of the population parameter that the sample is tested against
H1
The alternative hypothesis: the claim about how the population parameter differs from the value assumed in H0
Population parameter
The value that defines a distribution
(For binomial it is ‘p’)
(For normal it is the mean μ and the variance σ²)
Define critical value
The first value to fall inside the critical region: the boundary value at which the probability of a result at least that extreme, assuming H0 is true, drops below the significance level
Define critical region
The range of values for which H0 is rejected
Define p value
The probability, assuming H0 is true, of getting a result at least as extreme as the one from your sample
Define significance level
The probability threshold for the test: a result whose probability under H0 falls below this level is considered unlikely enough to reject H0
Define test statistic
The value you get from your sample to compare with the critical value
When does the critical region start exactly for binomial and normal
Binomial: critical value will be first value within critical region
Normal: critical region will always represent exact significance level
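To see why the binomial critical region rarely matches the significance level exactly, here is a minimal sketch for a lower-tailed test. The function names and the values n = 20, p = 0.5 and a 5% level are made up for the illustration, and the strict "less than" comparison is an assumption (some mark schemes use "less than or equal to").

```python
from math import comb

def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ B(n, p)."""
    return sum(comb(n, r) * p**r * (1 - p) ** (n - r) for r in range(k + 1))

def lower_critical_value(n, p, significance):
    """Largest c with P(X <= c) < significance, or None if there is none."""
    critical = None
    for c in range(n + 1):
        if binomial_cdf(c, n, p) < significance:
            critical = c
        else:
            break
    return critical

critical = lower_critical_value(20, 0.5, 0.05)
actual_level = binomial_cdf(critical, 20, 0.5)

print(critical, round(actual_level, 4))
```

Here the critical region is X ≤ 5, and the actual significance level (about 2.07%) is below the nominal 5% because the distribution is discrete.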
When is PMCC not a good estimator
When used outside the range of the original sample (extrapolation)
PMCC is weak
Used to make a prediction about a different population
P(A’)
Probability of A not occurring (complement)