Sampling Error and Bias Flashcards
Why does increasing sample size reduce standard error?
The law of large numbers: as the sample gets larger, the sample mean settles closer to the population mean. Extreme values are diluted and have less influence on the average.
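As a rough illustration of the card above: the standard error of a mean is usually estimated as the sample standard deviation divided by the square root of the sample size. The sketch below assumes that formula; the function name `standard_error` and the example numbers are hypothetical and not part of the original cards.

```python
import math


def standard_error(sample_sd: float, n: int) -> float:
    """Estimated standard error of the mean: sd / sqrt(n)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return sample_sd / math.sqrt(n)


# Same spread (sd = 10, an assumed value), growing sample size.
for n in (25, 100, 400):
    print(n, standard_error(10.0, n))
```

Under this formula, quadrupling the sample size halves the standard error, which is why the gains from adding more units shrink as the sample grows.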
What are the 2 ways to increase power of a study?
Increase sample size
Reduce variability - sample from a more homogeneous population
What are type I and type II errors?
Type I is where you wrongly reject the null hypothesis - thinking a difference exists when it doesn’t in reality.
Type II is where you wrongly fail to reject (accept) the null hypothesis - assuming no difference exists when it does in reality.
What is random error and how is it measured?
The natural variation that occurs through a random sample. Measured by standard error.
How can you reduce the effect of random error?
Increasing sample size
What are the types of systematic error (bias)?
Measurement bias, sampling (selection) bias and reporting bias
How can sampling/selection bias occur?
Sample drawn not representative of the population
- undercoverage e.g. online surveys underrepresent elderly
- sample frame error (when the sample frame includes people that would never be involved)
- non-response bias (survey doesn’t account for non-response)
Baseline characteristics of the 2 groups to be compared not equal
- e.g. the experimental group is chosen but the controls are healthy volunteers (voluntary response bias)
How can measurement bias occur?
- Variation in measurements
- Different data collectors might vary in method
- Instruments not correctly calibrated
- Performance bias (e.g. cases more likely to have a knowledge of the disease and symptoms + better previous medical records)
- Detection bias (e.g. investigators paying more attention to symptoms of those known to be in case/experimental group)
How can reporting bias occur?
- Citation bias (not citing papers that contradict your argument)
- Publication bias (not reporting non-significant results)
- Language bias (only including studies published in English)
What are types of sampling scheme?
- Simple random sampling
- Systematic sampling
- Cluster sampling
- Stratified sampling
Describe the steps of simple random sampling
- Define and identify the survey population
- Define the sampling frame (all units in a list)
- Number each unit
- Determine the sample size
- Randomly draw units until the sample size is reached (usually with a random number generator)
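The steps above can be sketched in a few lines. This is a minimal illustration, assuming the sampling frame is already available as a numbered list; the function name `simple_random_sample`, the frame of 500 units and the seed are hypothetical.

```python
import random


def simple_random_sample(frame, sample_size, seed=None):
    """Draw sample_size distinct units from the frame, each equally likely."""
    if sample_size > len(frame):
        raise ValueError("sample size cannot exceed the sampling frame")
    rng = random.Random(seed)  # random number generator used for the draw
    return rng.sample(frame, sample_size)


# Hypothetical sampling frame: units numbered 1..500.
frame = list(range(1, 501))
sample = simple_random_sample(frame, 20, seed=1)
print(sorted(sample))
```

`random.sample` draws without replacement, so no unit can appear in the sample twice.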
What are the advantages of simple random sampling?
- Statistically the optimal method (each unit has an equal likelihood of being chosen)
- Sampling error can easily be calculated
- Simple to do
What are the disadvantages of simple random sampling?
- Creating a sample frame can be difficult (not always detailed records of population)
- Can have logistical challenges if random units chosen are far from each other
- Minorities can easily be missed out
What is the difference between sampling with replacement or without?
Sampling without replacement means that the probabilities of being chosen change after each unit is drawn, so units do not have an equal probability of selection at every draw. However, sampling with replacement often makes no sense - e.g. you don’t want the same person to fill out the questionnaire twice.
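A small sketch of the difference, using Python's standard `random` module. The five-unit frame, the seed and the helper `chance_next_draw` are hypothetical and only meant to show how the chance of selection shifts as units are removed.

```python
import random

rng = random.Random(0)             # fixed seed so the run is repeatable
frame = ["A", "B", "C", "D", "E"]  # hypothetical units

# Without replacement: a unit leaves the pool once drawn, so no repeats.
without = rng.sample(frame, 3)

# With replacement: every draw uses the full frame, so repeats are possible.
with_repl = rng.choices(frame, k=3)


def chance_next_draw(frame_size, already_drawn):
    """Chance a given remaining unit is picked next, without replacement."""
    return 1 / (frame_size - already_drawn)


print(without, with_repl)
print([chance_next_draw(len(frame), k) for k in range(3)])
```

With a large frame and a small sample the change in probability between draws is tiny, which is one reason sampling without replacement is normally acceptable in surveys.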
What are the steps of systematic sampling?
- Define and identify the sampling population
- Create the sample frame (e.g. population of 10,182)
- Arrange the units in a sequence (e.g. alphabetically by surname)
- Determine sample size needed (e.g. 320)
- Divide total sampling frame by sample size (e.g. 10,182/320 = approximately 32)
- Choose a random starting point (between 1 and 32)
- Draw units at regular intervals defined in step 5 (every 32nd unit after the first was chosen randomly)
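The steps above can be sketched as follows, reusing the card's example of a frame of 10,182 units and a target sample of 320. The function name `systematic_sample` and the seed are hypothetical, and rounding the interval to the nearest whole number is an assumption.

```python
import random


def systematic_sample(frame, sample_size, seed=None):
    """Pick a random start within the first interval, then every k-th unit."""
    interval = max(1, round(len(frame) / sample_size))  # sampling interval k
    rng = random.Random(seed)
    start = rng.randint(1, interval)                    # 1-based random start
    # Convert the 1-based start to a 0-based index and step through the frame.
    return interval, start, frame[start - 1::interval]


frame = list(range(1, 10183))  # units numbered 1..10,182
interval, start, sample = systematic_sample(frame, 320, seed=42)
print(interval, start, len(sample))
```

Because the interval is rounded to a whole number, the sample ends up close to the target size rather than exactly equal to it.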
What are the advantages of systematic sampling?
- Helps representativeness (the sample is spread evenly across the ordered sampling frame)
- Simple to do
- Sampling error easy to determine
What are the disadvantages of systematic sampling?
- Creating sample frame can be difficult
- If there’s a pattern in the ordered sampling frame, units or subgroups can end up with different probabilities of being chosen (e.g. if the sample frame alternated male/female and the sampling interval was even, the sample would include only 1 gender)
Why would you use cluster sampling?
Because random sampling can be logistically challenging, it can be more practical to cluster the population and sample from representative clusters, e.g. schools/community centres.
What are the steps of cluster sampling?
- List the potential clusters
- Create a cumulative list of all the units in all the clusters
- Calculate the systematic sampling interval (by dividing cumulative total population by number of clusters wanted)
- Choose random number at which to start (between 1 and sampling interval)
- Choose each unit at the sampling interval and the cluster that unit is in is the cluster chosen
- Continue until the right number of clusters
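A minimal sketch of the cluster selection steps above, assuming each cluster's population size is known. The function name `select_clusters`, the school names and sizes, and the fixed starting point are hypothetical; the starting point is fixed only so the run is repeatable.

```python
import bisect
import itertools
import random


def select_clusters(cluster_sizes, n_clusters, start=None, seed=None):
    """Choose clusters by stepping through the cumulative population list."""
    names = list(cluster_sizes)
    cumulative = list(itertools.accumulate(cluster_sizes[n] for n in names))
    total = cumulative[-1]
    interval = total // n_clusters          # systematic sampling interval
    if start is None:
        start = random.Random(seed).randint(1, interval)
    chosen = []
    for i in range(n_clusters):
        point = start + i * interval        # unit number reached at this step
        # First cluster whose cumulative total covers this unit number.
        chosen.append(names[bisect.bisect_left(cumulative, point)])
    return chosen


# Hypothetical clusters (schools) with their population sizes.
schools = {"A": 150, "B": 250, "C": 200, "D": 300, "E": 100, "F": 200}
picked = select_clusters(schools, 3, start=150)
print(picked)
```

In this procedure a larger cluster covers more of the cumulative list, so it is more likely to contain a selection point, and a cluster bigger than the interval can be hit more than once.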
What is the issue with variability in cluster sampling?
Units within a cluster are likely to be more similar to one another than to units outside the cluster (e.g. kids from the same school are likely to be from the same socio-economic group), so covariance within clusters is higher. This gives a high intra-class correlation coefficient, which increases the variance of the estimates and therefore the sampling error. You can counteract this by increasing the sample size, but that can be inefficient.
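One common way to put a number on this is the design effect, often approximated as 1 + (m - 1) x ICC for clusters of equal size m. That formula is not in the card itself; it is brought in here as an assumption, and the function names and example values below are hypothetical.

```python
def design_effect(cluster_size: float, icc: float) -> float:
    """Approximate variance inflation from clustering: 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc


def effective_sample_size(n: int, cluster_size: float, icc: float) -> float:
    """Size of a simple random sample giving roughly the same precision."""
    return n / design_effect(cluster_size, icc)


# Assumed example: 400 children sampled as 20 per school, ICC of 0.05.
print(design_effect(20, 0.05))
print(effective_sample_size(400, 20, 0.05))
```

Under these assumed values the clustered sample of 400 carries roughly the information of about 205 independently sampled units, which is why a larger sample is needed to reach the same standard error.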
What are the advantages of cluster sampling?
- More practical when dealing with a dispersed population
- Can be the only way to sample, if you don’t have a sampling frame
What are the disadvantages of cluster sampling?
- Covariance problem (less variability between units within the clusters than outside). Greater covariance within groups increases the standard error, so a larger sample size is needed.
- Fewer clusters are logistically easier but give more sampling error and a lower sample size
- Given the way the clusters are chosen it is important each cluster is the same size so that none are more likely to be chosen
Why might you choose a stratified sampling scheme?
If your population includes minorities at low frequency that your study requires to be represented
How does stratified sampling work?
The sampling frame is divided into homogeneous subgroups (strata) and then the units are chosen from them using random sampling
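A minimal sketch of the idea, assuming every unit in the frame carries a label saying which stratum it belongs to. The function name `stratified_sample`, the 95/5 split and the choice of an equal number of units per stratum are hypothetical; proportional allocation would be an equally valid choice.

```python
import random
from collections import defaultdict


def stratified_sample(units, stratum_of, per_stratum, seed=None):
    """Split the frame into strata, then draw a random sample in each."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    sample = {}
    for name, members in strata.items():
        k = min(per_stratum, len(members))  # a stratum may be smaller than k
        sample[name] = rng.sample(members, k)
    return sample


# Hypothetical frame: 95 majority units and 5 minority units.
frame = [("majority", i) for i in range(95)] + [("minority", i) for i in range(5)]
sample = stratified_sample(frame, lambda unit: unit[0], per_stratum=3, seed=7)
print({name: len(chosen) for name, chosen in sample.items()})
```

Drawing within each stratum guarantees the minority group appears in the sample, which a simple random sample of 6 from this frame would often fail to do.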