Week 7 - Selection Bias, sampling methods and information bias Flashcards
What is random error?
Random error is error introduced solely by chance and is
inherent in the sampling process
What is systematic error?
Also called bias
Systematic error is introduced via manmade actions relating to the conduct of a study
What is the sample vs. true population?
We do not measure the true population value (mean,
%, etc.) directly; we estimate it from a representative
sample drawn from that population
How can we decrease the random error in epidemiological studies?
- Random error (chance) decreases as the sample size increases
- It goes down to zero if the total population is included
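The link between sample size and random error can be sketched with the standard error of a sample mean. This is only an illustration and is not part of the original notes: the function name `standard_error`, the population size of 10,000 and the standard deviation of 15 are assumed values, and the finite population correction used here is the usual textbook formula for sampling without replacement.

```python
import math

def standard_error(sd, n, population_size):
    """Standard error of a sample mean when sampling without replacement.

    The finite population correction makes the error shrink as n grows
    and reach zero once the whole population is included.
    """
    if not 1 <= n <= population_size:
        raise ValueError("n must be between 1 and population_size")
    if population_size == 1:
        return 0.0
    fpc = math.sqrt((population_size - n) / (population_size - 1))
    return sd / math.sqrt(n) * fpc

# Hypothetical population of 10,000 people, standard deviation 15
for n in (100, 1_000, 10_000):
    print(n, round(standard_error(15, n, 10_000), 3))
```

The printed values fall as n rises, and the last one is zero because the sample is the whole population.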
What is a confidence interval of sample estimates?
- A confidence interval indicates the level of uncertainty
around the estimated measure
- Most studies report the 95% confidence interval (95%CI)
- The 95%CI is a range within which we can be 95%
confident that the true population measure lies;
the larger the sample size, the narrower the 95%CI
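A rough sketch of how a 95%CI for a mean might be computed, and how it narrows with sample size. The helper `mean_ci_95` and the blood pressure values are made up for illustration, and the normal approximation (z = 1.96) is assumed; for very small samples a t-distribution would usually be preferred.

```python
import math
import statistics

def mean_ci_95(values):
    """Approximate 95% CI for a mean using the normal approximation (z = 1.96)."""
    n = len(values)
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(n)  # standard error of the mean
    return mean - 1.96 * se, mean + 1.96 * se

small = [118, 122, 130, 125, 121, 127, 119, 133]  # made-up measurements
large = small * 10  # same values repeated: similar spread, ten times the size

low_s, high_s = mean_ci_95(small)
low_l, high_l = mean_ci_95(large)
print(f"n={len(small)}: {low_s:.1f} to {high_s:.1f}")
print(f"n={len(large)}: {low_l:.1f} to {high_l:.1f}")
```

Repeating the same values is only a device to hold the spread roughly constant while the sample size grows; the second interval comes out narrower than the first.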
How can we lower systematic error?
- Systematic error is not influenced by sample size, so increasing the sample size does not reduce it
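A small simulation can illustrate why a larger sample does not help here. Everything in it is assumed for the sake of the example: a hypothetical population of body weights and a scale that reads 2 kg too high (`SCALE_OFFSET`). It is a sketch of the idea, not a method from the notes.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical source population: true body weights in kg
true_weights = [random.gauss(70, 10) for _ in range(5_000)]
true_mean = statistics.mean(true_weights)

SCALE_OFFSET = 2.0  # assumed systematic error: the scale reads 2 kg too high

def measured_mean(sample):
    """Mean of the sample as recorded by the miscalibrated scale."""
    return statistics.mean(w + SCALE_OFFSET for w in sample)

for n in (50, 500, 5_000):
    sample = random.sample(true_weights, n)
    error = measured_mean(sample) - true_mean
    print(f"n={n}: estimate is off by {error:+.2f} kg")

# Even measuring everyone leaves the full 2 kg error in place
full_population_error = measured_mean(true_weights) - true_mean
```

The random scatter around the offset shrinks as n grows, but the offset itself stays, even when the whole population is measured.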
What is selection bias?
- Selection bias is systematic error resulting from the fact
that the participants included in the study are not
representative of the population from which they were
selected (source population)
- Selection bias leads to a biased sample, which will almost
always give rise to biased estimates
- The sampling method chosen plays a major role in the
representativeness of the sample
What is a representative sample?
What is a non-representative sample?
What are the three sampling methods?
- Probability (random) sampling: sample selected by
probabilistic methods; involves random selection,
allowing you to make strong statistical inferences about
the whole group
- Systematic sampling: sample selected according to some
simple, systematic rule
- Non-probability sampling: sample selected by easily
employed (convenient) methods; involves non-random selection
based on convenience or other criteria, allowing you to
easily collect data.
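The three approaches can be contrasted in a short sketch. The sampling frame of 100 numbered individuals and the function names are hypothetical. "Every k-th unit after a random start" is assumed here as one common example of a systematic rule, and "the first people on the list" stands in for a convenience sample.

```python
import random

def probability_sample(frame, n, rng):
    """Random selection: every unit has the same chance of being chosen."""
    return rng.sample(frame, n)

def systematic_sample(frame, n, rng):
    """Simple rule: every k-th unit after a random starting point."""
    if not 0 < n <= len(frame):
        raise ValueError("n must be between 1 and the frame size")
    step = len(frame) // n
    start = rng.randrange(step)
    return [frame[start + i * step] for i in range(n)]

def convenience_sample(frame, n):
    """Non-random selection: whoever is easiest to reach (here, the first n)."""
    return frame[:n]

frame = list(range(1, 101))  # hypothetical sampling frame of 100 individuals
rng = random.Random(1)       # fixed seed so the sketch is repeatable

print(probability_sample(frame, 10, rng))
print(systematic_sample(frame, 10, rng))
print(convenience_sample(frame, 10))
```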
Sampling methods summary
What is simple random sampling?
- Often referred to simply as ‘random sampling’
- The most straightforward of all random sampling methods
- All individuals in the sampling frame have the same
probability of being selected, independently of all others
- It is mainly used in quantitative research.
- Given a large sample size, random sampling tends to yield
individuals who are representative of the source
population in terms of:
– Demography (e.g. age, sex, ethnicity)
– Other important factors (e.g., clinical history, current disease status,
lifestyle factors, etc.)
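A minimal sketch of simple random sampling and of how sample size affects representativeness. The sampling frame (10,000 people, 30% of whom smoke) and the helper `smoker_share` are invented for illustration.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical sampling frame: 10,000 people, 30% of whom smoke
frame = [{"id": i, "smoker": i < 3_000} for i in range(10_000)]

def smoker_share(people):
    """Proportion of smokers in a group of people."""
    return sum(p["smoker"] for p in people) / len(people)

# Each individual has the same chance of selection (without replacement)
small_sample = rng.sample(frame, 20)
large_sample = rng.sample(frame, 2_000)

print("population:", smoker_share(frame))
print("n=20:      ", smoker_share(small_sample))
print("n=2000:    ", smoker_share(large_sample))
```

With the larger sample, the share of smokers should sit close to the population value; the small sample can drift further from it by chance.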
What are the advantages and disadvantages of Simple Random Sampling?
Advantages
* Ensures a representative
sample from the source
population
– Provided that the sample size is
large enough
* Less costly and less time
consuming than other more
sophisticated sampling
methods
* Ideal for quantitative studies
& hypothesis testing
Disadvantages
* If the sampling frame is too
large and/or the population
is geographically diverse it
may be impractical to
perform
* If a large sample is required,
simple random sampling
may be time consuming and
costly
What is Stratified Random Sampling?
- Same principles as simple random sampling but
within strata (subgroups) of the population
– in terms of key demographic characteristics
- The size of the random sample drawn from each stratum should be
proportional to that stratum's size in the population
An example of stratified random sampling.
- The company has 800 female employees and
200 male employees.
- You need a sample of 100.
- You sort the population into two strata based
on gender.
- You want to ensure that the sample reflects
the gender balance of the company, so you use
random sampling on each group, selecting 80
women and 20 men, which gives you a
representative sample of 100 people.
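The company example can be sketched in code using proportional allocation. The function `stratified_sample` and the employee labels are hypothetical; only the figures (800 women, 200 men, sample of 100) come from the example.

```python
import random

def stratified_sample(strata, total_size, rng):
    """Proportional allocation: each stratum contributes its population share."""
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        n = round(total_size * len(members) / population_size)
        sample[name] = rng.sample(members, n)  # simple random sample within the stratum
    return sample

strata = {
    "women": [f"W{i}" for i in range(800)],
    "men": [f"M{i}" for i in range(200)],
}
sample = stratified_sample(strata, 100, random.Random(7))
print({name: len(chosen) for name, chosen in sample.items()})
# prints {'women': 80, 'men': 20}
```

With less convenient numbers, rounding can make the stratum sizes add up to slightly more or less than the requested total, so a real design would need a rule for that case.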
What is the procedure for Stratified Random Sampling?
What are the advantages and disadvantages of Stratified Random Sampling?
Advantages
* It allows you to draw more
precise conclusions by
ensuring that every
subgroup is properly
represented in the sample.
* Enables the comparison of
population sub-groups
Disadvantages
* More time-consuming than
simple random sampling
* Higher complexity might
give rise to errors (e.g.
stratification not conducted
properly)
What is cluster sampling?
- Based on the hierarchical structure of natural clusters
(groups) of individuals within the population
– Natural clusters may be hospitals, schools, streets, city
districts, etc.
- Involves taking a random sample of these natural clusters,
and then selecting all individuals in the selected clusters
- The sampling frame is a list of all clusters.
- If it is practically possible, you might include every
individual from each sampled cluster. If the clusters
themselves are large, you can also sample individuals from
within each cluster using one of the techniques above
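A minimal sketch of one-stage cluster sampling, where the sampling frame is the list of clusters and everyone in a selected cluster is included. The schools, pupil labels and the function `cluster_sample` are all hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, rng):
    """Randomly pick clusters, then keep every individual in each chosen cluster."""
    chosen = rng.sample(sorted(clusters), n_clusters)  # frame = list of cluster names
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical clusters: 20 schools with 30 pupils each
schools = {
    f"school_{s}": [f"s{s}_pupil_{p}" for p in range(30)] for s in range(20)
}

sample = cluster_sample(schools, 4, random.Random(3))
participants = [pupil for pupils in sample.values() for pupil in pupils]
print(sorted(sample), len(participants))
```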
What are cluster sampling?
What are the advantages and disadvantages of cluster sampling?
Advantages
* Good for dealing with large
and dispersed populations
* Less costly and less time
consuming
Disadvantages
* Substantial differences between
clusters can cause errors
* It’s difficult to guarantee that the
sampled clusters are really
representative of the whole
population
* Representativeness may be
compromised if
– Too few clusters are selected and/or
– Clusters are too specific and/or
– Clusters contain too few individuals
What is multi-stage sampling?
- Utilizes the hierarchical structure of natural clusters (groups)
of individuals within the population
– Similarly to cluster sampling
- After randomly selecting clusters, there is a random
selection of individuals within the cluster
- May involve several random sampling stages:
– Stage 1: Random selection of large clusters e.g. schools
– Stage 2: Random selection of smaller clusters within large clusters
e.g. class
– Stage 3: Random selection of individuals within smaller clusters
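The three stages can be sketched as nested random selections. The population (10 schools, 6 classes per school, 25 pupils per class), the stage sizes and the function `multistage_sample` are assumptions made for illustration.

```python
import random

def multistage_sample(schools, n_schools, n_classes, n_pupils, rng):
    """Three-stage sampling: schools, then classes, then pupils."""
    selected = []
    for school in rng.sample(sorted(schools), n_schools):              # stage 1
        classes = schools[school]
        for class_name in rng.sample(sorted(classes), n_classes):      # stage 2
            selected.extend(rng.sample(classes[class_name], n_pupils))  # stage 3
    return selected

# Hypothetical population: 10 schools x 6 classes x 25 pupils
schools = {
    f"school_{s}": {
        f"class_{c}": [f"s{s}_c{c}_p{p}" for p in range(25)] for c in range(6)
    }
    for s in range(10)
}

sample = multistage_sample(schools, 3, 2, 5, random.Random(11))
print(len(sample))  # 3 schools x 2 classes x 5 pupils = 30
```

Unlike one-stage cluster sampling, only some individuals within each selected cluster end up in the sample.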
What are the advantages and disadvantages of Multi-stage Sampling?
Advantages
* Multi-stage sampling may
improve sample
representativeness (compared to
simple random sampling)
– Especially if the population is
geographically diverse and/or the
sample is too small
* Less costly and less time
consuming (depending, however,
on the number of stages)
Disadvantages
* The representativeness of the
sample may be compromised if
– Too few clusters are selected
and/or
– Clusters are too specific and/or
– Clusters contain too few
individuals