Week 7 - Selection Bias, sampling methods and information bias Flashcards
What is random error?
Random error is error introduced solely by chance and is
inherent in the sampling process
What is systematic error?
Also called bias
Systematic error is introduced via manmade actions relating to the conduct of a study
What is the sample vs. true population?
We do not measure the true population measure (mean,
%, etc) but an estimate of that based on representative
sampling
How can we decrease the random error in epidemiological studies?
- Chance/random error decreases as the sample size increases
- It goes down to zero if the total population is included
What is a confidence interval of sample estimates?
- A confidence interval indicates the level of uncertainty around the estimated measure
- Most studies report the 95% confidence interval (95% CI)
- The 95% CI indicates a range within which we can be 95% confident that the true population measure lies; the larger the sample size, the narrower the 95% CI
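A minimal sketch of how the 95% CI narrows as the sample size grows. It assumes a normal-approximation (Wald) interval for a proportion; the function name `wald_ci_95` and the example prevalence of 20% are illustrative assumptions, not part of these notes.

```python
import math

def wald_ci_95(p_hat, n):
    """Approximate 95% CI for a proportion (normal approximation)."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error shrinks as n grows
    margin = 1.96 * se                       # 1.96 = z-value for 95% confidence
    return (p_hat - margin, p_hat + margin)

# Same estimated prevalence (20%), increasing sample sizes
for n in (100, 1000, 10000):
    low, high = wald_ci_95(0.20, n)
    print(f"n={n:>5}: 95% CI = ({low:.3f}, {high:.3f}), width = {high - low:.3f}")
```

The estimate stays the same in every row; only the width of the interval changes with n.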
How can we lower systematic error?
- Systematic error is not influenced by sample size, so it cannot be lowered simply by enlarging the sample
What is selection bias?
- Selection bias is systematic error resulting from the fact that the participants included in the study are not representative of the population from which they were selected (source population)
- Selection bias leads to a biased sample, which almost always gives rise to biased estimates
- The sampling method chosen plays a major role in the representativeness of the sample
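A small simulation sketch of why selection bias does not shrink with sample size. The population, the 20% true prevalence and the selection mechanism (diseased people twice as likely to be selected) are all hypothetical assumptions chosen for illustration.

```python
import random

rng = random.Random(42)  # fixed seed only so the sketch is reproducible

# Hypothetical source population: 20% have the disease (1), 80% do not (0).
population = [1] * 2000 + [0] * 8000
TRUE_PREVALENCE = sum(population) / len(population)

# Assumed selection mechanism: diseased people are twice as likely to be selected.
selection_weights = [2 if has_disease else 1 for has_disease in population]

def estimated_prevalence(n, weights=None):
    """Prevalence estimated from a sample of size n (drawn with replacement)."""
    sample = rng.choices(population, weights=weights, k=n)
    return sum(sample) / n

for n in (100, 1000, 50000):
    fair = estimated_prevalence(n)
    biased = estimated_prevalence(n, weights=selection_weights)
    print(f"n={n:>5}: random sample={fair:.3f}, "
          f"biased sample={biased:.3f}, truth={TRUE_PREVALENCE:.3f}")
```

With these assumptions the random-sample estimate settles near the true 20% as n grows, while the biased-sample estimate settles near one third: a larger sample only makes the wrong answer more precise.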
What is a representative sample?
What is a non-representative sample?
What are the three sampling methods?
- Probability (random) sampling: sample selected by probabilistic methods; involves random selection, allowing you to make strong statistical inferences about the whole group
- Systematic sampling: sample selected according to some simple, systematic rule
- Non-probability sampling: sample selected by easily employed (convenient) methods; involves non-random selection based on convenience or other criteria, allowing you to easily collect data
Sampling methods summary
What is simple random sampling?
- Often referred to simply as ‘random sampling’
- The most straightforward of all random sampling methods
- All individuals in the sampling frame have the same probability of being selected, independently of all others
- It is mainly used in quantitative research
- Given a large sample size, random sampling ensures the chosen individuals are representative of the source population in terms of:
– Demography (e.g. age, sex, ethnicity)
– Other important factors (e.g. clinical history, current disease status, lifestyle factors, etc.)
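A minimal sketch of simple random sampling using Python's standard `random` module. The sampling frame of 1000 person IDs and the sample size of 100 are hypothetical.

```python
import random

rng = random.Random(0)  # fixed seed only so the sketch is reproducible

# Hypothetical sampling frame: a list of every individual who could be selected
sampling_frame = [f"person_{i}" for i in range(1, 1001)]

# Every individual has the same probability of selection; drawn without replacement
sample = rng.sample(sampling_frame, k=100)

print(len(sample), "individuals selected, e.g.", sample[:3])
```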
What are the advantages and disadvantages of Simple Random Sampling?
Advantages
* Ensures a representative sample from the source population
– Provided that the sample size is large enough
* Less costly and less time consuming than other, more sophisticated sampling methods
* Ideal for quantitative studies and hypothesis testing
Disadvantages
* If the sampling frame is too large and/or the population is geographically diverse, it may be impractical to perform
* If a large sample is required, simple random sampling may be time consuming and costly
What is Stratified Random Sampling?
- Same principles as simple random sampling, but applied within strata (subgroups) of the population
– Strata are defined in terms of key demographic characteristics
- The size of the random sample from each stratum should be proportional to the size of that stratum in the population
An example of stratified random sampling.
- The company has 800 female employees and 200 male employees.
- You need a sample of 100.
- You sort the population into two strata based on gender.
- You want to ensure that the sample reflects the gender balance of the company, so you use random sampling on each group, selecting 80 women and 20 men, which gives you a representative sample of 100 people.
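The company example above can be sketched as follows. The helper name `proportional_stratified_sample` and the employee IDs are hypothetical; the allocation rule is simple proportional allocation with rounding, which is an assumption of this sketch.

```python
import random

def proportional_stratified_sample(strata, total_n, rng):
    """Draw a simple random sample within each stratum, sized in proportion to the stratum."""
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Proportional allocation; with awkward proportions, rounding may not sum exactly to total_n
        n_stratum = round(total_n * len(members) / population_size)
        sample[name] = rng.sample(members, k=n_stratum)
    return sample

rng = random.Random(1)  # fixed seed only so the sketch is reproducible
strata = {
    "female": [f"F{i}" for i in range(800)],  # 800 female employees
    "male": [f"M{i}" for i in range(200)],    # 200 male employees
}
sample = proportional_stratified_sample(strata, 100, rng)
print({name: len(chosen) for name, chosen in sample.items()})  # {'female': 80, 'male': 20}
```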
What is the Stratified Random Sampling procedure?
What are the advantages and disadvantages of Stratified Random Sampling?
Advantages
* It allows you to draw more precise conclusions by ensuring that every subgroup is properly represented in the sample
* Enables the comparison of population subgroups
Disadvantages
* More time-consuming than simple random sampling
* Higher complexity might give rise to errors (e.g. stratification not conducted properly)
What is cluster sampling?
- Based on the hierarchical structure of natural clusters (groups) of individuals within the population
– Natural clusters may be hospitals, schools, streets, city districts, etc.
- Involves taking a random sample of these natural clusters, and then selecting all individuals in the selected clusters
- The sampling frame is a list of all clusters
- If it is practically possible, you might include every individual from each sampled cluster. If the clusters themselves are large, you can also sample individuals from within each cluster using one of the techniques above
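A minimal sketch of one-stage cluster sampling, assuming a hypothetical population of 20 schools with 30 pupils each. Only the clusters are sampled at random; every individual in a selected cluster is included.

```python
import random

rng = random.Random(2)  # fixed seed only so the sketch is reproducible

# Hypothetical population: 20 schools (clusters), each with 30 pupils
clusters = {
    f"school_{s}": [f"school_{s}_pupil_{p}" for p in range(30)]
    for s in range(20)
}

# The sampling frame is the list of clusters, not the list of individuals
selected_clusters = rng.sample(sorted(clusters), k=5)

# Include every individual from each selected cluster
sample = [pupil for name in selected_clusters for pupil in clusters[name]]

print(selected_clusters)
print(len(sample), "pupils in the sample")
```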
What is the cluster sampling procedure?
What are the advantages and disadvantages of cluster sampling?
Advantages
* Good for dealing with large and dispersed populations
* Less costly and less time consuming
Disadvantages
* Substantial differences between clusters can cause errors
* It’s difficult to guarantee that the sampled clusters are really representative of the whole population
* Representativeness may be compromised if
– Too few clusters are selected and/or
– Clusters are too specific and/or
– Clusters contain too few individuals
What is multi-stage sampling?
- Utilizes the hierarchical structure of natural clusters (groups) of individuals within the population
– Similar to cluster sampling
- After randomly selecting clusters, individuals are randomly selected within each cluster
- May involve several random sampling stages:
– Stage 1: Random selection of large clusters, e.g. schools
– Stage 2: Random selection of smaller clusters within large clusters, e.g. classes
– Stage 3: Random selection of individuals within smaller clusters
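The three stages above can be sketched as nested random draws. The population (10 schools, 6 classes per school, 25 pupils per class) and the number selected at each stage are hypothetical.

```python
import random

rng = random.Random(3)  # fixed seed only so the sketch is reproducible

# Hypothetical hierarchy: schools -> classes -> pupils
schools = {
    f"school_{s}": {
        f"class_{c}": [f"s{s}_c{c}_pupil_{p}" for p in range(25)]
        for c in range(6)
    }
    for s in range(10)
}

sample = []
for school in rng.sample(sorted(schools), k=4):        # Stage 1: random schools
    classes = schools[school]
    for cls in rng.sample(sorted(classes), k=2):       # Stage 2: random classes within each school
        sample.extend(rng.sample(classes[cls], k=10))  # Stage 3: random pupils within each class

print(len(sample), "pupils selected")  # 4 schools x 2 classes x 10 pupils
```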
What are the advantages and disadvantages of Multi-stage Sampling?
Advantages
* Multi-stage sampling may improve sample representativeness (compared to simple random sampling)
– Especially if the population is geographically diverse and/or the sample is too small
* Less costly and less time consuming (depending on the number of stages, however)
Disadvantages
* The representativeness of the sample may be compromised if
– Too few clusters are selected and/or
– Clusters are too specific and/or
– Clusters contain too few individuals
What is Systematic Sampling?
- Sample selected according to some simple, systematic rule, but not randomly
- The sample may end up being equivalent to a simple random sample, provided there is no biasing pattern in the system of selection
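A minimal sketch of one possible systematic rule, assuming the rule is "take every k-th individual on an ordered list, starting from the first". The rule, the function name and the patient list are assumptions for illustration; other systematic rules are possible.

```python
def systematic_sample(frame, n):
    """Select n individuals by taking every k-th entry of the ordered frame."""
    step = len(frame) // n    # sampling interval k
    return frame[::step][:n]  # fixed start at the first entry, no random element

frame = [f"patient_{i}" for i in range(1, 1001)]  # hypothetical ordered list of 1000 patients
sample = systematic_sample(frame, 100)            # every 10th patient

print(sample[:3])  # ['patient_1', 'patient_11', 'patient_21']
```

If the list order carries a pattern that lines up with the interval (for example, every 10th entry is a clinic's first appointment of the day), the sample inherits that pattern.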
What is the Systematic Sampling procedure?
What are the advantages and disadvantages of Systematic Sampling?
Advantages
* An acceptable, more convenient alternative approach if for some reason random sampling is not possible
* Faster and possibly also cheaper
Disadvantages
* The representativeness of the sample may be compromised if the system of choice selects individuals in a non-random fashion
What is Proportional Quota Sampling?
- Same principle as stratified random sampling
– The sample is selected in a weighted manner based on predefined strata (distinct population subgroups)
- Instead of being filled by random sampling, the strata are filled by non-random sampling (systematic or other)
– For example, if a total sample size of 1000 is required and the population consists of 40% women and 60% men, then (non-random) sampling will continue until these percentages are obtained and the overall sample quota is met
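A sketch of the 1000-person example, assuming people are recruited in whatever order they happen to turn up and are accepted only while their group's quota is still open. The function name `quota_sample` and the simulated stream of arrivals are hypothetical.

```python
import random

def quota_sample(arrivals, quotas):
    """Accept people in arrival order (non-random) until every quota is filled."""
    counts = {group: 0 for group in quotas}
    sample = []
    for person_id, group in arrivals:
        if counts.get(group, 0) < quotas.get(group, 0):  # quota for this group still open
            sample.append((person_id, group))
            counts[group] += 1
        if counts == quotas:                             # all quotas met
            break
    return sample

# Quotas mirror the population structure: 40% women, 60% men of 1000
quotas = {"woman": 400, "man": 600}

# Hypothetical stream of people in the order they happen to be encountered
rng = random.Random(4)
arrivals = [(i, rng.choice(["woman", "man"])) for i in range(5000)]

sample = quota_sample(arrivals, quotas)
print(len(sample), "people recruited")
```

The proportions match the population by construction, but who ends up inside each stratum still depends on who was easiest to reach.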
What is the Proportional Quota Sampling procedure?
What are the advantages and disadvantages of Proportional Quota Sampling?
Advantages
* An acceptable, more convenient alternative approach if for some reason stratified random sampling is not possible
* Compared to simple systematic sampling, it can preserve the original population structure, as it uses predefined population strata
Disadvantages
* The representativeness of the sample may be compromised, as individuals are selected in a non-random fashion
What is convenience sampling?
- Convenience sampling is the most frequent example of non-probability sampling
- Individuals are selected in a non-random fashion, solely based on convenience (i.e. they are easy to access)
What is the Convenience Sampling procedure?
What are the advantages and disadvantages of Convenience Sampling?
Advantages
* Cheap, fast and convenient
Disadvantages
* The representativeness of the sample will definitely be compromised, as individuals are selected in a non-random fashion
How do you know which sampling method to choose?
- Depends on:
– The aim of the study
– The nature of the source population
– The sample size
– Other practical issues (e.g. financial resources, time availability, etc.)
- When no financial or time constraints exist:
– It is always strongly advised to use probability (random) sampling techniques in order to minimize selection bias
– Stratified random sampling is the ideal method if the sample is small
- When non-random sampling techniques have been used:
– The representativeness of the sample is always questionable
– Assume that selection bias is operating to some extent
How does sampling method affect descriptive research?
In descriptive research (e.g. investigating the prevalence of a disease in a population):
– Extremely important to have a perfectly representative sample, as
selection bias will greatly influence the findings
How does sampling method affect analytic research?
In analytic research (e.g. investigating exposure-outcome associations):
– Minor deviations from a perfectly representative sample may be acceptable
* Minor selection bias may not affect the findings to a large extent
Which sampling method is not preferred?
Convenience Sampling
What are the 2 types of Systematic Error (bias)?
- Selection bias: Systematic error arising from mistakes made during the selection of the study sample.
- Information bias: Systematic error arising from mistakes made during the measurement of key study variables (exposure and outcome).
What is information bias?
- Information bias arises from wrong/inaccurate assessment of either the exposure or the outcome variables
- Such mistakes may arise on the researchers’ part (unintentionally) or on the participants’ part (unintentionally or intentionally)
- There is also instrument bias (fault of the instrument), which falls under the researchers’ part
What is assessor bias?
- Wrong/inaccurate diagnosis due to a clinical error
- May occur when researchers are not “blinded” to the exposure or outcome status of participants
- Wrong/inaccurate measurements due to a faulty instrument/machine
- Wrong/inaccurate measurements due to poor training of the assessor
- Mistakes during recording of the data and transferring data from paper form into electronic form
How can information bias arise from participant actions / misinterpretation?
- Wrong/inaccurate answers from participants due to misinterpretation of a question
- Wrong/inaccurate answers from participants due to a sensitive issue relating to the question
- Wrong/inaccurate answers from participants due to poor recall (recall bias)
- Wrong/inaccurate answers given intentionally by participants
- Overall, information bias arising from participant actions is called response bias
What are the 6 types of information bias?
- Recall Bias
- Interviewer Bias
- Observer bias
- Hawthorne effect
- Surveillance bias
- Misclassification bias
What is recall bias?
Participants with a particular outcome or exposure may remember events more clearly or amplify their recollections. This is very common in case-control studies, where the primary difference arises more from under-reporting of exposures in the control group than from over-reporting in the case group
What is interviewer bias?
A researcher’s knowledge may influence the structure of questions and the manner of presentation, which may influence responders. This can occur in any study design (especially if interviewers are not blinded to exposures)
What is observer bias?
Researchers may have preconceived expectations of
what they should find in an examination (especially if they are not
blinded to exposures or medical history)
What is the Hawthorne effect?
Participants act differently if
they know they are being watched.
What is Surveillance bias?
The group with the known
exposure or outcome may be followed more closely
or longer than the comparison group (researcher’s
bias).
What is Misclassification bias?
Errors are made in
classifying either disease or exposure status
(instrument).
What are the two types of errors?
- Systematic error:
a. Information error
b. Selection error
- Random error
How can you minimize bias?
- Be purposeful in the study design to minimize the chance for bias; Example: use more than one control group
- Define, a priori, who is a case or what constitutes exposure so that there is no overlap; define categories within groups clearly (age groups, aggregates of person-years)
- Set up strict guidelines for data collection
– Train observers or interviewers to obtain data in the same fashion
– It is preferable to use more than one observer or interviewer, but not so many that they cannot be trained in an identical manner
– Optimize the questionnaire
How does information bias affect study results?
- Fundamental principle of research: If you want to investigate any association between two factors, first make sure you measure these two factors accurately!
- Information bias can be introduced in the assessment of both the main exposure and the main outcome, in which case the association between them will be distorted
- Information bias arising from participant actions is much more common than information bias arising from researcher actions
- Information bias mainly affects studies that rely on self-reports (e.g. questionnaire-based data collection)
– In outcome assessment (measurement), in studies where self-reported disease status is used, there is usually double-checking (confirmation) with the personal GP of the participant or through medical records
– Similarly, while assessing exposures (diet, physical activity, smoking, educational attainment, etc.), the most valid and reliable instruments have to be used
- If a study relies solely on self-reports, then it should be assumed that information bias (measurement error) is operating to some extent
- The presence of information bias always compromises the validity of the study results and, in such a case, findings have to be interpreted with great caution
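A worked sketch of how inaccurate exposure measurement can distort an exposure-outcome association. Everything here is an assumption made for illustration: the 2x2 counts, the use of the odds ratio as the measure of association, and an exposure measurement with 80% sensitivity and 90% specificity that is equally inaccurate in cases and controls.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds ratio from a 2x2 table."""
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)

def misclassify(exposed, unexposed, sensitivity, specificity):
    """Expected counts after measuring exposure with an imperfect tool."""
    observed_exposed = exposed * sensitivity + unexposed * (1 - specificity)
    observed_unexposed = exposed * (1 - sensitivity) + unexposed * specificity
    return observed_exposed, observed_unexposed

# Hypothetical true counts: (exposed, unexposed)
true_cases = (200, 200)
true_controls = (100, 300)
true_or = odds_ratio(*true_cases, *true_controls)

# Same measurement error applied to cases and controls (an assumption of this sketch)
observed_cases = misclassify(*true_cases, sensitivity=0.8, specificity=0.9)
observed_controls = misclassify(*true_controls, sensitivity=0.8, specificity=0.9)
observed_or = odds_ratio(*observed_cases, *observed_controls)

print(f"true OR = {true_or:.2f}, observed OR = {observed_or:.2f}")
```

Under these assumptions the true odds ratio of 3.0 is observed as roughly 2.16, i.e. the association looks weaker than it really is. If the error differed between cases and controls, the distortion could go in either direction.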
What should all assessment tools have?
- Validity
- Reliability
What is validity?
The extent to which an assessment tool (e.g.
questionnaire, instrument, etc.) measures accurately what it is
intended to measure
What is criterion validity?
Criterion validity is the most common type of validity used in
medical research. In such a case, the results from the
assessment tool of interest are compared with those of an
established (known as gold standard) assessment tool
What is reliability?
The overall consistency of a measure, as regards
producing the same results when administered under the
same conditions in the same group of people. Also known as
reproducibility or repeatability
What are the two main types of reliability?
- Inter-observer reliability: The degree of agreement between the results when two or more researchers (observers) administer the assessment tool on the same people under the same conditions
- Intra-observer reliability: The degree of agreement between results when the assessment tool is used by the same researcher (observer) on two or more occasions (under the same conditions and in the same test population)
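A minimal sketch of how agreement between two observers might be quantified. These notes do not name a statistic; percent agreement and Cohen's kappa (agreement corrected for chance) are used here as common choices, and the ratings are made-up data.

```python
from collections import Counter

def percent_agreement(ratings_a, ratings_b):
    """Proportion of people on whom the two observers gave the same rating."""
    return sum(a == b for a, b in zip(ratings_a, ratings_b)) / len(ratings_a)

def cohens_kappa(ratings_a, ratings_b):
    """Agreement corrected for the agreement expected by chance alone."""
    n = len(ratings_a)
    observed = percent_agreement(ratings_a, ratings_b)
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    categories = set(ratings_a) | set(ratings_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical ratings of the same 10 people by two observers
observer_1 = ["yes", "yes", "no", "no", "yes", "no", "yes", "no", "no", "yes"]
observer_2 = ["yes", "no", "no", "no", "yes", "no", "yes", "yes", "no", "yes"]

print(f"percent agreement = {percent_agreement(observer_1, observer_2):.2f}")
print(f"Cohen's kappa     = {cohens_kappa(observer_1, observer_2):.2f}")
```

The same functions apply to intra-observer reliability if the two lists are ratings by one observer on two occasions.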
What is internal validity?
If a determination is made that the findings of a study were
not due to any one of these three sources of error, then the
study is considered internally valid.
In other words, the conclusions reached are likely to be
correct for the circumstances of that particular study.
What is external validity?
Internal validity does not necessarily mean that the findings can be generalized to other circumstances; the extent to which they can be generalized is external validity
NB!
DO NOT COMPROMISE INTERNAL VALIDITY IN THE GOAL OF GENERALISATION