Intro to Biostatistics Flashcards
Types of Study Design
Descriptive Study: Description of what is happening in a population.
Analytic Study: Quantification of the relationship between two factors (e.g., effect of an intervention on an outcome).
Experimental Study: Manipulation of the exposure via randomization to intervention or exposure.
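Observational Study: Measurement of exposure and outcome as they occur naturally, without the investigator assigning the exposure; comparison groups may be matched.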
Types of Observational vs. Experimental Studies
Observational:
Ecological
Cross-sectional
Cohort
Case-control
Experimental:
Randomized controlled trial
Community trial
Ecological Study
Units of analysis are populations or groups of people and not individuals.
Focuses on the comparison of groups rather than individuals
Ecological Study: Advantages
Low cost
Convenient
Not all measurements can be made on individuals
Ecologic effects are main interest (at the population level)
Simplicity of analyses and presentation
Hypothesis-generating for future research
Ecological Study: Disadvantages
Prone to “ecological fallacy”:
Assumptions that relationships observed for groups hold true for individuals.
Such inferences made using group-level data may not always be correct at the individual level.
Cannot adjust for confounders because data on all potential covariates are lacking, so groups may not be comparable
A covariate is a secondary variable that can affect the relationship between the dependent variable and other independent variables of primary interest.
Missing data
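The ecological fallacy listed above can be illustrated with a small made-up data set. This is only a sketch: the three groups, the (exposure, outcome) pairs, and the helper `pearson` are hypothetical and chosen so that the group-level and individual-level relationships point in opposite directions.

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical (exposure, outcome) pairs for individuals in three groups.
groups = {
    "A": [(1, 3), (2, 2), (3, 1)],
    "B": [(4, 6), (5, 5), (6, 4)],
    "C": [(7, 9), (8, 8), (9, 7)],
}

# Group-level analysis: correlate the group means only.
group_mean_x = [mean(x for x, _ in g) for g in groups.values()]
group_mean_y = [mean(y for _, y in g) for g in groups.values()]
group_level_r = pearson(group_mean_x, group_mean_y)

# Individual-level analysis: correlate within each group.
within_group_r = {
    name: pearson([x for x, _ in g], [y for _, y in g])
    for name, g in groups.items()
}

print(group_level_r)   # positive across group means
print(within_group_r)  # negative inside every group
```

In this toy data the group means rise together, yet within every group higher exposure goes with lower outcome, so a conclusion drawn from the group means would be wrong for individuals.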
Cross-Sectional Study
Surveys exposures and disease status at a single point in time (a cross-section of the population)
Measures prevalence, not incidence of disease
Suitable for studying conditions that are relatively frequent with long duration of expression (nonfatal, chronic conditions)
Not suitable for studying rare or highly fatal diseases or a disease with short duration of expression
Example: community surveys
Incidence vs Prevalence
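Incidence: rate at which new cases arise in a population at risk over a specified period
Prevalence: proportion of a population with existing cases (alive with the disease) at one point in time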
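A short worked example may help separate the two measures. All numbers are hypothetical, and the calculation assumes cumulative incidence over one year and point prevalence on the first day of that year.

```python
# Hypothetical numbers for one community over one year.
population_at_start = 10_000
existing_cases_at_start = 200   # already have the disease on day 1
new_cases_during_year = 50      # first diagnosed during the year

# Only people free of disease at the start are at risk of becoming new cases.
at_risk = population_at_start - existing_cases_at_start

# Cumulative incidence: new cases / population at risk over the period.
incidence = new_cases_during_year / at_risk

# Point prevalence at the start: existing cases / total population.
prevalence = existing_cases_at_start / population_at_start

print(f"Incidence: {incidence * 1000:.1f} per 1,000 at risk per year")  # 5.1
print(f"Prevalence: {prevalence:.1%}")                                  # 2.0%
```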
Cross-Sectional Study: Advantages
Low cost
Convenient
Less time-consuming than other designs
Allows study of several diseases/exposures
Provides estimates for population burden, health planning and priority setting of health problems
Cross-Sectional Study: Disadvantages
Weaker design because it measures prevalence, not incidence of disease. (Prevalent cases are survivors).
Temporal sequence of exposure and effect is difficult to determine.
Difficult to determine when disease occurred.
Rare diseases and quickly emerging diseases are difficult to study.
Cohort Study
One or more cohorts (i.e., samples) are followed prospectively.
Prospective studies follow people into the future, measuring exposures and outcomes over time, to determine which risk factors are associated with a condition, concern, or disease
Cohort Study: Advantages
Exposure status determined before disease detection.
Study subjects selected before disease detection.
Study subjects can be matched to help control for confounding variables.
Ability to study several outcomes for each exposure
Cohort Study: Disadvantages
Expensive
Time-consuming
Not suitable for rare diseases or diseases with long latency
No randomization (imbalances in subject characteristics could exist)
Loss to follow-up
Case-Control Study
Compares exposures in disease cases versus healthy controls from the same population
At one point in time, but looking back (retrospective)
Case-Control Study: Advantages
Low cost
Less time-consuming than other designs
Most feasible design for disease outcomes that are rare
Case-Control Study: Disadvantages
Not a suitable design when disease outcome for a specific exposure is not known at start of study.
Exposure measurements taken after disease occurrence (retrospective data).
Disease status can influence selection of study subjects
Randomized Controlled Trials (RCTs)
Experimental comparison study where participants are randomized to experimental or control groups.
Best for studying the effect of a treatment/test.
Gold standard for epidemiological research
Randomized Controlled Trials (RCTs): Purpose of Randomization
Primary purpose
Reduces selection bias in the allocation of intervention.
Each participant has an equal chance of being in experimental or control group.
Secondary purpose
With a large sample size, the experimental and control groups should have similar baseline characteristics.
Helps to control for known and unknown factors.
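The idea that each participant has an equal chance of landing in either group can be sketched as simple randomization. The function name `randomize`, the participant IDs, and the seed are illustrative assumptions, not part of any trial protocol.

```python
import random

def randomize(participant_ids, seed=None):
    """Assign each participant to 'experimental' or 'control' with equal probability."""
    rng = random.Random(seed)  # a fixed seed makes the allocation reproducible
    return {pid: rng.choice(["experimental", "control"]) for pid in participant_ids}

# Hypothetical trial with 20 participants numbered 1..20.
allocation = randomize(range(1, 21), seed=42)

n_experimental = sum(1 for arm in allocation.values() if arm == "experimental")
n_control = len(allocation) - n_experimental
print(n_experimental, n_control)
```

Simple randomization like this does not guarantee equal group sizes in a small sample; the groups tend to balance only as the sample size grows.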
Advantages of RCTs
Randomization balances distribution of confounders.
Blinding of participants and researchers reduces bias in assessment of outcomes.
Detailed information collected at baseline and follow-up periods.
Populations of participating individuals are clearly identified
Results can be analyzed with well-known statistical tools
Disadvantages of RCTs
Expensive and time-consuming
Volunteer bias
Large sample size may be required
Participant exclusion may limit generalizability
Adherence may be an issue
Sponsor or funding source may be an issue
Ethical concerns
Community Trial
Experimental studies with whole communities (e.g., cities, states) as experimental units.
The intervention is assigned to all members in each of a number of communities.
Community trials follow the same procedures as RCTs (eligibility criteria, informed consent, randomization, follow-up measures).
Blinding and double blinding are not generally used in community trials
Community Trial Advantages
Randomization balances distribution of confounders.
Detailed information collected at baseline and follow-up periods.
Results can be analyzed with well-known statistical tools.
Directly estimate the impact of change in behavior or modifiable exposure on the incidence of disease.
Community Trial Disadvantages
Expensive, time-consuming
Difficulty controlling study entrance, intervention delivery, and monitoring of outcomes.
Fewer study units are capable of being randomized, which affects comparability.
Affected by population dynamics, secular trends, and nonintervention influences.
Systematic and Random Error
Errors can be systematic (differential) or random (non-differential)
Systematic error: Use of an invalid outcome measure that is consistently wrong in a particular direction (e.g., faulty measuring instrument)
Random error: Error in an outcome measure that has no consistent direction and no apparent connection to any other measurement or variable, generally regarded as due to chance
Use of the term “Bias” should be reserved for systematic (differential) error
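The difference between the two kinds of error can be sketched with a simulated scale. The true weight, the size of the random scatter, and the 2 kg calibration offset are all made-up values.

```python
import random
from statistics import mean

rng = random.Random(0)
true_weight = 70.0  # kg, hypothetical true value

# Random error: readings scatter around the truth with no preferred direction.
random_readings = [true_weight + rng.gauss(0, 0.5) for _ in range(1000)]

# Systematic error: a miscalibrated scale adds 2 kg to every reading.
biased_readings = [r + 2.0 for r in random_readings]

random_error_mean = mean(random_readings) - true_weight
systematic_error_mean = mean(biased_readings) - true_weight

print(f"Average random error:     {random_error_mean:+.2f} kg")
print(f"Average systematic error: {systematic_error_mean:+.2f} kg")
```

Averaging many readings shrinks the random error toward zero, but the systematic error stays near 2 kg no matter how many readings are taken.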
Selection vs. Detection Bias
Selection bias: systematic error in the ascertainment of study subjects; not random
Can lead to systematic differences between baseline characteristics of the groups that are compared.
Detection bias: systematic differences between groups in how outcomes are determined.
A potential artifact caused by use of a particular diagnostic technique or type of equipment.
Confounding and Membership Bias
Confounding bias: a third factor that is related to both exposure and outcome accounts for some/all of the observed relationship.
Membership bias: individuals who belong to an organized group (e.g. military, religious group) tend to differ systematically with regards to health from the general population.
Members of an organized group tend to be healthier and less prone to morbidity and premature mortality.
Recall and Instrument Bias
Recall bias: error in remembering past exposure differs over time or between cases and controls.
Instrument bias: this occurs when the measuring instrument is not properly calibrated (e.g., a scale may be biased to give a higher or lower reading than the actual value).
Attrition, Social Desirability, and Lead Time Biases
Attrition bias: systematic differences between groups in withdrawals from a study.
Social desirability bias: tendency to respond to personally or socially sensitive questions in a socially acceptable direction.
Lead time bias: apparent increase in survival that arises only because screening advances the time of diagnosis (the lead time), not because death is delayed
In estimating survival time, acknowledge the point when early diagnosis is made versus usual diagnosis in order to control for lead time bias.
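A small arithmetic sketch of lead time, using a single hypothetical patient whose age at death is assumed to be the same whether or not the disease is found by screening:

```python
# Hypothetical patient: ages (in years) at which events occur.
age_at_screen_diagnosis = 60  # disease detected early by screening
age_at_usual_diagnosis = 63   # when symptoms would have led to diagnosis
age_at_death = 66             # assumed unchanged by earlier detection

# Lead time: how far screening moved the diagnosis forward.
lead_time = age_at_usual_diagnosis - age_at_screen_diagnosis

survival_from_screening = age_at_death - age_at_screen_diagnosis
survival_from_usual = age_at_death - age_at_usual_diagnosis

# Subtracting the lead time removes the apparent survival gain.
adjusted_survival = survival_from_screening - lead_time

print(lead_time, survival_from_screening, survival_from_usual, adjusted_survival)
# 3 6 3 3
```

Measured from diagnosis, the screened patient appears to survive 6 years instead of 3, although the date of death has not moved.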
Types of Data
Nominal data: Numbers used to categorize data.
e.g., Gender, race, marital status, etc.
Ordinal data: Numbers used to order or rank data.
e.g., “Is your health poor, reasonable, good, or excellent?”
Interval data: Numbers used to order data by equal intervals.
e.g., Time of day, temperature in °C or °F
Ratio data: Numbers that can be compared to an absolute zero (i.e., a point where none of the variable being measured exists).
e.g., Height, weight, age, income
Central Tendency
Where the center of the distribution tends to be located
Three measures of central tendency
Mode
Median
Mean
Which one you report is related to the scale of measurement and the shape of the distribution
Mode
The most frequently occurring score
Look at the simple frequency of each score
Unimodal or bimodal
Report the mode when using a nominal scale (the most frequently occurring category)
If you have a rectangular distribution, do not report the mode
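The mode can be found with Python's standard `statistics.multimode`, which returns every value tied for the highest frequency. The data below are invented for illustration.

```python
from statistics import multimode

# Nominal data: the mode is the most frequently occurring category.
marital_status = ["single", "married", "married", "divorced", "married", "single"]
nominal_mode = multimode(marital_status)
print(nominal_mode)  # ['married']

# Two values tie for most frequent, so the distribution is bimodal.
bimodal_scores = [2, 3, 3, 4, 5, 5, 6]
bimodal_modes = multimode(bimodal_scores)
print(bimodal_modes)  # [3, 5]

# Rectangular distribution: every score occurs equally often,
# so every value is returned and no single mode is worth reporting.
rectangular_scores = [1, 2, 3, 4]
rectangular_modes = multimode(rectangular_scores)
print(rectangular_modes)  # [1, 2, 3, 4]
```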
Median
Score at the 50th percentile
In a normal distribution, the median is the same as the mode and mean
Arrange scores from lowest to highest. If there is an odd number of scores, the median is the middle score; if there is an even number, average the two middle scores
Used with an ordinal scale or when the distribution is skewed
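The odd and even cases of the median rule can be checked with `statistics.median`; the scores are made up.

```python
from statistics import median

odd_scores = [7, 1, 5, 3, 9]        # sorted: 1 3 5 7 9
even_scores = [7, 1, 5, 3, 9, 11]   # sorted: 1 3 5 7 9 11

median_odd = median(odd_scores)     # middle score
median_even = median(even_scores)   # average of the two middle scores, 5 and 7

print(median_odd)   # 5
print(median_even)  # 6.0
```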
Mean
Score at the exact mathematical center of distribution (average)
Used with interval and ratio scales, and when the distribution is symmetrical and unimodal
Misleading when the distribution is skewed because the mean is pulled towards the tail
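A quick comparison on invented data shows how a single extreme score pulls the mean towards the tail while the median stays put:

```python
from statistics import mean, median

symmetric_scores = [2, 3, 4, 5, 6]
skewed_scores = [2, 3, 4, 5, 46]  # same data with one extreme high score

symmetric_mean, symmetric_median = mean(symmetric_scores), median(symmetric_scores)
skewed_mean, skewed_median = mean(skewed_scores), median(skewed_scores)

print(symmetric_mean, symmetric_median)  # 4 4
print(skewed_mean, skewed_median)        # 12 4
```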
Measures of Variability
Extent to which the scores differ from each other or how spread out the scores are
Tells us how accurately the measure of central tendency describes the distribution
Shape of the distribution
Types: range, variance, standard deviation
Range
Can report the lowest and highest value
Or report the difference between the highest and lowest values
Semi-interquartile range used with the median: one half the distance between the scores at the 25th and 75th percentile
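The range and the semi-interquartile range can be sketched as follows. The scores are hypothetical, and the quartiles use the "inclusive" method of `statistics.quantiles`; other percentile conventions can give slightly different values.

```python
from statistics import quantiles

scores = [2, 4, 4, 5, 7, 8, 9, 11, 12]

lowest, highest = min(scores), max(scores)
value_range = highest - lowest

# Quartiles: scores at the 25th, 50th, and 75th percentiles.
q1, q2, q3 = quantiles(scores, n=4, method="inclusive")

# Semi-interquartile range: half the distance between the 25th and 75th percentiles.
semi_interquartile_range = (q3 - q1) / 2

print(lowest, highest, value_range)  # 2 12 10
print(q1, q3, semi_interquartile_range)
```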
Variance
Statistical variance gives a measure of how the data distributes itself about the mean or expected value.
If individual observations vary greatly from the group mean, the variance is big; and vice versa.
Unlike range that only looks at the extremes, the variance looks at all the data points and then determines their distribution.
Standard Deviation
Standard Deviation is a measure of variability of scores in a particular sample
σ = the population standard deviation
s = the sample standard deviation (square root of the variance)
Variance = s²
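The relationship between variance and standard deviation can be checked on a small invented data set. The sketch assumes the usual convention that the sample variance divides by n - 1 and the population variance divides by n.

```python
from statistics import mean, variance, stdev, pvariance, pstdev

scores = [2, 4, 4, 4, 5, 5, 7, 9]
m = mean(scores)

# Sum of squared deviations from the mean, computed by hand.
sum_sq_dev = sum((x - m) ** 2 for x in scores)

sample_variance = variance(scores)       # sum_sq_dev / (n - 1)
sample_sd = stdev(scores)                # square root of the sample variance
population_variance = pvariance(scores)  # sum_sq_dev / n
population_sd = pstdev(scores)           # square root of the population variance

print(m, sum_sq_dev)                       # 5 32
print(population_variance, population_sd)  # 4 2.0
print(round(sample_variance, 3), round(sample_sd, 3))
```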
Null Hypothesis
The null hypothesis (H0) is an essential part of any research design and is always tested.
It reflects that there will be no observed effect for the experiment.
The null hypothesis (H0) is a hypothesis which the researcher tries to reject.
Alternative Hypothesis
The alternative or experimental hypothesis (HA or H1) reflects that there will be an observed effect for the experiment
Type I Error
Rejecting the null when it is true
“False positive”
Type I errors can be controlled
Alpha is the maximum probability that we have a type I error.
It is related to the level of significance selected.
For a 95% confidence level, alpha is 0.05.
There is a 5% probability that we will reject a true null hypothesis
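The meaning of alpha can be illustrated by simulation. This is a rough sketch under stated assumptions: a two-sided z-test with known standard deviation, a null hypothesis that is true in every simulated experiment, and arbitrary choices for the sample size, seed, and number of experiments.

```python
import random
from statistics import NormalDist

alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value, about 1.96

rng = random.Random(1)
n, sigma, n_experiments = 25, 1.0, 2000

false_positives = 0
for _ in range(n_experiments):
    # H0 is true here: the data really come from a population with mean 0.
    sample = [rng.gauss(0, sigma) for _ in range(n)]
    z = (sum(sample) / n) / (sigma / n ** 0.5)
    if abs(z) > z_crit:
        false_positives += 1  # rejected a true null: a type I error

type_i_rate = false_positives / n_experiments
print(f"Observed type I error rate: {type_i_rate:.3f}")
```

Across many repetitions the share of experiments that wrongly reject the null should settle close to alpha.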
Type II Error
NOT rejecting the null when it is false
“False negative”
The probability of a type II error is denoted by beta.
This number is related to the power or sensitivity of the hypothesis test, denoted by 1 – β
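The link between beta and power can be sketched for one simple case. The function `power_one_sided_z` is a hypothetical helper that assumes a one-sided z-test with known standard deviation; the effect size and sample sizes are invented.

```python
from statistics import NormalDist

def power_one_sided_z(effect, sigma, n, alpha=0.05):
    """Power of a one-sided z-test of H0: mean = 0 against HA: mean = effect > 0."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)
    # Under HA the z statistic is normal with mean effect / (sigma / sqrt(n)).
    shift = effect / (sigma / n ** 0.5)
    beta = std_normal.cdf(z_crit - shift)  # P(not rejecting H0 | HA is true)
    return 1 - beta

power_small_n = power_one_sided_z(effect=0.5, sigma=1.0, n=10)
power_large_n = power_one_sided_z(effect=0.5, sigma=1.0, n=40)

print(f"n = 10: power = {power_small_n:.2f}, beta = {1 - power_small_n:.2f}")
print(f"n = 40: power = {power_large_n:.2f}, beta = {1 - power_large_n:.2f}")
```

For the same effect size, the larger sample gives a smaller beta and therefore higher power.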