Exam 2 Flashcards
Experimental vs. observational
Experimental= population based with participants randomly chosen and randomly assigned to either exposure or non-exposure
Observational= population based with participants not randomly assigned to exposed or unexposed groups (they chose whether or not to be exposed to the risk factor).
Definition of a cohort
Any designated group of persons who are followed over a period of time
Cohort study
- Determine exposed/non-exposed groups first, then follow them to determine risk.
- start with people free of disease
- usually applied to rare exposures
- most rigorous observational study
- also known as prospective, longitudinal, incidence, or follow-up studies.
Strengths of cohort
- good for rare exposures
- can evaluate multiple effects of exposure
- clear temporal relationship between exposure and disease
Weaknesses of cohort
- loss to follow up
- requires a large sample
- costly
Framingham study
- large, longitudinal cohort study begun in 1948
- investigated risk factors for CVD (first use of “risk factor”)
- 1948- first cohort
- 1971- second generation cohort (orig. participants’ children and spouses)
- 1994- first Omni cohort
- 2002- third generation cohort
- 2003- second Omni cohort
Framingham study risk factors
- high blood pressure and cholesterol
- diabetes
- obesity and physical inactivity
- smoking
- blood triglyceride and HDL cholesterol levels
- age
- sex
- psychological issues
Open or dynamic cohort
Members come and go and eligibility changes over time
Defined by changeable characteristics like smoking or residence in certain area
Fixed cohort
Membership defined at outset and no new members are gained later. Loss to follow up may occur
Defined by an irrevocable event like serving in the military or surviving natural disaster.
Defining cohorts
- if the population is selected before exposures are known, investigators may study more than one exposure or outcome; this is a single-sample or population based study
- if exposed groups are selected at the start based on their exposure, investigators are likely focusing on one exposure (common for occupational studies)
Types of exposed populations
Occupational- like miners, nurses, etc.
clinical- groups undergoing a particular medical treatment
Lifestyle factors or conditions
Veterans of specific wars
Geographically defined areas
Time period defined by exposure experience
Determining exposure status
Exposed- should be disease free but susceptible to disease, representative of exposed population, defined carefully
Unexposed- should be disease free but susceptible to disease, representative of the general population, and from the same underlying population as exposed group
How exposure data is collected for cohorts
- pre-existing records (less expensive but less detail)
- questionnaires, interviews (good info but possible recall bias)
- direct testing (physical exams, environmental sampling, etc.)
- biological sample banks
How outcome data is collected for cohorts
- active participation- medical exams, lab tests, questionnaires
- no active participation of cohort members- death certificates, disease registries, medical records
Internal comparison
Exposed vs. unexposed members of the same cohort
External comparison
Exposed members of one cohort vs. unexposed cohort from another study
General population comparison
Members of cohort vs. general population
Common in occupational studies
Healthy worker effect- those that are employed are often healthier than unemployed.
2x2 table
               Disease    No disease
Exposed           a           b
Not exposed       c           d
Risk difference
[a/(a+b)] - [c/(c+d)] also known as attributable risk
Relative risk
[a/(a+b)]/[c/(c+d)]
First part= risk among exposed
Second part= risk among non-exposed
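The two formulas above can be tried out with a small Python sketch. It assumes the usual 2x2 cell labels (a= exposed with disease, b= exposed without disease, c= not exposed with disease, d= not exposed without disease); the counts are made up for illustration and do not come from any real study.

```python
def risk(diseased, not_diseased):
    """Risk = number with disease / total number in the group."""
    return diseased / (diseased + not_diseased)


def risk_difference(a, b, c, d):
    """Risk among exposed minus risk among non-exposed."""
    return risk(a, b) - risk(c, d)


def relative_risk(a, b, c, d):
    """Risk among exposed divided by risk among non-exposed."""
    return risk(a, b) / risk(c, d)


# Hypothetical cohort (made-up counts)
a, b = 30, 70   # exposed: with disease, without disease
c, d = 10, 90   # not exposed: with disease, without disease

rd = risk_difference(a, b, c, d)
rr = relative_risk(a, b, c, d)
print(f"Risk among exposed:     {risk(a, b):.2f}")
print(f"Risk among non-exposed: {risk(c, d):.2f}")
print(f"Risk difference:        {rd:.2f}")
print(f"Relative risk:          {rr:.2f}")
```

With these counts the risk is 0.30 among the exposed and 0.10 among the non-exposed, so the risk difference is about 0.20 and the relative risk is about 3.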
Interpreting relative risk
<1 = risk of disease for exposed is less than the risk of disease for non exposed (protective)
=1 means that the risk of disease is equal among exposed and non exposed (null value)
> 1 = risk of disease for exposed is greater than risk of disease for non exposed.
95% CI interpretation
If it includes 1.0, it is not a statistically significant association
If it doesn’t include 1.0, it is a statistically significant association
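These notes do not say how the interval is calculated. One common approach, shown here only as a sketch, is the log method for a relative risk: take ln(RR), add and subtract 1.96 standard errors, then exponentiate. The function names and the counts are hypothetical.

```python
import math


def relative_risk_ci(a, b, c, d, z=1.96):
    """Approximate 95% CI for relative risk using the log method.

    a, b = exposed with / without disease
    c, d = not exposed with / without disease
    """
    rr = (a / (a + b)) / (c / (c + d))
    # Standard error of ln(RR)
    se_log_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


def includes_null(lower, upper, null_value=1.0):
    """True when the interval contains the null value (1.0 for a ratio)."""
    return lower <= null_value <= upper


# Hypothetical cohort counts (made up for illustration)
rr, lower, upper = relative_risk_ci(30, 70, 10, 90)
print(f"RR = {rr:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
if includes_null(lower, upper):
    print("CI includes 1.0: not statistically significant")
else:
    print("CI excludes 1.0: statistically significant")
```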
Case control study
A type of observational analytic study in which subjects are selected on the basis of whether they do (cases) or do not (controls) have a particular disease under study
Also known as retrospective because previous exposure is assessed after identifying cases and controls.
Strengths of case control studies
- good for rare diseases/outcomes
- can evaluate multiple risk factors for the same disease/ outcome
Weaknesses of case control studies
- vulnerable to recall bias
- may have poor exposure information
- difficult to infer temporal relationship
Finding cases for case control studies
- use incident cases instead of prevalent cases
- patients (clinics, hospitals)
- disease registries (cancer, birth defects, trauma)
- surveys
- death certificates
Finding controls for case control studies
- Patients at clinics/hospitals- illnesses among controls must be unrelated to exposure, have same referral pattern to facility
- Population- tax lists, voter registration, drivers license rosters, telephone directories, etc.
- friends, spouses, relatives- likely to share socioeconomic status, race, etc, but cases reluctant to nominate, may share same characteristics.
- deceased controls.
How exposure data is generated for case control studies
Same as for cohorts
Probability
The fraction of times you expect to see the event in many trials (ranges from 0 to 100% because probabilities are proportions)
Odds
The probability that an event will occur divided by the probability that it will not occur (any positive number because they are ratios)
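A short sketch of how probability and odds convert into each other, following the two definitions above. The function names are made up for illustration.

```python
def probability_to_odds(p):
    """Odds = probability the event occurs / probability it does not."""
    if not 0 <= p < 1:
        raise ValueError("p must be at least 0 and less than 1")
    return p / (1 - p)


def odds_to_probability(odds):
    """Reverse conversion: probability = odds / (1 + odds)."""
    if odds < 0:
        raise ValueError("odds cannot be negative")
    return odds / (1 + odds)


for p in (0.1, 0.2, 0.5, 0.8):
    print(f"probability {p:.1f} -> odds {probability_to_odds(p):.2f}")
```

A probability of 0.5 corresponds to odds of 1, and odds can grow beyond 1 even though a probability cannot.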
Odds ratio
The ratio of odds of exposure among cases to odds of exposure among controls
ad/bc (cross product of the 2x2 table cells)
Provides an estimate of relative risk assuming that:
- Cases and controls are representative of diseased and general populations, respectively
- disease is rare (<10% incidence among unexposed).
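The sketch below computes the cross-product odds ratio and illustrates the rare disease assumption. It assumes cells labeled a= exposed cases, b= exposed controls, c= unexposed cases, d= unexposed controls; all counts are hypothetical.

```python
def odds_ratio(a, b, c, d):
    """Odds of exposure among cases (a/c) over odds among controls (b/d)."""
    return (a * d) / (b * c)


def relative_risk(a, b, c, d):
    """Risk among exposed over risk among unexposed (needs cohort data)."""
    return (a / (a + b)) / (c / (c + d))


# Hypothetical case control study
case_control_or = odds_ratio(a=40, b=20, c=60, d=80)
print(f"Case control OR: {case_control_or:.2f}")

# Hypothetical cohorts, used only to compare OR with RR
rare = dict(a=30, b=9970, c=10, d=9990)     # disease is rare
common = dict(a=300, b=700, c=100, d=900)   # disease is common

for label, cells in (("rare", rare), ("common", common)):
    print(f"{label}: OR = {odds_ratio(**cells):.2f}, "
          f"RR = {relative_risk(**cells):.2f}")
```

When the disease is rare, the OR and RR nearly match; when it is common, the OR drifts further from 1 than the RR.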
Interpreting the odds ratio
<1 means that the odds of exposure for cases are less than the odds of exposure for controls
=1 means that the odds of exposure are equal among cases and controls (null value)
>1 means that the odds of exposure for cases are greater than odds of exposure for controls
OR is sometimes called the point estimate (will always be included in 95% CI)
Matched case control studies
- purpose is to maximize similarity between cases and controls with respect to factors other than the exposures of interest
- helps to avoid or decrease confounding
- Most common to match on age, gender, race
- frequency matching- does not match individual cases and controls, only matches distribution of characteristics among cases and controls
- Individual matching- each case is matched to a control with similar characteristics.
Disadvantages: sometimes difficult to find controls who fulfill criteria
- cannot “unmatch”
- cannot study the matching factors
Concordant pairs (case control studies)
Both case and control are exposed
Or
Neither case nor control is exposed
Discordant pairs (case control studies)
- exposed case, unexposed control
- unexposed case, exposed control
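As a sketch, matched pairs can be sorted into concordant and discordant groups by counting. The matched-pair odds ratio used at the end (ratio of the two kinds of discordant pairs) is a standard estimate that is not stated in these notes, and the pair data are made up.

```python
from collections import Counter

# Each hypothetical pair is (case_exposed, control_exposed)
pairs = (
    [(True, True)] * 10      # concordant: both exposed
    + [(False, False)] * 30  # concordant: neither exposed
    + [(True, False)] * 20   # discordant: exposed case, unexposed control
    + [(False, True)] * 5    # discordant: unexposed case, exposed control
)

counts = Counter(pairs)
concordant = counts[(True, True)] + counts[(False, False)]
case_only = counts[(True, False)]
control_only = counts[(False, True)]
discordant = case_only + control_only

# Only discordant pairs carry information about the exposure
matched_or = case_only / control_only

print(f"Concordant pairs: {concordant}")
print(f"Discordant pairs: {discordant}")
print(f"Matched-pair odds ratio: {matched_or:.1f}")
```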
Hierarchy of evidence
1. Systematic review
2. Experimental
3. Cohort
4. Case control
5. Cross sectional
6. Case report/case series
7. Ecological (population)
Cross sectional study
A study that examines the relationship between diseases and exposures as they exist at one particular point in time
- like a snapshot of disease and exposure
- exposure, outcome determined simultaneously
- also known as prevalence studies because they measure prevalence instead of incidence
Cross sectional possible groups
- exposed, have disease
- exposed, do not have disease
- not exposed, have disease
- not exposed, do not have disease
Strengths of cross sectional studies
- can often be completed with existing data, so relatively quick and inexpensive
- can adjust for effects of other variables (confounding)
- exposure usually current, easier for subjects to recall
Weaknesses of cross sectional studies
- cannot establish temporal relationship between exposure and disease (which came first?)
- prevalent cases are not necessarily representative of incident cases (those who survive longer more likely to be in study, might be healthier)
- selection bias- some people might be more likely to participate based on their exposure or disease status.
Prevalence ratio
An estimate of relative risk
[a/(a+b)]/[c/(c+d)]
Prevalence difference
[a/(a+b)]-[c/(c+d)]
The difference in prevalence between the exposed and unexposed
Sources of data for cross sectional studies
- large national surveys (National Health Interview Survey NHIS; National Health and Nutrition Examination Survey NHANES; National Hospital Discharge Survey, replaced by the National Hospital Care Survey; National Nursing Home Survey; Behavioral Risk Factor Surveillance System BRFSS)
- existing studies
Case report
- simplest type of individual-level design
- more often associated with medical vs. epidemiological studies
- description of a single individual’s disease (personal characteristics, symptoms, diagnosis, treatment, outcome)
- may reference other reports, highlighting similarities and differences
Case series
A survey of a group of individuals with a particular disease
- like a case report, it is only descriptive
Strengths of case reports and case series
- Important for describing new diseases and conditions and for adverse effects of drugs or other therapies
- can address several aspects of an individual’s medical history in detail
- good for studying mechanisms of disease
- also good for hypothesis generation
Weaknesses of case reports and case series
- essentially anecdotal
- unremarkable cases unlikely to be reported (publication bias)
- highly susceptible to selection bias
Ecology
The study of the relationships among living organisms and their environment
-human ecology= study of human groups as influenced by environmental factors including social, behavioral, etc.
Ecological study
A study in which the units of analysis are populations or groups, rather than individuals
- geography based analytic units- investigate the association between exposure level and disease rate among populations of countries, states, counties, census tracts, etc.
- temporal trend or time series studies- investigate changes in the exposure level and disease rate in one or more populations over time
Ecological analysis
Analysis based on aggregated or grouped data
Sources of data for ecological studies
-vital statistics
- disease registries
- hospital admissions
- department of motor vehicles
- U.S census
-utility companies
- environmental data (weather reports)
-government policies and laws
- large government surveys
ALL DATA SHOULD BE POPULATION BASED
Strengths of ecological studies
- good for hypothesis generation (which questions should we be asking?)
- especially good for rare diseases since analytic units are populations
- good for evaluation of population- wide changes or differences, perhaps related to policies, laws
- inexpensive and quick since they rely on existing data
Weaknesses of ecological studies
- ecological fallacy- inappropriate conclusions regarding relationships at the individual level, based on observations at the group level
- strong associations between two factors in group level data are not always good evidence of a causal link at the individual level.
- very difficult to control for confounding
- joint distribution of exposure and disease unknown at individual level.
Confounding
A situation in which a measure of the effect of an exposure on risk is distorted because of the association of exposure with other factors
Descriptive statistics
Summarize a sample selected from a population
Questions about a sample:
-what is the average value in the sample
- what is the most common value in the sample
- what is the range of values in the sample
- how common is an event or characteristic in the sample
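Each of the four questions above maps onto a one-line calculation. This sketch uses Python's built-in `statistics` module on a made-up sample of systolic blood pressures; the cutoff of 130 is an arbitrary choice for illustration.

```python
import statistics

# Hypothetical sample of systolic blood pressures (mmHg)
systolic = [118, 120, 120, 125, 130, 135, 140, 120]

average = statistics.mean(systolic)            # average value
most_common = statistics.mode(systolic)        # most common value
value_range = max(systolic) - min(systolic)    # range of values
high_count = sum(1 for v in systolic if v >= 130)
proportion_high = high_count / len(systolic)   # how common a characteristic is

print(f"Mean: {average}")
print(f"Mode: {most_common}")
print(f"Range: {value_range}")
print(f"Proportion with systolic >= 130: {proportion_high:.3f}")
```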
Inferential statistics
Make inferences about population parameters based on sample statistics
- inference- a conclusion reached on the basis of evidence and reasoning
- this boils down to asking questions about a population based on a sample
Normal distribution
Used as a model of continuous data (ex blood pressure, height, weight, etc.)
- symmetric about the mean (mean=median=mode)
- described by mean and standard deviation
Mode
Most common value
Two broad types of statistical inference
Estimation- what is the average value in the population, based on the sample
Hypothesis testing-is the average value in the population above a certain value? Is it different than the corresponding values in another population?
For both of these it is assumed that the sample drawn from the population is random (representative of the population)
Point estimate
The best single value estimate of the parameter
With confidence interval it is :
Point estimate plus or minus the margin of error
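A minimal sketch of "point estimate plus or minus the margin of error" for a sample mean. It assumes a 95% interval with the normal multiplier 1.96 and a margin of error equal to 1.96 times the standard error; for small samples a t multiplier would normally be used instead. The data and function name are hypothetical.

```python
import math
import statistics


def mean_confidence_interval(sample, z=1.96):
    """Return (point estimate, lower limit, upper limit) for the mean."""
    n = len(sample)
    point_estimate = statistics.mean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    margin_of_error = z * standard_error
    return (point_estimate,
            point_estimate - margin_of_error,
            point_estimate + margin_of_error)


# Hypothetical sample of systolic blood pressures (mmHg)
sample = [118, 120, 120, 125, 130, 135, 140, 120]
estimate, lower, upper = mean_confidence_interval(sample)
print(f"Point estimate: {estimate}")
print(f"95% CI: ({lower:.1f}, {upper:.1f})")
```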
Hypothesis testing
Research hypothesis is generated about an unknown population parameter
- null hypothesis
- research/alternative hypothesis
Sample data are analyzed and determined to support or refute the research hypothesis
5 steps of hypothesis testing
5 steps:
1. Determine the null and research hypothesis
Null hypothesis= no change
Research hypothesis= what the investigator believes to be true
2. Select test statistic
3. Set up decision rule (p-value)
4. Compute test statistic
5. Draw conclusion and summarize findings
Continuous data
Data represent measurable quantities, not restricted to specified values. Equal differences between scale values do have equal quantitative meaning (age, temp, blood pressure)
Categorical data
- unordered categories (race, state, county, occupation, etc. )
- two categories (disease/no disease, exposed/unexposed) is BINARY, DICHOTOMOUS
Ordinal
Order is important
Ex) symptom severity (1=fatal, 2=severe, 3=moderate, 4=minor), stages of cancer
Two sample t-test (student’s t-test)
-compares mean values of some quantity between two UNRELATED groups (categories of people) to determine if they are likely the same
Assumptions:
- independent observations- no measurements on the same subject
- normally distributed data for each group
- equal variances for each group- this is usually safe to ignore if sample sizes are equal and relatively large
P-value
The probability of observing a result at least as extreme as the one observed, assuming the null hypothesis is true
<0.05 is SIGNIFICANT -> reject null hypothesis because the data suggest a difference between the means of the two groups
> 0.05 is NOT SIGNIFICANT -> fail to reject null hypothesis because there is not enough evidence of a difference between the means of the two groups
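The two sample t-test can be sketched by hand with the pooled (equal variance) formula. Instead of computing an exact p-value, this sketch compares the test statistic with the table critical value 2.306 (two-sided, alpha = 0.05, 8 degrees of freedom), which gives the same decision as checking p < 0.05. The data and function name are made up.

```python
import math
import statistics


def pooled_t_statistic(group1, group2):
    """Two sample t statistic assuming equal variances; returns (t, df)."""
    n1, n2 = len(group1), len(group2)
    mean1, mean2 = statistics.mean(group1), statistics.mean(group2)
    var1, var2 = statistics.variance(group1), statistics.variance(group2)
    df = n1 + n2 - 2
    pooled_variance = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    standard_error = math.sqrt(pooled_variance * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df


# Hypothetical blood pressures in two unrelated groups
group1 = [120, 125, 130, 135, 140]
group2 = [110, 115, 120, 125, 130]

t_stat, df = pooled_t_statistic(group1, group2)
CRITICAL_T = 2.306  # from a t table: two-sided, alpha = 0.05, df = 8
significant = abs(t_stat) > CRITICAL_T

print(f"t = {t_stat:.2f}, df = {df}")
print("reject null hypothesis" if significant
      else "fail to reject null hypothesis")
```

Here the means differ by 10, but with only five people per group the statistic does not pass the critical value, so the null hypothesis is not rejected.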
Matched pair t-test
Used for repeat observations on THE SAME PEOPLE or for observations that are otherwise paired
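For paired data the test works on the within-pair differences rather than on the two groups separately. This is a sketch with made-up before/after readings on the same five people; 2.776 is the table critical value for a two-sided test at alpha = 0.05 with 4 degrees of freedom.

```python
import math
import statistics


def paired_t_statistic(first, second):
    """Paired t statistic from within-pair differences; returns (t, df)."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    differences = [x - y for x, y in zip(first, second)]
    n = len(differences)
    mean_difference = statistics.mean(differences)
    standard_error = statistics.stdev(differences) / math.sqrt(n)
    return mean_difference / standard_error, n - 1


# Hypothetical readings on THE SAME PEOPLE before and after treatment
before = [140, 135, 150, 145, 160]
after = [135, 130, 148, 140, 152]

t_paired, df_paired = paired_t_statistic(before, after)
CRITICAL_T_PAIRED = 2.776  # from a t table: two-sided, alpha = 0.05, df = 4

print(f"t = {t_paired:.2f}, df = {df_paired}")
print("significant" if abs(t_paired) > CRITICAL_T_PAIRED
      else "not significant")
```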
Analysis of variance (ANOVA)
- statistical technique that examines the variation in values within and between groups to determine if they are the same
- equivalent to t-test with only two groups
- most commonly used when there are 3+ groups
Null hypothesis: all groups are drawn from the same population with the same mean value
Alternative hypothesis: at least one group comes from a population with a different mean value
Assumptions for ANOVA and students t-test
- independent observations
- normal distribution
- variances of populations are approximately equal
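A sketch of the one-way ANOVA F statistic, which compares variation between groups with variation within groups. The data and function name are hypothetical; 5.14 is the table critical value for F with 2 and 6 degrees of freedom at alpha = 0.05.

```python
import statistics


def anova_f_statistic(groups):
    """One-way ANOVA; returns (F, df_between, df_within)."""
    all_values = [value for group in groups for value in group]
    grand_mean = statistics.mean(all_values)
    k = len(groups)
    n = len(all_values)

    # Variation of group means around the grand mean
    ss_between = sum(
        len(group) * (statistics.mean(group) - grand_mean) ** 2
        for group in groups
    )
    # Variation of individual values around their own group mean
    ss_within = sum(
        (value - statistics.mean(group)) ** 2
        for group in groups
        for value in group
    )

    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical measurements in three groups
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df_between, df_within = anova_f_statistic(groups)
CRITICAL_F = 5.14  # from an F table: alpha = 0.05, df = (2, 6)

print(f"F = {f_stat:.2f}, df = ({df_between}, {df_within})")
print("reject null hypothesis" if f_stat > CRITICAL_F
      else "fail to reject null hypothesis")
```

With only two groups, F equals the square of the pooled two sample t statistic, which is why the two tests give the same answer in that case.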
Correlation/ simple regression
Used for comparing two continuous variables
X= independent (or predictor variable, exposure) Y= dependent (or outcome variable, disease)
Can also be used to compare rates at the ecological level
Ex) county level smoking rates and county level asthma rates
Correlation
Nature and strength of linear association between variables
(Linear) regression
Equation that best describes the relationship between variables
Correlation coefficient
- always between -1 and 1
- sign indicates nature of the relationship (negative= inverse, positive=direct)
- magnitude indicates strength of association
0.8 or more (or -0.8 or less) is generally considered highly correlated
Coefficient of determination
Expresses the proportion of variance in one variable (Y variable, outcome variable) that is explained by another variable (X variable, exposure variable)
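The correlation coefficient, the regression line, and the coefficient of determination can all be computed from the same sums of squares. This is a sketch using the standard Pearson and least-squares formulas (not spelled out in these notes) on made-up data.

```python
import math


def correlation_and_regression(x, y):
    """Return (r, r squared, slope, intercept) for paired values x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)

    r = sxy / math.sqrt(sxx * syy)        # correlation coefficient
    slope = sxy / sxx                     # regression: Y = intercept + slope * X
    intercept = mean_y - slope * mean_x
    return r, r ** 2, slope, intercept


# Hypothetical exposure (X) and outcome (Y) values
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r, r_squared, slope, intercept = correlation_and_regression(x, y)
print(f"r = {r:.3f}")
print(f"r squared = {r_squared:.2f}")
print(f"Y = {intercept:.1f} + {slope:.1f} * X")
```

Here r is positive (a direct relationship) and r squared is 0.6, meaning about 60% of the variance in Y is explained by X.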
Chi squared test
Used to examine the relationship between two categorical variables
- for categorical data, proportions or frequencies rather than means
Null hypothesis: no association- observed frequencies =expected frequencies
Alternative hypothesis: association- observed frequencies not equal to expected frequencies
- non parametric test- makes no assumptions about normal distribution of data because it is not continuous
- cell sizes should be greater than 5
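A sketch of the chi squared calculation: expected frequencies come from the row and column totals, and the statistic sums (observed - expected)^2 / expected over all cells. The table is made up; 3.841 is the table critical value for 1 degree of freedom at alpha = 0.05.

```python
def chi_squared_statistic(table):
    """Return (chi squared, expected frequencies) for a table of counts."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    total = sum(row_totals)

    chi_squared = 0.0
    expected_table = []
    for i, row in enumerate(table):
        expected_row = []
        for j, observed in enumerate(row):
            # Expected count if there were no association
            expected = row_totals[i] * column_totals[j] / total
            expected_row.append(expected)
            chi_squared += (observed - expected) ** 2 / expected
        expected_table.append(expected_row)
    return chi_squared, expected_table


# Hypothetical counts: rows = exposed / not exposed,
# columns = disease / no disease
observed = [[30, 70],
            [10, 90]]

chi_squared, expected = chi_squared_statistic(observed)
CRITICAL_CHI_SQUARED = 3.841  # from a table: alpha = 0.05, df = 1

print(f"Expected frequencies: {expected}")
print(f"Chi squared = {chi_squared:.2f}")
print("association" if chi_squared > CRITICAL_CHI_SQUARED
      else "no evidence of association")
```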
Causation
The act of causing something
Causality
The relationship between the cause and effect (refers to a truly meaningful relationship)
Correlation or association
A statistical dependence between two or more events, characteristics or other variables
- correlation or association are merely observations of coincidences that may or may not be related
Apparent association
“Eyeballing” data, maybe no statistical association
Statistical association
Correlation, but not necessarily causation
Causal association
- also statistical association
- direct association (straight from factor to disease)
- indirect association (steps between the factor and disease)
Sequence of studies
Clinical observations -> available data -> case control studies -> cohort studies -> randomized trials
Koch’s postulates
- the organism is always found with the disease
- the organism is not found with any other disease
- the organism, when isolated from one who has the disease and cultured through several generations, produces the disease in a new host
Work better for infectious diseases than chronic diseases because chronic diseases have multiple causes, so causation is more complex
Hill’s criteria for causality
~Strength- the stronger the relationship between the independent and dependent variable, the less likely it is due to chance.
~consistency- multiple observations of association by different people under different circumstances increase belief that the association is real.
~specificity- showing an outcome is predicted by one primary factor adds credibility to causal claim.
~temporality- the cause must precede the effect.
~biological gradient (dose-response)- there should be a direct relationship between the degree of exposure and the probability of outcome (greater exposure=greater disease).
~plausibility- a rational, theoretical basis for association between the risk factor and the disease.
~coherence- the association does not conflict with other knowledge about the risk and disease.
~experiment- research based on experiments makes the causal inference more believable.
~analogy- applying a commonly accepted association in one area to another similar area.
Cause of disease
Not a single component, but a minimal set of conditions or events that produces an outcome
(Pie is whole cause and slice is component cause)
Could be a number of sufficient causes for one disease ( multiple pies)
Sufficient cause
A complete causal mechanism that inevitably produces a disease
If one of the individual component factors is missing, disease does not occur
Necessary cause
A required component cause for a disease to occur
Present in ALL sufficient causes
Ex) Ebola doesn’t occur without Ebola virus