Midterm Deck: Sessions 1-7 Flashcards

1
Q

What is considered the purview of environmental epi?

A

Anything external to your skin about which you have no autonomous choice

2
Q

Pros to dietary surveys (6)

A
  • ecological overview of consumption vs. crop growth
  • easy to do
  • cheap to do
  • individual level data
  • culturally appropriate
  • Can play with info over the life course
3
Q

Cons to dietary surveys (3)

A
  • Recall bias (cases and controls remember exposure differently)
  • Measurement error
  • Social desirability bias
4
Q

Pros to crop sampling (4)

A
  • Valid lab data
  • Geographical breadth
  • Cheap
  • Easy
5
Q

Cons to crop sampling (5)

A
  • Lab Analyses can be expensive
  • Could have lab error
  • Timing Issues
  • Can be ecologic and not specific
  • Doesn’t take into account food prep or storage
6
Q

Pros to Biomarkers (3)

A
  • Know the bioeffective dose
  • Gold standard for exposure
  • Good quantitative data
7
Q

Cons to Biomarkers (8)

A
  • Expensive
  • Complicated (logistically)
  • Ethically questionable
  • Invasive (may lead to participant selection bias)
  • Timing may affect sampling (level of biomarker may shift in sample over time)
  • Measurement Error
  • Batch Effects/Freeze/Thaw bias
  • Construct problem
8
Q

What were the three primary exposure sampling methods used to deconstruct the aflatoxin story?

A
  • food intake/dietary survey
  • Biomarker Sampling
  • Crop Sampling
9
Q

How does aflatoxin cause cancer?

A

• As aflatoxin is metabolized from a non-water-soluble compound into a water-soluble one, a reactive intermediate covalently binds to guanine in DNA → aflatoxin-DNA adduct → so aflatoxin causes cancer because it damages DNA and causes mutations.
• The gene p53 is mutated in ½ of all cancer patients; it's a key gene in cell replication. Damage to p53 is a key step in carcinogenesis.
• Mutations in p53, particularly guanine mutations at codon 249, were often linked to aflatoxin contamination.

10
Q

Construct Validity Problem - what is it, and where do we see it?

A

Construct validity is “the degree to which a test measures what it claims, or purports, to be measuring.”

So, for biomarker tests, you have to be careful that your test is actually measuring what you think it measures… i.e. aflatoxin presence in urine may mean that you metabolize it better, not that you've eaten more of it. You have to be careful about what your measurement is actually telling you in environmental epi.

11
Q

Latency Period

A

Def: the period of time between disease onset and diagnosis
o Can involve years or decades → must often assess historical exposures
o Exposures that occur during the latency period may not be relevant → it is important to know the natural progression of disease

12
Q

Name 4 nuances of exposure assessment

A

1) Latency
2) Incomplete Data
3) Exposure Metric
4) Interactions

13
Q

Describe “Nuances of exposure assessment” # 1:

A

Latency: the period of time between disease onset and diagnosis
• Can involve years or decades → must often assess historical exposures
• Exposures that occur during the latency period may not be relevant → it is important to know the natural progression of disease

14
Q

Describe “Nuances of exposure assessment” #2:

A

Incomplete Data

often individual level data is not available, must often work with indexes or exposure scales (job title; proximity to a source)

15
Q

Describe “nuances of exposure assessment” #3

A

Exposure Metric Variation:

mean vs. cumulative (i.e. cig pack-years over time) vs. peak (acute, when damage only happens above a certain threshold) vs. lagged (exposure today doesn't impact disease risk tomorrow, it impacts disease risk years from now) exposure
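A minimal sketch of these four exposure metrics, using a hypothetical series of yearly exposure levels:

```python
# Sketch: four ways to summarize the same exposure history
# (hypothetical yearly exposure levels; units are arbitrary).

def mean_exposure(levels):
    """Average level across the observation period."""
    return sum(levels) / len(levels)

def cumulative_exposure(levels):
    """Total dose, e.g. pack-years = packs/day summed over years."""
    return sum(levels)

def peak_exposure(levels):
    """Highest single-period level -- relevant for acute effects
    where damage only occurs above a threshold."""
    return max(levels)

def lagged_exposure(levels, lag):
    """Exposure `lag` periods before the end of follow-up --
    today's exposure affects disease risk years from now."""
    return levels[-(lag + 1)]

yearly = [0, 2, 5, 5, 1, 0]   # e.g. packs/day in six successive years
```

The same history gives very different numbers depending on the metric chosen, which is the point of this nuance.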

16
Q

Describe “nuances of exposure assessment” #4:

A

Interactions

Failure to consider interactions can hide relationships…as interactions can have effect modification on outcomes (ie a given level of exposure has a different effect if you do/do not have another issue). Interactions can take place amongst genetic susceptibility, age, and concurrent disease, etc.

17
Q

Why is it key to take interactions into account when assessing exposure?

A

This is key because protections should be set up to help the most vulnerable (i.e. those impacted by interactions), not the average person (so if old people are impacted by an exposure more than young, that should be taken into account)

18
Q

8 ways to assess exposure amounts

A
o	1) Environmental Monitoring
o	2) Environmental Modeling
o	3) Questionnaires and Job Records
o	4) Biomarkers
o	6) The Exposome
o	7) Complex Mixtures
o	8) Environment Wide Association Studies (EWAS)
19
Q

Environmental Monitoring Represents & Requires…

A

• Represents exposure level but not dose absorbed by the individual
• Monitoring is expensive, and requires expert assistance in set-up, quality control, and analyses
• Must make sure the monitoring system is aesthetically and culturally appropriate for the study population.

20
Q

Ambient Air monitoring is usually… ?

A

• Ambient is usually for a zone or room while micro-environments might be more important

21
Q

Personal air monitoring can place ….?

A

• Personal monitoring can place a high burden on participants and is usually only a snapshot

22
Q

Occupational routine air monitoring nuances

A

• In occupational settings, routine monitoring is often non-random and covers only “trouble spots” or high-exposure areas (so you don’t have a good idea of true exposure)

23
Q

General Air monitoring stations can…?

A

• Air monitoring stations can provide ecological data but are often not sited randomly (or placed in the most useful spots)

24
Q

4 Types of Environmental Modeling

A

Dispersion Models
Interpolation Models
Land Use Regression
Kriging

25
Q

What is a dispersion model?

A

Models the movement of a pollutant in a medium (air, groundwater) based upon the physical properties of the pollutant and the medium
• Model derived estimates can be used to replace actual sampling
• Usually used when there is a point source for exposures (i.e. how might a contaminant move through the water supply?)
• Often yields ecological type data for a given area such as a zip code or a census tract
• Dependent on the assumptions of the dispersion model
• Models need to be validated
• Can reduce the number of required samples but still require samples (to validate and prove your model works)
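A point-source dispersion model can be sketched in miniature with the classic Gaussian plume equation. The emission rate, wind speed, stack height, and linear dispersion coefficients below are illustrative assumptions, not a validated model:

```python
import math

# Minimal Gaussian plume sketch: concentration downwind of a point
# source, with crude linearly growing dispersion coefficients.
# All parameter values are made up for illustration.

def plume_concentration(Q, u, x, y, z, H, a=0.08, b=0.06):
    """Concentration at (x, y, z) downwind of a stack (x > 0).

    Q: emission rate (g/s), u: wind speed (m/s),
    x: downwind, y: crosswind, z: receptor height (m), H: stack height (m).
    sigma_y = a*x and sigma_z = b*x are crude stand-ins for real
    stability-class dispersion curves."""
    sig_y, sig_z = a * x, b * x
    coef = Q / (2 * math.pi * u * sig_y * sig_z)
    cross = math.exp(-y**2 / (2 * sig_y**2))
    # second vertical term reflects the plume off the ground
    vert = (math.exp(-(z - H)**2 / (2 * sig_z**2))
            + math.exp(-(z + H)**2 / (2 * sig_z**2)))
    return coef * cross * vert
```

Running the model over a grid of receptor points is what yields the ecological-type estimates for an area mentioned above.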

26
Q

What is an interpolation model?

A

Estimate exposures at non-sampled locations of interest based upon available environmental samples (model exposure at an intermediate point between two points)
• Often used for estimating urban air pollution at given locations of interest (i.e. study subjects home)
• Dependent on the assumptions in the model
• Can use available monitoring data or structured planned monitoring across a target area (city or region)
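One simple interpolation scheme is inverse-distance weighting (IDW): estimate the value at an unsampled point as the distance-weighted average of nearby monitor readings. The monitor coordinates and readings below are hypothetical:

```python
import math

# Inverse-distance weighting: nearer monitors get larger weights.

def idw(monitors, point, power=2):
    """monitors: list of ((x, y), value); point: (x, y) to estimate."""
    num = den = 0.0
    for (x, y), value in monitors:
        d = math.hypot(x - point[0], y - point[1])
        if d == 0:
            return value            # target sits exactly on a monitor
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den

monitors = [((0, 0), 10.0), ((4, 0), 30.0)]
# halfway between the two monitors the weights are equal -> 20.0
```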

27
Q

What is land use regression?

A

uses land use data such as highway proximity, pollution point sources, commercial land area, green space and tree cover to determine pollution levels at given points of interest
• The regression model is built and validated using known pollution levels from environmental sampling sites (have monitors set up to establish that the model works)
• The regression is then applied to estimate the pollution levels at non-sampled sites
• NYC has an annual air pollution monitoring program and has built LUR models for the entire city (the NYCCAS program)
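A toy land use regression, fit on synthetic monitored sites and applied to a non-sampled site. The predictors and all numbers are invented for illustration, not taken from any real program:

```python
import numpy as np

# Toy LUR: fit pollution at monitored sites as a linear function of
# land-use predictors, then predict at non-sampled sites.

# rows = monitored sites; columns = [intercept, km to highway, % tree cover]
X = np.array([[1.0, 0.2, 10.0],
              [1.0, 1.0, 30.0],
              [1.0, 2.5, 60.0],
              [1.0, 0.5, 20.0]])
y = np.array([35.0, 25.0, 12.0, 30.0])        # measured pollutant levels

beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # fit (validate!) on sampled sites

def predict(features):
    """Apply the fitted regression to a non-sampled site's land-use data."""
    return float(np.asarray(features, float) @ beta)
```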

28
Q

What is kriging?

A
  • Uses the spatial autocorrelation of data from sampled points to estimate the data values at non-sampled points.
  • Researchers have used Google Street View to visualize systematically sampled blocks of cities and measured the physical disorder on each block, then used kriging to estimate physical disorder at any location in the city (urban decay is spatially correlated, so a given point can be estimated from its surrounding points’ results).
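Real kriging first estimates a variogram from the data; the sketch below skips that step and assumes a known zero-mean process with an exponential covariance whose parameters are made up. That keeps the core idea visible: solve for weights from the spatial covariance structure, then combine the sampled values:

```python
import numpy as np

# Simple kriging under strong simplifying assumptions (known mean 0,
# assumed covariance). Illustrative only.

def cov(h, sill=1.0, rng=3.0):
    """Assumed exponential covariance between points distance h apart."""
    return sill * np.exp(-h / rng)

def simple_krige(coords, values, target):
    """Predict the value at `target` from sampled points.

    Solves C w = c0 for the kriging weights w, then returns w . values."""
    coords = np.asarray(coords, float)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    C = cov(d)                                               # sample-sample
    c0 = cov(np.linalg.norm(coords - np.asarray(target, float), axis=1))
    w = np.linalg.solve(C, c0)                               # weights
    return float(w @ np.asarray(values, float))

pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
vals = [1.0, 2.0, 0.5]
```

At a sampled location the weights collapse onto that sample, so the prediction reproduces the observed value, which is the defining property of an exact interpolator.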
29
Q

Nuances of using Questionnaires to determine exposure

A
  • Can suffer from recall bias and foggy memory
  • Often yield exposure indexes or scales (high, medium, low), but rarely actual exposure levels
  • Is also a challenge to measure things people don’t see – i.e. stealth exposures
30
Q

Nuances of using job records to determine exposure

A
  • Can provide long-term exposure info, and when combined with monitoring data can provide measures of cumulative exposure → leads to job-exposure matrices in occupational studies
  • Job-exposure matrices take a lot of time and effort to make
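At its simplest, a job-exposure matrix is a lookup from job title to an assumed exposure intensity, and a work history of (job, years) pairs yields a cumulative exposure. The titles and intensities below are hypothetical:

```python
# Hypothetical job-exposure matrix: intensity (say, mg/m3) per job title,
# derived in practice from monitoring data.
JEM = {
    "welder": 4.0,
    "machinist": 1.5,
    "office clerk": 0.1,
}

def cumulative_exposure(work_history, jem):
    """Sum of intensity x duration over each job held."""
    return sum(jem[job] * years for job, years in work_history)

history = [("welder", 10), ("office clerk", 5)]
# 4.0*10 + 0.1*5 = 40.5 intensity-years
```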
31
Q

What is a biomarker?

A

An analytical target measured in a biological medium; allows for quantitative data out of biological assays

32
Q

Biomarker Pros (3)

A
  • Can provide information on internal dose and biologically effective dose
  • Not subject to recall bias, and if well done, not subject to information bias at all
  • Storage repositories allow for great efficiency and multiple studies
33
Q

Biomarker Cons (7)

A
  • Missing samples can cause selection bias (i.e. did people decline participation due to not wanting to give a sample, and do those people differ from those who agreed to give a sample?)
  • Few biomarkers can provide information on historical exposure
  • Half life is important to consider and can range from 10 years to two days
  • May be adversely affected by disease (case-control and cross-sectional studies); Must be careful of causal chain – is the biomarker impacting the disease or is the disease impacting the biomarker?
  • May be adversely affected by long-term storage (nested case-control studies)
  • Must be very careful in collection, handling, and storage – consistency is key, and must be able to keep samples viable
  • Must be very careful in analysis to make sure batch effects don’t bias data
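The half-life point above is simple arithmetic; a quick sketch of the fraction of a biomarker remaining some time after exposure:

```python
# Fraction of a biomarker remaining t units after exposure, given its
# half-life (t and half_life in the same units).

def fraction_remaining(t, half_life):
    return 0.5 ** (t / half_life)

# A marker with a 2-day half-life is ~3% of its peak after 10 days,
# so it cannot reflect historical exposure; a 10-year half-life can.
```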
34
Q

What is the Exposome?

A
  • A proposal by Christopher Wild to develop technology to measure all environmental exposures that a person encounters across the life span (if a genome is all your genes, the exposome is all your exposures)
  • A reaction to the growth of genomic and other ‘omic technologies at the expense of understanding the role of environmental exposures
35
Q

What might exposome technologies include?

A

It’s not clear, but it might include:
• Metabol-omics: measurement of all metabolites of all the chemicals a person was exposed to (all chemicals leave a metabolic trace in the body)
• Expression-omics: measures of changes in protein expression as signatures of responses to exposures
• Adduct-omics: measurement of all species of DNA or protein adducts present in a biological sample (i.e. aflatoxin adducts)
• Life-logging and passive monitoring technologies (GPS, passive environmental units)

36
Q

Exposome Critique

A

many proposed technologies focus on the internal dose, biologically effective dose, or response to a dose and are conceptually too far removed from actual exposure.

37
Q

Nuances of Complex Mixtures Studies

A
  • Few environmental exposures occur in isolation → exposed to complex mixtures of chemicals where exposure levels across mixtures are correlated
  • One exposure at a time regression isn’t sufficient, but entering multiple exposure variables can lead to issues of multi-collinearity
  • How to analyze complex mixtures is a growing field; trying LASSO, variable selection, and machine learning to identify “bad actors” in mixtures.
  • Use cluster and principal component analysis to identify salient patterns of exposure/see if they impact outcomes.
38
Q

What is an Environment Wide Association Study?

A
  • Patel et al
  • Used NHANES data
  • Looked at 266 environmental factors measured in blood/urine
  • Ran a separate logistic regression model for each factor as a predictor of Type II diabetes status; included nutritional factors in analyses as environmental factors
  • Any time a chemical was associated with type II diabetes in 2+ NHANES cohorts, it was considered validated and an exposure that mattered.
  • Found 2 chemicals associated with type II diabetes through this hypothesis-free evaluation
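The replication rule above can be sketched as follows. For simplicity each test here is a 2x2 odds-ratio z-test rather than the logistic regression Patel et al. actually ran, and all counts are made up:

```python
import math

# Screen many chemicals; call one "validated" only if it is associated
# with disease in >= 2 cohorts. Illustrative logic, not the real EWAS code.

def or_pvalue(a, b, c, d):
    """Two-sided p-value for the 2x2 table [[a, b], [c, d]] via the
    normal approximation to log(odds ratio)."""
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = abs(log_or / se)
    return 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))

def validated(results_by_cohort, alpha=0.05):
    """results_by_cohort: {chemical: [2x2 tables (a, b, c, d), one per cohort]}
    Returns chemicals significant in >= 2 cohorts."""
    out = []
    for chem, tables in results_by_cohort.items():
        hits = sum(or_pvalue(*t) < alpha for t in tables)
        if hits >= 2:
            out.append(chem)
    return out
```

Requiring replication across cohorts is a crude guard against the multiple-comparisons problem that a hypothesis-free scan invites.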
39
Q

Cohort Studies - what are?

A

o Cohort Studies: find a bunch of people with an exposure, and follow them forward in time to see if disease occurs.

40
Q

Cohort Studies Strengths

A
  • Easier to confirm exposure as being prior to disease (temporality is maintained)
  • No selection bias
  • Can focus on rare exposures such as in occupational cohorts
41
Q

Cohort Studies Weaknesses

A
  • Not efficient for rare diseases (and many environmentally related diseases are rare)
  • Must collect exposure data on many subjects who remain healthy, and have massive data collection processes
  • Biomarker Studies can require vast resources.
42
Q

5 nuances of occupational cohorts

A

o Goes to where the exposures are – occupational exposures often have exposure levels that are orders of magnitude higher than ambient exposures
o Often there are industrial hygiene records that provide some data on exposure (database on site)
o Exposure assessment can be time consuming, and job-exposure matrices require large amounts of data
o External control group uses expected rates in the general population as a comparison; healthy worker effect can be an issue
o Internal control group compares the lowest exposure group to highest exposure group; often no non-exposed group on site and aspects of the healthy worker effect can still operate

43
Q

Case-Control Strengths

A
  • Efficient for studying rare disease
  • Potential for more in-depth exposure assessment
  • Fast to conduct
44
Q

Purpose of Controls

A

• Purpose of controls: to estimate the prevalence of exposure in the source population that generated the cases

45
Q

Control Selection

A
  • Don’t need to match on everything – just need to match on referral patterns
  • Need people who represent the underlying population of interest… i.e. “if they had the disease, would they be in my study?”
46
Q

Case-Control Weaknesses

A
  • Control selection is challenging (when reading a case-control study, make sure you agree that the control group is a good one!)
  • Not sufficient for studying rare exposures
  • Exposure assessment is retrospective and often questionnaire based
  • Few biomarkers are able to reflect historical exposures (hard to prove temporality)
47
Q

o Case-Case or Case Series Studies Strengths:

A

o No controls
o More precise estimates of gene-environment interactions
o Efficient for studying rare diseases
o Potential for more in-depth exposure assessment
o Relatively fast to conduct

48
Q

Case-Case or Case Series Studies Weaknesses

A

o Requires more detailed hypotheses
o Requires some means to define two case groups
o Odds ratios are more difficult to interpret than those generated from a case-control study
o Not efficient for studying rare exposures
o Exposure assessment is retrospective and usually questionnaire based
o Few biomarkers able to reflect historical exposures

49
Q

Nested Case-control studies

A

o Optimizes strengths/alleviates weaknesses of case-control and cohort studies.

50
Q

Cross Sectional Study Strengths

A
  • Less time consuming and costly

  • Can be used to screen many exposures and diseases at the same time

51
Q

Cross Sectional study weakness

A

Causal pathway is less clear because temporality can’t be established

Despite its causal limitations, our understanding of one of the most important disease-environment relationships (lead and developmental effects) came out of this design

52
Q

Ecological Study: Multi-Group/Cross Sectional Ecological Studies Strengths

A

• Strengths
o Easy access to large amounts of aggregate data
o Cheap
o Fast
o Can identify areas for more intensive investigation and generate hypotheses

53
Q

Ecological Study: Multi-Group/Cross Sectional Ecological Studies Weaknesses

A

o Ecological bias
o Difficulty in controlling for confounding
o Cheap cost and speed can lead to issues of multiple comparisons
o Despite weaknesses ecological air studies provided data that influenced current regulations

54
Q

Ecological Study: Multi-group/cross-sectional studies look at how

A

How does the rate of disease in an area compare with the exposure in that area?

55
Q

Ecological Study: Time trend studies look at how

A

How does exposure variation impact health variation over time? i.e. does exposure on day 1 impact disease outcome on day 2?

56
Q

Ecological Study: Time trend strengths

A

o Can make use of readily available data sets
o Each subject or group uses itself at another time as control
o Good for acute effects

57
Q

Ecological Study: Time trend weaknesses

A

o Many trends co-vary (such as pollution levels and weather conditions)
o Not good for diseases with long latency periods
o Ecological Bias
o Cyclic patterns can blur results

58
Q

Ecological Study: Cluster Study Strengths

A

o Potential for high exposures, particularly in occupational clusters
o Can be fast and cheap to conduct
o Often identifies area for further analytic study

59
Q

Ecological Study: Cluster Study Weaknesses

A

o Texas-sharpshooter effect
o Selection of comparison group can be challenging
o Random distributions can just be lumpy
o Hidden multiple comparisons

60
Q

Difference between a cluster and an epidemic?

A

An epidemic is a cluster that epidemiologists take seriously

Cluster –> unvalidated –> common
Epidemic –> validated cluster –> rare

61
Q

Normal Questions to ask during stage one of a cancer investigation

A
o	Rare vs common cancer? 
o	Age at death? 
o	Same types of cancer? 
o	Familial Relation?
o	Length of time together? 
o	Is it really cancer?
62
Q

3 local cluster investigations

A
  • Breast Cancer on UES
  • Lung Cancer on Staten Island
  • Breast Cancer on Long Island
63
Q

Key events in practice of cluster investigation

A

o 1989: National Conference on Clustering of Health Events
o 1990: Conference proceedings published in AJE
o 1990: CDC published guidelines for investigating clusters (“how to cookbook”)
o 2007: CDC addendum on recent experiences
o 2010: Meeting of the Council of State and Territorial Epidemiologists and CDC to update 1990 guidelines
o 2013: Updated CDC Guidelines Published

64
Q

Traditional Cluster Definition

A

o Traditional Definition: perceived or real, greater than expected number of cases for a given set of time and/or space coordinates. Clusters occur in space, time, or time & space

65
Q

Rothman Cluster Definition

A

o Rothman’s Definition (1990): All epidemiologists investigate clusters, but cases should be clustered around etiological factors. Distinguishes between etiologically relevant clusters (i.e. familial disease) and time/space clusters.

66
Q

CDC Cluster Definition

A

o 2013 CDC Definition: a greater than expected number of cancer cases that occur within a group of people in a geographic area over a period of time.
• very cancer focused, as most clusters have to do with cancer, birth defects, or spontaneous abortion
• backs away from Rothman’s POV.

67
Q

5 reasons that clusters are of interest

A

o 1) The hope that cases have clustered in time and/or space because of a common etiologic element that can be pinpointed… we want to identify the cause of the disease
o 2) Can define a new exposure route for an established cause (i.e. we know asbestos causes mesothelioma, so if we see a lot of mesothelioma, we know to look for how people are being exposed to asbestos)
o 3) They generate publicity and fear in the community → this must be addressed
o 4) Can drive policy debates (if you do find a cause, you need to decide what to do about it)
o 5) They’re a high-volume issue… a 2000 paper estimated that there were 1,100 reports of cancer clusters to PH officials in 1997.

68
Q

Seven examples of successful cluster investigations

A

o Osteosarcoma in watch dial painters
o Three cases of angiosarcoma in one vinyl chloride plant
o Investigation of mesothelioma in small town in Turkey
o Investigation of mesothelioma in NJ town
o Cases of clear cell carcinoma of the vagina
o Outbreak of disease among attendees at an American Legion meeting
o Cases of oral cancer among rural women in the southern US

69
Q

Name the 3 types of clusters

A

o Reported or Perceived
o Validated Cluster
o Etiologic Cluster

70
Q

CDC Cluster Investigation Step 1

A

o Step 1: Initial contact and response
• Collect information from the people or groups reporting the cluster, provide education back to the person
• Most investigations stop at this stage

71
Q

CDC Cluster Investigation Step 2

A

o Step 2: Evaluation
• 2a: Preliminary Evaluation
• Computer based evaluation; a quick, rough estimate of the likelihood that an important excess has occurred
• 2b: Case evaluation:
• Verify diagnoses; get out into the field and verify that the reported disease really is the disease (double counting, benign tumors lumped in with sarcomas).
• 2c: Occurrence Evaluation
• Design and perform a thorough investigation to determine if an excess has occurred and to describe the epidemiologic characteristics
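The "quick, rough estimate" in step 2a often boils down to a Poisson comparison of observed vs. expected cases; a sketch with illustrative numbers:

```python
import math

# Given the expected case count for the area and period (from reference
# rates), how unlikely is seeing `observed` or more cases under a
# Poisson model? Counts below are made up.

def poisson_upper_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected)."""
    p_below = sum(math.exp(-expected) * expected**k / math.factorial(k)
                  for k in range(observed))
    return 1.0 - p_below

# e.g. 8 observed cases where 2.1 were expected:
# poisson_upper_tail(8, 2.1) is ~0.0015 -- a notable excess, though
# hidden multiple comparisons can still make such excesses common.
```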

72
Q

CDC Cluster Investigation Step 3

A

o Step 3: Major feasibility study
• Determine whether an epidemiologic study linking the disease & suggested exposure is even feasible…is it even possible?

73
Q

CDC Cluster Investigation Step 4

A

o Step 4: Etiologic Investigation

• Perform a full etiologic investigation of a potential disease exposure relationship

74
Q

Rothman’s Five Reasons why Cluster Investigation Will not be fruitful

A

o 1) Clusters typically involve too few cases
o 2) Cases are likely to be heterogeneous (report is a mixed bag of disease)
o 3) Clusters are often defined by Texas Sharpshooter approach (area defined after the fact)
o 4) Exposures are usually poorly characterized, heterogeneous, and low in concentration (loose hypotheses make it challenging to do the work)
o 5) Clusters generate publicity, making it difficult to collect unbiased data by questionnaire

75
Q

Rothman’s 5 pitfalls of interpreting cluster reports

A

o Assuming that the rarer an observation is, the more likely it is to have an environmental etiology
o Over-reliance on statistical tools to define a cluster and lack of integration of the biologic sciences into cluster interpretation → disease isn’t just temporal/spatial… what is the natural progression of the disease / is it biologically possible?
o Misunderstanding of the limits of statistics – clusters occur continuously and spontaneously in large populations (random clusters DO just occur, and hidden multiple comparisons occur: if you investigate enough you’ll find connections in something, even if all else looks normal)
o Use of an inappropriate reference population (how do you get a good reference population for the controls if you Texas-Sharpshooter-draw a line and include all cases in your area of concern?)
o Misunderstanding of public needs. Often they only need assistance in interpreting their observations.

76
Q

Neutra thinks Cluster Investigation is Useful if (9):

A

o At least 5 cases and relative risk is very high (>20)
o Disease is one for which a unique mechanism or class of agents has been responsible in the past, or if the mechanism is well understood
o The agent is persistent in the environment and can be measured there (counters Rothman’s claim that it’s impossible to get data)
o Agent is persistent or leaves a response in the bodies of those who have been exposed
o There is enough heterogeneity of exposure in the community so that the effect of exposure can be assessed (want a gradation in exposure)
o Exposure is definable from records or questionnaires
o It would be feasible to carry out a multi-community study
o Geographic clusters are easier to work with than time clusters
o If problems identified by Rothman have been screened out of the evaluation process.

77
Q

Rothman ideas on where/how to do a cluster investigation

A

o Rothman would not include the cluster in the investigative studies. Advocates using other exposed communities or occupational studies.
• Allows you to do better quality studies: more controlled, less biased recall

78
Q

Neutra ideas on where/how to do a cluster investigation

A

o Neutra would include the cluster in investigative studies. Advocates multi-community studies.
• Do the cluster in the study
• Helps the department maintain the trust of the community it serves
• What if it’s a specific circumstance in that community that is causing the issue… if you go to another community you might miss the issue → do multi-community studies but include the issue in the community of interest so you don’t accidentally miss something.

79
Q

Why do Goodwin et al bring up Kinlen’s population mixing hypothesis?

A

o Leukemia clusters are often observed in isolated geographic areas that experience a rapid influx of population
o Noted within the context of clusters of leukemia around industry sites
o Suggests an infectious etiology for leukemia
o Goodwin et al bring it up because it offers an explanation other than a chemical cause for leukemia… and that research was funded by chemical companies!!

80
Q

Molecular Epidemiology Definition

A

The use of biomarkers in epidemiologic studies.
• Has been extensively applied to the study of cancer and the environmental etiology of cancer.
• Was created as a way to address limitations in traditional epidemiology; an establishment of a paradigm of how biomarkers can help epidemiology
• Biomarkers are used as a construct (a specific mechanism to tell a story).
• A way to break open the “black box” of causation and link exposure to a mechanism to disease… it’s used to understand HOW an exposure causes disease.

81
Q

Biomarker Definition

A

the measurement of a physical parameter in biological tissue or fluid obtained from a study subject. Could include measuring a level of contamination, a genotype, levels of damaged DNA, protein levels, or the amount of a cellular fraction.

82
Q

Biomarker Continuum

A

o Exposure → Internal Dose → Molecular Dose → Biological Effect → Disease

Effectively a menu system for finding each marker in the chain of disease development, to tell the story of how exposure led to disease

83
Q

Internal Dose

A

How much exposure got into the body (think cotinine and DDE as markers for cigarettes and DDT)

84
Q

Molecular Dose

A

The bioeffective dose, aka the amount that got into the body, reached a critical target, and damaged that target (think DNA or protein adducts)

85
Q

Biological Effect

A

a measurement of a change that occurred in the body because of an exposure (think mutated oncogenes/tumor suppressor genes)

86
Q

Disease Biomarkers

A

Can use biomarkers to stratify disease into subtype (think estrogen receptor plus or minus breast cancer)

87
Q

Susceptibility Biomarkers

A

biomarkers exist that are susceptibility factors (think genetic polymorphisms, nutrition, DNA repair)

88
Q

Carcinogen DNA Adducts

A

Many carcinogens are thought to cause cancer by forming a chemical bond to DNA that damages the DNA and causes mutations in the genetic code (think benzo[a]pyrene covalently bonding to adenine, forming a PAH-DNA adduct that you can measure in the blood as a measure of the biologically effective dose).

89
Q

Genetic Polymorphisms

A
  • Common (>1% prevalence) inherited variations in the coding of genes.
  • Can alter the function of the gene, or have no impact
  • Most interesting are polymorphisms in genes responsible for metabolizing and detoxifying chemicals.
90
Q

Chemical Metabolism Process;

A

o Chemical Metabolism:
• Most chemicals are lipid soluble; the body converts them to water-soluble forms so they can be excreted
• Phase 1: Opens up the ring structures of lipid-soluble chemicals and adds oxygen to the molecules (i.e. the P450/CYP genes). Often causes the formation of reactive intermediates, toxic metabolites that are the things that can cause damage
• Phase 2: The addition of large water-soluble molecules to the metabolites formed in Phase 1; once the water-soluble group is attached, the chemical is no longer harmful.
o So, if Phase 1 & Phase 2 work perfectly, you’re good. But if you have a polymorphism such that they don’t, you can get cancer.

91
Q

Somatic Mutations

A
  • Acquired alterations in the coding of genes that often disrupt the function of the genes
  • Commonly caused by improper DNA replication/repair, exposure to radiation, or exposure to other chemicals → nowadays the target of precision medicine
  • Specifically interested in mutated oncogenes and tumor suppressor genes
92
Q

Oncogenes

A

normal genes that when mutated can cause or contribute to carcinogenesis (cells divide rapidly and aggressively because growth control is removed; think ras p21 and erbB2)

93
Q

Tumor Suppressor Genes

A

genes that suppress the formation of tumors through functions such as coordinating DNA repair, halting the cell cycle, or inducing cell suicide (apoptosis). When mutated, the genes no longer perform their function, and this can contribute to malignancy (think p53 and the retinoblastoma gene, rb) → effectively the “brakes” for the body; if they are mutated, your body can’t put a “brake” on cancer cell replication.

94
Q

Urinary Metabolites - Measurement Challenges

A

• Challenges: hydration status affects urine volume and how dilute the urine is → typically adjusted by an indicator of hydration status or urinary dilution (creatinine level and specific gravity are the most common adjustors) → however, it is now clear that these adjustors are themselves influenced by age, gender, race/ethnicity, and body size!
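As a sketch of how such an adjustment works (hypothetical numbers; `creatinine_adjust` is an illustrative name, not a standard function), dividing the analyte concentration by urinary creatinine expresses it per gram of creatinine, which cancels out hydration-driven dilution:

```python
# Illustrative only: creatinine adjustment of a urinary metabolite.
# Dividing by creatinine expresses the analyte per gram of creatinine,
# so dilution due to hydration cancels out of the comparison.
def creatinine_adjust(analyte_ug_per_l, creatinine_g_per_l):
    """Return the creatinine-adjusted concentration (ug/g creatinine)."""
    if creatinine_g_per_l <= 0:
        raise ValueError("creatinine concentration must be positive")
    return analyte_ug_per_l / creatinine_g_per_l

# A dilute and a concentrated spot sample from the same person
# yield the same adjusted value:
print(creatinine_adjust(10.0, 0.5))  # 20.0
print(creatinine_adjust(40.0, 2.0))  # 20.0
```

The card’s caveat still applies: because creatinine excretion itself varies by age, gender, race/ethnicity, and body size, the adjusted value is not a pure exposure measure.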

95
Q

Biomarker Pros

A

o Allows for measurement of poorly remembered exposures, of exposures an individual is not aware of, or exposures a subject might misreport, potentially reducing measurement error
o Can provide an integrated measure of exposure from multiple sources (sum total of exposure; an integrated dosimeter)
o Can provide estimates of the amount of a chemical that actually hit a given critical target (a bioeffective dose)
o Can allow for measurement of constructs not measurable by other means
o Can prevent recall bias
o Can be used to test mechanistic hypotheses
o Can aid in defining disease status, or refining disease definition
o Can be used to detect pre-clinical effects of disease

96
Q

Biomarker Cons

A

o Requires biological samples
o Requires laboratory analyses
o Intra and Inter lab variability, lab drift
o Expensive, requires infrastructure and personnel
o Added selection biases:
• Biomarker Refusers
• Co-morbid conditions hamper sample collection (i.e. may not be able to sample the sickest person because of comorbid conditions)
• Loss/inadequate sample amount
• Interesting samples are overused
• The “case-series and some other people” study (i.e. can get a bad design…lots of cases, and very few controls, or a strange convenience sample of controls)
• Intra-individual sampling (within same person may see a difference due to time of day, etc.)
o Increased ethical concerns, particularly with regards to genetic tests
o Complex toxico-kinetics, half-life issues.

97
Q

Biomarkers as measures of constructs - critique

A

o Tendency is to see them as real things, with real meaning
o However, you should see it as a scale → it’s a construct through which we interpolate causation → you have to remember to ask: is your construct valid?
o We have a theoretical exposure construct that we try to approximate through the measured exposure. We want to know the theoretical disease construct, but what we actually can analyze is the measured disease. Therefore, what we are measuring may not in reality be telling us what we want to know…

98
Q

Lab assay issues

A

o Even for routine assays there is within lab and between lab variability
o Samples are run in batches which are constrained by the number of hours in the day, the technology, enrollment rates, the need for administrative reports. Samples must be carefully assigned to batches to balance out batch effects.
o Validity studies are common in molecular epi but reliability studies are rarer
o It’s questionable whether biomarkers are actually more reliable than questionnaires.
o = Lab results cannot be considered an objective truth that makes them better than other forms of exposure assessment.
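One practical way to “carefully assign samples to batches,” as the card says, is to randomize samples to batches so that case/control status and storage time are spread across them. A minimal sketch (illustrative only; real studies often use stratified or blocked randomization instead):

```python
import random

def assign_batches(sample_ids, batch_size, seed=0):
    """Randomly shuffle samples, then split into consecutive batches so
    cases and controls (and storage times) are mixed across batches
    rather than run in the order samples arrived."""
    rng = random.Random(seed)  # fixed seed so the assignment is auditable
    ids = list(sample_ids)
    rng.shuffle(ids)
    return [ids[i:i + batch_size] for i in range(0, len(ids), batch_size)]

batches = assign_batches(range(10), batch_size=4)
print(batches)  # 3 batches: sizes 4, 4, 2
```

Randomization does not remove batch-to-batch drift, but it prevents batch from being systematically confounded with case status or enrollment date.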

99
Q

One way to handle the inefficiency of using biomarkers in large cohorts, otherwise known as incidence density sampling

A

A nested case-control study (aka incidence density sampling)
• Do a case-control study confined to the people in the cohort, matching controls to cases on follow-up time as the cases arise.
• If a control later becomes diseased, that person can be both case and control (you are sampling person-time, not persons).
• Controls can also be re-used, since sampling depends on time.
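The risk-set logic can be sketched in a few lines (hypothetical toy cohort; not software from the course). For each case, controls are drawn from subjects still at risk at the case’s event time, so a sampled control may later become a case, and may be sampled for more than one risk set:

```python
import random

def incidence_density_sample(cohort, n_controls=1, seed=0):
    """cohort: dict id -> event time (None if censored at end of follow-up).
    For each case, sample controls from the risk set: subjects who have not
    yet had the event at the case's event time. A control can later become
    a case, and can be re-used across risk sets (sampling person-time)."""
    rng = random.Random(seed)
    cases = sorted((t, i) for i, t in cohort.items() if t is not None)
    matched = {}
    for t, case in cases:
        risk_set = [i for i, ti in cohort.items()
                    if i != case and (ti is None or ti > t)]
        matched[case] = rng.sample(risk_set, min(n_controls, len(risk_set)))
    return matched

# Subject 3 is eligible as a control for subject 1's risk set (event at
# t=2.0) even though subject 3 becomes a case later, at t=5.0:
cohort = {1: 2.0, 2: None, 3: 5.0, 4: None, 5: None}
print(incidence_density_sample(cohort))
```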

100
Q

Matching in Nested Case-Control Studies

A
  • All nested case-control studies MUST match controls to cases on length of follow-up in cohort.
  • In addition it is typical to match on age and gender (but this is not required).
101
Q

What are you doing in a case-cohort study?

A

• This is where you randomly select 10% of the cohort as your subcohort. You only analyze data from people who were randomly selected as your “subcohort”. In this situation, you take all cases, and use the sub-cohort as your “forever” control.

102
Q

Bottom line of Nested Case Control Studies

A
  • Nested Case-Control Studies
  • Match on length of follow up
  • Can match on other factors as well
  • Allows for calculation of the IRR
  • Allows for Efficient control for confounders (because time is fixed!)
103
Q

Bottom line of Case-Cohort Studies

A
  • Involve no matching
  • Use a random-sample of the cohort (a “sub-cohort”) at baseline as the referent group
  • The Sub-Cohort can be used as a referent group for future case series
  • Prevalence of exposure can be estimated and external comparisons can be made (can extrapolate to the whole cohort)
  • Time is NOT fixed (i.e. if 10% of the cohort becomes diseased, they can show up 2x, you don’t kick them out).
104
Q

Concerns due to Matching in Nested Case-Control Design

A
  • The control series is not intuitively understood and is difficult to work with
  • Controls are not representative of the cohort population
  • The control series has few other uses, so the investment in biomarker analyses cannot be leveraged for other research
  • Overmatching can happen (when matching imposes a sample size penalty…may not be able to find any controls in a reasonable sample size).
105
Q

• Case-Cohort relies on the assumption that exposure can be equally well measured in the sub-cohort as in the cases and subsequent case-series, but 3 issues make this assumption questionable:

A
  • Batch Effects
  • Storage Effects
  • Freeze-Thaw Cycles
106
Q

Batch Effects

A

o Samples are run in batch or groups due to logistics

o Average results can unfortunately vary by batch

107
Q

Storage Effects

A

o Biological samples are typically stored at -70 degrees C or lower; however, not all biomarker targets are stable for 10-15 years of storage at this temperature
• Evidence that antioxidant micronutrients in serum, cotinine, and B[a]P-DNA adducts are stable
• Evidence that serum cholesterol, free PSA, serum sex hormones, salivary antibodies, and immunohistochemical targets in tissue sections are not stable

108
Q

Freeze-Thaw Cycles

A

o As biological samples freeze and thaw, the pH and ionic balance of the liquid phase of the sample can become very different from the natural condition of the sample. Changes in pH and ionic balance can degrade biomarker targets.
o There is evidence that lipoprotein(a), antibodies, endogenous antioxidants, saliva cortisol, EGFR, and DNA quality degrade during freeze-thaw cycles.

109
Q

Incentives to run subcohort samples right away in prospective case cohort studies (due to biological markers)

A

o Spreads the work out over the study period, reduces pressure on the lab
o Allows for cross-sectional analyses of determinants of biomarkers to begin
o Can perform cross-sectional analyses of prevalent cases and sub-cohort
o What else are you going to do with your time?

110
Q

Problems with analyzing the subcohort right away in prospective case cohort studies? (due to biological markers)

A

o Most cases will accrue toward the end of the study period and will not be subjects included in the sub-cohort → case and sub-cohort samples end up analyzed in different batches, and case samples have longer storage durations.

111
Q

Issues for biomarkers in retrospective case-cohort studies?

A
  • Pro: Since the sub-cohort and cases are identified retrospectively, biological samples can be randomized to batches, removing bias
  • Con: Depending on how long it took to assemble the cohort, storage duration could vary by several years among subjects
  • Con: Certain samples may have undergone several freeze-thaw cycles
112
Q

What is good about the fact that time is not fixed in case-cohort analyses?

A
  • Time is not fixed by design. Can use duration of follow-up, age, time since event of interest, or any other scale → can imagine time however you want.
  • Changing the time scale will alter which subjects are included in risk sets and the calculation of person years
113
Q

Biomarker Logistics in Nested Case-Control Studies

A
  • Matching on length of follow-up typically means that the cases and controls are matched on sample storage duration
  • Because cases and controls are identified simultaneously, samples can easily be matched on batch
  • It is also possible to match on number of freeze-thaw cycles
  • BUT → complexities arise when a subject is both case and control, appearing multiple times on a data set, and any matching schematic must be respected in all analyses.
114
Q

Most useful applications of molecular epi are:

A
  • Exposure assessment when exposure is ambient or when exposure comes from multiple sources that are hard to characterize
  • The ability to assess the joint effects of genetic susceptibility and exposure
  • To conduct mechanistic studies
115
Q

Significant challenges of molecular epi that are not recognized are:

A

Perhaps the most troubling of which are the additional potential for selection bias and the widespread view that lab results represent some objective truth, when really they’re a construct (with measurement error and questionable reliability).

116
Q

For biomarkers not affected by batch, storage, and freeze-thaw cycles, what design should you use?

A

use the case-cohort design
• Can estimate both risk ratio and rate ratio
• Sub-cohort can be used as a referent for multiple-case series
• Simple random sample allows for valid cross-sectional analyses and external comparisons
• It best leverages the investment in biomarker analyses (best ROI for samples)

117
Q

For biomarkers affected by batch, storage, or freeze-thaw cycles, what design should you use?

A

the nested case-control appears to offer the best approach to control biases that could be introduced by lab assays:
• Matching allows for efficient control of confounders
• But the controls have few other uses (can’t represent the whole cohort)
• But Must respect the matching though, and much more thought is needed (i.e. may need to build a social graph of all study subjects).

118
Q

Above all, what should you consider in developing a biomarker study?

A

the durability of your biomarker.

119
Q

Why is it important to understand the genes that might contribute to an increased susceptibility to environmental agents?

A

o Understanding the genetic susceptibility to environmental agents will allow us to better identify environmental agents that cause disease and to more accurately estimate the true risks of exposure

120
Q

Specifics of Polymorphisms

A
  • High Prevalence (>1%)
  • Low Expected Individual Risk (it can’t be so bad, or tons of people would be dying from it)
  • Appreciable expected population attributable risk (because it’s so prevalent in the population)
  • Cohort, Case-Control, and Case-Case studies are most appropriate ways to assess Gene-Environment Interactions here
  • Don’t need to worry about individual genetic counseling, but do require some consent form requirements
121
Q

Specifics of Mutations

A

• Low prevalence (<1%)

122
Q

What polymorphisms are most interesting for environmental epi?

A
  • Genes that code for proteins involved in the metabolism, activation, and detoxification of xenobiotics
  • Genes that code for DNA repair proteins, that undo damage caused by xenobiotics
123
Q

What is known about both types of polymorphisms which are most interesting for environmental epi?

A

o Both types of genes have a clear biological rationale & a mechanism of activity that is known
o They are known to have common polymorphisms (i.e. roughly 40-60% of Caucasians lack GSTM1)

124
Q

What is challenging about both types of polymorphisms which are most interesting to environmental epi?

A

o But we don’t really know the functional significance of many of these polymorphisms (i.e. some don’t seem to matter)
o Not sure how to analyze heterozygotes

125
Q

Model A of Gene/Environment Interaction

A

o Model A: The Genotype increases Expression:

• I.E. PKU → High Blood Phenylalanine → Mental Retardation

126
Q

Model B of Gene/Environment Interaction

A

o Model B: The Genotype Increases the effect of the Risk Factor (RF always puts you at risk, gene makes it worse)

• I.E. XP increases the effect of UV radiation, exacerbating risk of skin cancer

127
Q

Model C of Gene/Environment Interaction

A

o Model C: The RF Increases the effect of the Genotype (i.e. the genotype always puts you at risk, but the RF accentuates it)

• I.E. exposure to barbiturates increases the effect of the porphyria variegata gene, which puts you at risk for skin problems

128
Q

Model D of Gene/Enviornment Interaction

A

o Model D: Both genotype and RF are required to put you at risk for disease

• I.E. G6PD deficiency + Fava Bean Consumption → Hemolytic Anemia (but wouldn’t have Hemolytic Anemia if you didn’t have both)

129
Q

Model E of Gene/Environment Interaction

A

o Model E: Genotype and RF each Influence each other (Multiplicative approach)

• I.E. smoking & alpha-1 antitrypsin deficiency both influence the likelihood of getting emphysema

130
Q

What model of gene/environment interaction is not an effect modifier?

A

Model A (Genotype increases Expression)

131
Q

Issues with studying gene/environment interactions

A

o The methodology for reporting an interaction has challenges
o The literature on most gene-environment interactions is not consistent → findings flip
o Technology to perform high-throughput genetic analyses has far outpaced the statistical approaches to analyze the data → too many possibilities for multiple comparisons
o Pathways of interest often involve multiple genes that can each have polymorphisms. It’s not clear how to factor in the effects of multiple polymorphisms in a pathway
o Some think we should instead just think about phenotypes
o Some think we should just give up, and consider it all descriptive epi that can be sorted out later!
o The search for gene-environment interactions requires very large sample sizes to detect a statistically significant interaction term
• Assuming imperfect measurement, the sample size skyrockets and artifactual interactions may appear → biasing results and causing the appearance of fake interactions/hiding appearance of real interactions.

132
Q

When can you use a case-only study to approximate a full cohort gene-environment interaction study?

A

• Bottom line: if gene/exposure ARE independent (if there is really no association between Genes/Exposure in the underlying population), then a case-only design allows me to calculate the interaction effect of the full cohort because the Case-Only OR is equivalent to the Cohort GxE risk ratio.
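A worked toy example (numbers invented) shows the equivalence: if G and E are independent in the source population and risks are multiplicative with an interaction of 2, then the case-only OR computed from the expected case counts recovers exactly that interaction:

```python
# Toy illustration of the case-only design. Risks by genotype (g) and
# exposure (e), multiplicative with an interaction factor of 2:
r00, r10, r01 = 0.01, 0.02, 0.03                     # baseline, G only, E only
interaction = 2.0
r11 = r00 * (r10 / r00) * (r01 / r00) * interaction  # joint risk = 0.12

# G and E are independent in the source population:
pg, pe = 0.3, 0.4
n = {  # expected cases in each G/E cell (per unit population)
    (0, 0): (1 - pg) * (1 - pe) * r00,
    (1, 0): pg * (1 - pe) * r10,
    (0, 1): (1 - pg) * pe * r01,
    (1, 1): pg * pe * r11,
}

# Cross-product ratio among CASES ONLY: the population frequencies cancel,
# leaving (r11 * r00) / (r10 * r01) = the interaction term.
case_only_or = (n[(1, 1)] * n[(0, 0)]) / (n[(1, 0)] * n[(0, 1)])
print(round(case_only_or, 6))  # 2.0 -- recovers the cohort interaction
```

The cancellation only works because P(G, E) factors into P(G)·P(E); if G and E are associated in the population, the case-only OR mixes that association with the interaction, which is exactly the card’s caveat.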

133
Q

Advantages of Case-Case Design for Gene-Environment Interactions

A
  • No need to enroll controls
  • Can focus resources on cases, which may allow you to enroll a larger case-series than otherwise possible
  • Offers better precision for estimating interaction OR (smaller standard errors due to the elimination of control group variability)
134
Q

Disadvantages of Case-Case Design for Gene-Environment Interactions

A
  • There is no way to know whether the assumption of gene-environment independence in a source population is correct
  • Therefore the Interaction OR becomes a big black box
135
Q

How do you handle it if you know that gene-environment independence is not guaranteed?

A
  • If the sources of non-independence can be conceptualized and measured, non-independence can be controlled for in the case-only design
  • I.e. can control for family history if you expect post-menopausal hormone use and BRCA1 mutations to be associated
  • I.e. can control for an adverse reaction to alcohol if you expect alcohol intake and an alcohol dehydrogenase polymorphism to be associated in the population
  • Univariate case-only analysis is potentially plagued by G/E non-independence.
  • Multivariable case-only analysis creates conditional independence by controlling for the thing that causes the association
  • Therefore, do not adjust for confounders of main effects but rather for variables that explain non-independence.
136
Q

Conclusions on Gene-Environment Class

A

o From a theoretical and societal perspective the search for gene-environment interactions is very attractive, but the search has not been very fruitful
o Phenotypes or functional parameters are really the causal factor of interest and how genotypes determine phenotype can be very complex (because how something occurs can depend on a lot of genes)
o There is growing recognition that we should put more effort into measuring functional parameters.
o Case-only studies of Gene-Environment Interactions have several advantages (no controls, more power) but rest upon hard to verify assumptions.

137
Q

3 neighborhood exposures of interest

A

Built Environment
Socio-Economic Environment
Social Environment

138
Q

Built Environment

A

– the physical structures built by man (parks, streets, residential, commercial, industrial land uses, business)

139
Q

Socio-Economic Environment

A

Poverty, Income, Home Ownership, Racial/Ethnic/Immigrant Composition
• To some researchers these are just covariates, to others these are an environment

140
Q

Social Environment

A

– social cohesion, social control, crime/violence

• Community boards, groups, others who are willing to step in to help others’ children

141
Q

Eco-Epidemiology

A

a view of epi that encompasses many levels of organization – molecular and social as well as individual – and aims to integrate more than a single level in design, analysis, and interpretation

Russian Nesting dolls → do studies that involve all levels of exposure.

142
Q

5 types of neighborhood exposure measurement

A
•	Neighborhood Exposure Measurement
o	Study Respondent Self Report
o	Systematic Social Observation/Neighborhood Audits
o	Remote Sensing
o	Geographic Information Systems
o	All of the above
143
Q

3 Study Respondent Self-Report Strengths for neighborhood exposures

A

o Once recruitment and informed consent are complete, data gathering is low-cost and straightforward
o Researchers can define the data of interest to be collected, and ask specific questions about specific neighborhood features
o Useful for constructs that reflect a respondent’s perceptions and understanding of social interactions and processes – i.e. social cohesion measures (which are hard to get at unless you ask directly)

144
Q

3 weaknesses of study respondent self-report for neighborhood exposures

A

o As with all the questionnaire data, respondent self-reports are subject to recall issues
o Perceptions of neighborhood conditions may be conflated with actual conditions – particularly around issues of access and crime (i.e. actual access to parks and perceived access to parks often don’t correlate → however, it may be that perceptions are more important than reality!)
o When collecting data on health outcomes and neighborhood conditions from a respondent the data can be subject to same source bias
• i.e. people who are unhealthy may have a negative view of everything & vice versa
• Could use a second independent sample of participants to provide environmental data (i.e. one group for neighborhood, one group for health outcomes, to eliminate same source bias).

145
Q

what are neighborhood audits, how can they be done?

A
  • Teams of trained researchers collect in-person, observational data in a neighborhood
  • Several approaches have been used:
  • Field teams doing in-home data collection also collect data on the block the study subject lives on
  • Field teams collect data from a structured sampling plan of blocks within neighborhoods, and then spatial analyses are conducted to estimate conditions at non-sampled blocks (i.e. the environmental sampling team can then use kriging and land-use regression to estimate other blocks)
146
Q

Strengths of Neighborhood Audits

A
  • Investigator can define exposure constructs of interest
  • Can collect data on constructs not commonly indicated in administrative or commercial data (eg aesthetics)
  • Investigators can go to where the exposures are and get “on the street” experience of exposure
  • Observations can be objective, or at least independent from disease ascertainment
147
Q

Weaknesses of Neighborhood Audits

A
  • Need validated and reliable audit tools (i.e. RWJF Active Living Research Web Site has a Repository)
  • Reliability Issues with a Multi-Person Audit Team (need high reliability between auditors)
  • Safety issues for auditors
  • Community acceptability and respect for the community (how do they feel about you being there?)
  • Cost and Time (can be expensive)
148
Q

Alternative Implementations of Neighborhood Audits

A

• Vans with video cameras that film street segments
• Bikes with Go-Pro cameras on the handlebars; Go-Pro cameras on cyclist helmets paired with environmental samplers
• Traffic camera photo archives (actually available online; can set up a webscraping service to build a motherlode of stills from cameras everywhere)
• Google Street View Virtual Neighborhood Audits → Computer Assisted Neighborhood Visual Audit System (CANVAS)
o Three interfaces
• Study admin module
• Rater interface
• Real-time analytics
o Can use multiple raters on the same blocks to facilitate reliability
o Saves time = more data; higher quality control with multiple users
• Coding of data is time-consuming – use computerized image analysis, Mechanical Turk, crowd sourcing, game-ification (codeathons)

149
Q

types of remote sensing

A
  • Satellite Imagery (national land cover database is the standard for measuring green space)
  • Plane fly-over with cameras or LIDAR (laser beams capture the height of all items on earth)
  • Anonymous cell-phone tower pings (to see how people move through cities)
  • Internet traffic analysis and patterns of web site viewing across communities (to learn about people living in those communities).
150
Q

Remote Sensing Strengths

A
  • High resolution data with wide coverage

* Objective measures

151
Q

Remote Sensing Weaknesses

A
  • Use of off-the-shelf data is opportunistic
  • Custom data collection is expensive
  • Data processing requires experts and is expensive
  • Security issues (you can’t download this data without security clearance/DUAs)
152
Q

How does GIS technology work?

A

• GIS technology integrates common database operations such as query and statistical analyses with the unique visualization and geographic analysis benefits offered by maps → effectively it’s a database with mapping capacity, where place is the theme of the entire data structure.

153
Q

GIS definition

A

is a computer based-tool for storing, manipulating, mapping, and analyzing geographic phenomenon that exist, and events that occur, on earth

154
Q

urban informatics definition

A

the tapping into, organization, and analysis of the massive data effluent produced by data centers

155
Q

GIS is a study of…

A

layers; you aggregate data from layers to determine measurements

156
Q

Areal interpolation

A

estimating the attributes of one layer (e.g. the neighborhood buffers) from the estimates of another (e.g. the block group)
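A minimal sketch of the simplest (area-weighted) version, which assumes each attribute is spread uniformly over its source zone — the function name and numbers are illustrative, not from the lecture:

```python
def areal_interpolate(source_values, overlap_areas, source_areas):
    """Area-weighted areal interpolation: estimate a target zone's total
    as sum over source zones of value * (overlap area / source area).
    Assumes the attribute is uniformly distributed within each source zone."""
    return sum(v * overlap_areas[z] / source_areas[z]
               for z, v in source_values.items())

# A neighborhood buffer that covers half of block group A and a quarter
# of block group B, with populations 1000 and 2000:
pop = areal_interpolate({"A": 1000, "B": 2000},
                        {"A": 0.5, "B": 0.25},
                        {"A": 1.0, "B": 1.0})
print(pop)  # 1000.0
```

The uniformity assumption is the weak point: if people cluster in one corner of a block group, the area weights misallocate them, which is one reason chosen spatial units matter.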

157
Q

GIS Strengths

A
  • Powerful tool to integrate data from many sources – you just need a location
  • Code for spatial analyses can be archived for reuse or auditing
  • Massive amounts of spatially aligned data are becoming available
  • Mapping provides insights and interpretation (Journal of Maps) and may be useful for community engagement
  • Room for a lot of creativity
158
Q

GIS Weaknesses

A
  • Relies on data availability
  • Data is often proprietary – data licenses and use agreements
  • Data is often expensive to buy or license
  • Data from different jurisdictions is often incompatible (i.e. NYC has completely different data from Westchester)
  • Issues of provenance and data-anarchy (you can get differences of opinion, and trading of data can make it challenging to make sure the data is accurate)
  • Requires expertise, specialized software, and higher end hardware.
159
Q

Ways to think about neighborhood divisions

A
  • Geographic vs conceptual space
  • Continuous vs fragmented space (i.e. my “neighborhood” is Columbus Circle and Wash Heights, even though there is a huge jump between them)
  • Bounded by Administrative lines or sociodemographic lines (i.e. are streets or other factors the dividing line between neighborhoods)?
160
Q

How are neighborhoods generally operationalized in research?

A

into zip codes and census tracts

161
Q

How can neighborhood operationalization be an issue?

A

o Can create a Modifiable Areal Unit Problem (MAUP)
• If you do analyses with one definition of a neighborhood, and then again with another, you get different answers. This can lead to researchers “hunting” for statistical significance. So you need to ask – why did they define the neighborhood as they did?

162
Q

Other Ways that neighborhoods can be operationalized in research

A

o Can do a ½ mile radial buffer around a point, and subtract out water
o Can create a ½ mile network buffer – puts a participant at the center of her own neighborhood – helps with boundary problems – calculates all the space a person could reach by walking a certain time or distance.
o Or can use GPS data to understand how people use their neighborhoods; use the smallest convex polygon that fits all their movement in analyses
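The “smallest convex polygon” around a person’s GPS points is a convex hull; a self-contained sketch using Andrew’s monotone chain algorithm (toy coordinates, not real GPS data):

```python
def convex_hull(points):
    """Andrew's monotone chain: returns the smallest convex polygon
    containing all points (as (x, y) tuples), counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Cross product of vectors o->a and o->b; >0 means a left turn.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:  # build lower hull left to right
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):  # build upper hull right to left
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # Drop the last point of each half (it repeats the other half's start).
    return lower[:-1] + upper[:-1]

# Interior point (1,1) is excluded; the hull is the enclosing square:
print(convex_hull([(0, 0), (2, 0), (1, 1), (2, 2), (0, 2)]))
# [(0, 0), (2, 0), (2, 2), (0, 2)]
```

In practice GPS traces would be projected to planar coordinates first; the hull then defines the person’s activity-space “neighborhood” for analysis.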

163
Q

Neighborhood walkability definition

A

a set of urban design features that support active travel. Higher neighborhood walkability is hypothesized to be associated with higher total physical activity and lower BMI. Neighborhood population density is one measure of walkability.

164
Q

spatial epi

A

o Explicitly considers the effects of place, and analyzes the determinants of spatial variation in disease → very important to air pollution
o There have been a number of cross-sectional and ecological studies and a few cohort studies of air pollution and mortality.

165
Q

Cross-Sectional Ecological Studies

A

o Average Exposure Data are collected for a series of localities and mortality data is collected for the same localities
o A regression model is implemented to determine whether increased pollution levels are associated with increased mortality risk.
o Can stratify by socio-demographic characteristics.

166
Q

Ecological Fallacy

A

o Occurs when one assumes that associations seen at the group level implies the same or similar effects for individuals
• → just because a city has a high mortality rate due to pollution doesn’t mean YOU will have a high mortality due to pollution
o You cannot project group level findings down to the individual level with any certainty of success.
o The reasons you cannot make this leap go above and beyond confounding; in particular, it’s because you don’t know whether the individual who died was actually exposed to the measured pollution.

167
Q

Ironic part about air pollution and ecologic fallacy

A

o In the 90s results from cohort studies showed exposure to particulate matter was associated with increased mortality risk
o Despite concerns regarding ecological fallacy with earlier cross-sectional studies, cohort results were consistent with earlier studies.
o → led the EPA to set national standards for PM2.5!

168
Q

What are the four issues with spatial epi that arise from ignoring place?

A

o 1) Clustered Data Structures
o 2) Spatial Autocorrelations
o 3) Group level or ecological predictors
o 4) Size of the chosen spatial unit

169
Q

What are the issues with clustered data structures that arise from ignoring place?

A

• Standard Cox analyses may underestimate uncertainty and produce inaccurate estimates of statistical significance → the confidence interval (standard error) gets wider once clustering is accounted for, even if the point estimate doesn’t change, because Cox analysis assumes independence of observations and study participants probably aren’t independent

170
Q

What are the issues with spatial autocorrelation that arise from ignoring place?

A
  • Essentially, it’s the effect of regionalism (i.e. the south is very different from the northwest; there are regional patterns of health, health behavior, exposure, etc.)
  • Survival times of subjects in areas closer together may be more similar than for subjects living in areas far apart
  • If the spatial autocorrelation is due to an unmeasured factor that is ALSO spatially autocorrelated with air pollution, then the Cox analyses may be biased, shifting your point estimate up or down.
171
Q

What are 3 ways to counter both spatial autocorrelation and clustered data structures that arise from ignoring place?

A

A) Independent Cities Model
B) Regional Adjustment Model
C) Spatial Filtering Model

172
Q

Pros to independent cities model

A

o Acknowledges that people are clustered in cities
o Uses a random effects approach to describe between city variation
o Provides better estimates for p-values or 95% CI
o Can be expanded to acknowledge that people are clustered in cities which are nested in states

173
Q

Cons to independent cities model

A

o Ignores regional effects
o Only factors in the effect of the designated geographical area (i.e. recognizes Boston is separate from NY, but doesn’t realize that NY is closer to Boston than LA…each city is considered separately).

174
Q

Pros to Regional Adjustment Model

A

o Acknowledges that people are clustered in cities
o Adds indicator variables to represent seven regions in the US
o Adjusts for regional patterns in mortality

175
Q

Cons to Regional Adjustment Model

A

o But it still can’t tell that NY is closer to Boston than it is to Portland, ME, because all are in the Northeast

176
Q

Spatial Filtering Model Pros

A

o Acknowledges that people are clustered in cities and that there are regional effects on mortality
o The effects of regional patterns in mortality and ecological predictors of mortality are controlled for.
o Compares the relative risk for a city with the risks for all other cities within a specified distance → nearness is taken into account!

177
Q

Spatial Filtering Model Cons

A

o Choosing the distance bandwidth for a “nearness” predictor; could have to fit a lot of bandwidths in models, which can get multiple comparison-y (because we really don’t know what distance of nearness matters)
o It’s complicated to run, with multiple assumptions.

178
Q

Issues with Group Level or Ecological Predictors that ignore space

A
  • The characteristics of the groups or places that we live within predict our health
  • Zip code level measures of wealth (particularly measures of deprivation) have been shown to predict mortality
  • We have found that zip code and census tract levels of income, racial mix, and built environment predict BMI even after controlling for individual levels of age, race, gender, income, and educational attainment
  • Social and demographic characteristics are clustered and have to be assessed in multi-level models (your income predicts your health, but so do the incomes of everyone around you) → there is an interaction, and it has to be accounted for.
179
Q

Issue with Size of Spatial Unit Chosen (showing why you can’t ignore place)

A
  • Metropolitan areas can be very large areas with substantial within area variation in pollution and group level socio-demographics
  • There can also be within area clustering
  • There can also be within area spatial autocorrelations
  • Therefore, even within the city, you’re facing aspects of the other 3 issues.
  • Chosen spatial unit can matter; zip code isn’t always the best choice, as they were really designed for mail…they can cut across counties/states, and may be as small as a single building!
  • Observed effects of an exposure can vary depending on the size of the unit chosen.
  • Krieger showed that effects of socio-demographic factors on mortality vary when different size geographic units are used → scale of risk seems to change as the size of the area you analyze changes.
  • It was impossible to re-analyze the ACS data at the county level via the Spatial Filtering model, because there was a clustered arrangement in the counties
180
Q

Conclusions from spatial epi lecture

A

o Often characteristics of a place are treated as if they are characteristics of the person.
o However, we must consider that the clustering of people into groups and the larger scale correlations within regions can cause biases in our estimates of risk and precision (both the clustering and how people get clustered)
o In addition, characteristics of neighborhoods or larger geographic units (eg percent poverty) can have important associations with risk.
o The choice of the size of the geographic unit can alter the results of the analyses.