Mid term 2 Flashcards
Reference Dose for Lec 9
RfD = NOAEL (or BMD10) / (UFa * UFh * UFs * UFL * MF * DF)
UFa = animal-to-human extrapolation
UFh = average human to sensitive human
UFs = subchronic-to-chronic exposure
UFL = LOAEL-to-NOAEL extrapolation
MF = modifying factor
DF = data quantity/quality factor
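The RfD formula above can be sketched as a small function; the example NOAEL and the default factor values below are my own illustration, not values from the card:

```python
# Sketch of the reference dose calculation:
# RfD = NOAEL (or BMD10) / (UFa * UFh * UFs * UFL * MF * DF)

def reference_dose(noael, ufa=10, ufh=10, ufs=1, ufl=1, mf=1, df=1):
    """Divide the point of departure (NOAEL or BMD10, mg/kg/day)
    by the product of the uncertainty and modifying factors."""
    return noael / (ufa * ufh * ufs * ufl * mf * df)

# Hypothetical example: NOAEL of 50 mg/kg/day with the animal-to-human (10x)
# and human variability (10x) factors applied:
print(reference_dose(50))  # -> 0.5 (mg/kg/day)
```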
Stochastic effects
Effects that occur by chance, generally occurring without a threshold level of dose, whose probability is proportional to the dose and whose severity is independent of the dose. In the context of radiation protection, the main stochastic effects are cancer and genetic effects.
Dose response applies to all individuals: higher doses increase the random chance of being affected, but not the severity of the effect.
Linear Non-Threshold (LNT) Approach to Assess
(Genotoxic) Cancer Risk
The EPA's approach for roughly 45 years, used for carcinogens known to have a genotoxic mode of action. It assumes a linear dose-response with no threshold, is considered a conservative approach, and involves extrapolation at low doses.
Cancer risk graph
Risk = Exposure (LADD) * CSF
LADD – Lifetime Average Daily Dose
ED10 – effective dose producing 10% cancer incidence
LED10 – 95% lower confidence limit for the ED10
Cancer slope factor (CSF) – in units of per mg/kg/day
Equation Estimating Exposure and Cancer risk
estimating exposure:
LADD (mg/kg/day) = (concentration x intake rate x exposure duration) / (body weight x lifespan)
Estimating risk:
Risk = slope factor (per mg/kg/day) x LADD
LADD lifetime average daily dose
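The two equations above can be sketched together; the drinking-water scenario and the CSF value in the example are hypothetical, chosen only to show the arithmetic:

```python
# LADD = (concentration * intake rate * exposure duration) / (body weight * lifespan)
# Risk = slope factor * LADD

def ladd(conc, intake_rate, exposure_days, body_weight_kg, lifespan_days):
    """Lifetime average daily dose in mg/kg/day."""
    return (conc * intake_rate * exposure_days) / (body_weight_kg * lifespan_days)

def cancer_risk(slope_factor, ladd_value):
    """Slope factor in (mg/kg/day)^-1 times LADD in mg/kg/day."""
    return slope_factor * ladd_value

# Hypothetical example: 0.002 mg/L in drinking water, 2 L/day intake,
# 30-year exposure, 70 kg adult, 70-year lifetime:
dose = ladd(0.002, 2, 30 * 365, 70, 70 * 365)   # about 2.45e-5 mg/kg/day
risk = cancer_risk(1.5, dose)                    # assumed CSF of 1.5 per mg/kg/day
```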
Polycyclic aromatic hydrocarbons (PAHs):
Sources and Uses
Ubiquitous contaminants occurring naturally (e.g., in crude oil) or created by incomplete combustion, released from both natural (forest fires) and anthropogenic (burning of fossil fuels) sources.
Natural
Forest fires
Oil seeps
Volcanos
Anthropogenic
Wood burning
Internal combustion engines (vehicle exhaust)
Cigarette smoke
Roofing/coal tar products
Electric power generation
Petroleum
Polycyclic aromatic hydrocarbons (PAHs):
Chemical characteristics
Two or more fused aromatic rings sharing a pair of carbon atoms; highly lipophilic.
16 priority EPA PAHs (ATSDR, 2005)
Toxicity
Potential for human exposure
Frequency of occurrence at hazardous waste sites
Available information
Include probable and known human carcinogens
Broader class of polycyclic aromatic compounds
over 1,500 chemicals in total
diverse structural features
includes both substituted and unsubstituted forms (common substituents contain O, N, S, or CH3 groups)
little data on sources, exposure, and toxicity mechanisms
PAH mixtures further complicate risk assessment.
Regulation before relative potency factor
Before 1993, all PAHs were treated as equipotent to benzo[a]pyrene (BaP). The other six PAHs evaluated were less potent, so their cancer risk was overestimated. Slope factors could not be calculated for them because of insufficient data, so they were treated like BaP.
1993 relative potency factor for quantitative assessment of PAHs.
Based on tumor studies comparing more than one PAH. Carcinogenic potency for various PAHs can be estimated by comparison to a standard; BaP is recommended as that standard. Estimates of individual slope factors are calculated as a fraction of the BaP slope factor. The approach applies to Group B2 (probable) PAH carcinogens. The evaluation of PAHs as complete carcinogens in skin was the most comprehensive and is recommended for use.
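The relative potency factor idea reduces to simple multiplication; the BaP slope factor and the RPF values below are placeholders for illustration, not regulatory values:

```python
# Sketch of the relative potency factor (RPF) approach: the slope factor
# for each Group B2 PAH is estimated as a fraction of the BaP slope factor.
# All numeric values below are placeholders, not EPA values.

bap_csf = 7.3  # placeholder BaP slope factor, per mg/kg/day

relative_potency = {          # placeholder RPFs relative to BaP = 1.0
    "benzo[a]pyrene": 1.0,
    "benz[a]anthracene": 0.1,
    "chrysene": 0.001,
}

# Individual slope factor = RPF * BaP slope factor
pah_slope_factors = {pah: rpf * bap_csf for pah, rpf in relative_potency.items()}
print(round(pah_slope_factors["benz[a]anthracene"], 3))  # -> 0.73
```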
Unsupervised modeling
The program is given a bunch of data (no labels) and must find patterns and relationships therein.
• Clustering
• Principal component analysis
Supervised modeling
The program is “trained” on a pre‐defined set of “training examples” (with labels), which then facilitate its ability to reach an accurate conclusion when given new data.
• Classification
• Regression analysis
Unsupervised learning methods
The model is not provided with the correct results during the training.
• Can be used to cluster the input data in classes on the basis of their statistical properties only.
• The labeling can be carried out even if labels are only available for a small number of objects representative of the desired classes.
Supervised learning methods
- Training data includes both the input and the desired results.
- For some examples the correct results (targets) are known and are given in input to the model during the learning process.
- The construction of a proper training, validation and test set is crucial.
- These methods are usually fast and accurate.
- The model must be able to generalize: give correct results when new data are given as input, without knowing the target a priori.
First step in supervised model
Data
Training set data: a set of examples used for learning where the target value is known.
Bad data yields bad models garbage in garbage out.
Second step in supervised model
Features
Feature: an individual measurable property of a
phenomenon being observed.
Feature selection
Third step in supervised model
Algorithm
Algorithm: the method or predictive modeling
technique used to identify patterns in the data
• Support vector machines
• Neural networks
• Nearest neighbors
• Random forests
• Decision trees
Fourth step in supervised model
The model
Model: the final prediction or output
Data + algorithm = model
3 Types of data sets required in Supervised model
Training set
validation set
test set
Training set
a set of examples used for learning, where the target value is known. Overfitting is a common problem due to the test set being small, the model is not that generalizable, and hard to apply to other data sets.
Validation set
a set of examples used to tune the architecture of a classifier and estimate the error
Test set
used only to assess the performances of a classifier. It is never used during the training process so that the error on the test set provides an unbiased estimate of the error.
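The three sets above can be produced from one labeled dataset with a simple split; the 60/20/20 ratio and the helper function are my own illustration, not from the card:

```python
import random

def train_val_test_split(examples, seed=0):
    """Shuffle the examples and split them 60/20/20 into the three sets."""
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train, n_val = int(0.6 * n), int(0.2 * n)
    train = shuffled[:n_train]                # used for learning
    val = shuffled[n_train:n_train + n_val]   # used to tune the classifier
    test = shuffled[n_train + n_val:]         # held out for the final, unbiased error estimate
    return train, val, test

train, val, test = train_val_test_split(list(range(100)))
print(len(train), len(val), len(test))  # -> 60 20 20
```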
Purpose of cross validation
A model is developed using a training set.
Use the training data, but hold out a subset for testing (or more than one subset, e.g., 2-fold).
The algorithm optimizes the fitting parameters for the training data. If we test on an independent set of data from the same population as the training data, it will generally not fit as well (lower predictive accuracy).
LOOCV
Leave‐One‐Out Cross Validation (LOOCV) – one observation/sample is removed from training data at a time and used for testing
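LOOCV can be sketched in a few lines; the 1-nearest-neighbor classifier and the toy 1-D data below are my own illustration, not part of the card:

```python
def nn_predict(train, x):
    """Return the label of the training point closest to x (1-nearest-neighbor)."""
    return min(train, key=lambda pt: abs(pt[0] - x))[1]

def loocv_accuracy(data):
    """Leave one observation out at a time, train on the rest, test on it."""
    correct = 0
    for i in range(len(data)):
        held_out = data[i]
        train = data[:i] + data[i + 1:]   # all observations except the held-out one
        if nn_predict(train, held_out[0]) == held_out[1]:
            correct += 1
    return correct / len(data)

# Toy data: (feature, label) pairs
data = [(0.1, "low"), (0.2, "low"), (0.3, "low"),
        (1.1, "high"), (1.2, "high"), (1.3, "high")]
print(loocv_accuracy(data))  # -> 1.0
```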
Measures of model performance
- True positives
- True negatives
- False positives
- False negatives
- Sensitivity
- Specificity
- Accuracy
Sensitivity
True positive rate = number of TPs out of all positive observations; a high sensitivity avoids false negatives.
Sensitivity = TP / (TP + FN)
Sensitivity: the ability of a test to correctly identify patients with a disease.
Specificity
True negative rate = number of TNs out of all negative observations; a high specificity avoids false positives.
Specificity = TN / (TN + FP)
Specificity: the ability of a test to correctly identify people without the disease.
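The performance measures above follow directly from the confusion-matrix counts; the patient counts in the example are hypothetical:

```python
def sensitivity(tp, fn):
    """True positive rate: TP / (TP + FN)."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """True negative rate: TN / (TN + FP)."""
    return tn / (tn + fp)

def accuracy(tp, tn, fp, fn):
    """Fraction of all observations classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical example: a test flags 45 of 50 diseased patients (5 false
# negatives) and clears 90 of 100 healthy patients (10 false positives):
print(sensitivity(45, 5))       # -> 0.9
print(specificity(90, 10))      # -> 0.9
print(accuracy(45, 90, 10, 5))  # -> 0.9
```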