Final Exam Flashcards

(202 cards)

1
Q

The Four V’s

A

Volume
Variety
Velocity
Veracity - A lot of noise/false alarms

2
Q

What makes predictive modeling difficult?

A
  1. Millions of patients to analyze - dx, rx, etc.
  2. Many models to be built
3
Q

Computational Phenotyping

A

Raw data (demo, dx, rx, labs) -> phenotypes

4
Q

Patient Similarity

A

Simulate doctor’s case-based reasoning with algorithms

5
Q

Hadoop

A

Distributed disk-based big data system

6
Q

Spark

A

Distributed in-memory big data system

7
Q

T/F: Hadoop is much faster than Spark

A

False. Spark keeps its working data in memory, so it is generally faster than disk-based Hadoop.

8
Q

What are the steps of the predictive modeling pipeline?

A
9
Q

Prediction Target should be both ____ and ____

A

interesting and possible

10
Q

Cohort Construction Study

A

Defining the study population

11
Q

Prospective vs. Retrospective

A

Prospective - identify the cohort, then collect data
Retrospective - retrieve historical data, then identify the cohort

12
Q

T/F: A prospective study has more noise in the data than a retrospective study

A

False. Retrospective study has more noise in historical data

13
Q

T/F: A prospective study is more expensive than a retrospective study

A

True. The data collection has to be pre-planned for the study

14
Q

T/F: A prospective study takes more time than a retrospective study

A

True. The data collection has to be planned and executed before analysis of the data.

15
Q

T/F: A prospective study more commonly involves a larger dataset than a retrospective study.

A

False. A retrospective study more often involves a large dataset because historical data can be accessed more easily

16
Q

Cohort Study

A

The goal is to select a group of patients who are exposed to a risk.

Example: Target is heart failure readmission. The Cohort contains all HF patients discharged from hospital. The key in a cohort study is to define the right inclusion/exclusion criteria.

17
Q

Case-Control Study

A

Identify two sets of patients - cases and controls. Put the case patients and control patients together to define the cohort.

18
Q

Case in Case-Control study

A

Patients with positive outcome (have disease)

19
Q

Control in Case-Control study

A

Patients with negative outcome (healthy) but otherwise similar to the case patients

20
Q

Feature Construction Goal

A

Construct all potentially relevant features about patients in order to predict the target outcome

21
Q

Example components of a Feature Construction pipeline

A
22
Q
A

Large observation window and short prediction window

23
Q
A

Small observation window and large prediction window. This is the most useful model but most likely unrealistic and difficult

24
Q
A

Curve B because it can predict accurately for a longer period of time while the performance drops quickly for the other models

25
C, 630 days. The performance plateaus beyond that point. There is a trade-off between how long the observation window is and how many patients have enough data for the longer window.
26
Goal of Feature Selection
Find the truly predictive features to be included in the model.
27
T/F: Training error is not very useful
True - Training error is overly optimistic because the model is prone to overfit the training data
28
Leave one out cross validation
Hold out one example at a time as the validation set and train on the remaining examples. Repeat for every example. Final performance is the average predictive performance across all iterations
29
K Fold Cross Validation
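Similar to leave one out cross validation, except the dataset is split into K chunks (folds). Each fold is held out once for validation while the other K-1 folds are used for training, and performance is averaged across the K iterations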
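A minimal sketch of how K-fold index splits could be generated. The function name `k_fold_splits` is illustrative and not taken from the course material; setting k to the number of samples gives leave-one-out.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


# Every sample lands in exactly one validation fold.
splits = list(k_fold_splits(6, 3))
```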
30
Randomized Cross Validation
Randomly split the dataset into training and validation sets. The model is fit to the training data and accuracy is assessed on the validation data. Results are averaged over all the splits. Advantage over K-Fold - the proportion of the training/validation split does not depend on the number of folds. Disadvantage - some observations may never be selected into the validation set.
31
What is hadoop mapreduce?
- Programming model
- Execution environment - Hadoop is the Java implementation
- Software package - tools developed to facilitate data science tasks
32
Hadoop provides what capabilities
- Distributed storage - file system
- Distributed computation - MapReduce
- Fault tolerance - for system failures
33
Computational Process of Hadoop
34
Fundamental pattern of writing algorithm using Hadoop is to specify algorithm as ____
aggregation statistics
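A minimal single-machine sketch of expressing a computation as aggregation statistics in the MapReduce style. This is plain Python, not the Hadoop API; the function names and the toy `events` data are hypothetical.

```python
from collections import defaultdict


def map_phase(records, mapper):
    # Apply the mapper to every record; each call emits (key, value) pairs.
    return [pair for record in records for pair in mapper(record)]


def shuffle_phase(pairs):
    # Group values by key, as the framework does between map and reduce.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped, reducer):
    # Aggregate the list of values collected for each key.
    return {key: reducer(key, values) for key, values in grouped.items()}


# Hypothetical input: (patient_id, diagnosis_code) events.
events = [("p1", "HF"), ("p2", "DM"), ("p3", "HF"), ("p1", "DM"), ("p4", "HF")]


def emit_code(event):
    _, code = event
    return [(code, 1)]


def count(_, values):
    return sum(values)


counts = reduce_phase(shuffle_phase(map_phase(events, emit_code)), count)
```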
35
First stage of MapReduce System
36
Second stage of MapReduce System
37
Final Stage of MapReduce System
38
In what way is MapReduce designed to minimize re-computation
When a component fails, only that specific component is re-computed
39
What is HDFS?
The back-end file system to store all the data to process using the MapReduce paradigm.
40
What are limitations of MapReduce?
1. Cannot directly access data (must use map/reduce and aggregation queries)
2. Logistic regression is not easy to implement in MapReduce because of its iterative batch gradient descent approach. Each iteration requires loading the data twice.
41
MapReduce KNN
42
43
44
45
True Positive
Prediction Outcome Positive & Condition Positive
46
False Positive
Prediction Outcome Positive & Condition Negative
47
False Negative
Prediction Outcome Negative & Condition Positive
48
True Negative
Prediction Outcome Negative & Condition Negative
49
Type I Error
False Positive
50
Type II Error
False Negative
51
Accuracy
(TP + TN) / Total Population
52
True Positive Rate
TP / (TP + FN)
53
False Positive Rate
FP / (FP + TN)
54
False Negative Rate
FN / (TP + FN)
55
True Negative Rate
TN / (FP + TN)
56
Sensitivity
TP / (TP + FN)
57
Recall
TP / (TP + FN)
58
Specificity
TN / (FP + TN)
59
Prevalence
Condition Positive (TP + FN) / Total Population
60
Positive Predictive Value
TP / (TP + FP)
61
False Discovery Rate
FP / (TP + FP)
62
False Omission Rate
FN / (FN + TN)
63
Negative Predictive Value
TN / (FN + TN)
64
F1 Score
2 * (Precision * Recall) / (Precision + Recall)
Harmonic mean of Precision and Recall
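The confusion-matrix formulas on the cards above can be checked with a small sketch. The function name and the example counts are illustrative; there is no guard against zero denominators.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute common metrics from the four confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp)   # positive predictive value
    recall = tp / (tp + fn)      # sensitivity / true positive rate
    return {
        "accuracy": (tp + tn) / total,
        "recall": recall,
        "specificity": tn / (fp + tn),
        "false_positive_rate": fp / (fp + tn),
        "false_negative_rate": fn / (tp + fn),
        "prevalence": (tp + fn) / total,
        "precision": precision,
        "negative_predictive_value": tn / (fn + tn),
        "f1": 2 * precision * recall / (precision + recall),
    }


metrics = classification_metrics(tp=40, fp=10, fn=20, tn=30)
```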
65
What does the ROC curve do?
Illustrates overall performance of a classifier when varying the threshold value
66
What is the AUC?
Area under the ROC curve - a performance metric that does not depend on the threshold value
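One way to see why AUC is threshold-free: it equals the probability that a randomly chosen positive example scores higher than a randomly chosen negative one. A minimal sketch under that interpretation (the function name `roc_auc` is illustrative, and the pairwise loop is O(n^2), fine for small examples only):

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(positives) * len(negatives))


auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```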
67
Regression Metrics (MSE & MAE)
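A minimal sketch of the two metrics named on this card, using their standard definitions (mean of squared errors and mean of absolute errors). The toy values are made up for illustration.

```python
def mean_squared_error(y_true, y_pred):
    # Average of squared differences; penalizes large errors more heavily.
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    # Average of absolute differences; less sensitive to outliers.
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 8.0]
mse = mean_squared_error(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)
```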
68
Regression Metric that can be used across datasets
69
Gradient Descent Method
70
Gradient Descent Method for Linear Regression
71
Stochastic Gradient Descent
72
SGD for Linear Regression
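A minimal sketch of SGD for one-variable linear regression, assuming a squared-error loss and a fixed learning rate. The hyperparameters and toy data are illustrative choices, not values from the course.

```python
import random


def sgd_linear_regression(xs, ys, learning_rate=0.05, epochs=200, seed=0):
    """Fit y ~ w * x + b by updating on one example at a time."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the examples in a random order each epoch
        for i in order:
            error = (w * xs[i] + b) - ys[i]
            # Gradient of 0.5 * error**2 with respect to w and b.
            w -= learning_rate * error * xs[i]
            b -= learning_rate * error
    return w, b


# Noise-free toy data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]
w, b = sgd_linear_regression(xs, ys)
```

Batch gradient descent would instead sum the gradient over all examples before each update.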
73
Steps of Ensemble Methods
1. Generate a set of datasets (independently in bagging or sequentially in boosting)
2. Each dataset is used to train a separate model (can be independently trained models)
3. Combine the models with an aggregation function F (avg or weighted avg)
74
75
Bias Variance Tradeoff
76
Bagging
Take repeated samples of a dataset to create subsamples (with replacement), train separate models, then classify data point by taking majority vote of the models
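A minimal sketch of bagging on a one-dimensional toy problem. The weak learner `train_threshold_stump` is a hypothetical stand-in (it assumes the positive class has larger feature values); any classifier could be used in its place.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    # Sample with replacement, same size as the original dataset.
    return [rng.choice(data) for _ in data]


def train_threshold_stump(sample):
    """Toy weak learner: threshold at the midpoint of the two class means."""
    pos = [x for x, y in sample if y == 1]
    neg = [x for x, y in sample if y == 0]
    if not pos or not neg:  # degenerate sample: predict the only class seen
        only = 1 if pos else 0
        return lambda x: only
    threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1 if x > threshold else 0


def bagging_predict(models, x):
    # Majority vote across the independently trained models.
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


rng = random.Random(0)
data = [(0.5, 0), (1.0, 0), (1.5, 0), (2.0, 0), (4.0, 1), (4.5, 1), (5.0, 1), (5.5, 1)]
models = [train_threshold_stump(bootstrap_sample(data, rng)) for _ in range(25)]
```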
77
Random Forest
- Build multiple simple trees and average their predictions
- Simple models keep the computational cost low
- An ensemble of simple models often works better
78
Why does bagging work?
Reduces variance without increasing bias
79
Boosting
Build models incrementally, one at a time. Each subsequent (better) model is created based on the mistakes and misclassifications of the previous ones, and the process is repeated over and over. The final model is a weighted average. (May be better than bagging but more likely to overfit.)
80
81
Pros of Ensemble Methods
82
Cons of Ensemble Methods
83
Computational Phenotyping
Converting raw data (demographics, dx, meds, labs) into medical concepts, i.e., phenotypes (e.g., Type 1 Diabetes)
84
Applications of Phenotyping
1. Genomic Studies 2. Clinical Predictive Modeling 3. Pragmatic Clinical Trials 4. Healthcare Quality Measurements
85
Genome-Wide Association Study (GWAS)
Approach that scans markers (single nucleotide polymorphisms, SNPs) across the DNA of many people to find genetic associations for specific phenotypes
86
How to run a Genome-Wide Association Study
1. Identify the phenotypes
2. Group patients into cases and controls
3. Get DNA samples from all patients
For each SNP:
4. Compute the frequency of the SNP in cases and controls
5. Compute the odds ratio
6. Compute the p-value. If the p-value is small, conclude the SNP is significant
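The odds-ratio step for a single SNP can be sketched from a 2x2 table of carriers versus non-carriers. The function name and counts are illustrative; the p-value step is left out here (a chi-square or Fisher exact test on the same table is a common choice).

```python
def snp_odds_ratio(case_carriers, case_total, control_carriers, control_total):
    """Odds of carrying the SNP among cases divided by the odds among controls."""
    a = case_carriers                       # cases with the SNP
    b = case_total - case_carriers          # cases without the SNP
    c = control_carriers                    # controls with the SNP
    d = control_total - control_carriers    # controls without the SNP
    return (a * d) / (b * c)


# Hypothetical counts: 60 of 100 cases and 30 of 100 controls carry the SNP.
odds_ratio = snp_odds_ratio(60, 100, 30, 100)
```

An odds ratio well above 1 suggests the SNP is more common among cases than controls.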
87
Why do we care about phenotyping?
Rich and deep phenotypic data is needed to analyze genomic data. Cost of phenotypic data is increasing while cost of genomic data is decreasing
88
Clinical Predictive Modeling
Start with raw EHR data -> predictive model -> model
89
Why is predictive modeling from raw data not ideal?
- noise in raw data - complex/high-dimensional raw data - model is tied to raw data so cannot be adapted from one hospital to another
90
Pragmatic Clinical Trials differ from traditional trials how?
91
Main phenotyping methods
1. Supervised learning
   a. Expert-defined rules (popular)
   b. Classification
2. Unsupervised learning (clustering) - does not take as much time but is missing ground truth. Needs a large amount of training data.
   a. Dimensionality reduction
   b. Tensor factorization
92
Which phenotyping approach requires more human effort during evaluation? 1. expert-defined rules 2. classification models
classification models
93
Which phenotyping approach is easier to interpret? 1. expert-defined rules 2. classification models
expert-defined rules because they use clinical intuition and knowledge.
94
Healthcare Applications of clustering
1. Patient stratification - group patients into clusters
2. Disease hierarchy discovery - learning the hierarchy between diseases and how they relate
3. Phenotyping - data to concepts
95
Classical Clustering Algorithms include
K-Means, Hierarchical Clustering, Gaussian Mixture Modeling
96
Scalable Clustering Algorithms include
Mini batch K-Means, DBSCAN
97
K-Means Algorithm
98
O(n * k * d * i) - n data points assigned to k centers in d dimensions, run for i iterations
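If the expression above is read as the cost of K-Means, a minimal sketch of the algorithm shows where each factor comes from. This assumes the standard assign-then-update loop with initial centers supplied by the caller; the names are illustrative.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))  # d operations


def k_means(points, centers, iterations=10):
    """Alternate between assigning points and updating centers."""
    centers = [list(c) for c in centers]
    assignments = []
    for _ in range(iterations):  # i iterations
        # Assignment step: n * k distance computations, each over d dimensions.
        assignments = [
            min(range(len(centers)), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        # Update step: move each center to the mean of its assigned points.
        for j in range(len(centers)):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return centers, assignments


points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centers, assignments = k_means(points, centers=[(0.0, 0.0), (10.0, 10.0)])
```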
99
Hierarchical Clustering
100
Which Hierarchical Clustering approach is more efficient? Agglomerative or Divisive?
Agglomerative
101
Algorithm for Agglomerative Clustering
102
Example of a soft clustering method
Gaussian Mixture Model (GMM)
103
What is the equation for the Gaussian Mixture Model (GMM)?
104
GMM uses which popular optimization strategy?
Expectation Maximization (EM)
105
GMM Expectation Maximization Algorithm
106
GMM Initialization
Need to initialize the mixing coefficients (pi_k), centers (mu_k), and covariances (sigma_k)
- Good initialization can lead to faster convergence
- Bad initialization can fail to converge
107
How to improve GMM initialization
Use the K-Means result to initialize the GMM: the center mu_k is the center of cluster k from K-Means, the covariance sigma_k is computed from the data points in cluster k, and the mixing coefficient pi_k is (size of cluster k) / (n data points)
108
GMM Expectation Step
109
GMM Maximization Step
110
K-Means vs. GMM (clustering, parameters, algorithm)
111
Mini Batch K-Means Benefit
With a large dataset, it allows streaming the data and using mini-batches rather than the full dataset
112
Mini Batch K-Means Algorithm
113
Mini batch K-Means: How to assign points in M (mini-batch) to current center in C
Save center in hash map to retrieve quickly
114
Mini batch K-Means: How to update C center based on assignments in M (mini batch)
center update becomes smaller and smaller (stabilizes) as the step size decreases
115
O(t * b * K * d) - t iterations, mini-batch size b, K centers, d dimensions
116
DBSCAN
Density-Based Spatial Clustering of Applications with Noise - clusters are defined as areas of high density separated by areas of low density
117
T/F: The clusters found by DBSCAN are typically oval-shaped
False - they can be any shape
118
DBSCAN Key Concepts
- Density = number of points within epsilon of point p
- A point is in a dense region if its density is greater than a threshold
- Core - points in a dense region
- Border - non-core points within epsilon distance of a core point
- Noise - points outside of epsilon distance from every core point
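A minimal sketch of labeling points as core, border, or noise. Assumptions: Euclidean distance, density counts the point itself, and "dense" means at least `min_pts` neighbors (a hypothetical parameter name for the threshold).

```python
def classify_points(points, eps, min_pts):
    """Label each point as 'core', 'border', or 'noise'."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    # Neighbors of p = indices of all points within eps of p (including p).
    neighbors = [
        [j for j, q in enumerate(points) if dist(p, q) <= eps]
        for p in points
    ]
    core = {i for i, nbrs in enumerate(neighbors) if len(nbrs) >= min_pts}
    labels = []
    for i, nbrs in enumerate(neighbors):
        if i in core:
            labels.append("core")
        elif any(j in core for j in nbrs):
            labels.append("border")  # not dense itself, but near a core point
        else:
            labels.append("noise")
    return labels


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (2.0, 1.0), (10.0, 10.0)]
labels = classify_points(points, eps=1.0, min_pts=3)
```

The full algorithm would then grow clusters by connecting core points that lie within epsilon of each other.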
119
DBSCAN Algorithm
120
How many clusters can a data point belong to using the DBSCAN algorithm?
0 or 1
121
Clustering Evaluation Metrics
1. Rand Index - requires ground truth
2. Mutual Information - requires ground truth
3. Silhouette Coefficient - no ground truth required
122
Rand Index (RI)
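A minimal sketch using the standard definition of the Rand Index: the fraction of point pairs on which the predicted clustering and the ground truth agree (both place the pair together, or both place it apart). The function name is illustrative.

```python
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """Fraction of point pairs on which two clusterings agree."""
    agree = 0
    pairs = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_true == same_pred:
            agree += 1
        pairs += 1
    return agree / pairs


ri = rand_index([0, 0, 1, 1], [0, 0, 1, 2])
```

Because only pair membership matters, renaming the cluster labels does not change the score.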
123
Mutual Information
Measures the mutual dependency of two random variables (from information theory).
124
Pros/Cons of Rand Index and Mutual Information
125
Silhouette Coefficient
126
Pros/Cons of Silhouette Coefficient
127
Which better supports iterative ML Algorithms: MapReduce or Spark
Spark
128
___ is based on acyclic data flow from stable storage to stable storage
Hadoop
129
Limitations of Hadoop
Inefficient for applications that repeatedly reuse a working set of data:
- Iterative algorithms (ML, graph analysis)
- Interactive data mining tools (R, Python)
130
Problem with iteration in Hadoop Map-Reduce
1. Iteratively reload the data over and over
2. Redundantly save output between stages
These steps result in repeated reading/writing from disk
131
Key objective for supporting iterative algorithms is what?
Keep working set in memory to perform quick operations. Load the dataset once into distributed memory
132
What is the challenge to keeping data in memory?
Designing a distributed memory abstraction that is both fault-tolerant and efficient
133
Resilient Distributed Datasets (RDDs)
A distributed memory abstraction that balances the granularity of the computation against the efficiency of enabling fault tolerance. Helps to provide a solution that is both fault-tolerant and efficient
134
RDDs lead to efficient fault recovery, how?
Using lineage: root RDD -> transformation applied -> derived RDD generated
- Log one operation to apply to many elements
- Recompute lost partitions on failure
- No cost if nothing fails (everything is in memory)
135
RDD Recovery
Can selectively reload the failed input data to refresh memory or iteration data and run that subset of the data to recover.
136
Spark Stack
Spark Core - RDD
137
Spark Programming Interface
138
Map vs Flatmap
Map = one output per input element, so a function that returns a list yields a list of lists
Flatmap = the per-element lists are flattened into a single list
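The difference can be illustrated without Spark. This is a plain-Python sketch of the two behaviors, not the Spark API; the helper names `map_elements` and `flat_map` are hypothetical.

```python
from itertools import chain


def map_elements(func, items):
    # One output per input; if func returns a list, the result is a list of lists.
    return [func(item) for item in items]


def flat_map(func, items):
    # Same as map, then concatenate the per-element results into one list.
    return list(chain.from_iterable(func(item) for item in items))


lines = ["chest pain", "shortness of breath"]
mapped = map_elements(str.split, lines)
flattened = flat_map(str.split, lines)
```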
139
Cost of RDD Transformations (union, intersection, map, subtract)
- union - cheap
- intersection - expensive (sorting)
- map - cheap
- subtract - expensive (distinct elements and set difference operation)
140
Spark Operations
141
Shared Variable in Spark
Broadcast Variable - Allows the program to efficiently send a large, read-only value to all worker nodes
142
Fault Tolerance of Spark
RDDs track lineage information that can be used to efficiently recompute lost partitions in the event of a loss
143
Health Data Standards
1. LOINC (Logical Observation Identifiers Names and Codes) for labs
2. ICD (International Classification of Diseases) for dx
3. CPT (Current Procedural Terminology) for procedures
4. NDC (National Drug Code) for meds
144
Most popular medical ontology
SNOMED - Systematized Nomenclature of Medicine
145
UMLS
Unified Medical Language System
146
ICD Codes
- From WHO
- Categorize diseases
- ICD-10 covers dx and procedures
147
ICD 9 Codes
[E/V/n x x] . [y1 y2]
a. x - category (17 categories + supplemental categories)
b. y1 - subcategory
c. y2 - subclassification
3 to 5 digits
148
ICD 10 Codes
[x x x] . [y1 y2 y3 y4]
a. x - category
b. y1 - etiology
c. y2 - body part
d. y3 - severity/vital details
e. y4 - extension
149
ICD9 to ICD10 Mapping
One-to-many relationships (ICD10 is more specific) but may be one-to-one occasionally
150
CPT
Current Procedural Terminology - medical/surgical/diagnostic services
Maintained by the AMA
Used by insurance to determine how much to pay
- Category I (5 digits)
- Category II (4 digits + F for quality metrics)
- Category III (4 digits + T for experimental use)
151
LOINC
A standard for lab and clinical observation created by Regenstrief Institute - Used to capture lab tests
152
NDC
National Drug Code
Medication standard maintained by the FDA
Used throughout the drug supply chain to track medications
3 parts: company/labeler - product code - package code
153
SNOMED
Comprehensive, multilingual clinical healthcare terminology
Maintained by IHTSDO (non-profit in Denmark)
Encodes health information and supports effective clinical recording of data
Purpose of SNOMED - improve clinical documentation, support semantic interoperability, enable clinical decision support and data retrieval
154
Logical Model of SNOMED CT
155
UMLS
Unified Medical Language System
Maintained by the National Library of Medicine
Integrates all data standards
Software tools to map data to medical concepts
3 sources: Metathesaurus (concepts), Semantic Network, SPECIALIST Lexicon and Tools
156
PageRank
- Algorithm developed for ranking of web pages
- Nodes with more incoming edges are higher ranked and more important
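A minimal single-machine sketch of PageRank by power iteration. Assumptions: a damping factor of 0.85, a fixed iteration count, and a graph given as a dict from node to outgoing neighbors; these are illustrative choices, not values from the course.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power iteration on a dict mapping each node to its outgoing neighbors."""
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, out_links in graph.items():
            if out_links:
                # Each node splits its rank evenly across its outgoing edges.
                share = damping * rank[node] / len(out_links)
                for target in out_links:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                for target in nodes:
                    new_rank[target] += damping * rank[node] / n
        rank = new_rank
    return rank


# C has three incoming edges, so it should end up ranked highest.
graph = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(graph)
```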
157
MapReduce PageRank
158
MapReduce PageRank - Map Phase
159
MapReduce PageRank - Reduce Phase
160
Spectral Clustering
1. Construct a graph (patient vectors)
2. Create a similarity graph of patients
3. Store the graph as a matrix (adjacency matrix)
4. Find the top K eigenvectors of the graph
5. Cluster patients into K groups using the eigenvectors
161
Similarity Graph Construction
162
E-Neighborhood Graph
- Connect patients within epsilon distance to each other
163
Fully Connected Graph
Similarity function w is a Gaussian kernel (or radial basis function). Every pair of patients is connected, but the edges are weighted differently by similarity
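A minimal sketch of the edge weights for a fully connected similarity graph, assuming the common form w(x, y) = exp(-||x - y||^2 / (2 * sigma^2)). The bandwidth `sigma` and the toy patient vectors are assumptions for illustration.

```python
import math


def gaussian_similarity(x, y, sigma=1.0):
    """Edge weight that decays with the squared distance between x and y."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def similarity_matrix(points, sigma=1.0):
    # Every pair is connected; nearby pairs get weights close to 1.
    return [[gaussian_similarity(p, q, sigma) for q in points] for p in points]


patients = [(0.0, 0.0), (0.0, 1.0), (3.0, 4.0)]
W = similarity_matrix(patients)
```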
164
Singular Value Decomposition (SVD)
165
Singular Value Decomposition Example
166
SVD Properties
167
168
Principal Component Analysis (PCA)
169
Sparsity problem with SVD
SVD destroys sparsity in the original data
170
CUR vs SVD
CUR maintains the sparsity of the original data
171
CUR Decomposition
Use actual rows and columns to form factorization matrices
172
CUR Algorithm
173
Tensor Factorization
174
Rank 1 Tensor
Outer product of a set of vectors (1 from each mode)
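A minimal sketch of a rank-1, three-mode tensor built as an outer product. The mode names (patient, diagnosis, medication) and values are illustrative.

```python
def rank_one_tensor(a, b, c):
    """Outer product of three vectors: T[i][j][k] = a[i] * b[j] * c[k]."""
    return [[[ai * bj * ck for ck in c] for bj in b] for ai in a]


# Hypothetical loadings, one vector per mode.
patient = [1.0, 2.0]
diagnosis = [1.0, 0.0, 3.0]
medication = [2.0, 1.0]
T = rank_one_tensor(patient, diagnosis, medication)
```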
175
Example Phenotype
176
Phenotyping through Tensor Factorization
Factorize tensor as sum of Rank 1 Tensors (Rank 1 Approximation of input). Lambda corresponds to the importance of the phenotype
177
Canonical Decomposition & Parallel Factorization (CP Decomposition)
178
Phenotyping Process Using Tensor Factorization
179
Tensor Factorization vs Non-negative Matrix Factorization (NMF)
Tensor phenotypes are much more concise than NMF phenotypes
180
Benefits for Tensor Factorizations
1. Unsupervised - multiple phenotypes can be discovered
2. Predictive - phenotypes can be used for predictive modeling
181
Traditional Paradigm (Evidence-based medicine)
Medical decisions based on well-designed and well-conducted research.
- Randomized clinical trials test a hypothesis
- A successful hypothesis becomes evidence
- Evidence becomes clinical guidelines
- Clinicians apply guidelines in practice
182
New Paradigm (precision medicine)
- Pragmatic trials (data-driven evidence)
- Patient similarity search (practice-based evidence)
- Individualized recommendations
- Precision medicine
183
Randomized Clinical Trials (RCT)
Start with a study population
Two groups - current treatment (control/placebo) group and new treatment group
RCT compares the groups for improved outcomes in the treatment group
184
Pragmatic Trials
- Measure effectiveness of treatment in routine clinical practice
- Do a similarity search for patients related to the current patient
- Look at patient outcomes and recommend the treatment with the best outcome for similar patients
185
Using patient similarity
- Practice-based medicine (look for similar patients)
- Hypothesis with retrospective evidence (other patients)
- Randomized clinical trials (prospective study)
- Evidence generation
- Clinical guidelines
- Apply in practice
186
Patient Similarity Approaches
Distance metric learning - similarity between patients due to similarity of ground truth labels and features.
Graph-based similarity learning - connect patients to regions in a disease network and find similarity.
187
Locally Supervised Metric Learning
188
Sigmoid Function
Activation Function
189
Tanh Function
190
Rectified Linear Function (ReLU)
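The three activation functions on the cards above, written out with their standard definitions as a minimal sketch:

```python
import math


def sigmoid(x):
    # Squashes any real input into (0, 1).
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x):
    # Squashes any real input into (-1, 1); equals 2 * sigmoid(2x) - 1.
    return math.tanh(x)


def relu(x):
    # Passes positive inputs through unchanged and zeroes out the rest.
    return max(0.0, x)
```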
191
Stochastic Gradient Descent
192
Forward Computation for a neuron
193
Backward computation for a neuron
194
Advantage of CNN model
sparse interactions, parameter sharing, and translational invariance
195
Dimension Calculation of Convolution Layers
196
Dimension Calculation of Pooling Layer
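A minimal sketch of the commonly used output-size formula, (input - kernel + 2 * padding) / stride + 1, which applies to both convolution and pooling layers along one spatial dimension. The function name and layer sizes are illustrative, and integer division assumes the usual round-down convention.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Output width (or height) of a convolution or pooling layer."""
    return (input_size - kernel_size + 2 * padding) // stride + 1


# A 5x5 convolution with padding 2 and stride 1 preserves a 32-pixel width.
conv_out = conv_output_size(32, kernel_size=5, stride=1, padding=2)
# A 2x2 pooling layer with stride 2 then halves it.
pool_out = conv_output_size(conv_out, kernel_size=2, stride=2)
```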
197
Number of calculations for Convolution Layers
198
Number of calculations for Pooling Layers
199
Number of calculations for fully-connected layers
200
T/F: Most of the parameters in a model are in the convolution layers but most of the operations (calculations) are in the fully-connected layers
False. Most of the parameters are in the fully-connected layers and most of the operations are in the convolution layers
201
Problem with RNN
- Gradient can become very small over a long sequence (vanishing gradient)
- A standard RNN has difficulty remembering state from early in the sequence