Final Exam Flashcards

1
Q

The Four V’s

A

Volume
Variety
Velocity
Veracity - A lot of noise/false alarms

2
Q

What makes predictive modeling difficult?

A
  1. Millions of patients to analyze - dx, rx, etc.
  2. Many models to be built
3
Q

Computational Phenotyping

A

Raw data (demo, dx, rx, labs) -> phenotypes

4
Q

Patient Similarity

A

Simulate doctor’s case-based reasoning with algorithms

5
Q

Hadoop

A

Distributed disk-based big data system

6
Q

Spark

A

Distributed in-memory big data system

7
Q

T/F: Hadoop is much faster than Spark

A

False. Spark keeps its working set in memory, so it is generally faster than disk-based Hadoop

8
Q

What are the steps of the predictive modeling pipeline

A
9
Q

Prediction Target should be both ____ and ____

A

interesting and possible

10
Q

Cohort Construction Study

A

Defining the study population

11
Q

Prospective vs. Retrospective

A

Prospective - identify cohort then collect data
Retrospective - Retrieve historical data then identify cohort

12
Q

T/F: A prospective study has more noise in the data than a retrospective study

A

False. Retrospective study has more noise in historical data

13
Q

T/F: A prospective study is more expensive than a retrospective study

A

True. The data collection has to be pre-planned for the study

14
Q

T/F: A prospective study takes more time than a retrospective study

A

True. The data collection has to be planned and executed before analysis of the data.

15
Q

T/F: A prospective study more commonly involves a larger dataset than a retrospective study.

A

False. A retrospective study more often involves a large dataset because historical data can be accessed more easily

16
Q

Cohort Study

A

The goal is to select a group of patients who are exposed to a risk.

Example: Target is heart failure readmission. The Cohort contains all HF patients discharged from hospital. The key in a cohort study is to define the right inclusion/exclusion criteria.

17
Q

Case-Control Study

A

Identify two sets of patients - cases and controls. Put the case patients and control patients together to define the cohort.

18
Q

Case in Case-Control study

A

Patients with positive outcome (have disease)

19
Q

Control in Case-Control study

A

Patients with negative outcome (healthy) but otherwise similar to the case patients

20
Q

Feature Construction Goal

A

Construct all potentially relevant features about patients in order to predict the target outcome

21
Q

Example components of a Feature Construction pipeline

A
22
Q
A

Large observation window and short prediction window

23
Q
A

Small observation window and large prediction window. This is the most useful model but most likely unrealistic and difficult

24
Q
A

Curve B because it can predict accurately for a longer period of time while the performance drops quickly for the other models

25
Q
A

C, 630 days. The performance plateaus beyond that point. There is a trade-off between how long the observation window is and how many patients have enough data for the longer window.

26
Q

Goal of Feature Selection

A

Find the truly predictive features to be included in the model.

27
Q

T/F: Training error is not very useful

A

True - Training error is overly optimistic because the model is prone to overfit the training data

28
Q

Leave one out cross validation

A

Take one example at a time as validation set, use remaining set as training. Repeat the process. Final performance is average predictive performance across all iterations

29
Q

K Fold Cross Validation

A

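Similar to leave one out cross validation, except the dataset is split into K chunks (folds). Each fold is used once as the validation set while the remaining K-1 folds are used for training. Final performance is the average across the K iterations. Leave one out is the special case where K equals the number of examples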
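
A minimal sketch of K-fold splitting in plain Python, assuming the dataset is addressed by integer indices 0..n-1. The helper name k_fold_indices is hypothetical, not from any library. Each fold serves as the validation set exactly once; calling it with k equal to n gives leave one out.

```python
def k_fold_indices(n, k):
    """Yield (train_idx, val_idx) pairs for k-fold cross validation over n examples."""
    if not 1 < k <= n:
        raise ValueError("k must satisfy 1 < k <= n")
    indices = list(range(n))
    # Spread the remainder so that fold sizes differ by at most one.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size


folds = list(k_fold_indices(10, 3))
```

In practice the indices would usually be shuffled first so that folds do not follow the storage order of the data.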

30
Q

Randomized Cross Validation

A

Randomly split dataset into train and validation. Model is fit to training data and accuracy is assessed using validation. Results are averaged over all the splits. Advantage over K-Fold - proportion of the training and validation split does not depend on number of folds. Disadvantage - some observations may never be selected into validation set.

31
Q

What is hadoop mapreduce?

A
  • Programming Model
  • Execution Environment - Hadoop is the Java implementation
  • Software package - tools developed to facilitate data science tasks
32
Q

Hadoop provides what capabilities

A
  • Distributed Storage - file system
  • Distributed Computation - mapReduce
  • Fault Tolerance - for sys failures
33
Q

Computational Process of Hadoop

A
34
Q

Fundamental pattern of writing algorithm using Hadoop is to specify algorithm as ____

A

aggregation statistics
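
A rough single-process sketch of the pattern, assuming a toy task of counting diagnosis codes across patients. The records, code labels, and function names are made up for illustration; a real Hadoop job would distribute the map and reduce calls across machines.

```python
from collections import defaultdict


def map_phase(record):
    """Emit one (code, 1) pair per diagnosis code in a patient record."""
    patient_id, codes = record
    return [(code, 1) for code in codes]


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Aggregate all values that share one key."""
    return key, sum(values)


# Hypothetical input: (patient_id, list of diagnosis codes).
records = [("p1", ["HF", "DM"]), ("p2", ["HF"]), ("p3", ["DM", "HTN", "HF"])]
pairs = [pair for record in records for pair in map_phase(record)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

The algorithm is expressed entirely as per-record emission plus per-key aggregation, which is what lets the framework parallelize it.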

35
Q

First stage of MapReduce System

A
36
Q

Second stage of MapReduce System

A
37
Q

Final Stage of MapReduce System

A
38
Q

In what way is MapReduce designed to minimize re-computation

A

When a component fails only the specific component is re-computed

39
Q

What is HDFS?

A

The back-end file system to store all the data to process using the MapReduce paradigm.

40
Q

What are limitations of MapReduce?

A
  1. Cannot directly access data (must use map/reduce and aggregation query)
  2. Logistic regression is not easy to implement in MapReduce because batch gradient descent is iterative. Each iteration requires loading the data twice.
41
Q

MapReduce KNN

A
42
Q
A
43
Q
A
44
Q
A
45
Q

True Positive

A

Prediction Outcome Positive & Condition Positive

46
Q

False Positive

A

Prediction Outcome Positive & Condition Negative

47
Q

False Negative

A

Prediction Outcome Negative & Condition Positive

48
Q

True Negative

A

Prediction Outcome Negative & Condition Negative

49
Q

Type I Error

A

False Positive

50
Q

Type II Error

A

False Negative

51
Q

Accuracy

A

(TP + TN) / Total Population

52
Q

True Positive Rate

A

TP / (TP + FN)

53
Q

False Positive Rate

A

FP / (FP + TN)

54
Q

False Negative Rate

A

FN / (TP + FN)

55
Q

True Negative Rate

A

TN / (FP + TN)

56
Q

Sensitivity

A

TP / (TP + FN)

57
Q

Recall

A

TP / (TP + FN)

58
Q

Specificity

A

TN / (FP + TN)

59
Q

Prevalence

A

Condition Positive (TP + FN) / Total Population

60
Q

Positive Predictive Value

A

TP / (TP + FP)

61
Q

False Discovery Rate

A

FP / (TP + FP)

62
Q

False Omission Rate

A

FN / (FN + TN)

63
Q

Negative Predictive Value

A

TN / (FN + TN)

64
Q

F1 Score

A

2 * [ (Precision * Recall) / (Precision + Recall) ]
Harmonic mean of Precision and Recall
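
The confusion-matrix formulas on these cards can be checked with a small helper. This is a sketch with a hypothetical function name and made-up counts; it assumes no denominator is zero.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute common metrics from the four confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp)   # positive predictive value
    recall = tp / (tp + fn)      # sensitivity, true positive rate
    return {
        "accuracy": (tp + tn) / total,
        "recall": recall,
        "specificity": tn / (fp + tn),        # true negative rate
        "false_positive_rate": fp / (fp + tn),
        "precision": precision,
        "negative_predictive_value": tn / (fn + tn),
        "prevalence": (tp + fn) / total,
        "f1": 2 * precision * recall / (precision + recall),
    }


# Made-up counts for illustration only.
m = classification_metrics(tp=40, fp=10, fn=20, tn=30)
```

Note that specificity and false positive rate always sum to 1, as do recall and false negative rate.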

65
Q

What does the ROC curve do?

A

Illustrates overall performance of a classifier when varying the threshold value

66
Q

What is the AUC?

A

A performance metric that does not depend on threshold value
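
One way to see why AUC is threshold-free: it equals the fraction of (positive, negative) pairs in which the positive example receives the higher score. The sketch below uses that pairwise definition with made-up scores; the function name is hypothetical and the quadratic loop is only suitable for small inputs.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else (0.5 if p == n else 0.0) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up classifier scores and true labels.
scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 0, 1, 0]
value = auc(scores, labels)
```

No threshold appears anywhere in the computation; only the ordering of the scores matters.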

67
Q

Regression Metrics (MSE & MAE)

A
68
Q

Regression Metric that can be used across datasets

A
69
Q

Gradient Descent Method

A
70
Q

Gradient Descent Method for Linear Regression

A
71
Q

Stochastic Gradient Descent

A
72
Q

SGD for Linear Regression

A
73
Q

Steps of Ensemble Methods

A
  1. Generate a set of datasets (independently in bagging or sequentially in boosting)
  2. Each dataset is used to train a separate model (can be independently trained models)
  3. Aggregation function F (avg or weighted avg)
74
Q
A
75
Q

Bias Variance Tradeoff

A
76
Q

Bagging

A

Take repeated samples of a dataset to create subsamples (with replacement), train separate models, then classify data point by taking majority vote of the models
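
A toy sketch of bagging, assuming one-dimensional inputs with binary labels. The weak learner (a threshold at the midpoint of the two class means, assuming class 1 has the larger values) and all names and data are illustrative choices, not part of any particular library.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Draw len(data) items with replacement."""
    return [rng.choice(data) for _ in data]


def train_stump(sample):
    """Hypothetical weak learner: threshold halfway between the class means."""
    xs0 = [x for x, y in sample if y == 0]
    xs1 = [x for x, y in sample if y == 1]
    if not xs0 or not xs1:
        # Degenerate bootstrap sample: always predict the only class seen.
        only = 1 if xs1 else 0
        return lambda x: only
    threshold = (sum(xs0) / len(xs0) + sum(xs1) / len(xs1)) / 2
    return lambda x: 1 if x > threshold else 0


def bagging_predict(models, x):
    """Classify x by majority vote over the trained models."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


rng = random.Random(0)
data = [(1.0, 0), (1.5, 0), (2.0, 0), (6.0, 1), (6.5, 1), (7.0, 1)]
models = [train_stump(bootstrap_sample(data, rng)) for _ in range(25)]
```

An odd number of models avoids tied votes in the binary case.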

77
Q

Random Forest

A
  • Create multiple simple trees for models and generate an average
  • Simple algorithms help with computational cost
  • Simple algorithms work better when combined
78
Q

Why does bagging work?

A

Reduces variance without increasing Bias

79
Q

Boosting

A

Incrementally building models one at a time
Based on mistakes and misclassifications create a subsequent model (better)
repeat process over and over
Final model is weighted average
(May be better than bagging but more likely to overfit)

80
Q
A
81
Q

Pros of Ensemble Methods

A
82
Q

Cons of Ensemble Methods

A
83
Q

Computational Phenotyping

A

Converting raw data (demographics, dx, meds, labs) through phenotyping into medical concepts, i.e., phenotypes (e.g., type 1 diabetes)

84
Q

Applications of Phenotyping

A
  1. Genomic Studies
  2. Clinical Predictive Modeling
  3. Pragmatic Clinical Trials
  4. Healthcare Quality Measurements
85
Q

Genome-Wide Association Study (GWAS)

A

Approach that involves scanning biomarkers such as single-nucleotide polymorphisms (SNPs) from the DNA of many people to find genetic associations with specific phenotypes

86
Q

How to run a Genome-Wide Association Study

A
  1. Identify the phenotypes
  2. Group patients into case and control
  3. Get DNA samples from all patients
    For each SNP:
  4. Compute frequency of SNP (single-nucleotide polymorphism) on cases and controls
  5. Compute odds ratio
  6. Compute p-value. If p-value is small, conclude the SNP is significant
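
Steps 4 to 6 for a single SNP can be sketched as follows. The card does not say which significance test is used; this sketch assumes a Pearson chi-square test on the 2x2 table of carrier counts (1 degree of freedom, no continuity correction), and the counts are made up.

```python
import math


def snp_association(case_with, case_without, ctrl_with, ctrl_without):
    """Return (odds_ratio, p_value) for one SNP from a 2x2 table of counts."""
    a, b, c, d = case_with, case_without, ctrl_with, ctrl_without
    odds_ratio = (a * d) / (b * c)
    n = a + b + c + d
    # Pearson chi-square statistic for a 2x2 table.
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return odds_ratio, p_value


# Made-up counts: 60 of 100 cases and 30 of 100 controls carry the SNP.
odds_ratio, p_value = snp_association(60, 40, 30, 70)
```

Because a GWAS repeats this test for a very large number of SNPs, the significance threshold is normally adjusted for multiple testing.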
87
Q

Why do we care about phenotyping?

A

Rich and deep phenotypic data is needed to analyze genomic data. Cost of phenotypic data is increasing while cost of genomic data is decreasing

88
Q

Clinical Predictive Modeling

A

Start with raw EHR data -> predictive modeling -> model

89
Q

Why is predictive modeling from raw data not ideal?

A
  • noise in raw data
  • complex/high-dimensional raw data
  • model is tied to raw data so cannot be adapted from one hospital to another
90
Q

Pragmatic Clinical Trials differ from traditional trials how?

A
91
Q

Main phenotyping methods

A
  1. Supervised Learning
    a. expert defined rules (popular)
    b. classification
  2. Unsupervised learning (clustering) - does not take as much time but missing ground truth. Needs large amount of training data.
    a. dimensionality reduction
    b. tensor factorization
92
Q

Which phenotyping approach requires more human effort during evaluation?
1. expert-defined rules
2. classification models

A

classification models

93
Q

Which phenotyping approach is easier to interpret?
1. expert-defined rules
2. classification models

A

expert-defined rules because they use clinical intuition and knowledge.

94
Q

Healthcare Applications of clustering

A
  1. Patient stratification - group patients into clusters
  2. Disease Hierarchy discovery - learning hierarchy between diseases and how they relate
  3. phenotyping - data to concepts
95
Q

Classical Clustering Algorithms include

A

K-Means, Hierarchical Clustering, Gaussian Mixture Modeling

96
Q

Scalable Clustering Algorithms include

A

Mini batch K-Means, DBScan

97
Q

K-Means Algorithm

A
98
Q
A

n * k * d * i
- n data points, k centers, d dimensions, i iterations

99
Q

Hierarchical Clustering

A
100
Q

Which Hierarchical Clustering approach is more efficient? Agglomerative or Divisive?

A

Agglomerative

101
Q

Algorithm for Agglomerative Clustering

A
102
Q

Example of a soft clustering method

A

Gaussian Mixture Model (GMM)

103
Q

What is the equation for the Gaussian Mixture Model (GMM)?

A
104
Q

GMM uses which popular optimization strategy?

A

Expectation Maximization (EM)

105
Q

GMM Expectation Maximization Algorithm

A
106
Q

GMM Initialization

A

Need to initialize mixing coefficient (pi_k), centers (mu_k), and covariance (sigma_k)
- good initialization can lead to faster convergence
- bad initialization can fail to converge

107
Q

How to improve GMM initialization

A

Use the K-Means result to initialize the GMM: the center (mu_k) can be the k-means center of cluster k, the covariance (sigma_k) can be computed from the data points assigned to cluster k, and the mixing coefficient (pi_k) can be the size of cluster k divided by the number of data points n
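
A minimal sketch of this initialization, assuming one-dimensional data so that the covariance reduces to a variance, and assuming the hard cluster assignments from K-Means are already available. The function name and sample data are hypothetical.

```python
def init_gmm_from_kmeans(points, assignments, k):
    """Derive initial (pi_k, mu_k, var_k) for a 1-D GMM from hard cluster assignments."""
    n = len(points)
    params = []
    for cluster in range(k):
        members = [x for x, a in zip(points, assignments) if a == cluster]
        if not members:
            raise ValueError(f"cluster {cluster} is empty")
        mu = sum(members) / len(members)                            # cluster center
        var = sum((x - mu) ** 2 for x in members) / len(members)    # within-cluster variance
        pi = len(members) / n                                       # cluster size / n
        params.append((pi, mu, var))
    return params


# Made-up data with assignments as a K-Means run might produce them.
points = [1.0, 2.0, 3.0, 10.0, 12.0]
assignments = [0, 0, 0, 1, 1]
params = init_gmm_from_kmeans(points, assignments, 2)
```

The mixing coefficients sum to 1 by construction, which is required of a valid GMM.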

108
Q

GMM Expectation Step

A
109
Q

GMM Maximization Step

A
110
Q

K-Means vs. GMM (clustering, parameters, algorithm)

A
111
Q

Mini Batch K-Means Benefit

A

With large dataset, it allows streaming data and using mini-batches rather than the full dataset

112
Q

Mini Batch K-Means Algorithm

A
113
Q

Mini batch K-Means:
How to assign points in M (mini-batch) to current center in C

A

Save center in hash map to retrieve quickly

114
Q

Mini batch K-Means:
How to update C center based on assignments in M (mini batch)

A

center update becomes smaller and smaller (stabilizes) as the step size decreases
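
A sketch of one mini-batch step. The card only says the step size decreases; this sketch assumes the common choice of a per-center step size of 1 / (number of points that center has absorbed so far). Function names and data are illustrative.

```python
def nearest(x, centers):
    """Index of the center closest to x (squared Euclidean distance)."""
    return min(range(len(centers)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(x, centers[j])))


def minibatch_update(centers, counts, batch):
    """Update centers in place from one mini-batch; returns the centers."""
    # Cache assignments first so every point in the batch sees the same centers.
    assigned = [nearest(x, centers) for x in batch]
    for x, j in zip(batch, assigned):
        counts[j] += 1
        eta = 1.0 / counts[j]   # step size shrinks as the center sees more points
        centers[j] = [(1 - eta) * c + eta * xi for c, xi in zip(centers[j], x)]
    return centers


centers = [[0.0], [10.0]]
counts = [0, 0]
minibatch_update(centers, counts, [[1.0], [3.0], [9.0]])
```

With this step size each center is the running mean of the points assigned to it, so later points move it less and less.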

115
Q
A

t * b * K * d

116
Q

DBScan

A

Density-Based Spatial Clustering of Applications with Noise
- clusters defined as areas of high density separated by areas of low density

117
Q

T/F: The clusters found by DBScan are typically oval shaped

A

False - they can be any shape

118
Q

DBScan Key Concepts

A
  • Density = # of points within distance epsilon of point p
  • Point is in a dense region if density is greater than a threshold
  • Core - Points in the dense region
  • Border - Points that are not core but are within epsilon distance of a core point
  • Noise - Points that are neither core nor within epsilon distance of any core point
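
The three point types can be illustrated with a short sketch. It assumes Euclidean distance, counts a point as its own neighbor, and treats a point as core when its density is at least min_pts (the usual convention; the exact threshold comparison is an assumption). Names and data are made up.

```python
def classify_points(points, eps, min_pts):
    """Label each point as 'core', 'border', or 'noise'."""
    def dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

    # Density of p: number of points within eps of p (including p itself).
    neighbors = [[j for j, q in enumerate(points) if dist(p, q) <= eps]
                 for p in points]
    core = {i for i, nb in enumerate(neighbors) if len(nb) >= min_pts}
    labels = []
    for i, nb in enumerate(neighbors):
        if i in core:
            labels.append("core")
        elif any(j in core for j in nb):
            labels.append("border")   # not dense itself, but near a core point
        else:
            labels.append("noise")
    return labels


points = [(0, 0), (0, 1), (1, 0), (1, 1), (2.2, 1), (8, 8)]
labels = classify_points(points, eps=1.5, min_pts=4)
```

The full algorithm then grows clusters by connecting core points that lie within epsilon of each other and attaching border points to them.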
119
Q

DBScan Algorithm

A
120
Q

How many clusters can a datapoint belong to using the DBScan algorithm?

A

0 or 1

121
Q

Clustering Evaluation Metrics

A
  1. Rand Index - requires ground truth
  2. Mutual Information - requires ground truth
  3. Silhouette Coefficient - no ground truth required
122
Q

Rand Index (RI)

A
123
Q

Mutual Information

A

Measures the mutual dependency of two random variables (from information theory).

124
Q

Pros/Cons of Rand Index and Mutual Information

A
125
Q

Silhouette Coefficient

A
126
Q

Pros/Cons of Silhouette Coefficient

A
127
Q

Which better supports iterative ML Algorithms: MapReduce or Spark

A

Spark

128
Q

___ is based on acyclic data flow from stable storage to stable storage

A

Hadoop

129
Q

Limitations of Hadoop

A

Inefficient for applications that repeatedly reuse a working set of data:
- Iterative Algorithms (ML, Graph Analysis)
- Interactive Data Mining Tools (R, Python)

130
Q

Problem with iteration in Hadoop Map-Reduce

A
  1. Iteratively reload the data over and over
  2. Redundantly save output between stages

These steps result in repeated reading/writing from disk

131
Q

Key objective for supporting iterative algorithms is what?

A

Keep working set in memory to perform quick operations.
Load the dataset once into distributed memory

132
Q

What is the challenge to keeping data in memory?

A

Designing a distributed memory abstraction that is both fault-tolerant and efficient

133
Q

Resilient Distributed Datasets (RDDs)

A

Balance between granularity of the computation and the efficiency for enabling fault tolerance. Help to provide fault-tolerant and efficient solution

134
Q

RDDs lead to efficient fault recovery, how?

A

Using lineage.
Root RDD -> transformation applied -> Derived RDD Generated
Log one operation to apply to many elements
Recompute lost partitions on failure
No cost if nothing fails (everything is in memory)

135
Q

RDD Recovery

A

Can selectively reload the failed input data to refresh memory or iteration data and run that subset of the data to recover.

136
Q

Spark Stack

A

Spark Core - RDD

137
Q

Spark Programming Interface

A
138
Q

Map vs Flatmap

A

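Map = one output per input element, so a function that returns a list produces a list of lists
Flatmap = same function applied, but the results are flattened into a single list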
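
The difference can be imitated in plain Python without Spark. This is only an analogy using the built-in map and itertools.chain, not the Spark API; the input lines are made up.

```python
from itertools import chain


def split_words(line):
    """Return the list of words in one line."""
    return line.split()


lines = ["a b", "c d e"]

# map: one output per input line, so the result is a list of lists.
mapped = list(map(split_words, lines))

# flatMap analogue: apply the same function, then flatten one level.
flat_mapped = list(chain.from_iterable(map(split_words, lines)))
```

Here mapped keeps one entry per line, while flat_mapped holds one entry per word.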

139
Q

Cost of RDD Transformations (union, intersection, map, subtract)

A

-union - cheap
-intersection - expensive (sorting)
-map - cheap
-subtract - expensive (distinct elements and set difference operation)

140
Q

Spark Operations

A
141
Q

Shared Variable in Spark

A

Broadcast Variable - Allows the program to efficiently send a large, read-only value to all worker nodes

142
Q

Fault Tolerance of Spark

A

RDDs track lineage information that can be used to efficiently reconstruct lost partitions. Can be used to recompute efficiently in the event of a loss

143
Q

Health Data Standards

A
  1. LOINC (Logical Observation Identifiers Names and Codes) for labs
  2. ICD (International Classification of Diseases) for dx
  3. CPT (Current Procedural Terminology) for procedures
  4. NDC (National Drug Code) for meds
144
Q

Most popular medical ontology

A

SNOMED - Systematized Nomenclature of Medicine

145
Q

UMLS

A

Unified Medical Language System

146
Q

ICD Codes

A
  • From WHO
  • Categorize diseases
  • ICD 10
  • Covers Dx and procedure
147
Q

ICD 9 Codes

A
  • [E/V/n x x] . [ y1 y2]
    a. x - category (17 categories + supplemental categories)
    b. y1- subcategory
    c. y2 - subclassification

3 to 5 digits

148
Q

ICD 10 Codes

A
  • [ x x x ] . [ y1 y2 y3 y4]
    a - x - category
    b - y1 - etiology
    c - y2 - body part
    d - y3 - severity/vital details
    e - y4 - extension
149
Q

ICD9 to ICD10 Mapping

A

One-to-many relationships (ICD10 is more specific) but may be one-to-one occasionally

150
Q

CPT

A

Current Procedural Terminology - medical/surgical/diagnostic services.

Maintained by the AMA
Used by insurance to determine how much to pay

  • Category I (5 digits)
  • Category II (4 digits + F for quality metrics)
  • Category III (4 digits + T for experimental use)
151
Q

LOINC

A

A standard for lab and clinical observation created by Regenstrief Institute

  • Used to capture lab tests
152
Q

NDC

A

National Drug Code
Medication Standard maintained by FDA
Used through drug supply chain to track medications

3 parts
company/labeler - product code - package code

153
Q

SNOMED

A

Comprehensive, multi-lingual clinical healthcare terminology

Maintained by IHTSDO (non-profit in Denmark)

Encode health information and support effective clinical recording of data

Purpose of SNOMED- improve clinical docs, understand semantic interop, enable clinical decision support, data retrieval

154
Q

Logical Model of SNOMED CT

A
155
Q

UMLS

A

Unified Medical Language System
Maintained by the National Library of Medicine
Integrates all data standards
Software tools to map data to medical concepts

3 Sources:
Metathesaurus Concepts
Semantic Network
Specialist Lexicon and Tools

156
Q

PageRank

A
  • Algorithm developed for ranking of web pages
  • Nodes with more incoming edges are higher ranked and more important
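
A small power-iteration sketch of PageRank on a made-up graph. The damping factor of 0.85, the fixed iteration count, and the handling of dangling nodes are conventional assumptions, not details taken from the card; every link target is assumed to appear as a key in the input.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each node to the list of nodes it points to."""
    nodes = list(links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, outgoing in links.items():
            if outgoing:
                # Split this node's rank evenly among the nodes it links to.
                share = damping * rank[node] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                for target in nodes:
                    new_rank[target] += damping * rank[node] / n
        rank = new_rank
    return rank


# Made-up graph: C has three incoming edges.
links = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(links)
```

Rank depends on the importance of the linking nodes as well as their number, which is why A (linked only from the highly ranked C) ends up well ahead of B and D.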
157
Q

MapReduce PageRank

A
158
Q

MapReduce PageRank - Map Phase

A
159
Q

MapReduce PageRank - Reduce Phase

A
160
Q

Spectral Clustering

A
  1. Construct a graph (patient vectors)
  2. Create a similarity graph of patients
  3. Store graph as matrix (adjacency matrix)
  4. Find Top K Eigenvectors of Graph
  5. Cluster into K Groups of patients using eigenvectors
161
Q

Similarity Graph Construction

A
162
Q

E-Neighborhood Graph

A
  • Connect patients within epsilon distance to each other
163
Q

Fully Connected Graph

A

Similarity function w is a Gaussian kernel (or radial basis function). Use a fully connected graph but parameterize edges differently (edge weights differ)
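
A sketch of the edge weights for a fully connected similarity graph, assuming the common form w(x, y) = exp(-||x - y||^2 / (2 * sigma^2)). The bandwidth sigma, the function names, and the patient vectors are illustrative assumptions.

```python
import math


def gaussian_similarity(x, y, sigma=1.0):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2 * sigma ** 2))


def similarity_matrix(patients, sigma=1.0):
    """Weighted adjacency matrix of the fully connected similarity graph."""
    return [[gaussian_similarity(p, q, sigma) for q in patients] for p in patients]


# Made-up patient vectors: the first two are close, the third is far away.
patients = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
W = similarity_matrix(patients)
```

Nearby patients get weights close to 1 and distant patients get weights close to 0, so the graph stays fully connected while distant pairs contribute little.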

164
Q

Singular Value Decomposition (SVD)

A
165
Q

Singular Value Decomposition Example

A
166
Q

SVD Properties

A
167
Q
A
168
Q

Principal Component Analysis (PCA)

A
169
Q

Sparsity problem with SVD

A

SVD destroys sparsity in the original data

170
Q

CUR vs SVD

A

CUR maintains the sparsity of the original data

171
Q

CUR Decomposition

A

Use actual rows and columns to form factorization matrices

172
Q

CUR Algorithm

A
173
Q

Tensor Factorization

A
174
Q

Rank 1 Tensor

A

Outer product of a set of vectors (1 from each mode)
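
A sketch of a three-mode rank 1 tensor built as an outer product, using nested lists. The mode names (patient, diagnosis, medication) and the values are assumptions chosen for illustration.

```python
def rank1_tensor(a, b, c):
    """Outer product of three vectors: entry [i][j][k] equals a[i] * b[j] * c[k]."""
    return [[[ai * bj * ck for ck in c] for bj in b] for ai in a]


# One vector per mode (made-up values).
patient = [1.0, 2.0]
diagnosis = [0.5, 0.0, 1.0]
medication = [2.0, 3.0]
T = rank1_tensor(patient, diagnosis, medication)
```

The full 2 x 3 x 2 tensor is determined by just 2 + 3 + 2 numbers, which is what makes a rank 1 component compact to store and easy to read as a phenotype.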

175
Q

Example Phenotype

A
176
Q

Phenotyping through Tensor Factorization

A

Factorize tensor as sum of Rank 1 Tensors (Rank 1 Approximation of input).

Lambda corresponds to the importance of the phenotype

177
Q

Canonical Decomposition & Parallel Factorization (CP Decomposition)

A
178
Q

Phenotyping Process Using Tensor Factorization

A
179
Q

Tensor Factorization vs Non-negative Matrix Factorization (NMF)

A

Much more concise with Tensor Phenotypes vs. NMF Phenotypes

180
Q

Benefits for Tensor Factorizations

A
  1. Unsupervised - Multiple phenotypes can be discovered
  2. Predictive - phenotypes can be used for predictive modeling
181
Q

Traditional Paradigm (Evidence-based medicine)

A

Medical decisions based on well-designed and conducted research.
- Randomized clinical trials to test hypothesis
- Successful hypothesis becomes evidence
- Evidence becomes clinical guidelines
- Clinicians apply guidelines in practice

182
Q

New Paradigm (precision medicine)

A
  • Pragmatic trials (Data-driven evidence)
  • patient similarity search (practice based evidence)
  • individualized recommendations
  • precision medicine
183
Q

Randomized Clinical Trials (RCT)

A

Start w/ study population
Two groups - current treatment (control/placebo) and new treatment group
RCT compares groups for improved outcomes in treatment group

184
Q

Pragmatic Trials

A
  • Measure effectiveness of treatment in routine clinical practice
  • Do similarity search with patients related to current patient
  • Look at patient outcome and recommend treatment with best outcome for similar patients
185
Q

Using patient similarity

A
  • practice-based medicine (look for similar patients)
  • hypothesis with retrospective evidence (other patients)
  • randomized clinical trials (prospective study)
  • evidence generation
  • clinical guidelines
  • apply in practice
186
Q

Patient Similarity Approaches

A

Distance Metric Learning - similarity between patients due to similarity of ground truth label and feature labels.

Graph-based similarity learning - connect patients to regions in a disease network and find similarity.

187
Q

Locally Supervised Metric Learning

A
188
Q

Sigmoid Function

A

Activation Function
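
For reference, the sigmoid maps any real input to the interval (0, 1) via 1 / (1 + exp(-x)). The sketch below evaluates it in a way that avoids overflow for large negative inputs; the two-branch form is an implementation choice, not something from the card.

```python
import math


def sigmoid(x):
    """Logistic sigmoid 1 / (1 + exp(-x)), evaluated without overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)   # safe: x is negative, so exp(x) cannot overflow
    return z / (1.0 + z)
```

The output is 0.5 at x = 0 and saturates toward 0 or 1 for large negative or positive inputs, which is where its gradient becomes very small.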

189
Q

Tanh Function

A
190
Q

Rectified Linear Function (ReLU)

A
191
Q

Stochastic Gradient Descent

A
192
Q

Forward Computation for a neuron

A
193
Q

Backward computation for a neuron

A
194
Q

Advantage of CNN model

A

sparse interactions, parameter sharing, and translational invariance

195
Q

Dimension Calculation of Convolution Layers

A
196
Q

Dimension Calculation of Pooling Layer

A
197
Q

Number of calculations for Convolution Layers

A
198
Q

Number of calculations for Pooling Layers

A
199
Q

Number of calculations for fully-connected layers

A
200
Q

T/F: Most of the parameters in a model are in the Convolution layer but most of the operations (calculations) are in the fully-connected layer

A

False. Most of the parameters are in the fully connected layers, while most of the operations are in the convolution layers

201
Q

Problem with RNN

A
  • Gradient can become very small over a long sequence (vanishing gradient)
  • Standard RNN has difficulty remembering state from early history