Final Exam Flashcards
(202 cards)
The Four V’s
Volume
Variety
Velocity
Veracity - A lot of noise/false alarms
What makes predictive modeling difficult?
- Millions of patients to analyze - dx, rx, etc.
- Many models to be built
Computational Phenotyping
Raw data (demo, dx, rx, labs) -> phenotypes
Patient Similarity
Simulate doctor’s case-based reasoning with algorithms
Hadoop
Distributed disk-based big data system
Spark
Distributed in-memory big data system
T/F: Hadoop is much faster than Spark
False. Spark keeps data in memory, so it is generally faster than disk-based Hadoop.
What are the steps of the predictive modeling pipeline?
Prediction Target should be both ____ and ____
interesting and possible
Cohort Construction Study
Defining the study population
Prospective vs. Retrospective
Prospective - identify cohort then collect data
Retrospective - Retrieve historical data then identify cohort
T/F: A prospective study has more noise in the data than a retrospective study
False. A retrospective study has more noise because it relies on historical data.
T/F: A prospective study is more expensive than a retrospective study
True. The data collection has to be pre-planned for the study
T/F: A prospective study takes more time than a retrospective study
True. The data collection has to be planned and executed before analysis of the data.
T/F: A prospective study more commonly involves a larger dataset than a retrospective study.
False. A retrospective study more often involves a large dataset because historical data can be accessed more easily.
Cohort Study
The goal is to select a group of patients who are exposed to a risk.
Example: The target is heart failure (HF) readmission. The cohort contains all HF patients discharged from the hospital. The key in a cohort study is to define the right inclusion/exclusion criteria.
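Inclusion/exclusion criteria can be read as a simple filter over patient records. The sketch below is illustrative only: the Patient fields, the "HF" label, and the minimum-age exclusion are assumptions made for the example, not criteria from the course.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    patient_id: str
    diagnoses: set      # hypothetical diagnosis labels, e.g. {"HF"}
    discharged: bool    # True if the patient was discharged from hospital
    age: int


def build_cohort(patients, min_age=18):
    """Return IDs of discharged HF patients, excluding those under min_age.

    The age cutoff is an assumed exclusion criterion for illustration.
    """
    cohort = []
    for p in patients:
        included = "HF" in p.diagnoses and p.discharged
        excluded = p.age < min_age
        if included and not excluded:
            cohort.append(p.patient_id)
    return cohort


patients = [
    Patient("p1", {"HF"}, True, 70),              # meets all criteria
    Patient("p2", {"HF"}, False, 65),             # not discharged
    Patient("p3", {"diabetes"}, True, 50),        # no HF diagnosis
    Patient("p4", {"HF", "diabetes"}, True, 16),  # excluded by age
]
cohort = build_cohort(patients)
print(cohort)
```

Only p1 passes both the inclusion and the exclusion checks here; changing the criteria changes the cohort, which is why defining them carefully matters.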
Case-Control Study
Identify two sets of patients - cases and controls. Put the case patients and control patients together to define the cohort.
Case in Case-Control study
Patients with a positive outcome (have the disease)
Control in Case-Control study
Patients with a negative outcome (healthy) but otherwise similar to the case patients
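One way to make "otherwise similar" concrete is to match each case to a control on a few attributes. The sketch below assumes greedy 1:1 matching on sex and age within a tolerance; the matching variables, the tolerance, and the record layout are all hypothetical choices for illustration.

```python
def match_controls(cases, candidates, age_tolerance=5):
    """Greedy 1:1 matching of controls to cases.

    For each case, pick the first unused candidate with the same sex and
    an age within age_tolerance years. Cases with no match are skipped.
    """
    used = set()
    pairs = []
    for case in cases:
        for cand in candidates:
            if cand["id"] in used:
                continue
            same_sex = cand["sex"] == case["sex"]
            close_age = abs(cand["age"] - case["age"]) <= age_tolerance
            if same_sex and close_age:
                used.add(cand["id"])
                pairs.append((case["id"], cand["id"]))
                break
    return pairs


cases = [
    {"id": "c1", "age": 60, "sex": "F"},
    {"id": "c2", "age": 45, "sex": "M"},
]
candidates = [
    {"id": "h1", "age": 47, "sex": "M"},
    {"id": "h2", "age": 62, "sex": "F"},
    {"id": "h3", "age": 30, "sex": "F"},
]
pairs = match_controls(cases, candidates)
# The cohort is the cases and their matched controls put together.
cohort = [patient_id for pair in pairs for patient_id in pair]
print(pairs, cohort)
```

Greedy matching is the simplest option and depends on candidate order; it is used here only to show the idea of pairing cases with similar controls.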
Feature Construction Goal
Construct all potentially relevant features about patients in order to predict the target outcome
Example components of a Feature Construction pipeline
Large observation window and short prediction window
Small observation window and large prediction window. This is the most useful model, but it is most likely unrealistic and difficult to build.
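The observation and prediction windows on these cards can be sketched around a reference date. This is a minimal illustration under stated assumptions: the name index_date, the half-open window boundaries, the event codes, and the count-based features are all hypothetical, not the course's definitions.

```python
from collections import Counter
from datetime import date, timedelta


def construct_features(events, index_date, observation_days):
    """Count each event code seen in the observation window
    [index_date - observation_days, index_date)."""
    start = index_date - timedelta(days=observation_days)
    return Counter(code for day, code in events if start <= day < index_date)


def outcome_in_prediction_window(outcome_date, index_date, prediction_days):
    """True if the outcome falls in (index_date, index_date + prediction_days]."""
    if outcome_date is None:
        return False
    end = index_date + timedelta(days=prediction_days)
    return index_date < outcome_date <= end


# Hypothetical (date, code) events for one patient.
events = [
    (date(2020, 1, 10), "dx:hypertension"),  # before the observation window
    (date(2020, 5, 2), "rx:beta_blocker"),
    (date(2020, 5, 20), "dx:hypertension"),
    (date(2020, 7, 1), "lab:bnp"),           # after the index date, not a feature
]
index_date = date(2020, 6, 1)
features = construct_features(events, index_date, observation_days=90)
label = outcome_in_prediction_window(date(2020, 8, 15), index_date, prediction_days=180)
print(dict(features), label)
```

A larger observation_days gives the model more history to work with, while a larger prediction_days asks it to look further ahead, which matches the trade-off described on the cards.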
Curve B, because it predicts accurately over a longer period of time, while performance drops quickly for the other models.