Introduction Class 2 Flashcards

1
Q

What are the three main phases needed before a prediction model can be used in clinical practice?

A
  1. Development (7 steps of development):
       1. Research question and initial data inspection
       2. Coding of predictors
       3. Model specification
       4. Model estimation
       5. Evaluation of model performance
       6. Internal validation
       7. Model presentation
  2. External validation (completely new data set)
  3. Impact assessment (clinical usefulness)

(Steyerberg et al. 2014, Eur Heart J. 2014 Aug 1;35(29):1925-31.)
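
Internal validation, one of the development steps, is often carried out with cross-validation. The sketch below shows how k-fold splitting partitions a data set so that every observation is used once for testing. It is a minimal illustration in plain Python; the function name `k_fold_indices` is hypothetical, and the source does not prescribe a particular validation method.

```python
def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k roughly equal folds.

    Indices are split in order; real analyses usually shuffle first.
    """
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


folds = list(k_fold_indices(10, 3))
for train, test in folds:
    print("train:", train, "test:", test)
```

In each round the model would be estimated on `train` and evaluated on `test`, so performance is never measured on the observations used for fitting.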

2
Q

Prediction models are relevant to many questions in clinical medicine, public health, and epidemiology.

What are some examples?

A

Public health:
Identifying target populations for preventive interventions (e.g. QRISK and QDiabetes)

Clinical practice:
Therapeutic decision making: Should a treatment start? Which treatment is the best? How intense should it be (e.g. drug dose)?

Management decision:
Do we need more hospital beds? How cost-effective will a treatment be?
Providing realistic expectations of the course of the disease for patients and their relatives

Medical research:
In experimental trials (RCTs), predictive baseline characteristics can help to include or stratify patients and improve statistical analyses, e.g. stratification by biomarkers

3
Q

Prediction research asks a different question than explanatory medical research.

What is it?

A

How can we reliably predict outcomes of individuals?
-> Prediction models predict outcomes for individuals.

Theoretical models are not necessary and causal interpretations are not of interest

4
Q

What is the aim of prediction modelling?

A

To find a model with an appropriate subset of predictor variables that generalizes well, i.e. gives good predictions of future observations

Often many predictors are available.

5
Q

What is machine learning used for?

A

To analyse large numbers of predictors to get a reliable prediction for a person!

6
Q

P values and confidence intervals are of no interest in machine learning.

True or false?

A

TRUE

7
Q

Prediction modelling aims to make average predictions.

True or false?

A

FALSE

Prediction modelling aims to make individual predictions

8
Q

What is the difference between explanatory and predictive research?

A

Explanatory research:

Applies statistical models, such as regression, to test causal hypotheses using a priori theoretical models.

Typically, explanatory research is interested in the “average” response of a population.
Causal interpretation is the ultimate aim

Prediction research asks a different question:

  • What is the likelihood of individual events or outcomes? Prediction models predict outcomes for individuals.
  • Theoretical models are not necessary and causal interpretation is not the main interest (“black box”)
9
Q

What are prediction models optimized for?

A

Predicting new or future observations. In explanatory research, by contrast, minimizing bias (the difference between the estimated and the true population parameter) is the key criterion for selecting the best model

10
Q

Why is there tension between explanatory and predictive modelling?

A

The best explanatory model may differ from the best predictive model (Sober 2006).

11
Q

What are problems with analysing big data?

A

With a small number of variables, standard statistical methods can be applied

However, in times of big data they are no longer sufficient

Often the number of potential predictors (p) is large compared to the sample size (n): the p >> n problem
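
One consequence of the p >> n problem can be illustrated with a small simulation: when many candidate predictors are screened in a small sample, some will correlate strongly with the outcome by chance alone. The sketch below uses pure noise, so any association it finds is spurious. The sample size, number of predictors and seed are arbitrary assumptions chosen for illustration.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(0)  # fixed seed so the run is reproducible
n, p = 10, 1000         # 10 individuals, 1000 candidate predictors

# Outcome and predictors are independent random noise.
outcome = [rng.gauss(0, 1) for _ in range(n)]
predictors = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]

# The strongest "association" found by screening all predictors.
max_abs_r = max(abs(pearson(x, outcome)) for x in predictors)
print(f"largest |r| among {p} noise predictors: {max_abs_r:.2f}")
```

A predictor selected this way would look useful in the development sample but would not predict new observations, which is why validation on new data matters.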

12
Q

What is a byte?

A

A unit of data that is eight binary digits long, used in computers to represent a character such as a letter, number or typographic symbol.
1981: an Intel 8088 PC had up to 640 000 bytes (640 kbyte) of memory
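
The definition can be checked directly in Python: eight binary digits give 256 distinct values, and one ASCII character occupies exactly one byte. The variable names are illustrative only.

```python
# One byte = 8 bits, so it can take 2**8 distinct values.
BITS_PER_BYTE = 8
values_per_byte = 2 ** BITS_PER_BYTE

# The letter "A" stored as a single byte (ASCII encoding).
encoded = "A".encode("ascii")
bit_pattern = format(encoded[0], "08b")  # its eight binary digits

print(values_per_byte)  # 256
print(len(encoded))     # 1
print(bit_pattern)      # 01000001
```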

13
Q

What adds to the issue of analysing big data?

A

The volume of data is (still) increasing exponentially!
From 130 exabytes (1 exabyte = 10^18 or 1 000 000 000 000 000 000 bytes of data) in 2005 to an estimated 44 zettabytes (1 zettabyte = 10^21 bytes) in 2020,
equivalent to a stack of DVDs reaching from Earth halfway to Mars, or as many digital bytes as there are stars
But less than 1% is analysed
(source: http://www.idc.com)
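
The figures above imply a growth rate that can be worked out with a few lines of arithmetic. This is a rough sketch that assumes constant exponential growth between the two quoted data points, which the source does not state explicitly.

```python
EXABYTE = 10 ** 18
ZETTABYTE = 10 ** 21

volume_2005 = 130 * EXABYTE
volume_2020 = 44 * ZETTABYTE

# Overall growth between the two quoted estimates.
growth_factor = volume_2020 / volume_2005

# Implied yearly multiplier if growth were constant over the 15 years.
years = 2020 - 2005
annual_growth = growth_factor ** (1 / years)

print(f"growth factor 2005-2020: about {growth_factor:.0f}x")
print(f"implied yearly growth:   about {100 * (annual_growth - 1):.0f}%")
```

Under that assumption the data volume grew roughly 338-fold, or a little under 50% per year.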

14
Q

Why is multimorbidity prediction important?

A

Diseases are caused by a combination of genetic, environmental, and lifestyle factors.

15
Q

What are three types of data?

A

Structured data (10%): SQL database format

Semi-structured data (10%): XML, tables, Excel files

Unstructured data (80% of all data): text and multimedia data, including emails, patient records (often handwritten), social media, audio, photos, webpages, presentations, documents, satellite data, streaming data from sensors (wearables), social network data, ...

16
Q

Big data sets are usually designed for what?

A

To be re-used for many purposes and to answer multiple questions, often not known when the database was created

17
Q

What is big data typically analysed with?

A

Machine learning

18
Q

Machine learning approaches are used for what?

A

Prediction models, such as regularized regressions, random forests, support vector machines, deep learning, ...
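
To make "regularized regression" concrete, the sketch below compares an ordinary least-squares slope with a ridge-penalized slope for a single centred predictor. Ridge is only one of several regularization methods and is chosen here as an assumption; the data and the penalty value are made up for illustration.

```python
def ridge_slope(x, y, penalty):
    """Slope minimizing sum((y - b*x)**2) + penalty * b**2.

    Assumes x and y are already centred (mean zero), so no intercept
    is needed. penalty = 0 gives ordinary least squares.
    """
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return sxy / (sxx + penalty)


# Toy centred data (hypothetical values).
x = [-2, -1, 0, 1, 2]
y = [-4.2, -1.6, 0.0, 1.6, 4.2]

ols = ridge_slope(x, y, penalty=0)     # about 2.0
ridge = ridge_slope(x, y, penalty=10)  # about 1.0, shrunk towards zero

print(f"OLS slope:   {ols:.2f}")
print(f"ridge slope: {ridge:.2f}")
```

The penalty shrinks the coefficient towards zero, trading some bias for lower variance, which can improve prediction when there are many predictors relative to the sample size.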

19
Q

What does machine learning (ML) explore?

A

The study and construction of algorithms that can learn from and make predictions on data (Mund 2013)
-> Machine learning is concerned with prediction and automated model building

-> A theoretical model does not need to be developed; the machine can identify the model itself

20
Q

Machine learning algorithms are optimized for what purpose?

A

Predicting new or future observations

21
Q

How does machine learning often treat the relationship between predictor variables and outcome?

A

As a black box (black-box machine learning refers to machine learning models that give you a result or reach a decision without explaining or showing how they did so)

22
Q

What did the failure of Google Flu Trends highlight about machine learning?

A

Correlation is not causation, and theory is still useful! It failed because people used different search terms for flu

23
Q

Is machine learning the best approach in the clinical field?

A

No

Developing prognostic models on the basis of a priori clinical knowledge is often comparable to, or better than, data-driven model building with machine-learning methods

24
Q

When should machine-learning methods be preferred?

A

High dimensional data when no a priori knowledge is available.

Machine learning is no remedy for small data sets!

25
Q

What are problems with prediction modelling?

A
  1. Study design and confounders
  2. Validation
  3. Clinical usefulness
  4. Missing data
26
Q

What is one argument for ignoring the problem of confounders in case-control or other observational studies when we are concerned with prediction?

A

We are not trying to explain whether predictors are predictive because they are directly related to the outcome or because they are a proxy or potential confounder. We can therefore be unconcerned about whether an association selected for prediction is causal

27
Q

Why should confounders be considered in prediction modelling?

A

Generalization/portability of findings to another setting may depend on the pattern of confounders remaining the same!

This needs explicit consideration and often revised analysis

28
Q

What is the significance of study design and target population in prediction modelling?

A

We need to distinguish the sample from which prediction performance is estimated from that in which it is intended to be used (target).

Only some parameters are insensitive to sample design!

29
Q

Often our preferred measure of prediction performance is what?

What must in turn be considered?

A

A parameter that is design-sensitive: a model predicting treatment outcome based on a sample from SE London may predict less well in Scotland.

Can we generalize to other populations? Does our model show good external validity?

30
Q

Good prediction accuracy may not mean what?

A

Clinical usefulness!

31
Q

In addition to positive and negative predictive values we need to consider what?

A

The costs and benefits of a test (for the patient, family, clinicians and general public). Economic costs may determine whether it is worth creating a prediction model, e.g. is the outcome affordable?
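
The positive and negative predictive values mentioned in this card follow from sensitivity, specificity and prevalence via Bayes' rule. The sketch below computes them and shows how strongly they depend on prevalence, which is one reason the same test can perform differently in another population. The function name and the example numbers are hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied in a given population."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Same test (90% sensitivity and specificity), two different settings.
ppv_common, npv_common = predictive_values(0.9, 0.9, prevalence=0.5)
ppv_rare, npv_rare = predictive_values(0.9, 0.9, prevalence=0.01)

print(f"prevalence 50%: PPV={ppv_common:.2f}, NPV={npv_common:.2f}")
print(f"prevalence  1%: PPV={ppv_rare:.2f}, NPV={npv_rare:.3f}")
```

With a prevalence of 50% the PPV is 0.90, but at 1% prevalence it falls to about 0.08, so most positive results are false positives even though the test itself is unchanged.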

32
Q

What is required to evaluate the impact of prediction models on patient outcomes?

A

Intervention research, i.e. a clinical trial

33
Q

Why is missing data problematic when creating prediction models?

A

Similar problems as in classical statistics: missing values introduce bias and reduce sample size and precision

Standard treatments for missing data, like multiple imputation, may be difficult to incorporate, especially if machine learning is used
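
The loss of sample size can be surprisingly large when many predictors are involved. The sketch below simulates complete-case analysis (dropping every row with any missing value), assuming each predictor is missing independently and completely at random with the same probability. That assumption, and all the numbers, are illustrative only; real missingness is rarely this simple.

```python
import random

rng = random.Random(1)  # fixed seed for reproducibility

n_rows = 10_000
n_predictors = 10
missing_rate = 0.10  # each value is missing with probability 10%

# Count rows in which every predictor happens to be observed.
complete_rows = 0
for _ in range(n_rows):
    if all(rng.random() >= missing_rate for _ in range(n_predictors)):
        complete_rows += 1

observed_fraction = complete_rows / n_rows
# Under independence the expected fraction is (1 - rate) ** predictors.
expected_fraction = (1 - missing_rate) ** n_predictors

print(f"complete rows (simulated): {observed_fraction:.3f}")
print(f"complete rows (expected):  {expected_fraction:.3f}")
```

With only 10% missing per predictor, roughly two thirds of the rows are lost once ten predictors are required, which is why simply deleting incomplete cases is costly.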