Guest Lecture: Big Data Flashcards

1
Q

What is big data in health?

A

Big data in health encompasses high volume, high diversity biological, clinical, environmental, and lifestyle information collected from single individuals to large cohorts, in relation to their health and wellness status, at one or several time points.

Examples: electronic health records (EHR), mammography, gene data

2
Q

How did the emergence of big data change where and how we collect data?

A
  • clinical trials
  • EHR
  • patient registries and databases
  • multidimensional data from genomics, epigenomics, transcriptomics, proteomics, metabolomics, and microbiomics (OMICS)
  • medical imaging

more recently

  • social media
  • socioeconomic or behavioural indicators
  • mobile applications
  • environmental monitoring
3
Q

What are OMICS?

A

genomics/epigenomics/transcriptomics - large-scale study of genes/epigenetic modifications/RNA

proteomics - large-scale study of proteins

metabolomics - large-scale study of metabolites

microbiomics - large-scale study of the genes of microbiota

4
Q

Why do we need to integrate big data in health science?

A

“Big data in health can be used to improve the efficiency and effectiveness of prediction and prevention strategies or of medical interventions, health services, and health policies.”

5
Q

How can we make sense of and use big data?

A

Machine learning (“black box”) –> clinical trial

Biology based model –> use big data to understand the association/mechanism in the biological system –> clinical trial

6
Q

What is machine learning?

A

“Machine learning is the science (and art) of programming computers so they can learn from data”

7
Q

What is the hierarchy of evidence?

A
From weakest to strongest evidence:
animal and lab studies
case reports or case series
case-control studies
cohort studies
randomized controlled trials
systematic reviews
meta-analyses
8
Q

Where are Fabp2 and Fabp6 located?

A

in the small intestine

9
Q

What is the objective of Yiheng’s study?

A

Analysis of sex-specific gene expression programs using Fabp gene-disrupted mice

10
Q

What is a microarray?

A

A DNA microarray (also commonly known as a DNA chip or biochip) is a collection of microscopic DNA spots attached to a solid surface, used to survey the sample by annealing to target cDNA sequences in it

11
Q

What is the p value?

A

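p value:
o probability of observing a difference at least as extreme as the one measured, assuming the null hypothesis is true
o it is compared with the significance level (the cut-off, e.g. 0.05), which is the probability of rejecting the null hypothesis when the null is true
o (helps judge whether a difference exists because the two groups are really different rather than due to chance)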
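
One way to see what a p value measures is an exact permutation test, sketched below. This is only an illustration, not the method used in the lecture: the function name `permutation_p_value` is hypothetical and the expression values are made up. It relabels the pooled samples in every possible way and counts how often the difference in means is at least as large as the observed one.

```python
from itertools import combinations


def mean(values):
    return sum(values) / len(values)


def permutation_p_value(group_a, group_b):
    """Two-sided exact permutation p value for a difference in means."""
    pooled = group_a + group_b
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    extreme = 0
    total = 0
    # Enumerate every way of labelling n_a of the pooled values as "group A".
    for idx in combinations(range(len(pooled)), n_a):
        in_a = set(idx)
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in in_a]
        total += 1
        # Small tolerance guards against floating-point noise in ties.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


control = [5.1, 4.8, 5.5, 5.0]    # made-up values for illustration
knockout = [6.9, 7.2, 6.8, 7.5]   # made-up values for illustration
print(permutation_p_value(control, knockout))
```

Here the two groups are completely separated, so only 2 of the 70 possible relabellings are as extreme as the observed one, giving p = 2/70 (about 0.029).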

12
Q

What are multiple testing issues?

A

Omics data are high-dimensional –> hundreds to hundreds of thousands of variables
• Lots of hypothesis tests
• Performing t-tests on microarray data might mean performing more than 20000 separate hypothesis tests.
• If we use a standard p value cut-off of 0.05, we would expect about 1000 (20000*0.05) genes to be called “significant” by chance alone, even if no gene is truly different
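
The arithmetic above can be checked with a small simulation. This is a sketch under one assumption: every null hypothesis is true, so each p value is uniformly distributed between 0 and 1. The function names are hypothetical.

```python
import random


def expected_false_positives(n_tests, alpha):
    """Expected number of tests called significant when every null is true."""
    return n_tests * alpha


def simulate_null_tests(n_tests, alpha, seed=0):
    """Count 'significant' results when the p values are pure noise.

    Under a true null hypothesis a p value is uniform on [0, 1],
    so each one is drawn with rng.random().
    """
    rng = random.Random(seed)
    return sum(1 for _ in range(n_tests) if rng.random() < alpha)


n_genes = 20000   # number of tests, as in the card above
cutoff = 0.05     # standard p value cut-off
print(expected_false_positives(n_genes, cutoff))
print(simulate_null_tests(n_genes, cutoff))
```

The simulated count lands close to 1000 but not exactly on it, because each run is a random draw.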

13
Q

What are the approaches to multiple testing correction?

A

Family Wise Error Rate (FWER) - e.g. Bonferroni

False Discovery Rate (FDR) - e.g. Benjamini-Hochberg

14
Q

Explain the Family Wise Error Rate (FWER) - e.g. Bonferroni

A

Use a corrected cut-off: “corrected p.value = p.value/n” (p.value = 0.05 cut-off; n = number of genes in the list)

For example, if 20,000 genes are tested at once, the highest accepted individual p value is 0.05/20,000 = 0.0000025, which makes the correction very stringent
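
A minimal sketch of the Bonferroni correction as described on this card. The function names and the example p values are made up for illustration.

```python
def bonferroni_cutoff(alpha, n_tests):
    """Per-test p value cut-off after Bonferroni correction."""
    return alpha / n_tests


def bonferroni_significant(p_values, alpha=0.05):
    """True for each p value that passes the corrected cut-off."""
    cutoff = bonferroni_cutoff(alpha, len(p_values))
    return [p <= cutoff for p in p_values]


# The card's example: 20,000 genes tested at a cut-off of 0.05.
print(bonferroni_cutoff(0.05, 20000))

# Four made-up p values; the corrected cut-off is 0.05 / 4 = 0.0125.
example_p = [0.000001, 0.0004, 0.03, 0.2]
print(bonferroni_significant(example_p))
```

In the second example, 0.03 would pass the uncorrected 0.05 cut-off but fails the corrected one.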

15
Q

Explain the False Discovery Rate (FDR)

A

An FDR of 0.05 means that 5% of the genes called significant are expected to be false positives

For example, if 100 genes are identified as differentially expressed (DE) genes, about 5 of them are expected to be false positives.

By controlling the FDR, we control the expected proportion of “discoveries” (rejected H0) that are false (incorrect rejections)
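
A minimal sketch of the Benjamini-Hochberg step-up procedure, one common way to control the FDR. The card names the method elsewhere in the deck but does not spell out the steps, so the function name and the example p values here are illustrative assumptions.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return a list of booleans: True where H0 is rejected at the given FDR."""
    m = len(p_values)
    # Indices of the p values, sorted from smallest to largest p.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= (k / m) * fdr.
    largest_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            largest_k = rank
    # Reject H0 for the k smallest p values.
    rejected = [False] * m
    for idx in order[:largest_k]:
        rejected[idx] = True
    return rejected


# Eight made-up p values for illustration.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print(benjamini_hochberg(example_p))
```

With these eight values the two smallest p values are rejected. A Bonferroni cut-off of 0.05/8 = 0.00625 would keep only the smallest one, which shows why FDR control is less stringent.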

16
Q

What are functional annotations?

A

“the process of collecting information about and describing a gene’s biological identity—its various aliases, molecular function, biological role(s), subcellular location, and its expression domains”

17
Q

What are KEGG pathways?

A

Kyoto Encyclopedia of Genes and Genomes

18
Q

What is personalized nutrition?

A
  • Tries to integrate all the information that could influence nutritional response, e.g. gut bacteria and how they can affect our nutrient metabolism.
  • Uses information from the microbiome, blood, and questionnaires on family history and lifestyle, plus anthropometrics and food diaries, to generate functions and equations in the “black box” that predict how individuals will respond to a certain diet.
19
Q

What is GWAS?

A

Genome-wide association studies

20
Q

What gene is FGF21?

A

Fibroblast growth factor 21, a gene linked to sweet preference (sweet preference test)