Computational Precision Medicine Flashcards

1
Q

What can PCA be used for?

A

Visualization of multidimensional sample differences, Dimensionality reduction, Visualization of batch effects

2
Q

What can the units of the axes in a PCA plot be?

A

Raw component scores

3
Q

What are the units of gene quantification in RNA sequencing?

A

Transcripts per million, Fragments per kilobase per million, Number of reads mapped to a given transcript

4
Q

What is the distribution of RNA sequencing counts?

A

Negative binomial distribution

5
Q

What is the purpose of computational precision medicine?

A

Finding and evaluating new therapeutic targets, Supporting the practice of precision medicine with data analysis, Supporting precision diagnostics with large-scale data analysis

6
Q

What is the difference between Pearson’s and Spearman’s correlation coefficients?

A

Pearson’s correlation evaluates the linear relationship between two continuous variables, whereas Spearman’s correlation evaluates the monotonic relationship and is computed on the ranks of the values.
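
A minimal sketch of the difference, using only the Python standard library. The helper names (`pearson`, `ranks`, `spearman`) and the toy data are illustrative assumptions, not part of the course material. Spearman's coefficient is computed here as Pearson's coefficient applied to ranks.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation: strength of the linear relationship."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Toy data (assumed): y grows monotonically with x, but not linearly.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]

r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
print(f"Pearson:  {r_pearson:.3f}")
print(f"Spearman: {r_spearman:.3f}")
```

On this toy data the relationship is perfectly monotonic, so Spearman's coefficient reaches 1, while Pearson's stays below 1 because the relationship is not a straight line.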

7
Q

What are batch effects?

A

Technical variance that we want to remove. Batch effects introduce unwanted systematic variation in the data that is unrelated to the biological factors of interest.

8
Q

How many dimensions are produced using PCA?

A

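The same as the number of features in the original space (when there are fewer samples than features, at most min(number of samples − 1, number of features) components carry non-zero variance)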

9
Q

What is the purpose of the UCSC Xena project?

A

Enabling visual data analysis, A standardized pre-processing of multiple omics data sets, A repository of gene expression data

10
Q

What is NOT a hallmark of cancer?

A

Resistance to immunotherapy

11
Q

Why do you log2 transform data and what is meant by “count+1”?

A

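Counts span a very wide range, so a log2 transformation compresses them onto a scale where both low and high values are visible. “count+1” means adding a pseudocount of 1 to every count before the transformation, because the logarithm of zero is undefined.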
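
A small sketch of the transformation, assuming a hypothetical list of raw read counts for five genes. It shows that a zero count maps to 0 once the pseudocount is added, and that the logarithm of an untransformed zero fails.

```python
import math

# Hypothetical raw read counts for five genes (assumed values).
counts = [0, 1, 9, 99, 100000]

# log2(count + 1): the pseudocount of 1 keeps zero counts defined.
log_counts = [math.log2(c + 1) for c in counts]

for c, lc in zip(counts, log_counts):
    print(f"count={c:>6}  log2(count+1)={lc:.2f}")

# Without the pseudocount, a zero count cannot be transformed.
try:
    math.log2(0)
except ValueError as err:
    print("log2(0) fails:", err)
```

The counts range over five orders of magnitude, but the transformed values all fall between 0 and about 17.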

12
Q

What is RPKM?

A

Reads per kilobase of transcript per million mapped reads. It normalizes for gene length and sequencing depth; a high RPKM indicates high expression.

13
Q

What is FPKM?

A

Fragments per kilobase of transcript per million mapped fragments. It is analogous to RPKM and is used for paired-end data; a high FPKM indicates high expression.

14
Q

What is TPM?

A

Transcripts per million. Like RPKM and FPKM, it normalizes for gene length and sequencing depth, but the TPM values of every sample sum to one million, which makes it better suited for comparing expression between samples.
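
The normalizations can be sketched in a few lines of Python. The function names, counts, and gene lengths below are assumptions for illustration, and the sum of the toy counts stands in for the number of mapped reads in the sample.

```python
def rpkm(counts, lengths_bp):
    """Reads per kilobase of transcript per million mapped reads."""
    total_reads = sum(counts)  # assumed: all mapped reads of the sample
    return [c * 1e9 / (total_reads * length)
            for c, length in zip(counts, lengths_bp)]

def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize first, then rescale."""
    rates = [c / length for c, length in zip(counts, lengths_bp)]
    total_rate = sum(rates)
    return [rate / total_rate * 1e6 for rate in rates]

# Hypothetical sample: three genes whose counts scale with their length.
counts = [100, 200, 700]
lengths_bp = [1000, 2000, 7000]

rpkm_values = rpkm(counts, lengths_bp)
tpm_values = tpm(counts, lengths_bp)

print("RPKM:", [round(v, 1) for v in rpkm_values])
print("TPM: ", [round(v, 1) for v in tpm_values])
print("TPM sum:", round(sum(tpm_values)))
```

The longer genes collect more reads here only because they are longer, so both measures assign all three genes the same value. The TPM values always add up to one million within a sample, whereas the sum of RPKM values differs from sample to sample.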

15
Q

What can you not do with RPKM, FPKM, or TPM values?

A

You cannot use them as input for differential gene expression analysis with DESeq2 or edgeR, which expect raw counts.

16
Q

What is the purpose of molecular subtyping cancer samples?

A

To classify cancer samples into distinct subgroups based on molecular characteristics

17
Q

In K-nearest neighbors (KNN) classification, what does the value of K represent?

A

The number of neighbors considered for classification
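
A minimal sketch of the role of K, assuming two-dimensional toy points and hypothetical subtype labels "A" and "B". The function `knn_predict` is an illustrative helper, not a library API.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Label a query point by majority vote among its k nearest neighbors."""
    # Sort training points by Euclidean distance to the query.
    by_distance = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    # Only the k closest neighbors get a vote.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Toy training data (assumed): two well-separated groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9)]
labels = ["A", "A", "A", "B", "B"]

near_a = knn_predict(points, labels, (1.1, 1.0), k=3)
near_b = knn_predict(points, labels, (5.1, 5.0), k=3)
print(near_a, near_b)
```

With K = 3, the second query is decided by two "B" votes against one "A" vote. A larger K brings more distant points into the vote, which can change the result.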

18
Q

In distance to centroid classification, how is the class label of a new data point determined?

A

By assigning the class label of the centroid closest to the data point
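
As a sketch, each class centroid can be taken as the per-feature mean of its training points, and a new point receives the label of the nearest centroid. The helper names and toy data are assumptions for illustration, and Euclidean distance is one possible choice of distance.

```python
import math
from collections import defaultdict

def fit_centroids(points, labels):
    """Compute one centroid (per-feature mean) for each class label."""
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        groups[label].append(point)
    return {
        label: tuple(sum(column) / len(members) for column in zip(*members))
        for label, members in groups.items()
    }

def predict(centroids, query):
    """Return the label whose centroid is closest to the query point."""
    return min(centroids, key=lambda label: math.dist(centroids[label], query))

# Toy training data (assumed): two classes with two points each.
points = [(0.0, 0.0), (2.0, 0.0), (10.0, 10.0), (12.0, 10.0)]
labels = ["A", "A", "B", "B"]

centroids = fit_centroids(points, labels)
assigned = predict(centroids, (3.0, 1.0))
print(centroids)
print(assigned)
```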

19
Q

Which methods are commonly used for molecular subtyping of cancer?

A

Immunohistochemistry (IHC), Next-generation sequencing (NGS), Gene expression profiling

20
Q

What is the primary objective of single sample gene set enrichment analysis (ssGSEA)?

A

To determine the functional enrichment of gene sets in a single sample

21
Q

What are batch effects in gene expression data?

A

Systematic variations in gene expression attributed to technical factors

22
Q

When comparing gene expression data of cohorts of patients, why is it important to consider biological confounders?

A

Biological confounders can make the outcome less reliable, bias the interpretation, and affect the reliability of gene expression measurements

23
Q

How is the distance between two ranked vectors typically measured?

A

Kendall’s tau distance
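
Kendall's tau distance counts the pairs of items that two rankings place in opposite order. A minimal sketch, assuming each vector holds the rank of the same items in the same positions and contains no ties. The function name and example rankings are illustrative.

```python
from itertools import combinations

def kendall_tau_distance(rank_a, rank_b):
    """Count item pairs that the two rankings order differently."""
    discordant = 0
    for i, j in combinations(range(len(rank_a)), 2):
        # A negative product means the pair is ordered oppositely.
        if (rank_a[i] - rank_a[j]) * (rank_b[i] - rank_b[j]) < 0:
            discordant += 1
    return discordant

# Hypothetical ranks of four genes in two samples.
rank_a = [1, 2, 3, 4]
rank_b = [1, 3, 2, 4]      # one adjacent pair swapped
rank_rev = [4, 3, 2, 1]    # completely reversed order

d_swap = kendall_tau_distance(rank_a, rank_b)
d_reverse = kendall_tau_distance(rank_a, rank_rev)
print(d_swap, d_reverse)
```

Identical rankings have distance 0, and a fully reversed ranking of n items reaches the maximum of n(n − 1)/2, which is 6 for four items.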

24
Q

Why am I teaching you to work with microarray data when RNA seq provides a more comprehensive gene expression profiling?

A

A lot of cancer subtyping schemes are defined on microarray data