computational precision medicine Flashcards

Question 1

Q

What can PCA be used for?

Answer

A

Visualization of multidimensional sample differences, Dimensionality reduction, Visualization of batch effects

Question 2

Q

What can the units of the axes in a PCA plot be?

Answer

A

Raw component scores

Question 3

Q

What are the units of gene quantification in RNA sequencing?

Answer

A

Transcripts per million, Fragments per kilobase per million, Number of reads mapped to a given transcript

Question 4

Q

What is the distribution of RNA sequencing counts?

Answer

A

Negative binomial distribution
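
One common way to write this down is the mean-dispersion form used by count-based tools such as DESeq2 and edgeR. The symbols below are chosen for illustration: K is the read count of gene i in sample j, mu its mean, and alpha the gene's dispersion.

```latex
K_{ij} \sim \mathrm{NB}(\mu_{ij}, \alpha_i), \qquad
\operatorname{Var}(K_{ij}) = \mu_{ij} + \alpha_i \, \mu_{ij}^{2}
```

With alpha = 0 the variance equals the mean, which is the Poisson case; alpha > 0 models the extra (overdispersed) variability typically seen between biological replicates.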

Question 5

Q

What is the purpose of computational precision medicine?

Answer

A

Finding and evaluating new therapeutic targets, Supporting the practice of precision medicine with data analysis, Supporting precision diagnostics through large-scale data analysis

Question 6

Q

What is the difference between Pearson’s and Spearman’s correlation coefficients?

Answer

A

Pearson’s correlation evaluates the linear relationship between two continuous variables, whereas Spearman’s correlation evaluates the monotonic relationship
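
The difference can be illustrated with a small sketch in plain Python. The function names and the toy data are illustrative assumptions; Spearman's coefficient is computed here as Pearson's coefficient applied to the ranks.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation: strength of the linear relationship."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but not linear

r_pearson = pearson(x, y)    # below 1, because the relationship is curved
r_spearman = spearman(x, y)  # essentially 1, because the ordering is preserved
```

For a strictly increasing but curved relationship like this one, Spearman's coefficient is 1 (up to floating-point rounding) while Pearson's stays below 1.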

Question 7

Q

What are batch effects?

Answer

A

Technical variance that we want to remove. Batch effects introduce unwanted systematic variation into the data that is unrelated to the biological factors of interest.

Question 8

Q

How many dimensions are produced using PCA?

Answer

A

The same as the number of features in the original space (assuming there are at least as many samples as features)
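
A minimal sketch of PCA via the singular value decomposition, assuming NumPy is available. The `pca` helper and the random toy matrices are illustrative, not part of the course material. It also covers a matrix with fewer samples than features, where fewer components carry any variance.

```python
import numpy as np


def pca(X):
    """PCA of a samples-by-features matrix via SVD.

    Returns (scores, components, explained_variance).
    """
    Xc = X - X.mean(axis=0)  # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U * S  # raw component scores (the axes of a PCA plot)
    explained_variance = S ** 2 / (X.shape[0] - 1)
    return scores, Vt, explained_variance


rng = np.random.default_rng(0)

# 10 samples, 4 features: PCA yields 4 components.
X = rng.normal(size=(10, 4))
scores, components, var = pca(X)

# 3 samples, 5 features: only 3 components are returned, and after
# centering at most 2 of them carry non-zero variance.
X_wide = rng.normal(size=(3, 5))
scores_wide, components_wide, var_wide = pca(X_wide)
```

The explained variances sum to the total variance of the original features, which is why keeping only the first few components is a form of dimensionality reduction.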

Question 9

Q

What is the purpose of the UCSC Xena project?

Answer

A

Enabling visual data analysis, A standardized pre-processing of multiple omics data sets, A repository of gene expression data

Question 10

Q

What is NOT a hallmark of cancer?

Answer

A

Resistance to immunotherapy

Question 11

Q

Why do you log2 transform data and what is meant by “count+1”?

Answer

A

Counts span a very wide range, so we log2 transform them to compress that range and make both low and high values visible on the same scale. The +1 is a pseudocount: it is added because log2(0) is undefined.
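
A minimal sketch of the transform in plain Python; the function name and the example counts are illustrative.

```python
from math import log2


def log2_plus_one(counts):
    """Apply log2(count + 1) so that zero counts map to 0 instead of failing."""
    return [log2(c + 1) for c in counts]


counts = [0, 1, 3, 1023]
transformed = log2_plus_one(counts)  # [0.0, 1.0, 2.0, 10.0]
```

A count of 0 becomes 0, and a roughly thousand-fold range of counts is compressed into a range of about 10.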

Question 12

Q

What is RPKM?

Answer

A

Reads per kilobase per million mapped reads; normalizes for gene length and sequencing depth. A high RPKM indicates high expression.
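
The definition can be sketched as follows in plain Python. The gene names, counts and lengths are made up for illustration, and the total of the counts passed in stands in for the number of mapped reads.

```python
def rpkm(read_counts, gene_lengths_bp):
    """RPKM = reads * 1e9 / (gene length in bp * total mapped reads)."""
    total_reads = sum(read_counts.values())
    return {
        gene: count * 1e9 / (gene_lengths_bp[gene] * total_reads)
        for gene, count in read_counts.items()
    }


read_counts = {"geneA": 500, "geneB": 500}        # hypothetical counts
gene_lengths_bp = {"geneA": 1000, "geneB": 2000}  # hypothetical lengths

values = rpkm(read_counts, gene_lengths_bp)
```

Both genes have the same number of reads, but geneB is twice as long, so its RPKM is half that of geneA.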

Question 13

Q

What is FPKM?

Answer

A

Fragments per kilobase of transcript per million mapped fragments; analogous to RPKM and used for paired-end data. A high FPKM indicates high expression.

Question 14

Q

What is TPM?

Answer

A

Transcripts per million; also normalizes for gene length and sequencing depth, and is better suited to comparing expression between samples
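
A minimal sketch of the TPM calculation in plain Python. The gene names, counts and lengths are illustrative. Length normalization happens first, then scaling so that each sample sums to one million.

```python
def tpm(read_counts, gene_lengths_bp):
    """TPM: length-normalize first, then scale so the sample sums to 1e6."""
    rates = {
        gene: count / (gene_lengths_bp[gene] / 1000)  # reads per kilobase
        for gene, count in read_counts.items()
    }
    total_rate = sum(rates.values())
    return {gene: rate / total_rate * 1e6 for gene, rate in rates.items()}


read_counts = {"geneA": 500, "geneB": 500}        # hypothetical counts
gene_lengths_bp = {"geneA": 1000, "geneB": 2000}  # hypothetical lengths

values = tpm(read_counts, gene_lengths_bp)
```

Because every sample sums to the same total, a TPM value can be read as a gene's share of the sample's transcripts.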

Question 15

Q

What can you not do with RPKM, FPKM, or TPM values?

Answer

A

You cannot use them as input for differential gene expression analysis (DESeq2/edgeR), which expects raw counts

Question 16

Q

What is the purpose of molecular subtyping cancer samples?

Answer

A

To classify cancer samples into distinct subgroups based on molecular characteristics

Question 17

Q

In K-nearest neighbors (KNN) classification, what does the value of K represent?

Answer

A

The number of neighbors considered for classification
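
A minimal sketch of KNN classification in plain Python, assuming Euclidean distance and majority voting. The function name, the two-dimensional toy points and the subtype labels are illustrative.

```python
from collections import Counter
from math import dist


def knn_predict(train_points, train_labels, query, k):
    """Label the query by majority vote among its k nearest training points."""
    nearest = sorted(
        range(len(train_points)),
        key=lambda i: dist(train_points[i], query),
    )[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


train_points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_labels = ["luminal"] * 3 + ["basal"] * 3  # hypothetical subtype labels

label_near_origin = knn_predict(train_points, train_labels, (1, 1), k=3)
label_far = knn_predict(train_points, train_labels, (4, 4), k=3)
```

With two classes, an odd K avoids tied votes.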

Question 18

Q

In distance to centroid classification, how is the class label of a new data point determined?

Answer

A

By assigning the class label of the centroid closest to the data point
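
A minimal sketch of distance-to-centroid classification in plain Python, assuming Euclidean distance. The function names, toy points and subtype labels are illustrative.

```python
from math import dist


def class_centroids(points, labels):
    """Mean position of the training points of each class."""
    sums, counts = {}, {}
    for point, label in zip(points, labels):
        acc = sums.setdefault(label, [0.0] * len(point))
        for j, value in enumerate(point):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def nearest_centroid(centroids, query):
    """Return the label whose centroid is closest to the query point."""
    return min(centroids, key=lambda label: dist(centroids[label], query))


points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["luminal"] * 3 + ["basal"] * 3  # hypothetical subtype labels

centroids = class_centroids(points, labels)
predicted = nearest_centroid(centroids, (2, 2))
```

Unlike KNN, only one distance per class is computed at prediction time, because each class is summarized by a single point.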

Question 19

Q

Which methods are commonly used for molecular subtyping of cancer?

Answer

A

Immunohistochemistry (IHC), Next-generation sequencing (NGS), Gene expression profiling

Question 20

Q

What is the primary objective of single sample gene set enrichment analysis (ssGSEA)?

Answer

A

To determine the functional enrichment of gene sets in a single sample

Question 21

Q

What are batch effects in gene expression data?

Answer

A

Systematic variations in gene expression attributed to technical factors
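
As a deliberately simple illustration (not a substitute for dedicated batch-correction methods), the sketch below removes a per-batch shift from one gene by subtracting each batch's mean. The function name and the numbers are made up, and the approach assumes the batches are otherwise biologically comparable.

```python
def center_by_batch(values, batches):
    """Subtract each batch's mean from the values measured in that batch."""
    sums, counts = {}, {}
    for value, batch in zip(values, batches):
        sums[batch] = sums.get(batch, 0.0) + value
        counts[batch] = counts.get(batch, 0) + 1
    means = {batch: sums[batch] / counts[batch] for batch in sums}
    return [value - means[batch] for value, batch in zip(values, batches)]


# Hypothetical log-expression of one gene; batch B is shifted up by 4 units.
expression = [5.0, 6.0, 7.0, 9.0, 10.0, 11.0]
batches = ["A", "A", "A", "B", "B", "B"]

corrected = center_by_batch(expression, batches)  # [-1.0, 0.0, 1.0, -1.0, 0.0, 1.0]
```

If batch is confounded with a biological group (for example, all tumors in one batch and all normals in another), this kind of centering would remove the biological signal as well.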

Question 22

Q

When comparing gene expression data of cohorts of patients, why is it important to consider biological confounders?

Answer

A

Biological confounders can affect the reliability of the outcome, lead to biased interpretation, and affect the reliability of gene expression measurements

Question 23

Q

How is the distance between two ranked vectors typically measured?

Answer

A

Kendall’s tau distance
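
A minimal sketch of the Kendall tau distance in plain Python: it counts the pairs of items that the two rankings order differently. The function name and example rankings are illustrative, and ties are not counted as discordant here.

```python
from itertools import combinations


def kendall_tau_distance(rank_a, rank_b):
    """Number of item pairs ordered differently by the two rankings."""
    discordant = 0
    for i, j in combinations(range(len(rank_a)), 2):
        if (rank_a[i] - rank_a[j]) * (rank_b[i] - rank_b[j]) < 0:
            discordant += 1
    return discordant


reference = [1, 2, 3, 4]
one_swap = [2, 1, 3, 4]
reversed_order = [4, 3, 2, 1]

d_same = kendall_tau_distance(reference, reference)          # 0
d_swap = kendall_tau_distance(reference, one_swap)           # 1
d_reversed = kendall_tau_distance(reference, reversed_order) # 6
```

Dividing by the number of pairs, n(n-1)/2, gives a normalized distance between 0 (identical order) and 1 (fully reversed order).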

Question 24

Q

Why am I teaching you to work with microarray data when RNA-seq provides more comprehensive gene expression profiling?

Answer

A

A lot of cancer subtyping schemes are defined on microarray data

(24 cards)