computational precision medicine Flashcards
What can PCA be used for?
Visualization of multidimensional sample differences, Dimensionality reduction, Visualization of batch effects
What can the units of the axes in a PCA plot be?
Raw component scores
What are the units of gene quantification in RNA sequencing?
Transcripts per million, Fragments per kilobase per million, Number of reads mapped to a given transcript
What is the distribution of RNA sequencing counts?
Negative binomial distribution
What is the purpose of computational precision medicine?
Finding and evaluating new therapeutic targets, Supporting the practice of precision medicine with data analysis, Supporting precision diagnostics with large-scale data analysis
What is the difference between Pearson’s and Spearman’s correlation coefficients?
Pearson’s correlation evaluates the linear relationship between two continuous variables, whereas Spearman’s correlation evaluates the monotonic relationship
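To make the distinction concrete, here is a minimal sketch using only the standard library. The helper names (`pearson`, `ranks`, `spearman`) and the toy data are assumptions for illustration. Spearman's coefficient is computed as the Pearson correlation of the ranks, so a monotonic but non-linear relationship scores 1 with Spearman and less than 1 with Pearson.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation: strength of the linear relationship."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Toy data (hypothetical): y grows monotonically with x, but not linearly.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]

r_pearson = pearson(x, y)    # below 1: the relationship is not a straight line
r_spearman = spearman(x, y)  # 1 up to rounding: the ordering is preserved
```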
What are batch effects?
Technical variation that we want to remove: batch effects introduce unwanted systematic variation into the data that is unrelated to the biological factors of interest.
How many dimensions are produced using PCA?
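Same as the number of features in the original space (as an upper bound: with fewer samples than features, at most the number of samples minus one components carry any variance)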
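A minimal sketch of PCA via singular value decomposition, assuming NumPy is available. The random toy matrix and the variable names are assumptions for illustration, not data from the course. With 20 samples and 5 features, the decomposition returns 5 components, one per original feature.

```python
import numpy as np

# Toy data (hypothetical): 20 samples in rows, 5 features in columns.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))

# Center each feature, then decompose the centered matrix.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

# Rows of Vt are the principal axes; projecting onto them gives the
# component scores that are plotted on the axes of a PCA plot.
scores = Xc @ Vt.T

# Variance captured by each component, and its share of the total.
explained_variance = S ** 2 / (X.shape[0] - 1)
explained_ratio = explained_variance / explained_variance.sum()

n_components = Vt.shape[0]  # 5, the number of original features
```

For visualization, only the first two or three columns of `scores` are usually plotted, which is where the dimensionality reduction comes from.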
What is the purpose of the UCSC Xena project?
Enabling visual data analysis, A standardized pre-processing of multiple omics data sets, A repository of gene expression data
What is NOT a hallmark of cancer?
Resistance to immunotherapy
Why do you log2 transform data and what is meant by “count+1”?
Counts span several orders of magnitude, so we log2 transform them to compress the range and make low and high values visible on the same scale. A pseudocount of 1 is added (count+1) because log2(0) is undefined.
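A minimal sketch of the transform, with made-up counts chosen so that the results are whole numbers:

```python
from math import log2

# Hypothetical read counts spanning six orders of magnitude, including a zero.
counts = [0, 1, 7, 1023, 1048575]

# The pseudocount of 1 keeps zero counts defined: log2(0 + 1) = 0.
log_counts = [log2(c + 1) for c in counts]
# log_counts == [0.0, 1.0, 3.0, 10.0, 20.0]
```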
What is RPKM?
Reads per kilobase per million mapped reads; normalizes for gene length and sequencing depth; a high RPKM indicates high expression
What is FPKM?
Fragments per kilobase of transcript per million mapped fragments; analogous to RPKM and used for paired-end data; a high FPKM indicates high expression
What is TPM?
Transcripts per million; also normalizes for gene length and sequencing depth, and is better suited to comparing expression between samples
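The two normalizations differ only in the order of the steps, which a small sketch can show. The function names and toy numbers are assumptions for illustration, and the total number of mapped reads is approximated here by the sum of the listed counts. RPKM divides by sequencing depth and gene length directly; TPM first divides by gene length and then rescales so that the values of a sample sum to one million.

```python
def rpkm(counts, lengths_bp):
    """Reads per kilobase of gene length per million mapped reads."""
    total_reads = sum(counts)  # assumption: all mapped reads are listed here
    return [c / (length / 1e3) / (total_reads / 1e6)
            for c, length in zip(counts, lengths_bp)]

def tpm(counts, lengths_bp):
    """Length-normalize first, then scale each sample to sum to one million."""
    rates = [c / (length / 1e3) for c, length in zip(counts, lengths_bp)]
    scale = sum(rates) / 1e6
    return [r / scale for r in rates]

# Toy sample (hypothetical): read counts scale with gene length, so all
# three genes are equally expressed once length is accounted for.
counts = [100, 200, 700]
lengths_bp = [1000, 2000, 7000]

rpkm_values = rpkm(counts, lengths_bp)
tpm_values = tpm(counts, lengths_bp)
```

Because TPM values add up to the same total in every sample, a given TPM corresponds to the same share of the transcript pool in each sample, which is not guaranteed for RPKM or FPKM.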
What can you not do with RPKM, FPKM, or TPM values?
You cannot use them as input for differential gene expression analysis with DESeq2 or edgeR, which expect raw (unnormalized) counts
What is the purpose of molecular subtyping cancer samples?
To classify cancer samples into distinct subgroups based on molecular characteristics
In K-nearest neighbors (KNN) classification, what does the value of K represent?
The number of neighbors considered for classification
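A minimal KNN sketch using only the standard library. The function name `knn_predict`, the two-dimensional toy points, and the labels `subtype_A` and `subtype_B` are all hypothetical, and Euclidean distance is an assumption; other distance measures work the same way.

```python
from collections import Counter
from math import dist

def knn_predict(train_points, train_labels, query, k=3):
    """Label a query point by majority vote among its k nearest neighbors."""
    nearest = sorted(
        zip(train_points, train_labels),
        key=lambda pair: dist(pair[0], query),  # Euclidean distance
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy training set (hypothetical): two well-separated groups.
train_points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
train_labels = ["subtype_A"] * 3 + ["subtype_B"] * 3

pred_low = knn_predict(train_points, train_labels, (2, 2), k=3)
pred_high = knn_predict(train_points, train_labels, (8.5, 8.5), k=3)
```

An odd K avoids tied votes in a two-class problem; a larger K smooths the decision at the cost of blurring small groups.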
In distance to centroid classification, how is the class label of a new data point determined?
By assigning the class label of the centroid closest to the data point
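A minimal sketch of distance-to-centroid classification. The helper names and the toy data are hypothetical, and Euclidean distance is an assumption; correlation-based distances are a common alternative for expression data.

```python
from math import dist

def class_centroids(points, labels):
    """Mean coordinate vector of the training points in each class."""
    groups = {}
    for point, label in zip(points, labels):
        groups.setdefault(label, []).append(point)
    return {
        label: tuple(sum(coord) / len(members) for coord in zip(*members))
        for label, members in groups.items()
    }

def nearest_centroid(centroids, query):
    """Return the label whose centroid lies closest to the query point."""
    return min(centroids, key=lambda label: dist(centroids[label], query))

# Toy training set (hypothetical labels and coordinates).
points = [(1, 1), (1, 3), (3, 1), (3, 3), (7, 7), (9, 9)]
labels = ["subtype_A"] * 4 + ["subtype_B"] * 2

centroids = class_centroids(points, labels)
# centroids == {"subtype_A": (2.0, 2.0), "subtype_B": (8.0, 8.0)}
prediction = nearest_centroid(centroids, (3, 4))
```

Unlike KNN, only one reference point per class has to be stored, so a published set of centroids is enough to classify new samples.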
Which methods are commonly used for molecular subtyping of cancer?
Immunohistochemistry (IHC), Next-generation sequencing (NGS), Gene expression profiling
What is the primary objective of single sample gene set enrichment analysis (ssGSEA)?
To determine the functional enrichment of gene sets in a single sample
What are batch effects in gene expression data?
Systematic variations in gene expression attributed to technical factors
When comparing gene expression data of cohorts of patients, why is it important to consider biological confounders?
Biological confounders can affect the reliability of the outcome, lead to biased interpretation, and affect the reliability of gene expression measurements
How is the distance between two ranked vectors typically measured?
Kendall’s tau distance
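Kendall's tau distance counts the pairs of items that two rankings place in opposite order. A minimal sketch, assuming both rankings contain the same items without ties; the function name is hypothetical and the gene symbols are used only as illustrative labels.

```python
from itertools import combinations

def kendall_tau_distance(ranking_a, ranking_b):
    """Number of item pairs that the two rankings order differently."""
    pos_a = {item: i for i, item in enumerate(ranking_a)}
    pos_b = {item: i for i, item in enumerate(ranking_b)}
    discordant = 0
    for x, y in combinations(pos_a, 2):
        # Opposite signs mean the pair is ordered differently in a and b.
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0:
            discordant += 1
    return discordant

ranking_a = ["TP53", "BRCA1", "MYC", "EGFR"]
swapped = ["BRCA1", "TP53", "MYC", "EGFR"]   # one adjacent swap
reversed_ranking = ranking_a[::-1]            # fully reversed order

d_swapped = kendall_tau_distance(ranking_a, swapped)             # 1
d_reversed = kendall_tau_distance(ranking_a, reversed_ranking)   # 6

# Dividing by the number of pairs, n * (n - 1) / 2, scales the distance to [0, 1].
n = len(ranking_a)
normalized_reversed = d_reversed / (n * (n - 1) / 2)
```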
Why am I teaching you to work with microarray data when RNA-seq provides more comprehensive gene expression profiling?
A lot of cancer subtyping schemes are defined on microarray data