Computational Precision Medicine Flashcards
What can PCA be used for?
Visualization of multidimensional sample differences, Dimensionality reduction, Visualization of batch effects
What can the units of the axes in a PCA plot be?
Raw component scores
What are the units of gene quantification in RNA sequencing?
Transcripts per million, Fragments per kilobase per million, Number of reads mapped to a given transcript
What is the distribution of RNA sequencing counts?
Negative binomial distribution
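One reason a negative binomial is used rather than a Poisson is that RNA-seq counts are typically overdispersed (variance larger than the mean). The sketch below illustrates this with simulated counts, assuming NumPy; the parameters `n` and `p` are made-up values, not estimates from real data.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical parameters: NumPy's negative_binomial(n, p) has
# mean n*(1-p)/p and variance n*(1-p)/p**2.
n, p = 2, 0.02
nb_counts = rng.negative_binomial(n, p, size=10_000)

# Poisson counts with the same expected mean, for comparison.
pois_counts = rng.poisson(n * (1 - p) / p, size=10_000)

nb_ratio = nb_counts.var() / nb_counts.mean()        # far above 1 (overdispersed)
pois_ratio = pois_counts.var() / pois_counts.mean()  # close to 1
print(round(nb_ratio, 1), round(pois_ratio, 2))
```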
What is the purpose of computational precision medicine?
Finding and evaluating new therapeutic targets, Supporting the practices of precision medicine with data analysis, Supporting precision diagnostics through large-scale data analysis
What is the difference between Pearson’s and Spearman’s correlation coefficients?
Pearson’s correlation evaluates the linear relationship between two continuous variables, whereas Spearman’s correlation evaluates the monotonic relationship
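The difference can be seen on a relationship that is monotonic but not linear. This is a minimal sketch assuming NumPy; the helper names `pearson` and `spearman` are illustrative, and the rank step assumes there are no tied values.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation: strength of the linear relationship."""
    return float(np.corrcoef(x, y)[0, 1])

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks.

    Ranks are obtained with a double argsort, which is only valid
    when the data contain no ties.
    """
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    return pearson(rank_x, rank_y)

x = np.arange(1, 11, dtype=float)
y = np.exp(x)            # strictly increasing, but far from a straight line

r_p = pearson(x, y)      # clearly below 1
r_s = spearman(x, y)     # 1 up to floating-point error: the ordering is preserved
print(round(r_p, 2), round(r_s, 2))
```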
What are batch effects?
Technical variation that we want to remove: batch effects introduce unwanted systematic variation into the data that is unrelated to the biological factors of interest.
How many dimensions are produced using PCA?
The same as the number of features in the original space (at most min(number of samples minus 1, number of features) of them have non-zero variance)
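A minimal sketch of the component count, assuming NumPy and a made-up matrix of 5 samples by 3 features. PCA is computed here through an SVD of the centered data, which is one common approach and not necessarily the one used in the course.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))        # hypothetical data: 5 samples, 3 features

Xc = X - X.mean(axis=0)            # center each feature
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

scores = U * S                     # sample coordinates on the principal components
explained = S**2 / np.sum(S**2)    # fraction of total variance per component

n_components = Vt.shape[0]         # one row of Vt per principal component
print(n_components, scores.shape)
print(explained.round(3))
```

For dimensionality reduction or plotting, only the first two or three columns of `scores` would usually be kept.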
What is the purpose of the UCSC Xena project?
Enabling visual data analysis, A standardized pre-processing of multiple omics data sets, A repository of gene expression data
What is NOT a hallmark of cancer?
Resistance to immunotherapy
Why do you log2 transform data and what is meant by “count+1”?
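To compress a wide dynamic range: when counts span several orders of magnitude, the log2 transform makes both low and high values visible on the same scale. The +1 is a pseudocount, added because log2(0) is undefined.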
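A short illustration of the transform, assuming NumPy; the counts are made-up values chosen to include a zero and a very large number.

```python
import numpy as np

counts = np.array([0, 1, 7, 1023, 1_000_000])   # hypothetical raw counts

# Adding a pseudocount of 1 keeps zero counts defined: log2(0 + 1) = 0.
log_counts = np.log2(counts + 1)

print(log_counts.round(2))
```

The raw values span six orders of magnitude, while the transformed values all fall between 0 and 20.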
What is RPKM?
Reads per kilobase of transcript per million mapped reads. It normalizes for gene length and sequencing depth; a high RPKM indicates high expression.
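The two normalization steps can be sketched as follows, assuming NumPy. The function name `rpkm` and the toy counts and lengths are hypothetical, and the total number of mapped reads is taken to be the sum of the counts in the table, which is a simplification.

```python
import numpy as np

def rpkm(counts, lengths_bp):
    """Reads per kilobase of transcript per million mapped reads."""
    counts = np.asarray(counts, dtype=float)
    lengths_bp = np.asarray(lengths_bp, dtype=float)
    per_million = counts.sum() / 1e6       # sequencing-depth scaling factor
    rpm = counts / per_million             # reads per million
    return rpm / (lengths_bp / 1e3)        # divide by gene length in kilobases

counts = [500, 500, 1000]        # hypothetical reads per gene
lengths = [1000, 2000, 4000]     # hypothetical gene lengths in base pairs

rpkm_values = rpkm(counts, lengths)
print(rpkm_values)
```

The first two genes have the same read count, but the shorter one gets twice the RPKM because its reads come from half as many bases.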
What is FPKM?
Fragments per kilobase of transcript per million mapped fragments. It is analogous to RPKM and is used for paired-end data; a high FPKM indicates high expression.
What is TPM?
Transcripts per million. It also normalizes for gene length and sequencing depth and is better suited to comparing expression between samples.
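A minimal sketch of the TPM calculation, assuming NumPy; the function name `tpm` and the two toy samples are hypothetical. Length normalization is applied first, then the values are scaled so that each sample sums to one million, which is what makes samples easier to compare.

```python
import numpy as np

def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize first, then scale to 1e6."""
    counts = np.asarray(counts, dtype=float)
    lengths_bp = np.asarray(lengths_bp, dtype=float)
    rate = counts / (lengths_bp / 1e3)     # reads per kilobase
    return rate / rate.sum() * 1e6

lengths = [1000, 2000, 4000]               # hypothetical gene lengths in base pairs
sample_a = tpm([500, 500, 1000], lengths)
sample_b = tpm([5000, 5000, 10000], lengths)   # same proportions, 10x deeper sequencing

print(sample_a, sample_a.sum())
print(sample_b, sample_b.sum())
```

Both samples give the same TPM values and the same total, even though the second was sequenced ten times deeper.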
What can you not do with RPKM, FPKM, or TPM?
You cannot use them as input for differential gene expression analysis with DESeq2 or edgeR, which expect raw read counts