Translational Bioinformatics (Sandra Hellberg) Flashcards
RNA-sequencing analysis pipeline
Raw count matrix
Filtering and normalization of RNA-seq data
Filtering: Genes with no expression in the data should be removed. The count matrix usually contains many genes with 0 counts. Removing low-count genes also reduces the multiple testing problem: with 20,000 genes you run 20,000 statistical tests, which can produce many false positives. Genes that are unexpressed in all samples carry no biological information, so low-count genes are removed before the analysis.
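Example (a minimal sketch, not a fixed rule): the filtering step can be illustrated on a toy count matrix with NumPy. The function name `filter_low_count_genes` and the thresholds `min_count=10` and `min_samples=2` are assumptions chosen for illustration; real pipelines choose their own cutoffs. The last lines show how the expected number of false positives at a nominal threshold `alpha` shrinks with the number of tests, under the simplifying assumption that no gene is truly differentially expressed.

```python
import numpy as np

def filter_low_count_genes(counts, min_count=10, min_samples=2):
    """Keep genes with at least `min_count` reads in at least `min_samples` samples.

    Both thresholds are illustrative assumptions, not recommended defaults.
    """
    counts = np.asarray(counts)
    # For each gene (row), count how many samples reach the threshold.
    keep = (counts >= min_count).sum(axis=1) >= min_samples
    return counts[keep], keep

# Toy raw count matrix: rows are genes, columns are samples.
counts = np.array([
    [0, 0, 0, 0],      # unexpressed in every sample
    [1, 0, 2, 0],      # very low counts
    [50, 60, 45, 70],  # clearly expressed
    [0, 30, 25, 0],    # expressed in two samples
])

filtered, keep = filter_low_count_genes(counts)

# If every null hypothesis were true, the expected number of false
# positives is alpha times the number of tests performed.
alpha = 0.05
expected_fp_before = alpha * counts.shape[0]
expected_fp_after = alpha * filtered.shape[0]
```

By the same arithmetic, 20,000 tests at alpha = 0.05 would give about 1,000 expected false positives if no gene were truly differential, which is why fewer tests and a multiple testing correction both help.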
Batch effects and biological confounders
Batch correction and covariates
Properties of RNA-seq data
Differential expression analysis
Multiple testing problem
False positives (calculate fraction of false positives)
Multiple testing correction (FWER, Bonferroni, Benjamini-Hochberg, FDR)
Nominal p-values, adjusted p-values, q-values
High-dimensional data analysis (PCA, MDS, SVD, t-SNE, K-means, hierarchical clustering)
High-dimensional data refers to a dataset in which the number of features is larger than the number of observations, for example thousands of genes measured in a handful of samples. Such data are large and computationally demanding to analyse, and spreadsheet tools such as Excel are not suited to them. Much of this sequencing data is also hard to interpret by looking at it directly.
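Example (a rough sketch under stated assumptions): one common way to make such data easier to look at is to project the samples onto a few principal components. The code below runs PCA through an SVD on simulated data with more features than observations. The matrix sizes, the random seed, and the artificial group effect are all made up for illustration and do not come from any real dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated expression-like matrix: 6 samples (rows), 1000 genes (columns),
# so the number of features is far larger than the number of observations.
n_samples, n_genes = 6, 1000
X = rng.normal(size=(n_samples, n_genes))
X[:3, :200] += 3.0  # hypothetical group effect in the first three samples

# PCA via SVD: centre each gene, then decompose the centred matrix.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

scores = U * S                       # sample coordinates on each component
explained = S**2 / np.sum(S**2)      # fraction of variance per component
pc_coords = scores[:, :2]            # first two components, e.g. for a scatter plot
```

Each sample is now described by two numbers instead of 1,000, which is small enough to plot and inspect.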
Biological pathways
Gene enrichment analysis (pathway and gene set)
Disease enrichment