Genome analysis - RNAseq, comparative, functional Flashcards
What are the uses of RNA sequencing?
- Differential gene expression - Quantitative evaluation and comparison of transcript levels.
- Transcriptome assembly - Building the profile of transcribed regions of the genome, a qualitative evaluation.
- Metatranscriptomics, or community transcriptome analysis.
When do we need paired reads when sequencing RNA?
- When interested in alternative splicing.
- Gives better accuracy in mapping reads.
Paired reads are generally good for all applications, but especially for poorly annotated transcriptomes or lowly expressed genes.
Single reads are cheaper and are sufficient for differential expression analysis, particularly when the genome is well annotated.
What are the two categories of mapping tools?
- Splice-aware aligners
These tools are aware of exon-exon junctions and should be used when aligning reads to a reference GENOME.
- Unspliced aligners
Should be used when mapping to a reference TRANSCRIPTOME or when applied to organisms without introns.
What is the workflow for expression analysis using alignment?
- Raw reads quality control
- Alignment to reference
- Transcript assembly and expression data (count data, how many RNA reads mapped to a gene?)
- Differential expression analysis of the count data.
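The "count data" step above can be illustrated with a toy sketch. It assumes reads are already aligned and reduced to a single mapped position each; the gene names, coordinates, and read positions are invented for illustration, and real pipelines use dedicated counting tools that handle strands, exons, and multi-mapping reads.

```python
# Toy read counting: how many reads fall inside each gene?
# Gene names, coordinates, and read positions are made up for illustration.

def count_reads_per_gene(genes, read_positions):
    """Count reads whose mapped position lies within each gene interval.

    genes: {gene_name: (chromosome, start, end)} with inclusive coordinates.
    read_positions: iterable of (chromosome, position) tuples.
    """
    counts = {name: 0 for name in genes}
    for chrom, pos in read_positions:
        for name, (gene_chrom, start, end) in genes.items():
            if chrom == gene_chrom and start <= pos <= end:
                counts[name] += 1
    return counts


genes = {"geneA": ("chr1", 100, 500), "geneB": ("chr1", 800, 1200)}
reads = [("chr1", 150), ("chr1", 450), ("chr1", 900), ("chr2", 150), ("chr1", 600)]
counts = count_reads_per_gene(genes, reads)
print(counts)
```

The resulting per-gene counts are the input to the differential expression step.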
When is gene length normalization necessary?
When we want to compare the expression of two different genes WITHIN a sample. Genes have different lengths, and a longer gene yields more reads at the same expression level, so without normalization the read counts vary for reasons that are not due to differential expression.
NOT needed when comparing the same gene between two different samples (sequencing depth still has to be normalized in that case).
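A minimal sketch of length normalization, using TPM (transcripts per million) as one common choice. The counts and gene lengths are invented: both genes get the same number of reads, but one is four times longer.

```python
# Length normalization sketch using TPM. All numbers are illustrative.

def tpm(counts, lengths_bp):
    """Return transcripts-per-million values for each gene.

    Step 1: divide counts by gene length in kilobases (reads per kilobase).
    Step 2: rescale so that all values in the sample sum to one million.
    """
    rpk = {gene: counts[gene] / (lengths_bp[gene] / 1000) for gene in counts}
    scale = sum(rpk.values()) / 1_000_000
    return {gene: value / scale for gene, value in rpk.items()}


counts = {"geneA": 200, "geneB": 200}        # identical raw read counts
lengths_bp = {"geneA": 1000, "geneB": 4000}  # geneB is four times longer
values = tpm(counts, lengths_bp)
print(values)
```

Although the raw counts are equal, geneA comes out about four times higher than geneB after normalization, because the same number of reads is spread over a shorter gene.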
How can we make any inference on biology from differential expression analysis?
- Identify co-regulated genes: If the genes you found to be differentially expressed are co-regulated it could mean that they work together in some biological process.
- Check whether any GO terms are overrepresented among your differentially expressed genes. If, for example, GO terms for metabolic functions are overrepresented, then the metabolism may be affected by the conditions you’re studying.
- Are the DE genes overrepresented in any KEGG pathways?
These insights may help you understand what functions are affected by the conditions you are studying based on the differential expression results.
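Overrepresentation of a GO term or KEGG pathway is often assessed with a hypergeometric (one-sided Fisher) test. The sketch below assumes that approach; the function name and the gene numbers are hypothetical, and real analyses also correct for testing many terms.

```python
# Overrepresentation sketch: hypergeometric upper-tail probability.
# Function name and example numbers are illustrative assumptions.
from math import comb


def enrichment_p_value(population, annotated, drawn, overlap):
    """Probability of seeing at least `overlap` annotated genes by chance.

    population: number of genes in the background set.
    annotated:  background genes carrying the GO term or pathway.
    drawn:      number of differentially expressed (DE) genes.
    overlap:    DE genes that carry the term.
    """
    total = comb(population, drawn)
    upper = min(annotated, drawn)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, drawn - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# 20 background genes, 5 annotated with the term, 4 DE genes, 3 of them annotated.
p = enrichment_p_value(population=20, annotated=5, drawn=4, overlap=3)
print(p)
```

A small p-value suggests that the term appears among the DE genes more often than random sampling from the background would explain.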
Why are we interested in variant calling for gene markers?
To name a few reasons:
- Medicine
- Gene therapy
- GMO
- Breeding
- Forensics
What is the goal of functional genomics?
Functional genomics is the study of how genes and intergenic regions of the genome contribute to different biological processes. The goal of functional genomics is to determine how the individual components of a biological system work together to produce a particular phenotype.
What is comparative genomics?
Compare two or more genomes to discover the similarities and differences between them as well as their evolutionary relationships.
Can be done at any taxonomic level.
How is comparative genomics being used?
Phylogeny:
By comparing sequences of genes or whole genomes across different genomes we can infer the evolutionary history.
Molecular evolution:
Identifying regions in sequences that evolve at different rates, and understanding why.
Genome dynamics:
How the structure and gene content of a genome change, and what the general trends are.
Conserved elements:
Identification of coding and regulatory regions
Epidemiology:
Finding the source of an infection
What is a good approach for finding out the SNPs associated with some trait to use as biological markers?
Deep sequencing with high-quality reads is necessary to call SNPs reliably.
Depending on whether there is a good-quality reference or not, we can use Illumina or PacBio HiFi; this also decides whether we need to do a de novo assembly or can simply map the reads back to a reference.
GWAS is then appropriate for associating the SNPs found with certain traits.
The drawback is that GWAS requires a large sample size to get enough statistical power, and it also introduces the problem of multiple testing, because many SNPs are tested at once.
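The multiple testing problem can be handled by adjusting the significance threshold. The sketch below shows two standard corrections, Bonferroni and Benjamini-Hochberg; the p-values are invented and the function names are only illustrative.

```python
# Multiple testing sketch: Bonferroni and Benjamini-Hochberg (BH).
# The p-values are made up; a real GWAS tests far more SNPs.

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that controls the family-wise error rate."""
    return alpha / n_tests


def benjamini_hochberg(p_values, alpha=0.05):
    """Return the indices of tests rejected at false discovery rate `alpha`."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, index in enumerate(order, start=1):
        # Keep the largest rank whose p-value is below its BH threshold.
        if p_values[index] <= rank / m * alpha:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


p_values = [0.012, 0.045, 0.02, 0.005, 0.5]
threshold = bonferroni_threshold(0.05, len(p_values))
bonferroni_hits = [i for i, p in enumerate(p_values) if p <= threshold]
bh_hits = benjamini_hochberg(p_values)
print(bonferroni_hits, bh_hits)
```

Bonferroni is the stricter of the two: in this toy input it keeps one SNP while BH keeps three.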
What is the difference between the core-genome and the pan-genome?
Core-genome: includes only the genes found in all individuals of a given species. The “necessary genes”.
Pan-genome: includes every gene found in any individual of a given species, even the “non-necessary” ones.
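In set terms, the core-genome is the intersection of the gene sets and the pan-genome is their union. A minimal sketch, with made-up strain names and gene content, and assuming at least one genome is supplied:

```python
# Core- vs pan-genome as set operations. Strains and genes are illustrative.

def core_and_pan(gene_sets):
    """Return (core, pan) for a mapping {individual: iterable of gene names}."""
    sets = [set(genes) for genes in gene_sets.values()]
    core = set.intersection(*sets)  # genes present in every individual
    pan = set.union(*sets)          # genes present in at least one individual
    return core, pan


strains = {
    "strain1": {"dnaA", "gyrB", "recA", "blaTEM"},
    "strain2": {"dnaA", "gyrB", "recA"},
    "strain3": {"dnaA", "gyrB", "tetA"},
}
core, pan = core_and_pan(strains)
accessory = pan - core  # genes found in some, but not all, individuals
print(sorted(core), sorted(pan), sorted(accessory))
```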
How can we do comparative genomics at large phylogenetic scale?
To find similarities or differences in genome structure and genome size variation we can look at synteny and collinearity between the genomes.
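As a rough illustration of collinearity, the sketch below finds the longest run of shared genes that appear consecutively and in the same order in two genomes. It is a deliberate simplification with invented gene orders: it ignores inversions, gaps, and duplicated genes, which real synteny tools account for.

```python
# Collinearity sketch: longest run of genes in the same consecutive order.
# Gene orders are invented; inversions and gaps are ignored in this toy version.

def longest_collinear_block(order_a, order_b):
    """Return the longest list of genes adjacent and same-ordered in both genomes."""
    position_b = {gene: i for i, gene in enumerate(order_b)}
    shared = [gene for gene in order_a if gene in position_b]
    best, current = [], []
    for gene in shared:
        if current and position_b[gene] == position_b[current[-1]] + 1:
            current.append(gene)   # extends the current block
        else:
            current = [gene]       # order is broken, start a new block
        if len(current) > len(best):
            best = list(current)
    return best


genome_a = ["g1", "g2", "g3", "g4", "g5", "g6"]
genome_b = ["g4", "g5", "g6", "g1", "g2", "g7"]
block = longest_collinear_block(genome_a, genome_b)
print(block)
```

Here the genes g4 to g6 keep their order in both genomes even though the block has moved, which is the kind of conserved segment that synteny analysis looks for.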
How can we use comparative genomics to identify the sexes of patients?
If we have a reference of known sex, we can map the patients' reads back to the reference and see whether the patient has two different sex chromosomes (heterogametic, e.g. XY) or two of the same (homogametic, e.g. XX).
What are some epigenetic variations we can see in the genome? How can we view these differences between for example case/control studies, different tissues, developmental stages, paternal/maternal alleles?
Epigenetic variations can be in the form of variations in:
- DNA methylation: Affects whether transcription of the gene is active or not. We can see these variations by doing WGBS (whole-genome bisulfite sequencing) on both case and control samples. The DNA is treated with bisulfite, which converts unmethylated cytosines to uracil (read as thymine after sequencing) while methylated cytosines stay unchanged, and this allows us to see the differences between two samples.
- Histone modification: Depending on how the DNA is packed around the histones, the transcription machinery will have a harder or easier time accessing the DNA. Variations in how the DNA interacts with the histones can be viewed with ChIP-seq.
- Chromatin state refers to the configuration of chromatin that influences gene expression, characterized by nucleosome positioning and histone modifications. DNase-seq is used to identify regions of open chromatin that are accessible to regulatory proteins. These regions, known as DNase I hypersensitive sites (DHSs), are indicative of active regulatory elements such as promoters, enhancers, and insulators.
- Chromatin loops are three-dimensional interactions between distant genomic regions, crucial for regulating gene expression by facilitating enhancer-promoter interactions. Hi-C can be used to study chromatin interactions.
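The bisulfite logic from the DNA methylation point can be sketched in a few lines. This is an in-silico illustration only: the sequence, the methylated position, and the read counts are invented, and the function names are hypothetical.

```python
# Bisulfite sequencing sketch. Sequence, positions, and counts are illustrative.

def bisulfite_convert(sequence, methylated_positions):
    """Simulate bisulfite treatment as it appears in sequencing reads.

    Unmethylated C is converted to U and read as T; methylated C stays C.
    """
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(sequence)
    )


def methylation_level(reads_c, reads_t):
    """Fraction of reads still showing C at a cytosine site (None if no reads)."""
    total = reads_c + reads_t
    return reads_c / total if total else None


converted = bisulfite_convert("ACGTCCGA", methylated_positions={1})
case_level = methylation_level(reads_c=18, reads_t=2)
control_level = methylation_level(reads_c=3, reads_t=17)
print(converted, case_level, control_level)
```

Comparing the per-site methylation level between case and control samples is what reveals differentially methylated positions.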