Genome analysis - RNAseq, comparative, functional Flashcards

1
Q

What are the uses of RNA sequencing?

A
  • Differential gene expression - Quantitative evaluation and comparison of transcript levels.
  • Transcriptome assembly - Building the profile of transcribed regions of the genome, a qualitative evaluation.
  • Metatranscriptomics, or community transcriptome analysis.
2
Q

When do we need paired-end reads when sequencing RNA?

A
  • When interested in alternative splicing.
  • Gives better accuracy in mapping reads.

Paired-end reads are generally good for all applications, but especially for poorly annotated transcriptomes or lowly expressed genes.

Single-end reads are cheaper and are sufficient for differential expression analysis when the genome is well annotated.

3
Q

What are the two categories of mapping tools?

A
  • Splice-aware aligners
    These tools are aware of exon-exon junctions and should be used when aligning reads to a reference GENOME.
  • Unspliced aligners
    Should be used when mapping to a reference TRANSCRIPTOME or when applied to organisms without introns.
4
Q

What is the workflow for expression analysis using alignment?

A
  1. Raw reads quality control
  2. Alignment to reference
  3. Transcript assembly and quantification (count data: how many reads mapped to each gene?)
  4. Differential expression analysis of the count data.
5
Q

When is gene length normalization necessary?

A

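When we want to compare the expression of two different genes WITHIN a sample. We then need to normalize because genes have different lengths: a longer gene yields more reads at the same expression level, which gives variation in read counts that is not due to differential expression.

NOT needed when comparing the same gene between two different samples, since the gene length is the same in both. Sequencing depth still has to be normalized in that case.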
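
As an illustration of gene length normalization, here is a minimal sketch that converts raw counts to TPM (transcripts per million). The gene names, counts and lengths are made up for the example, and TPM is only one of several possible length-aware measures.

```python
def tpm(counts, lengths_kb):
    """Convert raw read counts to TPM (transcripts per million).

    counts: gene -> raw read count
    lengths_kb: gene -> gene length in kilobases
    """
    # Reads per kilobase: divides out the gene-length effect.
    rpk = {gene: counts[gene] / lengths_kb[gene] for gene in counts}
    # Scale so that the values of one sample sum to one million.
    scale = sum(rpk.values()) / 1e6
    return {gene: value / scale for gene, value in rpk.items()}


# Hypothetical sample: both genes have the same raw count,
# but geneB is four times longer than geneA.
counts = {"geneA": 100, "geneB": 100}
lengths_kb = {"geneA": 1.0, "geneB": 4.0}
result = tpm(counts, lengths_kb)
```

With equal raw counts, the shorter gene ends up with the higher TPM, which is the within-sample comparison the card describes.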

6
Q

How can we make inferences about biology from a differential expression analysis?

A
  • Identify co-regulated genes: If the genes you found to be differentially expressed are co-regulated it could mean that they work together in some biological process.
  • Check whether any GO terms are overrepresented among your differentially expressed genes. If, for example, GO terms for metabolic functions are overrepresented, then metabolism may be affected by the conditions you are studying.
  • Are the DE genes overrepresented in any KEGG pathways?

These insights may help you understand which functions are affected by the conditions you are studying, based on the differential expression results.
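
Overrepresentation of a GO term or KEGG pathway is commonly assessed with a hypergeometric (one-sided Fisher) test. The sketch below assumes that approach; the function name and the numbers are invented for illustration, and real analyses also correct for testing many terms.

```python
from math import comb


def enrichment_p(total_genes, genes_with_term, de_genes, de_with_term):
    """P(at least `de_with_term` annotated genes among the DE genes)
    if the DE genes were drawn at random from all genes."""
    upper = min(genes_with_term, de_genes)
    favourable = sum(
        comb(genes_with_term, k)
        * comb(total_genes - genes_with_term, de_genes - k)
        for k in range(de_with_term, upper + 1)
    )
    return favourable / comb(total_genes, de_genes)


# Hypothetical numbers: 20 genes in total, 5 carry the GO term,
# 4 genes are differentially expressed and 3 of those carry the term.
p = enrichment_p(total_genes=20, genes_with_term=5, de_genes=4, de_with_term=3)
```

A small p-value suggests the term appears among the DE genes more often than chance would explain.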

7
Q

Why are we interested in variant calling for gene markers?

A

To name a few reasons:
- Medicine
- Gene therapy
- GMO
- Breeding
- Forensics

8
Q

What is the goal of functional genomics?

A

Functional genomics is the study of how genes and intergenic regions of the genome contribute to different biological processes. The goal of functional genomics is to determine how the individual components of a biological system work together to produce a particular phenotype.

9
Q

What is comparative genomics?

A

Compare two or more genomes to discover the similarities and differences between them as well as their evolutionary relationships.

Can be done at any taxonomic level.

10
Q

How is comparative genomics being used?

A

Phylogeny:
By comparing sequences of genes or whole genomes across different organisms we can infer their evolutionary history.

Molecular evolution:
Identifying regions of sequences that evolve at different rates, and understanding why.

Genome dynamics:
How the structure and gene content of genomes change, and what the general trends are.

Conserved elements:
Identification of coding and regulatory regions

Epidemiology:
Finding the source of an infection

11
Q

What is a good approach for finding out the SNPs associated with some trait to use as biological markers?

A

Deep sequencing with high-quality reads is necessary to call SNPs reliably, since they are single-base differences.

Depending on whether a good-quality reference is available, we can use Illumina or PacBio HiFi reads. This also decides whether we need to do a de novo assembly or can simply map the reads back to a reference.

GWAS is then appropriate for associating the SNPs found with certain traits.

The drawback is that GWAS requires a large sample size to get enough statistical power, and testing many SNPs also introduces the problem of multiple testing.
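
To make the multiple testing problem concrete, here is a small sketch of two standard corrections: a Bonferroni threshold and the Benjamini-Hochberg procedure. The p-values are invented, and which correction a given GWAS uses is an assumption left open here.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise error at alpha."""
    return alpha / n_tests


def benjamini_hochberg(p_values, alpha=0.05):
    """Return the indices of tests declared significant at FDR level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    # Find the largest rank whose p-value is below its rank-scaled threshold.
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha * rank / m:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


# One million tested SNPs at alpha = 0.05 gives a very strict per-SNP cutoff.
gwas_cutoff = bonferroni_threshold(0.05, 1_000_000)

# Hypothetical p-values for four SNPs.
significant = benjamini_hochberg([0.001, 0.04, 0.03, 0.2])
```

The strict per-SNP cutoff is one reason large sample sizes are needed: weak associations only reach such thresholds with many individuals.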

12
Q

What is the difference between the core-genome and the pan-genome?

A

Core-genome: includes only the genes found in every individual of a given species. The “necessary genes”.

Pan-genome: includes every gene found in at least one individual of a given species, that is, the core genes plus the “non-necessary” (accessory) ones.
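
In set terms, the core genome is the intersection of the gene sets of all individuals and the pan-genome is their union. A minimal sketch, with invented strain names and gene contents:

```python
def core_and_pan(genomes):
    """genomes: strain name -> set of gene names present in that strain."""
    gene_sets = [set(genes) for genes in genomes.values()]
    core = set.intersection(*gene_sets)  # present in every strain
    pan = set.union(*gene_sets)          # present in at least one strain
    return core, pan


# Hypothetical gene content of three strains.
genomes = {
    "strain1": {"dnaA", "gyrB", "blaTEM"},
    "strain2": {"dnaA", "gyrB"},
    "strain3": {"dnaA", "gyrB", "tetA"},
}
core, pan = core_and_pan(genomes)
accessory = pan - core  # genes missing from at least one strain
```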

13
Q

How can we do comparative genomics at large phylogenetic scale?

A

To find similarities or differences in genome structure and genome size variation we can look at synteny and collinearity between the genomes.

14
Q

How can we use comparative genomics to identify the sexes of patients?

A

If we have a reference of known sex, we can map the patients' reads back to the reference and see whether each patient has heterogametic (e.g. XY) or homogametic (e.g. XX) sex chromosomes.

15
Q

What are some epigenetic variations we can see in the genome? How can we view these differences between for example case/control studies, different tissues, developmental stages, paternal/maternal alleles?

A

Epigenetic variations can be in the form of variations in:

  • DNA methylation: Affects whether transcription of a gene is active or not. We can see variation in methylation if we do WGBS (whole-genome bisulfite sequencing) on both case and control samples. The DNA is treated with bisulfite, which converts unmethylated cytosines to uracil while leaving methylated cytosines unchanged, and this allows us to see the differences between two samples.
  • Histone modification: Depending on how the DNA is packed around the histones, the transcription machinery will have a harder or easier time accessing the DNA. Variations in how the DNA interacts with the histones can be viewed with ChIP-seq.
  • Chromatin state refers to the configuration of chromatin that influences gene expression, characterized by nucleosome positioning and histone modifications. DNase-seq is used to identify regions of open chromatin that are accessible to regulatory proteins. These regions, known as DNase I hypersensitive sites (DHSs), are indicative of active regulatory elements such as promoters, enhancers and insulators.
  • Chromatin loops are three-dimensional interactions between distant genomic regions, crucial for regulating gene expression by facilitating enhancer-promoter interactions. Hi-C can be used to study chromatin interactions.
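
For the DNA methylation case, a bisulfite-sequencing comparison often comes down to per-site methylation levels: reads that still show C at a cytosine were methylated, reads that show T were not. The sketch below assumes per-site (C, T) read counts are already available; site names and counts are invented.

```python
def methylation_level(c_reads, t_reads):
    """Fraction of reads supporting a methylated cytosine at one site."""
    total = c_reads + t_reads
    if total == 0:
        return None  # no coverage, level is undefined
    return c_reads / total


def methylation_difference(case, control):
    """case/control: site -> (C read count, T read count).

    Returns site -> case level minus control level for sites covered in both.
    """
    diffs = {}
    for site in case.keys() & control.keys():
        case_level = methylation_level(*case[site])
        control_level = methylation_level(*control[site])
        if case_level is not None and control_level is not None:
            diffs[site] = case_level - control_level
    return diffs


# Hypothetical counts at one CpG site.
case = {"chr1:100": (18, 2)}
control = {"chr1:100": (4, 16)}
diffs = methylation_difference(case, control)
```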
16
Q

What are TEs? How are they distributed in the eukaryotic genome?

A

Transposable elements are DNA sequences that have the ability to move around in the genome and are very common in eukaryotic genomes.

Their only mission is to replicate within the genome, and because of this their distribution in the genome is not random. They avoid loci where negative selection will purge them and favor loci where their propagation is enhanced.

17
Q

What are the two main classes of transposable elements?

A

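Retrotransposons - move by a copy-and-paste mechanism. The element is transcribed to RNA, which is reverse-transcribed to cDNA and inserted at a new site.

DNA transposons - move by a cut-and-paste mechanism. They can increase their copy number by moving from an already replicated region to a site in front of the replication fork.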

18
Q

Explain the conflict between TEs and somatic genes, how does this conflict lead to more complex genomes?

A

The conflict between transposable elements (TEs) and somatic genes arises from their differing evolutionary interests. The mobility of TEs can disrupt normal gene function, cause mutations, and lead to genomic instability. Consequently, somatic genes and the host genome have evolved mechanisms to suppress and control the activity of TEs to maintain genomic integrity.

The ongoing conflict between TEs and host genomes has driven the evolution of sophisticated defense mechanisms, including DNA methylation and KRAB zinc-finger genes that help suppress TEs but also play crucial roles in regulating gene expression and maintaining genome stability. The evolution of these defense mechanisms has added layers of regulatory complexity to the genome, contributing to the organism’s overall biological complexity.

19
Q

What is epigenetics?

A

The study of heritable changes in phenotype that do not involve alterations in the DNA sequence but instead act by modulating RNA expression.

20
Q

How can ChIP-seq be used to find the origin of recombination hotspots?

A

ChIP-seq is a good method for finding protein-DNA interactions and the exact sites where proteins bind the DNA.

Target proteins or histone marks that initiate or drive recombination, such as PRDM9 or H3K4me3.

Treat the cells with formaldehyde to crosslink the proteins to the DNA they are bound to, then use antibodies to pull down the target proteins together with the bound DNA.

Then sequence the purified DNA to see where the proteins interact with the DNA to induce recombination.

21
Q

Explain the differences between GWAS and Tn-seq, what are they used for and how do they differ from each other?

A

Both GWAS and Tn-seq are ways of associating specific genotypes with specific traits/phenotypes, but they do so in fundamentally different ways.

GWAS is a population-based approach that uses natural genetic variation (like SNPs) and statistics to associate genetic variation with phenotypic variation. We sample directly from the environment and get indirect evidence of selection.

Tn-seq is a mutagenesis-based method that uses fitness analysis: we model the environments used for selection and get direct evidence of selection.

22
Q

Explain Tn-seq, how is it done and when is it useful?

A

Very good way of identifying genes that are important for survival or that play a crucial role for certain conditions. For example if you want to know specific genes involved in virulence in different hosts.

We randomly insert transposons into the genome, creating a library of mutants. Each insertion disrupts the function of the gene where it lands. We then apply a selection pressure that relates to the condition of interest. For example, if we are interested in antibiotic resistance we may add an antibiotic. It could also be drugs, nutrient limitation, etc.

After the microbes have grown under the selective pressure, we extract DNA, sequence the regions flanking the transposon insertions, and map the reads back to a reference to find the locations of the transposons. We then compare the selected population to a control population to see the differences in abundance of insertions. For example, insertions in genes that are important for the condition are selected against, so we will see fewer insertions in those genes than in the control group.

Typically used for microbes with smaller, less complex genomes. For larger, more complex genomes it is hard to get a good density of transposons in the coding parts.
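
The comparison between the selected population and the control can be sketched as a per-gene log2 fold change of insertion read counts. This is a simplified illustration, not a full Tn-seq pipeline: the depth normalization, the pseudocount and the gene names are assumptions, and both samples are assumed to contain at least one read.

```python
from math import log2


def tnseq_log2fc(selected, control, pseudocount=1.0):
    """selected/control: gene -> number of reads mapping to insertions in that gene.

    Negative values mean insertions in the gene were depleted under selection,
    i.e. the gene is likely important for the tested condition.
    """
    selected_total = sum(selected.values())
    control_total = sum(control.values())
    result = {}
    for gene in control:
        # Normalize by sequencing depth; the pseudocount avoids log2(0).
        sel = (selected.get(gene, 0) + pseudocount) / selected_total
        ctl = (control[gene] + pseudocount) / control_total
        result[gene] = log2(sel / ctl)
    return result


# Hypothetical counts: insertions in geneB almost vanish under selection.
control = {"geneA": 500, "geneB": 500}
selected = {"geneA": 950, "geneB": 50}
fold_changes = tnseq_log2fc(selected, control)
```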

23
Q

How does your confidence change when using orthologues for functional annotations between different evolutionary distances?

A

If the evolutionary distance is very long, the risk that the genes have changed function is greater, and I am therefore less confident in the annotation.

Closer evolutionary distance is better.

24
Q

How can we use synteny to compare two genomes?

A

We can do a whole-genome alignment and then plot the result as a dot plot, with one genome on each axis.

If there are no structural variations between them, the plot shows a single straight diagonal line. Structural variations break this pattern: an inversion, for example, appears as a segment running in the opposite direction, which can produce X shapes.
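
The idea of such a plot can be sketched with exact k-mer matches instead of a real whole-genome aligner. This is a toy illustration under that assumption: forward matches trace the main diagonal, and matches to the reverse complement trace the anti-diagonal that an inversion produces.

```python
def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def dotplot_points(genome_x, genome_y, k=4):
    """Return (forward, reverse) lists of (x_pos, y_pos) k-mer matches."""
    # Index every k-mer position in genome_y.
    index = {}
    for j in range(len(genome_y) - k + 1):
        index.setdefault(genome_y[j:j + k], []).append(j)

    forward, reverse = [], []
    for i in range(len(genome_x) - k + 1):
        kmer = genome_x[i:i + k]
        forward += [(i, j) for j in index.get(kmer, [])]
        reverse += [(i, j) for j in index.get(reverse_complement(kmer), [])]
    return forward, reverse


# Hypothetical sequence compared with itself and with its full inversion.
x = "ACGGTCATTGCAGGA"
same_forward, _ = dotplot_points(x, x)
_, inverted_reverse = dotplot_points(x, reverse_complement(x))
```

Plotting the forward points in one color and the reverse points in another gives the picture described above.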

25
Q

Is gene order well conserved between species?

A

Over longer distances no, but over shorter distances it is well conserved in prokaryotes.

26
Q

What could be the reason for having a larger and more variable pan-genome?

A

Differences in horizontal gene transfer.

Bacteria that live in close proximity to other bacteria are highly exposed to horizontal gene transfer, in which genes move between lineages and even between species.

Mutations also contribute to large and variable pan-genomes.

27
Q

If we have a new phylum of bacteria and we want to find the unique genes for this phylum, how would we do it?

A

Comparative genomics.

Sequence multiple strains from the new phylum and do structural and functional annotation.

Compare with other closely related phyla: genes found in the new phylum but not in the others are candidates for unique genes.

28
Q

What are ohnologues?

A

Ohnologs (or ohnologues) are a specific type of paralog that originate from whole-genome duplication (WGD) events.

29
Q

What are xenologues?

A

Xenologues are genes that are related by horizontal gene transfer (HGT) rather than by vertical inheritance from a common ancestor.

30
Q

What are paralogues?

A

Paralogues (or paralogs) are genes within the same organism that arise by duplication of a single original gene.

31
Q

What are gametologues?

A

Homologous genes retained on both sex chromosomes (e.g. on X and Y) in regions that no longer recombine.

32
Q

What should you think of when you want to variant call SNPs?

A
  • Sequence with a high-accuracy technology to be able to find the small variations.
  • Many of the SNPs you find will in fact be artifacts, so you should filter them.
33
Q

What are the different ways of finding SNPs?

A

We can sequence with short reads and map them to a reference.

We can use a DNA array, where known SNPs are all on a chip. We then hybridize our sample to the chip to see whether the sample DNA binds to any of the SNP probes. This is fast since it does not require any reads or mapping, but it has ascertainment bias, meaning we can only find what we already know.

34
Q

When doing variant calling with mapping, we get BAM files whose records have different fields. What are they and what do they tell us?

A

The FLAG and CIGAR fields can be used to filter out low-quality reads, but they can also be useful for finding structural variations such as deletions, duplications and inversions.
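
As a rough sketch of what these two values encode: the FLAG of a SAM/BAM record is a bit field and the CIGAR string lists alignment operations. The bit meanings follow the SAM specification; the short names in the dictionary and the helper functions are chosen for this example only.

```python
import re

# FLAG bits as defined in the SAM specification (names abbreviated here).
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """List the properties set in a SAM FLAG value."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


def parse_cigar(cigar):
    """Split a CIGAR string into (length, operation) pairs."""
    return [(int(n), op) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)]


flags = decode_flag(99)
operations = parse_cigar("50M2D48M")  # a read spanning a 2 bp deletion
```

Pairs that are not flagged as proper pairs, supplementary alignments and large D or S operations in the CIGAR are the kinds of signals that can hint at structural variation.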

35
Q

What are the two different methods for filtering SNPs?

A

VQSR (Variant Quality Score Recalibration) and hard filtering.

VQSR is machine-learning based, while hard filtering applies a fixed threshold to each measure.

VQSR usually performs better, but it needs large labeled datasets for training: you need a good panel.
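
Hard filtering can be sketched as a set of per-annotation cutoffs. The annotation names below (QD, FS, MQ) are standard GATK variant annotations, but the threshold values and the dictionary layout are illustrative assumptions that should be tuned for a given dataset.

```python
# Illustrative cutoffs: ("min", x) fails values below x, ("max", x) fails values above x.
THRESHOLDS = {
    "QD": ("min", 2.0),   # quality by depth
    "FS": ("max", 60.0),  # strand bias (Fisher strand)
    "MQ": ("min", 40.0),  # mapping quality
}


def passes_hard_filter(annotations, thresholds=THRESHOLDS):
    """Return True if a variant passes every threshold it has a value for."""
    for name, (kind, limit) in thresholds.items():
        value = annotations.get(name)
        if value is None:
            continue  # missing annotation: do not fail the variant on it
        if kind == "min" and value < limit:
            return False
        if kind == "max" and value > limit:
            return False
    return True


# Hypothetical variants with their annotation values.
good_snp = {"QD": 15.2, "FS": 1.3, "MQ": 60.0}
likely_artifact = {"QD": 1.1, "FS": 75.0, "MQ": 35.0}
kept = [v for v in (good_snp, likely_artifact) if passes_hard_filter(v)]
```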

36
Q

What are the three different methods for calling structural variants?

A

Sequencing with short reads, a graph-based approach, and calling CNVs from SNP data.

The graph-based approach is highly reliable and requires no reference, but it is very expensive.

The short-read approach is affordable for most, but it relies on having a good reference.

37
Q

How could we determine the loci of the sex chromosomes that determine the sex?

A

To find the parts of the sex chromosomes that actually determine the sex phenotype, we can look for SNPs that differ between females and males. These are small variations in the sex chromosomes between the sexes.

I would sequence multiple samples from both sexes and map them back to a male reference (the heterogametic sex in an XY system). It is important to use a male reference so that reads from the male-specific chromosome have something to map to.

Then I would do variant calling to get the SNPs and look for patterns in the SNPs that differ between the sexes.