HC 4.2 Omics and Gene Expression: Transcript Level Analysis Flashcards

Lecture 4

1
Q

Transcription involves which omes?

A

genome and transcriptome

2
Q

Types of genes

A

-Protein-coding
-Non-coding

3
Q

A principal step of RNA-seq is selecting RNA molecules. Which selections are possible?

A

-Size selection
-Type selection by ribodepletion or poly(A) selection

4
Q

The details of the RNA-seq analysis depend on …

A

the experimental context and RNA molecules measured

5
Q

Data analysis workflow for gene expression

A

-Selection
-Fragmentation and reverse transcription
-Sequencing and mapping
-Quantification

6
Q

Goals mRNAseq

A

Taking complexity into account by analysing the isoforms of genes (transcripts), or working with a non-model organism with a poorly characterized genome

7
Q

4 options for mRNAseq analysis via transcriptome

A
  1. De novo assembly of transcriptome
  2. Well characterized genome: reference-based transcriptome assembly
  3. Combined reference based and de novo assembly
  4. Model organism: download the transcriptome from ENSEMBL or NCBI and use it for mapping
8
Q

Principle of assembly

A

Reconstructing long sequences from overlapping sequence fragments.
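
A minimal sketch of this principle, assuming two toy fragments that overlap exactly (no sequencing errors). The function names and the min_len parameter are made up for illustration and do not come from the lecture.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 when the best overlap is shorter than min_len.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def merge(a, b, min_len=3):
    """Join two fragments on their overlap, or return None if they do not overlap."""
    n = overlap(a, b, min_len)
    return a + b[n:] if n else None


# Toy fragments, invented for this example.
left = "ATGGCGT"
right = "GCGTACC"
merged = merge(left, right)
print(overlap(left, right), merged)
```

Comparing every fragment with every other one in this way scales poorly, which is why assemblers for millions of reads use a different data structure.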

9
Q

Big challenge with de novo assembly

A

Finding the overlaps among the millions of reads that are generated

10
Q

How are De Bruijn graphs made?

A

-Split the sequence reads into k-mers (nucleotide subsequences of length k taken from the reads)
-Order the k-mers based on their overlaps > graph with arrows (de Bruijn graph)
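
The two steps can be sketched as follows. This is a simplified illustration, assuming error-free toy reads and a graph whose nodes are the k-mers themselves, with an arrow from one k-mer to the next when they overlap by k-1 bases. The reads and function names are invented for the example.

```python
from collections import defaultdict


def kmers(read, k):
    """Return all k-mers of a read, from left to right."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def de_bruijn(reads, k):
    """Map every k-mer to the k-mers that can follow it (overlap of k-1 bases)."""
    nodes = {kmer for read in reads for kmer in kmers(read, k)}
    by_prefix = defaultdict(set)
    for kmer in nodes:
        by_prefix[kmer[:-1]].add(kmer)
    # Arrow a -> b when the last k-1 bases of a equal the first k-1 bases of b.
    return {kmer: sorted(by_prefix[kmer[1:]]) for kmer in nodes}


reads = ["ATGGCG", "GGCGTG"]  # toy reads that share the k-mer GGCG
graph = de_bruijn(reads, k=4)
for kmer in sorted(graph):
    print(kmer, "->", graph[kmer])
```

Following the arrows from ATGG to CGTG spells out the longer sequence ATGGCGTG, which neither read contains on its own.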

11
Q

Problem with de novo assembly and isoforms

A

When several k-mers overlap well enough to connect to the same previous k-mer, the graph branches and multiple candidate isoforms are assembled, so the actual transcriptome is not completely reconstructed.

12
Q

What are long assembled sequences called?

A

Contigs

13
Q

Purpose De Bruijn graph

A

Method to construct long sequences from short sequences

14
Q

What is the result of an assembly?

A

A contig

15
Q

Question with complexity of isoforms in gene expression quantitation

A

Is it an isoform or an assembly (next different piece)?

16
Q

Which regions should be searched for in contigs for more biological relevance?

A

ORFs

17
Q

How to identify ORFs (open reading frames) from contigs

A

-Identify ORFs by searching for start codons (ATG, methionine) and stop codons in each reading frame
-The longest potential ORFs are candidates for encoding a protein > more biological relevance
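
A minimal sketch of such a search, assuming a DNA contig and scanning only the three forward reading frames (a real tool would also scan the reverse complement). The function names and the toy contig are invented for the example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq):
    """Return (frame, start, end, orf) for every ATG...stop stretch in the 3 forward frames."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                orfs.append((frame, start, i + 3, seq[start:i + 3]))
                start = None  # look for the next ORF in this frame
    return orfs


def longest_orf(seq):
    """Return the longest ORF tuple, or None when the contig has no complete ORF."""
    orfs = find_orfs(seq)
    return max(orfs, key=lambda orf: len(orf[3])) if orfs else None


contig = "AAATGAAACCCTAGTT"  # toy contig with one ORF in frame 2
print(longest_orf(contig))
```

ORFs that run off the end of the contig without a stop codon are ignored in this sketch.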

18
Q

Gene prediction models

A

-Ab initio: based on gene signals such as intron splice sites, TF binding sites and codon structure
-Homology: significant matches of the query with known genes
-Probabilistic: Markov models: translate AA sequence to probable location/function

19
Q

Methods for transcriptome functional annotation

A
  1. searching for homologs based on sequence similarities and identifying assembled sequences
  2. domain and other sequence feature identification (sequence feature annotation)
  3. assigning standardized descriptions for sequence biological properties (GO terms)
20
Q

mRNAseq analysis 2. Reference-based transcriptome assembly

A

-Reads are first mapped against the reference genome with a splice-aware aligner
> connectivity or splice graph is constructed to represent all possible splicing events at a locus
> alternative paths through the graph are followed to join compatible reads together to isoforms
> biological reference
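
The path-following step can be sketched as below. This is a simplified illustration, assuming the splice graph is already built and acyclic, with exons as nodes and observed splice junctions as arrows. The exon names are hypothetical, and a real assembler would keep only the paths that are supported by the mapped reads.

```python
def enumerate_isoforms(splice_graph, start, end):
    """Return every path from the start exon to the end exon of an acyclic splice graph."""
    paths = []
    stack = [(start, [start])]
    while stack:
        exon, path = stack.pop()
        if exon == end:
            paths.append(path)
            continue
        for next_exon in splice_graph.get(exon, []):
            stack.append((next_exon, path + [next_exon]))
    return sorted(paths)


# Hypothetical locus: exon E2 is either included or skipped.
splice_graph = {"E1": ["E2", "E3"], "E2": ["E3"], "E3": []}
isoforms = enumerate_isoforms(splice_graph, "E1", "E3")
for isoform in isoforms:
    print("-".join(isoform))
```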

21
Q

mRNAseq analysis 3. Combined reference-based and de novo assembly

A

First: de novo assembly, then alignment
> are the contigs found on the reference genome?
> scaffold contigs are mapped
> unassembled reads are mapped and the scaffold contigs which were mapped are extended
Or: First alignment and then assembly
> de novo assembly of unmapped reads
> Reference-based assembly of aligned reads

22
Q

mRNAseq analysis 4. Model organisms: download the transcriptome from ENSEMBL/NCBI for mapping

A

-Download FASTA files online
-Alignment of reads to the transcripts to calculate expression levels

23
Q

After alignment of reads to downloaded transcriptome: the gene expression =

A

The isoform expression

24
Q

Disadvantage download of transcriptome

A

You cannot discover new transcripts
> you do no characterization yourself, because the community has already done it

25
Q

Why is splice-aware mapping not needed for download-based mapping?

A

The transcriptome is downloaded, which does not contain introns
> introns are removed from the isoforms

26
Q

Issues with download-based mapping

A

-Isoforms are often very similar, so many reads do not align uniquely
> calculating the right gene expression levels is difficult when reads multi-map (map to two or more transcripts) > this problem should be taken into account
-The number of reads depends on transcript length
> longer transcripts generate more reads: bias
> Correction is needed
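
A sketch of one possible length correction. The card does not name a method; transcripts per million (TPM) is used here as an assumption because it is a common choice. The transcript names, counts and lengths are invented.

```python
def tpm(counts, lengths):
    """Transcripts per million: divide counts by length, then scale so the values sum to 1e6."""
    rates = {t: counts[t] / lengths[t] for t in counts}
    total = sum(rates.values())
    if total == 0:
        return {t: 0.0 for t in counts}
    return {t: rate / total * 1e6 for t, rate in rates.items()}


# Hypothetical data: the long transcript has twice the reads and twice the length.
counts = {"tx_long": 200, "tx_short": 100}
lengths = {"tx_long": 2000, "tx_short": 1000}
values = tpm(counts, lengths)
print(values)
```

After the correction both transcripts get the same value, even though the raw read counts differ by a factor of two.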

27
Q

Approaches for multi-mapping reads

A

-Ignore the reads: remove them from quantification
-Count once per alignment: count those reads for both alignments
-Split them equally: divide multi-mapping reads and split among the transcripts
-Rescue based on uniquely mapped reads
-And more; the essence: take multi-mapping reads into account
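
The first four approaches can be compared in a small sketch. It assumes the alignments are given as a mapping from read ID to the list of transcripts the read aligns to. The strategy names, the toy data and the rescue rule (weights proportional to the uniquely mapped reads, with an equal split as fallback) are assumptions made for this illustration.

```python
from collections import defaultdict

STRATEGIES = {"ignore", "per_alignment", "split", "rescue"}


def quantify(alignments, strategy):
    """Count reads per transcript, handling multi-mapping reads with the given strategy."""
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy}")
    unique = defaultdict(float)
    multi = []
    for transcripts in alignments.values():
        if len(transcripts) == 1:
            unique[transcripts[0]] += 1.0
        elif len(transcripts) > 1:
            multi.append(transcripts)
    counts = defaultdict(float, unique)
    for transcripts in multi:
        n = len(transcripts)
        if strategy == "ignore":
            continue
        if strategy == "per_alignment":
            weights = [1.0] * n
        elif strategy == "split":
            weights = [1.0 / n] * n
        else:  # rescue: weight by uniquely mapped reads
            support = [unique.get(t, 0.0) for t in transcripts]
            total = sum(support)
            weights = [s / total for s in support] if total > 0 else [1.0 / n] * n
        for transcript, weight in zip(transcripts, weights):
            counts[transcript] += weight
    return dict(counts)


# Toy data: three unique reads on A, one on B, one read mapping to both.
alignments = {"r1": ["A"], "r2": ["A"], "r3": ["A"], "r4": ["B"], "r5": ["A", "B"]}
for name in ["ignore", "per_alignment", "split", "rescue"]:
    print(name, quantify(alignments, name))
```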

28
Q

Differential gene expression analysis

A

-Differences between control cells and treated cells
-Combine various approaches
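
A very small sketch of comparing control and treated cells, assuming normalized expression values per gene are already available. The log2 fold change and the pseudocount are assumptions chosen for illustration; the card does not prescribe a method, and a real analysis would also use replicates and a statistical test.

```python
import math


def log2_fold_change(control, treated, pseudocount=1.0):
    """Log2 ratio of treated over control per gene; the pseudocount avoids division by zero."""
    return {
        gene: math.log2((treated[gene] + pseudocount) / (control[gene] + pseudocount))
        for gene in control
    }


# Hypothetical expression values for two genes.
control = {"gene_up": 7, "gene_down": 15}
treated = {"gene_up": 15, "gene_down": 7}
changes = log2_fold_change(control, treated)
print(changes)
```

A positive value means higher expression in the treated cells, a negative value means lower expression.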

29
Q

How can mRNAseq reads be used to characterize and quantify the transcriptome?

A

Characterization, 4 options:
-De novo assembly
-Reference-based assembly
-Combined reference-based and de novo assembly
-Model organism download
Quantification by mapping reads on the transcripts