RNA Seq Flashcards
method of choice to study gene expression and identify novel RNA species
RNA seq
Since RNA seq is done using instruments that sequence DNA molecules, what step must take place first
cDNA library prep from RNA
Most common application of RNA seq
sequencing of polyadenylated RNA
oligo-dT priming based methods can exhibit
3’ bias
results in sequencing reads enriched for the 3’ portion of transcripts
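One simple way to put a number on 3’ bias is to ask what share of a transcript's read coverage falls in its 3’ half. The sketch below assumes a list of per-base coverage values ordered 5’ to 3’; the function name `three_prime_fraction` and the example values are invented for illustration and do not come from any particular tool.

```python
def three_prime_fraction(coverage):
    """Fraction of total coverage that falls in the 3' half of a transcript.

    `coverage` is a list of per-base read depths ordered 5' -> 3'
    (an assumed input format for this sketch).
    """
    total = sum(coverage)
    if total == 0:
        return 0.0
    midpoint = len(coverage) // 2
    return sum(coverage[midpoint:]) / total

# Made-up coverage profiles: one even, one skewed toward the 3' end.
even = [5, 5, 5, 5]
skewed = [1, 2, 3, 10, 20, 30]
print(three_prime_fraction(even))
print(three_prime_fraction(skewed))
```

A value near 0.5 suggests even coverage; values approaching 1 suggest reads concentrated at the 3’ end.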
Enzyme that converts RNA into DNA
Reverse transcriptase
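As a rough illustration of what reverse transcription produces, the sketch below derives a first-strand cDNA sequence from an RNA sequence by reverse-complementing it and writing T in place of U. It models only the base pairing, not priming or enzyme behavior; the function name `first_strand_cdna` and the example sequence are hypothetical.

```python
# Base pairing between an RNA template base and the DNA base added opposite it.
RNA_TO_DNA_PAIR = {"A": "T", "U": "A", "G": "C", "C": "G"}

def first_strand_cdna(rna):
    """Return the first-strand cDNA (written 5'->3') for an RNA given 5'->3'."""
    # The cDNA is antiparallel to the RNA, so read the template in reverse.
    return "".join(RNA_TO_DNA_PAIR[base] for base in reversed(rna.upper()))

print(first_strand_cdna("AUGGCCAAA"))  # TTTGGCCAT
```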
Most abundant type of RNA
ribosomal
why do we not want ribosomal RNA
usually of little interest to the study: rRNA is a structural part of the ribosomes that make proteins, it does not encode them, and it would otherwise take up most of the sequencing reads
Methods used to remove rRNA
poly A selection
rRNA depletion
preferred method to remove rRNA
poly A selection
why would you use the method of rRNA depletion
if interested in noncoding RNAs
Which of the two methods to remove rRNA is more expensive
rRNA depletion
What is the next step after removing rRNA
fragmentation to a certain size range
why does lack of strand specificity make it difficult to identify antisense and novel RNA species
without strand information a read cannot be assigned to the sense or antisense strand, so overlapping transcripts are harder to tell apart and piece together
Creating a strand-specific library for RNA
incorporate dUTP into the second strand of cDNA so that strand can be selectively removed
almost all multi-exon genes display
alternative splicing which plays a role in regulation of cellular processes
Gene fusion
can place two noncontiguous genomic regions together in a single transcript
solution to unravel the complexity of alternative splicing and gene fusion isoforms is to
sequence each transcript from beginning to end
what is important for gene regulation
what
when
where
how much
types of nucleotide changes in a coding sequence
silent: replaces the AA, but with one similar enough not to matter
synonymous: changes the sequence but not the AA
nonsynonymous: changes the AA
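The synonymous versus nonsynonymous distinction can be checked mechanically by translating the codon before and after the change. The sketch below builds the standard genetic code and compares the two amino acids; the function name `classify_substitution` and the example codons are chosen for illustration only, and it handles single-codon substitutions, not insertions or deletions.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def classify_substitution(ref_codon, alt_codon):
    """Label a codon change as synonymous or nonsynonymous."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    return "synonymous" if ref_aa == alt_aa else "nonsynonymous"

print(classify_substitution("CTT", "CTC"))  # Leu -> Leu
print(classify_substitution("GAA", "GTA"))  # Glu -> Val
```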
The power of RNA seq lies in the fact that the twin aspects of
discovery and quantification can be combined in a single high-throughput sequencing assay
Crucial prerequisite for a successful RNA seq study is that
the data generated have the ability to answer the biological questions of interest
experimental design
SE (single-end reads)
want to know how much (expression level)
PE (paired-end reads)
want to know which (for example, which isoforms are present)
how many protein-coding genes in a mammalian genome
about 20,000
total number of distinct molecules in the library, which limits how much it is useful to sequence
library complexity
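A crude way to get a feel for library complexity is to count how many distinct molecules show up among the reads sequenced so far. The sketch below assumes each read is represented by a string key (for example its sequence or mapping position); the function name `complexity_summary` and the toy reads are hypothetical, and real pipelines use more careful duplicate detection.

```python
from collections import Counter

def complexity_summary(reads):
    """Count total reads, distinct reads, and the fraction that are duplicates."""
    counts = Counter(reads)
    total = len(reads)
    distinct = len(counts)
    duplicate_fraction = (total - distinct) / total if total else 0.0
    return {
        "total_reads": total,
        "distinct_reads": distinct,
        "duplicate_fraction": duplicate_fraction,
    }

# Toy example: five reads, three distinct molecules.
toy_reads = ["ACGT", "ACGT", "TTGA", "CCAT", "ACGT"]
print(complexity_summary(toy_reads))
```

A low-complexity library reaches a high duplicate fraction quickly, so sequencing it more deeply mostly yields repeats of molecules already seen.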
two types of replicates
technical: accuracy of the technique
biological: what we are interested in, true variation
Why are three replicates the minimum
three is just enough to estimate variability and compare groups statistically
always better to have more
analyzing methods
genome mapping
transcriptome mapping
reference free assembly
why would too many short sequences cause a problem
short reads give quantity, not quality: they are harder to place uniquely or assemble
transcript expression is
how much
the level at which the gene is expressed
Normalizing to the size of the gene
important step in normalizing the data, because a longer gene yields more reads at the same expression level
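To make the gene-size normalization concrete, the sketch below computes transcripts per million (TPM), one common scheme that divides counts by gene length before scaling for sequencing depth. The cards do not say which measure is meant, so TPM is an assumption here, and the function name `tpm`, the gene names, and the numbers are all made up.

```python
def tpm(counts, lengths_bp):
    """Transcripts per million from raw read counts and gene lengths in bp."""
    # Step 1: reads per kilobase, which corrects for gene length.
    rpk = {gene: counts[gene] / (lengths_bp[gene] / 1000) for gene in counts}
    # Step 2: scale so the values sum to one million, which corrects for depth.
    scale = sum(rpk.values()) / 1_000_000
    if scale == 0:
        return {gene: 0.0 for gene in rpk}
    return {gene: value / scale for gene, value in rpk.items()}

# Two hypothetical genes with equal read counts but different lengths.
counts = {"geneA": 100, "geneB": 100}
lengths = {"geneA": 1000, "geneB": 2000}
print(tpm(counts, lengths))
```

With equal counts, the shorter gene comes out with twice the TPM of the longer one, which is the effect length normalization is meant to capture.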
in the absence of biological replication
no population inference can be made
results of RNA seq are affected by
parameter