Lecture 5 - Computational analysis Flashcards
It is now simple to measure the expression levels of thousands of genes
simultaneously
Methods such as RNA-seq allow for measurement of
transcriptome-wide expression levels without a reference genome
RNA-seq is useful for
high-throughput sequencing of RNA
RNA-seq allows for quantification of
gene expression and differential expression analyses
RNA-seq allows for characterization of
alternative splicing
de novo means
from the beginning
de novo transcriptome assembly allows for
quantification and exploration of non-model ("boutique") organisms (no genome sequence necessary)
RNA-seq steps
- Extraction of mRNA
- Fragmentation and reverse transcription to cDNA
- PCR amplification
- Sequencing (single or paired end)
Poly A selection is a method of
isolating poly(A)+ transcripts, usually by oligo-dT affinity
Ribodepletion depletes
ribosomal RNAs using sequence specific biotin-labeled probes
Reads
the sequenced portion of cDNA fragments
Coverage
(read length x number of reads) / haploid genome length
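As a rough sketch, coverage is commonly estimated as read length times number of reads, divided by haploid genome length. The function name and the example numbers below are illustrative assumptions, not values from the lecture.

```python
def coverage(read_length: int, num_reads: int, genome_length: int) -> float:
    """Expected average depth: (read length * number of reads) / haploid genome length."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return read_length * num_reads / genome_length


# Hypothetical run: 3 million reads of 100 bp over a 10 Mb haploid genome.
depth = coverage(read_length=100, num_reads=3_000_000, genome_length=10_000_000)
print(depth)  # 30.0
```

For paired-end data, each mate counts as one read in this estimate.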
Single-end
cDNA fragments are sequenced from only one end (1x100)
Paired-end
cDNA fragments are sequenced from both ends (2x100)
Strand-specific
You know whether the read originated from the + or - strand
Counts =
(Xi) the number of reads that align to a particular feature i (gene, isoform, miRNA, etc.)
Library size =
(N) number of reads sequenced
FPKM =
Fragments per kilobase of exon per million mapped reads
CPM =
Counts per million mapped reads
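The two normalizations above can be sketched from the card definitions, using Xi for the count of feature i and N for the library size. Taking N as the number of mapped reads and the feature length as total exon length in base pairs are assumptions made here; the function names are illustrative.

```python
def cpm(count: float, library_size: int) -> float:
    """Counts per million: Xi scaled to a library of one million reads."""
    return count * 1e6 / library_size


def fpkm(count: float, feature_length_bp: int, library_size: int) -> float:
    """Fragments per kilobase of exon per million mapped reads.

    Equivalent to Xi / ((length / 1e3) * (N / 1e6)).
    """
    return count * 1e9 / (feature_length_bp * library_size)


# Hypothetical gene: 500 fragments, 2,000 bp of exon, 20 million mapped reads.
print(cpm(500, 20_000_000))         # 25.0
print(fpkm(500, 2000, 20_000_000))  # 12.5
```

CPM corrects only for sequencing depth, while FPKM also corrects for feature length, so FPKM is the one to use when comparing different genes within a sample.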
FDR =
False discovery rate (the expected proportion of false positives, i.e. Type I errors, among the results called significant)
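One common way to control the FDR is the Benjamini-Hochberg procedure. The flashcards do not name a specific method, so treat the choice below as an assumption; this is a minimal sketch that returns adjusted p-values in the original order.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for four genes.
q_values = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
significant = [q <= 0.05 for q in q_values]
print(q_values, significant)
```

Calling genes significant at an adjusted p-value of 0.05 or less means that roughly 5% of the called genes are expected to be false positives.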
FASTA files are
text files with sequences (amino acids or nucleotides)
FASTQ files are
text files containing header, sequence, and quality information
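To make the three pieces of a FASTQ record concrete, here is a minimal reader sketch. It assumes the common four-line record layout and Phred+33 quality encoding; the record class, function name, and example read are hypothetical.

```python
import io
from typing import Iterator, List, NamedTuple


class FastqRecord(NamedTuple):
    header: str
    sequence: str
    quality: List[int]  # Phred scores, one per base


def parse_fastq(handle, offset: int = 33) -> Iterator[FastqRecord]:
    """Yield records from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of file
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # the '+' separator line is ignored
        quality = handle.readline().rstrip("\n")
        # Drop the leading '@' and decode each quality character.
        yield FastqRecord(header[1:], sequence, [ord(c) - offset for c in quality])


# Hypothetical single-read file held in memory.
example = io.StringIO("@read1\nACGT\n+\nII5!\n")
for record in parse_fastq(example):
    print(record.header, record.sequence, record.quality)
```

A FASTA record carries only the header and sequence, so the same loop without the separator and quality lines would cover single-line FASTA entries.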
A SAM file is a
tab-delimited text file that contains sequence alignment information
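As an illustration of the tab-delimited layout, the sketch below pulls the first six mandatory columns (QNAME, FLAG, RNAME, POS, MAPQ, CIGAR) out of one alignment line and decodes two FLAG bits. The function name and the example line are made up for illustration; real pipelines typically rely on a dedicated library rather than hand parsing.

```python
def parse_sam_line(line: str) -> dict:
    """Parse the leading mandatory fields of a single SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    qname, flag, rname, pos, mapq, cigar = fields[:6]
    flag = int(flag)
    return {
        "qname": qname,
        "flag": flag,
        "rname": rname,
        "pos": int(pos),    # 1-based leftmost mapping position
        "mapq": int(mapq),
        "cigar": cigar,
        "is_unmapped": bool(flag & 0x4),   # FLAG bit 0x4: read unmapped
        "is_reverse": bool(flag & 0x10),   # FLAG bit 0x10: reverse strand
    }


# Hypothetical alignment: read1 maps to the reverse strand of chr1 at position 100.
line = "read1\t16\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII"
print(parse_sam_line(line))
```

Header lines in a SAM file start with "@" and would need to be skipped before calling this function.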
BAM files are
the compressed binary version of SAM files (they’re smaller and can be indexed)
Compared to single-end RNA-seq, paired end gives
better alignment
Paired end RNA-seq is essential for
splicing analyses and de novo assemblies
Biological replicates are ______ while technical replicates are ______
necessary; not necessary
Longer reads =
better alignments
Implicit internal standards =
housekeeping genes
Explicit external standards =
spike in RNA
Technical replicates control for
variation in your procedure
Biological replicates control for
variation such as growth or environmental effects
Most gene expression experiments assume
- Most genes don’t change
- Only a few genes have significant changes in expression
RNA and protein expression profiles _______ correlate well
do not always
Sequence alignment is a way of
arranging sequences of DNA, RNA, or protein to identify regions of similarity
Two types of sequence alignment
- local
- global
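The difference between the two types can be sketched with one dynamic-programming routine: global alignment scores both sequences end to end, while local alignment scores only the best-matching region. The classic algorithms for these are Needleman-Wunsch (global) and Smith-Waterman (local), which the flashcards do not name; the scoring values (match +1, mismatch -1, linear gap -1) are illustrative assumptions.

```python
def alignment_score(a: str, b: str, local: bool = False,
                    match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Best alignment score of a and b under a simple linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # Global: leading gaps are penalized along the first row and column.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if local:
                cell = max(cell, 0)  # Local: an alignment may restart anywhere.
            score[i][j] = cell
            best = max(best, cell)
    # Global reads the bottom-right cell; local takes the best cell anywhere.
    return best if local else score[-1][-1]


print(alignment_score("AAACGT", "CGT"))              # 0
print(alignment_score("AAACGT", "CGT", local=True))  # 3
```

In the example, the global score pays for three gaps against the unmatched "AAA", while the local score counts only the shared "CGT".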
NGS read alignment allows us to
determine where sequence fragments (reads) came from
Differential expression analysis is
the assessment of differences in read counts of genes between two or more experimental conditions
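A very reduced sketch of the comparison: normalize each condition's read count to counts per million, then take the log2 ratio. This shows only the effect size; a real analysis would also use replicates and a statistical test, which are omitted here. The function name, pseudocount, and example counts are assumptions.

```python
import math


def log2_fold_change(count_a: float, lib_a: int, count_b: float, lib_b: int,
                     pseudocount: float = 1.0) -> float:
    """log2 ratio of CPM in condition B over condition A.

    The pseudocount avoids division by zero for unexpressed genes.
    """
    cpm_a = count_a * 1e6 / lib_a
    cpm_b = count_b * 1e6 / lib_b
    return math.log2((cpm_b + pseudocount) / (cpm_a + pseudocount))


# Hypothetical gene: 100 reads in control, 400 in treatment, 10 million reads each.
print(log2_fold_change(100, 10_000_000, 400, 10_000_000))
```

Positive values mean higher expression in condition B, negative values mean lower, and 0 means no change after depth normalization.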
Gene Ontology (GO) Consortium seeks to
provide consistent descriptions of gene products across databases
The GO comprises three structured ontologies that describe gene products in terms of their associated
- Biological processes
- Cellular components
- Molecular functions
Most commonly used databases for data deposition
- Gene Expression Omnibus (GEO)
- Sequence Read Archive (SRA)
- Database of Genotypes and Phenotypes (dbGaP)