Bioinformatics Lecture 5 Flashcards
difference between genome and transcriptome
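genome is (largely) static: essentially the same DNA in every cell of an organism
transcriptome changes: the set of RNA transcripts depends on cell type, condition and time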
Sanger sequencing
first-generation sequencing (Nobel Prize 1980), chain-termination method with fluorescent dyes (one colour per base), makes multiple copies of the template, produces relatively long reads (up to about 1000 bp)
NGS
massively parallel sequencing
next generation sequencing
produces short reads (typically a few hundred bp or less)
polonies (polymerase colonies), adapters and primers
NGS steps
- fragmentation of the DNA
- adapters are ligated to the fragments, so the same primer works for all fragments
- PCR creates clusters of identical copies of each fragment = polonies
- massively parallel sequencing by synthesis on the array
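A toy model of the fragmentation step, for illustration only: random fragments of one fixed length are cut out of a known sequence. It is a sketch under simplifying assumptions (no adapters, no PCR, no sequencing errors); the function name `shotgun_reads` and the example genome are hypothetical.

```python
import random

def shotgun_reads(genome: str, read_len: int, n_reads: int, seed: int = 0) -> list[str]:
    """Cut n_reads random fragments of length read_len out of genome (toy fragmentation)."""
    if read_len > len(genome):
        raise ValueError("read_len must not exceed the genome length")
    rng = random.Random(seed)  # seeded so the example is reproducible
    reads = []
    for _ in range(n_reads):
        # randint is inclusive on both ends, so the last valid start is len - read_len
        start = rng.randint(0, len(genome) - read_len)
        reads.append(genome[start:start + read_len])
    return reads

genome = "ATGGCGTGCAATGCCGTA"  # made-up sequence
reads = shotgun_reads(genome, read_len=6, n_reads=5)
print(reads)
```

Because start positions are random, some stretches are sampled several times and others not at all, which is the root of the coverage problem listed further down.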
de novo assembly
when there is no reference genome
using De Bruijn graphs
reads are split into k-mers (overlapping substrings of length k)
outputs longer linear stretches (contigs)
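Splitting a read into its overlapping k-mers is the first step before any graph is built. A minimal sketch; `kmers` is a hypothetical helper name:

```python
def kmers(read: str, k: int) -> list[str]:
    """Return all overlapping substrings of length k, in reading order."""
    # a read of length n has n - k + 1 k-mers (none if k > n)
    return [read[i:i + k] for i in range(len(read) - k + 1)]

print(kmers("ACGTAC", 3))  # → ['ACG', 'CGT', 'GTA', 'TAC']
```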
contigs
long linear stretches
from de novo assembly
shorter than a chromosome
you don’t know how they fit together
De Bruijn graphs
show how the k-mers of the reads are connected to each other
each k-mer is a node; an edge joins two k-mers if they follow each other in a read (they overlap by k-1 bases)
if a k-mer is not in the graph yet, you draw a new node
edges are directed: they follow the reading direction
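A minimal sketch of the construction described on this card, assuming nodes are k-mers joined by a directed edge when they are consecutive in a read. (Many textbooks instead use (k-1)-mers as nodes and k-mers as edges; the idea is the same.) `de_bruijn` and `walk_contig` are hypothetical names, and `walk_contig` is a simplified contig walk that stops only at forks, dead ends or cycles, not where two paths merge.

```python
def de_bruijn(reads: list[str], k: int) -> dict[str, set[str]]:
    """Map each k-mer (node) to the set of k-mers that follow it in some read."""
    graph: dict[str, set[str]] = {}
    for read in reads:
        kms = [read[i:i + k] for i in range(len(read) - k + 1)]
        for km in kms:
            graph.setdefault(km, set())  # unseen k-mer -> new node
        for a, b in zip(kms, kms[1:]):
            graph[a].add(b)              # directed edge in reading direction
    return graph

def walk_contig(graph: dict[str, set[str]], start: str) -> str:
    """Follow unique out-edges from start, adding the last base of each new k-mer."""
    contig, node, seen = start, start, {start}
    while len(graph[node]) == 1:
        (nxt,) = graph[node]
        if nxt in seen:                  # stop instead of looping forever on a cycle
            break
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    return contig

g = de_bruijn(["ACGTT", "GTTCA"], 3)
for node in sorted(g):
    print(node, "->", sorted(g[node]))
print(walk_contig(g, "ACG"))  # → ACGTTCA
```

The two reads share the k-mer `GTT`, so they merge into one path, and walking that path yields a contig longer than either read.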
bubbles in De Bruijn graphs
a path splits into two branches that rejoin later
cause: maybe a technical (sequencing) error, maybe biological variation (e.g. a SNP)
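A sketch of how one mismatch between two reads from the same locus opens a bubble. The reads and the helper names `branch_points` and `merge_points` are hypothetical; the graph is built as on the De Bruijn card (k-mers as nodes). Note that the graph shape alone does not tell whether the differing base is a sequencing error or a real variant.

```python
def de_bruijn(reads: list[str], k: int) -> dict[str, set[str]]:
    """k-mers as nodes, directed edges between consecutive k-mers of a read."""
    graph: dict[str, set[str]] = {}
    for read in reads:
        kms = [read[i:i + k] for i in range(len(read) - k + 1)]
        for km in kms:
            graph.setdefault(km, set())
        for a, b in zip(kms, kms[1:]):
            graph[a].add(b)
    return graph

def branch_points(graph: dict[str, set[str]]) -> list[str]:
    """Nodes with more than one outgoing edge: where a bubble opens."""
    return sorted(n for n, succ in graph.items() if len(succ) > 1)

def merge_points(graph: dict[str, set[str]]) -> list[str]:
    """Nodes with more than one incoming edge: where a bubble closes."""
    indegree: dict[str, int] = {}
    for succ in graph.values():
        for n in succ:
            indegree[n] = indegree.get(n, 0) + 1
    return sorted(n for n, d in indegree.items() if d > 1)

reads = ["AACGTTA", "AACCTTA"]  # same locus, one base differs (G vs C)
g = de_bruijn(reads, 3)
print(branch_points(g), merge_points(g))  # → ['AAC'] ['TTA']
```

With k = 3, the single differing base changes three consecutive k-mers, so each branch of the bubble is three nodes long.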
problems with NGS
about 1% error rate
gaps in coverage, because fragments are sampled randomly
repetitive sequences: reads shorter than the repeat cannot show where it starts and stops
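Back-of-the-envelope numbers for the first two problems, assuming independent per-base errors and reads placed uniformly at random (the Lander-Waterman model, in which the number of reads covering a base is approximately Poisson distributed). The 150-bp read length is an assumed example value, not from the lecture, and the function names are hypothetical.

```python
import math

def expected_errors(read_len: int, error_rate: float) -> float:
    """Expected number of wrong bases in one read, assuming independent errors."""
    return read_len * error_rate

def p_error_free(read_len: int, error_rate: float) -> float:
    """Probability that a read contains no error at all."""
    return (1 - error_rate) ** read_len

def fraction_uncovered(mean_depth: float) -> float:
    """Poisson approximation: expected fraction of bases covered by zero reads, e^(-c)."""
    return math.exp(-mean_depth)

print(expected_errors(150, 0.01))         # → 1.5
print(round(p_error_free(150, 0.01), 2))  # → 0.22
for c in (1, 5, 10, 30):
    print(f"{c:>2}x depth: {fraction_uncovered(c):.1e} of bases uncovered")
```

Even at 1% per base, most 150-bp reads contain at least one error, and higher depth shrinks the uncovered fraction exponentially but never to exactly zero.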
sequencing depth
average number of reads per base
over the entire genome
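Sequencing depth can be estimated as depth = N × L / G for N reads of length L on a genome of size G, i.e. total sequenced bases divided by genome size. A minimal sketch; `mean_depth` and the run size are hypothetical examples.

```python
def mean_depth(n_reads: int, read_len: int, genome_size: int) -> float:
    """Average number of reads per base: total sequenced bases / genome size."""
    return n_reads * read_len / genome_size

# hypothetical run: 600 million reads of 150 bp on a ~3.1 Gb genome
print(round(mean_depth(600_000_000, 150, 3_100_000_000), 1))  # → 29.0
```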
coverage
number of reads covering a base
over a specific region
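Coverage, in contrast, is counted position by position. A sketch assuming reads of one fixed length that are already aligned to known 0-based start positions (made-up values); `per_base_coverage` is a hypothetical name. Positions with coverage 0 are exactly the gaps mentioned under problems with NGS.

```python
def per_base_coverage(read_starts: list[int], read_len: int,
                      region_start: int, region_end: int) -> list[int]:
    """Count reads overlapping each position in [region_start, region_end), 0-based."""
    cov = [0] * (region_end - region_start)
    for s in read_starts:
        # clip each read to the region before counting
        for pos in range(max(s, region_start), min(s + read_len, region_end)):
            cov[pos - region_start] += 1
    return cov

starts = [0, 2, 3, 10]  # hypothetical alignment start positions
print(per_base_coverage(starts, read_len=5, region_start=0, region_end=12))
# → [1, 1, 2, 3, 3, 2, 2, 1, 0, 0, 1, 1]
```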
novoBreak pipeline
identifies breakpoints of structural variants
e.g. in healthy vs. tumor DNA
you are interested in reads that are not found in the healthy DNA
the normal (healthy) reads are simply aligned to the reference
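The core idea on this card (keep what is absent from the healthy sample) can be illustrated with a k-mer set difference. This is a simplified sketch, not the actual novoBreak implementation; the reads and function names are hypothetical, and on real data sequencing errors would also produce spurious "tumor-only" k-mers that need filtering.

```python
def kmer_set(reads: list[str], k: int) -> set[str]:
    """All k-mers occurring in any of the reads."""
    return {r[i:i + k] for r in reads for i in range(len(r) - k + 1)}

def tumor_specific_kmers(tumor_reads: list[str], normal_reads: list[str], k: int) -> set[str]:
    """k-mers seen in tumor reads but never in normal reads."""
    return kmer_set(tumor_reads, k) - kmer_set(normal_reads, k)

def reads_with_novel_kmers(reads: list[str], novel: set[str], k: int) -> list[str]:
    """Tumor reads containing at least one tumor-specific k-mer (breakpoint candidates)."""
    return [r for r in reads
            if any(r[i:i + k] in novel for i in range(len(r) - k + 1))]

normal = ["GATTACA", "TTGCCAT"]
tumor = ["GATTACA", "TTGCCAT", "TACATTG"]  # last read spans a hypothetical junction
novel = tumor_specific_kmers(tumor, normal, 4)
print(sorted(novel))                              # → ['ACAT', 'ATTG', 'CATT']
print(reads_with_novel_kmers(tumor, novel, 4))    # → ['TACATTG']
```

Only the read spanning the junction (end of one region joined to the start of another) carries k-mers never seen in the normal sample.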
applications of sequencing
RNA-seq, metagenomics, exome-seq, ChIP-seq, whole-genome sequencing, amplicon-seq, structural variation seq
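ChIP-seq
finding regulatory regions (e.g. transcription factor binding sites)
only DNA bound by a specific protein is sequenced (pulled down with an antibody: chromatin immunoprecipitation)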
structural variation seq
to find rearrangements
e.g. in tumor DNA