The Woogie 1 Flashcards
Cons of Sanger sequencing?
- expensive
- error prone
- prone to bias
- Low throughput (only a small number of reads can be sequenced at one time)
Alignment techniques using reference genome
- Align reads against reference genome
- Mapping: recover the position of sequence reads in the genome
- Alignment: recover the position of each sequence read within the genome, base by base
- Alignment is based on sequence matches that locate where the read belongs
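A minimal sketch of the idea above, not how real aligners work: slide the read along the reference and keep the position with the fewest mismatches. The function names `map_read` and `best_hit` and the toy sequences are made up for illustration; real tools use indexes and also allow gaps.

```python
def map_read(reference: str, read: str) -> list[int]:
    """Return every 0-based position where the read matches the reference exactly."""
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        start = reference.find(read, start + 1)
    return positions


def best_hit(reference: str, read: str) -> tuple[int, int]:
    """Return (position, mismatches) for the placement with the fewest mismatches.

    Only substitutions are counted (no gaps); ties go to the leftmost position.
    """
    best_pos, best_mm = -1, len(read) + 1
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        mm = sum(a != b for a, b in zip(window, read))
        if mm < best_mm:
            best_pos, best_mm = pos, mm
    return best_pos, best_mm


reference = "ACGTACGTTAGC"          # toy reference, assumed for the example
print(map_read(reference, "ACGT"))  # exact matches at two positions
print(best_hit(reference, "GTCA"))  # read with one sequencing error
```

The second call shows why alignment has to tolerate mismatches: the read has no exact match, but it still has one clearly best position.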
Genome Project Challenges
1. Sequencing technologies are not perfect
2. How hard DNA is to sequence varies between samples
3. Reference genome could have multiple organisms which means bigger reads.
4. Sample sequences contain errors, mismatches, and gaps.
2nd Generation Sequencing methods
- Library prep: extract DNA from cells
- DNA fragments attach to adapters
- Anchoring to sequencer
- DNA is melted and added to the flow cell
- Flow cell is saturated with oligonucleotides complementary to the adapters
- Adapter sequences attach the fragments to the bottom of the machine
First generation: Sanger sequencing - sequencing by synthesis
- Fluorescent bases added by polymerase
- Laser identifies base type
- Fluorescent base then removed
- Terminal nucleotide can then accept another base
Genome Size Abundance
1. Complex organisms generally have larger genomes; less complex organisms generally have smaller ones.
What is in a genome?
1. Genes
2. Centromeres and telomeres
3. Binding sites, repeat regions, transposable elements
4. Epigenetic marks: methyl markers on histones
Sanger sequencing using dNTP and ddNTP
- Unwind DNA helix to single strands
- Polymerase makes new strand
- ddNTP halts polymerase action
- Sequences run on gel (electrophoresis)
- Laser shines onto ddNTP nucleotides
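A toy sketch of why the steps above give a sequence: if a ddNTP can stop the new strand at any position, sorting the stopped fragments by length (what the gel does) and reading each fragment's last base spells out the new strand. The names `terminated_fragments` and `read_gel` are made up, and the template is assumed to be written 3' to 5' so the new strand reads left to right.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def terminated_fragments(template: str) -> list[str]:
    """Return one fragment per position where a ddNTP could halt the polymerase."""
    new_strand = "".join(COMPLEMENT[base] for base in template)
    return [new_strand[:i] for i in range(1, len(new_strand) + 1)]


def read_gel(fragments: list[str]) -> str:
    """Sort fragments by length, then read the terminal (ddNTP) base of each."""
    ordered = sorted(fragments, key=len)
    return "".join(fragment[-1] for fragment in ordered)


fragments = terminated_fragments("TACGGA")  # toy template, assumed for the example
print(fragments)
print(read_gel(fragments))
```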
Pros of 2nd Gen sequencing
1. Fast
2. Cheap
3. High throughput
3rd Generation Sequencing Pros
- Fast
- Cheap
- High throughput
- Can read single strands
- Can interpret long reads
What is throughput sequencing?
The sequencer's ability to read a certain number of sequences at one time
Cons of 2nd generation sequencing
1. Prone to error (repeat regions)
2. Prone to amplification biases
3. Fragment length restricted
A genome consists of?
1. Chromosomes
2. mtDNA
3. Chloroplast DNA
4. Plasmids
3rd Generation Sequencing Cons
- Some sequencing can be expensive
- Nanopore sequencers are cheap, but the flow cells are expensive
- Prone to error (only 80% accuracy)
N50 contig read
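Contig length at which 50% of the assembly is covered by contigs of this size or larger
1. Add up all contig lengths
2. Sort contigs by decreasing length
3. Start with the largest contig
- If it covers 50% of the assembly length, its length is the N50
- If not, add the length of the second largest, and so on until 50% is covered; the N50 is the length of the last contig added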
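The N50 steps can be sketched in a few lines. This is a minimal illustration, assuming the contig lengths are already known as a list of integers; the function name `n50` and the example lengths are made up.

```python
def n50(contig_lengths: list[int]) -> int:
    """Return the N50: the length of the contig at which the running total,
    taken from largest to smallest, first reaches half the assembly length."""
    total = sum(contig_lengths)                 # step 1: add up all contig lengths
    running = 0
    for length in sorted(contig_lengths, reverse=True):  # step 2: largest first
        running += length                       # step 3: keep adding contigs
        if running >= total / 2:                # 50% of the assembly is covered
            return length
    raise ValueError("contig_lengths must not be empty")


print(n50([80, 70, 50, 40, 30, 20]))  # total 290; 80 + 70 = 150 passes 145, so 70
```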