Genomes and Genome Sequencing Flashcards
Applications of studying genomics
Research
Health (e.g. diagnostic)
Environment (e.g. pollutants)
Agriculture (e.g. livestock, nutrients)
Health example for genomics
Causes of severe intellectual disability in children (42% of cases linked to DNA compared to 12% using other methods)
Disease example for genomics
Inflammatory Bowel Disease (Crohn’s disease)
more viral DNA = more viruses
viruses were bacteriophages
they infected gut bacteria and affected gut bacteria population -> Crohn’s disease
Disease Outbreak Tracking for genomics (only need one)
Ebola - finding point of origin, watching it change over time
HIV - identified known origin, identified species crossovers
Influenza - track current outbreaks to inform vaccine choices for the coming winter in the opposite hemisphere; identify crossovers / crossover potential of strains
The third generation of DNA sequencers
Produce longer DNA reads
Sanger Sequencing
Chain termination sequencing
Uses ddNTPs (fluorescently labelled, chain-terminating dideoxynucleotides)
How does Sanger Sequencing work
Polymerase copies the template using normal nucleotides (dNTPs), then at a random position adds a fluorescently labelled ddNTP; the polymerase cannot extend past it, so the strand ends at that point
-> strands of DNA of varying lengths, each ending with a fluorescently-labelled base
(repeated as many times as required so that termination occurs at every base position along the length)
Then run the fragments through capillary gel electrophoresis to separate them by size
Record fluorescence
Each base is a diff. colour
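A minimal sketch of the chain-termination idea above, not a model of real chemistry. The function names, the `ddntp_fraction` parameter and the example template are illustrative assumptions. Each simulated strand stops at the first position where a labelled ddNTP happens to be incorporated; sorting the fragments by length and reading the terminal colours should recover the sequence.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def sanger_fragments(template, n_molecules=2000, ddntp_fraction=0.1, seed=0):
    """Simulate chain termination on many copies of one template.

    Returns a set of (fragment_length, terminal_base) pairs. The
    ddntp_fraction is an assumed chance of picking a ddNTP at each step.
    """
    rng = random.Random(seed)
    fragments = set()
    for _ in range(n_molecules):
        for i, base in enumerate(template):
            if rng.random() < ddntp_fraction:
                # Labelled ddNTP incorporated: the strand ends here.
                fragments.add((i + 1, COMPLEMENT[base]))
                break
    return fragments

def read_by_size(fragments):
    """Electrophoresis orders fragments by length; read each end label."""
    return "".join(base for _, base in sorted(fragments))

template = "TACGGATC"
print(read_by_size(sanger_fragments(template)))
```

The read is the complement of the template because the labelled bases sit on the newly synthesised strand. Strand orientation (5'/3') is ignored here to keep the sketch short.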
Downsides of Sanger Sequencing
Slow
Expensive
Not high throughput
Errors in repetitive regions (lots of bases similar to each other, next to each other)
Bias in sequencing (certain regions better amplified than others)
Library Preparation
Extract DNA from cells
Fragment DNA (50-1000bp)
Add adaptors (at either end of each fragment): one anchors the fragment, the other is the start point for the sequencing reaction
Amplification
Issues with Library Preparation
Bias in amplification
How does Illumina Sequencing work?
Fragments added to the flow cell - bind to the flow cell (adapter-flow cell)
Polymerase starts at the top (furthest from the flow cell) and adds fluorescently labelled nucleotides complementary to the template, one at a time
+laser excitation, fluorescence recorded
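The cycle-by-cycle nature of the process can be sketched as below. This is a toy illustration only: `sequence_by_synthesis` and the example templates are made-up names and data, and real instruments read colours from images of millions of clusters rather than strings.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def sequence_by_synthesis(cluster_templates, cycles):
    """Return one read per cluster, extended by one base per cycle."""
    reads = [""] * len(cluster_templates)
    for cycle in range(cycles):
        # Every cluster on the flow cell is read in the same cycle,
        # which is where the high throughput comes from.
        for k, template in enumerate(cluster_templates):
            if cycle < len(template):
                # The incorporated base pairs with the template base; its
                # fluorescent colour is what gets recorded this cycle.
                reads[k] += COMPLEMENT[template[cycle]]
    return reads

print(sequence_by_synthesis(["TACG", "GGCA", "ATAT"], cycles=3))
```

The number of cycles sets the read length, which is one way to see why Illumina reads are length-restricted.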
Benefits of Illumina Sequencing
Fast
Cheap
High throughput
Issues with Illumina Sequencing
Repetitive regions
Amplification
Length restrictions
Third generation sequencing
Removes the length restriction
Removes the need for amplification
PacBio SMRT
uses Single Molecule, Real-time Technology
Zero-mode wave-guides
One piece of DNA per well
Polymerase in the well adds fluorescently labelled nucleotides (like Illumina) to a single piece of DNA
PacBio Considerations
Higher error rates
No need for amplification
Longer, but not genome-length
Oxford Nanopore MinION
Very small
Membrane with many pores
Feeds a single length of DNA through a pore; changes in electrical current across the membrane indicate the base, and this is read
Oxford Nanopore MinION Considerations
Does not use fluorescently-labelled nucleotides
Not as accurate as Illumina (99.9%), but close (95%)
Long read (up to 2 million bp)
What is the PromethION?
48 MinIONs
large amounts of sequencing
Single-Cell Sequencing
uses Illumina
BUT with diff. lib preparation - single-cell
Each cell in a ‘GEM’ (gel bead-in-emulsion) - when the gel bead is broken open, all contents are labelled with the barcode for that indv. GEM
Can say where DNA comes from -> cell types/spatial transcriptomics
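A rough sketch of how barcodes let reads be traced back to a cell. The read layout (barcode as a fixed-length prefix), the 4-base barcode length and the example reads are assumptions for illustration, not the layout of any specific kit.

```python
from collections import defaultdict

def group_reads_by_barcode(reads, barcode_length=4):
    """Split each read into (barcode, insert) and group inserts per barcode.

    Assumes the barcode is the first `barcode_length` bases of every read.
    """
    cells = defaultdict(list)
    for read in reads:
        barcode, insert = read[:barcode_length], read[barcode_length:]
        cells[barcode].append(insert)
    return dict(cells)

reads = ["AAAAGATT", "CCCCTTAG", "AAAACCGT"]
print(group_reads_by_barcode(reads))
```

Reads sharing a barcode came from the same GEM, so they can be analysed together as one cell.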
Challenges to genome projects
Sequencing technologies not perfect (e.g. Illumina 99.9% not 100%)
Some DNA harder to seq. than others (e.g. centromere/telomere) - secondary structures
Population representation (variation)
Gaps, errors, lack of variation
Accuracy of assembly
Genomes keep being corrected (diff. versions from same individual)
Alignments
Reference genome available
Compare and align
Assembly
Does not have an available reference genome
Assemble reads into a reference genome
Is a BEST REPRESENTATION not exact
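To make the assembly idea concrete, here is a minimal greedy overlap sketch. It is one simple approach under stated assumptions (error-free reads, no repeats, no reads contained inside others), not how production assemblers work; `overlap`, `greedy_assemble` and `min_length` are illustrative names.

```python
def overlap(a, b, min_length=3):
    """Length of the longest suffix of a that equals a prefix of b."""
    for length in range(min(len(a), len(b)), min_length - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0

def greedy_assemble(reads, min_length=3):
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop duplicate reads, keep order
    while True:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    length = overlap(a, b, min_length)
                    if length > best[0]:
                        best = (length, i, j)
        length, i, j = best
        if length == 0:
            return contigs  # nothing left to merge
        merged = contigs[i] + contigs[j][length:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)]
        contigs.append(merged)

print(greedy_assemble(["ATGGCGT", "GCGTACG", "TACGTTA"]))
```

With repeats or sequencing errors the greedy choice can merge the wrong reads, which is one reason an assembly is a best representation rather than an exact copy.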
Steps in an alignment
Find an appropriate reference genome - diff. versions
Find fragment matches on reference genome
Steps of the alignment analysis
Base calling
Quality control
Alignment/Mapping
Alignment Post-Processing
Base calling
process of determining bases in the sequencing data
Quality control
Phred score
Q value
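The Phred score relates the Q value to the estimated probability that a base call is wrong: Q = -10 * log10(P). The sketch below shows the conversion in both directions, plus decoding of a FASTQ quality string assuming the common Phred+33 ASCII encoding; the function names are illustrative.

```python
import math

def phred_from_error(p_error):
    """Q = -10 * log10(P), where P is the probability the base is wrong."""
    return -10 * math.log10(p_error)

def error_from_phred(q):
    """Invert the Phred formula: P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)

def decode_fastq_quality(quality_string, offset=33):
    """Convert FASTQ quality characters to Q values (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_string]

print(phred_from_error(0.001))      # error rate of 1 in 1000
print(error_from_phred(20))         # Q20
print(decode_fastq_quality("I5!"))
```

So Q30 corresponds to a 1 in 1000 chance of error (99.9% accuracy), and Q20 to 1 in 100 (99%).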
Mapping vs. Alignment
Mapping = position of the sequence on the reference genome
Alignment = position of the sequence on the reference genome and base-to-base correspondence (whether matches or not)
Alignment
position of the sequence on the reference genome and base-to-base correspondence (whether matches or not)
Options for Alignment Post-Processing
Variant calling
Methylation studies
RNA seq. expression
Structural variants
Mapping
position of the sequence on the reference genome
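The difference between mapping and alignment can be sketched as follows. This is a simplified illustration that only handles mismatches (no insertions or deletions); `map_read`, `align_read` and the example sequences are made up for the purpose.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def map_read(reference, read):
    """Mapping: return only the 0-based position with the fewest mismatches."""
    positions = range(len(reference) - len(read) + 1)
    return min(positions,
               key=lambda p: hamming(reference[p:p + len(read)], read))

def align_read(reference, read):
    """Alignment: position plus base-to-base match ('|') or mismatch ('x')."""
    pos = map_read(reference, read)
    window = reference[pos:pos + len(read)]
    track = "".join("|" if r == q else "x" for r, q in zip(window, read))
    return pos, window, track

reference = "GATTACAGATTTCA"
read = "ACAGTTT"
print(map_read(reference, read))
pos, window, track = align_read(reference, read)
print(window)
print(track)
print(read)
```

Mapping answers "where does it sit?", while alignment also reports which individual bases agree, which is what variant calling needs.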
Ways to align fragment sequence to a reference genome
Brute Force Method - by eye, slide the fragment along the reference one base at a time until it matches
Alignment Software
What is the “Brute Force” method?
by eye, slide the fragment along the reference one base at a time until it matches
Considerations with the “Brute Force” method
Easy to do
Very slow
Requires a lot of repetitive computations - inefficient
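The brute force idea translates directly into a sliding comparison. The sketch below (exact matches only, with illustrative names) also counts base comparisons to show where the inefficiency comes from.

```python
def brute_force_search(reference, read):
    """Slide the read along the reference one base at a time.

    Returns (hits, comparisons): every 0-based start where all bases
    match, and the total number of base-to-base comparisons made.
    """
    hits, comparisons = [], 0
    for start in range(len(reference) - len(read) + 1):
        for offset, base in enumerate(read):
            comparisons += 1
            if reference[start + offset] != base:
                break  # mismatch: move on to the next start position
        else:
            hits.append(start)  # no mismatch found at this position
    return hits, comparisons

hits, comparisons = brute_force_search("GATTACAGATTACA", "TTAC")
print(hits, comparisons)
```

In the worst case the work grows with (reference length x read length) for every read, which is impractical for millions of reads against a genome-sized reference.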
Alignment Software Types
RNA/DNA/bisulphite sequencing
Alignment Software Algorithms
Burrows Wheeler Transform
Suffix Arrays
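A small sketch of both ideas, using naive construction that is fine for short strings but far too slow for a genome. Function names are illustrative; real aligners that use the Burrows-Wheeler transform (e.g. BWA, Bowtie) rely on compressed indexes instead of the plain sort shown here.

```python
def suffix_array(text):
    """Start positions of all suffixes of text, in lexicographic order."""
    return sorted(range(len(text)), key=lambda i: text[i:])

def bwt(text):
    """Burrows-Wheeler transform via the suffix array; '$' marks the end."""
    text += "$"
    return "".join(text[i - 1] for i in suffix_array(text))

def find_with_suffix_array(text, pattern):
    """Binary-search the suffix array for every occurrence of pattern."""
    sa = suffix_array(text)
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose m-base prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:  # first suffix whose m-base prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])

print(suffix_array("GATTACA$"))
print(bwt("GATTACA"))
print(find_with_suffix_array("GATTACAGATTACA", "TTAC"))
```

Because the suffixes are sorted, all occurrences of a read sit next to each other in the array, so a lookup takes a binary search rather than a scan of the whole reference.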