Midterm 1 - Notes 5 (Part 3) Flashcards
What type of sequencing has a higher error rate than SMRT?
Nanopore sequencing
Why does nanopore sequencing occur very fast?
Because no DNA synthesis (polymerase) step is involved; the strand is read directly as it passes through the pore
What is the sample prep for nanopore sequencing? (3)
- Fragment DNA
- continuous reads
- Add leader to one side
- directs to motor protein at pore
- Add hairpin adaptor to other side of structure
- one strand goes through the pore; since the hairpin attaches it to the complementary strand, that strand goes through the pore next
What does nanopore sequencing allow?
Sequencing the same fragment twice
Does nanopore sequencing have a high error rate?
Yes
- 100x higher than Illumina
What are 4 benefits to nanopore sequencing?
- Very long reads
- longer than SMRT
- Very high throughput
- Very fast
- Small instrument footprint
Contig
Refers to overlapping sequence data (reads)
- in top-down sequencing projects, refers to the overlapping clones that form a physical map of the genome, which is used to guide sequencing and assembly
Contig assembly: classical alignment programs (4)
- Start with 2 sequences
- Start off with a sliding window where the strands are identical
- Then you move out and look for differences between the 2 sequences
- Need to allow a few mismatches, because two reads can still come from the same sequence despite a single mutation
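The four steps above can be sketched in Python. This is only a minimal illustration, not how any particular alignment program works: the function name `seed_and_extend` and the parameters `seed_len` and `max_mismatches` are assumptions made for the example, and the extension only moves to the right of the identical window to keep it short.

```python
def seed_and_extend(a, b, seed_len=4, max_mismatches=1):
    """Find an identical window (seed) shared by a and b, then extend it
    to the right while tolerating up to max_mismatches differences.

    Returns (start_in_a, start_in_b, aligned_length, mismatches) or None.
    All names here are hypothetical, chosen for illustration.
    """
    for i in range(len(a) - seed_len + 1):
        seed = a[i:i + seed_len]
        j = b.find(seed)  # sliding window: look for an identical stretch
        if j == -1:
            continue
        length, mismatches = seed_len, 0
        # Move outward from the seed and count differences.
        while i + length < len(a) and j + length < len(b):
            if a[i + length] != b[j + length]:
                if mismatches == max_mismatches:
                    break  # too many differences: stop extending
                mismatches += 1
            length += 1
        return i, j, length, mismatches
    return None


if __name__ == "__main__":
    # Two reads that differ by a single base.
    print(seed_and_extend("ACGTACGTTT", "ACGTACCTTT"))
```

With `max_mismatches=0` the extension stops at the first difference, which shows why a few mismatches have to be allowed for the two reads to be joined.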
What is done in contig assembly instead of taking a whole sequence and looking at the individual bases? (2)
- We break down a single read into even smaller pieces (k-mers)
- Then you start looking for identical overlap
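A small Python sketch of the two points above: split each read into k-mers, then look for k-mers that are identical between reads. The helper names `kmers` and `shared_kmers` and the example reads are assumptions for illustration; real assemblers store k-mers in a graph rather than comparing reads pairwise.

```python
def kmers(read, k):
    """Return every substring of length k in the read, in order."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def shared_kmers(read_a, read_b, k):
    """Return the k-mers found in both reads (exact matches only)."""
    return set(kmers(read_a, k)) & set(kmers(read_b, k))


if __name__ == "__main__":
    print(kmers("AACGT", 3))
    # The end of the first read overlaps the start of the second.
    print(shared_kmers("AACGT", "CGTTG", 3))
```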
How does k-mer-based contig assembly differ from classical alignment?
You are only looking for identical sequences
What does it mean when k-mers from different reads follow the same path?
They are overlapping
What are 2 common reasons paths may diverge (polymorphism)?
- Single nucleotide differences cause “bubbles” of length k in the graph
- Introns or deletions introduce a shorter path in the graph
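To see why a single nucleotide difference gives a bubble of length k, the sketch below counts the k-mers that differ between two sequences. The function name `divergent_kmers` and the two example sequences are made up for illustration, and it assumes the changed base sits at least k - 1 positions from either end with no repeated k-mers.

```python
def kmer_set(seq, k):
    """Return the set of all k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def divergent_kmers(seq_a, seq_b, k):
    """Return (k-mers only in seq_a, k-mers only in seq_b)."""
    a, b = kmer_set(seq_a, k), kmer_set(seq_b, k)
    return a - b, b - a


if __name__ == "__main__":
    ref = "ACGTTGCATGGA"
    alt = "ACGTTGAATGGA"  # one base differs (C -> A at index 6)
    only_ref, only_alt = divergent_kmers(ref, alt, 3)
    # Each branch of the bubble holds k = 3 k-mers.
    print(sorted(only_ref), sorted(only_alt))
```

Every k-mer that covers the changed base differs between the two sequences, and there are k such k-mers, so the two paths separate for k steps before they rejoin.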
What are 4 advantages for contig assembly?
- No reference genome needed
- rely on the data you already have
- Identifies novel sequences not present in the reference
- Can identify genomic DNA/ transcripts from exogenous sources (not from our organisms)
- bacteria, viruses, etc.
- have to be careful with contamination
- For RNA sequences, long introns are not a problem
What are 2 disadvantages for contig assembly?
- Computationally intensive
- needs up to a terabyte of memory
- Creates smaller contigs: many gaps in assembly
- need much higher sequence depth
- for genome sequence: needs complementation with longer reads
- for RNA sequence: many split transcripts
Promoter
A site on DNA to which the enzyme RNA polymerase can bind to initiate the transcription of DNA into RNA
Transcriptional start site
Is the location where transcription starts at the 5’ end of a gene sequence
Exons
A segment of a DNA or RNA molecule containing information coding for a protein or peptide sequence
Intron
A segment of a DNA or RNA molecule that does not code for a protein or peptide sequence and interrupts the sequence of a gene
Open reading frame
Is the part of the reading frame that has the ability to be translated
- is a continuous stretch of codons that contains a start codon and a stop codon
What is the sequence for a start codon?
AUG
What are 3 possible sequences for a stop codon?
- UAA
- UAG
- UGA
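The open reading frame definition and the start and stop codons above can be combined in a short Python sketch. The function name `find_orf` and the example mRNA string are assumptions for illustration; it scans one strand only and returns the first stretch that begins at AUG and reaches an in-frame stop codon.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def find_orf(mrna):
    """Return the first ORF (start codon through stop codon), or None."""
    start = mrna.find("AUG")
    while start != -1:
        # Read codon by codon, in frame with the start codon.
        for i in range(start + 3, len(mrna) - 2, 3):
            if mrna[i:i + 3] in STOP_CODONS:
                return mrna[start:i + 3]
        # No in-frame stop codon: try the next AUG.
        start = mrna.find("AUG", start + 1)
    return None


if __name__ == "__main__":
    print(find_orf("GGAUGGCCUAAGG"))
```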
Untranslated region
The 5’ untranslated region is the region of an mRNA that is directly upstream from the initiation codon (the 3’ untranslated region lies downstream of the stop codon)
- this region is important for the regulation of a transcript by different mechanisms
Poly-adenylation signals
Sequence signals that direct polyadenylation, the addition of a poly(A) tail to an mRNA
- getting ready for translation
Transcription termination signal
Is a section of nucleic acid sequence that marks the end of a gene or operon in genomic DNA during transcription