Midterm 1 - Notes 5 (Part 3) Flashcards by Lindsay Moulaison

What type of sequencing has a higher error rate than SMRT?

Nanopore sequencing

Why does nanopore sequencing occur very fast?

Occurs very fast because there is no DNA synthesis step (no polymerase copying the sample DNA); the strand is read directly as it passes through the pore

What is the sample prep for nanopore sequencing? (3)

Fragment DNA
- continuous reads
Add leader to one side
- directs to motor protein at pore
Add a hairpin adaptor to the other end
- the first strand goes through the pore, and since the hairpin links it to the complementary strand, that strand goes through the pore next

What does the hairpin adaptor in nanopore sequencing allow?

Sequencing the same fragment twice (once on each strand)
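
Because the hairpin links the two strands, the pore reads the template strand and then its complement, so each position is observed twice. A minimal sketch of how the two reads could be combined, assuming they are complete, equal in length, free of insertions/deletions and aligned position by position; `reverse_complement` and `two_pass_consensus` are hypothetical helper names, and real basecallers use far more elaborate models:

```python
# Maps each base to its Watson-Crick complement
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def two_pass_consensus(template_read: str, complement_read: str) -> str:
    """Combine the two strand reads of one hairpin-linked fragment.

    Assumes both reads are complete, equal length and free of indels.
    """
    # The complement strand is read in the opposite orientation; flip it back
    second_pass = reverse_complement(complement_read)
    if len(second_pass) != len(template_read):
        raise ValueError("sketch assumes equal-length, indel-free reads")
    # Keep bases where both passes agree; mark disagreements with N
    return "".join(a if a == b else "N" for a, b in zip(template_read, second_pass))


# Fragment AACGTTTG; the template read has an error at position 4 (T read as A)
print(two_pass_consensus("AACGATTG", "CAAACGTT"))  # AACGNTTG
```

With only two observations there is no majority vote, so a disagreement can only be flagged (`N`), not resolved.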

Does nanopore sequencing have a high error rate?

Yes
- 100x higher than Illumina

What are 4 benefits of nanopore sequencing?

Very long reads
- longer than SMRT
Very high throughput
Very fast
Small instrument footprint

Contig

Refers to a continuous sequence assembled from overlapping sequence data (reads)
- in top-down sequencing projects, refers to the overlapping clones that form a physical map of the genome, which is used to guide sequencing and assembly

How do classical alignment programs assemble contigs? (4)

Start with 2 sequences
Start with a sliding window where the two strands are identical
Then move outward and look for differences between the 2 sequences
Allow a few mismatches, because the sequences may still match beyond a single mutation
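
The four steps above can be sketched as a seed-and-extend search: find an identical window (the seed), then extend outward in both directions while tolerating a small number of mismatches. This is a simplified illustration, not the algorithm of any specific program; `find_seed`, `extend`, `k` and `max_mismatches` are hypothetical names, and gaps (insertions/deletions) are not handled:

```python
def find_seed(a: str, b: str, k: int):
    """Return (i, j) for the first length-k window of `a` that also occurs in `b`."""
    first_pos = {}
    for j in range(len(b) - k + 1):
        first_pos.setdefault(b[j:j + k], j)
    for i in range(len(a) - k + 1):
        j = first_pos.get(a[i:i + k])
        if j is not None:
            return i, j
    return None


def extend(a: str, b: str, i: int, j: int, k: int, max_mismatches: int):
    """Extend an identical seed a[i:i+k] == b[j:j+k] outward in both directions.

    Returns (start_a, start_b, length, mismatches) of the extended match.
    The mismatch budget is shared; the right side is extended first.
    """
    mismatches = 0
    right = 0  # bases accepted to the right of the seed
    while i + k + right < len(a) and j + k + right < len(b):
        if a[i + k + right] != b[j + k + right]:
            if mismatches == max_mismatches:
                break  # mismatch budget used up
            mismatches += 1
        right += 1
    left = 0  # bases accepted to the left of the seed
    while i - left > 0 and j - left > 0:
        if a[i - left - 1] != b[j - left - 1]:
            if mismatches == max_mismatches:
                break
            mismatches += 1
        left += 1
    return i - left, j - left, left + k + right, mismatches


a, b = "ACGTACGGATTC", "ACCTACGGATTC"  # differ by one base at index 2
i, j = find_seed(a, b, 4)
print(i, j)                      # 3 3
print(extend(a, b, i, j, 4, 1))  # (0, 0, 12, 1)
print(extend(a, b, i, j, 4, 0))  # (3, 3, 9, 0)
```

Allowing one mismatch lets the match run past the single differing base and cover the whole read; with zero mismatches the extension stops at it.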

What is done in contig assembly instead of comparing whole reads to each other? (2)

Break each read down into even smaller pieces (k-mers)
Then look for identical overlap between k-mers
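
A minimal sketch of breaking reads into k-mers and looking for identical overlap; `kmers` and `shared_kmers` are hypothetical helper names, and `k = 4` is chosen arbitrarily for the example:

```python
from __future__ import annotations


def kmers(read: str, k: int) -> list[str]:
    """Break a read into all overlapping substrings of length k."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def shared_kmers(read_a: str, read_b: str, k: int) -> set[str]:
    """Return the k-mers that occur identically in both reads."""
    return set(kmers(read_a, k)) & set(kmers(read_b, k))


print(kmers("ATGGCGT", 4))                             # ['ATGG', 'TGGC', 'GGCG', 'GCGT']
print(sorted(shared_kmers("ATGGCGT", "GGCGTGCA", 4)))  # ['GCGT', 'GGCG']
```

Only exact k-mer matches count here; there is no mismatch budget as in the classical approach.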

What is the key difference in k-mer-based contig assembly?

You are only looking for identical sequences (exact k-mer matches)

What does it mean if k-mers from different reads follow the same path?

The reads overlap
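
One common way to represent these paths (assumed here; the notes do not name the data structure) is a de Bruijn graph, in which each k-mer is an edge from its (k-1)-base prefix to its (k-1)-base suffix. Overlapping reads contribute the same edges, so they follow the same path, and walking that path spells out the contig. The sketch assumes error-free reads that form a single unbranched path; `de_bruijn` and `walk_contig` are hypothetical names:

```python
from collections import defaultdict


def de_bruijn(reads: list, k: int) -> dict:
    """Build a graph where each k-mer is an edge from its (k-1)-prefix to its (k-1)-suffix."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


def walk_contig(graph: dict) -> str:
    """Spell out the contig along a single unbranched path (sketch assumption)."""
    targets = {t for succs in graph.values() for t in succs}
    starts = [node for node in graph if node not in targets]
    if len(starts) != 1:
        raise ValueError("sketch assumes exactly one linear path")
    node = starts[0]
    contig, seen = node, {node}
    # Follow the path while there is exactly one way forward (no branch)
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:
            break  # stop on a cycle
        contig += nxt[-1]  # each step adds one new base
        seen.add(nxt)
        node = nxt
    return contig


reads = ["ATGGCGT", "GGCGTGCA"]  # the reads overlap on GGCGT
print(walk_contig(de_bruijn(reads, 4)))  # ATGGCGTGCA
```

Both reads contribute the edges GGC→GCG and GCG→CGT, which is where their paths coincide and the two reads are joined into one contig.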

What are 2 common ways that paths may diverge (polymorphisms)?

Single nucleotide differences cause “bubbles” of length k in the graph
Introns or deletions introduce a shorter path in the graph
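
A small sketch of why a single nucleotide difference produces a bubble of length k: every k-mer that covers the variant position differs between the two alleles, so each allele contributes its own branch of k k-mers (assuming the variant lies at least k-1 bases from either end and no k-mer repeats elsewhere). A deletion instead removes k-mers, giving a shorter path. The sequences and the helper `allele_branches` are made up for illustration:

```python
def kmers(seq: str, k: int) -> list:
    """All overlapping substrings of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def allele_branches(ref: str, alt: str, k: int):
    """Return the k-mers unique to each allele: the two branches of a bubble."""
    ref_kmers, alt_kmers = kmers(ref, k), kmers(alt, k)
    ref_set, alt_set = set(ref_kmers), set(alt_kmers)
    ref_only = [km for km in ref_kmers if km not in alt_set]
    alt_only = [km for km in alt_kmers if km not in ref_set]
    return ref_only, alt_only


ref = "GATTACAGCT"
snp = "GATTAGAGCT"     # C -> G at index 5
deletion = "GATTAGCT"  # "AC" at indices 4-5 removed

print(allele_branches(ref, snp, 3))  # (['TAC', 'ACA', 'CAG'], ['TAG', 'AGA', 'GAG'])
# The deletion allele has fewer k-mers (edges), so its path through the graph is shorter
print(len(kmers(ref, 3)), len(kmers(deletion, 3)))  # 8 6
```

Each branch holds exactly k = 3 k-mers, which is the bubble length described in the card.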

What are 4 advantages of contig assembly?

No reference genome needed
- rely on the data you already have
Identifies novel sequences (e.g., transcripts) not present in the reference
Can identify genomic DNA/transcripts from exogenous sources (not from the organism of interest)
- bacteria, viruses, etc.
- have to be careful with contamination
For RNA sequences, long introns are not a problem

What are 2 disadvantages of contig assembly?

Computationally intensive
- needs up to a terabyte of memory
Creates smaller contigs: many gaps in the assembly
- needs much higher sequencing depth
- for genome sequencing: needs to be complemented with longer reads
- for RNA sequencing: many split transcripts

Promoter

A site on DNA to which the enzyme RNA polymerase can bind to initiate the transcription of DNA into RNA

Transcriptional start site

Is the location where transcription starts at the 5’ end of a gene sequence

Exons

A segment of a DNA or RNA molecule containing information coding for a protein or peptide sequence

Intron

A segment of a DNA or RNA molecule that does not code for a protein or peptide sequence and interrupts the sequence of a gene

Open reading frame

Is the part of a reading frame that has the ability to be translated
- is a continuous stretch of codons that begins with a start codon and ends with a stop codon
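
A minimal ORF finder for the forward strand of an mRNA sequence: it scans all three reading frames for AUG followed by an in-frame stop codon (UAA, UAG or UGA under the standard genetic code). It is a sketch, not a production gene finder: `find_orfs` and `min_codons` are hypothetical names, the reverse strand is ignored, and AUGs nested inside an ORF are not reported separately:

```python
from __future__ import annotations

STOP_CODONS = {"UAA", "UAG", "UGA"}  # standard genetic code


def find_orfs(rna: str, min_codons: int = 1) -> list[tuple[int, int]]:
    """Return (start, end) slices of ORFs: AUG ... in-frame stop codon (stop included)."""
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(rna):
            if rna[i:i + 3] == "AUG":
                for j in range(i + 3, len(rna) - 2, 3):
                    if rna[j:j + 3] in STOP_CODONS:
                        if (j - i) // 3 >= min_codons:
                            orfs.append((i, j + 3))
                        i = j  # resume scanning after this stop codon
                        break
                else:
                    break  # no in-frame stop: no complete ORF left in this frame
            i += 3
    return orfs


rna = "GGAUGAAACCCUAGUU"
for start, end in find_orfs(rna):
    print(start, end, rna[start:end])  # 2 14 AUGAAACCCUAG
```

An AUG with no in-frame stop codon downstream (e.g. `"AUGAAA"`) is not reported, since it does not contain both a start and a stop codon.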

What is the sequence for a start codon?

AUG

What are 3 possible sequences for a stop codon?

UAA, UAG, UGA

Untranslated region

Is the region of an mRNA that is directly upstream of the initiation codon (5’ UTR); the region downstream of the stop codon is the 3’ UTR
- UTRs are important for the regulation of a transcript through various mechanisms

Poly-adenylation signals

Is a sequence that signals the addition of a poly(A) tail to an mRNA
- getting ready for translation

Transcription termination signal

Is a section of nucleic acid sequence that marks the end of a gene or operon in genomic DNA during transcription

What 2 things do all of these gene structures have?

1. Short life span
2. Not highly conserved

HMM

Hidden Markov Model
- searches for the likelihood of gene features occurring in the right order

What does it mean if you have all the gene features in order?

It is very likely that you have a proper gene structure
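
A toy illustration of how an HMM scores features in order: the Viterbi algorithm finds the single most likely sequence of hidden states for a DNA sequence. Real gene finders use many more states (promoter, start codon, exons, introns, stop codon, poly-A signal); this sketch assumes just two hypothetical states, `E` (exon-like) and `I` (intron-like), whose start, transition and emission probabilities (GC-rich vs AT-rich) are invented for the example:

```python
import math

STATES = ("E", "I")  # hypothetical exon-like and intron-like states
START = {"E": 0.5, "I": 0.5}
TRANS = {"E": {"E": 0.9, "I": 0.1}, "I": {"E": 0.1, "I": 0.9}}
EMIT = {
    "E": {"G": 0.4, "C": 0.4, "A": 0.1, "T": 0.1},  # made-up GC-rich emissions
    "I": {"G": 0.1, "C": 0.1, "A": 0.4, "T": 0.4},  # made-up AT-rich emissions
}


def viterbi(seq: str) -> str:
    """Return the most likely hidden-state path for `seq`, one state letter per base."""
    # Log-probabilities avoid underflow on long sequences
    scores = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    backpointers = []
    for base in seq[1:]:
        new_scores, pointers = {}, {}
        for s in STATES:
            # Best previous state to come from, given the transition into s
            prev = max(STATES, key=lambda p: scores[p] + math.log(TRANS[p][s]))
            new_scores[s] = scores[prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][base])
            pointers[s] = prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final state
    state = max(STATES, key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))


print(viterbi("GCGCGCATATATGCGC"))  # EEEEEEIIIIIIEEEE
```

The transition probabilities are what enforce order: switching states is penalized, so the model only switches where the emissions strongly favor the other state.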

What is the first step of genome annotation?

Generating gene models

Gene annotation -- similarity-based approaches

Use known DNA/RNA sequences that have been sequenced before
- relies on prior knowledge

What can you do when you already have existing transcript or protein sequence information?

1. Map available transcript sequences from the same species onto the genome
2. Search the genome against all known protein-coding genes
3. Map annotated genes from closely related species onto the genome
- gives high confidence that it is actually a gene

What is a disadvantage of similarity-based approaches?

You can only find what is already known
- can find the location and what is interrupted by it

When are similarity-based approaches most frequently used?

In a combined approach
- gene predictors that take mapped transcripts into account

What kind of information is linked to the genome sequence? (6)

1. Gene models
2. Transcripts
3. Similarity to other genomes
4. Regulatory elements
5. Genetic markers
6. Mutations

What additional information is linked to a gene model?

1. Sequences
2. Functional annotations
3. Names
4. Functions
5. Protein properties
6. Systematic categorization
7. Mutant phenotypes and lines available
8. References
9. Expression patterns
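
One way this linked information could be held in code is a simple record per gene model. The class and field names below are hypothetical and do not follow any particular database schema; the example values are placeholders:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class GeneModel:
    """Hypothetical record of the information linked to one gene model."""

    gene_id: str
    sequences: dict[str, str] = field(default_factory=dict)  # e.g. genomic, transcript, protein
    functional_annotations: list[str] = field(default_factory=list)
    names: list[str] = field(default_factory=list)
    functions: list[str] = field(default_factory=list)
    protein_properties: dict[str, str] = field(default_factory=dict)
    categories: list[str] = field(default_factory=list)  # systematic categorization
    mutant_phenotypes: list[str] = field(default_factory=list)
    mutant_lines: list[str] = field(default_factory=list)  # mutant lines available
    references: list[str] = field(default_factory=list)
    expression_patterns: dict[str, str] = field(default_factory=dict)


gene = GeneModel(gene_id="gene0001", names=["example gene"])
gene.sequences["protein"] = "MKT"
print(gene.gene_id, gene.names, gene.sequences)
```

`field(default_factory=...)` gives each gene model its own empty lists and dicts instead of sharing one mutable default between instances.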