Midterm 1 - Notes 5 (Part 3) Flashcards

1
Q

What type of sequencing has a higher error rate than SMRT?

A

Nanopore sequencing

2
Q

Why does nanopore sequencing occur very fast?

A

It occurs very fast because no new DNA is synthesized by a polymerase; the sample strand is read directly as it passes through the pore

3
Q

What is the sample prep for nanopore sequencing? (3)

A
  1. Fragment DNA
    - continuous reads
  2. Add leader to one side
    - directs to motor protein at pore
  3. Add hairpin adaptor to other side of structure
    - the first strand goes through the pore, and since the hairpin attaches it to the complementary strand, that strand goes through the pore next
4
Q

What does nanopore sequencing allow?

A

Sequencing the same fragment twice
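A toy sketch of why the hairpin lets the same fragment be read twice: after the template strand passes through the pore, the hairpin guides the complementary strand through next, so the second read is the reverse complement of the first. The function names are hypothetical, and the sketch ignores the electrical signal and base-calling errors entirely; it only shows the sequence relationship between the two reads.

```python
# Complement table for DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def hairpin_read(template: str) -> tuple[str, str]:
    """Hypothetical toy model: the template strand is read first, then the
    hairpin pulls the complementary strand through the pore. That strand is
    antiparallel, so its read is the reverse complement of the template."""
    return template, reverse_complement(template)


def reads_agree(template_read: str, complement_read: str) -> bool:
    """Both reads describe the same fragment, so converting the complement
    read back should reproduce the template read (error-free in this toy)."""
    return reverse_complement(complement_read) == template_read


first, second = hairpin_read("ATGCGT")
print(first, second)              # ATGCGT ACGCAT
print(reads_agree(first, second)) # True
```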

5
Q

Does nanopore sequencing have a high error rate?

A

Yes

- 100x higher than Illumina

6
Q

What are 4 benefits of nanopore sequencing?

A
  1. Very long reads
    - longer than SMRT
  2. Very high throughput
  3. Very fast
  4. Small instrument footprint
7
Q

Contig

A

Refers to a continuous sequence assembled from overlapping sequence data (reads)
- in top-down sequencing projects, refers to the overlapping clones that form a physical map of the genome that is used to guide sequencing and assembly

8
Q

Contig assembly with classical alignment programs (4)

A
  1. Start with 2 sequences
  2. Start off with a sliding window where the strands are identical
  3. Then you move outward and look for differences between the 2 sequences
  4. Allow a few mismatches, because the sequences can still be identical beyond a single mutation
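A minimal seed-and-extend sketch of the steps above, assuming substitutions only (no insertions or deletions). The function names and the parameters `k` (seed length) and `max_mismatches` are hypothetical; real alignment programs also score gaps, which this toy ignores.

```python
def extend(a: str, b: str, i: int, j: int, step: int,
           budget: int) -> tuple[int, int]:
    """Walk outward from a[i] and b[j] in direction `step` (+1 = right,
    -1 = left), tolerating up to `budget` mismatches.
    Returns (bases_covered, mismatches_used)."""
    covered = used = 0
    while 0 <= i < len(a) and 0 <= j < len(b):
        if a[i] != b[j]:
            if used == budget:
                break  # one mismatch too many: stop extending here
            used += 1
        covered += 1
        i += step
        j += step
    return covered, used


def seed_and_extend(a: str, b: str, k: int = 4, max_mismatches: int = 1):
    """Find the first identical k-base window (the seed) shared by `a` and
    `b`, then extend it in both directions, allowing a few mismatches.
    Returns (start_in_a, start_in_b, length, mismatches) or None."""
    for i in range(len(a) - k + 1):
        j = b.find(a[i:i + k])
        if j == -1:
            continue  # no identical window here; slide one base along
        right, used_r = extend(a, b, i + k, j + k, +1, max_mismatches)
        left, used_l = extend(a, b, i - 1, j - 1, -1, max_mismatches - used_r)
        return i - left, j - left, left + k + right, used_r + used_l
    return None


# Hypothetical reads: identical seed "ACGT", then one mismatch (T vs A).
print(seed_and_extend("CCACGTTACG", "ACGTAACG"))                    # (2, 0, 8, 1)
print(seed_and_extend("CCACGTTACG", "ACGTAACG", max_mismatches=0))  # (2, 0, 4, 0)
```

With `max_mismatches=0` the extension stops at the first difference, which is why tolerating a mismatch lets the match continue past a single mutation.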
9
Q

What do they do in contig assembly instead of comparing whole reads to each other one by one? (2)

A
  1. We break down each read into even smaller pieces (k-mers)
  2. Then you start looking for identical overlap
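A small sketch of breaking reads into k-mers and finding the identical k-mers two reads share. The helper names are hypothetical and k=3 is chosen only to keep the output readable; real assemblers use much larger k.

```python
def kmers(read: str, k: int) -> list[str]:
    """Break a read into all overlapping substrings of length k."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def shared_kmers(read_a: str, read_b: str, k: int) -> set[str]:
    """k-mers that occur identically in both reads (evidence of overlap)."""
    return set(kmers(read_a, k)) & set(kmers(read_b, k))


print(kmers("ACGTAC", 3))                           # ['ACG', 'CGT', 'GTA', 'TAC']
print(sorted(shared_kmers("ACGTAC", "GTACCA", 3)))  # ['GTA', 'TAC']
```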
10
Q

What is the difference in contig assembly (compared with classical alignment)?

A

You are only looking for identical sequences (exact k-mer matches; no mismatches allowed)

11
Q

What does it mean if k-mers from different reads follow the same path?

A

They are overlapping
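A toy de Bruijn graph sketch, assuming error-free reads and no repeats. Nodes are (k-1)-mers and each k-mer is an edge, so a k-mer shared by two reads becomes one shared edge, and walking the unbranched path spells out a contig. The de Bruijn construction is a common way to build k-mer graphs, but the notes do not name it, so treat this as an illustration; the function names are hypothetical.

```python
from collections import defaultdict


def de_bruijn(reads: list[str], k: int) -> dict[str, set[str]]:
    """Nodes are (k-1)-mers; each distinct k-mer adds an edge prefix -> suffix.
    k-mers shared by several reads collapse onto the same edge."""
    graph: dict[str, set[str]] = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk_contig(graph: dict[str, set[str]]) -> str:
    """Start at a node with no incoming edge and follow the path while it is
    unbranched (exactly one successor). Assumes no cycles (toy input)."""
    targets = {nxt for succs in graph.values() for nxt in succs}
    start = next(node for node in graph if node not in targets)
    contig, node = start, start
    while len(graph.get(node, ())) == 1:
        node = next(iter(graph[node]))
        contig += node[-1]  # each step adds one new base
    return contig


reads = ["ACGTAC", "GTACCA"]  # both reads contain the 4-mer "GTAC"
print(walk_contig(de_bruijn(reads, 4)))  # ACGTACCA
```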

12
Q

What are 2 common ways that paths may diverge (polymorphisms)?

A
  1. Single nucleotide differences cause “bubbles” of length k in the graph
  2. Introns or deletions introduce a shorter path in the graph
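A sketch of why a single nucleotide difference gives a bubble of length k: every k-mer that overlaps the variant base differs between the two alleles, so each allele contributes exactly k k-mers the other lacks. This assumes the variant is at least k-1 bases from either end and that the k-mers are otherwise unique; the sequences and function names are made up.

```python
def kmers(seq: str, k: int) -> list[str]:
    """All overlapping substrings of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def bubble_branches(allele_a: str, allele_b: str,
                    k: int) -> tuple[list[str], list[str]]:
    """k-mers found in only one allele: the two branches of the bubble."""
    a, b = set(kmers(allele_a, k)), set(kmers(allele_b, k))
    return sorted(a - b), sorted(b - a)


# Hypothetical alleles differing by one base (T vs C at index 4).
branch_a, branch_b = bubble_branches("ACGTTAGCA", "ACGTCAGCA", k=3)
print(branch_a)  # ['GTT', 'TAG', 'TTA']
print(branch_b)  # ['CAG', 'GTC', 'TCA']
print(len(bubble_branches("ACGTTAGCA", "ACGTCAGCA", k=4)[0]))  # 4
```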
13
Q

What are 4 advantages of contig assembly?

A
  1. No reference genome needed
    - rely on the data you already have
  2. Identifies novel regions not present in the reference
  3. Can identify genomic DNA/transcripts from exogenous sources (not from our organism)
    - bacteria, viruses, etc.
    - have to be careful with contamination
  4. For RNA sequences, long introns are not a problem
14
Q

What are 2 disadvantages of contig assembly?

A
  1. Computationally intensive
    - needs up to a terabyte of memory
  2. Creates smaller contigs: many gaps in assembly
    - need much higher sequence depth
    - for genome sequence: needs complementation with longer reads
    - for RNA sequence: many split transcripts
15
Q

Promoter

A

A site on DNA to which the enzyme RNA polymerase can bind to initiate the transcription of DNA into RNA

16
Q

Transcriptional start site

A

Is the location where transcription starts at the 5’ end of a gene sequence

17
Q

Exons

A

A segment of a DNA or RNA molecule containing information coding for a protein or peptide sequence

18
Q

Intron

A

A segment of a DNA or RNA molecule that does not code for a protein or peptide sequence and interrupts the sequence of the gene

19
Q

Open reading frame

A

Is the part of the reading frame that has the ability to be translated
- is a continuous stretch of codons that begins with a start codon and ends with a stop codon

20
Q

What is the sequence for a start codon?

A

AUG

21
Q

What are 3 possible sequences for a stop codon?

A
  1. UAA
  2. UAG
  3. UGA
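A sketch of scanning for open reading frames with the start and stop codons above. It assumes an RNA string (U instead of T), checks only the three reading frames of the given strand (not the reverse complement), and starts each ORF at the first AUG in a frame; `find_orfs` and its `min_codons` threshold are hypothetical.

```python
START = "AUG"
STOPS = {"UAA", "UAG", "UGA"}


def find_orfs(rna: str, min_codons: int = 2) -> list[tuple[int, int]]:
    """Return (start, end) indices of ORFs: each begins at an AUG and ends
    just after the first in-frame stop codon. `min_codons` counts the start
    and stop codons too."""
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(rna):
            if rna[i:i + 3] == START:
                # Scan forward codon by codon from this start codon.
                for j in range(i + 3, len(rna) - 2, 3):
                    if rna[j:j + 3] in STOPS:
                        if (j + 3 - i) // 3 >= min_codons:
                            orfs.append((i, j + 3))
                        i = j  # resume scanning after this stop codon
                        break
                else:
                    break  # no in-frame stop downstream: not a complete ORF
            i += 3
    return orfs


seq = "GGAUGAAACCCUAGGG"
for start, end in find_orfs(seq):
    print(start, end, seq[start:end])  # 2 14 AUGAAACCCUAG
```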
22
Q

Untranslated region

A

Is the region of an mRNA that is directly upstream from the initiation codon (the 5’ UTR); the 3’ UTR lies downstream of the stop codon
- this region is important for the regulation of a transcript by differing mechanisms

23
Q

Poly-adenylation signals

A

Signals the addition of a poly(A) tail to an mRNA

- getting ready for translation

24
Q

Transcription termination signal

A

Is a section of nucleic acid sequence that marks the end of a gene or operon in genomic DNA during transcription

25
Q

What 2 things do all of these gene structures have in common?

A
  1. Short life span
  2. Not highly conserved

26
Q

HMM

A

Hidden Markov Model

27
Q

Hidden Markov Model

A

Estimates the likelihood that gene features occur in the right order
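A toy Viterbi sketch of how an HMM finds the most likely ordering of features along a sequence. Everything here is hypothetical: three states that may only occur in the order upstream → gene → downstream (zero transition probabilities forbid going backwards), and made-up emission probabilities (a GC-rich gene region, AT-rich flanks). Real HMM gene predictors model many more features, such as exons, introns, splice sites, and start and stop codons.

```python
import math

# Hypothetical left-to-right HMM: features can only appear in this order.
STATES = ["upstream", "gene", "downstream"]
START_P = {"upstream": 1.0, "gene": 0.0, "downstream": 0.0}
TRANS_P = {
    "upstream":   {"upstream": 0.9, "gene": 0.1, "downstream": 0.0},
    "gene":       {"upstream": 0.0, "gene": 0.9, "downstream": 0.1},
    "downstream": {"upstream": 0.0, "gene": 0.0, "downstream": 1.0},
}
# Made-up emissions: gene region GC-rich, flanks AT-rich.
AT_RICH = {"A": 0.4, "T": 0.4, "G": 0.1, "C": 0.1}
GC_RICH = {"A": 0.1, "T": 0.1, "G": 0.4, "C": 0.4}
EMIT_P = {"upstream": AT_RICH, "gene": GC_RICH, "downstream": AT_RICH}


def log(p: float) -> float:
    """Natural log that maps probability 0 to -inf (an impossible path)."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(seq: str) -> tuple[float, list[str]]:
    """Most likely state path for `seq` and its log-probability."""
    # best[s] = (log-prob of the best path ending in state s, that path)
    best = {s: (log(START_P[s]) + log(EMIT_P[s][seq[0]]), [s]) for s in STATES}
    for base in seq[1:]:
        new_best = {}
        for s in STATES:
            # Choose the predecessor state that maximizes the path score.
            score, path = max(
                ((best[p][0] + log(TRANS_P[p][s]), best[p][1]) for p in STATES),
                key=lambda t: t[0],
            )
            new_best[s] = (score + log(EMIT_P[s][base]), path + [s])
        best = new_best
    return max(best.values(), key=lambda t: t[0])


score, path = viterbi("ATATGCGCGCATAT")
print(path)  # 4 x 'upstream', 6 x 'gene', 4 x 'downstream'
```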

28
Q

What does it mean if you have all the gene features in order?

A

It is very likely that you have a proper gene structure

29
Q

What is the first step in annotating a genome?

A

Generating gene models

30
Q

Gene annotation – similarity based approaches

A

Use known DNA/RNA sequences that have been sequenced before

- already have prior knowledge

31
Q

What can you do when you already have existing transcript or protein sequence information?

A
  1. Map available transcript sequences from the same species onto the genome
  2. Search genome against all known protein coding genes
  3. Map annotated genes from closely related species onto the genome
    - get a high confidence that it is actually a gene
32
Q

What is a disadvantage to similarity based approaches?

A

You can only find what is already known

- can find location and what is interrupted by it

33
Q

When are similarity-based approaches most frequently used?

A

In a combined approach

- gene predictors that take mapped transcripts into account

34
Q

What kind of information is linked to the genome sequence? (6)

A
  1. Gene models
  2. Transcripts
  3. Similarity to other genomes
  4. Regulatory elements
  5. Genetic markers
  6. Mutations
35
Q

What additional information is linked to a gene model?

A
  1. Sequences
  2. Functional annotations
  3. Names
  4. Functions
  5. Protein properties
  6. Systematic categorization
  7. Mutant phenotypes and lines available
  8. References
  9. Expression patterns