The Woogie 1 Flashcards

1
Q

Cons of Sanger sequencing?

A
  1. Expensive
  2. Error prone
  3. Prone to bias
  4. Low throughput (only a small number of reads can be sequenced at one time)
2
Q

Alignment techniques using reference genome

A
  1. Align reads against a reference genome
  2. Mapping: recover the position of each sequence read in the genome
  3. Alignment: recover how the bases of the read line up with the genome at that position
  4. Alignment relies on sequence matches to locate where a read belongs
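
The idea of locating a read by sequence matches can be sketched in a few lines of Python. This is a minimal brute-force illustration, not how production aligners work; the function name `map_read`, the `max_mismatches` parameter, and the example sequences are all made up for this sketch.

```python
def map_read(reference: str, read: str, max_mismatches: int = 1):
    """Return (position, mismatches) for every placement of the read
    on the reference that has at most max_mismatches mismatches."""
    hits = []
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        # Count positions where the read and the reference disagree
        mismatches = sum(1 for a, b in zip(window, read) if a != b)
        if mismatches <= max_mismatches:
            hits.append((pos, mismatches))
    return hits


reference = "ACGTTAGCCGATAGGCTA"      # toy reference, assumed for illustration
print(map_read(reference, "GCCGAT"))  # exact match
print(map_read(reference, "GCCTAT"))  # same location, one mismatch
```

Sliding the read along every position is slow for real genomes, but it shows the two outputs named on the card: a position (mapping) and a base-by-base comparison (alignment).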
3
Q

Genome Project Challenges

A

1. Sequencing technologies are not perfect
2. How hard DNA is to sequence varies between samples
3. The reference genome could contain multiple organisms, which means bigger reads
4. Sample sequences contain errors, mismatches, and gaps

4
Q

2nd Generation Sequencing methods

A
  1. Library prep: extract DNA from cells
  2. Adapters are attached to the DNA fragments
  3. Anchoring to the sequencer
    - DNA is melted (denatured) and added to the flow cell
    - Flow cell is saturated with oligonucleotides complementary to the adapters
  4. Adapter sequences attach the fragments to the bottom of the machine
5
Q

Second generation: sequencing by synthesis

A
  1. Polymerase adds a fluorescently labeled base
  2. Laser identifies the base type
  3. Fluorescent label is then removed
  4. Terminal nucleotide can accept another base
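
The add, image, remove, extend cycle above can be sketched as a loop. This is a toy model only: the dye colors in `DYE` are invented for illustration, the function name is hypothetical, and strand direction is ignored.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}
# Hypothetical dye colors, chosen only for this illustration
DYE = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}


def sequence_by_synthesis(template: str, cycles: int):
    """Simulate one labeled base being added and imaged per cycle."""
    growing_strand = []
    signals = []
    for i in range(min(cycles, len(template))):
        base = COMPLEMENT[template[i]]  # 1. polymerase adds a labeled base
        signals.append(DYE[base])       # 2. laser identifies the base type
        growing_strand.append(base)     # 3-4. label removed, strand can extend
    return "".join(growing_strand), signals


read, signals = sequence_by_synthesis("ATGC", cycles=3)
print(read, signals)
```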
6
Q

Genome Size Abundance

A

1. Complex organisms generally have larger genomes; less complex organisms generally have smaller ones.

7
Q

What is in a genome?

A

1. Genes
2. Centromeres and telomeres
3. Binding sites, repeat regions, transposable elements
4. Epigenetic marks: methyl markers on histones

8
Q

Sanger sequencing using dNTP and ddNTP

A
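  1. Unwind (denature) the DNA helix into single strands
  2. Polymerase synthesizes a new strand from dNTPs
  3. Incorporating a ddNTP halts polymerase action
  4. Fragments are run on a gel (electrophoresis) to separate them by size
  5. A laser shines onto the fluorescently labeled ddNTPs to identify each terminal base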
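
A small simulation can show why sorting terminated fragments by size recovers the sequence. This is an idealized sketch: it assumes exactly one fragment terminates at every position, and the function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def sanger_fragments(template: str):
    """Return every chain-terminated fragment of the newly made strand.

    Idealized assumption: a ddNTP stops synthesis once at each position,
    so each prefix of the new strand appears exactly once.
    """
    new_strand = "".join(COMPLEMENT[base] for base in template)
    return [new_strand[:i] for i in range(1, len(new_strand) + 1)]


def read_gel(fragments):
    """Order fragments by length (as the gel does) and read each final base."""
    ordered = sorted(fragments, key=len)
    return "".join(fragment[-1] for fragment in ordered)


fragments = sanger_fragments("ATGCCA")
# Reverse the list to show that input order does not matter
print(read_gel(list(reversed(fragments))))
```

The last base of each fragment is the labeled ddNTP, so reading the shortest fragment first spells out the new strand one base at a time.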
9
Q

Pros of 2nd generation sequencing

A

1. Fast
2. Cheap
3. High throughput

10
Q

3rd Generation Sequencing Pros

A
  1. Fast
  2. Cheap
  3. High throughput
  4. Can read single strands
  5. Can interpret long reads
11
Q

What is throughput in sequencing?

A

The number of sequences the machine can read at one time.

12
Q

Cons of 2nd generation sequencing

A

1. Prone to error (repeat regions)
2. Prone to amplification biases
3. Fragment length restricted

13
Q

A genome consists of?

A

1. Chromosomes
2. mtDNA
3. Chloroplast DNA
4. Plasmids

14
Q

3rd Generation Sequencing Cons

A
  1. Some sequencing can be expensive
  2. Nanopore devices are cheap, but flow cells are expensive
  3. Prone to error (only 80% accuracy)
15
Q

N50 contig length

A

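The contig length at which 50% of the assembly is covered by contigs of this size or larger
1. Add up all contig lengths
2. Sort contigs by decreasing length
3. Start with the largest contig
- if it covers 50% of the assembly length, its length is the N50
- if not, add the second largest, and so on until 50% is reached; the N50 is the length of the last contig added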
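
The procedure above can be written as a short function. This is a minimal sketch; the name `n50` and the example contig lengths are made up for illustration.

```python
def n50(contig_lengths):
    """Return the N50: the length of the contig at which the running total,
    taken from largest to smallest, first reaches half the assembly length."""
    total = sum(contig_lengths)                           # 1. add up all contig lengths
    running = 0
    for length in sorted(contig_lengths, reverse=True):   # 2. sort by decreasing length
        running += length                                 # 3. add contigs, largest first
        if running * 2 >= total:                          # reached 50% of the assembly
            return length
    raise ValueError("no contigs given")


# Total is 290, so half is 145: 80 alone is not enough, 80 + 70 = 150 is.
print(n50([80, 70, 50, 40, 30, 20]))
```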

16
Q

Genome assembly metrics depend on:

A
  1. Number of contigs
  2. Length of assembly
  3. Number of genes
  4. Assembly accuracy
17
Q

String graph assembly

A
  1. Uses overlaps between reads to build the graph
  2. Reads become nodes and overlaps become edges
  3. A minimum overlap length must be specified so that only overlaps considered true end up in the graph, which keeps it meaningful
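
The node and edge idea above can be sketched as a plain overlap graph. This is a simplified illustration: a full string graph would also remove redundant (transitive) edges, which is skipped here, and the function names and reads are hypothetical.

```python
def overlap_length(a: str, b: str, min_overlap: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b,
    or 0 if it is shorter than min_overlap."""
    for size in range(min(len(a), len(b)), min_overlap - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def build_overlap_graph(reads, min_overlap: int):
    """Reads are nodes; an edge a -> b stores the overlap length."""
    graph = {read: {} for read in reads}
    for a in reads:
        for b in reads:
            if a != b:
                size = overlap_length(a, b, min_overlap)
                if size:
                    graph[a][b] = size
    return graph


reads = ["ACGTAC", "GTACGG", "ACGGTT"]
print(build_overlap_graph(reads, min_overlap=3))
```

Raising `min_overlap` removes edges: with a threshold of 5 these three reads no longer connect at all, which is the trade-off the third point describes.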
18
Q

De Bruijn Graph

A
  1. Split reads into short strings of k letters (k-mers)
  2. Use k-mer overlap to build the graph
  3. De Bruijn graphs are built from these short strings rather than from whole reads
  4. Repeat regions can make this complicated
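
A minimal construction of such a graph is sketched below. It assumes one common convention, where each k-mer is an edge between its first and last k-1 letters; other texts treat the k-mers themselves as nodes. Function names and example reads are hypothetical.

```python
from collections import defaultdict


def kmers(read: str, k: int):
    """Split a read into all of its overlapping k-mers."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def de_bruijn_graph(reads, k: int):
    """Map each (k-1)-mer prefix to the (k-1)-mer suffixes that follow it."""
    graph = defaultdict(list)
    for read in reads:
        for kmer in kmers(read, k):
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


print(de_bruijn_graph(["ACGTC"], k=3))   # a simple path
print(de_bruijn_graph(["ATATAT"], k=3))  # a repeat collapses into a cycle
```

The second example shows why repeats are hard: the repeated k-mers fold onto the same two nodes, so the graph alone does not say how many times the repeat occurs.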
19
Q

Presence of necessary biological genes

A

90% of species have one copy of these genes

20
Q

Challenges in genome sequencing

A
  1. Short reads are around 300 bp
  2. Similar sequences are found multiple times in the genome
  3. Repetitive regions increase the read length needed to resolve them
  4. Multiple gene copies, sequencing errors, and uneven genomic coverage
21
Q

Assembly vs alignment

A
  1. Assembly: assembles reads into a genome without a reference genome
  2. Alignment: aligns reads against a given genome
22
Q

Genome assembly

A
  1. The goal is to assemble reads into one piece (hard to achieve)
  2. Reads can be assembled into contigs (contiguous sequences)
  3. Contigs joined across gaps, written as runs of N (NNN), form scaffolds
  4. Overall, assemblies exist as contigs, scaffolds and chromosomes
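
The relationship between contigs and scaffolds can be shown with simple string handling. This is a sketch under the assumption that gaps are written as runs of `N`; the function names and the gap length of 3 are made up for illustration.

```python
import re


def split_scaffold(scaffold: str):
    """Recover the contigs of a scaffold by cutting at runs of N."""
    return [contig for contig in re.split(r"N+", scaffold.upper()) if contig]


def join_contigs(contigs, gap: int = 3):
    """Build a scaffold by joining ordered contigs with a gap of N characters."""
    return ("N" * gap).join(contigs)


print(split_scaffold("ACGTNNNGGCANNTT"))
print(join_contigs(["ACGT", "GGCA"]))
```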
23
Q

Gene Annotation

A

1. Assemblies become useful once genomic features are identified (gene locations, promoters)
2. 1st annotation level: find start/stop codons. 2nd level: use genes from other organisms to confirm the annotation
3. Align against RNA sequences from other species
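
The first annotation level, scanning for start and stop codons, can be sketched as a simple open reading frame search. This is a minimal illustration that checks only the forward strand; the function name `find_orfs` and the `min_codons` parameter are hypothetical, and real annotation tools do much more.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(sequence: str, min_codons: int = 2):
    """Return (start, end) positions of ATG...stop stretches on the forward
    strand, in all three reading frames. min_codons counts the start and
    stop codons too; end is exclusive."""
    seq = sequence.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first start codon in this frame
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None                   # look for the next ORF
    return orfs


seq = "CCATGAAATGATT"
for start, end in find_orfs(seq):
    print(start, end, seq[start:end])
```

Candidates found this way would then be checked at the second level, for example by comparing them against known genes from other organisms.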