Lecture 5 Flashcards

1
Q

Describe ways to improve assemblies

A

Scaffolding

2
Q

What is scaffolding?

A

Figuring out how contigs are connected (their order and orientation)

3
Q

What are the 2 methods of scaffolding

A
  1. Paired-end Illumina sequencing
  2. Long Read Sequencing
4
Q

What is Paired-end Illumina Sequencing

A

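Each DNA fragment is sequenced from both ends, in opposite directions
Where the two reads overlap, positions that aren't perfectly complementary → sequencing errors are present and can be removed from the data
The distance between the two reads is roughly known, so a pair whose reads land on different contigs shows how those contigs are connected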
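
Because the two reads of a pair come from the same fragment, a pair whose reads map to two different contigs is evidence that those contigs sit near each other. The sketch below counts such links. It assumes the reads have already been mapped; `contig_links`, `min_support`, and the contig names are hypothetical and only for illustration.

```python
from collections import Counter

def contig_links(read_pairs, min_support=2):
    """Count read pairs whose two mates map to different contigs.

    read_pairs: iterable of (contig_of_read1, contig_of_read2) tuples.
    Links seen fewer than min_support times are dropped as likely noise.
    """
    links = Counter()
    for contig_a, contig_b in read_pairs:
        if contig_a != contig_b:
            # Sort the names so (A, B) and (B, A) count as the same link.
            links[tuple(sorted((contig_a, contig_b)))] += 1
    return {pair: n for pair, n in links.items() if n >= min_support}

# Toy mapping results (assumed data, not from a real run).
pairs = [
    ("contig1", "contig1"),
    ("contig1", "contig2"),
    ("contig2", "contig1"),
    ("contig2", "contig3"),
    ("contig2", "contig3"),
    ("contig1", "contig3"),
]
print(contig_links(pairs))
```

Here contig1-contig2 and contig2-contig3 are each supported by two pairs, which suggests the order contig1, contig2, contig3. The single contig1-contig3 pair falls below the threshold and is ignored.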

5
Q

What is Long Read Sequencing

A

-Third-generation sequencing that helps with scaffolding
-Long reads can span the gaps between contigs and help fill them in

6
Q

What is unique about PacBio Single Molecule Real Time (SMRT) Sequencing → Long Read Sequencing

A

-Single molecule → reads 1 DNA molecule at a time, so no amplification is needed (no clusters)
-Long reads → about 50,000 bp
-Real time → no pausing and no reversible terminators to stop the polymerase for a picture; bases are detected as they are incorporated

7
Q

What is unique about Oxford Nanopore → Long Read Sequencing

A

-Single molecule
-Long reads → about 100,000 bp
-DNA is threaded through a very small pore; each base partially blocks the flow of ions, and the change in current identifies the base as it passes through

8
Q

What % of human genome are genes and everything else?

A

-Genes (protein-coding sequence): about 1.5%
-Everything else: about 98.5%

9
Q

What are the 3 strategies for identifying genes?

A
  1. Inspection (Bioinformatic)
  2. Homology (Bioinformatic)
  3. Experiment (Wet lab)
10
Q

What is the homology strategy

A

-Compare with other genomes
-Search databases with BLAST to see if the sequence matches a confirmed protein-coding gene in other organisms

11
Q

What is % identity in the homology strategy

A

The % of aligned positions that have the same base or amino acid in both sequences
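
As a worked example, % identity can be computed by counting matching columns in an alignment. This is a minimal sketch that assumes the two sequences are already aligned (equal length, `-` for gaps); the function name `percent_identity` and the choice to count a gap as a mismatch are assumptions, and tools such as BLAST may define the denominator differently.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of aligned columns where both sequences have the same residue.

    Assumes seq_a and seq_b are already aligned: equal length, '-' marks a gap.
    Columns that are gaps in both sequences are skipped; a gap opposite a
    residue counts as a mismatch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
               if not (a == "-" and b == "-")]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)

print(percent_identity("ATGCCGTA", "ATGACGTA"))  # 7 of 8 positions match
```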

12
Q

What is the experiment strategy

A

-RNA-seq: extract RNA, convert it to cDNA, shotgun sequence it, and see where the fragments match the genome; matching regions are transcribed
-Genome wide (whole genome), but not all genes are transcribed all the time
-So it can confirm that something is a gene, but cannot show that something is not a gene

13
Q

What is the ORF (open reading frame)

A

A run of codons with no stop codon in the reading frame, so it could encode a protein (the sequence between a start codon and a stop codon)
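
The definition can be turned into a small search. This is a sketch under simplifying assumptions: it scans only the forward strand, treats ATG as the only start codon, and reports each ORF from its first ATG through the stop codon. The names `find_orfs` and `min_codons` are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(dna: str, min_codons: int = 0):
    """Return sorted (start, end, sequence) tuples for forward-strand ORFs.

    An ORF here runs from an ATG to the first in-frame stop codon, inclusive.
    ORFs shorter than min_codons codons (stop included) are discarded.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):                      # three forward reading frames
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i                       # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if (end - start) // 3 >= min_codons:
                    orfs.append((start, end, dna[start:end]))
                start = None                    # look for the next start codon
    return sorted(orfs)

print(find_orfs("CCATGAAATTTTAGGG"))
```

Raising `min_codons` drops short ORFs, which are more likely to occur by chance than to be real genes.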

14
Q

Explain the goals of comparative genomics

A

-Understand relationships between species
-Helps answer historical questions (how genomes evolved) and predict how organisms may change in the future

15
Q

Why do we need to mask introns when trying to find ORF in the inspection strategy

A

Introns can contain stop codons that do not affect the protein, because introns are spliced out and never translated
In the genomic sequence those stop codons would cut the ORF short, so we hide (mask) the introns before searching for ORFs
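
A toy example shows the effect. This sketch assumes the intron coordinates are already known and "hides" the intron simply by skipping its bases; the sequences and the helper names `first_stop` and `hide_introns` are made up for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def first_stop(dna):
    """Index of the first in-frame stop codon (reading frame 0), or None."""
    for i in range(0, len(dna) - 2, 3):
        if dna[i:i + 3] in STOP_CODONS:
            return i
    return None

def hide_introns(dna, introns):
    """Return dna with each half-open (start, end) intron interval skipped."""
    kept, pos = [], 0
    for start, end in sorted(introns):
        kept.append(dna[pos:start])
        pos = end
    kept.append(dna[pos:])
    return "".join(kept)

# Toy gene: the intron contains an in-frame TAG that is never translated.
exon1, intron, exon2 = "ATGGCC", "GTAAGTTAGCAG", "AAATTTGGGTGA"
genomic = exon1 + intron + exon2
introns = [(len(exon1), len(exon1) + len(intron))]

print(first_stop(genomic))                         # stop found inside the intron
print(first_stop(hide_introns(genomic, introns)))  # real stop at the end of exon 2
```

Without hiding the intron, the search stops at the TAG inside it and the ORF looks only four codons long; with the intron hidden, the ORF runs to the real stop codon.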

16
Q

What features associated with transcription do we look for upstream of a gene? → consistent patterns

A

-Promoters
-CpG islands → regions rich in CG dinucleotides (a C followed by a G on the same strand); common upstream of genes, rare everywhere else
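
One common way to score a window of sequence for CpG-island character is to compare the number of CG dinucleotides observed with the number expected from the C and G counts. The sketch below computes GC content and that observed/expected ratio; the function name `cpg_stats` is hypothetical, and cutoffs such as GC content above 50% and a ratio above 0.6 are a widely cited rule of thumb rather than something stated in the lecture.

```python
def cpg_stats(seq: str):
    """Return (gc_content, cpg_observed_over_expected) for one sequence window.

    Expected CpG count is (number of C * number of G) / length, so the ratio
    is (CpG count * length) / (number of C * number of G).
    """
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")                 # C followed directly by G
    gc_content = (c + g) / n if n else 0.0
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_content, obs_exp

print(cpg_stats("CGCGCGCG"))  # CpG-rich window
print(cpg_stats("ATATATAT"))  # no C or G at all
```

In practice the score would be computed over sliding windows upstream of a candidate ORF, and windows passing the chosen cutoffs would be flagged.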

17
Q

How do we find an ORF?

A
  1. Mask introns
  2. Find ORFs by length and discard all the small ones
  3. Look for features associated with transcription
18
Q

What 2 ways can we mask introns?

A
  1. Codon bias
  2. Consensus plot