Lecture 5 Flashcards
Describe ways to improve assemblies
Scaffolding
What is Scaffolding
Used to figure out the order and orientation of contigs, i.e. how they are connected
What are the 2 methods of scaffolding
- Paired-end Illumina sequencing
- Long Read Sequencing
What is Paired-end Illumina Sequencing
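Sequence each fragment from both ends; the second read runs in the opposite direction
If the two reads of a pair land in different contigs → those contigs are connected
Where the two reads overlap, they should be perfectly complementary → mismatches indicate sequencing errors, which can be removed from the data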
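A minimal sketch of the complementarity check, assuming both reads of a pair cover exactly the same stretch of the fragment (real pairs often overlap only partially or not at all). The function names and example reads are made up for illustration.

```python
# Toy check: compare a forward read with the reverse complement of its mate.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C", "N": "N"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def mismatch_positions(read1, read2):
    """Positions where read1 disagrees with the reverse complement of read2.

    Assumption: both reads span the same bases, so they have equal length.
    """
    mate = reverse_complement(read2)
    if len(read1) != len(mate):
        raise ValueError("this sketch assumes reads of equal length")
    return [i for i, (a, b) in enumerate(zip(read1.upper(), mate)) if a != b]


read1 = "ATGGCCTTAG"
perfect_mate = reverse_complement(read1)   # fully complementary pair
mate_with_error = "CTAAGGCGAT"             # one base differs from perfect_mate

print(mismatch_positions(read1, perfect_mate))     # []
print(mismatch_positions(read1, mate_with_error))  # [2]
```

An empty list means the pair agrees everywhere; any listed position is a candidate sequencing error.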
What is Long Read Sequencing
-3rd-generation sequencing that helps with scaffolding
-Can help fill in gaps
What is unique about PacBio Single Molecule Real Time Sequencing (SMRT) → Long Read Sequencing
-Single Molecule → can see 1 single molecule at a time, no amplification is needed (no clusters)
-Long reads → 50,000 bp
-No pausing or reversible terminators to slow down polymerase to take a picture, it does it in real time
What is unique about Oxford Nanopore → Long Read Sequencing
-Single Molecule
-Long reads → 100,000 bp
-Uses a very small pore: as the DNA passes through, each base partially blocks the flow of ions, and the base is identified from the change in current
What % of the human genome is genes vs. everything else?
-Genes: 1.5%
-Everything else: 98.5%
What are the 3 strategies for identifying genes?
- Inspection (Bioinformatic)
- Homology (Bioinformatic)
- Experiment (Wet lab)
What is the homology strategy
-Compare with other genomes
-Search databases with BLAST to see if that sequence is a confirmed protein-coding gene in other organisms
What is % identity in the homology strategy
The % of aligned positions that have the same base or amino acid in both sequences
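A small sketch of the % identity calculation, assuming the two sequences are already aligned to equal length with "-" marking gaps. Counting gap columns as non-identical and dividing by the full alignment length is one common convention; tools can differ on the denominator. `percent_identity` is a made-up helper name.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns where both sequences have the same residue.

    Assumptions: inputs are pre-aligned (equal length), '-' marks a gap,
    and columns containing a gap count as non-identical.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


# 9 of 10 positions match
print(percent_identity("ATGGCCTTAG", "ATGGCGTTAG"))  # 90.0
```

The same function works for amino acid alignments, since it only compares characters column by column.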
What is the experiment strategy
-RNA-seq: extract RNA, convert it to DNA (cDNA), shotgun sequence it, and see where the fragments match the genome → matched regions are transcribed
-Genome wide (whole genome), but not all genes are transcribed all the time
Can confirm that something is a gene, but cannot confirm that something is not a gene (it may just not be transcribed in that sample)
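A toy sketch of the RNA-seq idea, assuming the reads are already DNA sequence and match the genome exactly. Real pipelines use dedicated aligners that handle mismatches and splicing; `read_coverage` and the example sequences are invented for illustration.

```python
def read_coverage(genome, reads):
    """For each genome position, count how many reads cover it (exact matches only)."""
    coverage = [0] * len(genome)
    for read in reads:
        if not read:
            continue  # skip empty reads
        start = genome.find(read)
        while start != -1:
            for i in range(start, start + len(read)):
                coverage[i] += 1
            start = genome.find(read, start + 1)
    return coverage


genome = "AAAATGGCCTTAGAAAACCCGGGTTT"
reads = ["ATGGCC", "GCCTTAG"]  # fragments sequenced from extracted RNA

coverage = read_coverage(genome, reads)
transcribed = [i for i, depth in enumerate(coverage) if depth > 0]
print(transcribed)  # positions 3 through 12 are supported by reads
```

Positions with zero coverage are not proven to be non-genic: a gene there may simply not have been transcribed when the RNA was extracted.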
What is an ORF (open reading frame)
Continuous sequence of codons with no stop codon in between that can encode protein (the sequence from a start codon to the next in-frame stop codon)
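A minimal sketch of an ORF search, assuming a DNA string, the standard start codon ATG, the three standard stop codons, and the forward strand only (a full search would also scan the reverse complement). `find_orfs` and `min_codons` are hypothetical names for this example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=2):
    """Return (start, end, orf) tuples for forward-strand ORFs.

    start is the index of the ATG, end is exclusive and includes the stop codon.
    min_codons is the minimum number of codons before the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                # Walk codon by codon until the first in-frame stop codon.
                for j in range(i, len(seq) - 2, 3):
                    if seq[j:j + 3] in STOP_CODONS:
                        if (j - i) // 3 >= min_codons:
                            orfs.append((i, j + 3, seq[i:j + 3]))
                        i = j  # resume scanning after this stop codon
                        break
            i += 3
    return orfs


print(find_orfs("CCATGGCCTTATAGGG"))  # [(2, 14, 'ATGGCCTTATAG')]
```

In the example, the only ATG is at index 2, and reading in that frame gives ATG GCC TTA before the stop codon TAG.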
Explain the goals of comparative genomics
-Understand relationships between species
-Helps us answer historical (evolutionary) questions and predict how organisms will evolve or change in the future
Why do we need to mask introns when trying to find ORFs in the inspection strategy
Introns can be problematic since they may contain stop codons; these do not affect the protein because introns are never translated
We hide the stop codons in introns when searching for genes so they do not cut an ORF short
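A toy sketch of why intron stop codons matter, assuming the intron coordinates are already known and that hiding the intron can be modeled by simply cutting it out of the sequence. The sequences and helper names are invented for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop(seq):
    """Index of the first in-frame stop codon reading from position 0, or None."""
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i
    return None


def remove_introns(seq, introns):
    """Drop the given (start, end) intervals (end exclusive) from seq."""
    pieces = []
    prev = 0
    for start, end in sorted(introns):
        pieces.append(seq[prev:start])
        prev = end
    pieces.append(seq[prev:])
    return "".join(pieces)


exon1 = "ATGGCC"
intron = "GTAAGTTAGCAG"  # contains an in-frame TAG
exon2 = "TTTGGACCATGA"
gene = exon1 + intron + exon2

# Unmasked: the scan stops early at the TAG inside the intron.
print(first_stop(gene))  # 12

# Intron hidden: the ORF runs to the real stop codon at the end of exon 2.
spliced = remove_introns(gene, [(len(exon1), len(exon1) + len(intron))])
print(first_stop(spliced))  # 15
```

Without masking, the search would report a truncated ORF ending inside the intron instead of the full coding sequence.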