23: Sequencing genome Flashcards
What are the steps of genome sequencing?
- Create genomic DNA library
- Generate many independent sequencing reads
- Align independent reads into contiguous sequences (“contigs”)
- Fill gaps between contigs
- Annotate genome
What is shotgun sequencing?
random DNA fragments are sequenced simultaneously, then a computer tries to assemble the reads in the correct order
How do you create a genomic DNA library?
- extract DNA from Haemophilus influenzae cells
- sonicate: ultrasound shears the DNA apart, making a lot of small DNA fragments
- separate these fragments on gel electrophoresis
- cut out fragments of a specific size from the gel; together they cover the entire DNA sequence
- fragments are treated with exonuclease to make blunt ends, then ligated into a vector
- transform vector into E. coli
many E. coli clones containing many different DNA fragments -> ready for Sanger sequencing
how can you sequence the inserts?
one end (1x bases) or both ends (2x bases)
put the sequence reads into a computer and the computer will assemble the genome by looking at overlapping reads
these aligned reads (groups of sequences overlapping) are called contigs
fewer contigs = better, because the genome is covered better and the assembly is more complete
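
A minimal Python sketch of how overlapping reads could be merged into contigs. It assumes error-free reads that all come from the same strand; `overlap`, `greedy_assemble` and the `min_len` cutoff are hypothetical names chosen for illustration. Real assemblers also deal with sequencing errors, repeats and reverse-complement reads.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if the longest match is shorter than min_len.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two sequences with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best = (0, None, None)  # (overlap length, index of a, index of b)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            # No overlaps left: whatever remains are the final contigs
            return contigs
        merged = contigs[i] + contigs[j][n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]


reads = ["ATGCGTAC", "GTACGTTA", "CGTTAGC", "TTTTCCCC"]
contigs = greedy_assemble(reads)
```

Here the first three reads overlap and collapse into one contig, while the last read overlaps nothing and stays as a separate contig with a gap in between.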
every time a genome is sequenced you end up with…
gaps between contigs
what is a contig?
not a physical object…
represents a stretch of genome sequence generated from aligned reads on a computer
we don’t know what order the contigs are in
where can you find sequence gaps?
- missing sequence is present in a clone in the genomic library but has not been sequenced yet (sequencing gap)
- missing sequence is not present in genomic library (physical gap)
How can you fix sequencing gap?
find the clone in the genomic library that spans the gap, design a primer from the known sequence at the end of the contig and sequence from that primer into the gap
How can you fix physical gap?
go back to genomic DNA and design PCR primers pointing into the gaps
perform PCR for each possible pair of contig ends, in every possible direction, and see which pair of primers gives a product
if a product is made, the two contigs must be next to each other on the genome, and sequencing the product fills the gap
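
A small Python sketch of the "every possible pair" step, to show how fast the number of PCR reactions grows. It assumes one outward-pointing primer at each end of each contig; `primer_pairs` and the contig names are hypothetical.

```python
from itertools import combinations


def primer_pairs(contigs: list[str]) -> list[tuple[tuple[str, str], tuple[str, str]]]:
    """List every pair of contig-end primers that could span a gap.

    Each contig contributes two primers (left end and right end).
    Pairs from the same contig are skipped because they cannot bridge a gap
    between two different contigs.
    """
    ends = [(name, side) for name in contigs for side in ("left", "right")]
    return [(a, b) for a, b in combinations(ends, 2) if a[0] != b[0]]


pairs = primer_pairs(["contig1", "contig2", "contig3"])
print(len(pairs))  # number of PCR reactions to set up
```

Under this assumption, n contigs need 2n(n-1) reactions, so 3 contigs need 12 reactions and 4 contigs need 24.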
What is Illumina sequencing?
- No bacterial cloning required
- DNA amplified on solid surface by PCR-like process
- Billions of simultaneous parallel reactions possible
- Analysis by electrophoresis not needed
- Automated, real-time detection
disadvantage: shorter read lengths
How is Illumina sequencing better than Sanger?
billions of simultaneous parallel reactions possible (Sanger method: hundreds)
sequencing is cheaper and faster
What are the Illumina sequencing steps?
- chop genomic DNA into a lot of small pieces, attach a primer/linker sequence to the ends and bind the pieces to a glass surface
- with every round, one colour-labelled nucleotide is added
- if the added fluorescent nucleotide pairs correctly with the DNA, shining a laser on the glass makes the corresponding colour appear
- the order of the colours shows you what the DNA sequence looks like
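
A toy Python sketch of the colour-per-round idea for a single fragment. The colour assigned to each base is made up for illustration, and the template is written in the order the polymerase reads it; `sequence_by_synthesis` and `call_bases` are hypothetical names, not part of any Illumina software.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Hypothetical dye colours, chosen only for this example
COLOUR = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}
BASE_FOR_COLOUR = {colour: base for base, colour in COLOUR.items()}


def sequence_by_synthesis(template: str, cycles: int) -> list[str]:
    """Return the colour seen in each round for one template fragment."""
    colours = []
    for base in template[:cycles]:
        incorporated = COMPLEMENT[base]  # only the pairing nucleotide is added
        colours.append(COLOUR[incorporated])
    return colours


def call_bases(colours: list[str]) -> str:
    """Turn the recorded colours back into the newly synthesised sequence."""
    return "".join(BASE_FOR_COLOUR[c] for c in colours)


colours = sequence_by_synthesis("TACG", cycles=4)
read = call_bases(colours)
```

The read is the strand that was synthesised, so it is complementary to the template fragment bound to the glass.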
How do you figure out what the DNA encodes?
find protein coding genes:
design a computer program that will look for a start codon, amino acid encoding codons and a stop codon (an open reading frame, ORF)
only designate an ORF as a protein coding gene if it is at least ~50 AA long
then look for RBS, promoter and terminator regions
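
A minimal Python sketch of such a program for one strand. It assumes ATG as the only start codon and the standard stop codons TAA, TAG and TGA; `find_orfs` and `min_aa` are hypothetical names. It does not look for RBS, promoter or terminator regions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_aa: int = 50) -> list[tuple[int, int]]:
    """Find ORFs in the three reading frames of the given strand.

    Returns (start, end) positions, where start is the index of the A in ATG
    and end is the index just past the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                n_aa = (i - start) // 3  # codons before the stop, including ATG
                if n_aa >= min_aa:
                    orfs.append((start, i + 3))
                start = None  # look for the next ORF in this frame
    return orfs


# Tiny example: ATG + 4 alanine codons + stop, with a low cutoff for the demo
demo = "CC" + "ATG" + "GCT" * 4 + "TAA" + "GG"
print(find_orfs(demo, min_aa=5))
```

With the default cutoff of 50 AA the short demo ORF would be ignored, which is the point of the length filter: short ORFs turn up by chance.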
why are there 6 reading frames in the computer system?
DNA is double stranded, and on each strand you can start reading at the 1st, 2nd or 3rd base: 2 strands x 3 starting positions = 6 reading frames
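
A short Python sketch that splits a sequence into its six reading frames, assuming an unambiguous A/C/G/T sequence. The frame labels (+1 to +3, -1 to -3) and the function names are a naming choice made for this example.

```python
def reverse_complement(seq: str) -> str:
    """Return the opposite strand, written 5' to 3'."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def six_frames(seq: str) -> dict[str, list[str]]:
    """Split both strands into codons starting at the 1st, 2nd and 3rd base."""
    frames = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            codons = [s[i:i + 3] for i in range(offset, len(s) - 2, 3)]
            frames[f"{strand}{offset + 1}"] = codons
    return frames


for name, codons in six_frames("ATGAAACCC").items():
    print(name, codons)
```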
What is codon bias?
What is it used for?
different organisms prefer different codons to encode an AA when that AA has multiple codons
this can be used by looking for the organism's preferred codons in the computer system to deduce whether a sequence is a protein coding gene
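
A small Python sketch of how codon preference could be measured, using leucine (six codons in the standard genetic code) as the example. `synonymous_usage` is a hypothetical helper and the input sequence is made up; a real analysis would compare a candidate ORF against usage measured over many known genes of the organism.

```python
from collections import Counter

LEUCINE_CODONS = ("CTT", "CTC", "CTA", "CTG", "TTA", "TTG")


def synonymous_usage(cds: str, codons: tuple[str, ...] = LEUCINE_CODONS) -> dict[str, float]:
    """Fraction of times each synonymous codon is used in a coding sequence.

    The sequence is read in frame from its first base.
    """
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    total = sum(counts[c] for c in codons)
    if total == 0:
        return {c: 0.0 for c in codons}  # amino acid not present
    return {c: counts[c] / total for c in codons}


# Made-up coding sequence: ATG, three CTG, one TTA, stop
usage = synonymous_usage("ATGCTGCTGCTGTTATAA")
print(usage)
```

In this toy sequence CTG makes up three of the four leucine codons, so it would count as the preferred codon.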