23: Sequencing genome Flashcards

1
Q

What are the steps of genome sequencing?

A
  1. Create genomic DNA library
  2. Generate many independent sequencing reads
  3. Align independent reads into contiguous sequences (“contigs”)
  4. Fill gaps between contigs
  5. Annotate genome
2
Q

What is shotgun sequencing?

A

the genome is broken into random fragments that are all sequenced in parallel; the reads are then assembled by computer into the correct order

3
Q

How do you create a genomic DNA library?

A
  1. extract genomic DNA from Haemophilus influenzae cells
  2. sonicate: ultrasound shears the DNA into many small fragments
  3. separate these fragments by gel electrophoresis
  4. cut fragments of a specific size range out of the gel; together they should cover the entire genome
  5. treat the fragments with exonuclease to make the ends blunt, then ligate them into a vector
  6. transform the vectors into E. coli

result: many E. coli clones, each containing a different DNA fragment -> ready for Sanger sequencing

4
Q

how can you sequence the inserts?

A

from one end (1x the bases) or from both ends (2x the bases)

the sequence reads are fed into a computer, which assembles the genome by finding reads that overlap

each group of overlapping, aligned reads is called a contig

fewer contigs = better, because the reads cover more of the genome and the assembly is more complete
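
To make the overlap idea concrete, here is a minimal sketch of a greedy overlap assembler. It is an illustration only, not what real assemblers do: the function names, the toy reads and the `min_len` cutoff are all assumptions, and it ignores sequencing errors, repeats and reverse-complement reads.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if the best overlap is shorter than `min_len`.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two sequences with the largest overlap.

    Whatever cannot be merged any further is returned as separate contigs.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:  # no remaining overlaps: these are the final contigs
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)


# Hypothetical toy reads: the first three overlap, the last one does not.
reads = ["ATGGCGTA", "GCGTACGT", "TACGTTAGC", "GGGCCCAAA"]
print(greedy_assemble(reads))
```

With these toy reads the three overlapping reads collapse into one contig and the unrelated read stays on its own, so the result is two contigs with a gap of unknown size between them.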

5
Q

every time a genome is sequenced you end up with…

A

gaps between contigs

6
Q

what is a contig?

A

not a physical object…

represents a stretch of genome sequence generated by aligning reads on a computer

at this stage we don’t know what order the contigs are in

7
Q

where can you find sequence gaps?

A
  1. missing sequence is present in a clone of the genomic library but was not covered by the reads (sequencing gap)
  2. missing sequence is not present in the genomic library at all (physical gap)
8
Q

How can you fix sequencing gap?

A

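go back to the genomic library, find the clone that spans the gap, design a primer from the known sequence at the end of the contig, and sequence onward from that primer to add the missing nucleotides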

9
Q

How can you fix physical gap?

A

go back to the genomic DNA and design PCR primers at the contig ends, pointing into the gaps. perform PCR for every possible pair of contig ends in every possible direction and see which primer pairs give a product. a product means those two contigs must be next to each other on the genome, and sequencing the product fills the gap
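
A small sketch of why this gets laborious: it enumerates every pair of contig ends that would need a PCR reaction. The function name and the contig labels are hypothetical, and it assumes one outward-pointing primer per contig end.

```python
from itertools import combinations


def candidate_primer_pairs(contig_names: list[str]) -> list[tuple]:
    """List every pair of contig ends that could flank a gap.

    Each contig contributes two ends ("left" and "right"), with one
    outward-pointing primer per end. Two ends of the same contig are
    skipped because they cannot flank a gap between different contigs.
    """
    ends = [(name, side) for name in contig_names for side in ("left", "right")]
    return [(a, b) for a, b in combinations(ends, 2) if a[0] != b[0]]


pairs = candidate_primer_pairs(["contig1", "contig2", "contig3"])
print(len(pairs))  # number of PCR reactions to try for 3 contigs
for a, b in pairs[:3]:
    print(a, "+", b)
```

For n contigs this gives 2n(n-1) reactions, so the number of PCRs grows quickly as the contig count goes up.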

10
Q

What is Illumina sequencing?

A
  • No bacterial cloning required
  • DNA amplified on solid surface by PCR-like process
  • Billions of simultaneous parallel reactions possible
  • Analysis by electrophoresis not needed
  • Automated, real-time detection

disadvantage: shorter read lengths

11
Q

How is Illumina sequencing better than Sanger?

A

billions of simultaneous parallel reactions possible
(Sanger method: hundreds)

sequencing is cheaper and faster

12
Q

What are the Illumina sequencing steps?

A
  1. chop genomic DNA into a lot of small pieces, attach a primer/linker sequence to the ends, and bind the pieces to a glass surface
  2. with every round, one colour-labelled nucleotide is added
  3. if the added fluorescent nucleotide pairs correctly with the DNA, shining a laser on the glass makes the corresponding colour appear
  4. reading the colours round by round shows what the DNA sequence is
13
Q

How do you figure out what the DNA is encoding for?

A

find protein coding genes:
design a computer program that looks for a start codon, followed by amino acid encoding codons, followed by a stop codon (an open reading frame, ORF)

designate an ORF as a protein coding gene only if it is at least ~50 AA long

then look for the ribosome binding site (RBS), promoter and terminator regions
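
The ORF search described above can be sketched in a few lines. This is a simplified illustration under stated assumptions: the function name is made up, only ATG is treated as a start codon, and only the strand passed in is scanned, so the other strand would need a second call on the reverse complement.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_aa: int = 50) -> list[tuple[int, int]]:
    """Return (start, end) positions of ORFs on the given strand.

    `start` is the index of the A of ATG, `end` is one past the stop codon.
    All three reading frames of this strand are scanned.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                n_aa = (i - start) // 3  # codons before the stop, including ATG
                if n_aa >= min_aa:
                    orfs.append((start, i + 3))
                start = None  # look for the next ORF in this frame
    return orfs


# Hypothetical toy sequence: ATG + 60 alanine codons + stop, with flanking bases.
toy = "CC" + "ATG" + "GCT" * 60 + "TAA" + "GG"
print(find_orfs(toy))
```

The ~50 AA cutoff exists because short start-to-stop stretches turn up by chance all over a genome; lowering `min_aa` finds more candidates but also more false positives.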

14
Q

why are there 6 reading frames in the computer search?

A

DNA is double stranded, and on each strand a reading frame can start at the 1st, 2nd, or 3rd base: 2 strands x 3 frames = 6
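
As a quick illustration of the six frames, this sketch splits a sequence into codons for each strand and offset. The function names and the frame labels ("+1" to "-3") are assumptions chosen for the example.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the other strand, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def six_frames(seq: str) -> dict[str, list[str]]:
    """Split a sequence into codons for all six reading frames."""
    frames = {}
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for offset in range(3):  # start at the 1st, 2nd or 3rd base
            codons = [s[i:i + 3] for i in range(offset, len(s) - 2, 3)]
            frames[f"{strand}{offset + 1}"] = codons
    return frames


for name, codons in six_frames("ATGAAACCC").items():
    print(name, codons)
```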

15
Q

What is codon bias?
What is it used for?

A

when an AA can be encoded by multiple codons, different organisms prefer different codons for it

this can be used in the computer analysis: if a candidate sequence uses the organism's preferred codons, it is more likely to be a protein coding gene
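
One way to see codon bias is to count how often each synonymous codon is used for a single amino acid. The sketch below does this for leucine, which has six codons. The function name and the toy coding sequence are hypothetical; a real analysis would use many known genes from the organism.

```python
from collections import Counter

# The six codons that encode leucine in the standard genetic code.
LEUCINE_CODONS = ("TTA", "TTG", "CTT", "CTC", "CTA", "CTG")


def synonymous_usage(cds: str, synonyms: tuple[str, ...]) -> dict[str, float]:
    """Fraction of each synonymous codon among all uses of that amino acid.

    `cds` is read in frame from its first base.
    """
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    counts = Counter(c for c in codons if c in synonyms)
    total = sum(counts.values())
    if total == 0:
        return {codon: 0.0 for codon in synonyms}
    return {codon: counts[codon] / total for codon in synonyms}


# Hypothetical toy coding sequence: Met, Leu, Leu, Leu, Leu, stop.
usage = synonymous_usage("ATGCTGCTGCTGTTATAA", LEUCINE_CODONS)
print(usage)
```

A candidate ORF whose usage profile matches the organism's known genes is a stronger gene candidate than one with a random-looking profile.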

16
Q

Once the computer analysis has identified a protein coding sequence, what do you do next? How do you do this?

A

go back to the databases and look for other organisms that already have a similar protein coding sequence (homology), then use those matches to figure out what the gene codes for and what the protein’s function is

this homology comparison is done through BLAST

17
Q

What is BLAST?

A

Basic Local Alignment Search Tool

runs the collected gene sequence against all known sequences in the database

can input a protein AA sequence or a nucleotide sequence

can also do a multiple sequence alignment to find the regions of a protein that are shared by all organisms (e.g. the FOXP2 gene): highly conserved = important = a mutation there is likely pathogenic
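
The "highly conserved = important" idea can be sketched by scanning the columns of an alignment that has already been made. This does not perform the alignment and is not how BLAST works internally. The function name, the threshold parameter and the toy sequences are assumptions (they are not real FOXP2 data).

```python
from collections import Counter


def conserved_columns(aligned: list[str], threshold: float = 1.0) -> list[int]:
    """Return the column indices where one residue dominates.

    `aligned` holds equal-length sequences from a multiple sequence
    alignment, with "-" marking gaps. A column counts as conserved when
    its most common residue is not a gap and reaches `threshold` of rows.
    """
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("aligned sequences must all have the same length")
    conserved = []
    for col in range(length):
        residues = [s[col] for s in aligned]
        residue, count = Counter(residues).most_common(1)[0]
        if residue != "-" and count / len(aligned) >= threshold:
            conserved.append(col)
    return conserved


# Hypothetical toy alignment of one short protein region from three organisms.
alignment = ["MKTLA", "MKSLA", "MRTL-"]
print(conserved_columns(alignment))
```

Positions that stay identical across distant organisms are the ones where a mutation is most likely to matter.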

18
Q

What can you do after BLAST?

A

after determining the function of the gene, you can verify that it is expressed by checking for its mRNA with a northern blot

then check whether the protein is produced and present in the cell by western blotting or mass spectrometry