topic 23 Flashcards

1
Q

when creating a DNA library for genome sequencing, why would it not be suitable to digest the genome to completion with a restriction enzyme?

A

if the genome were digested to completion, every copy of the genome would be cut at exactly the same sites, so the fragments in the genomic library would not overlap with each other. without overlaps, there would be no way to assemble the sequenced fragments into a complete genome.
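a minimal sketch of this idea, under stated assumptions: the genome string, the `GAATTC` site, and the helper names `digest` and `library_has_overlaps` are made up for illustration, and the cut is placed at the first base of the site rather than at the enzyme's real cut position. it compares a complete digest with a partial digest (each site cut with probability 0.5) across many copies of the same genome.

```python
import random

def digest(genome, site, cut_probability, rng):
    """Return (start, end) fragments after cutting `genome` at occurrences of `site`.

    Simplification: the cut is placed at the first base of the site, and each
    site is cut independently with probability `cut_probability`.
    """
    cuts = [0]
    pos = genome.find(site)
    while pos != -1:
        if rng.random() < cut_probability:
            cuts.append(pos)
        pos = genome.find(site, pos + 1)
    cuts.append(len(genome))
    return [(a, b) for a, b in zip(cuts, cuts[1:]) if b > a]

def library_has_overlaps(genome, site, cut_probability, copies, rng):
    """Digest many copies of the genome and report whether any two fragments overlap."""
    fragments = set()
    for _ in range(copies):
        fragments.update(digest(genome, site, cut_probability, rng))
    ordered = sorted(fragments)
    # After sorting by start, any overlap shows up between neighbouring fragments.
    return any(nxt[0] < cur[1] for cur, nxt in zip(ordered, ordered[1:]))

genome = "AAGAATTCGGGAATTCTTTGAATTCCC"  # made-up sequence with three GAATTC sites
rng = random.Random(1)
complete = library_has_overlaps(genome, "GAATTC", 1.0, copies=50, rng=rng)
partial = library_has_overlaps(genome, "GAATTC", 0.5, copies=50, rng=rng)
print("complete digest overlaps:", complete)
print("partial digest overlaps:", partial)
```

with a complete digest every copy yields the same end-to-end fragments, so nothing overlaps; with a partial digest, different copies are cut at different subsets of sites, which produces the overlapping fragments that assembly depends on.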

2
Q

what are the steps in genome sequencing? (5)

A
  1. create a genomic DNA library (not necessary anymore)
  2. generate many independent sequencing reads
  3. align independent reads into contiguous sequences (contigs)
  4. fill gaps between contigs
  5. annotate genomes
3
Q

what are the strategies to sequence a genome? (2)

A

consecutively

random fragments simultaneously

4
Q

describe the method in which you would sequence a genome consecutively. its advantage? disadvantage?

A

you would get the sequence of one section of the genome. if we sequence a 500 bp section of the genome, we could then use this known sequence to design a sequencing primer that would extend from the known fragment into the neighbouring unknown region. this would allow us to sequence the next 500 bp stretch of the genome. this process could be repeated for the rest of the genome.

the advantage of this method is that at each step along the way, you know where the sequenced fragment fits in the genome.

however, this process is very time consuming because each round of sequencing depends on the result of the previous one.

5
Q

describe the method in which you would sequence a genome in random fragments. its advantage? disadvantage?

A

the fastest strategy is to sequence random fragments of the genome, all at the same time (advantage: speed).

the disadvantage is that, because the location of each sequenced fragment is random, you need to generate a lot of reads to have a good chance of covering any specific spot in the genome. you also have to figure out how the pieces fit together.
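how many reads is "a lot"? a rough estimate, assuming reads land uniformly at random on the genome (a Poisson approximation that is not part of these notes): with coverage c = (number of reads × read length) / genome size, the chance that a given base is missed is about e^(-c). the genome size below is an assumed example value, and the function names are made up for this sketch.

```python
import math

def expected_fraction_covered(genome_size, read_length, n_reads):
    """Expected fraction of the genome hit by at least one read (Poisson approximation)."""
    coverage = n_reads * read_length / genome_size
    return 1 - math.exp(-coverage)

def reads_needed(genome_size, read_length, target_fraction):
    """Smallest number of reads whose expected covered fraction reaches the target."""
    coverage = -math.log(1 - target_fraction)
    return math.ceil(coverage * genome_size / read_length)

GENOME_SIZE = 4_600_000  # assumed example: a bacterial-sized genome, in bp
READ_LENGTH = 500        # read length used in these notes, in bp

for target in (0.90, 0.99, 0.999):
    n = reads_needed(GENOME_SIZE, READ_LENGTH, target)
    print(f"{target:.1%} of bases covered needs about {n} reads")
```

each extra "9" in the target adds roughly the same number of reads again, which is why random sequencing oversamples the genome several times over.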

6
Q

the genomic library should be large enough for what?

A

the genomic library should be large enough that all sequences from the genome appear at least once (and preferably multiple times)

7
Q

describe the steps in the process of creating a genomic library (6)

A
  1. to create a genomic library (which contains random fragments of the genome), you need to grow a culture of the source cells and isolate the genomic DNA from the cells.
  2. then, generate smaller fragments of the genome. this can be done by partially digesting with a restriction enzyme that will give fragments of the appropriate size (up to here can now be forgone thanks to technology). the DNA fragments could be run on an agarose gel, and fragments within a certain size range could then be extracted.
  3. these fragments would then be ligated into a vector to create a collection (genomic library) of circular plasmids, each plasmid containing a different piece of genomic DNA. this library could be transformed into an appropriate host, such as E. coli, resulting in approximately 20,000 clones, each with a different genomic insert.
  4. the plasmid DNA is then sequenced by the Sanger sequencing method. parts of the fragments can be sequenced to create a large number of independent sequencing reads, each representing about 500 bases of the genome.
  5. the computer aligns the sequencing reads into longer stretches of DNA called contigs. note: a contig isn't a physical object, it's a data file.
  6. without a reference genome, one wouldn’t know the order of contigs because many combinations are possible. the gaps between contigs can be a few bases or many thousands. these contigs now have to be linked together in a more direct sequencing strategy.
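step 5 can be sketched as a toy greedy assembler: repeatedly merge the two sequences with the longest suffix-to-prefix overlap until no overlap of at least `min_len` bases remains. this is only an illustration, not how real assemblers work in detail; the reads and function names are made up, and sequencing errors, reverse complements, and reads contained inside other reads are ignored.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of `a` equal to a prefix of `b` (0 if shorter than min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads, min_len=4):
    """Merge reads into contigs by always joining the pair with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left to merge: these are the contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)

# Made-up reads: the first three overlap each other, the last two overlap each other,
# and no read spans the region between the two groups.
reads = ["ATGGCGTA", "GCGTACGT", "TACGTTAG", "CTAGGATC", "GGATCCAT"]
for contig in greedy_assemble(reads):
    print(contig)
```

the five reads collapse into two contigs with nothing linking them, which is the situation step 6 describes: the order of the contigs and the size of the gap between them are unknown.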
8
Q

how would the contigs be linked when creating a genomic library? (3) describe these processes

A
  1. if the missing piece (gap) is small and present in the genomic library, primers can be designed to sequence the middle of an insert. (a sequence gap means the missing sequence is present in the genomic library but has not been sequenced yet.)
  2. if the gap is large or not in the genomic library, a 2nd genomic library could be generated using another vector. you find a clone (with the help of an oligonucleotide) containing sequence from both contigs, then you can find the missing sequence as described for a sequence gap above.
  3. another way to bridge a physical gap is to design PCR primers that are complementary to the ends of the contigs and that point into the gap. you then perform PCR with pairs of these primers, using genomic DNA as the template. if the two contigs are adjacent to each other in the genome, then a PCR product will be generated. that PCR product can be sequenced to fill in the physical gap. (a physical gap means the missing sequence isn't present in the genomic library.)

9
Q

describe the Illumina sequencing method. its advantages? disadvantages? when would you use Illumina sequencing?

A

Illumina sequencing is a method that gets around some of the limitations of the Sanger sequencing method. DNA is amplified on a solid surface by PCR-like processes, so less DNA is required. unlike Sanger sequencing, where only hundreds of reactions can be run at once, Illumina sequencing can run billions of simultaneous, parallel reactions. no analysis by electrophoresis is needed, and the process is automated.

the disadvantage is that it produces shorter reads, which can make assembling contigs troublesome.

however, it's cheaper and faster. Illumina sequencing can analyze a large number of DNA strands at the same time, which is why it's said to be massively parallel.

Illumina sequencing is a good choice when you already have a sequenced reference genome and you're looking for mutations in a specific genome.

10
Q

describe the process of Illumina sequencing (2)

A

a mixture of DNA fragments is deposited on a solid surface so the individual fragments are located in random places on the surface.

then, copies of these fragments are made in a process that resembles PCR. you end up with clusters of DNA strands: all of the strands within a cluster are identical, but each cluster is different. because Illumina sequencing can analyze a large number of these clusters at the same time, it's said to be massively parallel.

11
Q

what are examples of vectors that can be used?

A

artificial chromosomes (can hold larger fragments of several hundred kilobases) or bacteriophages (viruses that infect bacteria)

12
Q

why must a genome be annotated?

A

a genome must be annotated to help one understand the information contained in the genome - its functional organization.

13
Q

when performing a genome annotation, what are the areas of interest?

A

open reading frames (ORFs) encoding proteins

genes encoding RNA (tRNA and rRNA)

transcriptional regulatory regions

origins of replication

telomeres

repeat sequences

14
Q

what are the requirements of an ORF that encodes proteins? (6)

A
  1. usually contain at least 100 codons. the longer an ORF extends without a stop codon, the more likely it is that the ORF encodes a protein.
  2. show codon bias. not all codons are used with the same frequency in mRNA. each organism has a codon bias.
  3. are preceded by a promoter and are associated with regulatory elements. for mRNA to be transcribed, there must be a promoter nearby. in general, promoters are more regular and easier to find in bacteria than in eukaryotes, and bacterial ORFs are easier to recognize because prokaryotic genes lack introns.
  4. are related to genes in other organisms. because different organisms are evolutionarily related, a genuine ORF will often have similar genes (ORFs) in other organisms.
  5. produce an mRNA. if an ORF encodes a protein, its mRNA should be detectable. note: the mRNA may only be expressed under certain conditions or only in certain cell types of a multicellular organism.
  6. produce a protein. as with the mRNA, the protein may only be expressed under certain conditions or in certain cell types.
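the first requirement (a long stretch of codons without a stop) is easy to check by computer. a minimal sketch, assuming a simplified ORF definition that is not from these notes: an ORF starts at an ATG and runs in the same reading frame to the first stop codon (TAA, TAG or TGA). only the three forward-strand frames are scanned, and `find_orfs` and the example sequence are made up for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_codons=100):
    """Return (start, end, frame) for forward-strand ORFs with at least `min_codons` codons.

    `start` is the index of the ATG, `end` is one past the stop codon, and the
    codon count includes the ATG but not the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG in this frame opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ATG after this stop
    return orfs

# Made-up example: a 6-codon ORF, so the threshold is lowered for the demo.
sequence = "CC" + "ATG" + "GCT" * 5 + "TAA" + "G"
print(find_orfs(sequence, min_codons=3))
print(find_orfs(sequence))  # default threshold of 100 codons
```

a real annotation pipeline would also scan the reverse strand and weigh the other evidence in the list above (codon bias, promoters, similarity to known genes, mRNA and protein detection).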
15
Q

describe BLAST

A

BLAST (basic local alignment search tool) is software that allows one to compare a candidate gene/protein sequence with the nucleotide/amino acid sequences of known genes/proteins in other organisms.

the program looks for similarities and reports back the best matches it finds. the higher the alignment score, the more closely related the match is.
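to make "score" concrete, here is a small sketch of local alignment scoring. note that this is not BLAST's algorithm: BLAST uses a fast heuristic, while the code below is the exact dynamic-programming method (Smith-Waterman) for the same kind of local alignment. the scoring values (+2 match, -1 mismatch, -2 gap), the sequences, and the organism names are all assumptions made up for this example.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between sequences a and b (Smith-Waterman, linear gaps)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            score = max(0, diagonal, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best

query = "ATGGCG"
database = {  # hypothetical entries, for illustration only
    "organism A": "CCATGGCGCC",
    "organism B": "CCATGACGCC",
    "organism C": "TTTTTTTTTT",
}
ranked = sorted(database, key=lambda name: local_alignment_score(query, database[name]), reverse=True)
for name in ranked:
    print(name, local_alignment_score(query, database[name]))
```

the entry containing an exact copy of the query ranks first, the entry with one changed base ranks below it, and the unrelated entry scores lowest.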
