Chapter 21: Genomics, Bioinformatics, and Proteomics Flashcards
Which program in the Human Genome Project was designed to ensure that personal genetic information would not be used in discriminatory ways?
ELSI
The Ethical, Legal, and Social Implications (ELSI) program was put into place to study social implications arising from the Human Genome Project.
Which of the following is a characteristic of the human genome?
Larger and more intron-rich genes than in genomes of invertebrates
Both gene size and intron content tend to increase with the complexity of the organism.
The Human Genome Project, which got under way in 1990, is an international effort to ________.
construct a physical map of the billions of base pairs in the human genome
Compared with eukaryotic chromosomes, bacterial chromosomes are ________.
small, with high gene density
The analysis of proteins and enzymatic pathways in cells is known as ________.
metabolomics
This is the analysis of proteins and enzymatic pathways involved in cell metabolism.
Compared with prokaryotic chromosomes, eukaryotic chromosomes are ________.
large, linear, less densely packed with protein-coding genes, mainly organized in single gene units with introns
Most of the bacterial genomes described in the text have fewer than ________.
10,000 genes
The study of genomic data collected from environmental samples is called ________.
metagenomics
In metagenomics, genomes from entire communities of microorganisms are sequenced. These samples are collected from the environment, for example from air, water, and soil.
The human genome contains approximately 20,000 protein-coding genes, yet it has the capacity to produce several hundred thousand gene products. What can account for the vast difference in gene number and product number?
Alternative splicing occurs.
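A small combinatorial sketch shows why alternative splicing multiplies products: if a gene has n "cassette" exons that can each be independently included or skipped, up to 2^n distinct mRNAs can come from that one gene. The exon names (E1 to E5) and the function are hypothetical, and the sketch assumes every combination is actually made, whereas real genes use only a subset of possible isoforms.

```python
from itertools import product

def cassette_isoforms(first_exon, cassette_exons, last_exon):
    """Enumerate mRNAs when each cassette exon is independently included or skipped.

    first_exon and last_exon are treated as constitutive (always present).
    """
    isoforms = []
    for choices in product([True, False], repeat=len(cassette_exons)):
        kept = [exon for exon, keep in zip(cassette_exons, choices) if keep]
        isoforms.append("-".join([first_exon, *kept, last_exon]))
    return isoforms

# Hypothetical gene: constitutive E1 and E5, optional E2, E3, E4.
isoforms = cassette_isoforms("E1", ["E2", "E3", "E4"], "E5")
print(len(isoforms))  # 8, i.e. 2**3
```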
Proteomics is the ________.
process of defining the complete set of proteins encoded by a genome
One major difference between prokaryotic and eukaryotic genes is that eukaryotic genes can contain internal sequences, called ________, that get removed in the mature message.
introns
How does shotgun cloning differ from the clone-by-clone method?
No genetic or physical maps of the genome are needed to begin shotgun cloning.
Shotgun cloning randomly sequences clones with no prior knowledge of their location in the genome.
Understanding why the chromosome is broken into fragments
Sequencing machines cannot analyze sequences that are more than about 800–1,000 bases long. Therefore, the chromosome must be broken into fragments before any sequencing can take place.
DNA cloning using plasmids.
A typical DNA sequencing reaction requires about 1 microgram of DNA, so the amplification of DNA through cloning is a crucial step in shotgun sequencing. One type of DNA cloning involves plasmids. A plasmid is a small, circular DNA molecule found in bacteria in addition to the bacterial chromosome. Each time a bacterium reproduces, it replicates each of its plasmids.
To clone DNA using plasmids, molecular biologists insert DNA fragments into plasmids and then introduce the plasmids into bacteria. Because bacteria reproduce so rapidly, they can make more than a million copies of a DNA fragment in less than 24 hours.
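The "million copies in under 24 hours" figure follows from exponential growth. The sketch below assumes a 20-minute doubling time (typical of E. coli under ideal laboratory conditions) and one plasmid copy per cell; both are simplifying assumptions, and because many plasmids are present in several copies per cell, the real yield is usually higher. The function names are hypothetical.

```python
import math

def copies_after(hours, doubling_minutes=20.0, start=1):
    """Copies of a cloned fragment after `hours` of doubling (one plasmid per cell assumed)."""
    generations = hours * 60 / doubling_minutes
    return start * 2 ** generations

def hours_to_reach(target, doubling_minutes=20.0, start=1):
    """Hours of growth needed to reach at least `target` copies (whole generations only)."""
    generations = math.ceil(math.log2(target / start))
    return generations * doubling_minutes / 60

print(hours_to_reach(1_000_000))  # about 6.7 hours (20 doublings)
print(copies_after(24) > 1e6)     # True
```

Twenty doublings already exceed a million copies (2^20 = 1,048,576), which is why an overnight culture is more than enough.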
Why is overlap between the fragment sequences important?
Overlap enables the computer to match up the fragments and determine how they fit together.
Steps in shotgun sequencing:
What are the steps in the shotgun approach to whole-genome sequencing?
1) Multiple copies of the same chromosome are prepared.
2) The chromosome copies are broken into 1-kb fragments.
3) The 1-kb fragments are cloned into plasmids.
4) The plasmids are sequenced.
5) A computer combines the fragment sequences.
Transcribing the 1-kb fragments into RNA is not one of the steps: RNA plays no role in Sanger-based whole-genome shotgun sequencing, which works directly on cloned DNA.
In shotgun sequencing, the DNA from many copies of an entire chromosome is cut into fragments.
The fragments are inserted into plasmids and cloned in bacteria. Plasmid DNA is isolated from the bacteria, purified, and sequenced. Finally, a computer assembles the fragment sequences into the continuous sequence of the whole chromosome, based on overlap between the fragments.
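Why cut many copies rather than one? Each copy breaks at different positions, so fragments from one copy span the breakpoints of another, and those spans are the overlaps the computer later uses. The sketch below imitates this with random cut positions on the 24-base sequence from the exercise that follows; the fragment sizes are scaled down from ~1 kb to about 8 bases so the output stays readable, and `shear`, `mean_len`, and `seed` are hypothetical names, not part of any real pipeline.

```python
import random

def shear(sequence, copies=3, mean_len=8, seed=0):
    """Randomly fragment several copies of the same sequence.

    Returns one list of fragments per copy. Within a copy the fragments are
    contiguous and non-overlapping; across copies the cut points differ.
    """
    rng = random.Random(seed)
    per_copy = []
    for _ in range(copies):
        fragments = []
        pos = 0
        while pos < len(sequence):
            # fragment length varies around mean_len, but is at least 1 base
            length = max(1, round(rng.gauss(mean_len, mean_len / 4)))
            fragments.append(sequence[pos:pos + length])
            pos += length
        per_copy.append(fragments)
    return per_copy

for copy_number, fragments in enumerate(shear("GATGACATGGCGTCAGTCGATGCG"), start=1):
    print(copy_number, fragments)
```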
Assembling a complete sequence from fragment sequences
In the last step of shotgun sequencing, a computer analyzes a large number of fragment sequences to determine the DNA sequence of a whole chromosome. Given the following fragment sequences, what is the overall DNA sequence?
Sequences of DNA fragments: GATGAC, CGATGCG, GGCGTCAG, GACATGGC, TCAGTCGA
The five fragment sequences can be arranged to form the complete sequence:
Fragment GATGAC
Fragment GACATGGC
Fragment GGCGTCAG
Fragment TCAGTCGA
Fragment CGATGCG
Complete sequence GATGACATGGCGTCAGTCGATGCG
In shotgun sequencing, a computer program takes millions of bases into consideration when determining the sequence of an entire chromosome. The program arranges the fragment sequences so there is a maximum amount of overlap.
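The "maximum overlap" idea can be sketched as a greedy assembler: repeatedly find the pair of fragments whose suffix/prefix overlap is longest, merge them, and stop when no overlaps remain. This is a simplified stand-in, not what production assemblers do (they use overlap or de Bruijn graphs), and a greedy choice can mis-assemble repeated sequences. The minimum overlap of 3 bases (`min_len`) is an assumption chosen for this toy example.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if < min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(fragments, min_len=3):
    """Merge the best-overlapping pair until one contig remains or no overlaps are left."""
    frags = list(fragments)
    while len(frags) > 1:
        best = (0, None, None)
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        if k == 0:
            break  # no overlaps left: return separate contigs
        merged = frags[i] + frags[j][k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (i, j)] + [merged]
    return frags

fragments = ["GATGAC", "CGATGCG", "GGCGTCAG", "GACATGGC", "TCAGTCGA"]
print(greedy_assemble(fragments))  # ['GATGACATGGCGTCAGTCGATGCG']
```

The first merge is GGCGTCAG + TCAGTCGA, because their 4-base overlap (TCAG) is the longest; the remaining joins use 3-base overlaps and reproduce the complete sequence given above.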
The dog (Canis familiaris) genome has recently been sequenced. About what percentage of the dog’s genes are shared with humans?
75%
A number of generalizations can be made about the organization of protein-coding genes in bacterial chromosomes. First, the gene density is very high, averaging about ________ gene per _____ base pairs of DNA.
A number of generalizations can be made about the organization of protein-coding genes in bacterial chromosomes. First, the gene density is very high, averaging about 1 gene per 1,000 base pairs of DNA.
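The density rule turns directly into a back-of-the-envelope gene count. The sketch assumes exactly 1 gene per 1,000 bp, so it is only as accurate as that average; the 4.6-Mb input is roughly the size of the E. coli K-12 chromosome, and the function names are hypothetical.

```python
def estimate_gene_count(genome_bp, bp_per_gene=1_000):
    """Rough protein-coding gene count from the ~1 gene per 1,000 bp bacterial average."""
    return round(genome_bp / bp_per_gene)

def genome_size_for(gene_count, bp_per_gene=1_000):
    """Inverse estimate: genome length (bp) implied by a gene count at the same density."""
    return gene_count * bp_per_gene

print(estimate_gene_count(4_600_000))  # 4600
print(genome_size_for(10_000))         # 10000000
```

Read the other way, a bacterium with fewer than 10,000 genes would be expected to have a chromosome of under about 10 Mb at this density.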
Assembling a contig from short reads
In whole-genome shotgun sequencing, computers are used to assemble short DNA sequences (short reads) into an overlapping, contiguous sequence (contig). In this part of the tutorial, you will manually assemble a contig from seven short reads.
You have just done on a short DNA segment what computers do on an entire genome in whole-genome shotgun sequencing: build a contig by finding overlaps between short reads. The sequence of the contig is:
AAGACCCGCCGGGAGGCAGAGGACCTGCAGGGTGAGCCAACCGCCCATTGCT
In whole-genome shotgun sequencing, the genomic DNA must be prepared so that there are overlaps between the short reads. This can be done either by partial restriction digestion or by randomized DNA shearing. If there are highly repetitive sequences or gaps in the contigs, map-based sequencing can be used to fill in the gaps and determine the correct number of copies of the repeated sequences.
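Why do gaps remain even when far more sequence is read than the genome contains? A standard way to estimate this is the Lander-Waterman (Poisson) model: with N reads of length L on a genome of length G, average coverage is c = NL/G, and the expected fraction of bases covered by no read is e^-c. The sketch assumes reads start uniformly at random and ignores repeats, so it understates the real difficulty in repetitive regions; the 800-base read length and the read count are illustrative choices, and the function names are hypothetical.

```python
import math

def coverage(num_reads, read_len, genome_len):
    """Average sequencing depth c = N * L / G."""
    return num_reads * read_len / genome_len

def expected_uncovered_fraction(c):
    """Poisson (Lander-Waterman) estimate of the fraction of bases hit by no read: e^-c."""
    return math.exp(-c)

genome_len = 3_000_000_000  # roughly the size of the human genome
read_len = 800              # low end of the 800-1,000-base read lengths
c = coverage(30_000_000, read_len, genome_len)
missed = expected_uncovered_fraction(c) * genome_len
print(c, round(missed))     # 8-fold coverage still leaves on the order of a million bases unread
```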
In which section of the search results can you find nucleotide-by-nucleotide comparisons between your query sequence and similar database sequences?
Alignments
The three sections of the search results page provide different information about the hit sequences from the database.
Understand the statistical significance of your hit sequences
One of the key features of BLAST is that it permits you to make quantitative comparisons of sequence similarities (alignments) between your query sequence and every other sequence in the database. The quantity most frequently reported in a statistical comparison of sequence alignments is the E value (expectation value). The E value is the number of alignments with a score at least as good as that hit's that you would expect to find purely by chance when searching a database of this size; the smaller the E value, the less likely the match is to be a chance occurrence.
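For local alignments, BLAST computes E values from Karlin-Altschul statistics. In the relations below, m is the query length, n is the total length of the database (BLAST actually uses effective lengths corrected for edge effects), S is the raw alignment score, S' is the bit score, and λ and K are constants that depend on the scoring matrix and gap penalties. The form shows why E is an expected count rather than a probability: doubling the database size doubles E. When E is small, the corresponding P value is nearly equal to E.

```latex
E = K\,m\,n\,e^{-\lambda S}
\qquad
S' = \frac{\lambda S - \ln K}{\ln 2}
\qquad
E = m\,n\,2^{-S'}
\qquad
P = 1 - e^{-E}
```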
Scroll to the Descriptions section of your results and examine the E values for your hits.
How do the E values change as you go from the top of the list of hits to the bottom?
The E values get larger.
The most similar sequences to your query sequence have E values of about 2 × 10^-54 (written as 2e-54 in the search results). This means that an alignment this good is essentially never expected to arise by chance. As you scroll down the list of hits, the E values get larger (their negative exponents move closer to zero). These larger E values indicate less statistically significant sequence alignments. In other words, the hits near the top of the list are the most statistically significant, and hits with equal E values are statistically equivalent.
In general, a good alignment has an E value of 1 × 10^-5 or smaller.
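Applying the 1 × 10^-5 rule of thumb to a hit list amounts to a filter plus a sort by ascending E value, which mirrors the top-to-bottom order of the Descriptions section. The `Hit` record, the accession names, and the E values below are hypothetical illustration data, not real BLAST output.

```python
from typing import NamedTuple

class Hit(NamedTuple):
    accession: str  # hypothetical identifier
    evalue: float   # E value reported for the hit

def significant_hits(hits, threshold=1e-5):
    """Keep hits with E <= threshold, most significant (smallest E) first."""
    return sorted((h for h in hits if h.evalue <= threshold), key=lambda h: h.evalue)

hits = [
    Hit("hypothetical_A", 2e-54),
    Hit("hypothetical_B", 3e-12),
    Hit("hypothetical_C", 0.4),
    Hit("hypothetical_D", 1e-5),
]
for h in significant_hits(hits):
    print(h.accession, h.evalue)
```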
Whole-genome shotgun sequencing has largely replaced map-based sequencing as a faster, cheaper method of sequencing full genomes.
Nevertheless, map-based cloning remains useful for areas of highly repetitive DNA because it can organize such DNA into a physical map before sequencing it.
Consequently, modern genome sequencing projects typically employ a combination of shotgun and map-based methods.
Both methods involve fragmenting genomes into overlapping segments and using the areas of overlap to assemble the segments into contiguous sequences, or contigs.
Once the complete sequence is determined, genomes are annotated to identify open reading frames, introns, exons, and other sequences.
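One annotation step, locating open reading frames, can be sketched as a scan of the three forward reading frames for an ATG start codon followed in frame by a stop codon (TAA, TAG, or TGA). This is a simplified illustration under stated assumptions: it checks only the forward strand, takes the first ATG after each stop, and uses a tiny hypothetical minimum length (`min_codons`, counting the stop codon), whereas real annotation also scans the reverse strand and uses much longer cutoffs. Because introns interrupt eukaryotic coding sequences, real gene finding also combines ORF scanning with splice-site models and transcript evidence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=2):
    """Return (start, end) half-open coordinates of forward-strand ORFs (ATG ... stop)."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))  # end includes the stop codon
                start = None
    return sorted(orfs)

print(find_orfs("CCATGAAATTTTAGGG"))  # [(2, 14)], i.e. ATG AAA TTT TAG
```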