Chapter 21: Genomics, Bioinformatics, and Proteomics Flashcards
Which program in the Human Genome Project was designed to ensure that personal genetic information would not be used in discriminatory ways?
ELSI
The Ethical, Legal, and Social Implications (ELSI) program was put into place to study the social implications arising from the Human Genome Project.
Which of the following is a characteristic of the human genome?
Larger and more intron-rich genes than in genomes of invertebrates
Both gene size and intron content tend to increase with complexity of the organism.
The Human Genome Project, which got under way in 1990, is an international effort to ________.
construct a physical map of the billions of base pairs in the human genome
Compared with eukaryotic chromosomes, bacterial chromosomes are ________.
small, with high gene density
The analysis of proteins and enzymatic pathways in cells is known as ________.
metabolomics
This is the analysis of proteins and enzymatic pathways involved in cell metabolism.
Compared with prokaryotic chromosomes, eukaryotic chromosomes are ________.
large, linear, less densely packed with protein-coding genes, mainly organized in single gene units with introns
Most of the bacterial genomes described in the text have fewer than ________.
10,000 genes
The study of genomic data collected from environmental samples is called ________.
metagenomics
In metagenomics, genomes from entire communities of microorganisms are sequenced. These samples are collected from environmental samples of air, water, and earth.
The human genome contains approximately 20,000 protein-coding genes, yet it has the capacity to produce several hundred thousand gene products. What can account for the vast difference in gene number and product number?
Alternative splicing occurs.
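A toy count shows how quickly splicing choices multiply products from a single gene. This is a minimal sketch under a stated assumption: a hypothetical five-exon gene keeps its first and last exons, and each internal exon is independently included or skipped. Real splicing is regulated, so not every combination is actually made, and the exon names are invented for illustration.

```python
from itertools import combinations

def isoforms(exons):
    """Enumerate mRNAs that keep the first and last exon plus any subset of
    the internal exons, preserving exon order (toy model of exon skipping)."""
    first, *internal, last = exons
    result = []
    for k in range(len(internal) + 1):
        for kept in combinations(internal, k):
            result.append([first, *kept, last])
    return result

gene = ["E1", "E2", "E3", "E4", "E5"]  # hypothetical exon names
variants = isoforms(gene)
print(len(variants))  # 2**3 = 8 possible mRNAs from one gene
for v in variants:
    print("-".join(v))
```

With three skippable exons the model already gives eight mRNAs per gene, which is the kind of multiplier that lets about 20,000 genes yield far more products.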
Proteomics is the ________.
process of defining the complete set of proteins encoded by a genome
One major difference between prokaryotic and eukaryotic genes is that eukaryotic genes can contain internal sequences, called ________, that get removed in the mature message.
introns
How does shotgun cloning differ from the clone-by-clone method?
No genetic or physical maps of the genome are needed to begin shotgun cloning.
Shotgun cloning randomly sequences clones with no prior knowledge of their location in the genome.
Understanding why the chromosome is broken into fragments
Sequencing machines cannot analyze sequences that are more than about 800–1,000 bases long. Therefore, the chromosome must be broken into fragments before any sequencing can take place.
DNA cloning using plasmids.
A typical DNA sequencing reaction requires about 1 microgram of DNA, so the amplification of DNA through cloning is a crucial step in shotgun sequencing. One type of DNA cloning involves plasmids. A plasmid is a small, circular DNA molecule found in bacteria in addition to the bacterial chromosome. Each time a bacterium reproduces, it replicates each of its plasmids.
To clone DNA using plasmids, molecular biologists insert DNA fragments into plasmids and then introduce the plasmids into bacteria. Because bacteria reproduce so rapidly, they can make more than a million copies of a DNA fragment in less than 24 hours.
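The "million copies in under 24 hours" figure can be checked with a little arithmetic. This is a rough sketch that assumes idealized exponential growth from a single cell carrying one copy of the fragment; the doubling times are illustrative assumptions, not values from the card.

```python
import math

target_copies = 1_000_000
# Each cell division doubles the number of copies, so we need the smallest
# n with 2**n >= target_copies.
doublings = math.ceil(math.log2(target_copies))  # 20, since 2**20 = 1,048,576

for minutes_per_doubling in (20, 30, 60):  # assumed doubling times
    hours = doublings * minutes_per_doubling / 60
    print(f"{minutes_per_doubling} min per doubling: {hours:.1f} h for {doublings} doublings")
```

Even at the slowest assumed rate of one division per hour, 20 doublings fit inside a day.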
Why is overlap between the fragment sequences important?
Why must the fragment sequences overlap?
Overlap enables the computer to match up the fragments and determine how they fit together.
Steps in shotgun sequencing:
What are the steps in the shotgun approach to whole-genome sequencing?
1) Multiple copies of the same chromosome are prepared.
2) Chromosome copies are broken into 1-kb fragments.
3) The 1-kb fragments are cloned into plasmids.
4) The plasmids are sequenced.
5) A computer combines the fragment sequences.
Transcribing the 1-kb fragments into RNA is not a step in the shotgun approach; the fragments are cloned and sequenced as DNA (for example, by Sanger sequencing).
In shotgun sequencing, the DNA from many copies of an entire chromosome is cut into fragments.
The fragments are inserted into plasmids and cloned in bacteria. Plasmid DNA is isolated from the bacteria, purified, and sequenced. Finally, a computer assembles the fragment sequences into the continuous sequence of the whole chromosome, based on overlap between the fragments.
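The reason many copies are cut, rather than one, is that each copy breaks at different places, so fragments from different copies overlap. The sketch below imitates only that fragmentation step on a toy scale (a 200-base random "chromosome" and pieces of roughly 25 bases instead of 1 kb). The function name, sizes, and seed are assumptions made for illustration.

```python
import random

def fragment(chromosome, mean_len, rng):
    """Cut one chromosome copy at random positions into consecutive pieces."""
    n_cuts = max(1, len(chromosome) // mean_len - 1)
    cuts = sorted(rng.sample(range(1, len(chromosome)), n_cuts))
    bounds = [0, *cuts, len(chromosome)]
    return [chromosome[a:b] for a, b in zip(bounds, bounds[1:])]

rng = random.Random(0)  # fixed seed so the run is repeatable
chromosome = "".join(rng.choice("ACGT") for _ in range(200))  # toy sequence

# Three copies of the same chromosome, each broken independently.
copies = [fragment(chromosome, mean_len=25, rng=rng) for _ in range(3)]

for n, frags in enumerate(copies, start=1):
    print(f"copy {n}: fragment lengths {[len(f) for f in frags]}")
```

Because the cut points are drawn separately for each copy, a fragment from one copy usually spans a break point in another, and that shared stretch is what the assembly step relies on.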
Assembling a complete sequence from fragment sequences
In the last step of shotgun sequencing, a computer analyzes a large number of fragment sequences to determine the DNA sequence of a whole chromosome. Given the following fragment sequences, what is the overall DNA sequence?
Sequences of DNA fragments: GATGAC, CGATGCG, GGCGTCAG, GACATGGC, TCAGTCGA
The five fragment sequences can be arranged to form the complete sequence:
Fragment GATGAC
Fragment GACATGGC
Fragment GGCGTCAG
Fragment TCAGTCGA
Fragment CGATGCG
Complete sequence GATGACATGGCGTCAGTCGATGCG
In shotgun sequencing, a computer program takes millions of bases into consideration when determining the sequence of an entire chromosome. The program arranges the fragment sequences so there is a maximum amount of overlap.
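The five-fragment exercise above can be worked by a small program. This is a minimal sketch of a greedy overlap merge (repeatedly join the pair of sequences with the longest suffix-to-prefix match), assuming error-free fragments all read from the same strand. It is not how production assemblers work, and the function names and the minimum overlap of 3 bases are choices made for this example.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b
    (0 if the match is shorter than min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(fragments, min_len=3):
    """Merge the best-overlapping pair until no pair overlaps any more."""
    frags = list(fragments)
    while len(frags) > 1:
        best = (0, None, None)  # (overlap length, left index, right index)
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # remaining pieces cannot be joined
        merged = frags[i] + frags[j][n:]
        frags = [f for k, f in enumerate(frags) if k not in (i, j)] + [merged]
    return frags

fragments = ["GATGAC", "CGATGCG", "GGCGTCAG", "GACATGGC", "TCAGTCGA"]
contigs = greedy_assemble(fragments)
print(contigs)
```

For these five fragments the merge order is GGCGTCAG + TCAGTCGA first (a 4-base overlap), then three 3-base joins, which gives the same complete sequence as the card. Greedy merging can go wrong on repetitive sequence, which is one reason real genomes need more careful methods.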
The dog (Canis familiaris) genome has recently been sequenced. About what percentage of the dog’s genes are shared with humans?
75%
A number of generalizations can be made about the organization of protein-coding genes in bacterial chromosomes. First, the gene density is very high, averaging about ________ gene per _____ base pairs of DNA.
A number of generalizations can be made about the organization of protein-coding genes in bacterial chromosomes. First, the gene density is very high, averaging about 1 gene per 1,000 base pairs of DNA.
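That density makes a quick back-of-envelope gene estimate possible. The sketch below simply divides genome size by 1,000 bp per gene; the 4.6-million-bp input is an approximate E. coli-sized genome used as an illustration, and the function name is made up for this example.

```python
BP_PER_GENE = 1_000  # density from the card: about 1 gene per 1,000 bp

def estimated_genes(genome_size_bp, bp_per_gene=BP_PER_GENE):
    """Rough gene count implied by an average gene density."""
    return genome_size_bp // bp_per_gene

print(estimated_genes(4_600_000))  # 4600 genes for a 4.6-Mb genome
```

The estimate is only as good as the average: real bacterial genomes vary in density, so treat the result as an order-of-magnitude check.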
Assembling a contig from short reads
In whole-genome shotgun sequencing, computers are used to assemble short DNA sequences (short reads) into an overlapping, contiguous sequence (contig). In this part of the tutorial, you will manually assemble a contig from seven short reads.
You have just done on a short DNA segment what computers do on an entire genome in whole-genome shotgun sequencing – build a contig by finding overlaps between short reads. The sequence of the contig is
AAGACCCGCCGGGAGGCAGAGGACCTGCAGGGTGAGCCAACCGCCCATTGCT
In whole-genome shotgun sequencing, the genomic DNA must be prepared so that there are overlaps between the short reads. This can be done either by partial restriction digestion or randomized DNA shearing. If there are highly repetitive sequences or gaps in the contigs, map-based sequencing can be used to fill in the gaps and determine the correct number of repetitive sequences.
In which section of the search results can you find nucleotide-by-nucleotide comparisons between your query sequence and similar database sequences?
Alignments
The three sections of the search results page provide different information about the hit sequences from the database.
Understand the statistical significance of your hit sequences
One of the key features of BLAST is that it permits you to make quantitative comparisons of sequence similarities (alignments) between your query sequence and every other sequence in the database. The quantity that is most frequently reported in a statistical comparison of sequence alignments is the E value (expectation value). The E value is the number of alignments with a score at least as good as that particular hit's that you would expect to find by chance in a database of that size. For very small E values, this is practically the same as the probability of finding such an alignment by chance.
Scroll to the Descriptions section of your results and examine the E values for your hits.
How do the E values change as you go from the top of the list of hits to the bottom?
The E values get larger.
The most similar sequences to your query sequence have E values of about 2 × 10^-54 (written as 2e-54 in the search results). This means that there is almost no possibility of finding a better sequence alignment by chance. As you scroll down the list of hits, the E values get slightly larger (smaller negative exponents). These slightly larger E values indicate less statistically significant sequence alignments. In other words, the hits near the top of the list are the most statistically significant. Hits with equal E values are statistically equivalent.
In general, a good alignment has an E value of 1 × 10^-5 or smaller.
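The 1e-5 rule of thumb is easy to apply to a results table in code. This is a small sketch with made-up hits (the names and E values are hypothetical, not real BLAST output); it relies only on the fact that strings such as "2e-54" parse directly as floating-point numbers.

```python
THRESHOLD = 1e-5  # rule of thumb from the card

# Hypothetical hits: (description, E value as printed in the results table)
hits = [("hit C", "0.002"), ("hit A", "2e-54"), ("hit D", "4.1"), ("hit B", "3e-20")]

# Convert the printed E values to numbers and sort, smallest E value first,
# which is the order BLAST lists them in.
parsed = sorted(((name, float(e)) for name, e in hits), key=lambda h: h[1])
good = [name for name, e in parsed if e <= THRESHOLD]

for name, e in parsed:
    label = "good" if e <= THRESHOLD else "weak"
    print(f"{name}: E = {e:g} ({label})")
```

Here only the 2e-54 and 3e-20 hits pass the cutoff. The threshold is a convention, so a stricter or looser value may suit a particular search.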
Whole-genome shotgun sequencing has largely replaced map-based sequencing as a faster, cheaper method of sequencing full genomes.
Nevertheless, map-based cloning remains useful for areas of highly repetitive DNA because it can organize such DNA into a physical map before sequencing it.
Consequently, modern genome sequencing projects typically employ a combination of shotgun and map-based methods.
Both methods involve fragmenting genomes into overlapping segments and using the areas of overlap to assemble the segments into contiguous sequences, or contigs.
Once the complete sequence is determined, genomes are annotated to identify open reading frames, introns, exons, and other sequences.
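Finding open reading frames is the simplest part of annotation to illustrate. The sketch below scans the three forward reading frames for a start codon (ATG) followed in frame by a stop codon. It ignores the reverse strand and introns, so it is closer to bacterial gene finding than to real eukaryotic annotation, and the example sequence and the min_codons cutoff are invented for illustration.

```python
STOPS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_codons=3):
    """Return (start, end, frame) for each ATG...stop stretch on the given strand.
    Positions are 0-based; end is exclusive and includes the stop codon."""
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens the ORF
            elif start is not None and codon in STOPS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ORF in this frame
    return orfs

seq = "CCATGGCTTAAGG"  # made-up sequence
for start, end, frame in find_orfs(seq):
    print(frame, start, end, seq[start:end])
```

On this sequence the only ORF is ATGGCTTAA in the third frame (positions 2 to 11). Real annotation pipelines add evidence such as codon usage, splice signals, and similarity to known genes.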
proteomics
The study of the expressed proteins present in a cell at a given time.
comparative genomic hybridization (CGH)
A microarray-based method for the analysis of copy number variations in genomic DNA or in specific cell types, such as tumor cells.
YAC
A cloning vector in the form of a yeast artificial chromosome, constructed using chromosomal components including telomeres (from a ciliate) and centromeres, origins of replication, and marker genes from yeast.
YACs are used to clone long stretches of eukaryotic DNA.
Previous genome analysis
Used model systems
Screen for natural & induced mutants
Map studies
Linkage analysis to map genes
Required at least one mutant per gene in order to find that gene
Studies difficult to perform
Labor intensive
Some mutants lethal, will not find the genes associated with those
Genomics
Move to molecular methods (1980s), away from classical methods
Genomic library clones pieced together
Clones sequenced
Genome Sequencing Methods:
Clone by Clone
In the clone-by-clone method, a genomic library is prepared, and clones are organized into genetic and physical maps by observing the inheritance pattern of genetic markers in heterozygous families.
After the clones are arranged into physical maps, they are broken into smaller, overlapping clones that cover each chromosome.
Each smaller clone is sequenced, and the genomic sequence is assembled by stringing together the nucleotide sequence of the clones.
Genome Sequencing Methods:
Shotgun Method
In the shotgun method, a genomic library is constructed from fragments of genomic DNA.
Clones are selected from the library at random, and sequenced.
The sequence is assembled by looking for sequence overlaps between clones from different libraries. This is usually done by computer, using assembler software designed for genomic analysis.
NCBI
National Center for Biotechnology Information
Repository of Sequence/Annotation data
Genome sequence databases
Protein databases
Bioinformatic tools, e.g., BLAST for sequence similarity searches
bioinformatics
A field that focuses on the design and use of software and computational methods for the storage, analysis, and management of biological information such as nucleotide or amino acid sequences.
Bioinformatics
Analyze and store vast amounts of data
Visualize data
Access data
Data mining
Many companies have developed bioinformatic software
Prokaryotic Genomes:
Eubacteria genomes
Genome sizes vary
Most circular, but not all
Importance of plasmids, some essential:
When is a plasmid a chromosome?
Gene density is high, little “wasted genome”
Not all operons contain genes from same biochemical pathway, which was unexpected
Overlapping genes in eubacteria, also unexpected
The E. coli genome.
The origin and terminus of replication.
The outer circle of bars represents genes transcribed in a clockwise direction, and the inner circle represents genes transcribed in a counterclockwise direction.
Vibrio cholerae genome
The Vibrio cholerae genome is contained in 2 chromosomes.
The larger chromosome (chromosome 1) contains most of the genes for essential cellular functions and infectivity.
Most of the genes on chromosome 2 (52 percent of 1,115) are of unknown function.
The bias in gene content and the presence of plasmidlike sequences on chromosome 2 suggest that this chromosome was a megaplasmid captured by an ancestral Vibrio species.