Chapter 5: Genome Annotation Flashcards

Question

# Module 5 Limitations of homology analysis

Answer 1

* does not necessarily imply the same function * may not reveal function, e.g. orphans (genes whose function is unknown) * suggests related function * provides a starting point for hypothesis and research

Answer 2

* genes * intergenic regions * they have no equivalents in the related genomes

Answer 3

* the conservation of blocks of order within two sets of chromosomes that are being compared with each other. * gene order is conserved * greater in more closely related species * facilitates the identification of genes
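
A minimal sketch of how conserved blocks of gene order might be detected. It assumes each genome is supplied as a list of gene names in chromosomal order and that orthologous genes share a name. The function name `syntenic_blocks` and the gene labels are hypothetical, and inverted blocks are ignored for simplicity.

```python
def syntenic_blocks(order_a, order_b, min_len=2):
    """Return runs of genes that are adjacent and in the same order in both lists.

    Assumes gene names are unique within each list (an illustrative simplification).
    """
    pos_b = {gene: i for i, gene in enumerate(order_b)}
    blocks, current = [], []
    for gene in order_a:
        # Extend the current block only if this gene directly follows
        # the previous gene in genome B as well.
        if gene in pos_b and current and pos_b[gene] == pos_b[current[-1]] + 1:
            current.append(gene)
        else:
            if len(current) >= min_len:
                blocks.append(current)
            current = [gene] if gene in pos_b else []
    if len(current) >= min_len:
        blocks.append(current)
    return blocks


# Hypothetical gene orders for two related genomes
genome_a = ["g1", "g2", "g3", "x", "g5", "g6"]
genome_b = ["g5", "g6", "y", "g1", "g2", "g3"]
print(syntenic_blocks(genome_a, genome_b))
```

Longer blocks shared between more closely related species would show up here as longer runs.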

Answer 4

* Compare an ORF sequence with a nucleotide sequence database * electronically translate the ORF and use the polypeptide sequence to search a protein sequence database * Based on the concept that gene sequences are conserved, true exons will often have related sequences in a database * As more genes are added to a database, it becomes more likely that a related sequence will be found
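
The "electronically translate the ORF" step can be sketched as follows, assuming the standard genetic code and an ORF that starts in frame. The function name `translate` is illustrative, and the database search itself is not shown.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(orf):
    """Translate an in-frame DNA ORF, stopping at the first stop codon."""
    orf = orf.upper()
    protein = []
    for i in range(0, len(orf) - 2, 3):
        aa = CODON_TABLE[orf[i:i + 3]]
        if aa == "*":  # stop codon reached
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGGCCTAA"))
```

The resulting polypeptide string is what would be submitted to a protein sequence database search.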

Answer 5

* A gene related to a second gene by descent from a common ancestral DNA sequence * A morphological structure in one species related to that in a second species by descent from a common ancestral structure. * may apply to the relationship between genes separated by the event of speciation (see ortholog) or to the relationship between genes separated by the event of genetic duplication (see paralog)

Answer 6

* gene copies created by a duplication event within the same genome * While orthologous genes usually keep the same function, paralogous genes often develop different functions because selective pressure is relaxed on one copy of the duplicated gene

Answer 7

* gene copies created by a duplication event within the same genome * While orthologous genes usually keep the same function, paralogous genes often develop different functions because selective pressure is relaxed on one copy of the duplicated gene

Answer 8

* DNA * RNA

Answer 9

* upstream and downstream untranslated regions (UTRs) * a precise definition of the start and end of the coding region of a gene * exon–intron

Answer 10

* hybridization analysis * designed for the mapping of individual genes onto short sequences of DNA

Answer 11

northern hybridization

Answer 12

* a means of determining the number of genes present in a DNA fragment and the size of each coding region * A total RNA extract is electrophoresed under denaturing conditions in an agarose gel * After ethidium bromide staining, two bands are seen * These are the two largest rRNA molecules, which are abundant * The smaller tRNAs are also abundant but are not seen because they are so short that they run off the bottom of the gel * In most cells, none of the mRNAs is abundant enough to form a band visible after ethidium bromide staining * The gel is blotted onto a nylon membrane and probed with a radioactively labeled DNA fragment * A single band is visible on the autoradiograph, showing that the DNA fragment used as the probe contains part or all of one transcribed sequence.

Answer 13

* Some individual genes give rise to two or more transcripts of different lengths because some of their exons are optional and may or may not be retained in the mature RNA. If this is the case, then a fragment that contains just one gene could detect two or more hybridizing bands in the northern blot. A similar problem can occur if the gene is a member of a multigene family * With many species, it is not practical to make an mRNA preparation from an entire organism, so the extract is obtained from a single organ or tissue. Any genes not expressed in that organ or tissue will not be represented in the RNA population and so will not be detected when the RNA is probed with the DNA fragment being studied. Even if the whole organism is used, not all genes will give hybridization signals, because many are expressed only at a particular developmental stage and others are weakly expressed, meaning that their RNA products are present in amounts too low to be detected by hybridization analysis

Answer 14

* designed for the mapping of individual genes onto short sequences of DNA * avoids the problems with poorly expressed and tissue-specific genes by searching not for RNAs but for related sequences in the DNAs of other organisms * Samples of DNA from different species are prepared, restricted, and electrophoresed in an agarose gel * Southern hybridization is carried out with a human DNA fragment as the probe * A positive hybridization signal is seen with each of the animal DNAs, suggesting that the human DNA fragment contains an expressed gene. * Differences between species in the sizes of the hybridizing restriction fragments indicate that the restriction map around the transcribed sequence is different in those species, but this does not affect the conclusion that a homologous gene is present in all four species.

Answer 15

* distinct functional and/or structural units in a protein * responsible for a particular function or interaction * contributing to the overall role of a protein * may exist in a variety of biological contexts, where similar domains can be found in proteins with different functions

Answer 16

positioned within that fragment with any degree of accuracy

Answer 17

* Genes identified by mutational/polymorphic phenotypes * Mapping is used to identify the location of the gene * Molecular methods are used to pinpoint & identify the gene

Answer 18

* All genes are accessible, not only those that produce chance phenotypes * inactivate a gene to discover its function

Answer 19

* designed for the mapping of individual genes onto short sequences of DNA * PCR that uses RNA rather than DNA as the starting material. * one of the primers is specific for an internal region of the gene close to its start site * primer attaches to the mRNA and reverse transcriptase synthesizes cDNA * the 3ʹ-end of this cDNA corresponds exactly with the 5ʹ-end of the mRNA and is extended by terminal deoxynucleotidyl transferase to give a short poly(A) tail * the mRNA-cDNA hybrid is denatured to give single-stranded cDNA * The second primer anneals to this poly(A) sequence on single-stranded cDNA and, during the first round of the normal PCR, converts the single-stranded cDNA into a double-stranded molecule * this double-stranded DNA is amplified with PCR and its sequence reveals the position of the start of the transcript.

Answer 20

* designed for the mapping of individual genes onto short sequences of DNA * requires a single-stranded version of the gene * obtained by cloning in a vector based on M13 bacteriophage * replication of M13 involves synthesis of phage particles that contain a single-stranded copy of the phage genome, so DNA cloned in an M13 vector can be obtained as single-stranded DNA * a restriction fragment spanning the start of an mRNA has been cloned * mixed with an RNA preparation, the transcribed sequence in the single-stranded DNA hybridizes with the equivalent mRNA, forming a double-stranded heteroduplex * Some of the cloned fragment participates in the heteroduplex, but the rest does not * single-stranded regions are digested by treatment with a single-strand-specific nuclease such as S1 * The size of the heteroduplex is determined by degrading the RNA component with alkali and electrophoresing the resulting single-stranded DNA in an agarose or polyacrylamide gel * size is used to position the start of the transcript relative to the restriction site at the end of the cloned fragment.

Answer 21

* exon–intron boundaries * exon–intron boundary

Answer 22

* designed for the mapping of individual genes onto short sequences of DNA * method for finding exons in a genome sequence * a special type of vector that contains a minigene consisting of two exons flanking an intron sequence * the two exon sequences are preceded by a promoter sequence required for gene expression in a eukaryotic host * to use the vector, the piece of DNA to be studied is inserted into a restriction site located within the vector’s intron region * The vector is then introduced into a suitable eukaryotic cell line, where it is transcribed and the RNA produced from it is spliced * any exon contained in the genomic fragment becomes attached between the upstream and downstream exons from the minigene * RT-PCR with primers annealing within the two minigene exons is now used to amplify a DNA fragment, which is sequenced * since the minigene sequence is already known, the nucleotide positions at which the inserted exon starts and ends can be determined, precisely delineating this exon.
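
The final step, reading the trapped exon out of the sequenced RT-PCR product, can be sketched as below. It assumes the known ends of the two minigene exons are available as strings. All sequences and the function name `trapped_exon` are hypothetical and chosen only for illustration.

```python
def trapped_exon(rt_pcr_product, upstream_exon_end, downstream_exon_start):
    """Return the sequence lying between the two known minigene exon segments.

    Returns None if either minigene segment is missing from the product.
    """
    start = rt_pcr_product.find(upstream_exon_end)
    end = rt_pcr_product.rfind(downstream_exon_start)
    if start == -1 or end == -1:
        return None
    start += len(upstream_exon_end)
    if start > end:
        return None
    return rt_pcr_product[start:end]


# Hypothetical sequences: minigene exon segments and a genomic insert
upstream = "GGATCC"
downstream = "CTGCAG"
product_seq = upstream + "ATGAAACCC" + downstream
genomic_insert = "TTTTGTAG" + "ATGAAACCC" + "GTAAGTTT"

exon = trapped_exon(product_seq, upstream, downstream)
exon_start = genomic_insert.find(exon)  # 0-based start within the insert
exon_end = exon_start + len(exon)       # end position (exclusive)
print(exon, exon_start, exon_end)
```

Because the minigene sequence is known, everything between its two exon segments in the product must come from the inserted genomic fragment, which is what lets the exon boundaries be placed exactly.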

Answer 23

* type of DNA chip * many different oligonucleotides are immobilized in an ordered array * oligonucleotides form a series that covers the length of a chromosome sequence or the sequence of an entire genome * oligonucleotides either overlap or have small gaps between adjacent oligonucleotides * tiling array is hybridized to a labeled sample of RNA from the organism whose genome is being annotated * Those positions on the array that hybridize to the RNA sample reveal the positions in the genome of transcribed sequences
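
A small sketch of the tiling idea, assuming oligonucleotides of fixed length placed at a fixed step along the sequence (a step shorter than the oligonucleotide gives overlaps, a longer step gives gaps). The function names and the hybridization pattern are hypothetical.

```python
def tile(seq_len, oligo_len, step):
    """Return (start, end) coordinates of oligonucleotides tiled along a sequence."""
    return [(s, s + oligo_len) for s in range(0, seq_len - oligo_len + 1, step)]


def transcribed_regions(tiles, hybridized):
    """Merge runs of consecutive hybridizing oligonucleotides into intervals."""
    regions = []
    start = end = None
    for (s, e), hit in zip(tiles, hybridized):
        if hit:
            if start is None:
                start = s
            end = e
        elif start is not None:
            regions.append((start, end))
            start = None
    if start is not None:
        regions.append((start, end))
    return regions


overlapping = tile(seq_len=100, oligo_len=25, step=20)  # 5-nt overlaps
gapped = tile(seq_len=100, oligo_len=25, step=30)       # 5-nt gaps
# Hypothetical hybridization result: only the two middle oligos give a signal
print(transcribed_regions(overlapping, [False, True, True, False]))
```

The reported interval can only be as precise as the oligonucleotide length and spacing allow, which is why array design limits mapping accuracy.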

Answer 24

* each oligonucleotide was synthesized separately and then spotted onto the chip at its appropriate position * suitable for preparing low-density arrays for typing a relatively small number of SNPs

Answer 25

* oligonucleotides must be synthesized directly on the surface of the chip * modified nucleotide substrates are used, which must be light-activated before they attach to the end of a growing oligonucleotide * nucleotides are added one after another to the chip surface, with photolithography used to direct pulses of light onto individual positions in the array * Only the oligonucleotides that are light-activated will be extended by the nucleotide that is present * enables the highest density arrays, with up to 300,000 oligonucleotides per square centimeter, to be prepared.

Answer 26

1. The transcript might be longer than the gene, so oligonucleotides whose positions lie in the upstream and downstream UTRs will also give signals. 2. The accuracy of mapping depends on the lengths of the oligonucleotides and of the overlaps or gaps between them in the array. In the examples of array design shown in Figure 5.16, the accuracy is ±30 nucleotides for the overlapping array and ±70 nucleotides for the gapped array.

Answer 27

* not all genes are expressed in a single organ or cell type, and even in one cell type the gene expression pattern varies over time.

Answer 28

* fractionated * poly(A) tail

Answer 29

* affinity chromatography * polyadenylated mRNA

Answer 30

* the frequency of the desired cDNAs in the library * the completeness of the individual cDNA molecules

Answer 31

* use cDNA capture or cDNA selection to enrich the library for the desired clones * BAC fragment is repeatedly hybridized to the pool of cDNAs, with nonhybridized cDNAs washed away and discarded * Because the cDNA pool contains so many different sequences, it is generally not possible to discard all the irrelevant clones by these repeated hybridizations, but it is possible to increase significantly the frequency of those clones that specifically hybridize to the DNA fragment * This reduces the size of the library that must subsequently be screened under stringent conditions to identify the desired clones.

Answer 32

* always a chance that one or other of the strand-synthesis reactions will not proceed to completion, resulting in a truncated cDNA * presence of intramolecular base pairs in the RNA can also lead to incomplete copying * sequences of truncated cDNAs can be used to locate genes in a DNA sequence, but they may lack the sequences needed to delineate the start and end points of the gene or the exact positions of exon–intron boundaries.

Answer 33

* (A) Direct mapping of reads onto the genome sequence. * (B) Initial assembly of RNA-seq contigs, followed by mapping of the contigs onto the genome.

Answer 34

* RNA-seq is simply the application of Illumina or some other high-throughput sequencing method to a library that has been prepared from cDNA rather than directly from DNA. * sequence reads correspond to segments of the transcripts in the original RNA sample * reads can be mapped directly onto a genome sequence like the way in which DNA sequence reads are mapped onto a reference genome during a genome resequencing project * the difference is that the RNA reads do not give lengthy scaffolds but instead form clusters that map specifically onto the transcribed parts of the genome

Answer 35

* apply a de novo assembly method to the collection of RNA-seq reads and then map the assembled contigs onto the reference genome * advantage is that many genes are members of multigene families displaying sequence similarity * If individual, short RNA reads are mapped directly onto the reference genome, then some might be identical to segments of two or more members of a multigene family, complicating the mapping process * If, on the other hand, the complete transcript sequence is determined prior to mapping, then the members of a gene family are easily distinguished.
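
The multigene family problem can be illustrated with a toy example, assuming exact substring matching stands in for read mapping. The two gene sequences, which differ at a single position, and the function name `mapping_hits` are hypothetical.

```python
def mapping_hits(query, genes):
    """Return the names of all genes that contain the query sequence exactly."""
    return [name for name, seq in genes.items() if query in seq]


# Two hypothetical family members differing at one nucleotide near the 3' end
family = {
    "geneA": "ATGGCGTTTAAACCCGGGTTTACGTAG",
    "geneB": "ATGGCGTTTAAACCCGGGTTTAGGTAG",
}

short_read = "TTTAAACCCGGG"      # lies entirely in the shared region
full_contig = family["geneA"]    # complete transcript assembled beforehand

print(mapping_hits(short_read, family))   # ambiguous: matches both members
print(mapping_hits(full_contig, family))  # unambiguous: matches one member
```

The short read cannot be assigned to one family member, whereas the assembled transcript spans the distinguishing position and maps to a single gene.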

Answer 36

* software package that enables genome annotation data to be displayed in a graphical format * DNA sequence forming the x-axis and the positions of genes and other interesting features marked at their appropriate map positions

Answer 37

Databases for the curation of DNA sequences

Answer 38

* Subtle phenotypes resulting from gene knockouts (KOs) raise the question of whether it is more efficient to assess each KO against all phenotypes * or whether it is possible to assess all gene KOs against one phenotype at a time. The latter approach can be accomplished using a barcode system
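
One way the barcode approach might be analyzed, sketched under stated assumptions: each knockout strain carries a unique DNA barcode, the strains are pooled and grown under a single condition, and barcodes are sequenced before and after growth. The barcode sequences, read counts, and the function name `barcode_log2_ratios` are all hypothetical.

```python
import math
from collections import Counter


def barcode_log2_ratios(before_reads, after_reads, pseudocount=1):
    """Return the log2 change in relative abundance for each barcode.

    A pseudocount avoids division by zero for barcodes that disappear.
    """
    before = Counter(before_reads)
    after = Counter(after_reads)
    n_before = sum(before.values())
    n_after = sum(after.values())
    ratios = {}
    for barcode in before:
        freq_before = (before[barcode] + pseudocount) / n_before
        freq_after = (after[barcode] + pseudocount) / n_after
        ratios[barcode] = math.log2(freq_after / freq_before)
    return ratios


# Hypothetical pooled experiment with two barcoded knockout strains
before = ["AAC"] * 50 + ["GGT"] * 50
after = ["AAC"] * 90 + ["GGT"] * 10
for barcode, ratio in barcode_log2_ratios(before, after).items():
    print(barcode, round(ratio, 2))
```

A strongly negative value would flag a knockout that is depleted under the tested condition, so all knockouts are scored against one phenotype in a single experiment.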

Answer 39

where and when genes are expressed

Answer 40

* gene introduced into a cell, especially a bacterium or to cells in culture, that confers a trait suitable for artificial selection * type of reporter gene used to indicate the success of a transfection or other procedure meant to introduce foreign DNA into a cell.

Answer 41

* methodology used to study proteomes * a collection of diverse techniques that are related only in their ability to provide information on a proteome * encompassing not only the identities of the constituent proteins that are present but also factors such as the functions of individual proteins and their localization within the cell

Answer 42

The particular technique that is used to study the composition of a proteome

Answer 43

its constituent proteins

Answer 44

* is the standard method for separating the proteins in a mixture * depending on the composition of the gel and the conditions under which it is carried out, different chemical and physical properties of proteins can be used as the basis for their separation * sodium dodecyl sulfate (SDS) denatures proteins and confers a negative charge that is roughly proportional to the length of the unfolded polypeptide * proteins separate according to their molecular masses * smallest proteins migrate most quickly towards the positive electrode
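
Because SDS-treated proteins separate by molecular mass, the mass of an unknown band is commonly estimated from marker proteins run on the same gel. The sketch below assumes the usual approximation that log10(mass) falls linearly with relative mobility. The marker values and function names are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_mass_kda(markers, mobility):
    """Estimate molecular mass (kDa) from relative mobility using marker proteins.

    markers: list of (relative_mobility, mass_kda) pairs.
    """
    xs = [m for m, _ in markers]
    ys = [math.log10(mass) for _, mass in markers]
    slope, intercept = fit_line(xs, ys)
    return 10 ** (intercept + slope * mobility)


# Hypothetical marker proteins: (relative mobility, mass in kDa)
markers = [(0.2, 63.1), (0.5, 31.6), (0.8, 15.8)]
print(round(estimate_mass_kda(markers, 0.6), 1))
```

The negative slope of the fitted line reflects the smallest proteins migrating furthest.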

Answer 45

* contains chemicals which establish a pH gradient when the electrical charge is applied * In this type of gel, a protein migrates to its isoelectric point, the position in the gradient where its net charge is zero
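
The isoelectric point can be approximated numerically by finding the pH at which the computed net charge is zero. This is a rough sketch using the Henderson-Hasselbalch relationship. The pKa values are approximate textbook figures chosen as assumptions (published scales differ), and the function names are illustrative.

```python
# Approximate pKa values (assumed; different published scales give different numbers)
PKA_POSITIVE = {"K": 10.5, "R": 12.5, "H": 6.0}
PKA_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}
PKA_N_TERM = 9.0
PKA_C_TERM = 2.3


def net_charge(sequence, ph):
    """Net charge of a peptide at a given pH."""
    charge = 1 / (1 + 10 ** (ph - PKA_N_TERM))   # protonated N-terminus
    charge -= 1 / (1 + 10 ** (PKA_C_TERM - ph))  # deprotonated C-terminus
    for aa in sequence:
        if aa in PKA_POSITIVE:
            charge += 1 / (1 + 10 ** (ph - PKA_POSITIVE[aa]))
        elif aa in PKA_NEGATIVE:
            charge -= 1 / (1 + 10 ** (PKA_NEGATIVE[aa] - ph))
    return charge


def isoelectric_point(sequence, tol=0.001):
    """Bisect between pH 0 and 14 for the pH where the net charge is zero."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(sequence, mid) > 0:
            low = mid   # still positively charged: pI lies at higher pH
        else:
            high = mid
    return (low + high) / 2


print(round(isoelectric_point("DDDD"), 2), round(isoelectric_point("KKKK"), 2))
```

In the gel, a protein stops migrating at the position in the pH gradient that corresponds to this value.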

Answer 46

* not all proteins in the proteome will be visible in the gel * in particular, proteins that are not soluble in an aqueous buffer, such as many of the proteins present in cell membranes, will be absent * special buffers and gel compositions must be used * several parallel experiments must be carried out if the objective is to study a proteome in its entirety * problems with the reproducibility * difficulty in devising control procedures that enable the data from such gels to be normalized when two proteomes are compared * alternative separation methods are being sought

Answer 47

* different amounts * a single two-dimensional gel * two separate gels * one containing normal hydrogen atoms and the other containing deuterium, the heavy isotope of hydrogen * normal and heavy versions can be distinguished by mass spectrometry * enabling the relative amounts of a protein in two proteomes that have been mixed together to be determined during the MALDI-TOF stage of the profiling procedure

Answer 48

* gene sequence * exon–intron boundaries * position * alternative splicing pathways

Answer 49

* identify which protein is present in a spot found in the result of a two-dimensional gel electrophoresis * works best with peptides of up to 50 amino acids in length, longer ones need to be broken down * process * purify the protein from a spot * digest it with a sequence-specific protease, such as trypsin, which cleaves proteins immediately after arginine or lysine residues * usually this results in a series of peptides 5–75 amino acids in length * Once ionized, the mass-to-charge ratio of a peptide is determined from its “time-of-flight” within the mass spectrometer as it passes from the ionization source to the detector * mass-to-charge ratio enables the molecular mass to be worked out, which in turn allows the amino acid composition of the peptide to be deduced
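
The digestion and mass steps can be sketched in silico. The code assumes cleavage after every lysine or arginine, as stated above (real trypsin generally does not cut when the next residue is proline, which is ignored here), and uses standard monoisotopic residue masses. The function names and example sequence are hypothetical.

```python
import re

# Monoisotopic residue masses in daltons
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def tryptic_digest(protein):
    """Split a protein after every K or R (simplified trypsin rule)."""
    return re.findall(r"[^KR]*[KR]|[^KR]+$", protein)


def peptide_mass(peptide):
    """Monoisotopic mass of a peptide: residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def mass_to_charge(peptide, charge=1):
    """m/z of a peptide carrying the given number of protons."""
    return (peptide_mass(peptide) + charge * PROTON) / charge


# Hypothetical protein sequence
for pep in tryptic_digest("MKTAYIAKQRQISFVK"):
    print(pep, round(mass_to_charge(pep), 4))
```

Comparing such predicted values against the measured mass-to-charge ratios is what allows a spot to be matched to a candidate protein.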

(74 cards)