Chapter 5: Genome Annotation Flashcards

1
Q

Module 5

genome annotation

A

the process
by which genes are located in a genome sequence

2
Q

Module 5

Once an assembled genome sequence has been obtained, various methods can be employed to locate the genes that are present. These methods can be divided into

A
  • those that involve simply inspecting the sequence, by eye or more frequently by computer
  • those methods that locate genes by experimental analysis.
3
Q

Module 5

open reading frames (ORFs)

A
  • Genes that code for proteins
  • consisting of a series of codons that specify the amino acid sequence of the protein that the gene codes for
  • begins with an initiation codon, usually (but not always) ATG
  • ends with a termination codon, either TAA, TAG, or TGA
4
Q

Module 5

ORF scanning or ab initio gene prediction

A
  • involves searching a DNA sequence for ORFs that begin with an ATG and end with a termination triplet
  • complicated by the fact that each DNA sequence has six reading frames, three in one direction and three in the reverse direction on the complementary strand
  • Each strand has three reading frames, depending on which nucleotide is chosen as the starting position.
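The six-frame search described on this card can be sketched in a few lines of Python. This is a minimal illustration, not a production gene finder: the function names are hypothetical, only ATG is accepted as an initiation codon, and the `min_codons` cutoff uses "at least" rather than "longer than" for simplicity.

```python
# Minimal six-frame ORF scan (illustrative sketch; names are hypothetical).
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the complementary strand read in the 5'->3' direction."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=100):
    """Return (strand, frame, start, end, n_codons) for each ATG...stop ORF.

    start and end are 0-based offsets on the strand being read; end is
    exclusive and includes the stop codon. n_codons counts the codons from
    the ATG up to, but not including, the stop codon.
    """
    orfs = []
    forward = seq.upper()
    for strand, s in (("+", forward), ("-", reverse_complement(forward))):
        for frame in range(3):  # three reading frames per strand
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # open a candidate ORF at the first ATG
                elif start is not None and codon in STOPS:
                    n_codons = (i - start) // 3
                    if n_codons >= min_codons:
                        orfs.append((strand, frame, start, i + 3, n_codons))
                    start = None  # close the ORF and keep scanning
    return orfs
```

For a real annotation the cutoff would be set with the target genome in mind; the default of 100 codons is simply the figure quoted on a later card.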
5
Q

Module 5

  1. The key to the success of ORF scanning is the frequency with which _____ _____ appear in the DNA sequence.
  2. If the DNA has a random sequence and a GC content of 50%, then each of the three termination codons will appear, on average, once every _____.
  3. If the GC content is greater than 50%, then the termination codons, being AT-rich, will occur less frequently, but one will still be expected every ______.
  4. This means that random DNA should not show many ORFs longer than _____ codons in length, especially if the presence of a starting ATG triplet is used as part of the definition of an ORF.
  5. Most genes, on the other hand, are longer than 50 codons: the average lengths are _____ codons for bacterial genes and approximately _____ codons for humans.
A
  1. termination codons: TAA, TAG, or TGA
  2. 4³ = 64 bp
  3. 100–200 bp
  4. 50
  5. 300–350 (bacterial genes); approximately 450 (human genes)
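The arithmetic behind answers 2 and 3 can be checked with a short sketch. It assumes a random sequence in which G and C are equally frequent, A and T are equally frequent, and neighboring nucleotides are independent; the function name is hypothetical.

```python
def stop_codon_spacing(gc_content):
    """Expected spacing (in bp) between occurrences of each stop triplet.

    Assumes random DNA with p(G) = p(C) = gc_content / 2 and
    p(A) = p(T) = (1 - gc_content) / 2, with independent positions.
    """
    p_gc = gc_content / 2
    p_at = (1 - gc_content) / 2
    prob = {"A": p_at, "T": p_at, "G": p_gc, "C": p_gc}
    spacing = {}
    for codon in ("TAA", "TAG", "TGA"):
        p = prob[codon[0]] * prob[codon[1]] * prob[codon[2]]
        spacing[codon] = 1 / p  # mean distance between occurrences
    return spacing
```

Under these assumptions each triplet is expected once every 64 bp at 50% GC. At 70% GC the sketch gives roughly 127 bp for TAG and TGA and roughly 296 bp for TAA, broadly in line with the 100–200 bp figure on the card.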
6
Q

ORF scanning, in its simplest form, therefore takes a figure of, say, _____ codons as the shortest length of a putative gene and records positive hits for all ORFs _____ than this.

A
  • 100
  • longer
7
Q

Although ORF scans work well for bacterial genomes, they are less effective for locating genes in DNA sequences from higher eukaryotes. This is partly because

A

there is substantially more space between the real genes in a eukaryotic genome (for example, approximately 62% of the human genome is intergenic).

8
Q

Module 5

  • The main problem with the human genome, and the genomes of higher eukaryotes in general, is that their genes are often split by _____ and so do not appear as continuous ORFs in the DNA sequence.
  • Many _____ are shorter than 100 codons, some consisting of fewer than 50 codons, and continuing the reading frame into an intron usually leads to a termination sequence that appears to close the ORF
  • Intron boundaries are marked by _____ _____
A
  • introns
  • exons
  • consensus sequences
9
Q

Module 5

initiation codon

A

ATG

10
Q

Module 5

termination codon

A

TAA, TAG, or TGA

11
Q

Module 5

Codon bias

A
  • not all codons are used with equal frequency in the genes of a particular organism
  • all organisms have a bias, which is different in different species
  • The codon bias of the organism being studied is therefore written into the ORF-scanning software
  • e.g. in human genes, leucine is most frequently coded by CTG and is only rarely specified by TTA or CTA
  • the frequency with which a particular organism uses the available codons in its genes
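Codon bias can be measured by counting codons in known coding sequences. The sketch below is a minimal illustration with hypothetical function names; it assumes the input is already in frame, starting at the first base of a codon.

```python
from collections import Counter

# The six leucine codons of the standard genetic code.
LEUCINE = ("TTA", "TTG", "CTT", "CTC", "CTA", "CTG")


def codon_usage(coding_seq):
    """Count codons in an in-frame coding sequence."""
    seq = coding_seq.upper()
    usable = len(seq) - len(seq) % 3  # ignore an incomplete trailing codon
    return Counter(seq[i:i + 3] for i in range(0, usable, 3))


def relative_usage(counts, synonymous):
    """Fraction of each synonymous codon among those that were observed."""
    total = sum(counts[c] for c in synonymous)
    if total == 0:
        return {}
    return {c: counts[c] / total for c in synonymous}
```

Applied to a large set of genes from one organism, `relative_usage(counts, LEUCINE)` would give the kind of frequency table that ORF-scanning software uses as its codon-bias model.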
12
Q

Module 5

consensus sequence

A

a sequence that shows the most frequent nucleotide at each position in a set of aligned sequences, for example in all of the upstream exon–intron boundaries that are known
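The idea of taking the most frequent nucleotide at each position can be sketched as follows. The function name is hypothetical, the sequences must already be aligned and of equal length, and ties are broken by whichever nucleotide is seen first in the column.

```python
from collections import Counter


def consensus(aligned):
    """Most frequent nucleotide at each column of equal-length sequences."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    columns = zip(*(s.upper() for s in aligned))
    return "".join(Counter(col).most_common(1)[0][0] for col in columns)
```

For example, the invented set `["AGGTAAGT", "AGGTGAGT", "CGGTAAGA", "AGGTAAGT"]` yields `AGGTAAGT`, even though two of the four sequences differ from it at one position.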

13
Q
  • Exon–intron boundaries can be searched for, as these have distinctive sequence features, via _____ _____
  • The sequence of the upstream exon–intron boundary is usually described as 5ʹ-AG↓GTAAGT-3ʹ with the arrow indicating the precise _____ _____.
  • only the GT immediately after the arrow is invariable: elsewhere in the sequence, nucleotides other than the ones shown are quite often found
  • The downstream intron–exon boundary is even less well defined: 5ʹ-PyPyPyPyPyPyNCAG↓-3ʹ, where Py and N mean
A
  • consensus sequence
  • boundary point
  • Py is one of the pyrimidine nucleotides (T or C) and N is any nucleotide
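A search for candidate upstream exon–intron boundaries might look like the sketch below. It requires the invariable GT and scores the remaining positions against the consensus given on this card. The function name and the `min_matches` threshold are assumptions made for illustration only.

```python
DONOR_CONSENSUS = "AGGTAAGT"  # 5'-AG|GTAAGT-3'; boundary after position 2


def candidate_donor_sites(seq, min_matches=6):
    """Return (boundary_index, matches) for windows resembling the consensus.

    boundary_index is the 0-based index of the first intron base (the G of
    the invariable GT). Windows lacking GT at that point are skipped.
    """
    seq = seq.upper()
    width = len(DONOR_CONSENSUS)
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if window[2:4] != "GT":
            continue  # the GT after the boundary is treated as mandatory
        matches = sum(a == b for a, b in zip(window, DONOR_CONSENSUS))
        if matches >= min_matches:
            hits.append((i + 2, matches))
    return hits
```

Because only the GT is invariable, a permissive threshold produces many false positives, which is one reason boundary searches are combined with other evidence.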
14
Q

Module 5

  • Upstream regulatory sequences can be used to locate the regions where genes _____.
  • These regulatory sequences, like exon–intron boundaries, have _____ sequence features that they possess in order to carry out their role as recognition signals for the DNA-binding proteins involved in gene expression.
  • As with exon–intron boundaries, the regulatory sequences are ____, more so in eukaryotes than in prokaryotes, and in eukaryotes not all genes have the same collection of regulatory sequences. Using these to locate genes is therefore problematic
A
  • begin
  • distinctive
  • variable
15
Q

Module 5

CpG islands

A
  • upstream of many genes
  • sequences of approximately 1 kb in which the GC content is greater than the average for the genome as a whole
  • Some 40–50% of human genes are associated with an upstream CpG island
  • distinctive, and when one is located in vertebrate DNA, a strong assumption can be made that a gene begins in the region immediately downstream
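Following the definition on this card (a stretch of about 1 kb whose GC content exceeds the genome average), candidate islands can be flagged with a sliding window. This is a rough sketch: the function names, the step size, and the `margin` above the average are all assumptions, and published CpG island criteria also consider CpG dinucleotide frequency, which is not modeled here.

```python
def gc_fraction(seq):
    """Fraction of G and C nucleotides in a sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_rich_windows(seq, window=1000, step=100, margin=0.10):
    """Return (start, end, gc) for windows richer in GC than the whole input.

    A window is reported when its GC fraction exceeds the average of the
    full sequence by more than `margin` (an arbitrary illustrative cutoff).
    """
    average = gc_fraction(seq)
    hits = []
    for start in range(0, len(seq) - window + 1, step):
        gc = gc_fraction(seq[start:start + window])
        if gc > average + margin:
            hits.append((start, start + window, gc))
    return hits
```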
16
Q

If the genome is completely unstudied, then the accuracy of gene prediction will be lower, even though most gene prediction software includes

A

a machine learning function, so the computer becomes trained to recognize appropriate patterns of codon usage as it gradually builds up the genome annotation.

17
Q

Module 5

genes for noncoding RNAs such as rRNA and tRNA do not comprise _____ ______ _____ and hence will not be located by methods based on codon bias, exon–intron boundaries, or upstream regulatory sequences

A
  • open reading frames
18
Q

Module 5

intramolecular base pairing

A
  • pattern that can occur in single-stranded DNA or, more commonly, in RNA
  • It occurs when two regions of the same strand, usually complementary in nucleotide sequence when read in opposite directions, base-pair to form a secondary structure
19
Q

Module 5

Noncoding RNA molecules have distinctive features, which can be used as an aid in their discovery in a genome sequence. The most important of these features is the ability to

A
  • fold into a secondary structure
20
Q

Module 5

Other noncoding RNA genes are less easy to locate because the RNAs take up structures that involve relatively little base pairing or the base pairing is not in a regular pattern. Three scanning approaches are used for location of the genes for these RNAs:

A
  1. scan DNA sequences for stem-loops (hairpins) in order to identify regions where noncoding RNA genes might be present
  2. scan for regulatory sequences associated with genes for noncoding RNAs
  3. In compact genomes, attention is directed toward regions that remain after a comprehensive search for protein-coding genes. Often these empty spaces are not empty at all, and a careful examination will reveal the presence of one or more noncoding RNA genes
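The first approach in this list, scanning for stem-loops, amounts to looking for inverted repeats separated by a short loop. The sketch below is a simplified illustration with hypothetical names and arbitrary default sizes: it requires a perfect stem and ignores the G–U pairs and mismatches that real RNA structures tolerate.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def find_stem_loops(seq, stem=6, min_loop=3, max_loop=8):
    """Return (left_start, right_start, loop_length) for perfect hairpins.

    A hit means seq[left_start:left_start + stem] can base-pair with
    seq[right_start:right_start + stem] when the strand folds back on itself.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2 * stem - min_loop + 1):
        left = seq[i:i + stem]
        partner = left.translate(COMPLEMENT)[::-1]  # reverse complement
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop
            if j + stem > len(seq):
                break
            if seq[j:j + stem] == partner:
                hits.append((i, j, loop))
    return hits
```

Dedicated tools score candidate structures thermodynamically rather than demanding exact complementarity, so this should be read only as a picture of the underlying idea.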
21
Q

Module 5

homologous

A
  • derived by descent
  • having the same relation, relative position, or structure
  • similarity due to shared ancestry between a pair of structures or genes in different taxa
22
Q

Module 5

homology

A

shared ancestry, inferred from sequence conservation

23
Q

Module 5

homology search

A
  • DNA databases are searched to compare the test sequence with genes that have already been sequenced
  • If the test sequence is part of a gene that has already been sequenced by someone else, then an identical match will be found
  • intention is to determine if an entirely new sequence is similar to any known genes
  • the aim is to judge whether the test and matching sequences are likely to be homologous
  • to assign functions to newly discovered genes
  • central to gene prediction because it enables the authenticity of tentative exon sequences located by ORF scanning to be tested
24
Q
  • With homology search, if the tentative exon sequence gives one or more positive matches after a homology search then it is
  • but if it gives no match then
A
  • probably a real exon
  • its authenticity must remain in doubt until it is assessed by one or other of the experiment-based genome annotation techniques.
25
Q

Module 5

Limitations of homology analysis

A
  • does not necessarily imply the same function
  • may not reveal function, e.g. orphans (genes whose function is not known)
  • suggests related function
  • provides a starting point for hypothesis and research
26
Q

Module 5

  • Because of natural selection, the sequence similarities between related genomes are greatest within the _____ and lowest in the _____ _____.
  • Therefore, when related genomes are compared, homologous genes are easily identified because they have high sequence similarity, and any ORF that does not have a clear homolog in the second genome can be discounted as almost certainly being a chance sequence and not a genuine gene because
A
  • genes
  • intergenic regions
  • they have no equivalents in the related genomes
27
Q

Module 5

synteny

A
  • the conservation of blocks of order within two sets of chromosomes that are being compared with each other.
  • gene order is conserved
  • greater in more closely related species
  • facilitates the identification of genes
28
Q

Module 5

How can homology be used to identify genes

A
  • Compare an ORF sequence with a nucleotide sequence database
  • electronically translate the ORF and use the polypeptide sequence to search a protein sequence database
  • Based on the concept that gene sequences are conserved, true exons will often have related sequences in a database
  • As more genes are added to a database, it becomes more likely that a related sequence will be found
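The "electronically translate the ORF" step can be sketched with the standard genetic code. Everything here is a toy: the function names are hypothetical, and the identity score compares sequences position by position without any alignment, whereas real database searches rely on alignment tools such as BLAST.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(orf):
    """Translate an in-frame ORF, stopping at the first termination codon."""
    orf = orf.upper()
    protein = []
    for i in range(0, len(orf) - 2, 3):
        aa = CODON_TABLE.get(orf[i:i + 3], "X")  # X marks an unknown codon
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def percent_identity(a, b):
    """Percentage of identical residues over the shorter sequence (ungapped)."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    same = sum(x == y for x, y in zip(a, b))
    return 100.0 * same / n
```

A query such as `percent_identity(translate(orf), known_protein)` hints at why protein comparisons are informative: synonymous DNA changes disappear after translation.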
29
Q

Module 5

homolog

A
  • A gene related to a second gene by descent from a common ancestral DNA sequence
  • A morphological structure in one species related to that in a second species by descent from a common ancestral structure.
  • may apply to the relationship between genes separated by the event of speciation (see ortholog) or to the relationship between genes separated by the event of genetic duplication (see paralog)
30
Q

Module 5

Ortholog

A
  • genes in different species that derive from the same ancestral gene and were separated by a speciation event
  • Orthologous genes usually retain the same function, whereas paralogous genes often develop different functions because selective pressure is relaxed on one copy of the duplicated gene
31
Q

Module 5

Paralog

A
  • gene copies created by a duplication event within the same genome
  • Whereas orthologous genes usually retain the same function, paralogous genes often develop different functions because selective pressure is relaxed on one copy of the duplicated gene
32
Q

Module 5

_____ sequences are more reliable indicators of homology

A

protein

33
Q

The second approach to genome annotation makes use of experimental techniques to locate genes within a genome sequence. These methods are not usually based on direct examination of _____ molecules but instead rely on detection of the _____ molecules that are transcribed from genes.

A
  • DNA
  • RNA
34
Q

Module 5

  • One problem with transcripts is that they are usually longer than the coding part of the gene because they include …
  • Because of this, transcript analysis does not give …
  • but it does tell you that a gene is present in a particular region and it can locate the ______-______ boundaries. Often this is sufficient information to enable the coding region to be delineated.
A
  • upstream and downstream untranslated regions (UTRs)
  • a precise definition of the start and end of the coding region of a gene
  • exon–intron
35
Q

Module 5

The simplest procedures for studying transcribed sequences are based on _____ _____

A
  • hybridization analysis
  • designed for the mapping of individual genes onto short sequences of DNA
36
Q

Module 6

RNA molecules can be separated by specialized forms of agarose gel electrophoresis, transferred to a nitrocellulose or nylon membrane, and examined by the process called _____ ______, which is designed for the mapping of individual genes onto short sequences of DNA.

A

northern hybridization

37
Q

Module 6

northern hybridization

A
  • a means of determining the number of genes present in a DNA fragment and the size of each coding region
  • A total RNA extract is electrophoresed under denaturing conditions in an agarose gel
  • After ethidium bromide staining, two bands are seen
  • These are the two largest rRNA molecules, which are abundant
  • The smaller tRNAs, although also abundant, are not seen because they are so short that they run off the bottom of the gel
  • in most cells, none of the mRNAs is abundant enough to form a band visible after ethidium bromide staining
  • The gel is blotted onto a nylon membrane and probed with a radioactively labeled DNA fragment
  • A single band is visible on the autoradiograph, showing that the DNA fragment used as the probe contains part or all of one transcribed sequence.
38
Q

Module 6

northern hybridization weaknesses

A
  • Some individual genes give rise to two or more transcripts of different lengths because some of their exons are optional and may or may not be retained in the mature RNA. If this is the case, then a fragment that contains just one gene could detect two or more hybridizing bands in the northern blot. A similar problem can occur if the gene is a member of a multigene family
  • With many species, it is not practical to make an mRNA preparation from an entire organism, so the extract is obtained from a single organ or tissue. Any genes not expressed in that organ or tissue will not be represented in the RNA population and so will not be detected when the RNA is probed with the DNA fragment being studied. Even if the whole organism is used, not all genes will give hybridization signals, because many are expressed only at a particular developmental stage and others are weakly expressed, meaning that their RNA products are present in amounts too low to be detected by hybridization analysis
39
Q

Module 5

zoo-blotting

A
  • designed for the mapping of individual genes onto short sequences of DNA
  • avoids the problems with poorly expressed and tissue-specific genes by searching not for RNAs but for related sequences in the DNAs of other organisms
  • Samples of different species DNAs are prepared, restricted, and electrophoresed in an agarose gel
  • Southern hybridization is carried out with a human DNA fragment as the probe
  • A positive hybridization signal is seen with each of the animal DNAs, suggesting that the human DNA fragment contains an expressed gene.
  • Differences between species in the sizes of the hybridizing restriction fragments indicate that the restriction map around the transcribed sequence differs in those species, but this does not affect the conclusion that a homologous gene is present in all four species.
40
Q

Module 5

protein domains

A
  • distinct functional and/or structural units in a protein
  • responsible for a particular function or interaction
  • contributing to the overall role of a protein
  • may exist in a variety of biological contexts, where similar domains can be found in proteins with different functions
41
Q

Module 5

Northern hybridization and zoo-blotting can identify a DNA fragment that contains a gene, but those methods do not enable the gene to be

A

positioned within that fragment with any degree of accuracy

42
Q

Module 5

Hallmarks of Forward Genetics

A
  • Genes identified by mutational/polymorphic phenotypes
  • Mapping is used to identify the location of the gene
  • Molecular methods are used to pinpoint & identify the gene
43
Q

Module 5

Hallmarks of Reverse Genetics

A
  • All genes are accessible, not only those that produce chance phenotypes
  • a gene is eliminated (knocked out) to discover its function
44
Q

reverse transcriptase polymerase chain reaction

rapid amplification of cDNA ends (RACE)

A
  • designed for the mapping of individual genes onto short sequences of DNA
  • PCR that uses RNA rather than DNA as the starting material.
  • one of the primers is specific for an internal region of the gene close to its start site
  • primer attaches to the mRNA and reverse transcriptase synthesizes cDNA
  • the 3ʹ-end of this cDNA corresponds exactly with the 5ʹ-end of the mRNA and is extended by terminal deoxynucleotidyl transferase to give a short poly(A) tail
  • the mRNA–cDNA hybrid is denatured to give single-stranded cDNA
  • The second primer anneals to this poly(A) sequence on single-stranded cDNA and, during the first round of the normal PCR, converts the single-stranded cDNA into a double-stranded molecule
  • this double stranded DNA is amplified with PCR and its sequence reveals the position of the start of the transcript.
45
Q

heteroduplex analysis

A
  • designed for the mapping of individual genes onto short sequences of DNA
  • requires a single-stranded version of the gene
  • obtained by cloning in a vector based on M13 bacteriophage
    • replication of M13 involves synthesis of phage particles that contain a single-stranded copy of the phage genome, so DNA cloned in an M13 vector can be obtained as single-stranded DNA
  • mixed with RNA preparation, the transcribed sequence in the single-stranded DNA hybridizes with the equivalent mRNA, forming a double-stranded heteroduplex
  • a restriction fragment spanning the start of an mRNA has been cloned
  • Some of the cloned fragment participates in the heteroduplex, but the rest does not
  • single-stranded regions are digested by treatment with a single-strand-specific nuclease such as S1
  • The size of the heteroduplex is determined by degrading the RNA component with alkali and electrophoresing the resulting single-stranded DNA in an agarose or polyacrylamide gel
  • size is used to position the start of the transcript relative to the restriction site at the end of the cloned fragment.
46
Q

Heteroduplex analysis can also be used to locate _____-______ ______ . The method is almost the same as transcript mapping with the exception that the cloned restriction fragment spans the _____-______ ______ being mapped rather than the start of the transcript

A
  • exon–intron boundaries
  • exon–intron boundary
47
Q

exon trapping

A
  • designed for the mapping of individual genes onto short sequences of DNA
  • method for finding exons in a genome sequence
  • a special type of vector that contains a minigene consisting of two exons flanking an intron sequence
  • the two exon sequences are preceded by a promoter sequence required for gene expression in a eukaryotic host
  • to use the vector, the piece of DNA to be studied is inserted into a restriction site located within the vector’s intron region
  • The vector is then introduced into a suitable eukaryotic cell line, where it is transcribed and the RNA produced from it is spliced
  • any exon contained in the genomic fragment becomes attached between the upstream and downstream exons from the minigene
  • RT-PCR with primers annealing within the two minigene exons is now used to amplify a DNA fragment, which is sequenced
  • since the minigene sequence is already known, the nucleotide positions at which the inserted exon starts and ends can be determined, precisely delineating this exon.
48
Q

Module 6

tiling arrays

A
  • type of DNA chip
  • many different oligonucleotides are immobilized in an ordered array
  • oligonucleotides form a series that covers the length of a chromosome sequence or the sequence of an entire genome
  • oligonucleotides either overlap or have small gaps between adjacent oligonucleotides
  • tiling array is hybridized to a labeled sample of RNA from the organism whose genome is being annotated
  • Those positions on the array that hybridize to the RNA samples, reveal the positions in the genome of transcribed sequences
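The relationship between oligonucleotide positions and transcribed regions can be sketched as below. The function names are hypothetical and the hybridization signals are assumed to have been reduced to a yes/no call per oligonucleotide, which glosses over the intensity thresholds a real analysis would need.

```python
def tile_positions(seq_length, oligo_len, step):
    """Start and end (exclusive) of each oligonucleotide along the sequence.

    step < oligo_len gives overlapping tiles; step > oligo_len leaves gaps.
    """
    last_start = seq_length - oligo_len
    return [(s, s + oligo_len) for s in range(0, last_start + 1, step)]


def transcribed_regions(tiles, signals):
    """Merge runs of consecutive hybridizing tiles into genome intervals."""
    regions = []
    current = None
    for (start, end), hit in zip(tiles, signals):
        if hit:
            if current is None:
                current = [start, end]  # open a new region
            else:
                current[1] = end  # extend the open region
        elif current is not None:
            regions.append(tuple(current))
            current = None
    if current is not None:
        regions.append(tuple(current))
    return regions
```

The reported interval edges can only be as precise as the tile length and spacing allow, which is why array design limits mapping accuracy.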
49
Q

Module 6

tiling arrays

oligonucleotide placement on chip

low-density arrays

A
  • each oligonucleotide was synthesized separately and then spotted onto the chip at its appropriate position
  • suitable for preparing low-density arrays for typing a relatively small number of SNPs
50
Q

Module 6

tiling arrays

oligonucleotide placement on chip

high-density arrays

A
  • oligonucleotides must be synthesized directly on the surface of the chip
  • modified nucleotide substrates are used
  • these substrates must be light-activated before they attach to the end of a growing oligonucleotide
  • nucleotides are added one after another to the chip surface, with photolithography used to direct pulses of light onto individual positions in the array
  • Only the oligonucleotides that are light-activated will be extended by the nucleotide that is present
  • enables the highest density arrays, with up to 300,000 oligonucleotides per square centimeter, to be prepared.
51
Q

Module 6

tiling arrays

The hybridization data will not accurately locate each gene, for two reasons. Describe them.

A
  1. The transcript might be longer than the gene, so oligonucleotides whose positions lie in the upstream and downstream UTRs will also give signals.
  2. The accuracy of mapping depends on the lengths of the oligonucleotides and of the overlaps or gaps between them in the array. In the examples of array design shown in Figure 5.16, the accuracy is ±30 for the overlapping array and ±70 nucleotides for the gapped array.
52
Q

Module 6

tiling arrays

The source of the RNA preparation also has to be chosen with care to ensure that the sample contains transcripts of as many genes as possible. With a higher eukaryote such as humans, it is very difficult, if not impossible, to obtain a sample that contains transcripts of every gene in the genome, because

A
  • not all genes are expressed in a single organ or cell type, and even in one cell type the gene expression pattern varies over time.
53
Q

Module 6

tiling arrays

The RNA that is prepared can also be _____ in various ways so that only certain types of genes are targeted by the tiling array. The most commonly used fractionation procedure is to preselect RNAs that carry a _____ ______ at their 3ʹ-ends

A
  • fractionated
  • poly(A) tail
54
Q

Module 6

tiling arrays

An _____ _____ column containing oligo(dT)–cellulose (cellulose beads to which short oligonucleotides of thymidine have been attached) can therefore be used to purify the _____ ______ fraction from a eukaryotic RNA sample

A
  • affinity chromatography
  • polyadenylated mRNA
55
Q

The utility of cDNA library sequencing depends on two factors.

A
  • the frequency of the desired cDNAs in the library
  • the completeness of the individual cDNA molecules
56
Q

The utility of cDNA library sequencing depends on two factors:

the frequency of the desired cDNAs in the library

A
  • use cDNA capture or cDNA selection to enrich the library for the desired clones
  • BAC fragment is repeatedly hybridized to the pool of cDNAs, with nonhybridized cDNAs washed away and discarded
  • Because the cDNA pool contains so many different sequences, it is generally not possible to discard all the irrelevant clones by these repeated hybridizations, but it is possible to increase significantly the frequency of those clones that specifically hybridize to the DNA fragment
  • This reduces the size of the library that must subsequently be screened under stringent conditions to identify the desired clones.
57
Q

The utility of cDNA library sequencing depends on two factors:

the completeness of the individual cDNA molecules

A
  • always a chance that one or other of the strand-synthesis reactions will not proceed to completion, resulting in a truncated cDNA
  • presence of intramolecular base pairs in the RNA can also lead to incomplete copying
  • sequences of truncated cDNAs can be used to locate genes in a DNA sequence, but they may lack the sequences needed to delineate the start and end points of the gene or the exact positions of exon–intron boundaries.
58
Q

Name two approaches to mapping RNA-seq reads onto a reference genome

A
  • (A) Direct mapping of reads onto the genome sequence.
  • (B) Initial assembly of RNA-seq contigs, followed by mapping of the contigs onto the genome.
59
Q

name two approaches to mapping RNA-seq reads onto a reference genome

Direct mapping of reads onto the genome sequence.

A
  • RNA-seq is simply the application of Illumina or some other high-throughput sequencing method to a library that has been prepared from cDNA rather than directly from DNA.
  • sequence reads correspond to segments of the transcripts in the original RNA sample
  • reads can be mapped directly onto a genome sequence like the way in which DNA sequence reads are mapped onto a reference genome during a genome resequencing project
  • the difference is that the RNA reads do not give lengthy scaffolds but instead form clusters that map specifically onto the transcribed parts of the genome
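The clustering of mapped reads over transcribed regions can be pictured with a simple depth calculation. This sketch assumes the reads have already been aligned and are supplied as `(start, length)` pairs; the function names and the `min_depth` cutoff are hypothetical, and spliced reads that span introns are not handled.

```python
def coverage(genome_length, reads):
    """Per-position read depth from (start, length) alignments (0-based)."""
    depth = [0] * genome_length
    for start, length in reads:
        first = max(start, 0)
        last = min(start + length, genome_length)
        for pos in range(first, last):
            depth[pos] += 1
    return depth


def read_clusters(depth, min_depth=1):
    """Return (start, end) intervals where depth stays at or above min_depth."""
    clusters = []
    start = None
    for pos, d in enumerate(depth):
        if d >= min_depth and start is None:
            start = pos
        elif d < min_depth and start is not None:
            clusters.append((start, pos))
            start = None
    if start is not None:
        clusters.append((start, len(depth)))
    return clusters
```

Each returned interval corresponds to one cluster of reads, that is, a candidate transcribed segment such as an exon.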
60
Q

name two approaches to mapping RNA-seq reads onto a reference genome

Initial assembly of RNA-seq contigs, followed by mapping of the contigs onto the genome.

A
  • apply a de novo assembly method to the collection of RNA-seq reads and then map the assembled contigs onto the reference genome
  • advantage is that many genes are members of multigene families displaying sequence similarity
  • If individual, short RNA reads are mapped directly onto the reference genome, then some might be identical to segments of two or more members of a multigene family, complicating the mapping process
  • If, on the other hand, the complete transcript sequence is determined prior to mapping, then the members of a gene family are easily distinguished.
61
Q

genome browser

A
  • software package that enables genome annotation data to be displayed in a graphical format
  • DNA sequence forming the x-axis and the positions of genes and other interesting features marked at their appropriate map positions
62
Q

GenBank

A

Databases for the curation of DNA sequences

63
Q

Module 5

One gene vs many phenotypes or many genes vs one phenotype

A
  • The subtle phenotypes that result from gene knockouts (KOs) raise the question of whether it is more efficient to assess each KO against all phenotypes
  • or to assess all gene KOs against one phenotype at a time. The latter can be accomplished using a barcode system
64
Q

Module 5

Reporter genes are used to assess

A

where and when genes are expressed

65
Q

Module 5

selectable marker

A
  • gene introduced into a cell, especially a bacterium or to cells in culture, that confers a trait suitable for artificial selection
  • type of reporter gene used to indicate the success of a transfection or other procedure meant to introduce foreign DNA into a cell.
66
Q

Module 6

proteomics

A
  • methodology used to study proteomes
  • a collection of diverse techniques that are related only in their ability to provide information on a proteome
  • encompassing not only the identities of the constituent proteins that are present but also factors such as the functions of individual proteins and their localization within the cell
67
Q

Module 6

protein profiling or expression proteomics

A

The particular technique that is used to study the composition of a proteome

68
Q

Module 6

In order to characterize a proteome, it is first necessary to prepare pure samples of

A

its constituent proteins

69
Q

Module 6

Polyacrylamide gel electrophoresis

A
  • is the standard method for separating the proteins in a mixture
  • depending on the composition of the gel and the conditions under which electrophoresis is carried out, different chemical and physical properties of proteins can be used as the basis for their separation
  • in the presence of sodium dodecyl sulfate (SDS), which denatures proteins and confers a negative charge that is roughly proportional to the length of the unfolded polypeptide,
  • proteins separate according to their molecular masses
  • the smallest proteins migrate most quickly toward the positive electrode
70
Q

Module 6

isoelectric focusing

A
  • contains chemicals which establish a pH gradient when the electrical charge is applied
  • In this type of gel, a protein migrates to its isoelectric point, the position in the gradient where its net charge is zero
71
Q

Module 6

two-dimensional gel electrophoresis limitations

A
  • not all proteins in the proteome will be visible in the gel
  • in particular, proteins that are not soluble in an aqueous buffer, such as many of the proteins present in cell membranes, will be absent
  • special buffers and gel compositions must be used
  • several parallel experiments must be carried out if the objective is to study a proteome in its entirety
  • problems with the reproducibility
  • difficulty in devising control procedures that enable the data from such gels to be normalized when two proteomes are compared
  • alternative separation methods are being sought
72
Q

Module 6

matrix-assisted laser desorption ionization time-of-flight (MALDI-TOF)

  • If two proteomes are being compared then a key requirement is that proteins that are present in ____ _____can be identified
  • This can be done by labeling the constituents of the two proteomes with different fluorescent markers and then running them together in a _____ ______-______ _____
  • Visualization of the two-dimensional gel at different wavelengths enables the intensities of equivalent spots to be judged more accurately than is possible when ____ ____ ____ are obtained
  • A more accurate alternative is to label each proteome with an isotope coded affinity tag (ICAT). These markers can be obtained in two forms:
A
  • different amounts
  • a single two-dimensional gel
  • two separate gels
  • one containing normal hydrogen atoms and the other containing deuterium, the heavy isotope of hydrogen
    • normal and heavy versions can be distinguished by mass spectrometry
    • enabling the relative amounts of a protein in two proteomes that have been mixed together to be determined during the MALDI-TOF stage of the profiling procedure
73
Q

Module 6

matrix-assisted laser desorption ionization time-of-flight (MALDI-TOF)

The amino acid compositions of the peptides derived from a single protein can also be used to check that the ____ _____ is correct, and in particular to ensure that _____-_____ ______ have been correctly located. This not only helps to delineate the _____ of a gene in a genome; it also allows ____ _____ _____ to be identified

A
  • gene sequence
  • exon–intron boundaries
  • position
  • alternative splicing pathways
74
Q

Module 6

matrix-assisted laser desorption ionization time-of-flight (MALDI-TOF)

A
  • identifies which protein is present in a spot on a two-dimensional electrophoresis gel
  • works best with peptides of up to 50 amino acids in length; longer ones need to be broken down
  • process
    • purify the protein from a spot
    • digest it with a sequence-specific protease, such as trypsin, which cleaves proteins immediately after arginine or lysine residues
    • usually this results in a series of peptides 5–75 amino acids in length
    • Once ionized, the mass-to-charge ratio of a peptide is determined from its “time-of-flight” within the mass spectrometer as it passes from the ionization source to the detector
    • mass-to-charge ratio enables the molecular mass to be worked out, which in turn allows the amino acid composition of the peptide to be deduced
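The protease step in this process can be mimicked in silico. The sketch below applies only the rule stated on the card (cleavage immediately after arginine or lysine); the function name is hypothetical, and refinements such as reduced cleavage before proline or missed cleavages are deliberately left out.

```python
def trypsin_digest(protein):
    """Split a protein (one-letter codes) after every K or R residue."""
    peptides = []
    current = []
    for residue in protein.upper():
        current.append(residue)
        if residue in "KR":
            peptides.append("".join(current))  # cleave after K or R
            current = []
    if current:
        peptides.append("".join(current))  # C-terminal peptide
    return peptides
```

Comparing the measured peptide masses with those predicted from digests like this, for every protein encoded in the genome, is the basis for assigning a gel spot to a gene.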