9 - Genomics and Systems Biology Flashcards
Large-scale mapping with seq tags
RFLP = restriction fragment length polymorphism; STS = seq tagged site; EST = expressed seq tag
Large genomes are too big to be mapped using only genes and classical markers (such as VNTRs and RFLPs), so STSs and ESTs are needed as well.
EST = specialized STS derived from a region that is transcribed into mRNA. It is amplified by PCR using cDNA as the template. Several ESTs can come from the same gene, since one EST represents only a small section of a gene. To avoid such duplicates, ESTs are usually derived from the 3’ UTR.
To assemble a genome, the relative location of each STS/EST is needed.
Mapping is performed by determining how often two different STSs are found on the same DNA fragment (the fragments may be derived from a single chromosome or from the whole genome). The closer two STSs are to each other, the more likely they are to end up on the same DNA fragment.
The chromosome fragments to be examined were originally obtained by cloning large segments of DNA into high capacity vectors like YACs, but these often contain multiple segments of DNA from different locations. Consequently, radiation hybrid cells were used for STS mapping instead.
radiation hybrid = cell that contains fragments of chromosomes from another species.
How to make a radiation hybrid 101:
1) Irradiate cultured human cells with a lethal dose of X- or gamma-rays, which fragments the chromosomes.
2) Fuse the dying human cells with rodent cells.
3) The resulting hybrid cells contain random fragments of the human chromosomes.
4) Screen the hybrid cells to see which STSs are found together.
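The screening step can be sketched in Python. This is only an illustration of the idea that STSs lying close together are retained together more often; the panel contents, the marker names and the helper names (`co_retention`, `rank_pairs`) are made up for the example, and real radiation hybrid mapping uses statistical models rather than this raw fraction.

```python
from itertools import combinations

# Hypothetical screening results: for each hybrid clone,
# the set of STSs detected in it (e.g. by PCR).
panel = [
    {"STS_A", "STS_B"},
    {"STS_A", "STS_B", "STS_C"},
    {"STS_C"},
    {"STS_A", "STS_B"},
    {"STS_B", "STS_C"},
    {"STS_A"},
]

def co_retention(panel, x, y):
    """Fraction of clones carrying x or y that carry both."""
    both = sum(1 for clone in panel if x in clone and y in clone)
    either = sum(1 for clone in panel if x in clone or y in clone)
    return both / either if either else 0.0

def rank_pairs(panel):
    """List all STS pairs, most frequently co-retained first."""
    markers = sorted(set().union(*panel))
    scores = {
        (x, y): co_retention(panel, x, y)
        for x, y in combinations(markers, 2)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

for pair, score in rank_pairs(panel):
    print(pair, round(score, 2))
```

In this toy panel the A/B pair scores highest and the A/C pair lowest, which would be consistent with the order A, B, C along the chromosome.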
Shotgun sequencing
Shotgun sequencing = approach in which the genome is broken into many random short fragments for sequencing. The complete genome seq is then assembled by computerized searching for overlaps between the individual sequences.
The sequencing techniques covered in chap 8 handle fragments several hundred bp long. These sequences must be assembled to get the whole genome.
In shotgun sequencing, the genome is broken randomly into short fragments (1-2 kbp long) that are ligated into a suitable vector and partially sequenced. Typically 400-500 bp can be read from each fragment in a single sequencing run. The sequences are then assembled into contigs by computerized searching for overlaps between them, and the contigs together make up the whole genome.
contig = stretch of DNA seq that is contiguous and lacks gaps.
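The overlap search can be sketched as a greedy merge in Python. This is a toy illustration, not how any particular assembler works: the function names and the reads are invented for the example, and it ignores sequencing errors, reverse complements and repeats, all of which real assemblers must handle.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates
    # Drop reads that are fully contained in another read.
    reads = [r for r in reads if not any(r != o and r in o for o in reads)]
    while True:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return reads  # whatever is left are the contigs
        merged = reads[best_i] + reads[best_j][best_n:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)

# Hypothetical reads: three overlap, one does not.
reads = ["ATGCGTAC", "GTACGTTA", "GTTAGC", "CCCCAAAA"]
print(greedy_assemble(reads))
```

The three overlapping reads collapse into one contig, while the read with no overlap stays separate. That leftover is the kind of gap between contigs that still has to be closed.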
The gaps between the contigs must be closed. The easiest way is to re-screen the original set of clones with pairs of probes corresponding to the seq on the two sides of each gap. Clones that hybridize to both members of such a pair of probes presumably carry the DNA that bridges the gap.
Survey of the human genome
the function of a gene must usually be researched by experiments, not just from the sequence.
Sequence polymorphisms: SSLPs and SNPs
polymorphism = a difference in DNA seq between two related individual organisms. Two types: changes in the seq/nts and differences in length.
SSLP = simple sequence length polymorphism = any DNA region consisting of tandem repeats that vary in number from individual to individual, including VNTRs, microsatellites, and other tandem repeats.
SNP = single nt polymorphism = a difference of a single base in the DNA seq of two individuals
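Both kinds of polymorphism can be shown with a short Python sketch. The sequences and function names are invented for the example, and the SNP comparison assumes the two sequences are already aligned and of equal length.

```python
import re

def snp_positions(seq1, seq2):
    """Positions where two aligned, equal-length sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return [(i, a, b) for i, (a, b) in enumerate(zip(seq1, seq2)) if a != b]

def longest_repeat_count(seq, unit):
    """Number of copies of `unit` in its longest tandem run within seq."""
    runs = re.findall(f"(?:{re.escape(unit)})+", seq)
    return max((len(run) // len(unit) for run in runs), default=0)

# SNP: a single base differs between two individuals (hypothetical seqs).
print(snp_positions("ACGTACGT", "ACGTTCGT"))

# SSLP: the same CA repeat is present in different copy numbers.
person1 = "GG" + "CA" * 4 + "TT"
person2 = "GG" + "CA" * 6 + "TT"
print(longest_repeat_count(person1, "CA"), longest_repeat_count(person2, "CA"))
```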
gene identification by exon trapping
exon trapping = experimental procedure for isolating exons by using the flanking splice recognition sites that are used in RNA processing.
During exon trapping, the DNA to be analyzed must first be cloned into a special vector that can replicate both in E. coli and in suitable animal cells. The vector carries an artificial mini-gene consisting of just two exons and an intervening intron, together with a promoter and a poly(A) tail recognition site. The intron contains a multiple cloning site for cloning lengths of unknown DNA.
The pSPL vectors use a simian virus 40 (SV40) origin of replication as well as an SV40 promoter and poly(A) tail site for the mini-gene. These vectors can replicate in modified monkey cells (COS cells) that contain a defective SV40 genome integrated into the host genome.
DNA containing the exons to be trapped is cut into segments using an appropriate RE. These segments are inserted into the multiple cloning site within the intron of the pSPL vector. The plasmid is then transformed into the COS monkey cells, where the mini-gene is transcribed into a primary transcript and spliced. If an extra exon is in the middle of the mini-gene, it will be present in the spliced mRNA, which will therefore be longer. To isolate the trapped exon, the mRNA is converted to cDNA and PCR is then used to amplify the region containing the trapped exon. Thus, exon trapping can be used even if the DNA seq is unknown, although in that case we will not know the order of the exons in the original DNA. Exon trapping is best used in conjunction with seq analysis to identify and order the exons within the analyzed DNA seq.
The evolution of “junk” DNA
“junk” DNA is transcribed - suggesting that it has a function.
copy number variations (CNVs) = a form of structural variation in which one genome has either an insertion or a deletion relative to the genome of a different individual
Large variation of “junk” DNA between humans
may play a role in genetic diseases such as cancer.
Pharmacogenomics
Pharmacogenomics = a field of study that correlates an individual's genotype with that person's reaction to a pharmaceutical agent
pharmacogenetics = studying the particular genes that affect how a person reacts to drugs
Patients who possess low activity alleles metabolize the corresponding drugs much more slowly and are consequently more likely to show toxic side effects, because the drug accumulates in their system. On the other hand, an overly active metabolism would not give the drug time to work. SNP analysis of the individual patient would thus be useful for setting the dose.
Bioinformatics and computer analysis
bioinformatics = computerized analysis of large amounts of biological sequence data
data mining = the use of computer analysis to find useful information by filtering or sifting through large amounts of data
Genome mining = the use of computer analysis to find useful information by filtering or sifting through large amounts of biological sequence data
Stages of genome mining:
1) selection of the data of interest
2) pre-processing or “data cleansing”. Unnecessary info is removed to avoid slowing or clogging the analysis
3) Transformation of the data into a format convenient for analysis
4) extraction of patterns and relationships from the data
5) interpretation and evaluation
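The five stages above can be sketched as a tiny Python pipeline. Everything here is hypothetical: the record layout, the function names, the choice of GC content as the "pattern" and the 0.6 threshold are assumptions made only to show how the stages connect.

```python
def select(records, organism):
    """Stage 1: keep only the data of interest."""
    return [r for r in records if r["organism"] == organism]

def clean(records):
    """Stage 2: normalize sequences and drop ones with ambiguous bases."""
    cleaned = []
    for r in records:
        seq = r["seq"].upper().replace(" ", "").replace("\n", "")
        if seq and set(seq) <= set("ACGT"):
            cleaned.append({**r, "seq": seq})
    return cleaned

def transform(records):
    """Stage 3: convert to a format convenient for analysis."""
    return {r["id"]: r["seq"] for r in records}

def extract(seqs):
    """Stage 4: extract a pattern, here the GC fraction of each sequence."""
    return {name: (s.count("G") + s.count("C")) / len(s) for name, s in seqs.items()}

def interpret(gc, threshold=0.6):
    """Stage 5: evaluate the result, here by flagging GC-rich sequences."""
    return sorted(name for name, value in gc.items() if value >= threshold)

# Hypothetical input records.
records = [
    {"id": "s1", "organism": "E. coli", "seq": "gcgc gcat"},
    {"id": "s2", "organism": "E. coli", "seq": "ATATNNAT"},
    {"id": "s3", "organism": "E. coli", "seq": "ATATATGC"},
    {"id": "s4", "organism": "H. sapiens", "seq": "GGGGCCCC"},
]

gc = extract(transform(clean(select(records, "E. coli"))))
print(gc, interpret(gc))
```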
Other analyses that can be performed on DNA seq:
1) Searching for related sequences. The sequence is compared against a data bank. Can be used to find function or to study evolution.
2) Codon bias analysis can locate coding regions. There are differences in codon frequency between coding and noncoding DNA. Can give a reasonable first estimate of whether a sequence is coding or not.
3) Searching for known consensus seq, e.g. promoters, RBS, terminators, and binding sites for regulatory proteins.
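Points 2) and 3) can be sketched in Python. The function names, the test sequences and the mismatch tolerance are assumptions for the example; TATAAT is used as the query because it is the well-known bacterial -10 promoter consensus. A real codon bias analysis would compare the observed frequencies against a reference codon usage table for the organism, which is not done here.

```python
from collections import Counter

def codon_frequencies(seq, frame=0):
    """Relative frequency of each codon when reading seq in one frame."""
    seq = seq.upper()
    codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    counts = Counter(codons)
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()} if total else {}

def find_consensus(seq, consensus, max_mismatches=1):
    """All windows of seq that match consensus within max_mismatches."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - len(consensus) + 1):
        window = seq[i:i + len(consensus)]
        mismatches = sum(1 for a, b in zip(window, consensus) if a != b)
        if mismatches <= max_mismatches:
            hits.append((i, window, mismatches))
    return hits

# Hypothetical sequences.
print(codon_frequencies("ATGGCTGCTTAA"))
print(find_consensus("CCTATAATGG", "TATAAT", max_mismatches=0))
```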
Transcriptome = the total sum of the RNA transcripts found in a cell under any particular set of conditions
proteome = the total set of proteins encoded by an organism or the total protein complement of an organism
Systems biology
Systems biology = the integration of many different types of research on an organism, with the goal of defining the biological state of the organism within a certain environment
metabolome = the total complement of small molecules and metabolic intermediates of a cell or organism
Metagenomics and community sampling
metagenomics = the genome level study of whole biological communities
DNA or RNA is isolated from an environment and analyzed without first determining which organism it belongs to.
most often applied to microorganisms.
the human microbiome
microbiome = the total complement of microorganisms found in a particular habitat.
varies between individuals and between countries
shows the effects of diet, health, age and behaviour.
microbiome composition can affect obesity
there are variations between identical twins, but their microbiomes are more similar to each other than those of unrelated individuals
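One way to put a number on "more similar" is a dissimilarity score between abundance profiles. The sketch below uses Bray-Curtis dissimilarity, which is an assumption chosen for illustration and is not named in these notes; the taxon names and counts are entirely made up.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance profiles.

    0 means identical composition, 1 means no shared taxa.
    """
    taxa = set(a) | set(b)
    diff = sum(abs(a.get(t, 0) - b.get(t, 0)) for t in taxa)
    total = sum(a.get(t, 0) + b.get(t, 0) for t in taxa)
    return diff / total if total else 0.0

# Hypothetical read counts per taxon for three people.
twin_1 = {"taxon_1": 50, "taxon_2": 50}
twin_2 = {"taxon_1": 40, "taxon_2": 60}
unrelated = {"taxon_1": 10, "taxon_2": 70, "taxon_3": 20}

print(bray_curtis(twin_1, twin_2))
print(bray_curtis(twin_1, unrelated))
```

In this toy data the twins differ, but less than the unrelated pair does.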