Ch. 9 - Genomics and Systems Biology Flashcards
Genomics
The study of DNA sequences and of the organization, function, and evolution of genomes.
Sequence Tagged Sites (STSs)
A short sequence (100-500 bp) that is unique within the genome and can easily be detected, usually by PCR. Used as a marker with a known location in the genome/chromosome.
If the sequence comes from a transcribed (expressed) region of the DNA, it is called an expressed sequence tag (EST); ESTs are obtained from cDNA.
Sequence tags are mapped relative to each other by analyzing how frequently tags are found together on the same chromosome fragments.
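The co-occurrence idea can be sketched as follows. This is only an illustration: the marker names and fragment contents are made up, and the function name `cooccurrence_counts` is hypothetical. Pairs of tags that often share a fragment are likely to lie close together.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(fragments):
    """Count how often each pair of tags is found on the same fragment."""
    counts = Counter()
    for tags in fragments:
        # Sort so that each pair is always stored in the same order.
        for a, b in combinations(sorted(set(tags)), 2):
            counts[(a, b)] += 1
    return counts

# Hypothetical data: each set lists the STS markers detected on one fragment.
fragments = [
    {"STS1", "STS2"},
    {"STS2", "STS3"},
    {"STS1", "STS2"},
    {"STS3", "STS4"},
    {"STS2", "STS3"},
]

counts = cooccurrence_counts(fragments)
for pair, n in counts.most_common():
    print(pair, n)
```

Tags that never co-occur (here STS1 and STS4) are probably far apart or on different chromosomes.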
Shotgun sequencing
The genomic DNA is fragmented into smaller pieces (1-10 kb). These fragments are cloned into sequencing vectors and sequenced, often from both ends, yielding 600-1500 bp sequences from each fragment. Computerized searching for overlaps between individual sequences (DNA alignment) then merges the overlapping sequences into contigs (contiguous stretches of sequence without internal gaps), usually leaving gaps between the contigs. Using a map of STSs helps in the computations needed to assemble a genome from shotgun sequencing.
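A toy sketch of overlap-based assembly, assuming error-free reads that all come from the same strand. It uses a simple greedy merge, which is one possible strategy and not necessarily what real assemblers do; the function names and reads are made up for illustration.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if none)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the two sequences with the longest overlap."""
    seqs = list(reads)
    while True:
        best = (0, None, None)
        for i, a in enumerate(seqs):
            for j, b in enumerate(seqs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            return seqs  # whatever is left are separate contigs
        merged = seqs[i] + seqs[j][n:]
        seqs = [s for k, s in enumerate(seqs) if k not in (i, j)] + [merged]

# Hypothetical reads: three overlap, one does not (it stays a separate contig).
reads = ["ATGCGTAC", "GTACGTTA", "GTTAGC", "CCCCGGGG"]
contigs = greedy_assemble(reads)
print(contigs)
```

The read without overlaps ends up as its own contig, which is how gaps between contigs arise.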
Normally the total amount of sequence produced must be 6-8 times the size of the genome to reach about 99.8% coverage.
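The relation between sequencing depth and coverage can be approximated with a simple Poisson model (often attributed to Lander and Waterman). This assumes reads land uniformly at random on the genome, which the card does not state, so treat it as a rough sketch. The function names are hypothetical.

```python
import math

def fold_coverage(n_reads, read_length, genome_size):
    """Total sequenced bases divided by genome size (e.g. 6 means 6x)."""
    return n_reads * read_length / genome_size

def expected_fraction_covered(fold):
    """Expected fraction of the genome hit by at least one read,
    assuming reads are placed uniformly at random (Poisson model)."""
    return 1.0 - math.exp(-fold)

for c in (1, 3, 6, 8):
    print(f"{c}x -> {expected_fraction_covered(c):.2%} of the genome covered")
```

Under this model, about 6x depth already gives roughly 99.75% expected coverage, which is in line with the 6-8x rule of thumb.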
Gap filling between DNA contigs
Shotgun sequencing often results in gaps between the sequenced contigs. To bridge these gaps, probes or primers that correspond to the ends of the contigs are made, depending on the technique used. There are two main approaches:
A new library of clones/contigs is screened with end-of-contig probes from the previous contigs. Clones that hybridize to probes from two sides of a gap are isolated. The sequence of these clones should close the gap between the contigs from the first round.
PCR primers that correspond to the ends of the contigs can be used to amplify genomic DNA. If the two primers lie within a few kilobases of each other, a PCR product is made and can be sequenced.
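For the PCR approach, a minimal sketch of picking primers that point into the gap. It assumes both contigs are written 5' to 3' on the same strand with the left contig upstream of the gap, and it ignores real primer-design criteria such as melting temperature. The function names and sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]

def gap_primers(left_contig, right_contig, length=20):
    """Pick primers at the contig ends that extend into the gap.

    The forward primer copies the last bases of the left contig; the reverse
    primer is the reverse complement of the first bases of the right contig.
    """
    forward = left_contig[-length:]
    reverse = reverse_complement(right_contig[:length])
    return forward, reverse

# Hypothetical short contig ends, with a short primer length for readability.
fwd, rev = gap_primers("AAAACCCCGG", "TTGGAACC", length=4)
print(fwd, rev)
```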
Assembling genomes from cloned fragments
An alternative to whole-genome shotgun sequencing is to first clone parts of the genome into BAC or YAC vectors. Very large genomes may be broken into large fragments that are cloned into BACs or YACs (which can carry cloned fragments of 100-1000 kbp) and then shotgun sequenced. Assembling BAC and YAC contigs is much easier than compiling the genome sequence directly from small fragments.
Assembling repetitive DNA sequences
If the sequenced DNA fragments are small, there is a risk that the sequences are assembled incorrectly. These errors occur more often if there is a lot of repetitive DNA (blocks of DNA that are almost identical). One solution is to sequence DNA fragments of different sizes: small fragments (500-1000 bp) as well as fragments of 3 kbp, 8 kbp, and 20 kbp. Alternatively, one could use ultra-long DNA reads (up to 1 million bp) from MinION nanopore sequencing.
The Human Genome
The human genome contains about 3.2 billion bp, of which 2.95 billion are euchromatin (available for transcription). Only 1.5-2% of the genome encodes proteins; there are over 20 000 genes, but they give rise to over 100 000 proteins. More than half of the genome consists of repeated DNA sequences. Most RNA is non-coding; mRNA makes up only about 5% of the RNA in a cell.
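To make the proportions concrete, here is a quick calculation using only the round figures from this card (the variable names are just for illustration):

```python
GENOME_BP = 3_200_000_000       # about 3.2 billion bp
EUCHROMATIN_BP = 2_950_000_000  # about 2.95 billion bp

# 1.5% and 2% of the genome, computed with integers to avoid rounding noise.
coding_low = GENOME_BP * 15 // 1000
coding_high = GENOME_BP * 2 // 100
euchromatin_fraction = EUCHROMATIN_BP / GENOME_BP

print(f"Protein-coding: {coding_low:,} to {coding_high:,} bp")
print(f"Euchromatin: {euchromatin_fraction:.1%} of the genome")
```

So only roughly 48-64 million bp of the genome encode proteins.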
Sequence polymorphism: SNP and SSLP
DNA polymorphism: Difference in a DNA sequence at a specific locus in the genome. Can be divided into two types: SNPs and SSLPs.
SNP: Single Nucleotide Polymorphism, caused by base changes. A human has on average one SNP for every 1000-2000 bp.
SSLP: Simple Sequence Length Polymorphism, caused by length differences (indels). Refers to DNA regions with tandem repeats that vary in number between individuals. Includes VNTRs (minisatellites) and microsatellites.
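The two types of polymorphism can be illustrated with a small sketch. It assumes the sequences are already aligned for the SNP comparison and that the repeat unit is known for the SSLP comparison. The sequences and function names are invented for the example.

```python
import re

def find_snps(seq_a, seq_b):
    """Return (position, base_a, base_b) wherever two aligned sequences differ.
    Positions are 0-based; the sequences must have the same length."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [(i, a, b) for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]

def longest_repeat_run(seq, unit):
    """Largest number of consecutive copies of `unit` found in seq."""
    runs = re.findall(f"(?:{re.escape(unit)})+", seq)
    return max((len(run) // len(unit) for run in runs), default=0)

# SNP: one base differs between two individuals at the same locus.
snps = find_snps("ACGTAC", "ACCTAC")
print(snps)

# SSLP: the same locus carries a different number of CA repeats.
allele_1 = "GG" + "CA" * 4 + "TT"
allele_2 = "GG" + "CA" * 7 + "TT"
print(longest_repeat_run(allele_1, "CA"), longest_repeat_run(allele_2, "CA"))
```

The SNP changes the base at a fixed position, while the SSLP alleles differ in length because of the repeat number.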
Systems biology
With the development of powerful “omics” technologies, such as transcriptomics, genomics, metabolomics, and proteomics, new ways to analyze such data have emerged. Systems biology tries to integrate data from these technologies using bioinformatic tools, with the goal of defining the biological state of an organism within a certain environment.
Metagenomics
Analysis of DNA or RNA from environmental samples, e.g. whole biological communities. Commonly done by sequencing 16S (prokaryotic) or 18S (eukaryotic) rRNA genes.