Ch. 4 - Genomes and DNA Flashcards
Genome
The total genetic information that an organism (whether a living cell or a virus) possesses.
Bacterial genomes are mostly circular with densely packed genes, some of which are grouped into operons (where multiple genes are controlled by one promoter). Genome sizes vary from 0.5 to 12.0 mega-base pairs (Mbp), and bacterial genomes generally encode 600-6000 proteins.
Viral genomes are smaller (about 0.2 to over 1 Mb) than most bacterial genomes, and they often lack key genes needed for survival because viruses rely on their host cell to provide these gene products.
Organelle genomes are circular and only have some of the genes necessary for their function within the cell. Genome sizes vary from about 15 kb to 0.2 Mb.
Eukaryotic genomes are large (ranging from ∼10 Mbp in some fungi to >13 000 Mbp in lungfishes). Arabidopsis has about 115 Mbp. The human genome is about 3 300 Mbp and encodes over 20 000 genes. Eukaryotes contain much more non-coding DNA, including intergenic DNA and introns (intervening sequences).
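The statement that bacterial genes are densely packed can be checked with simple arithmetic on the figures above. This is a minimal sketch: pairing 600 proteins with a 0.5 Mbp genome and 6000 proteins with a 12.0 Mbp genome is an illustrative assumption, not data for particular species, and protein count is used as a stand-in for gene count.

```python
def genes_per_mbp(genes: int, genome_mbp: float) -> float:
    """Return the number of genes per mega-base pair (Mbp)."""
    return genes / genome_mbp

# Figures quoted above; the bacterial pairings are illustrative assumptions.
examples = {
    "small bacterium (assumed 600 genes, 0.5 Mbp)": (600, 0.5),
    "large bacterium (assumed 6000 genes, 12.0 Mbp)": (6000, 12.0),
    "human (about 20 000 genes, about 3 300 Mbp)": (20_000, 3_300),
}

for name, (genes, size) in examples.items():
    print(f"{name}: {genes_per_mbp(genes, size):.0f} genes per Mbp")
```

Even at the low end, the bacterial figures (500 genes per Mbp) are nearly 100 times the human figure (about 6 genes per Mbp).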
Operon
A cluster of genes in a genome whose expression is controlled by a single promoter and a shared set of regulatory regions. The genes in an operon usually have related functions. Found mostly in bacteria.
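A toy model makes the key property concrete: one promoter switches every gene in the operon on or off together. The class below is a hypothetical sketch (the names `Operon` and `promoter_active` are made up for illustration); it uses the genes of the bacterial lac operon (lacZ, lacY, lacA) as an example and reduces real regulation to a single on/off flag.

```python
from dataclasses import dataclass

@dataclass
class Operon:
    """Hypothetical model: several genes share one promoter/regulatory region."""
    name: str
    genes: list[str]
    promoter_active: bool = False

    def transcribe(self) -> list[str]:
        # One promoter controls every gene: either all genes are transcribed
        # together (as one polycistronic mRNA) or none of them are.
        return list(self.genes) if self.promoter_active else []

lac = Operon("lac", ["lacZ", "lacY", "lacA"])
print(lac.transcribe())  # []
lac.promoter_active = True
print(lac.transcribe())  # ['lacZ', 'lacY', 'lacA']
```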
Genome size and C-value paradox
Genome size, the number of genes, and the number of chromosomes all vary between organisms, and they vary independently of each other. None of these variables correlates directly with the complexity of the organism. However, parasitic organisms that rely on others to provide the essentials for life often have smaller genomes than related free-living organisms.
The Symbiotic theory
Proposes that the complex eukaryotic cell arose through a series of symbiotic events in which organisms of different lineages merged. Over time, the symbionts lost the ability to survive on their own and became specialized to provide a specific function for the host. The theory suggests that the organelles of higher organisms (eukaryotic cells) are derived from ancient symbiotic bacteria. According to the symbiotic theory, mitochondria are descended from ancestral bacteria that specialized in respiration, whereas chloroplasts are descended from ancestral photosynthetic bacteria.
Non-coding DNA
DNA that does not code for proteins or functional RNA molecules. It accounts for the majority of the DNA in eukaryotes, especially in higher animals and plants. Non-coding DNA explains the C-value paradox: the amount of DNA does not correlate with the number of genes, and the complexity of an organism does not relate to the amount of DNA in its genome. Regions of non-coding DNA between genes are called intergenic DNA. Non-coding regions that interrupt the coding regions of genes are called intervening sequences, or introns.
Exons and Introns
Exons: Regions of DNA that contain coding information; the segments of a gene that code for protein. Exons remain in the mRNA after processing is complete. Most eukaryotic genes consist of exons alternating with introns.
Introns: Regions of non-coding DNA; the segments of a gene that do not code for protein. Introns are transcribed as part of the primary transcript and are then removed by splicing. In lower, single-celled eukaryotes, introns are relatively rare and often quite short. In higher eukaryotes, introns are often longer than the exons.
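Removing the introns and joining the exons can be sketched as simple string slicing. The transcript and exon coordinates below are invented for illustration; the toy introns (lower case) start with GU and end with AG, as most real introns do.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join the exon segments of a primary transcript, dropping the introns.

    `exons` holds hypothetical 0-based, end-exclusive (start, end)
    coordinates, listed in transcript order.
    """
    return "".join(pre_mrna[start:end] for start, end in exons)

# Toy primary transcript: exons in upper case, introns in lower case.
primary = "AUGGCC" + "guaaag" + "UUCGGA" + "gucag" + "UAA"
exon_coords = [(0, 6), (12, 18), (23, 26)]
print(splice(primary, exon_coords))  # AUGGCCUUCGGAUAA
```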
Repeated sequences
DNA sequences that are repeated multiple times throughout the genome, also called repetitive sequences. When the repeated sequences follow each other directly, they are called tandem repeats. When the copies are scattered separately around the genome, they are called interspersed repeats. About 50% of the human genome consists of repeated sequences.
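A tandem repeat can be spotted by checking whether a short unit is followed directly by copies of itself. The function below is a simple illustrative scan (the name `longest_tandem_run` and the test sequence are hypothetical); real repeat-finding tools also tolerate mismatches between copies, which this sketch does not.

```python
def longest_tandem_run(seq: str, unit_len: int) -> tuple[str, int, int]:
    """Find the longest run of a unit of length `unit_len` repeated back to back.

    Returns (unit, copies, start). A simple scan, for illustration only.
    """
    if unit_len <= 0:
        raise ValueError("unit_len must be positive")
    best = ("", 0, -1)
    for start in range(len(seq) - unit_len + 1):
        unit = seq[start:start + unit_len]
        copies = 1
        pos = start + unit_len
        # Count how many times the unit follows itself directly.
        while seq[pos:pos + unit_len] == unit:
            copies += 1
            pos += unit_len
        if copies > best[1]:
            best = (unit, copies, start)
    return best

dna = "GATTACA" + "CAG" * 5 + "TTGCA"
print(longest_tandem_run(dna, 3))  # ('CAG', 5, 7)
```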
Consensus sequence
Idealized base sequence consisting of the base most often found at each position. It is derived by aligning multiple related sequences and comparing how often each base appears at each position; the sequence that best represents the whole set is the consensus sequence. Consensus sequences are used to describe many different DNA motifs, including transcription factor binding sites, RNA polymerase binding sites, enhancer elements and other protein-binding sites. They can also describe conserved protein domains; a protein consensus sequence gives the most common amino acid at each position instead of a nucleotide.
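Deriving a consensus sequence amounts to a column-by-column majority vote over aligned sequences. In this sketch the five input sites are invented; their consensus happens to be TATAAT, the well-known consensus of the bacterial -10 promoter region. Ties are broken arbitrarily, by whichever tied base appears first in the column.

```python
from collections import Counter

def consensus(seqs: list[str]) -> str:
    """Return the most frequent base at each position of aligned, equal-length sequences."""
    if not seqs or len({len(s) for s in seqs}) != 1:
        raise ValueError("need one or more aligned sequences of equal length")
    # zip(*seqs) yields the columns of the alignment; most_common(1) picks the
    # majority base (ties go to the base encountered first in the column).
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*seqs))

sites = ["TATAAT", "TATAAA", "TTTAAT", "TATGAT", "CATAAT"]
print(consensus(sites))  # TATAAT
```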
Pseudogenes
A small category of repeats found in eukaryotic cells. Present in only one or two copies, and can be located next to or far away from the original, functional version of the gene. Some pseudogenes are defective duplicates of genuine genes whose defects prevent them from being expressed. Other pseudogenes are expressed, but their mRNA regulates expression of other genes rather than coding for proteins. Account for only a tiny fraction of the DNA.
Moderately repetitive sequences and LINEs
DNA sequences that exist in hundreds or thousands of copies. In the human genome, 25% of the total DNA falls into this category. This includes multiple copies of highly used genes, like those for ribosomal RNA, as well as non-functional stretches of DNA that are repeated many times. In every life form studied to date, rRNA genes are arranged in linear clusters in the genome. These are expressed as polycistronic RNA and then processed into separate rRNAs.
Long INterspersed Elements (LINEs): Long sequences found in multiple copies that make up much of the moderately repetitive non-coding DNA of mammals. Thought to be derived from retrovirus-like ancestors. A complete LINE-1 (L1) element contains about 7000 bp, although most individual L1 elements are shorter. LINEs are scattered throughout the genome.
Highly repetitive DNA and SINEs
DNA sequences that exist in hundreds of thousands to millions of copies. Makes up about 10% of human DNA.
Short INterspersed Elements (SINEs): Short sequences found in multiple copies that make up much of the highly or moderately repetitive DNA of mammals. As far as is known, almost all of these sequences are non-functional. The best known SINE is the 300 bp Alu element. About 6-8% of human DNA consists of copies of the Alu element. Recent studies suggest that RNA transcribed from Alu elements can bind RNA polymerase II and repress gene transcription. SINEs are scattered throughout the genome.
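The figures above (a human genome of about 3 300 Mbp, a 300 bp Alu element making up 6-8% of the DNA) allow a rough estimate of how many Alu copies there are. This sketch assumes every copy is full length; since many real copies are truncated, the true count is likely higher than this estimate.

```python
GENOME_BP = 3_300 * 10**6  # about 3 300 Mbp, the human genome size quoted above
ALU_BP = 300               # length of one full-length Alu element

def alu_copies(fraction: float) -> int:
    """Copies needed to cover `fraction` of the genome, assuming full-length elements."""
    return round(fraction * GENOME_BP / ALU_BP)

for f in (0.06, 0.08):
    print(f"{f:.0%} of the genome -> about {alu_copies(f):,} Alu copies")
```

This gives roughly 660 000 to 880 000 copies, consistent with the "hundreds of thousands to millions" range for highly repetitive DNA.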
Satellite DNA or Tandem repeats
Highly repetitive non-coding DNA of eukaryotic cells that is found as long clusters of tandem repeats. Satellite DNA is inert and permanently coiled tightly into heterochromatin. A large proportion of satellite DNA, and therefore of heterochromatin, is located around the centromeres of human chromosomes, suggesting that it serves some structural role. These centromeric repeats are called alpha satellite (alphoid) DNA. The amount of satellite DNA is highly variable.
Unequal crossing over
Long arrays of tandem repeats tend to misalign when homologous chromosomes pair up for recombination during meiosis. A crossover between misaligned arrays (unequal crossing over) then produces one chromosome with a shorter and one with a longer array of repeats. Thus, the exact number of tandem repeats varies from individual to individual within the same population, and even between the two chromosomes of a homologous pair.
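The bookkeeping of unequal crossing over fits in a small function. This simplified sketch tracks only repeat counts; `i` and `j` are hypothetical parameters marking where the crossover falls in each of the two paired repeat arrays (equal values mean correct alignment).

```python
def unequal_crossover(a: int, b: int, i: int, j: int) -> tuple[int, int]:
    """Repeat counts of the two recombinant products.

    a, b: repeat copies on the two homologous chromosomes.
    i, j: the crossover falls after repeat i on the first homolog,
          which is (mis)paired with repeat j on the second.
    """
    if not (0 <= i <= a and 0 <= j <= b):
        raise ValueError("crossover point must lie within each repeat array")
    product1 = i + (b - j)  # start of homolog 1 joined to the end of homolog 2
    product2 = j + (a - i)  # start of homolog 2 joined to the end of homolog 1
    return product1, product2

print(unequal_crossover(10, 10, 5, 5))  # (10, 10): aligned, no change
print(unequal_crossover(10, 10, 7, 3))  # (14, 6): misaligned by 4 repeats
```

The total number of repeats is conserved (14 + 6 = 20); unequal crossing over only redistributes them, creating the length variation behind VNTR alleles.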
Palindromes and Inverted repeats
Palindrome: A sequence that reads the same backwards as forwards. In DNA, which is double stranded, two types of palindromes are theoretically possible.
Mirror-like palindromes: Similar to palindromes in ordinary text; the sequence is the same when read forwards and backwards along the same strand. Because the two strands are complementary, if one strand is palindromic, the complementary strand must be palindromic too.
5'-ATGCCGTA-3'
3'-TACGGCAT-5'
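Both properties of a mirror-like palindrome can be checked directly: the strand equals its own reverse, and so does its complement. A minimal sketch, assuming sequences contain only A, C, G and T:

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def is_mirror_palindrome(strand: str) -> bool:
    """True if the strand reads the same in both directions."""
    return strand == strand[::-1]

top = "ATGCCGTA"
bottom = top.translate(COMPLEMENT)  # the base-paired strand, TACGGCAT
print(is_mirror_palindrome(top), is_mirror_palindrome(bottom))  # True True
```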
Inverted repeat: Sequence reads the same forwards on one strand as it reads backwards on the complementary strand. Much more common and of major biological significance. Inverted repeats are extremely important as recognition sites on the DNA for the binding of a variety of proteins. Many regulatory proteins as well as restriction and modification enzymes recognize inverted repeats.
5'-GGATATCC-3'
3'-CCTATAGG-5'
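An inverted repeat is a sequence equal to its own reverse complement, so it can be tested, and searched for, in a few lines. A minimal sketch assuming A/C/G/T-only input; the function names are illustrative. GATATC, found in the example, is the recognition site of the restriction enzyme EcoRV.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Complementary strand, read 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]

def is_inverted_repeat(seq: str) -> bool:
    """True if the sequence equals its own reverse complement."""
    return seq == reverse_complement(seq)

def find_inverted_repeats(seq: str, length: int) -> list[int]:
    """0-based start positions of every window of `length` bases that is an inverted repeat."""
    return [i for i in range(len(seq) - length + 1)
            if is_inverted_repeat(seq[i:i + length])]

print(is_inverted_repeat("GGATATCC"))          # True
print(find_inverted_repeats("TTGATATCAA", 6))  # [2]  (GATATC)
```

An odd-length sequence can never pass this test, because its middle base would have to be its own complement.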
VNTR alleles and DNA fingerprinting
Due to unequal crossing over, the number of repeats at a given VNTR (variable number tandem repeat) locus varies among individuals.
Although VNTRs most often are non-coding DNA and not true genes, the different versions of them are referred to as alleles. An allele is a particular version of a gene, or more broadly, a particular version of any locus on a molecule of DNA.
Some hyper-variable VNTRs may have as many as 1000 different alleles and give unique patterns for almost every individual. This quantitative variation can be used to identify individuals by DNA fingerprinting.
DNA fingerprints are patterns of multiple DNA bands, practically unique to each individual, produced by cutting DNA with restriction enzymes, separating the fragments by gel electrophoresis, and visualizing them, usually by Southern blotting (or simply by staining with a dye).
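The link between VNTR alleles and band patterns can be sketched as follows: each allele gives a restriction fragment whose length grows with its repeat count, and an individual's fingerprint is the set of band sizes across loci. All loci, repeat-unit lengths, flanking lengths and genotypes below are hypothetical.

```python
# Hypothetical VNTR loci: (repeat unit length in bp, flanking DNA within the restriction fragment in bp)
LOCI = {"locus1": (16, 200), "locus2": (30, 150)}

def band_sizes(genotype: dict[str, tuple[int, int]]) -> set[tuple[str, int]]:
    """Restriction-fragment sizes for a diploid genotype.

    `genotype` maps locus -> (repeat count on allele 1, repeat count on allele 2).
    """
    bands = set()
    for locus, copies in genotype.items():
        unit, flank = LOCI[locus]
        for n in copies:
            bands.add((locus, flank + n * unit))  # fragment length grows with repeat count
    return bands

person_a = {"locus1": (12, 25), "locus2": (8, 8)}
person_b = {"locus1": (12, 31), "locus2": (5, 8)}
print(sorted(band_sizes(person_a)))  # [('locus1', 392), ('locus1', 600), ('locus2', 390)]
print(band_sizes(person_a) == band_sizes(person_b))  # False
```

A homozygous locus (locus2 in person_a) yields a single band, while a heterozygous locus yields two; the more loci are compared, the less likely two individuals share the whole pattern.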