Structure of Genes and the Human Genome Flashcards
Approximately how much of our genome encodes proteins or structural RNAs? What do these include?
30%
- solitary protein coding genes
- duplicated protein coding gene families
- tandemly repeated structural genes (ribosomal RNAs, 5S RNA)
What makes up approximately 50% of the genome?
repetitive DNA
- simple tandem repeats (one after another)
- interspersed repeats - mobile genetic elements
About how much of the genome does other non-repetitive DNA compose?
20%
About how many protein coding genes are there in humans?
23,000
What three components compose the human genome?
- genes encoding proteins or structural RNAs (30%)
- repetitive DNA (50%)
- non-repetitive DNA (20%)
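As a quick arithmetic check on the three components above, the Python sketch below confirms that the percentages sum to 100 and converts each to an approximate length. The genome size of 3.2 billion base pairs is an assumption that is not stated on these cards; the percentages are the ones given above.

```python
# Approximate haploid genome size in base pairs (assumption, not from the cards).
GENOME_BP = 3_200_000_000

# Percentages exactly as given on the cards.
composition = {
    "genes encoding proteins or structural RNAs": 30,
    "repetitive DNA": 50,
    "non-repetitive DNA": 20,
}


def component_lengths(percentages, genome_bp=GENOME_BP):
    """Return the approximate number of base pairs for each component."""
    return {name: genome_bp * pct // 100 for name, pct in percentages.items()}


total_percent = sum(composition.values())
lengths = component_lengths(composition)

print(f"total: {total_percent}%")
for name, bp in lengths.items():
    print(f"{name}: about {bp / 1e9:.2f} billion bp")
```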
What are solitary protein coding genes?
genes present in only one copy in the genome; their expressed regions (exons) are separated by large non-expressed regions (introns)
On average, what percent of the human genome length contributes to mRNA?
5%
Blocks of very short repeated sequences (usually 5-10 nucleotides per repeat) make up approximately what percent of the human genome?
5-10%
What does the largest human gene encode?
the protein called dystrophin
- gene is more than 2.4 million nucleotides in length
- contains more than 80 exons
dystrophin normally links the cytoskeleton of a muscle fiber to the cell membrane
*defects in dystrophin cause Becker and Duchenne Muscular Dystrophies
satellite DNA (recognize word, don't care too much)
simple sequence tandem repeats
LINE
- long interspersed element
- remnants of transposable elements (sequences that have the ability to move into and out of genomic DNA)
- often not full length
Insertion of a LINE into a gene can cause what?
genetic disease
Certain hemophilia patients have novel insertions of _____ in the Factor ____ (clotting factor) gene. This mutation was not present in the parental DNA, indicating that it resulted from a recent insertion.
LINEs
VIII
SINEs
- short interspersed elements
- most famous/abundant SINE is Alu (about 300 nucleotides long)
- Alu is related to the 7SL RNA, which is part of the normal signal recognition particle involved in protein secretion
- often not full length
On average, an Alu sequence occurs once every how many kb of genomic DNA?
5-10 kb
*some Alu elements are also mobile and can be inserted into the genome at random locations, causing disease
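To get a feel for what "one Alu every 5-10 kb" means, the sketch below works out the copy number and genome share implied by that spacing. It assumes a genome of 3.2 billion base pairs (not stated on these cards) and treats every copy as a full 300 nucleotides, so the results are rough figures implied by the card's numbers, not measured counts.

```python
GENOME_BP = 3_200_000_000  # assumed genome size in base pairs (not from the cards)
ALU_LENGTH_BP = 300        # approximate Alu length given on the card


def implied_copies(spacing_kb, genome_bp=GENOME_BP):
    """Copies implied by one element every `spacing_kb` kilobases."""
    return genome_bp // (spacing_kb * 1000)


def implied_percent(copies, element_bp=ALU_LENGTH_BP, genome_bp=GENOME_BP):
    """Percent of the genome occupied if every copy were full length."""
    return 100 * copies * element_bp / genome_bp


fewest = implied_copies(10)  # one Alu every 10 kb
most = implied_copies(5)     # one Alu every 5 kb

print(f"implied copies: {fewest:,} to {most:,}")
print(f"implied share: {implied_percent(fewest):.1f}% to {implied_percent(most):.1f}%")
```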
HERV-W
- retroviral genome (HERV = human endogenous retrovirus)
- human genome contains numerous copies
- one of the viral genes, called syncytin, is active in the trophoblast layer in the human embryo
- syncytin function is essential for implantation of the embryo
How much of the human genome is made up of retrovirus-related sequences?
about 8% of the human genome consists of remnants of retrovirus-related sequences
What is syncytin (a viral gene) necessary for?
essential for implantation of embryo
How have duplicated protein coding genes arisen?
gene families (2+ related copies, often located together in the genome) arose through duplications that occurred during unequal crossing-over (recombination) events
- more recently duplicated genes are more similar in sequence because they’ve had less time to mutate
- it is possible for entire genes to be duplicated, or for individual regions (individual exons) to be duplicated within a single gene
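The point that recently duplicated genes are more similar can be made concrete with a percent-identity calculation. The sketch below is a minimal illustration, assuming the sequences are already aligned and of equal length; the three 20-nucleotide sequences are made up for the example and are not real genes.

```python
def percent_identity(seq_a, seq_b):
    """Percent of matching positions between two equal-length, pre-aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be the same length (already aligned)")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100 * matches / len(seq_a)


# Hypothetical sequences, invented for illustration only.
original = "ATGGCCAAGTTCGATCTGAA"
recent_copy = "ATGGCCAAGTTCGATCTGAT"  # 1 difference from the original
older_copy = "ATGACCAAGTTAGATCCGAT"   # 4 differences from the original

recent_identity = percent_identity(original, recent_copy)
older_identity = percent_identity(original, older_copy)

print(f"recent duplicate: {recent_identity:.0f}% identical")
print(f"older duplicate:  {older_identity:.0f}% identical")
```

The older copy has had more time to accumulate mutations, so its identity to the original is lower.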
It is possible for individual regions (individual exons) to be duplicated within a single gene. What is an example of this?
collagen
What are pseudogenes?
- genes that have been duplicated and then lost their function
- not transcribed into mRNA
- often carry mutations in the regulatory regions that control transcription
- once the gene no longer makes functional protein, it is no longer under any selective pressure and mutations rapidly accumulate
tandemly repeated structural RNA genes
- these genes have exactly the same sequence and exactly the same orientation
- usually high numbers of copies
Interspersed repeats
- longer repeated sequences found scattered throughout the genome (not in tandem arrays like simple repeats)
- not identical, but similar
- common interspersed repeats are:
- LINE elements (also called L1 sequences)
- SINE elements
- inactive remnants of retroviral genomes
short tandem repeats (STRs)
- sometimes called VNTRs (variable number of tandem repeats)
- usually repeats of 3 or 4 nucleotides that occur either within genes or at various other locations in the genome
- the basis of DNA fingerprinting (most people differ in the repeat number at a specific location)
- contribute to a number of different human diseases, including Huntington disease and Fragile-X syndrome
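Because DNA fingerprinting compares repeat numbers at a given location, a small counting routine shows the idea. The sketch below finds the longest run of back-to-back copies of a repeat unit in a sequence; the repeat unit, flanking sequences, and the two "people" are hypothetical, chosen only for illustration.

```python
def longest_tandem_run(sequence, unit):
    """Return the largest number of back-to-back copies of `unit` in `sequence`."""
    if not unit:
        raise ValueError("unit must not be empty")
    best = 0
    for start in range(len(sequence)):
        count = 0
        pos = start
        # Keep stepping forward one repeat unit at a time while it still matches.
        while sequence.startswith(unit, pos):
            count += 1
            pos += len(unit)
        best = max(best, count)
    return best


# Hypothetical alleles at one location: same flanking DNA, different repeat number.
person_a = "CCTG" + "GATA" * 7 + "TTCA"
person_b = "CCTG" + "GATA" * 11 + "TTCA"

repeats_a = longest_tandem_run(person_a, "GATA")
repeats_b = longest_tandem_run(person_b, "GATA")

print(f"person A: {repeats_a} repeats, person B: {repeats_b} repeats")
```

Two samples that differ in repeat number at the same location, as here, can be told apart by that difference alone.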
Huntington Disease
- progressive neurodegenerative disease
- triplet repeat region in the coding region of the Huntington disease gene (called huntingtin)
- most people have between 1 and 13 copies of the sequence CAG, encoding a short polymer of the amino acid glutamine
- in diseased people, 30 to over 100 repeats; the more copies, the earlier the onset of the disease
- the poly-glutamine stretch causes aggregates of huntingtin protein, which interfere with normal function of the cell
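The link between CAG repeats and a poly-glutamine stretch can be sketched in a few lines: each CAG codon is read as one glutamine (Q). The code below is an illustrative sketch only. The input sequences are made up, the reading frame is assumed to start at the first nucleotide, and the labels reuse the repeat ranges given on this card, leaving counts between those ranges unclassified.

```python
def cag_tract_length(coding_dna):
    """Count consecutive in-frame CAG codons, reading from the first nucleotide."""
    count = 0
    for i in range(0, len(coding_dna) - 2, 3):
        if coding_dna[i:i + 3] == "CAG":
            count += 1
        else:
            break
    return count


def polyglutamine(coding_dna):
    """Return the glutamine stretch (one 'Q' per CAG codon) encoded by the tract."""
    return "Q" * cag_tract_length(coding_dna)


def card_category(repeat_count):
    """Label a repeat count using only the ranges stated on the card."""
    if 1 <= repeat_count <= 13:
        return "range given for most people"
    if repeat_count >= 30:
        return "range given for people with the disease"
    return "not covered by the card"


# Hypothetical tracts followed by one non-CAG codon.
short_tract = "CAG" * 10 + "CCG"
long_tract = "CAG" * 45 + "CCG"

for tract in (short_tract, long_tract):
    n = cag_tract_length(tract)
    print(n, "repeats:", card_category(n))
```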
Fragile-X syndrome
- major cause of intellectual disability in males (1 in 6000 births)
- the sequence CGG is repeated approximately 60 times in the 5’ untranslated region of the Fragile-X gene
- people with disease have more than 200 repeats
- the sequence CGG can be methylated, leading to reduction of transcription of the gene, so that effectively no fragile-X protein is synthesized
other non-repetitive sequences
the remainder of the genome consists of unique (non-repeated) sequences
- there can be enormous sequences of DNA between functional genes and much of this is considered to be ‘spacer’ DNA, with no obvious function
(e.g., around the lysozyme gene)