Moderately repetitive fraction of the human genome Flashcards
What is the major contributor to this fraction?
Multigene families
Are gene families common in protein coding genes?
Very common: 50-75% of protein-coding genes belong to a family
What is included in a gene family?
Protein-coding genes, pseudogenes, and gene fragments that share sequence similarity
What are the 3 types of gene families?
Classical, domain-based, and motif-based
What is a classical gene family?
A gene family whose members show a high degree of sequence homology over most of the gene length, especially in the coding region
Are genes that are part of the same family clustered in the same area or dispersed through the genome?
Can be either
What type of gene family are the globin and Pax gene families?
Classical
What is a domain based gene family?
A gene family whose members have a high degree of sequence similarity within a protein domain
How similar will the gene sequences between members of a domain-based family be?
Highly similar within the domain, but much lower everywhere else
What type of gene family are the Hox genes?
Domain-based
What is a motif-based gene family?
A gene family whose members have a high degree of sequence similarity within a short protein motif of conserved amino acids (AA) that has a specific function
How similar will the gene sequences between members of a motif-based family be?
High homology across the motif, but little elsewhere
What type of gene family do RNA helicases with a DEAD box belong to?
Motif-based
What is a pseudogene?
A defective copy of a gene: much of the sequence is intact, but it has accumulated so many mutations that it no longer functions
What are gene fragments?
A smaller duplicated region of a gene, like a single exon
What are the two types of pseudogenes?
Processed and unprocessed/classical
How do processed pseudogenes happen?
A processed mRNA is reverse transcribed into cDNA, and that copy is inserted into the genome
What do processed pseudogenes look like?
Only exon sequence, with no introns or regulatory sequences. May have a poly(A) tail
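The cards above describe how a processed pseudogene arises. The toy Python sketch below models that path under simplifying assumptions: the gene model, its sequences, and the 15-nt poly(A) length are all hypothetical, and transcription is reduced to joining exon sequence. Because the copy is made from spliced mRNA, the promoter and introns never reach the pseudogene, while the poly(A) tail does.

```python
# Toy model of processed-pseudogene formation.
# All sequences and the poly(A) length are hypothetical, for illustration only.

GENE = [
    ("promoter", "TATAAAAG"),      # upstream regulatory sequence (not transcribed)
    ("exon", "ATGGCC"),
    ("intron", "GTAAGTTTTTCAG"),
    ("exon", "AAACTG"),
    ("intron", "GTGAGCCCAG"),
    ("exon", "TGGTAA"),
]
POLY_A_LEN = 15  # hypothetical tail length


def genomic_sequence(gene):
    """The parent gene as it sits in the genome: every element in order."""
    return "".join(seq for _, seq in gene)


def mature_mrna(gene):
    """Transcription, splicing and polyadenylation, written in RNA letters.

    The promoter is not transcribed and introns are spliced out,
    so only exon sequence remains, followed by a poly(A) tail.
    """
    exons = "".join(seq for kind, seq in gene if kind == "exon")
    return exons.replace("T", "U") + "A" * POLY_A_LEN


def reverse_transcribe(mrna):
    """Sense strand of the cDNA copy: the mRNA sequence with U written as T."""
    return mrna.replace("U", "T")


pseudogene = reverse_transcribe(mature_mrna(GENE))
print("parent gene:", genomic_sequence(GENE))
print("processed pseudogene:", pseudogene)
```

The parent gene still carries its promoter and both introns; the inserted copy is just the joined exons plus the tail.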
Are processed pseudogenes expressed?
Not usually
Where will processed pseudogenes typically integrate?
At random sites, usually far away from any promoter or regulatory sequences. Most of these copies are never expressed and accumulate mutations, becoming retropseudogenes
Where could processed pseudogenes rarely integrate?
Next to a promoter and regulatory elements at the right spacing. These copies may be expressed, in which case they are called retrogenes
If a pseudogene gets expressed, will it encode a protein?
Sometimes, but it may instead be expressed only as an RNA transcript
How do unprocessed pseudogenes happen?
Gene duplication events
What do unprocessed pseudogenes look like?
Exons, introns, and some flanking sequence
Are unprocessed pseudogenes expressed?
Sometimes. But they’ve accumulated so many mutations that they usually aren’t functional
What are the two main classes of transposable elements (TEs)?
TEs that move via an RNA intermediate (copy and paste) and TEs that move via a DNA intermediate (cut and paste)
How can transposable elements that move via an RNA intermediate be classified further?
Whether they can encode their own reverse transcriptase or not
What type of transposable elements that move through an RNA intermediate can’t encode their own reverse transcriptase?
Non-autonomous TEs of the non-viral family, such as Alu sequences
How can transposable elements that move via an RNA intermediate AND encode their own reverse transcriptase be classified further?
Whether they have long terminal repeats (LTR) or not
What type of transposable element moves through an RNA intermediate, encodes its own reverse transcriptase, and has long terminal repeats?
Retrovirus-like (LTR) retrotransposons
What type of transposable element moves through an RNA intermediate, encodes its own reverse transcriptase, and doesn’t have long terminal repeats?
LINE elements
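The branching questions in the last few cards form a simple decision tree. A minimal Python sketch of it, where the function name and the three boolean inputs are hypothetical labels for the questions asked on the cards (RNA intermediate? own reverse transcriptase? long terminal repeats?):

```python
def classify_te(rna_intermediate: bool, encodes_rt: bool = False, has_ltr: bool = False) -> str:
    """Walk the TE classification from the cards, one question at a time."""
    if not rna_intermediate:
        # Cut and paste through a DNA intermediate
        return "DNA transposon"
    # Copy and paste through an RNA intermediate (retrotransposon)
    if not encodes_rt:
        return "non-autonomous retrotransposon (e.g. Alu)"
    if has_ltr:
        return "retrovirus-like (LTR) retrotransposon"
    return "LINE"


print(classify_te(rna_intermediate=True, encodes_rt=True, has_ltr=False))
```

The order of the checks mirrors the cards: the LTR question only matters once an element is known to move through RNA and to encode its own reverse transcriptase.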
What are LINE elements?
Long interspersed nuclear elements
Are LINE elements autonomous?
Yes, they encode their own reverse transcriptase
What do the two ORFs in a full-length LINE element encode?
1. p40, a protein of unknown function
2. A reverse transcriptase with endonuclease activity
How large is a full-length LINE element, and what does it contain?
6.1 kb. Includes the two ORFs, an internal promoter, a polyA tail, and 7-20 bp direct repeats on each end
Where are LINE elements usually found in the human genome?
G-bands - A/T rich regions with few genes
What are Alu elements?
A type of SINE (short interspersed nuclear element) that arose from 7SL RNA
What do Alu elements look like?
100-400 bp, internal promoter but no ORFs, flanked by 6-18 bp direct repeats
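The flanking direct repeats on Alu and LINE elements are target-site duplications: the insertion site is cut with a staggered break, the element is joined in, and filling in the single-stranded gaps copies the short target sequence onto both sides. A toy Python sketch, where the host sequence, the "[Alu]" placeholder and the 7-bp duplication length are hypothetical values (7 bp falls inside the ranges on these cards):

```python
def insert_with_tsd(host: str, site: int, tsd_len: int, element: str) -> str:
    """Insert element into host at a staggered cut, duplicating the target site.

    The staggered cut spans host[site:site + tsd_len]. After the element is
    joined in and the gaps are filled, that stretch appears on both sides of
    the element as a direct (same-orientation) repeat.
    """
    target = host[site:site + tsd_len]
    return host[:site] + target + element + target + host[site + tsd_len:]


host = "CCCCGATTACAGGGG"   # hypothetical insertion locus
element = "[Alu]"          # hypothetical placeholder for the inserted element
result = insert_with_tsd(host, site=4, tsd_len=7, element=element)
print(result)
```

Before insertion the target "GATTACA" occurs once; afterwards it flanks the element in the same orientation on both sides.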
Are LINEs common in humans?
Very: about 850 000 of them in humans, making up 21% of our genome
Are Alu elements common in humans?
Very: ~1.5 million of them, making up about 13% of our genome
What do LTR retrotransposons strongly resemble?
Retroviruses
What are the 4 basic structural elements of a retrovirus genome?
- gag - encodes RNA binding proteins/virus core
- pol - encodes reverse transcriptase and processing enzymes
- env - outer coat proteins
- flanking long terminal repeat sequences
What genomic element of retroviruses are LTR retrotransposons missing?
env - doesn’t encode any viral coat proteins
How much of our genome is made up by LTR retrotransposons?
8%
What are DNA transposons?
Transposable elements that move by a cut-and-paste mechanism via a DNA intermediate
What do DNA transposons look like?
ORF encoding transposase, flanked by short inverted repeats
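Unlike the direct repeats flanking LINE and Alu elements, the inverted repeats at the ends of a DNA transposon are reverse complements of each other: reading the right end backwards on the opposite strand gives the left end. A small Python sketch of that check, where the repeat and ORF sequences are hypothetical stand-ins:

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A<->T, C<->G, read backwards)."""
    return seq.translate(COMPLEMENT)[::-1]


def has_terminal_inverted_repeats(element: str, tir_len: int) -> bool:
    """True if the first tir_len bases equal the reverse complement of the last tir_len."""
    return element[:tir_len] == reverse_complement(element[-tir_len:])


tir = "CAGGTTG"              # hypothetical terminal inverted repeat
orf = "ATGAAACCCGGGTAA"      # hypothetical stand-in for the transposase ORF
transposon = tir + orf + reverse_complement(tir)
print(has_terminal_inverted_repeats(transposon, 7))        # inverted repeats at the ends
print(has_terminal_inverted_repeats(tir + orf + tir, 7))   # direct repeats instead
```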
How much of our genome is made up of DNA transposons?
About 3%; there are about 300 000 of them
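Dividing each element's share of the genome by its copy number gives a rough average copy length. The Python sketch below uses the figures from these cards and assumes a haploid genome of about 3.1 Gb (an assumption, not stated on the cards). The LINE average comes out well under 1 kb, far short of the 6.1 kb full-length element, which is consistent with most LINE copies being truncated; the Alu average of roughly 270 bp falls inside the 100-400 bp range given for Alu elements.

```python
GENOME_BP = 3.1e9  # assumption: approximate haploid human genome size in bp

# (copy number, fraction of the genome), taken from the cards
ELEMENTS = {
    "LINE": (850_000, 0.21),
    "Alu": (1_500_000, 0.13),
    "DNA transposon": (300_000, 0.03),
}


def mean_copy_length(copies: int, fraction: float, genome_bp: float = GENOME_BP) -> float:
    """Average length in bp = (fraction of genome x genome size) / number of copies."""
    return fraction * genome_bp / copies


for name, (copies, fraction) in ELEMENTS.items():
    print(f"{name}: ~{mean_copy_length(copies, fraction):.0f} bp per copy on average")
```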
Are Alu elements evenly distributed across the genome?
No, they are more likely to be found in R-bands, which have a higher G/C content
Are LTR retrotransposons evenly distributed across the genome?
Fairly evenly; their density is similar across regions of different G/C content
Are DNA transposons evenly distributed across the genome?
Fairly evenly; their density is similar across regions of different G/C content