Genomics And Human Variation Flashcards
Describe nuclear genes
Gene density is high in euchromatin; centromeric regions are generally non-coding
Genes are variable in size
- By far the largest human gene is the dystrophin gene, ~2.5 Mb to 2.7 Mb depending on how it is defined
- Other genes may be very (very) small
What are single copy genes?
- Unique sequences in the genome that code for a protein:
- Receptors, enzymes, hormones, structural cellular elements etc.
What are multiple gene families?
(Can be clustered together or dispersed in different genomic locations): genes with similar functions that have arisen by gene duplication
What are the types of gene families?
Classic gene families
- Multicopy genes that show a high degree of homology (e.g. HOX genes, globin genes, and genes for rRNAs, tRNAs)
Gene superfamilies
- Multicopy genes with similar function but limited gene homology (e.g. HLA genes, T-cell receptors)
What is extragenic DNA?
- Extragenic DNA constitutes the majority of the human genome
- Extragenic DNA is predominantly transcriptionally inactive
- Extragenic DNA might play a role in regulation of gene expression
- Tandemly repeated DNA sequences consist of blocks of tandem repeats of non-coding DNA. (The length of the repeat unit determines how it is named)
- The shorter the repeat length, the more polymorphic that sequence is
- Tandem repeats are inherited in a co-dominant fashion; one from each parent
- Simple sequence repeat variations are used as the basis of DNA fingerprinting
What is a Single nucleotide polymorphism?
- Formally defined as a variant that is found in at least 1% of the population
- Single base pair change between individuals
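The 1% cutoff can be written as a simple frequency check. This is a minimal sketch, assuming the threshold is applied to the variant allele frequency in a sample; the function name and the counts are hypothetical and not taken from the flashcards.

```python
def is_polymorphism(variant_allele_count, total_alleles, threshold=0.01):
    """Return True if the variant allele frequency is at least the threshold (1% by default)."""
    if total_alleles <= 0:
        raise ValueError("total_alleles must be positive")
    return variant_allele_count / total_alleles >= threshold


# Hypothetical sample: 1,000 diploid individuals = 2,000 alleles.
print(is_polymorphism(40, 2000))  # 40/2000 = 2% of alleles
print(is_polymorphism(5, 2000))   # 5/2000 = 0.25% of alleles
```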
What is a simple sequence repeat (SSR)?
This is the simplest type of repetitive sequence, and the most polymorphic.
- These are tandem repeats of 2, 3 or 4 bp, repeated many times
- Some texts call these short tandem repeats (STR), but this may lead to confusion with VNTR, so we will use the term SSR
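To make the definition concrete, a sequence can be scanned for runs of a 2 to 4 bp unit repeated in tandem. This is only an illustrative sketch: the function name, the `min_repeats` cutoff of 5, and the example sequence are assumptions, and real SSR-calling tools handle imperfect repeats that this regular expression does not.

```python
import re


def find_ssrs(seq, min_repeats=5):
    """Return (start, unit, repeat_count) for perfect tandem repeats of a 2-4 bp unit."""
    # ([ACGT]{2,4}?) captures the shortest candidate unit; \1{n,} requires
    # at least n further back-to-back copies of that same unit.
    pattern = re.compile(r"([ACGT]{2,4}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        if len(set(unit)) == 1:
            continue  # skip single-base runs such as AAAAAAAAAA
        hits.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return hits


# Hypothetical sequence containing six CA repeats.
example = "GGT" + "CA" * 6 + "TTGA"
print(find_ssrs(example))
```

The repeat count returned for each hit corresponds to the allele at that locus: individuals who differ in the number of CA units carry different alleles.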
What is a VNTR?
(Sometimes called short tandem repeats)
- These repeats are a bit longer than the SSR, maybe 5 bp, 10 bp, or hundreds of bp, repeated many times
Describe single nucleotide polymorphism
The haploid human genome is about 3 x 10^9 bp
- Approximately 1 in every 1,000 bp carries a common variation (common means that at least 1% of the population has the change; a polymorphism)
- Therefore there are about 3 million common single nucleotide changes (SNPs)
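The estimate above is a single division. A short sketch of the arithmetic, using the round figures from this card (variable names are illustrative):

```python
haploid_genome_bp = 3 * 10**9   # approximate haploid genome size in bp
bp_per_common_variant = 1000    # roughly one common variant per 1,000 bp

expected_common_snps = haploid_genome_bp // bp_per_common_variant
print(f"{expected_common_snps:,}")  # prints 3,000,000
```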
- Most SNP do not have associated phenotype
Explain the application of SSR analysis of a CA repeat linked to an autosomal-dominant disease
This is tracking an SSR, since it is a variation in tandem repeats of a dinucleotide sequence
- Dad’s genotype is most likely B,D. The 6-repeat CA allele (‘allele D’) is most likely linked to the disease in his family
(Figure: CA repeat ‘allele D’ linked to the disease)
- SSRs are very useful in forensics and paternity testing
- Easy to detect by PCR-based methods
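The reasoning behind linking an SSR allele to the disease can be sketched as a set operation: look for an allele carried by every affected relative and by no unaffected relative. The family below is hypothetical (it is not the pedigree from the original figure), and the sketch assumes full penetrance and no recombination between the marker and the disease gene.

```python
def candidate_disease_alleles(genotypes, affected):
    """Alleles carried by every affected member and by no unaffected member."""
    affected_sets = [set(g) for name, g in genotypes.items() if name in affected]
    unaffected_sets = [set(g) for name, g in genotypes.items() if name not in affected]
    if not affected_sets:
        return set()
    shared = set.intersection(*affected_sets)
    for alleles in unaffected_sets:
        shared -= alleles
    return shared


# Hypothetical CA-repeat genotypes (allele labels A-D) for one family.
family = {
    "father": ("B", "D"),
    "mother": ("A", "C"),
    "child1": ("A", "D"),
    "child2": ("B", "C"),
    "child3": ("C", "D"),
}
affected = {"father", "child1", "child3"}
print(candidate_disease_alleles(family, affected))  # prints {'D'}
```

In this made-up family, allele D travels with the disease from the father to both affected children, which is the same pattern the card describes.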
Explain what is a low copy repeat
These may be thousands (kb) to many hundreds of thousands of bp long (that is, they can be big)
- May be repeated just twice in the genome, or many times
- May cause mispairing during meiosis or mitosis
What are the long interspersed nuclear elements (LINEs)?
(LINEs ~6,000 base pairs)
- Found in large numbers in eukaryotic genomes
- They are able to make an RNA
- LINEs include a gene that encodes the enzyme reverse transcriptase
- Reverse transcriptase makes a DNA copy of the LINE or SINE RNA that can be integrated into the genome at a new site
- LINEs are therefore capable of copying themselves and may enlarge the genome. The human genome contains 100,000s of LINEs
- LINE and SINE sequence repeats may contribute to mutation by leading to unequal crossover during meiosis
- (same idea as for low copy repeat sequences)
Explain what short interspersed nuclear elements (SINEs) are
- Short sequences of under 500 bp (about 300 bp) that are found up to 1.5 million times in the genome, making up approximately 10% of the human genome
- Appear to be “normal” RNAs that were converted to DNA by reverse transcriptase and reinserted into the genome. The reverse transcriptase is hijacked from a LINE.
- The most common SINE in humans is the Alu sequence, so called because these SINEs contain a sequence that is recognized by the AluI restriction enzyme
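As an illustration of what "recognized by the restriction enzyme" means, the sketch below scans a sequence for the AluI recognition site, AGCT. The function name and the example sequence are hypothetical; the sequence is made up and is not a real Alu element.

```python
def find_alui_sites(seq, site="AGCT"):
    """Return the 0-based start positions of every AluI recognition site in seq."""
    seq = seq.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)  # keep scanning past the previous hit
    return positions


# Hypothetical sequence with two AGCT sites.
print(find_alui_sites("GGAGCTTTAGCTA"))  # prints [2, 8]
```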
What are the rare variants of SNPs?
Rare variants are “rare” in that they aren’t frequently found in the population, but there may be billions of them
- Another way of saying this: there are many more rare variants than there are common variants; each one just isn’t frequently found
- We don’t know all of these rare variants, and there will always be a new one to discover
- Rare variants are thought to contribute to human disease more than common SNPs
What are pseudogenes?
Sequences that look like real genes but aren’t functional (no protein product). Most probably arose during evolution by:
- Gene duplication and subsequent mutation
- Copying of RNA back to DNA (by viral reverse transcriptases) and reinsertion into genome
- The gene is turned off for some reason