Genome Variation Flashcards
What is the human genome, and what are macro- and micro-level variation?
- 23 pairs of chromosomes.
- 3 billion base pairs (20,000 genes).
- About 2% of the genome codes for protein (the exome).
- Major macro-level differences/variation are generally associated with disease (aneuploidy, translocations) and are rare.
- Micro or molecular-level pathogenic differences are sometimes associated with disease (point mutation in SCA, 3 bp deletion in CFTR) and are also rare.
Coding variants affect traits (height, hair colour, intelligence, etc.).
99.7% of DNA is the same between any 2 people, yet the remaining 0.3% means ~9 million bases are different.
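The ~9 million figure follows from the two numbers on this card. A quick arithmetic check, assuming the 3 billion base pair genome size stated above (the function and variable names are illustrative only):

```python
GENOME_SIZE_BP = 3_000_000_000   # ~3 billion base pairs, as stated on this card
IDENTICAL_PER_THOUSAND = 997     # 99.7% identical, written per thousand to stay in integers

def differing_bases(genome_size: int, identical_per_thousand: int) -> int:
    """Number of positions expected to differ between two people."""
    return genome_size * (1000 - identical_per_thousand) // 1000

print(differing_bases(GENOME_SIZE_BP, IDENTICAL_PER_THOUSAND))  # 9000000
```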
What is a variant?
Any position in the genome that varies between individuals is considered a (polymorphic) variant.
Polymorphism = a discontinuous genetic variation resulting in the occurrence of several different forms or types of individuals among the members of a single species.
Discontinuous genetic variation divides the individuals of a population into two or more sharply distinct forms.
Monomorphic = not variant.
What are the 3 common genetic variants?
These are not generally harmful.
- Single Nucleotide Polymorphisms (SNPs) ~ 17 million identified; ~ 3 million/genome.
- Microsatellites ~ 3% of genome
- Copy Number Variants (CNVs) > 2000 identified; ~ 100 per genome.
Is every base identical between individuals?
No, 2 people differ in DNA sequence at about 9 million base pairs.
What is a Single Nucleotide Variant (SNV)?
It is a change in a single base (base substitution).
The genome is littered with them. Comparing human genomes reveals:
- There is a high frequency: 1 every 300 nucleotides across reference genomes.
- In one individual: 1 occurs every 1000 bases.
- Millions of SNVs identified in human genomes (12 million SNVs identified in total).
- Majority not in exome
- Generated by errors during DNA replication (typically faulty replication in mitosis). Mismatch repair mechanisms should correct these mistakes, but some are not corrected and we end up with an SNV.
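To make "comparing genomes" concrete, here is a minimal sketch that lists base substitutions between two sequences. It assumes the sequences are already aligned and of equal length. The function name and the toy sequences are made up for illustration.

```python
def find_snvs(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, reference base, sample base) for every mismatch."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        (index + 1, ref_base, sample_base)
        for index, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]

reference = "ATGCTTGACCGTA"   # toy reference sequence (hypothetical)
sample = "ATGCTCGACCGTA"      # same sequence with one substitution

print(find_snvs(reference, sample))  # [(6, 'T', 'C')]
```

Real variant calling works from sequencing reads and has to handle insertions, deletions and sequencing error, none of which this sketch attempts.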
What does Bi-allelic mean?
When two alleles are possible at one site.
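One way to picture the terms monomorphic, bi-allelic and multi-allelic is to count the distinct alleles seen at one site across a sample of chromosomes. The sketch below does only that. The function name and the example observations are hypothetical.

```python
def classify_site(alleles_observed: list[str]) -> str:
    """Label a site by how many distinct alleles appear in the sample."""
    distinct = set(alleles_observed)
    if not distinct:
        raise ValueError("no alleles observed")
    if len(distinct) == 1:
        return "monomorphic"
    if len(distinct) == 2:
        return "bi-allelic"
    return "multi-allelic"

print(classify_site(["A", "A", "A", "A"]))        # monomorphic
print(classify_site(["T", "C", "T", "T"]))        # bi-allelic
print(classify_site(["6", "9", "7", "9", "11"]))  # multi-allelic (e.g. repeat counts)
```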
Describe how SNVs/SNPs come about.
During DNA replication, the two strands will separate and will be used as templates to synthesise complementary strands.
If that goes well, then we should end up with two identical copies.
However, suppose that when synthesising the new strand a G is incorporated instead of an A (THE MISTAKE). Normally the mismatch repair system will identify this mistake and correct it so that the bases are a standard Watson-Crick base pair.
In this instance, however, it hasn't corrected the G; it has instead replaced the T on the template strand with a C. Thus, at this position there is now either a T or a C.
If these changes occur in the gametes and aren’t deleterious, then it will get passed on to the next generation, and as time goes on, it can spread throughout the population.
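The replication steps above can be sketched in a few lines. This is a toy model only: it ignores strand direction and the real biochemistry of mismatch repair, and all names (`replicate`, `repair`, the example template) are assumptions made for illustration.

```python
from typing import Optional

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def replicate(template: str, error_at: Optional[int] = None, wrong_base: str = "G") -> str:
    """Build the complementary strand, optionally mis-incorporating one base."""
    new_strand = [COMPLEMENT[base] for base in template]
    if error_at is not None:
        new_strand[error_at] = wrong_base
    return "".join(new_strand)

def repair(template: str, new_strand: str, position: int, fix_new_strand: bool) -> tuple[str, str]:
    """Resolve the mismatch at `position` by changing one of the two strands."""
    template_bases, new_bases = list(template), list(new_strand)
    if fix_new_strand:
        # Usual outcome: the newly made base is corrected to match the template.
        new_bases[position] = COMPLEMENT[template_bases[position]]
    else:
        # The outcome described on this card: the template base is changed instead.
        template_bases[position] = COMPLEMENT[new_bases[position]]
    return "".join(template_bases), "".join(new_bases)

template = "GATTACA"                       # toy template, T at index 2
faulty = replicate(template, error_at=2)   # G placed opposite T instead of A
print(repair(template, faulty, 2, fix_new_strand=True))   # ('GATTACA', 'CTAATGT')
print(repair(template, faulty, 2, fix_new_strand=False))  # ('GACTACA', 'CTGATGT')
```

The two printed outcomes differ at index 2 of the template strand (T versus C), which is the T-or-C variant the card describes.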
Where can SNVs end up in the genome, and what effect can they have?
SNVs can end up anywhere, such as:
THE GENE:
- no amino acid change (synonymous variant)
- amino acid change (non-synonymous/missense)
- stop codon (nonsense)
- splice site (splice variant)
- UTR (gene expression)
THE PROMOTER:
- protein expression
THE NON-CODING REGION:
- n/a (unknown)
Without a deleterious effect or the annihilation of the population, SNVs do not disappear. They can potentially spread by random chance throughout the population.
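The first three gene-level categories on this card (synonymous, missense, nonsense) can be told apart by translating the codon before and after the change. The sketch below builds the standard genetic code from its usual compact 64-letter form. The function name is an assumption, and the example uses GAG to GTG (Glu to Val), the substitution commonly cited for sickle cell anaemia, purely as an illustration.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def classify_codon_change(ref_codon: str, alt_codon: str) -> str:
    """Classify a single-codon substitution within a coding sequence."""
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    # Simplification: any other amino acid change (including loss of a stop
    # codon) is reported as missense here.
    return "missense"

print(classify_codon_change("GAG", "GAA"))  # synonymous (Glu -> Glu)
print(classify_codon_change("GAG", "GTG"))  # missense   (Glu -> Val)
print(classify_codon_change("GAG", "TAG"))  # nonsense   (Glu -> stop)
```

Splice-site, UTR, promoter and non-coding variants cannot be classified this way, since their effect does not depend on the codon table.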
What is the difference between mutations and polymorphisms?
If the minor allele frequency is less than 1%, it’s a mutation.
If the minor allele frequency is more than 1%, it’s a polymorphism.
- rare polymorphisms: MAF 1-5%
- common polymorphism: MAF >5%
Thus, it is safer to use the term variant [all variants start off rare].
Evolutionary forces affect whether or not a variant remains rare; it may be rare because it is damaging or because it is recent.
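The MAF thresholds on this card translate directly into a small lookup. In the sketch below the function name is hypothetical, and the treatment of the exact boundaries (exactly 1% and exactly 5% both counted as rare polymorphism) is an assumption, since the card does not say which side they fall on.

```python
def classify_by_maf(maf: float) -> str:
    """Label a variant by its minor allele frequency (given as a fraction, not a percentage)."""
    if not 0.0 <= maf <= 0.5:
        # At a bi-allelic site the minor allele cannot be more frequent than 0.5.
        raise ValueError("minor allele frequency must be between 0 and 0.5")
    if maf < 0.01:
        return "mutation"
    if maf <= 0.05:
        return "rare polymorphism"
    return "common polymorphism"

print(classify_by_maf(0.002))  # mutation
print(classify_by_maf(0.03))   # rare polymorphism
print(classify_by_maf(0.25))   # common polymorphism
```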
How do evolutionary forces affect SNVs?
MUTATION: a new allele arises, we now have a variant
GENE FLOW: migration leading to the introduction of that variant into another population
GENETIC DRIFT: random change in variant allele frequency between generations
SELECTION: non-random change in variant allele frequency between generations because the presence of one allele/genotype is pathogenic (negative selection) or beneficial (positive selection)
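Genetic drift is easiest to see in a simulation. The sketch below uses a simple Wright-Fisher style model, which is an assumption brought in for illustration and not something stated on this card: each generation, every allele copy in a diploid population of fixed size is drawn at random from the previous generation, with no selection, mutation or migration. All names and parameter values are hypothetical.

```python
import random
from typing import Optional

def simulate_drift(initial_freq: float, population_size: int,
                   generations: int, seed: Optional[int] = None) -> list[float]:
    """Return the variant allele frequency in each generation, starting with the initial one."""
    rng = random.Random(seed)
    n_copies = 2 * population_size          # diploid: two allele copies per individual
    freq = initial_freq
    history = [freq]
    for _generation in range(generations):
        # Each copy in the next generation carries the variant with probability `freq`.
        variant_copies = sum(1 for _copy in range(n_copies) if rng.random() < freq)
        freq = variant_copies / n_copies
        history.append(freq)
    return history

trajectory = simulate_drift(initial_freq=0.1, population_size=50, generations=100, seed=1)
print(trajectory[:5])
```

Running it with different seeds gives different trajectories from the same starting frequency, which is the "random change between generations" the card refers to. Once the frequency reaches 0 or 1 in this model it stays there.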
What are biological impacts of variants?
Consider:
- Where are they?
In a gene?
Not in a gene?
- What sort of gene?
Key developmental gene, e.g. HOXD1
Pigmentation, e.g. MC1R
- Not straightforward
Depends on the type of variant (lots of variants in every gene, some pathogenic, some not; depends on the environment)
Summary of SNPs
- Millions in genome
- A position in genome at which the base can vary
- Can be anywhere in the genome (genic or non-genic)
- May do nothing, may affect a trait, may be associated with disorder
- Generally bi-allelic
- Due to mutation and mismatch repair
- These are base substitutions
- When pathogenic, may be called point mutations.
What are microsatellites (a.k.a short tandem repeats)?
These are sets of short DNA sequences repeated in tandem (i.e. one after another) at a particular locus on a chromosome. They vary in number in different individuals, and so can be used for genetic fingerprinting.
Microsatellites may lie in the 98% of the genome that does not code for protein (intronic or UTR, where they may affect gene expression, or intergenic), or they may lie in exons (where extra amino acids can be added to the protein).
The sequence of the repeat unit (e.g. GATA) does not vary. The number of times the unit appears can vary.
There are dinucleotides, trinucleotides, tetranucleotides, etc., which can repeat a varied number of times.
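Since microsatellite alleles differ only in the number of copies of the unit, an allele can be described by counting tandem copies. The sketch below does this with a regular expression. The function name and the two toy alleles are hypothetical, and the greedy left-to-right scan is only reliable for units that cannot overlap themselves (GATA and CAG are fine).

```python
import re

def longest_tandem_run(sequence: str, unit: str) -> int:
    """Return the largest number of back-to-back copies of `unit` found in `sequence`."""
    runs = re.findall(f"(?:{re.escape(unit)})+", sequence)
    return max((len(run) // len(unit) for run in runs), default=0)

# Two toy alleles at the same locus: same flanking sequence, different repeat counts.
allele_a = "CCT" + "GATA" * 6 + "TTG"
allele_b = "CCT" + "GATA" * 9 + "TTG"

print(longest_tandem_run(allele_a, "GATA"))  # 6
print(longest_tandem_run(allele_b, "GATA"))  # 9
```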
Microsatellites show increased heterozygosity: they are highly polymorphic and highly multiallelic. At most SNVs, by contrast, most people are homozygous.
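Why many alleles mean more heterozygotes can be shown with the standard expected-heterozygosity formula, H = 1 - sum of squared allele frequencies. This formula assumes random mating and is not part of the card itself. The allele frequencies below are invented purely for illustration.

```python
def expected_heterozygosity(allele_freqs: list[float]) -> float:
    """Expected fraction of heterozygous individuals under random mating."""
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(freq * freq for freq in allele_freqs)

snv_like = [0.9, 0.1]                        # bi-allelic site with one common allele
microsatellite_like = [0.2, 0.2, 0.2, 0.2, 0.2]  # five equally common repeat lengths

print(round(expected_heterozygosity(snv_like), 2))             # 0.18
print(round(expected_heterozygosity(microsatellite_like), 2))  # 0.8
```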
Microsatellites are generally not harmful. The exception is expansion disorders, e.g. Huntington's, a trinucleotide repeat expansion disorder: basically a "bad" microsatellite.
Describe the Polymerase Slippage Model and what an error can lead to?
During replication, polymerase slippage and subsequent reattachment may cause a bubble to form in the new strand. Slippage is thought to occur in sections of DNA with repeated patterns of bases (such as CAG), i.e. microsatellites.
Then, DNA repair mechanisms realign the template strand with the new strand and the bubble is straightened out. The resulting double helix is thus expanded (microsatellite expanded).
Polymerase slippage (as theorised) cannot occur in DNA without repeating patterns of bases (microsatellites).
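The net result of the slippage model can be sketched as a string operation: a repeat tract gains extra copies of its unit, while a sequence without the repeat is left alone. This models only the outcome, not the bubble formation or repair steps. The function name, the requirement of at least two tandem copies, and the toy sequences are assumptions made for illustration.

```python
def slip_and_expand(sequence: str, unit: str, extra_units: int = 1) -> str:
    """Add `extra_units` copies of `unit` to the first tandem run of it; no run, no change."""
    start = sequence.find(unit * 2)   # require at least two tandem copies to count as a repeat
    if start == -1:
        return sequence               # no repeating pattern, so no slippage in this model
    return sequence[:start] + unit * extra_units + sequence[start:]

repeat_tract = "TTC" + "CAG" * 4 + "GGA"
no_repeat = "ACGTTGCA"

print(slip_and_expand(repeat_tract, "CAG"))  # TTCCAGCAGCAGCAGCAGGGA (5 copies of CAG)
print(slip_and_expand(no_repeat, "CAG"))     # ACGTTGCA (unchanged)
```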
Summary of Microsatellites
- 1000s in genome
- Repeat units
- Varying numbers of repeats
- Alters actual size of that region of the genome
- Multiallelic
- Can be anywhere in genome
- May do nothing….