Gene variation Flashcards
2% of genome
Exome
A variant
• Any position in the genome that varies between individuals is considered polymorphic, also known as a variant
Between 2 people
99.7% of DNA is the same, i.e. about 9 million bases differ
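The arithmetic on this card can be checked in a few lines. This is only a sketch: it assumes a haploid genome of roughly 3 billion bases, a figure the card does not state, and `GENOME_SIZE_BP` and `differing_bases` are illustrative names.

```python
# Assumption: haploid human genome of ~3 billion bases (not stated on the card).
GENOME_SIZE_BP = 3_000_000_000


def differing_bases(percent_identical, genome_size=GENOME_SIZE_BP):
    """Approximate number of bases that differ between two genomes."""
    return round(genome_size * (100 - percent_identical) / 100)


print(differing_bases(99.7))  # 9000000
```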
Example of a variant
Position 17 = T/A = polymorphism
The reference allele = T
Most common allele = T
Minor allele = A
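As a rough illustration of how the most common and minor alleles could be worked out from data, the sketch below counts alleles across a made-up sample of chromosomes. The sample and the name `allele_summary` are hypothetical. The reference allele is whatever the reference genome carries, so it cannot be derived from counts alone.

```python
from collections import Counter


def allele_summary(alleles):
    """Summarise the alleles seen at one position across sampled chromosomes."""
    counts = Counter(alleles)
    total = sum(counts.values())
    frequencies = {allele: n / total for allele, n in counts.items()}
    ranked = counts.most_common()  # most frequent first
    return {
        "frequencies": frequencies,
        "major": ranked[0][0],   # most common allele
        "minor": ranked[-1][0],  # least common allele
    }


# Hypothetical sample: 10 chromosomes typed at position 17 (8 x T, 2 x A).
sample = list("TTTTTTTATA")
print(allele_summary(sample))
```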
Single nucleotide variant
- High frequency: 1 every 300 nucleotides in reference genome
- One individual: 1 every 1000 bases
- Millions SNVs identified in human genomes
- Majority not in exome
- Generated when a replication error is mis-corrected by mismatch repair
Parental strand has changed …
- When synthesising this strand, instead of incorporating an A, a G has been incorporated.
- The mismatch repair mechanism will identify this mistake and correct it so that the bases are a standard Watson-Crick base pair
- However, in this instance it hasn’t corrected the G; instead it has replaced the T on the parental strand with a C.
- So at this position there is now either a T or a C
- If this change occurs in the gametes and isn’t deleterious then it will get passed on to the next generation
- As time goes on it can spread through the population.
SNVs may be in a
• Gene
No amino acid change (synonymous)
Amino acid change (non-synonymous/missense)
Stop codon (nonsense)
Splice site
UTR (affect gene expression)
• Promoter
Protein expression
• Non-coding region
• Without a deleterious effect or population annihilation, SNVs do not disappear
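The three coding outcomes above (synonymous, missense, nonsense) can be told apart by translating the codon before and after the change. The sketch below uses the standard genetic code; the function name `classify_coding_snv` is illustrative, and it deliberately ignores splice-site, UTR and stop-loss effects.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_coding_snv(ref_codon, alt_codon):
    """Classify a single-base change within a codon (DNA, coding strand)."""
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":  # '*' marks a stop codon
        return "nonsense"
    return "missense"  # note: a lost stop codon would also land here


print(classify_coding_snv("GAA", "GAG"))  # Glu -> Glu
print(classify_coding_snv("GAG", "GTG"))  # Glu -> Val
print(classify_coding_snv("TAC", "TAA"))  # Tyr -> stop
```

The GAG to GTG change (glutamate to valine) is the commonly cited sickle cell substitution, so it serves as a familiar missense example.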
Sickle cell anaemia
Estimated frequency of SCA variant allele:
European = 0.02%, i.e. 2 in every 10,000 chromosomes
African = 4.5%, i.e. ~1 in every 20 chromosomes
Why? – beneficial in places where malaria is rife (heterozygote advantage)
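The "1 in every N chromosomes" figures follow directly from the percentages. A minimal sketch, with `one_in_n` as an illustrative helper name:

```python
def one_in_n(percent):
    """Convert an allele frequency in percent to '1 in N chromosomes'."""
    return 100 / percent


for population, pct in [("European", 0.02), ("African", 4.5)]:
    print(f"{population}: about 1 in {one_in_n(pct):.0f} chromosomes")
```

This gives about 1 in 5000 (the same as 2 in 10,000) and about 1 in 22, which the card rounds to roughly 1 in 20.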
Mutation or polymorphism
• If minor allele frequency (MAF) >1% (i.e. at least 1 in every 100 chromosomes has the non-reference allele) = polymorphism
Rare polymorphism: MAF 1-5%
Common polymorphism: MAF >5%
MAF less than 1% = mutation
Safer to use the term variant in both cases
• All variants start off rare
• Evolutionary forces affect whether a variant remains rare
• Rare variant may be damaging and/or recent event
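The MAF thresholds on this card can be written as a small lookup. This is a sketch only: the card does not say which label applies at exactly 1% or 5%, so the boundary handling below is an assumption, and `classify_by_maf` is an illustrative name.

```python
def classify_by_maf(maf_percent):
    """Label a variant by its minor allele frequency (in percent)."""
    if not 0 <= maf_percent <= 50:
        raise ValueError("a minor allele frequency lies between 0% and 50%")
    if maf_percent < 1:
        return "mutation (rare variant)"
    if maf_percent <= 5:  # assumption: exactly 1% and 5% count as 'rare polymorphism'
        return "rare polymorphism"
    return "common polymorphism"


for maf in (0.02, 4.5, 20):
    print(maf, "->", classify_by_maf(maf))
```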
Evolutionary forces and the SNV
• Mutation
A new allele arises; we now have a variant
• Gene flow
Migration leading to introduction of that variant into another population
• Genetic drift
Random change in variant allele frequency between generations
• Selection
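Genetic drift can be illustrated with a small simulation. The sketch below uses a simple Wright-Fisher-style resampling scheme, which is an assumption chosen for illustration and not something the cards specify: each generation, every chromosome independently inherits the variant with probability equal to its current frequency. The name `simulate_drift` and all parameter values are hypothetical.

```python
import random


def simulate_drift(start_freq, n_chromosomes, generations, seed=0):
    """Track a variant's allele frequency under random sampling alone."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    freq = start_freq
    history = [freq]
    for _ in range(generations):
        # Each chromosome in the next generation carries the variant
        # with probability equal to the current frequency.
        carriers = sum(rng.random() < freq for _ in range(n_chromosomes))
        freq = carriers / n_chromosomes
        history.append(freq)
    return history


history = simulate_drift(start_freq=0.05, n_chromosomes=200, generations=50)
print(history[0], history[-1])
```

With no mutation, gene flow or selection in the model, the frequency simply wanders between generations, and a variant that reaches 0 stays lost.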
Microsatellites
- Also known as a short tandem repeat
* AC = the repeat unit; it is repeated in tandem (i.e. one after another)
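Counting how many times a unit such as AC is repeated in tandem is a simple pattern-matching task. A minimal sketch, with made-up sequences and an illustrative function name `longest_tandem_run`:

```python
import re


def longest_tandem_run(sequence, unit):
    """Return the largest number of back-to-back copies of `unit` in `sequence`."""
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    runs = [len(m.group(0)) // len(unit) for m in pattern.finditer(sequence)]
    return max(runs, default=0)


# Hypothetical alleles of the same microsatellite, differing in repeat number.
allele_1 = "GG" + "AC" * 4 + "TT"
allele_2 = "GG" + "AC" * 6 + "TT"
print(longest_tandem_run(allele_1, "AC"), longest_tandem_run(allele_2, "AC"))
```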
Slippage event
During replication, polymerase slippage and subsequent reattachment may cause a bubble to form in the new strand. Slippage is thought to occur in sections of DNA with repeated patterns of bases (such as CAG).
DNA repair mechanisms then realign the template strand with the new strand and the bubble is straightened out. The resulting double helix is thus expanded.
Polymerase slippage, as theorized, cannot occur in DNA without repeating patterns of bases.
Microsatellites may be in a
• Part of the 98% of genome not coding for protein
Intronic or UTR: may affect gene expression
Intergenic
• Exonic
Extra amino acids in protein
Examples of microsatellites
• Expansion disorders, e.g. Huntington’s = trinucleotide repeat expansion disorder, basically a “bad” microsatellite
Summary of microsatellites
• 1000s in genome
• Repeat units
• Varying numbers of repeats
• Alters actual size of that region of the genome
• Multiallelic
• Can be anywhere in genome
• May do nothing
Copy number variant
• Can be variation in number of copies between people
The simplest type of copy number variation is the presence or absence of a gene.
An individual’s genome could therefore contain two, one, or zero copies.
Duplication of a genomic segment
Could result in diploid copy numbers of two, three or four
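The diploid copy numbers quoted on these cards come from adding up the copies on the two chromosomes an individual inherits. A short sketch, with `diploid_copy_numbers` as an illustrative name:

```python
from itertools import combinations_with_replacement


def diploid_copy_numbers(per_chromosome_copies):
    """All diploid totals obtainable from the given per-chromosome copy counts."""
    pairs = combinations_with_replacement(per_chromosome_copies, 2)
    return sorted({a + b for a, b in pairs})


# Presence/absence: each chromosome carries 0 or 1 copy of the gene.
print(diploid_copy_numbers([0, 1]))  # [0, 1, 2]
# Duplication: each chromosome carries 1 or 2 copies of the segment.
print(diploid_copy_numbers([1, 2]))  # [2, 3, 4]
```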
Copy number variation
One copy can have an extra C in comparison to the original template
Another copy can have 3 C’s
Non-allelic homologous recombination in meiosis
A-D = loci on chromosome
Grey and blue = homologous chromosomes aligning in meiosis I
Red bands = regions of high sequence similarity, often viral/bacterial genomes that have been incorporated through evolution
Allelic recombination is good! – shuffling of alleles
But non-allelic recombination results in duplication/deletion and copy number change
CNVs may be
Intergenic
But quite large (>1 kb), so often affect one or more genes (or parts of genes)
~12% of genome = CNV
>2000 identified
Types of common genetic variation
- Single Nucleotide Polymorphisms (SNPs) ~17 million identified; ~3 million/genome
- Microsatellites ~3% of the genome
- Copy Number Variants (CNVs) >2000 identified; ~100 per genome
- Remember – everyone “has” every variant, what may differ between individuals is the genotype
Common facts
• We see lots of these types of variants throughout the genome
• If biallelic, the frequency of the minor allele is relatively high (or the variant is multiallelic)
Population frequency = proportion of chromosomes in the population that carry each allele
• Compare with translocations or aneuploidies – we don’t see thousands or millions of those because they’re generally damaging.
Variant effect
• Can be beneficial
• Can be pathogenic
• Most are neutral
• Are these of any use?
• Yes, can be used as markers to help find disease-causing genes and mutations
Autozygosity mapping & linkage studies (Microsatellites, SNPs)
Association analysis (SNPs, CNVs)
Biallelic
2 possible alleles
Triallelic
3 possible alleles
Multiallelic
> 3 alleles
Allele
The particular form of a specific locus
Locus
Unique position in genome
Genotype
The pair of alleles an individual carries at a locus; an individual has 2 alleles for any autosomal locus
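Because an individual carries two alleles at an autosomal locus, the possible genotypes are the unordered pairs of alleles. A minimal sketch listing them, with illustrative allele labels and an illustrative function name `possible_genotypes`:

```python
from itertools import combinations_with_replacement


def possible_genotypes(alleles):
    """List the unordered allele pairs an individual could carry at one locus."""
    return ["/".join(pair) for pair in combinations_with_replacement(alleles, 2)]


print(possible_genotypes(["T", "A"]))            # biallelic: 3 genotypes
print(len(possible_genotypes(["T", "A", "C"])))  # triallelic: 6 genotypes
```

In general, a locus with n alleles has n(n+1)/2 possible genotypes.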
Autozygosity mapping & linkage studies
Microsatellites, SNPs
Association analysis
SNPs, CNVs
Common variants and disease/ trait associations
Most common variants do not cause Mendelian, monogenic disorders.
Majority are probably neutral (particularly intergenic variants).