S4: Genome Variation Flashcards
How big is the human genome and does it vary between individuals?
- The haploid human genome is about 3 billion base pairs long and contains roughly 20,000 protein-coding genes.
- Only about 2% of the genome codes for protein; the other 98% does not. The protein-coding sequence of those 20,000 genes therefore makes up only about 2% of the genome, and this portion is called the exome.
- Not every base is identical between individuals, which is why we differ phenotypically (e.g. height, colour, diseases). Pathogenic mutations are rare, otherwise we would all have serious diseases. There is also a lot of common variation that is not associated with disease but underlies the normal phenotypic differences we see (e.g. height, hair, colour, intelligence). Some of this variation lies in the coding regions of the genome (the 2%) and the rest in the non-coding regions (the 98%).
- About 99.9% of DNA is identical between any two people; the remaining 0.1% of 3 billion base pairs gives about 3 million bases of difference between individuals.
- Any position in the genome that varies between individuals is considered to be polymorphic.
- Major macro-level differences are generally associated with disease (aneuploidy, translocations, etc.).
- Micro- or molecular-level differences are also sometimes pathogenic (e.g. the point mutation in sickle cell anaemia (SCA), or the 3 bp deletion in CFTR).
Using CF as an example of common variation in genes
- There are many different genetic mutations at different positions in the CFTR gene, and only a few of these mutations will cause cystic fibrosis.
- The vast majority of these mutations in the gene are harmless and common variations that can occur in anyone. They may even change an amino acid and thus the primary sequence of the protein but this is still harmless.
- So even in a gene associated with disease we have common variations.
What is an allele?
An allele is one of the alternative forms of the sequence found at a given position (locus) in the genome; the locus could be a single base or an entire gene. In a diploid genome we have two alleles at any autosomal locus: the individual is homozygous if the two alleles are identical or heterozygous if they differ. The combination of alleles gives us our genotype.
What do biallelic, triallelic and multiallelic mean?
- If only two possible alleles are ever seen in the population at a particular locus, the locus is biallelic.
- If three are seen, it is triallelic (e.g. the ABO blood group gene has three main alleles: A and B each produce their own antigen, while O produces neither).
- If more than three = multiallelic.
- For a biallelic polymorphism, the frequency of the minor (less common) allele is still relatively high.
What is a genetic variant?
- Variants can be common or uncommon.
- A type of variant is common if we see many instances of it in a genome (e.g. CNVs, STRs).
- A trisomy is not a common type of variation, as we would see it at most once or twice in a genome.
How does a variant give us a population frequency (Pop n)?
- A variant allele can have a relatively high frequency in the population; in other words, even the less common allele may occur quite often. The population frequency of an allele is the proportion of chromosomes in the population that carry it, e.g. what proportion of the chromosomes in a lecture theatre carry the variant and what proportion do not.
- The population frequency of an allele is expressed as a percentage or a decimal, e.g. 50% (0.5) of chromosomes carry this allelic variant.
- The frequency of the same allele may differ between two populations of the same species.
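As a rough illustration of counting chromosomes rather than people, the sketch below computes allele frequencies from diploid genotypes. The function name `allele_frequencies` and the five genotypes are made up for the example; they are not real data.

```python
from collections import Counter

def allele_frequencies(genotypes):
    """Return the proportion of chromosomes carrying each allele.

    genotypes: list of 2-tuples, one per diploid individual,
    e.g. ("A", "G") for a heterozygote.
    """
    # Each individual contributes two chromosomes, so count every allele.
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}

# Hypothetical lecture theatre: 5 people genotyped at one biallelic position.
lecture = [("A", "A"), ("A", "G"), ("A", "G"), ("G", "G"), ("A", "A")]
freqs = allele_frequencies(lecture)
minor_allele = min(freqs, key=freqs.get)

print(freqs)                              # {'A': 0.6, 'G': 0.4}
print(minor_allele, freqs[minor_allele])  # G 0.4
```

Five people carry ten chromosomes, so six copies of A give a frequency of 0.6 and four copies of G give 0.4. Running the same function on a second population would show whether the frequencies differ between them.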
What is a polymorphism and mutation?
- A polymorphism is a variant whose minor allele frequency (the frequency of the less common allele) is greater than 1% in the population.
- A rare polymorphism has a minor allele frequency between 1% and 5%.
- A common polymorphism has a minor allele frequency greater than 5%.
- Any allelic variant with a frequency below 1% in the population is considered a mutation: at such a low frequency it is likely to be damaging, with selective pressure keeping its frequency down.
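The thresholds above can be written as a small lookup. This is only a sketch: the function name `classify_variant` is hypothetical, and the treatment of the exact boundaries (1% and 5% both counted as "rare polymorphism") is an assumption, since the cards do not say which side they fall on.

```python
def classify_variant(maf):
    """Classify a variant by its minor allele frequency, given as a fraction."""
    if not 0 <= maf <= 0.5:
        raise ValueError("minor allele frequency must lie between 0 and 0.5")
    if maf < 0.01:
        return "mutation"             # below 1%
    if maf <= 0.05:
        return "rare polymorphism"    # 1% to 5% (boundary handling assumed)
    return "common polymorphism"      # above 5%

for maf in (0.002, 0.03, 0.4):
    print(maf, classify_variant(maf))
```

The upper limit of 0.5 reflects the fact that the less common of two alleles cannot make up more than half of the chromosomes.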
Why do all variants start off as rare?
- Every variant starts off rare: it arises as a new change in a single person, who then carries the new variant allele.
- Evolutionary forces (selection) will determine whether the variant remains rare or becomes more common.
- Thus a rare variant may be damaging and/or recent.
List types of rare genetic variation
- Translocations, Aneuploidy, Deletions and Duplications.
- Most people do not have them, and they generally have severe clinical consequences.
List types of common genetic variation
- Single nucleotide polymorphism (SNP)/Single nucleotide variant (SNV).
- Microsatellite/Short tandem repeat (STR).
- Minisatellite/Variable number of tandem repeats (VNTR).
- Copy number variation (CNV).
- We all have lots of these. They may cause disease, affect traits or alter susceptibility to disease.
How do we know what is normal and what is a variant when there is a different allele?
- The reference comes from the Human Genome Project, which was based entirely on the genomes of 4 anonymous individuals.
- The consensus (reference sequence) is based on the majority allele at each position.
- Since then, thousands of people have had their genomes sequenced, and the reference sequence is continually updated.
- The reference allele will therefore usually be the most common allele in the population, and the minor allele is the less common one. The minor allele frequency can be calculated to see whether the position is polymorphic.
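The idea of taking the majority allele at each position can be sketched on toy data. The five short sequences below are invented and assumed to be already aligned, and `consensus_and_maf` is a hypothetical helper, not the method any real reference assembly uses.

```python
from collections import Counter

def consensus_and_maf(aligned):
    """For equal-length aligned sequences, return (consensus, maf_per_position)."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    consensus, mafs = [], []
    for column in zip(*aligned):
        counts = Counter(column).most_common()
        consensus.append(counts[0][0])  # majority allele at this position
        # Frequency of the second most common allele, or 0.0 if no variation.
        mafs.append(counts[1][1] / len(column) if len(counts) > 1 else 0.0)
    return "".join(consensus), mafs

# Hypothetical aligned sequences from 5 chromosomes.
sample = ["ACGTA", "ACGTA", "ACATA", "ACGTA", "TCGTA"]
reference, mafs = consensus_and_maf(sample)

print(reference)  # ACGTA
print(mafs)       # [0.2, 0.0, 0.2, 0.0, 0.0]
```

Positions 1 and 3 vary in this toy sample, while the others are identical in every sequence. With only five chromosomes the frequencies are very coarse; a real minor allele frequency needs a much larger sample.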
What is a single nucleotide polymorphism (SNP)/single nucleotide variant (SNV)?
- SNPs are very frequent in the genome: roughly one position in every 300 nucleotides differs between individuals by a single base substitution.
- Approximately 17 million SNPs have been identified in the human genome. They are natural, common variations generated when errors made during DNA replication are not corrected properly by mismatch repair.
- The majority are not in the exome.
Describe how SNPs arise
- During DNA replication, DNA helicase separates the two complementary strands and DNA polymerase moves along each strand, synthesising a new complementary strand on the template. DNA polymerase also has a proof-reading ability, so if the wrong nucleotide is inserted it is usually removed immediately and replaced with the correct one.
- Sometimes proof-reading fails. The mismatch repair system then recognises the mismatch between the non-complementary bases, removes one of them and inserts a complementary base.
- Occasionally mismatch repair itself goes wrong: it cuts out the correct base on the template strand and inserts the base complementary to the wrong one.
- The pair is now complementary, but the daughter cell carries a different base pair at that locus from the original: a SNP has been created. If this happens in a cell that gives rise to a gamete, it can be passed on to the next generation.
- SNPs are usually biallelic, as only two alleles are seen at that position in the population.
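The two possible outcomes of mismatch repair can be shown with a toy model of a single base pair. This is a deliberately simplified sketch: `resolve_mismatch` and its `strand_repaired` argument are hypothetical names, and real mismatch repair excises a stretch of the strand rather than one base.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def resolve_mismatch(template_base, new_base, strand_repaired):
    """Return the (template, new) base pair after repairing one strand."""
    if COMPLEMENT[template_base] == new_base:
        raise ValueError("bases are already complementary; nothing to repair")
    if strand_repaired == "new":
        # Correct repair: the wrong base on the new strand is replaced.
        return template_base, COMPLEMENT[template_base]
    if strand_repaired == "template":
        # Faulty repair: the correct template base is replaced instead.
        return COMPLEMENT[new_base], new_base
    raise ValueError("strand_repaired must be 'new' or 'template'")

# Template G should pair with C, but T was inserted by mistake.
print(resolve_mismatch("G", "T", "new"))       # ('G', 'C') original restored
print(resolve_mismatch("G", "T", "template"))  # ('A', 'T') substitution fixed in
```

In the second call the original G-C pair has become A-T, which is the kind of single base substitution described in the cards.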
What are the consequences if a SNP occurs in a gene or elsewhere?
- In a gene, a point mutation may change an amino acid (missense/non-synonymous), introduce a stop codon (nonsense) or affect a splice site.
- It may cause no amino acid change, because the genetic code is degenerate (i.e. more than one codon for a single amino acid); this is a synonymous change.
- It may affect a promoter and therefore protein expression.
- It may fall in a non-coding region.
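For the coding consequences, a short sketch can compare the amino acid encoded before and after a single base change. It uses the standard genetic code with DNA codons; the function name `snp_consequence` is hypothetical, and the sketch assumes the reference codon encodes an amino acid (it does not handle splice sites, promoters or loss of a stop codon).

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def snp_consequence(ref_codon, alt_codon):
    """Classify a coding change; assumes ref_codon is not a stop codon."""
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"   # same amino acid (degenerate code)
    if alt_aa == "*":
        return "nonsense"     # stop codon introduced
    return "missense"         # different amino acid

print(snp_consequence("GAG", "GTG"))  # missense: Glu to Val, the sickle cell change
print(snp_consequence("GAA", "GAG"))  # synonymous: both codons encode Glu
print(snp_consequence("TGG", "TGA"))  # nonsense: Trp to stop
```

Changes at the third base of a codon are the most likely to be synonymous, because many amino acids are encoded by several codons that differ only at that position.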
Do SNPs disappear?
Unless a SNP has a deleterious effect or the population carrying it is wiped out, it does not disappear.