Genome Variation Flashcards
What proportion of the genome is the exome (the part that codes for protein)?
About 2%
How much of our genome do we share with someone else?
99.7% - only about 9 million bases out of 3 billion are different
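The 99.7% figure is just arithmetic on the card's rounded numbers. Here is a quick Python check, assuming a 3 billion base genome and 9 million differing bases. Both constants are the card's approximations, not measured values.

```python
# Rounded figures from the card (approximations, not measurements).
GENOME_SIZE = 3_000_000_000   # ~3 billion bases
DIFFERENT_BASES = 9_000_000   # ~9 million bases that differ between two people

def percent_shared(genome_size: int, different: int) -> float:
    """Percentage of bases that are identical between two genomes."""
    return 100 * (genome_size - different) / genome_size

print(f"{percent_shared(GENOME_SIZE, DIFFERENT_BASES):.1f}% of bases shared")
```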
What kind of differences in the genome are associated with disease?
Macro-level differences (e.g. trisomy 21) and micro-level, molecular differences, e.g. a single point mutation as in sickle cell anaemia (SCA), or a 3 bp deletion in the CFTR gene leading to cystic fibrosis (CF)
What is special about mono-zygotic twins’ DNA?
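They are identical at almost every base, apart from a small number of mutations that arise after the twins separate in early development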
What in DNA terms is considered polymorphic?
Any base in the genome that varies between individuals is polymorphic
What is a reference sequence and a reference allele?
Reference sequence: a database sequence that records, at each position, the base present in the majority of people
Reference allele: the allele in the reference sequence, i.e. (usually) the most common allele
How are variants/ polymorphic positions found?
By comparing someone’s sequence to a reference sequence and identifying the positions where the two differ
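In its simplest form, the comparison above is a position-by-position scan for mismatches. This is a minimal Python sketch. It assumes the two sequences are already aligned and the same length, so it ignores insertions and deletions, which real variant calling has to handle. The sequences are made up for illustration.

```python
def find_variants(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (position, reference_base, sample_base) for every mismatch.

    Assumes pre-aligned, equal-length sequences (no indels); positions are 1-based.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        (pos, ref_base, alt_base)
        for pos, (ref_base, alt_base) in enumerate(zip(reference, sample), start=1)
        if ref_base != alt_base
    ]

reference = "ACGTACGTAC"  # hypothetical reference sequence
sample = "ACGTTCGTAC"     # hypothetical individual's sequence
print(find_variants(reference, sample))  # [(5, 'A', 'T')]
```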
How was the reference sequence generated?
In the Human Genome Project, the genomes of 4 anonymous individuals were sequenced and combined into a single composite (mosaic) sequence
How often does an SNV occur in the reference sequence and in one individual?
Known SNV sites occur about once every 300 nucleotides of the reference sequence; a single individual differs from the reference about once every 1000 nucleotides
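Scaling these densities up to a ~3 billion base genome gives a feel for the numbers involved. This is a back-of-envelope Python sketch using the card's rounded rates, which are approximations.

```python
GENOME_SIZE = 3_000_000_000  # ~3 billion bases (approximate)

def expected_snvs(genome_size: int, bases_per_snv: int) -> int:
    """Rough count of SNVs given an average spacing of one per `bases_per_snv`."""
    return genome_size // bases_per_snv

print(f"{expected_snvs(GENOME_SIZE, 300):,}")   # SNV sites catalogued in the reference
print(f"{expected_snvs(GENOME_SIZE, 1000):,}")  # differences carried by one individual
```

That comes to roughly 10 million known SNV sites in the population, of which any one person carries about 3 million.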
Where are the majority of SNVs found?
Not in the exome
How are SNVs generated?
By errors made during DNA replication that mismatch repair fails to correct, or corrects wrongly
What is a biallelic site?
A site in DNA where there could be 2 possible alleles (2 variants, one of which is the reference sequence base)
“A biallelic site is a specific locus in a genome that contains two observed alleles, counting the reference as one, and therefore allowing for one variant allele. In practical terms, this is what you would call a site where, across multiple samples in a cohort, you have evidence for a single non-reference allele.”
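The definition above amounts to counting distinct alleles at a site across a cohort, with the reference counted as one. Here is a small Python sketch, assuming single-base alleles. The function name and the sample data are hypothetical.

```python
def classify_site(reference_allele: str, observed_alleles: list[str]) -> str:
    """Classify a site by its number of distinct alleles, counting the reference."""
    alleles = set(observed_alleles) | {reference_allele}
    if len(alleles) == 1:
        return "monomorphic"   # no non-reference allele seen in this cohort
    if len(alleles) == 2:
        return "biallelic"     # reference + exactly one non-reference allele
    return "multiallelic"      # reference + two or more non-reference alleles

# Hypothetical alleles observed at one position across several samples.
print(classify_site("A", ["A", "G", "A", "G", "A"]))  # biallelic
print(classify_site("A", ["A", "G", "T", "A", "A"]))  # multiallelic
print(classify_site("C", ["C", "C", "C"]))            # monomorphic
```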
How is a SNV formed?
In DNA replication the two strands separate and each acts as a template to synthesise a complementary strand, forming identical copies.
However, while synthesising a new strand, a G is incorporated opposite a T on the template instead of an A, creating a G-T mismatch.
The mismatch repair mechanism identifies this mistake and corrects it so that the bases form a standard Watson-Crick base pair.
However, in this instance it does not correct the G on the new strand; instead it replaces the T on the template strand with a C, so the original A-T pair has become a G-C pair.
If this change occurs in the gametes and isn’t deleterious, it can be passed on to the next generation and, over time, spread through the population
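The two possible outcomes of repair can be written as a toy model. Each base pair is written as (template base, new-strand base), and repair either fixes the new strand (correct) or the template strand (the mistake described above). This is an illustrative sketch only, not a model of the real repair machinery, and the `repair` function is hypothetical.

```python
# Watson-Crick partners.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def repair(template_base: str, new_base: str, fix_new_strand: bool) -> tuple[str, str]:
    """Resolve a (template, new strand) pair into a Watson-Crick pair.

    fix_new_strand=True: the misincorporated base on the new strand is replaced.
    fix_new_strand=False: the template base is replaced instead, fixing the error.
    """
    if COMPLEMENT[template_base] == new_base:
        return template_base, new_base  # already a correct pair, nothing to do
    if fix_new_strand:
        return template_base, COMPLEMENT[template_base]
    return COMPLEMENT[new_base], new_base

# Template T should pair with A, but a G was incorporated (G-T mismatch).
print(repair("T", "G", fix_new_strand=True))   # ('T', 'A'): original pair restored
print(repair("T", "G", fix_new_strand=False))  # ('C', 'G'): A-T has become G-C
```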
Where can SNVs be found?
In genes, promoters and non-coding regions
In genes, they can change an amino acid (non-synonymous/ missense), leave it unchanged (synonymous) or turn an amino acid codon into a stop codon (nonsense). They can also alter splice sites, changing where splicing occurs, and can occur in a UTR, affecting gene expression
In promoters they can affect gene expression (and so how much protein is made)
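For an SNV inside a codon, the synonymous/missense/nonsense distinction comes down to comparing the amino acids encoded before and after the change. The Python sketch below uses a deliberately tiny subset of the standard genetic code, just enough for the examples. It assumes the reference codon is not itself a stop codon. The sickle cell change from GAG (glutamic acid) to GTG (valine) is shown as the missense case. Splice-site and UTR effects are out of scope here.

```python
# Tiny subset of the standard genetic code (DNA codons), enough for these examples.
CODON_TABLE = {
    "GAG": "Glu", "GAA": "Glu",
    "GTG": "Val",
    "TAG": "Stop",
}

def classify_coding_snv(ref_codon: str, alt_codon: str) -> str:
    """Classify a codon change; assumes ref_codon is not a stop codon."""
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "Stop":
        return "nonsense"
    return "missense"

print(classify_coding_snv("GAG", "GAA"))  # synonymous (Glu -> Glu)
print(classify_coding_snv("GAG", "GTG"))  # missense (Glu -> Val, as in SCA)
print(classify_coding_snv("GAG", "TAG"))  # nonsense (Glu -> Stop)
```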
When do SNVs disappear from the genome?
When they have a deleterious effect (causing harm/ damage) and are removed by selection, or when the population carrying them dies out
What kind of mutation is SCA?
A point, missense mutation
How common is the SCA point mutation?
White European people: 0.02% (2 in every 10,000 chromosomes)
African people: 4.5% (roughly 1 in every 20 chromosomes)
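These conversions between a percentage and "N in every M chromosomes" are simple to check. Here is a Python sketch with hypothetical helper names. Note that 4.5% works out to about 1 in 22, which the card rounds to 1 in 20.

```python
def alleles_per(freq_percent: float, chromosomes: int) -> float:
    """Expected number of chromosomes carrying the allele out of `chromosomes`."""
    return freq_percent / 100 * chromosomes

def one_in_n(freq_percent: float) -> float:
    """Express an allele frequency as '1 in N chromosomes'."""
    return 100 / freq_percent

print(round(alleles_per(0.02, 10_000), 2))  # 2.0 -> 2 in every 10,000 chromosomes
print(round(one_in_n(4.5), 1))              # 22.2 -> about 1 in every 22 chromosomes
```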
What is heterozygote advantage?
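When being heterozygous for a disease allele provides a benefit, e.g. having sickle cell trait (being heterozygous for the SCA allele) protects against malaria, which helps explain why the allele is so much more common in African people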
Minor allele frequency (MAF) figures:
- Mutation: <1%
- Polymorphism: >1% (but this does not imply no pathogenicity, e.g. SCA)
- Rare polymorphism: 1-5%
- Common polymorphism: >5%
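The MAF cut-offs above can be written as a small classifier. This is a Python sketch. The card doesn't say what happens at exactly 1% or 5%, so the boundary handling here is an assumption: exactly 1% is counted as a rare polymorphism, and exactly 5% as rare rather than common. Applied to the SCA figures, the allele counts as a mutation in white European people but a polymorphism in African people, even though it is pathogenic in both.

```python
def classify_by_maf(maf_percent: float) -> str:
    """Label a variant using the MAF cut-offs; boundary handling is an assumption."""
    if not 0 <= maf_percent <= 50:
        raise ValueError("a minor allele frequency lies between 0% and 50%")
    if maf_percent < 1:
        return "mutation"
    if maf_percent <= 5:             # assumption: 1% and 5% both count as rare
        return "rare polymorphism"
    return "common polymorphism"

print(classify_by_maf(0.02))  # mutation (SCA allele, white European people)
print(classify_by_maf(4.5))   # rare polymorphism (SCA allele, African people)
print(classify_by_maf(20))    # common polymorphism
```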
What determines whether a variant (that starts off rare) remains that way?
Evolutionary forces
How do SNVs spread?
- Migration introduces the variant into another population, known as gene flow
- Random change in variant allele frequency between generations, known as genetic drift
- Selection in favour of the allele (non random change in variant allele frequency)
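Genetic drift is easiest to see in a simulation. The Python sketch below uses the Wright-Fisher model, a standard population-genetics model that the card does not name. It assumes a fixed-size diploid population with no selection, migration or new mutation, so any change in allele frequency is purely random. The population size, starting frequency and seed are arbitrary choices.

```python
import random

def simulate_drift(pop_size: int, start_freq: float, generations: int,
                   seed: int = 1) -> list[float]:
    """Wright-Fisher model of genetic drift for one biallelic variant.

    Each of the 2N chromosomes in the next generation is a random copy of a
    chromosome in the current one; there is no selection, migration or mutation.
    """
    rng = random.Random(seed)               # seeded so runs are reproducible
    n_chrom = 2 * pop_size                  # diploid population
    count = round(start_freq * n_chrom)     # copies of the variant allele
    history = [count / n_chrom]
    for _ in range(generations):
        freq = count / n_chrom
        count = sum(1 for _ in range(n_chrom) if rng.random() < freq)
        history.append(count / n_chrom)
        if count in (0, n_chrom):           # allele lost or fixed: drift stops
            break
    return history

trajectory = simulate_drift(pop_size=50, start_freq=0.1, generations=200)
print(f"{len(trajectory) - 1} generations, final frequency {trajectory[-1]:.2f}")
```

If you rerun this with different seeds, the same starting frequency sometimes drifts to loss (0) and occasionally to fixation (1). Drift is stronger in small populations.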
What is negative and positive selection?
Negative selection is the selective removal of rare alleles that are harmful, whilst positive selection increases the frequency of beneficial alleles, i.e. the traits of a species that are selected for
What must be considered when determining whether a variant is pathogenic?
Whether it occurs in a gene and, if so, whether that is a key developmental gene (e.g. HOXD1, in which case it could be pathogenic) or not (e.g. the MC1R pigmentation gene)
This is not easy to determine: it depends on both the type of variant and the environment.
Why isn’t every genome exactly 3000 Mb?
Every individual has a different (and highly variable) number of repeat units in their microsatellites, also called STRs (short tandem repeats) or SSRs (simple sequence repeats)
What are the types of microsatellites?
Di-, tri-, tetra-, penta- and hexanucleotide repeats (the prefix gives the number of bases in one repeat unit)
What does the reference sequence show in terms of microsatellites?
The number of repeats the vast majority of the population have
How are microsatellites unlike SNPs?
They are multiallelic (e.g. 7, 9 or 12 repeats), whereas SNPs are usually biallelic (e.g. only an A or a C at that position)
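For a microsatellite, the "allele" is simply the number of repeat units, so the genotype is found by counting tandem copies of the motif. This is a Python sketch using a hypothetical CA dinucleotide repeat with made-up flanking sequence. Real genotyping also has to cope with sequencing errors and interrupted repeats, which this sketch ignores.

```python
def count_repeats(sequence: str, motif: str) -> int:
    """Return the length (in repeat units) of the longest tandem run of `motif`."""
    if not motif:
        raise ValueError("motif must not be empty")
    best = 0
    for start in range(len(sequence)):
        count, pos = 0, start
        while sequence.startswith(motif, pos):  # step through back-to-back copies
            count += 1
            pos += len(motif)
        best = max(best, count)
    return best

# Hypothetical CA repeat at one locus on three chromosomes, same flanking bases.
chromosomes = ["TT" + "CA" * n + "GG" for n in (7, 9, 12)]
alleles = {count_repeats(seq, "CA") for seq in chromosomes}
print(sorted(alleles))  # [7, 9, 12] -> three alleles at one locus: multiallelic
```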
Why are microsatellites so prone to differences in the number of their repeats?
During replication, the growing strand can detach from the template and then re-anneal in the wrong place, out of register (possible because any repeat unit on the new strand can pair with any repeat unit on the template). This leaves a small bubble of unpaired bases, and the polymerase complex backtracks and re-copies repeat units it has already copied. Extra repeat units are incorporated into the sequence, so microsatellites become multiallelic. This is known as replication slippage
Where are microsatellites found?
Anywhere in the genome!
In introns and UTRs, where they can affect gene expression
In intergenic regions
In exons, where a change in repeat number adds or removes amino acids in the protein (e.g. in Huntington’s disease, an expansion disorder, an expanded CAG repeat lengthens a run of glutamines)
What is a CNV?
Copy Number Variants (CNVs) are big chunks of DNA (e.g. 5 Mb long) that have either been duplicated, giving more than 2 copies (so we are no longer diploid for that locus), or have had 1 or both copies deleted. Many CNVs cause no harm
What is the simplest type of CNV?
The presence/ absence of a gene, so an individual can carry anywhere from 0 to 4 copies of a given gene
What causes CNV?
Non-allelic homologous recombination (NAHR) in meiosis - a pair of homologous chromosomes misalign because of highly similar sequences at different (non-allelic) positions, then exchange genetic material. This causes a duplication on one chromosome and a deletion on the other, and so a copy number change
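The effect of a misaligned crossover can be modelled with chromosomes written as lists of segments. In this Python sketch, a hypothetical gene B sits between two copies of a similar repeat, R. If the homologues pair correctly and cross over, nothing changes. If they misalign, so that the first R on one homologue pairs with the second R on the other, one product loses B and the other gains a second copy. All segment names are hypothetical.

```python
def crossover(chrom1: list[str], chrom2: list[str], i: int, j: int) -> tuple[list[str], list[str]]:
    """Exchange material between homologues at repeat copies chrom1[i] and chrom2[j].

    i == j: correctly aligned homologues (allelic recombination).
    i != j: homologues misaligned on non-allelic copies of the repeat (NAHR).
    """
    if chrom1[i] != chrom2[j]:
        raise ValueError("crossover must happen between matching repeat copies")
    recombinant_1 = chrom1[: i + 1] + chrom2[j + 1 :]
    recombinant_2 = chrom2[: j + 1] + chrom1[i + 1 :]
    return recombinant_1, recombinant_2

# Hypothetical homologue: gene B flanked by two copies of a similar repeat R.
homologue = ["A", "R", "B", "R", "C"]
deleted, duplicated = crossover(homologue, homologue, 1, 3)  # misaligned pairing
print(deleted, deleted.count("B"))        # ['A', 'R', 'C'] 0
print(duplicated, duplicated.count("B"))  # ['A', 'R', 'B', 'R', 'B', 'R', 'C'] 2
```

The total number of B copies across the two products stays at 2. NAHR redistributes copies between chromosomes rather than creating or destroying them.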
Can CNVs be found intergenically?
Yes, but due to their large size they often affect one or more genes
How much of the genome is a CNV?
Approx. 12%
What is an example of a CNV being pathogenic?
Trisomy 21 (Down syndrome); microdeletion disorders (e.g. DiGeorge syndrome)
What are the types of common genetic variation? Do they cause mendelian, monogenic disorders?
SNPs, microsatellites and CNVs - we all have them, just with different genotypes. Most common variants do not cause mendelian, monogenic disorders; the majority are actually neutral. Instead, they influence complex, non-mendelian disorders and contribute to general individual variation (e.g. looks, sporting ability)
What is the minor allele frequency of common biallelic variants?
Relatively high
What are some common variants and their associations?
Height, allergies, haemochromatosis, type 1 and type 2 diabetes, Alzheimer’s, anxiety, dyslexia, memory, sexual desire, aging, nicotine dependence, faithfulness, age-related hearing loss, gout, sciatica, sense of smell, HIV susceptibility, anti-social behaviour
What are some variant effects?
Beneficial/ pathogenic (but most are neutral)
What can variants be used for?
As markers to help find disease-causing genes/ mutations, in:
Autozygosity mapping (identification of recessively inherited disease genes using small inbred families)
Linkage studies (microsatellites, SNPs)
Association analysis (SNPs, CNVs)
What is the book analogy?
Whole book = genome
Chapter = chromosome
Delete or duplicate a chapter and you can really mess up the story
Paragraph = CNV
Delete or duplicate a paragraph and, as long as it’s not key, you can make do
Sentence = microsatellite
Accidentally repeat several words within a sentence and you lengthen it; it’s annoying but not fatal to the plot
Letter = SNP
Typos often barely change the meaning at all