Genetic Variation I Flashcards
What percent of protein-coding genes are polymorphic?
33%
Additional nucleotide diversity in introns, regulatory sequences, and flanking sequences
What percentage of total genetic variation is found within populations?
~85%
What causes genetic variation in humans?
Changes to base sequence in 2 categories:
Do not affect DNA content (the number of nucleotides is unchanged; instead, bases are replaced or translocated/inverted)
Cause a net gain/loss of DNA sequence (changes in copy number of a DNA sequence or abnormal chromosome segregation; deletions or insertions ranging from single nucleotides or short sequences up to Mb of DNA)
Do all DNA changes affect phenotype?
Most DNA changes are on a small scale, so they may or may not affect phenotype
What are DNA variants caused by?
Mutations resulting in alternative forms of DNA
What is a polymorphism?
For any locus, if more than one DNA variant is common in the population (Pr>0.01) it is called a polymorphism.
If Pr<0.01 it is a rare variant
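The frequency rule above can be sketched in Python. This is only a minimal illustration: the helper name `classify_locus` and the example frequencies are made up for the sketch, and the 0.01 cutoff is the one stated in the card.

```python
def classify_locus(allele_freqs, threshold=0.01):
    """Classify a locus from its allele frequencies (hypothetical helper).

    allele_freqs maps each DNA variant at the locus to its population frequency.
    """
    # Collect the variants whose frequency exceeds the threshold.
    common = [allele for allele, freq in allele_freqs.items() if freq > threshold]
    # More than one common variant makes the locus a polymorphism.
    if len(common) > 1:
        return "polymorphism"
    return "rare variant"


print(classify_locus({"A": 0.70, "G": 0.30}))    # two common variants
print(classify_locus({"A": 0.995, "G": 0.005}))  # second variant is below 0.01
```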
Where does knowledge of DNA variants come from?
From analysing DNA from complete genome sequencing of multiple individuals
Where is most genetic variation located?
In non-coding regions of the human genome.
What are single nucleotide polymorphisms (SNPs) and variants (SNVs)?
Most common variation due to single nucleotide substitution:
This type of variation produces single nucleotide variants (SNVs). If 2 or more alternative DNA variants each exceed a frequency of 0.01 in the population, it is called a single nucleotide polymorphism (SNP)
What is the “major allele”?
The allele that is more common in a population. Different populations can have different alleles as the “major allele”.
Why are SNVs not considered random?
Different regions have different mutation rates
mtDNA has a higher mutation rate than nuclear DNA
C-T substitutions are most common
What do alternative SNPs tell us about evolutionary ancestry?
Alternative SNPs mark alternative ancestral chromosome segments common in present day population
What do SNPs do to overall function of DNA?
They can cause gain or loss of restriction enzyme sites leading to (RFLP) Restriction fragment length polymorphism
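How a SNP changes a restriction fragment pattern can be sketched as follows. The sequences are invented for illustration, and `digest` is a hypothetical helper rather than a function from any library. The sketch assumes the EcoRI recognition site GAATTC, cut after the first G, and treats the DNA as a single linear strand.

```python
def digest(seq, site="GAATTC", cut_offset=1):
    """Return fragment lengths after cutting seq at every occurrence of site.

    cut_offset is the position of the cut inside the site
    (1 means the cut falls after the first base, as for G^AATTC).
    """
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(cut - start)   # fragment from the previous cut to this one
        start = cut
        pos = seq.find(site, pos + 1)
    fragments.append(len(seq) - start)  # final fragment up to the end of the sequence
    return fragments


# Invented reference sequence with two EcoRI sites.
reference = "AAAA" + "GAATTC" + "TTTT" + "GAATTC" + "CCCC"
# A single A-to-G substitution destroys the second site (GAATTC becomes GAGTTC).
variant = reference[:16] + "G" + reference[17:]

print(digest(reference))  # three fragments
print(digest(variant))    # two fragments: one site has been lost
```

The two alleles give different fragment lengths on a gel, which is what an RFLP assay detects.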
What do indels create?
Copy number variations.
A person with a heterozygous deletion of a single nucleotide at a defined position on a chromosome has one copy of that nucleotide instead of 2.
What does modern convention describe indels as?
Deletions/insertions up to 50 nucleotides long
What is a change in copy number of a sequence described as?
Change in copy number of sequences resulting in larger deletions/insertions (>100 nucleotides)
How common are indels?
About 1/10th as common as single nucleotide substitutions
What are the types of tandem repeat structures in DNA?
Satellite DNA
Minisatellite DNA
Microsatellite DNA
What are satellite DNA structures?
Length: 20 kb to many hundreds of kb; located at centromeres and other heterochromatic regions
What are minisatellite DNA structures?
Length: 100 bp to 20 kb; located primarily at telomeres and subtelomeric regions
What are microsatellite DNA structures?
Length: fewer than 100 bp; located widely throughout euchromatin
How stable are repeat sequences?
They are unstable: variants differ in the number of repeats
What causes variation in copy number?
Replication slippage or unequal crossover
How do microsatellites differ in population genomics to SNPs?
Microsatellites have multiple alleles unlike SNPs that only have 2 alleles
How does slippage cause insertion?
A repeat on the newly synthesised strand loops out, so the strand no longer aligns perfectly with the template strand and replication adds an extra repeat.
Opposite occurs for deletions (Template strand loops out)
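The two slippage outcomes can be sketched as a change in repeat number. This is a toy model, not a simulation of replication: the helper `slip`, the strand labels, and the CA repeat are all made up for illustration.

```python
def slip(repeat_count, looped_strand):
    """Return the repeat count of the new strand after one slippage event.

    looped_strand is "new" when the newly synthesised strand loops out
    (insertion) and "template" when the template strand loops out (deletion).
    """
    if looped_strand == "new":
        return repeat_count + 1
    if looped_strand == "template":
        return repeat_count - 1
    raise ValueError("looped_strand must be 'new' or 'template'")


unit = "CA"                                # hypothetical dinucleotide repeat
parent = unit * 10
insertion = unit * slip(10, "new")         # one extra repeat
deletion = unit * slip(10, "template")     # one repeat fewer

print(len(parent), len(insertion), len(deletion))
```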
How can unequal crossing over result in additional repeats of a sequence?
Chromatids on homologous chromosomes can misalign at the repeat sequences. When recombination occurs between the misaligned repeats, one chromatid gains extra repeats, resulting in expansion of repeat length, while the other loses them.
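A rough sketch of the bookkeeping in unequal crossing over, counting repeat units only. The function `unequal_crossover` and the break positions are hypothetical and chosen just to show that one product expands while the other contracts.

```python
def unequal_crossover(n_a, n_b, break_a, break_b):
    """Return the repeat counts of the two recombinant chromatids.

    Chromatid A carries n_a repeats and breaks after repeat break_a;
    chromatid B carries n_b repeats and breaks after repeat break_b.
    Misalignment means break_a and break_b differ.
    """
    # Left part of A joined to the right part of B.
    recombinant_1 = break_a + (n_b - break_b)
    # Left part of B joined to the right part of A.
    recombinant_2 = break_b + (n_a - break_a)
    return recombinant_1, recombinant_2


# Two chromatids with 10 repeats each, misaligned by 2 repeat units.
expanded, contracted = unequal_crossover(10, 10, break_a=6, break_b=4)
print(expanded, contracted)
```

The total number of repeats is conserved; they are just redistributed unequally between the two products.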
What are microsatellite markers used for? How are they used?
They are used to track inheritance of different chromosomes in a family.
Primers are added to flank each region before PCR is used to amplify these regions.
The length of each amplified product is then determined by running it through a sequencing capillary column.
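Because the primers sit at fixed positions, the PCR product length depends only on the number of repeats between them. A minimal sketch of converting a measured fragment length into a repeat count, assuming made-up values for the flanking length (100 bp including primers) and the repeat unit (2 bp); `repeat_count` is a hypothetical helper.

```python
def repeat_count(fragment_len, flank_len, unit_len):
    """Infer the number of repeat units from a PCR fragment length.

    flank_len is the total non-repeat sequence in the product (both flanks,
    primers included); unit_len is the length of one repeat unit.
    """
    repeat_bp = fragment_len - flank_len
    if repeat_bp < 0 or repeat_bp % unit_len != 0:
        raise ValueError("fragment length does not fit a whole number of repeats")
    return repeat_bp // unit_len


# Hypothetical heterozygous individual: two peaks at 140 bp and 148 bp.
peaks = [140, 148]
genotype = [repeat_count(p, flank_len=100, unit_len=2) for p in peaks]
print(genotype)
```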
What do peaks on a gene scan (after sample is run through a sequencing capillary column) tell us about the gene?
The size of the repeats and their frequency. This is more informative than SNPs for distinguishing between individuals or following chromosome segments through pedigrees
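One common way to put a number on "more informative" is expected heterozygosity, 1 minus the sum of squared allele frequencies. This measure is not named in the cards; it is used here as an assumption, and the allele frequencies are invented.

```python
def expected_heterozygosity(allele_freqs):
    """Probability that two randomly drawn alleles at a locus differ."""
    return 1.0 - sum(f * f for f in allele_freqs)


# A SNP with two equally common alleles (the best case for two alleles).
snp = expected_heterozygosity([0.5, 0.5])
# A hypothetical microsatellite with five equally common alleles.
microsatellite = expected_heterozygosity([0.2] * 5)

print(snp, microsatellite)
```

With only 2 alleles the value cannot exceed 0.5, whereas a marker with many alleles can approach 1, so two individuals are more likely to differ at it.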
How important was the human genome project to understanding microsatellites?
The early years of the Human Genome Project were largely devoted to defining and mapping microsatellites; 150,000 were identified.
What are the limitations to using repeats over SNPs?
Repeat sequencing is much harder to automate than SNP analysis.
Where does DNA variation come from?
Some arise from errors in DNA replication or recombination
Errors in chromosome segregation result in abnormal gametes with fewer or more chromosomes than normal
Various natural errors, such as crossover errors, give rise to altered copy numbers of specific sequences within a DNA strand
Various endogenous/exogenous sources can cause damage to DNA by altering chemical structure
What large-scale changes are important for genetic variation?
Balanced Structural Variation
Unbalanced Structural Variation
What is balanced structural variation?
DNA variants have the same DNA content but differ in that some DNA sequences are located in different positions in the genome. Chromosomes break and fragments are incorrectly rejoined without loss or gain of DNA (inversions and translocations)
What is unbalanced structural variation?
DNA variants differ in DNA content. In rare cases a person has gained or lost a chromosomal region, which often results in disease.
Also includes commonly occurring copy number variation (CNV), where variants differ in the number of copies of a moderately long to very long DNA sequence. Some CNVs contribute to disease and others are normal
What is the most common type of genetic variation?
SNPs (75% of DNA changes are SNPs)
How many SNPs are there in a human genome?
38 million (about 1 per 100 bp)
The vast majority are rare in any population
Most people are homozygous at any given SNP locus.
Personal sequencing: SNVs between the maternal and paternal genomes occur at about 1 per 1000 bp.
Structural variations account for 1/4 of mutational events and are dominated by CNVs
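The densities above can be checked with rough arithmetic. The haploid genome size of about 3.1 billion bp is an assumption not stated in the cards, so the results are only approximate.

```python
GENOME_SIZE_BP = 3.1e9   # assumed haploid human genome size (not from the cards)
CATALOGUED_SNPS = 38e6   # figure given in the card

# Average spacing between catalogued SNPs across the population.
bp_per_snp = GENOME_SIZE_BP / CATALOGUED_SNPS

# Expected SNV differences between one person's maternal and paternal genomes
# at a rate of 1 per 1000 bp.
snvs_per_person = GENOME_SIZE_BP / 1000

print(round(bp_per_snp), int(snvs_per_person))
```

The spacing comes out a little under 100 bp, consistent with the rounded "1 per 100" figure. The per-person count is far smaller than the catalogue, which fits the point that most catalogued SNPs are rare and most people are homozygous at any given site.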
What is the human genome project good for?
It provides a consensus sequence. It is not good for studying individual differences or genetic variation.
What are the steps that took place historically to come up with variant maps?
Human genome project
Identification of genetic variants (anonymous with respect to traits)
Assaying genetic variants (verify polymorphisms, catalogue correlations amongst sites, anonymous with respect to traits) [HapMap Project]
How were SNPs found and understood?
SNPs were first discovered (the goal was to identify 300k SNPs and to determine their allele frequencies)
Then SNPs were characterised
What reference genome was used for SNPs?
Human Genome Project