Topic 4c: SNPs Flashcards
What is a SNP?
Single nucleotide polymorphism, variation at a nucleotide site in a specific region (locus) of DNA.
Most SNPs are bi-allelic, meaning that only 2 different nucleotides segregate at a locus in a population. The low mutation rate per site means there is little homoplasy.
Do SNPs have high or low information content per locus? Are they abundant? How does this compare to microsatellites?
SNPs have low information content per locus because they are bi-allelic (the maximum expected heterozygosity is 0.5, reached when both alleles are at equal frequency), but they are abundant in the genome.
Microsatellites provide more information per locus because they are not bi-allelic and often have many more than 2 alleles. Both marker types are abundant in the genome.
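The 0.5 ceiling for a bi-allelic locus can be checked with the standard expected heterozygosity formula, H = 1 - sum(p_i^2). The sketch below is only an illustration: the function name and the 10-allele microsatellite frequencies are made up for the comparison, not taken from these notes.

```python
def expected_heterozygosity(allele_freqs):
    """Expected heterozygosity H = 1 - sum(p_i^2) for a single locus."""
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in allele_freqs)


# Bi-allelic SNP at its most diverse: both alleles at frequency 0.5.
snp_h = expected_heterozygosity([0.5, 0.5])

# Hypothetical microsatellite with 10 equally frequent alleles.
microsat_h = expected_heterozygosity([0.1] * 10)

print(snp_h, microsat_h)
```

With k equally frequent alleles H is 1 - 1/k, so a SNP tops out at 0.5 while a locus with many alleles can approach 1.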
Where are SNPs found?
Coding sequence: missense and nonsense mutations
Coding sequence: silent (synonymous) mutations
Upstream promoters/enhancers: SNPs here can change gene expression
Splice sites: variation can lead to alternative splicing
Introns: SNPs can affect RNA secondary structure
Non-coding (intergenic) DNA: SNPs here are generally neutral
What are the 5 ways that you can detect DNA base variation in homologous pieces of DNA?
Restriction enzyme site variation
Sanger DNA sequencing
Whole genome DNA sequencing
Reduced representation sequencing
SNP chips
How do you detect SNPs using Sanger Sequencing?
You can find “mixed bases” in the sequence readings of heterozygotes (as 2 alleles are present)
When multiple SNP sites are found in the same read, you cannot tell which alleles occur on the same chromosome (the phase is unknown); you need cloning or next-generation sequencing for that.
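The phase problem can be made concrete with a small sketch. It assumes the mixed bases have been written as IUPAC ambiguity codes (for example R for an A/G mixture), which is a common convention but not something these notes specify; the function name and the example read are hypothetical.

```python
from itertools import product

# IUPAC ambiguity codes for a mixture of exactly two bases.
IUPAC_HET = {"R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC"}


def possible_haplotype_pairs(read):
    """Return every haplotype pair consistent with one diploid Sanger read."""
    het_sites = [i for i, base in enumerate(read) if base in IUPAC_HET]
    if not het_sites:
        return [(read, read)]  # homozygous at every site

    first, rest = het_sites[0], het_sites[1:]
    pairs = []
    # Fix the orientation of the first heterozygous site so that mirrored
    # pairs (hap1, hap2) and (hap2, hap1) are not counted twice.
    for flips in product([False, True], repeat=len(rest)):
        hap1, hap2 = list(read), list(read)
        hap1[first], hap2[first] = IUPAC_HET[read[first]]
        for site, flip in zip(rest, flips):
            a, b = IUPAC_HET[read[site]]
            if flip:
                a, b = b, a
            hap1[site], hap2[site] = a, b
        pairs.append(("".join(hap1), "".join(hap2)))
    return pairs


# Hypothetical read with two mixed bases: R (A/G) and Y (C/T).
pairs = possible_haplotype_pairs("ACRTTYGA")
for hap1, hap2 in pairs:
    print(hap1, hap2)
```

A read with n heterozygous sites is consistent with 2^(n-1) haplotype pairs, so one mixed base is unambiguous but two or more cannot be phased from the read alone.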
What is reduced representation sequencing?
To detect SNPs, you must resequence the same regions of the genome multiple times to find polymorphic sites. The regions being sequenced are reduced portions of the genome and are homologous in all individuals that are sequenced (so the same markers are compared in all individuals). This allows for simultaneous discovery of markers and genotyping in a sample of individuals.
What are two advantages to reduced representation sequencing?
Computationally more tractable than whole genome sequencing (especially for non-model organisms with no known genomic resources)
More cost-effective, and allows researchers to multiplex many samples in a single run
What are the 3 types of reduced regions that may be used?
RAD (restriction site associated DNA): random throughout the genome, mostly neutral markers
Exons: captured by hybridisation to probes
cDNA: expressed regions of genome in a certain tissue at a certain time
Why would we use RAD?
Restriction enzymes provide a way to reduce the complexity of the genome by narrowing the fragment range.
What are the steps of using RAD as a reduced region?
- Digest DNA with a restriction enzyme
- Ligate a synthetic adapter of known sequence to the sticky ends
- Mechanically shear the ligated fragments
- Ligate a second adapter to the blunt (sheared) ends
- Amplify the fragments that have both the adapters on them by PCR and resequence them
- Align sequences and look for SNPs
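The logic of the digest and the final SNP search can be sketched in silico. This is a toy illustration under several assumptions: EcoRI's recognition sequence GAATTC is used only as an example enzyme, the 8 bp tag length and the two individual sequences are invented, and loci are matched by genome position, which a real pipeline would not know and would replace with read clustering or alignment. Adapters, shearing and PCR are not modelled.

```python
SITE = "GAATTC"  # EcoRI recognition sequence, chosen only as an example
TAG_LEN = 8      # toy read length; real reads are much longer


def rad_tags(genome, site=SITE, tag_len=TAG_LEN):
    """Return {cut-site position: sequence just downstream of the site}."""
    tags = {}
    start = genome.find(site)
    while start != -1:
        tag = genome[start + len(site): start + len(site) + tag_len]
        if len(tag) == tag_len:  # skip sites too close to the sequence end
            tags[start] = tag
        start = genome.find(site, start + 1)
    return tags


def find_snps(tags_by_individual):
    """Compare tags from shared loci and report the variable positions."""
    tag_sets = list(tags_by_individual.values())
    shared = set.intersection(*(set(tags) for tags in tag_sets))
    snps = []
    for locus in sorted(shared):
        seqs = [tags[locus] for tags in tag_sets]
        for offset, column in enumerate(zip(*seqs)):
            alleles = sorted(set(column))
            if len(alleles) > 1:
                snps.append((locus, offset, alleles))
    return snps


# Two hypothetical individuals that differ by one base near the first site.
ind_a = "TT" + SITE + "ACGTACGT" + "TTTT" + SITE + "GGGCCCAA" + "TT"
ind_b = "TT" + SITE + "ACGTTCGT" + "TTTT" + SITE + "GGGCCCAA" + "TT"

snps = find_snps({"a": rad_tags(ind_a), "b": rad_tags(ind_b)})
print(snps)
```

Only the sequence next to cut sites is compared, which is why the same reduced set of loci is recovered in every individual.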
What is SNP genotyping using chips/arrays?
Upwards of 2,000,000 DNA oligonucleotide probes, each with its 3’ end stopping 1 bp short of a known SNP in the genome, are fixed to a glass slide.
Genomic DNA fragments hybridize to the probes that are perfectly complementary, and a single-base sequencing (extension) reaction is performed.
The computer then reads each genotype as a homozygote (only 1 base/colour is incorporated) or as a heterozygote (2 different bases/colours are incorporated).
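The read-out step can be pictured as a simple rule on the two colour intensities measured at each probe. The sketch below is hypothetical: the function name, the minimum-signal cutoff and the heterozygote band are invented for illustration and do not describe any vendor's calling algorithm.

```python
def call_genotype(signal_a, signal_b, min_total=100.0, het_band=(0.3, 0.7)):
    """Call a genotype from the intensities of the two allele colours.

    min_total and het_band are illustrative thresholds, not real defaults.
    """
    total = signal_a + signal_b
    if total < min_total:
        return "no call"  # signal too weak to trust
    frac_a = signal_a / total
    if frac_a > het_band[1]:
        return "AA"  # essentially one colour: homozygote
    if frac_a < het_band[0]:
        return "BB"  # essentially the other colour: homozygote
    return "AB"      # both colours present: heterozygote


# Hypothetical intensity pairs for four probes.
calls = [call_genotype(a, b) for a, b in [(950, 30), (20, 900), (480, 510), (10, 15)]]
print(calls)
```

Refusing to call weak signals is one simple form the quality control on a chip could take.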
What is cross-species amplification success?
Cross-species amplification success decreases linearly with time since the common ancestor; amplification is probably lost when a mutation occurs in the flanking sequence.
How good is cross-species retention of SNP polymorphism?
Cross-species polymorphisms decrease in a rapid exponential fashion because polymorphism is lost very quickly due to genetic drift, and only some remains due to mutation-drift balance
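The exponential loss can be illustrated with the textbook drift expectation H_t = H_0 (1 - 1/(2N))^t. This is a sketch, not the model behind the card: it assumes an idealised population of constant effective size and ignores new mutation, so it shows only the decay part, not the mutation-drift balance. The function name and the numbers used are hypothetical.

```python
import math


def heterozygosity_after_drift(h0, n_e, generations):
    """Expected heterozygosity after drift alone: H0 * (1 - 1/(2*Ne))**t."""
    return h0 * (1.0 - 1.0 / (2.0 * n_e)) ** generations


H0 = 0.5     # a SNP starting at maximum diversity
N_E = 1000   # hypothetical effective population size

# Heterozygosity halves roughly every 2 * Ne * ln(2) generations.
half_life = round(2 * N_E * math.log(2))
for t in (0, half_life, 2 * half_life):
    print(t, round(heterozygosity_after_drift(H0, N_E, t), 3))
```

Because the decay is geometric, each further stretch of about 2 * Ne * ln(2) generations halves the remaining heterozygosity.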
What are pros and cons of SNP genotyping arrays (chips)?
PROS: assess a large number of loci quickly, same loci assessed in every individual, built-in quality control
CONS: must have a list of loci to begin with, cannot modify the chip once it's made, low transferability across species