week 5- molecular markers and allele dynamics Flashcards
what are molecular markers
They are specific sequences of DNA that can be used to identify individuals, populations or species
what do molecular markers represent
variations in the genetic code that can be tracked and analysed
what do molecular markers allow us to observe
alleles; information that is used to understand genetic diversity, inheritance patterns and evolutionary relationships
what can separate molecular markers provide
independent tests of hypotheses, thus using many together can provide more sensitivity
what does direct DNA sequencing provide
direct observations of the DNA sequence and thus the alleles
what technologies provide indirect observations of alleles
allozymes, RFLP and microsatellites
molecular markers in 1960s
genome variation assayed via proteins
1. protein immunology
2. Protein electrophoresis
molecular markers 1970s-1990s
enter DNA
1. DNA-DNA hybridisation
2. restriction analyses including RFLP
3. minisatellites (DNA fingerprinting)
molecular markers 1985 onwards
the polymerase chain reaction (PCR)
can now amplify, in vitro, assayable quantities of almost any desired piece of DNA from almost any biological source
molecular approaches that depend on PCR
- random amplified polymorphic DNAs (RAPDs)
- amplified fragment length polymorphisms (AFLPs)
- microsatellites (aka STRs, SSRs, SSLPs)
- direct DNA sequencing
- single nucleotide polymorphisms (SNPs)
what did PCR enable
- analysis of ancient and other forensic quality samples
- non-invasive sampling
random amplified polymorphic DNA (RAPD)
- a short PCR primer (8-10mer) of arbitrary sequence is used to randomly amplify anonymous regions of the genome
amplified fragment length polymorphism (AFLP)
-it combines RFLP and PCR to produce more replicable fingerprints than RAPD
pros and cons of RAPD and AFLP
pros: quick, inexpensive, represent the entire genome
cons: dominant, lack of reproducibility
what are microsatellites
- a very important marker class developed in 1990s
-assays 1-6 nt tandem repeats distributed throughout the genome
-often in non-coding regions, e.g. telomeres, centromeres, promoters
-co-dominant Mendelian markers, multilocus genotypes
-variability in microsatellite repeat number arises by slipped-strand mispairing during replication
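To make "1-6 nt tandem repeats" concrete, here is a minimal sketch that scans a DNA string for short motifs repeated back to back. The function name, the made-up sequence and the threshold of 5 repeats are all assumptions for illustration, not part of the lecture; real microsatellite genotyping scores PCR fragment lengths rather than scanning text.

```python
import re

def find_tandem_repeats(seq, min_repeats=5):
    """Return (start, motif, repeat_count) for runs of a 1-6 nt motif.

    Hypothetical helper for illustration only.
    """
    # A 1-6 nt motif (shortest unit preferred), followed by at least
    # (min_repeats - 1) further copies of the same motif.
    pattern = r"([ACGT]{1,6}?)\1{" + str(min_repeats - 1) + r",}"
    hits = []
    for match in re.finditer(pattern, seq):
        motif = match.group(1)
        repeats = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, repeats))
    return hits

# Made-up sequence: an (AC)5 repeat and a (CAG)6 repeat in flanking DNA.
seq = "TTG" + "AC" * 5 + "GGT" + "CAG" * 6 + "TT"
hits = find_tandem_repeats(seq)
print(hits)
```

Two individuals carrying different repeat counts at the same locus (say 5 versus 7 copies of AC) would carry different alleles, which is what the marker scores.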
pros and cons of microsatellites
pros:
-profiles obtainable from trace amounts of degraded DNA
-genome-wide coverage, high variability
-can score many loci on many samples pretty quickly
-neutral
cons:
- isolating loci laborious and expensive, loci often species specific
-evolve too quickly to be useful above the population level
-mostly inappropriate for interspecific phylogeny
applications of microsatellites
-individual and population level analysis:
-population structure and demography
-mating
-parentage and relatedness
-forensics
-mapping
used to decide on breeding pairs in the captive breeding program of the Cuban Amazon parrot
what is direct DNA sequencing
-in the 1990s DNA sequencing emerged as a powerful and versatile source of data on genetic variation
-widely used in evolutionary genetics only following the advent of PCR
-Sanger enzymatic sequencing developed in 1977, data collection now automated
three steps of Sanger sequencing
- PCR with fluorescent chain-terminating ddNTPs
- size separation by capillary gel electrophoresis
- Laser excitation and detection by the sequencing machine
direct DNA sequencing pros and cons
pros:
-can address questions at any taxonomic level by choosing the right gene or gene region: protein-coding, intron, mtDNA, RNA
-can choose to analyse all variable sites, or a subset such as synonymous sites, or even predicted amino acid sequence
cons:
-only looking at one locus, can be costly and/or time consuming
Molecular genetic approach to monitoring whaling
-tested the potential of molecular genetic methods for identifying the species and probable geographic source of whale products
-used 16 samples purchased in retail markets in Japan, all labelled as whale
-used a portable laboratory in hotel room to avoid issues with exporting
-PCR amplified, purified and later sequenced 155 to 378 base pairs of the mitochondrial DNA (mtDNA) control region
-early example of molecular identification of species from unknown tissues
molecular markers, modern era
single nucleotide polymorphisms (SNPs)
- SNPs distributed across the genome represent the most widespread and potentially valuable source of genetic variation, but finding and screening have, until recently, been prohibitively costly and time-consuming
explain first-generation sequencing (Maxam and Gilbert, Sanger chain-termination)
-infer nucleotide identity using dNTPs, then visualise with electrophoresis
-500-1000 bp fragments
-short read sequencing (hard to assemble)
explain 454, Solexa, Ion Torrent, Illumina
-high throughput from the parallelisation of sequencing reactions
-~50-500 bp fragments
-short-read sequencing (hard to assemble)
explain PacBio, Oxford Nanopore
-sequence native DNA in real time with single-molecule resolution
-tens of kb fragments, on average
-single-molecule sequencing
what is single-molecule sequencing characterised by
-the lack of DNA or RNA amplification in template library preparation
-requires less input genomic DNA
-avoids polymerase chain reaction-introduced error and amplification bias
-real time measurements
-longer reads
problem: high error rates in reads
important considerations in choosing genetic markers- sensitivity
a marker must have the correct sensitivity for the question at hand
-it is possible to have too much information (analogy: trying to navigate from Newcastle to Rome using 1:25,000 scale topographic maps)
-or too little information (trying to navigate to a restaurant in Hexham using a map of the UK)
important considerations in choosing genetic markers- coding versus non-coding
knowledge about DNA regions used as genetic markers can help predict their likely sensitivity
-a gene coding for a structural function will usually be more conserved by evolution than a DNA region that is non-coding
-within protein-coding genes, there is a strong pattern in how constrained the nucleotide positions within codons are: third < first < second position (third positions are the least constrained, second positions the most)
important considerations in choosing genetic markers- organelle (mitochondrial, chloroplast) as well as nuclear DNA
cells from most eukaryotes contain biparentally-inherited nuclear DNA, as well as DNA in organelles that is usually inherited uniparentally
-mtDNA and nuclear DNA gene genealogies reflect different aspects of population biology and history
-mtDNA has a lower effective population size (around 1/4 that of nDNA) than do nuclear markers, so variants become diagnostic of taxa more rapidly
-comparison of nuclear and mitochondrial genotypes can help recognise hybrid individuals, asymmetrical mating preferences, etc.
important considerations in choosing genetic markers- rapid development
some genetic marker systems are directly applicable or easily convertible for use on new species. others are far less transferable
important considerations in choosing genetic markers- rapid screening
recent advances in technology have made screening of large population samples very rapid
important considerations in choosing genetic markers- DNA or protein
protein electrophoresis examines genetically variable proteins, and yielded the first data about genetic variation in natural populations. protein markers are generally cheap and convenient
explain DNA advantages
-DNA is generally more variable than proteins and is thus more highly resolving
-DNA markers make available all the information carried in DNA substitutions that are not detectable by protein electrophoresis
-provides a range of sensitivities, allowing for examination of questions at the level of the individual, population, species and higher order
explain non-invasive sampling
material may be obtained without harming individuals, and even without capturing them (e.g. hairs, scat/feces, saliva, blood, tissue samples)
why is DNA good for samples
it is robust and PCR-assayable so that small and degraded samples can be used
what do nuclear and organelle (mitochondrial, plastid) genomes reflect
different aspects of population history e.g. mito-nuclear discordance
what is the frequency of an allele equal to:
p = number of copies of the allele in the population / total number of copies of all alleles in the population
in a population what do frequency of alleles sum to
must sum to 1
p+q=1
what is the hardy-weinberg principle
p^2 + 2pq + q^2 = 1
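A small worked example may help tie the allele-frequency and Hardy-Weinberg cards together. This is only a sketch for one locus with two alleles (A and a); the genotype counts and function names are invented for illustration.

```python
def allele_frequencies(n_AA, n_Aa, n_aa):
    """Return (p, q) from diploid genotype counts at one two-allele locus."""
    total_copies = 2 * (n_AA + n_Aa + n_aa)  # every individual carries 2 copies
    p = (2 * n_AA + n_Aa) / total_copies     # copies of A / all copies
    q = (2 * n_aa + n_Aa) / total_copies     # copies of a / all copies
    return p, q

def hardy_weinberg_expected(p, q):
    """Expected genotype frequencies: p^2, 2pq and q^2."""
    return {"AA": p ** 2, "Aa": 2 * p * q, "aa": q ** 2}

# Made-up sample of 100 individuals.
p, q = allele_frequencies(n_AA=36, n_Aa=48, n_aa=16)
expected = hardy_weinberg_expected(p, q)
print(p, q, expected)
```

With these counts p is 0.6 and q is 0.4, and the expected genotype frequencies (0.36, 0.48, 0.16) match the observed proportions, so this invented sample sits at Hardy-Weinberg proportions.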
what can genetic variation provide
can provide insight into the “health” of populations
-low genetic diversity can result in lower fitness, increased susceptibility to disease and reduced capacity to adapt to change
can provide information on the ecology of the population
-high genetic diversity can be a sign of a large population or lots of movement among populations
can provide information on the history of the population
-low diversity populations might be recently founded
how is genetic variation measured and issues with it- the proportion of polymorphic loci
- the proportion of polymorphic loci (P)
P= the number of loci that are polymorphic/total number of loci studied
issues:
-not very sensitive
-no distinction between loci with e.g., 2 alleles or 20 alleles
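The second issue can be seen with a quick calculation. This sketch assumes each locus is summarised simply by how many alleles were found at it; the two data sets are invented.

```python
def proportion_polymorphic(alleles_per_locus):
    """P = number of polymorphic loci / total number of loci studied."""
    polymorphic = sum(1 for k in alleles_per_locus if k > 1)
    return polymorphic / len(alleles_per_locus)

# Number of alleles observed at each of 5 loci (made-up values).
two_allele_population = [1, 2, 2, 1, 2]
twenty_allele_population = [1, 20, 20, 1, 20]

P_two = proportion_polymorphic(two_allele_population)
P_twenty = proportion_polymorphic(twenty_allele_population)
print(P_two, P_twenty)
```

Both populations get P = 0.6 even though the second one carries far more alleles per polymorphic locus, which is why P alone is a blunt measure.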
how is genetic variation measured and issues with it-
the average heterozygosity (H)
look at lecture slides
-proportion of heterozygotes, averaged over all loci
-if there are n loci:
Hobs = (1/n) × Σ Hi, summed over loci i = 1 to n
where Hi is the observed frequency of heterozygotes at the ith locus, and Hobs is the average of Hi over all loci studied
n is the number of loci
why does average heterozygosity mean different things (look at lecture slides)
when it is calculated from observed or expected values
average Hobs= mean proportion of individuals that would be heterozygous at a locus
average Hexp= the probability that two randomly chosen copies of a gene would be different alleles
Hexp is more commonly used to describe the level of genetic variation in a population, often called ‘gene diversity’
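The two versions of H can be compared side by side. In this sketch Hobs is the plain average of per-locus heterozygote frequencies, and Hexp at each locus is taken as 1 minus the sum of squared allele frequencies, which is the standard textbook way to express "the probability that two randomly chosen copies are different alleles" (an assumption here, since the formula itself is on the lecture slides). All numbers are invented.

```python
def observed_heterozygosity(het_freq_per_locus):
    """Hobs: observed heterozygote frequency averaged over loci."""
    return sum(het_freq_per_locus) / len(het_freq_per_locus)

def expected_heterozygosity(allele_freqs_per_locus):
    """Hexp (gene diversity): mean over loci of 1 - sum(p_i^2)."""
    per_locus = [1 - sum(p ** 2 for p in freqs)
                 for freqs in allele_freqs_per_locus]
    return sum(per_locus) / len(per_locus)

# Four made-up loci: observed heterozygote frequencies and allele frequencies.
h_obs = observed_heterozygosity([0.5, 0.25, 0.0, 0.25])
h_exp = expected_heterozygosity([
    [0.5, 0.5],         # two equally common alleles
    [0.75, 0.25],       # one common, one rare allele
    [1.0],              # monomorphic locus
    [0.5, 0.25, 0.25],  # three alleles
])
print(h_obs, h_exp)
```

Here Hobs (0.25) falls below Hexp (0.375), a deficit of heterozygotes relative to what the allele frequencies alone would predict.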
what does mutation in a germ cell give rise to
a new allele
the allele may be passed to offspring
what two forces are at play in determining the fate of the new mutation
selection
random genetic drift
what is selection
fitness of a genotype determines whether alleles are passed on to next generation, i.e. whether there is selection against or for that genotype
types of mutation and impact
-deleterious (lower fitness of genotype)
-neutral (no effect on fitness of the individual)
-advantageous (increases fitness of the genotype, better adaptation)
The relative impact on fitness will dictate the change in expected allele frequency from one generation to the next
simple model of selection
look at ppt
example of mode of selection
a heterozygote is twice as fit as either type of homozygote.
One generation of selection can dramatically alter genotype frequencies
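A minimal sketch of that example, assuming the usual viability-selection bookkeeping (weight each genotype frequency by its relative fitness, then rescale by the mean fitness). The starting frequencies and the fitness values 1, 2, 1 are illustrative assumptions, not the model from the slides.

```python
def select_one_generation(genotype_freqs, fitness):
    """Genotype frequencies after one round of selection."""
    mean_fitness = sum(genotype_freqs[g] * fitness[g] for g in genotype_freqs)
    return {g: genotype_freqs[g] * fitness[g] / mean_fitness
            for g in genotype_freqs}

before = {"AA": 0.25, "Aa": 0.50, "aa": 0.25}
fitness = {"AA": 1.0, "Aa": 2.0, "aa": 1.0}  # heterozygote twice as fit

after = select_one_generation(before, fitness)
print(after)
```

Heterozygotes rise from one half to two thirds of the population in a single generation, while each homozygote drops from a quarter to a sixth.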
what is genetic drift
changes in allele frequencies due to random sampling of gametes across generations
chance events: random mortality
has larger impacts in small populations
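Drift is easy to simulate. The sketch below assumes a simple Wright-Fisher style model (constant size, random mating, no selection or mutation), which is a common teaching model rather than anything specified in these notes; the population sizes and seed are arbitrary.

```python
import random

def wright_fisher(p0, n_individuals, generations, rng):
    """Track one allele's frequency under pure drift."""
    copies = 2 * n_individuals  # diploid population: 2N gene copies
    p = p0
    trajectory = [p]
    for _generation in range(generations):
        # Each gene copy in the next generation is drawn at random
        # from the current gene pool.
        count = sum(1 for _copy in range(copies) if rng.random() < p)
        p = count / copies
        trajectory.append(p)
    return trajectory

rng = random.Random(1)  # fixed seed so the run is repeatable
small = wright_fisher(0.5, 10, 50, rng)
large = wright_fisher(0.5, 1000, 50, rng)
print("N = 10:  ", small[-1])
print("N = 1000:", large[-1])
```

Rerunning with different seeds should show the N = 10 population wandering widely, often losing or fixing the allele, while the N = 1000 population tends to stay close to its starting frequency.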
what is the concept of effective population size
effective population size (Ne) describes the degree to which a population experiences genetic drift
what is census population size
a count of the number of individuals
what happens in a theoretical ideal population
(no migration, no selection, equal fitness, etc)
census population size (N) will equal effective population size (Ne)
what happens in real populations and why
effective population size is much smaller than census population size due to:
-unequal contribution to next generation, e.g. dominant males, litter size differences
-unequal sex ratios
-bottlenecks: changes in population size over time
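Two of these effects have simple textbook approximations, sketched below. The formulas (Ne = 4 × Nm × Nf / (Nm + Nf) for unequal sex ratios, and the harmonic mean of population sizes across generations for fluctuating size) are standard population-genetics results that are not given in these notes, so treat them as assumptions; the numbers are invented.

```python
def ne_unequal_sex_ratio(n_males, n_females):
    """Ne when breeding males and females differ in number."""
    return 4 * n_males * n_females / (n_males + n_females)

def ne_fluctuating(sizes):
    """Ne across generations: harmonic mean of the population sizes."""
    return len(sizes) / sum(1 / n for n in sizes)

ne_balanced = ne_unequal_sex_ratio(50, 50)        # equal sex ratio
ne_skewed = ne_unequal_sex_ratio(10, 90)          # a few dominant males
ne_bottleneck = ne_fluctuating([1000, 10, 1000])  # one-generation crash
print(ne_balanced, ne_skewed, ne_bottleneck)
```

All three cases involve 100 breeders or far larger census sizes, yet the skewed sex ratio gives Ne = 36 and the single bottleneck generation drags Ne below 30.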
what effective population size is safe for a population
50/500 rule:
-to avoid inbreeding depression (i.e., loss of ‘fitness’ due to genetic problems), Ne of at least 50 individuals in a population is required
-to avoid eroding evolutionary potential (the ability of a population to evolve to cope with environmental changes), effective population size of at least 500 is required
-a census to effective population size ratio (N:Ne) of about 10:1 is very common
in practice why is effective population size hard to calculate for a population
-don't know the numbers of breeders or offspring
-inbreeding is hard to observe
-can use genetic markers to estimate it, but this is very complicated
when does gene substitution occur
when the mutant completely replaces the ‘old’ or ‘wild-type’ allele
-fixation probability: how likely
-fixation time: how long does it take
-rate of gene substitution: number of fixations of new alleles per unit time
explain fixation probability
depends on:
-present frequency
-selective (dis)advantage
-Ne
if neutral: P = its frequency
new allele: initial frequency of 1/(2N), so P = 1/(2N)
- if selection is positive and the population size is large: P=2s
- where s is the selective advantage
-e.g. if the selective advantage is weak, s = 0.01 (1%), then P = 2s = 2%
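The two approximations on this card can be compared numerically. The function names, population size and selective advantage below are illustrative assumptions.

```python
def fixation_prob_neutral(n_individuals):
    """New neutral mutation: P = its initial frequency = 1/(2N)."""
    return 1 / (2 * n_individuals)

def fixation_prob_advantageous(s):
    """New advantageous mutation in a large population: P = 2s."""
    return 2 * s

p_neutral = fixation_prob_neutral(1000)
p_advantageous = fixation_prob_advantageous(0.01)
print(p_neutral, p_advantageous)
```

In a population of 1000, a 1% advantage raises the chance of fixation from 0.05% to about 2%, a 40-fold increase, although most new advantageous mutations are still lost by drift.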
explain fixation time
depends on:
-present frequency
-selective (dis) advantage
-Ne
codominant with strong selection
neutral or weak selection
look at ppt
explain gene substitution rate
K = number of mutations reaching fixation per unit time
neutral:
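-rate of substitution = number of new mutations per generation in a population of size N (2Nu, where u is the mutation rate per gene copy) x probability of fixation (1/(2N)), i.e. K = u, the mutation rate
advantageous:
-rate of substitution = number of new mutations per generation in a population of size N (2Nu) x probability of fixation (2s), i.e. K = 4Nsu, which depends on population size, selective advantage and mutation rate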
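As a rough check on the two cases, the sketch below multiplies the number of new mutations entering a diploid population each generation (taken as 2Nu, with u the mutation rate per gene copy, an assumed symbol) by the fixation probability for each case (1/(2N) if neutral, 2s if advantageous). The parameter values are invented.

```python
def substitution_rate_neutral(n_individuals, u):
    """K = 2Nu * 1/(2N), which simplifies to u."""
    new_mutations = 2 * n_individuals * u
    return new_mutations * (1 / (2 * n_individuals))

def substitution_rate_advantageous(n_individuals, s, u):
    """K = 2Nu * 2s = 4Nsu."""
    new_mutations = 2 * n_individuals * u
    return new_mutations * (2 * s)

u = 1e-8  # made-up mutation rate per gene copy per generation
k_small = substitution_rate_neutral(1_000, u)
k_large = substitution_rate_neutral(1_000_000, u)
k_adv = substitution_rate_advantageous(1_000, 0.01, u)
print(k_small, k_large, k_adv)
```

The neutral rate comes out equal to u whatever the population size, whereas the advantageous rate scales with N and s as well as u.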