Population and Comparative Genomics Flashcards
what is population genomics?
gives a comprehensive picture of genetic variation within species by looking at whole genomes
what features can we characterize using population genetics?
- demography
- natural selection (purifying, adaptive, balancing)
what is the first stage of gathering population genetics data and what does it entail?
- hypothesis/query
- need to know what you want to find out
what is the second stage of gathering population genetics data and what does it entail?
- sample collection and DNA extraction
- sample 100s/1000s of individuals
- choose the geographic region/habitat of interest
- extract genomic DNA
what is the third stage of gathering population genetics data and what does it entail?
- genome sequencing
- sequence the DNA; each read comes from a short section of the genome
- want lots of reads
- aim for 5-40x coverage (each base is covered by 5-40 reads on average)
- sequence genomes using ‘short’-read technology
- the main limitation here is cost
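The coverage figures above follow from simple arithmetic: average depth C = N × L / G for N reads of length L on a genome of G bases. A minimal sketch, assuming reads land uniformly at random (the idealised Lander-Waterman model; real coverage is patchier), with hypothetical function names and an approximate 3.1 Gb human genome as the example:

```python
import math

def expected_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average sequencing depth C = N * L / G."""
    return num_reads * read_length / genome_size

def reads_needed(target_coverage: float, read_length: int, genome_size: int) -> int:
    """Reads required to reach a target average depth (rounded up)."""
    return math.ceil(target_coverage * genome_size / read_length)

def fraction_uncovered(coverage: float) -> float:
    """Expected fraction of bases hit by no read, assuming Poisson-distributed depth."""
    return math.exp(-coverage)

if __name__ == "__main__":
    human_genome = 3_100_000_000  # approximate human genome size in bases
    print(reads_needed(30, 150, human_genome))  # 620000000 reads of 150 bp for ~30x
    print(round(fraction_uncovered(5), 4))      # 0.0067 of bases missed at 5x
```

Under this idealised model, going from 5x to 40x mainly improves confidence in each individual's genotype calls rather than the fraction of the genome that is covered at all.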
what is the fourth stage of gathering population genetics data and what does it entail?
- read mapping and ‘variant calling’
- locate genetic variants (sites of the genome that differ)
- find where each read matches to the genome
- looking for polymorphisms
- use SNPs and indels
- can map sequence reads to a reference genome and identify sites that differ
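A toy illustration of the idea behind variant calling: pile up the reads at each reference position and report sites where a non-reference base is seen often enough. This is a sketch under strong assumptions (reads already mapped with known start positions, no gaps or indels, no base or mapping qualities, haploid logic); real variant callers model all of these. `call_variants` and its thresholds are hypothetical.

```python
from collections import Counter, defaultdict

def call_variants(reference, mapped_reads, min_depth=3, min_alt_fraction=0.2):
    """Toy variant caller.

    reference: reference genome sequence (string)
    mapped_reads: (start, sequence) pairs, already aligned to the reference
        without gaps (assumption: no indels, no quality scores)
    Returns (position, reference base, alternative base, alt count, depth).
    """
    pileup = defaultdict(Counter)  # position -> counts of each observed base
    for start, seq in mapped_reads:
        for offset, base in enumerate(seq):
            pileup[start + offset][base] += 1

    variants = []
    for pos in sorted(pileup):
        counts = pileup[pos]
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to say anything about this site
        ref_base = reference[pos]
        for base, count in sorted(counts.items()):
            if base != ref_base and count / depth >= min_alt_fraction:
                variants.append((pos, ref_base, base, count, depth))
    return variants

if __name__ == "__main__":
    reference = "ACGTACGTAC"
    reads = [(0, "ACGTA"), (1, "CGTTC"), (2, "GTTCG"), (3, "TTCGT")]
    print(call_variants(reference, reads))  # [(4, 'A', 'T', 3, 4)]
```

The depth threshold reflects why lots of reads are wanted: a single read carrying a different base could just be a sequencing error.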
what is the fifth stage of gathering population genetics data and what does it entail?
- segregating genetic vairants
- as a result of read mapping you want a list of positions that vary
- alleles/polymorphisms/variants
what is the sixth stage of gathering population genetics data and what does it entail?
- analysis
- analyse the variant sites, e.g. to determine which alleles have an effect on a particular trait
- describing demography
- detecting selection
- quantitative genetics like GWAS
what is Sanger sequencing?
- small scale (not high throughput)
- technology of choice for low-medium output sequencing
- can use it for one gene
what is Illumina?
- produces vast numbers of reads
- much quicker, short lengths of sequences
- technology of choice for genome re-sequencing
what is PacBio?
- Pacific Biosciences
- produces longer reads
- fairly accurate
- one technology of choice for genome assemblies
what is Oxford Nanopore sequencing?
- produces very long reads (up to 40,000 nucleotides long)
- advancing fast but more expensive
- has the highest error rate of these technologies
what is meant by demography?
- estimates of population size (can also estimate population size backwards through time)
- population structure (which individuals are more or less closely related)
- migration and ‘gene flow’ between populations
- inbreeding/outbreeding rates
what is selection in population genetics?
which regions of the genome are subject to strong purifying selection (which removes deleterious mutations)
what is an example of quantitative genetics?
GWAS: which alleles contribute to traits
how are demography, selection and quantitative genetics interrelated?
- expanding and shrinking population sizes affect selection
what is the concept of genetic diversity in population genetics?
within a region of a genome there are different amounts of diversity
what are polymorphisms/alleles/variants?
- sites in the genome that differ between individuals of a species
what are SNPs?
- single nucleotide polymorphisms
- the most common type of variant
what are indels?
- small insertions or deletions
what is the human genome composed mostly of?
transposons
what are examples of structural variants?
duplications, rearrangements, large insertions/deletions
what is the initial origin of variation?
a mutation in one individual
- all polymorphisms start with a single mutation in the population
how can polymorphisms move?
through space and time within a population
- their frequency will change
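how can a polymorphism spread between populations?
- start with two separated populations
- a migrant carries an allele from one population into the other (gene flow)
- the mutation is now shared between the populations
- over time its frequency may increase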
what are most mutations?
- neutral
- deleterious and therefore lost
what happens if variants are physically linked on the chromosome?
they tend to travel together but can become unlinked through recombination
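what is the concept of GWAS data and summary statistics?
- GWAS data are stored as a large matrix of individuals × variant sites (0s and 1s)
- summary statistics summarise this information in one number
- e.g. the average pairwise difference between sequences (nucleotide diversity, π)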
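A minimal sketch of one such summary statistic computed from a 0/1 matrix: the average number of pairwise differences between sampled sequences (π). The matrix layout (rows = sampled chromosomes, columns = variant sites) and the function name are assumptions for illustration.

```python
from itertools import combinations

def nucleotide_diversity(haplotypes, seq_length=None):
    """Average number of pairwise differences (pi) between sampled sequences.

    haplotypes: equal-length lists of 0/1 (rows = sampled chromosomes,
    columns = variant sites). If seq_length is given, pi is returned per site.
    """
    pairs = list(combinations(haplotypes, 2))
    differences = sum(sum(a != b for a, b in zip(h1, h2)) for h1, h2 in pairs)
    pi = differences / len(pairs)
    return pi / seq_length if seq_length is not None else pi

if __name__ == "__main__":
    haplotypes = [
        [0, 0, 1, 0],
        [0, 1, 1, 0],
        [1, 0, 0, 0],
        [0, 0, 1, 1],
    ]
    print(nucleotide_diversity(haplotypes))       # 2.0 differences per pair
    print(nucleotide_diversity(haplotypes, 100))  # 0.02 per site if the region is 100 bp
```

Dividing by the length of the region (including invariant sites) makes π comparable between regions, which is how diversity in, say, exons and introns can be compared.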
what does S stand for?
the number of segreagating sites
what is MAF?
- minor allele frequency
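S and MAF can both be read off the same kind of 0/1 matrix. A minimal sketch, assuming two alleles per site coded 0 and 1 (rows = sampled chromosomes, columns = sites); the function names are illustrative:

```python
def segregating_sites(haplotypes):
    """S: number of sites (columns) where more than one allele is present."""
    return sum(1 for column in zip(*haplotypes) if len(set(column)) > 1)

def minor_allele_frequencies(haplotypes):
    """Frequency of the less common allele at each site (always between 0 and 0.5)."""
    n = len(haplotypes)
    mafs = []
    for column in zip(*haplotypes):
        p = sum(column) / n  # frequency of allele "1"
        mafs.append(min(p, 1 - p))
    return mafs

if __name__ == "__main__":
    haplotypes = [
        [0, 0, 1, 0],
        [0, 1, 1, 0],
        [1, 0, 1, 0],
        [0, 0, 1, 1],
    ]
    print(segregating_sites(haplotypes))         # 3 (the third column is invariant)
    print(minor_allele_frequencies(haplotypes))  # [0.25, 0.25, 0.0, 0.25]
```

MAF does not need to know which allele is old and which is new, which is why it can be computed without an ancestral genome.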
what is DAF?
derived allele frequency (frequency of the new allele in the population)
- need to know the ancestral genome
- most derived alleles are at low frequency because they tend to get lost; derived alleles at unusually high frequency can suggest adaptation
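Unlike MAF, DAF requires the ancestral allele at each site. A minimal sketch, assuming the ancestral state is already known per site (e.g. taken from a closely related outgroup species) and the same 0/1 matrix layout as above; the function name is hypothetical:

```python
def derived_allele_frequencies(haplotypes, ancestral):
    """Frequency of the derived (new) allele at each site.

    ancestral[i] is the ancestral allele (0 or 1) at site i, assumed here to
    come from an outgroup genome.
    """
    n = len(haplotypes)
    return [
        sum(1 for allele in column if allele != anc) / n
        for column, anc in zip(zip(*haplotypes), ancestral)
    ]

if __name__ == "__main__":
    haplotypes = [
        [0, 1, 1],
        [0, 1, 0],
        [1, 1, 0],
        [0, 0, 0],
    ]
    ancestral = [0, 0, 1]
    print(derived_allele_frequencies(haplotypes, ancestral))  # [0.25, 0.75, 0.75]
```

The last site shows why DAF and MAF differ: the derived allele there is the common one (0.75), whereas its MAF would be 0.25. A derived allele at high frequency like this is the kind of pattern that can hint at adaptation.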
what is the concept of Tajima's D?
describes whether you have more or fewer rare alleles than expected for a neutrally evolving population of constant size
what happens if you have a negative Tajima's D?
- have more rare alleles than you expect
- happens after a selective sweep (new mutations then arise across the population and are initially rare)
- or in an expanding population
what happens if you have a positive Tajima's D?
- too few rare alleles
- signal of balancing selection
- shrinking population
- population structure
what happens if Tajima's D is close to 0?
neutrally evolving, stable population
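Tajima's D compares two estimates of diversity that react differently to rare alleles: π (average pairwise differences, which rare alleles barely raise) and Watterson's θ = S / a1 (based on the number of segregating sites S, which counts every rare allele). A minimal sketch using Tajima's standard normalisation constants, assuming a 0/1 haplotype matrix as in the summary-statistics cards and at least four sequences (with three or fewer the variance term is zero and D is undefined); the function name is illustrative:

```python
import math
from itertools import combinations

def tajimas_d(haplotypes):
    """Tajima's D from a 0/1 haplotype matrix (rows = sampled chromosomes)."""
    n = len(haplotypes)
    if n < 4:
        raise ValueError("need at least four sequences")
    s = sum(1 for col in zip(*haplotypes) if len(set(col)) > 1)  # segregating sites
    if s == 0:
        raise ValueError("Tajima's D is undefined when there are no segregating sites")
    pairs = list(combinations(haplotypes, 2))
    pi = sum(sum(x != y for x, y in zip(h1, h2)) for h1, h2 in pairs) / len(pairs)

    # Normalisation constants from Tajima's derivation
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    theta_w = s / a1  # Watterson's estimator, based on S
    return (pi - theta_w) / math.sqrt(e1 * s + e2 * s * (s - 1))

if __name__ == "__main__":
    singletons = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
    intermediate = [[1, 1, 1, 1], [1, 1, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0]]
    print(tajimas_d(singletons))    # negative: every variant is a rare singleton
    print(tajimas_d(intermediate))  # positive: every variant is at intermediate frequency
```

The two toy matrices mirror the cards: an excess of rare alleles (as after a sweep or an expansion) pushes D below zero, while too few rare alleles (as under balancing selection or a shrinking population) pushes it above zero.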
what is the concept of population structure?
- when some individuals are more likely to breed with each other than with individuals from another group
- can see this through genomes
how can you look at population structure?
- genomes
- act as markers to track evolution
- when people etc move they carry DNA
- populations sometimes have small genetic contributions from other populations (admixture), which can't be drawn on a phylogenetic tree
what are the rules of population structure?
mutations are rare, alleles drift through populations, and recombination breaks up linked alleles
what is the concept of purifying selection?
- loss of deleterious alleles
- they are removed from the population
- carriers are less fit, so they die or produce fewer offspring
what does the process of purifying selection result in?
- reduces diversity in regions that are important
- increases the proportion of rare alleles
- causes a negative Tajima's D
- purifying selection expected to be a common event
what are the properties of most new mutations?
deleterious
why do exons have much lower diversity?
- mutations are more likely to be deleterious
- exons have an important function so deleterious mutations are removed quickly
what is the concept of adaptive evolution?
- new mutation is helpful and increases to become more common in the population
- has similar effects to purifying selection (difficult to differentiate)
what does the process of adaptive evolution do?
- reduces diversity around the beneficial allele
- increases rare alleles
- causes a negative Tajima's D
- adaptive selection is expected to be a rare event
why is adaptive evolution rare?
a mutation causing a beneficial adaptation through a random change will be rare
what is a selective sweep?
over time, not only will the beneficial allele become more common but so will the alleles linked to it
what is a haplotype?
- the combination of alleles at linked sites in a region of the genome, inherited together on one chromosome
what is the concept of balancing selection?
- advantage to maintaining more than one allele in a population
- very rare
- when heterozygotes are fitter (heterozygote advantage)
- when rare alleles are advantageous but lose their advantage as they become common (negative frequency-dependent selection)
what are the results of balancing selection?
- maintains more diversity
- causes a high (positive) Tajima's D
what is the concept of polygenic selection?
- GWAS shows that most traits are determined by multiple genes
- called complex traits
- selection acts on all the alleles at once
- there is therefore selection for multiple genes
- when these traits evolve, many alleles change in frequency at once
what is the concept of linkage of alleles on the chromosome?
- when a strongly beneficial allele arises it will ‘sweep’ through the population
- it rises in frequency very quickly
- alleles close to it will be carried because they are linked
what are the results of linkage of alleles on the chromosome?
- loss of diversity around the sweep
- increase in linkage
- produces a large loss of genetic diversity (most individuals carry the same haplotype)
what happens when recombination occurs?
linked alleles can become unlinked
what is comparative genomics?
- the comparison of genomes between species
what does comparative genomics involve the analysis of?
- gene orthologs/paralogs, gene family expansions
- gene loss/gain
- evolutionary rate of genes
- conserved genic and non-genic regions
- conservation/changes in synteny (gene order)
what are orthologs?
genes in different species that descend from a single gene in their common ancestor (separated by speciation)
what are paralogs?
genes within a species' genome that arose from a common ancestral gene by gene duplication
what is the first stage of collecting comparative genomics data?
- sequence and assemble a genome
- choose the organisms you are interested in
- assembly: connecting all short/long sequencing reads into continuous sequences
- sequencing machines generally produce short reads
what is the second stage of collecting comparative genomics data?
- annotate your genome (identify gene starts, ends and exons, and identify gene types by homology)
what is the third stage of collecting comparative genomics data?
- align/ compare your genome to others
- whole genome alignment
- using BLAST to locate similar genes
what is comparative genomic data produced on?
Linux servers: large amounts of data requiring a lot of processing
what can be found from comparative genomics?
- Which genes have been lost in a lineage
- Which genes have been gained, e.g. created through gene fusion
- Which are the fastest evolving genes
- Conserved genic and non-genic regions
- How a species may have evolved to adapt to a new niche; how a particular species has evolved and adapted says something about long-term evolution
- In conservation plots, the higher the peak, the slower the rate of evolution, i.e. the more conserved the region (purifying selection removes deleterious alleles)
how are diversity and divergence related in comparative genomics?
genetic diversity within species gives rise to divergence between species
what is genetic diversity?
differences within species
what is divergence?
differences between species
what are exons?
evolve slowly, because most mutations in them are removed by purifying selection
what is an example of genetic diversity giving rise to divergence?
- one population splits into two population
- at some point there is no interbreeding
- different mutations arise and become fixed independently in each population
what is fixation?
when a polymorphism becomes present in all individuals in a species (or population)
what is the concept of evolutionary rate in comparative genomics?
- evolutionary rate is the number of differences that occur over time or how many mutations are fixed in a population over time
- measure via alignments from genes and genomes
- every genome evolves at a different rate
how can evolutionary rates be measured?
- substitutions/year: the number of substitutions per year (have to know how long ago the species separated)
- substitutions/gene or per site between two or more species
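Converting divergence into a rate per year needs one extra step: after a split, both lineages accumulate changes independently, so the observed differences reflect twice the time since the split. A minimal sketch, assuming an uncorrected count of differences (no multiple-hit correction) and a known split time; the numbers in the example are hypothetical:

```python
def substitution_rate(differences: int, aligned_sites: int, split_time_years: float) -> float:
    """Substitutions per site per year between two species.

    Divergence k is spread over two lineages that have each evolved for
    split_time_years since the split, hence the factor of 2.
    """
    k = differences / aligned_sites  # substitutions per site (uncorrected)
    return k / (2 * split_time_years)

if __name__ == "__main__":
    # Hypothetical: 100 differences in 10,000 aligned sites, split 5 million years ago
    print(substitution_rate(100, 10_000, 5e6))  # about 1e-9 substitutions/site/year
```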
what is the concept of purifying selection in comparative genomics?
- selection to remove deleterious mutations
- over time this results in slower rates of evolution in regions of the genome with more essential function
what are introns?
- non-coding (spliced out), so most mutations in them are not removed by purifying selection; they are therefore poorly conserved
what happens if regions are more highly conserved?
- suggests that the regions are more important
how can purifying selection be detected in comparative genomics?
- via genome alignment
- looking for regions that remain the same between species
- can show evolutionary rate: more important regions are conserved and so show slower rates of evolution
what is synonymous change?
does not change the amino acid encoded, so it usually has little functional effect
what is non-synonymous change?
- does change the amino acid encoded for
- more likely to have functional consequence (which will generally be deleterious)
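Whether a single-base change is synonymous can be read straight off the genetic code. A minimal sketch, assuming the standard genetic code and DNA codons; `CODON_TABLE` and `classify_change` are illustrative names, not from any particular library:

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG ('*' = stop)
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}

def classify_change(codon, position, new_base):
    """Return 'synonymous' or 'non-synonymous' for a single-base change.

    codon: three DNA bases; position: 0, 1 or 2 within the codon.
    """
    codon = codon.upper()
    mutant = codon[:position] + new_base.upper() + codon[position + 1:]
    if CODON_TABLE[codon] == CODON_TABLE[mutant]:
        return "synonymous"
    return "non-synonymous"

if __name__ == "__main__":
    print(classify_change("CTT", 2, "C"))  # CTT -> CTC: Leu -> Leu, synonymous
    print(classify_change("CTT", 0, "A"))  # CTT -> ATT: Leu -> Ile, non-synonymous
    print(classify_change("GAA", 1, "T"))  # GAA -> GTA: Glu -> Val, non-synonymous
```

Changes that create a stop codon count as non-synonymous here. Trying every possible single-base change at each codon position in this way is also the basis for counting synonymous and non-synonymous sites; third codon positions are where most synonymous changes occur.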
is the rate of synonymous change slower than non-synonymous change?
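no; synonymous change is usually faster, because most non-synonymous changes are deleterious and removed by purifying selection
what is the concept of adaptive evolution in comparative genomics?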
- increase frequency of adaptive allele
- some genes/genomic regions evolve to have new/improved functions
- this is one path to adaptation
- such genes change faster than we expect by chance
what tests can be used to measure adaptive evolution in comparative genomics?
- the dN/dS test
- the McDonald-Kreitman test
what is the dN/dS test?
- dN: the rate of non-synonymous change
- dS: the rate of synonymous change
- genes that change their function rapidly may have a higher dN than dS
what is the McDonald-Kreitman test?
- used for detecting adaptive change between species
- and for detecting balancing selection within species
what is the rate of synonymous change (dS)?
- synonymous change does not affect the protein produced
- usually has little or no effect on the fitness of the organism, so synonymous changes are (nearly) selectively neutral and accumulate
- occasionally they result in a non-optimal codon (rare)
- if species are far apart this rate needs to be corrected for multiple hits
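Because a site can change more than once (and even revert), the observed proportion of differing sites p underestimates the true number of substitutions when species are far apart. One common correction, used for example in the Nei-Gojobori method for dS and dN, is the Jukes-Cantor formula d = -3/4 ln(1 - 4p/3). A minimal sketch, assuming equal substitution rates among all bases (the Jukes-Cantor model's own assumption); the function name is illustrative:

```python
import math

def jukes_cantor_distance(p: float) -> float:
    """Estimated substitutions per site from the observed proportion p of differing sites."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)

if __name__ == "__main__":
    for p in (0.01, 0.1, 0.3, 0.5):
        print(p, round(jukes_cantor_distance(p), 4))
```

The gap between p and the corrected distance grows as p increases, which is why the correction matters mainly for distantly related species; the formula breaks down as p approaches 0.75, where the sequences look essentially random relative to each other.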
what is the rate of non-synonymous change (dN)?
- non-synonymous change does affect the protein produced
- most will be deleterious and lost
- so the dN rate will generally be slower than the dS rate
- hence the dN/dS ratio is generally less than 1
what does it suggest if dN>ds?
- there have been many non-synonymous changes relative to synonymous ones
- this is rare and a signature of adaptive evolution
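The dN/dS ratio itself is a simple division once the differences and sites have been counted. A minimal sketch, assuming the counts of synonymous and non-synonymous sites are already known and without any multiple-hit correction; the gene and its counts in the example are hypothetical:

```python
def dn_ds(nonsyn_diffs: int, nonsyn_sites: float, syn_diffs: int, syn_sites: float) -> float:
    """Uncorrected dN/dS: non-synonymous differences per non-synonymous site
    divided by synonymous differences per synonymous site."""
    if syn_diffs == 0:
        raise ValueError("dN/dS is undefined when there are no synonymous differences")
    dn = nonsyn_diffs / nonsyn_sites
    ds = syn_diffs / syn_sites
    return dn / ds

if __name__ == "__main__":
    # Hypothetical gene with 700 non-synonymous and 200 synonymous sites
    print(dn_ds(7, 700, 20, 200))    # about 0.1: purifying selection dominates
    print(dn_ds(140, 700, 20, 200))  # 2.0: a possible signature of adaptive evolution
```

In practice a ratio above 1 is only treated as evidence of adaptive evolution after a statistical test, since small counts can produce large ratios by chance.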
what is the concept of polygenic selection and genome-scale data in comparative genomics?
- SNPs in many genes can affect one trait
- adaptation may cause gradual changes in many genes
- can detect this by looking for concerted signals over certain categories of genes that work together
what is the assumption of the McDonald-Kreitman test?
it tests the assumption that diversity within a species gives rise to divergence between species
- assumes a stable ratio of synonymous to non-synonymous polymorphisms
- over time polymorphisms become fixed
- gives rise to the same ratio of synonymous and non-synoymous fixed mutations
how is the McDonald-Kreitman test carried out?
- count synonymous and non-synonymous changes, separately for polymorphisms (within species) and fixed differences (between species)
- compare the two ratios using a chi-squared test
- find whether the ratio is stable
what is the result of a McDonald-Kreitman test for a neutrally evolving gene?
- ratio will be consistent
what is the result of a McDonald-Kreitman test for an excess of non-synonymous fixed differences (a non consistent ratio)?
adaptive evolution between species
what is the result of a McDonald-Kreitman test for an excess of non-synonymous polymorphisms within a species (a non-consistent ratio)?
balancing selection maintaining different non-synonymous variants within the species
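The McDonald-Kreitman comparison is a 2×2 table: fixed differences (Dn, Ds) versus polymorphisms (Pn, Ps), non-synonymous versus synonymous. A minimal sketch with a hand-written Pearson chi-squared test (1 degree of freedom, no continuity correction; for small counts Fisher's exact test is usually preferred), assuming no zero counts. It also reports the neutrality index NI = (Pn/Ps)/(Dn/Ds): NI < 1 points to an excess of non-synonymous fixed differences (adaptive evolution), NI > 1 to an excess of non-synonymous polymorphism. The counts in the example are hypothetical.

```python
import math

def mk_test(dn, ds, pn, ps):
    """McDonald-Kreitman 2x2 test.

    dn, ds: non-synonymous / synonymous fixed differences between species
    pn, ps: non-synonymous / synonymous polymorphisms within species
    Returns (chi-squared statistic, p-value with 1 df, neutrality index).
    """
    a, b, c, d = dn, pn, ds, ps
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-squared with 1 df
    neutrality_index = (pn / ps) / (dn / ds)
    return chi2, p_value, neutrality_index

if __name__ == "__main__":
    # Hypothetical gene: 12 non-syn and 10 syn fixed; 3 non-syn and 30 syn polymorphic
    chi2, p, ni = mk_test(dn=12, ds=10, pn=3, ps=30)
    print(chi2, p, ni)  # chi2 = 13.75, p well below 0.05, NI < 1 (adaptive evolution)
```

For a neutrally evolving gene the two ratios would be similar, giving a small chi-squared value and an NI close to 1.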