Week 7 Flashcards
What is genomics?
The study of the entire hereditary information in an organism, which is mostly encoded in the genome
What are genomes?
The sequence of all the DNA in a cell
What are subsets and products of the genome?
Mitogenome - the mitochondrial genome
Exome - all the exons that could potentially be expressed
Transcriptome - the expressed genes (expressed exons) in a particular tissue or set of tissues that you are studying
Proteome - the proteins
Metabolome - other metabolic products…
Microbiome - the combined microorganisms inhabiting a particular environment (e.g. your gut), which can be detected using sequencing strategies
What is needed to study a genome?
You need its DNA sequence
What species were prioritised for sequencing?
1 - If it is fuzzy or good to eat, e.g. tomatoes, rice, cows and dogs
2 - If it belongs to an evolutionarily, scientifically, or economically important species, e.g. ants, bees, nematodes, mosquitoes, Arabidopsis and humans
How many bacterial genomes have been sequenced?
Thousands of species of bacteria (a few hundred dollars per genome, these days)
What are examples of genome sequencing projects?
Vertebrate Genomes Project - aims to generate near error-free genomes of all ~66,000 extant vertebrate species
Darwin Tree of Life Project - genome sequences of all species living in the UK
Earth BioGenome Project - genome sequences of all life on Earth
What is an overview of the shotgun sequencing method?
Collect the organism and extract a lot of high-quality DNA (long strands)
Break the DNA into fragments (with enzymes or with sound, i.e. sonication)
Read the fragments with a high-throughput sequencer (currently Illumina, PacBio and Nanopore machines are dominant)
Piece the fragments back together (assembly)
Recognise the components (annotation)
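The "piece the fragments together" step can be illustrated with a toy greedy overlap assembler. This is a minimal sketch, not how real assemblers work. It assumes error-free reads from a single strand and exact suffix-prefix overlaps. The function names `overlap` and `greedy_assemble` are hypothetical.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` matching a prefix of `b` (0 if shorter than min_len)."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)   # candidate position where b could begin inside a
        if start == -1:
            return 0
        if b.startswith(a[start:]):          # the whole suffix of a from here matches b's prefix
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of pieces with the longest overlap until none overlap."""
    contigs = list(reads)
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # nothing overlaps: the remaining pieces stay as separate contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


print(greedy_assemble(["ATGGCGT", "GCGTACG", "TACGTTA"]))
```

Here the three overlapping reads rebuild the single contig `ATGGCGTACGTTA`. Pieces with no overlap are returned separately, which is why a real genome ends up as many contigs. Repeats longer than a read can also mislead a greedy merge, which is one reason long reads help with repetitive regions.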
How large are the DNA fragments created in shotgun sequencing?
Roughly 300-500 bases; de novo genome assembly is like piecing together an encyclopedia from 300-500-letter fragments of sentences
What are the cons of shotgun sequencing?
It takes a LOT of work and money to make a ‘finished’ genome from raw fragments. Most published genomes remain split into tens of thousands of fragments (contigs and scaffolds), but these are long enough to contain many genes
What is the order of assembly in shotgun sequencing?
Reads –> Contigs –> Scaffolds –> Chromosomes
What are the pros and cons of long-read sequencing?
Pro: long reads mean less rebuilding is needed afterwards!
Pro: better for reading repetitive regions of the genome
Con: currently error-prone
What are the read lengths of PacBio and Nanopore?
PacBio - 20-40 kb read lengths
Nanopore - Up to 100 kb read lengths
What are the uses of sequencing genomes?
Genomes themselves are interesting to study
(Try) to find all the genes involved in a phenotype, not just one or two
To reconstruct deep phylogenies (phylogenomics)
Cancer genomics
Inform conservation strategy
What are examples of how the number of genes varies between organisms?
Influenza - 11
E.coli - 4,149
Fruit fly - 14,889
Chicken - 16,736
Human - 22,333
How much can genome size vary?
Basic features are similar, but genome size is highly variable
10,000-fold range between fungi and flowering plants
Number of genes varies much less
What are examples of genome size and number of genes in plants?
Arabidopsis thaliana - ~25,000 genes, 135 Mb genome size
Canopy plant (Paris japonica) - ?? genes, 152,000 Mb
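The claim that genome size varies far more than gene number can be checked with simple arithmetic. The figures come from the cards above: the gene counts, and the two plant genome sizes. Influenza is left out because it is a virus. The *Paris japonica* gene number is unknown, so only its genome size is used. The helper `fold_range` is a hypothetical name.

```python
# Gene numbers from the examples above (influenza, a virus, is excluded below)
gene_counts = {
    "Influenza": 11,
    "E. coli": 4_149,
    "Fruit fly": 14_889,
    "Chicken": 16_736,
    "Human": 22_333,
}
# Genome sizes in Mb from the plant examples above
genome_size_mb = {"Arabidopsis thaliana": 135, "Paris japonica": 152_000}


def fold_range(values) -> float:
    """Ratio of the largest value to the smallest value."""
    values = list(values)
    return max(values) / min(values)


cellular = [n for name, n in gene_counts.items() if name != "Influenza"]
print(f"Gene number, E. coli to human: {fold_range(cellular):.1f}-fold")
print(f"Genome size, Arabidopsis to Paris japonica: {fold_range(genome_size_mb.values()):.0f}-fold")
print(f"Arabidopsis gene density: {25_000 / 135:.0f} genes per Mb")
```

Gene number varies about 5.4-fold from *E. coli* to humans. Genome size varies about 1,126-fold between just these two flowering plants.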
What is a case study about investigating genetic diversity?
Three different but identical-looking species of fish
One is diploid (Corydoras maculifer), one is tetraploid (Corydoras aragu) and one is of unknown ploidy
Investigated the variation in the immune genes
Do the increased genome size and gene copy number lead to differences in immune genes?
What was the investigation into the genetic diversity of the different fish species looking at?
TLR1 and TLR2
PCR-amplified two Toll-like receptor (TLR) genes
2.5 kb each
What was the overview of sampling for the different fish species?
Polyploid - n = 30
Diploid - n = 23
Sequenced on the Illumina NextSeq platform
What was the genetic diversity metric used?
Single Nucleotide Polymorphisms leading to changes in amino acid sequence
What is a synonymous SNP?
A SNP that has a change in nucleotide sequence but not in the amino acid sequence
What is a non-synonymous SNP?
A SNP that has a change in nucleotide sequence causing a change in the amino acid sequence
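The difference between synonymous and non-synonymous SNPs can be checked by translating the codon before and after the change. This is a minimal sketch. It assumes an uppercase DNA coding sequence that is in frame from its first base, and it uses the standard genetic code. The names `CODON_TABLE` and `classify_snp` are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code: the 64 codons in TCAG order map onto this string ('*' = stop)
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_snp(cds: str, pos: int, alt: str) -> str:
    """Classify a SNP at 0-based position `pos` of an in-frame coding sequence."""
    start = pos - pos % 3                    # first base of the affected codon
    ref_codon = cds[start:start + 3]
    offset = pos % 3                         # position of the SNP within the codon
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    if CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]:
        return "synonymous"                  # same amino acid
    return "non-synonymous"                  # amino acid (or stop) changes


cds = "ATGCTGAAA"                            # Met-Leu-Lys
print(classify_snp(cds, 5, "A"))             # CTG -> CTA, both Leu
print(classify_snp(cds, 3, "A"))             # CTG -> ATG, Leu -> Met
```

Third codon positions are often synonymous, as in the first call, because of redundancy in the genetic code. A change that creates a stop codon counts as non-synonymous here.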
What is important about haplotypes for genetic diversity?
The number of haplotypes an organism has can suggest its ploidy (a diploid carries at most two haplotypes at a locus, a tetraploid up to four)
How can you measure genetic diversity using SNPs and haplotypes?
Sequence the DNA and align the reads to a target (reference) sequence; sequencing depth shows how many times each position has been sequenced
Then find the SNPs and count the reads carrying each allele, e.g. an A in the reference base and a different base at some frequency in other reads
How does haplotype number show up in the SNP read ratios of a diploid and a tetraploid?
SNP ratio of 50:50 shows that it is diploid
SNP ratio of 25:75 shows that it is tetraploid
What was the difference in SNPs between the diploid and tetraploid fish?
The diploid fish had very few SNPs across all categories (non-synonymous and synonymous) in both TLRs
The tetraploid fish had more SNPs across the board
What can show the SNP haplotype ratios in a tetraploid vs a diploid?
A histogram
A single peak in SNP read ratio (x axis) around 0.5 indicates a diploid
Two major peaks in SNP read ratio around 0.25 and 0.75 indicate a tetraploid
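The read-ratio logic can be sketched as follows. Compute the fraction of reads carrying the alternative allele at each SNP. Then bin these fractions into a histogram and see whether most of them sit near 0.5 or near 0.25/0.75. This is a crude illustration and the read counts below are made up. It assumes every SNP has non-zero coverage. The function names are hypothetical. A tetraploid with a 2:2 allele dosage also gives 0.5, so real analyses look at the shape of the whole histogram, not one cut-off.

```python
from collections import Counter


def read_ratio(ref_reads: int, alt_reads: int) -> float:
    """Fraction of reads at a SNP that carry the alternative allele."""
    return alt_reads / (ref_reads + alt_reads)


def ratio_histogram(snps, bins: int = 10) -> Counter:
    """Count SNP read ratios into equal-width bins over [0, 1] (bin index 0..bins-1)."""
    counts = Counter()
    for ref_reads, alt_reads in snps:
        r = read_ratio(ref_reads, alt_reads)
        counts[min(int(r * bins), bins - 1)] += 1
    return counts


def guess_ploidy(snps, tol: float = 0.08) -> str:
    """Most ratios near 0.5 -> diploid; most near 0.25 or 0.75 -> tetraploid."""
    ratios = [read_ratio(r, a) for r, a in snps]
    near_half = sum(abs(x - 0.5) <= tol for x in ratios)
    near_quarters = sum(min(abs(x - 0.25), abs(x - 0.75)) <= tol for x in ratios)
    return "diploid" if near_half >= near_quarters else "tetraploid"


# Hypothetical (reference reads, alternative reads) per SNP
diploid_like = [(48, 52), (55, 45), (50, 50)]
tetraploid_like = [(75, 25), (24, 76), (70, 30), (27, 73)]
print(guess_ploidy(diploid_like), guess_ploidy(tetraploid_like))
print(ratio_histogram(tetraploid_like))
```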
What is an example of comparative genomics?
Comparing the genes involved in echolocation between microbats and toothed whales
What was the candidate gene for echolocation?
Prestin gene (important for the ‘electromotility’ of outer hair cells in the cochlea), sequenced from echolocating and non-echolocating bats, an echolocating dolphin, and many non-echolocating mammals
What is the overview of the phylogenetics of the prestin gene?
Highly suggestive of convergent molecular evolution of a key gene for echolocation: the prestin gene tree grouped the echolocating bats with the dolphin
What are the problems of using the prestin gene?
It's just one gene, and a trait like echolocation can’t be due to one or a few genes
There must have been adaptive changes throughout the genome
Further studies found other examples of convergent genes
What is the overview of genome wide comparison of echolocating animals?
Compared whole genomes of 6 species of bat (4 echolocating and 2 non-echolocating), 1 species of (echolocating) dolphin, and 15 other (non-echolocating) mammals.
~20-30,000 genes per species, but their genomes are very fragmented, so they ended up with 2,326 coding sequences (~genes) that were in all 22 species
Made a huge phylogeny from all the genes (known as phylogenomics)
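The step of keeping only the genes found in all 22 species and building one large dataset from them can be sketched with set intersection and concatenation. Concatenating the gene alignments into a "supermatrix" is one common phylogenomic approach. The card does not say which method the study used, so treat this as an assumption. The species labels, gene names and sequences below are hypothetical. Building the tree itself would need a phylogenetics package and is not shown.

```python
def shared_genes(gene_sets: dict[str, set[str]]) -> set[str]:
    """Genes recovered in every species' (fragmented) assembly."""
    sets = iter(gene_sets.values())
    shared = set(next(sets))
    for s in sets:
        shared &= s
    return shared


def concatenate(alignments: dict[str, dict[str, str]], genes, species) -> dict[str, str]:
    """One long aligned sequence per species, joining the genes in a fixed (sorted) order."""
    return {sp: "".join(alignments[g][sp] for g in sorted(genes)) for sp in species}


gene_sets = {
    "bat_1": {"prestin", "gene_2", "gene_3"},
    "bat_2": {"prestin", "gene_3"},
    "dolphin": {"prestin", "gene_3", "gene_4"},
}
shared = shared_genes(gene_sets)              # only genes present in all species survive
alignments = {
    "gene_3": {"bat_1": "GGA", "bat_2": "GGA", "dolphin": "GGC"},
    "prestin": {"bat_1": "ATG", "bat_2": "ATG", "dolphin": "ATA"},
}
supermatrix = concatenate(alignments, shared, gene_sets)
print(shared, supermatrix)
```

Fragmented genomes shrink the shared set quickly. This is why ~20-30,000 genes per species became 2,326 genes shared across all 22 species.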
What were the results of the genome-wide comparison on the origin of echolocation?
Echolocation evolved independently in different lineages (the species tree does not group all the echolocators together)
They compared this with an alternative (false) phylogeny that groups all the echolocating species together (think of the prestin gene example)
How supported was the alternative phylogeny based on genes like prestin?
400-800 genes supported the echolocators-together phylogeny
These 400-800 genes are candidates for genes that were selected to allow echolocation, and they included the 7 candidate genes that have been published before (e.g. prestin).
But they also found many other hearing-related and vision-related genes
What is the function of resequencing?
Compare individuals within a single species
What is the advantage of having a reference genome?
Once you have a high-quality, reference genome, the genomes of subsequent individuals from the same species are much easier to assemble. If you already know the book, you can assemble similar versions of the book from fragments, even if you have fewer fragments
How expensive is resequencing?
Resequencing of humans is very common these days and costs less than $1000 per genome
What is an example of whole genome resequencing?
Whole genome sequencing of Oreochromis cichlid fish
Why did they resequence the whole genomes of a large number of cichlid fish?
To use the resequencing data to identify ideal candidates for breeding by aquariums and hobbyists, reducing the pressure on natural populations of cichlid fish, as their genomes are known
What is an example of resequencing applied to evolutionary history?
Relationships between gray wolf populations and domestic dogs
Comparison with ancient wolf DNA showed that modern dogs evolved from an East Eurasian wolf population
What is phylogeography?
The field of study concerned with the principles and processes governing the geographical distribution of genealogical lineages, especially those at the intraspecific level
What is biogeography?
Biogeography is the study of the patterns and causes of the distribution of living things
How can geography impact biology?
Almost all taxa are restricted geographically to some degree.
Some are very restricted - endemic to a small area
What are the major factors in biogeography?
Major historical factors that influence current distributions include vicariance and dispersal
What is vicariance?
Splitting of distributions when ancient landmasses split and separate due to continental drift, or when mountain ranges divide lowland populations
What is dispersal?
The movement of organisms or their propagules (e.g. seeds)
How can vicariance and dispersal be investigated?
The relative contribution of vicariance and dispersal to current distributions can be investigated using phylogeographic methods
What is Wallace’s line?
Wallace’s line divides two regions that have separate tectonic histories that have only recently come into contact
What is a large factor in biogeographic distributions?
Continental drift offers an explanation for many biogeographic observations
Where is Wallace’s line, and what does it separate?
Indo-Pacific (the Malay Archipelago)
The north-west contains large placental mammals, e.g. Javan rhino and Sumatran tiger
The south-east contains marsupials and megapodes (ground-dwelling birds)
The two sides lie on separate tectonic plates
What is an overview of the information used in phylogeography?
Phylogeography generally uses genetic information to examine genealogical history and patterning within species and populations.
This information is used to infer the relationships of biogeographic areas and species histories
What is an overview of the function of phylogeography?
Phylogeography concerns the relationships between gene genealogies, phylogenetics and geography
It aims to understand the factors contributing to the formation of genetic population structure
What is an example of a use of phylogeography?
Phylogeography can explain the consequences of major historical events that had continent-wide impacts
The effect of glaciation is particularly well studied in Europe
Last ice age had a glacial maximum ~18,000 – 22,000 years ago
Fragmentation of populations in refugia (in Europe, the Iberian, Italian and Balkan peninsulas)
What is the migration pattern of hedgehogs after the last ice age?
Hedgehogs that took refuge in Spain migrated through France to the UK
Hedgehogs that took refuge in Italy migrated through central Europe (e.g. Germany) and ended up in Scandinavia
Hedgehogs that took refuge in Greece migrated through eastern Europe and ended up in Russia
What is the migration pattern of the meadow grasshopper (Chorthippus parallelus) after the last ice age?
Grasshoppers that took refuge in Spain stayed in Spain
Grasshoppers that took refuge in Italy stayed in Italy
Grasshoppers that took refuge in Greece spread across all of Europe
The Cantabrian Mountains and the Alps stopped the migration of grasshoppers
What is the migration pattern of the brown bear (Ursus arctos) after the last ice age?
Bears that took refuge in Spain migrated through France to the UK and Scandinavia
Bears that took refuge in Central Asia migrated through Russia to eastern Europe and northern Scandinavia
Bears that took refuge in Greece spread across the Balkans
What is the migration pattern of the chub (Leuciscus cephalus) after the last ice age?
Chub that took refuge in Spain stayed in Spain
Chub that took refuge in Italy stayed in Italy
Chub that took refuge in the Danube spread across Europe, reaching the UK and central Europe
Chub that took refuge in Ukraine spread through Russia and eastern Europe to the Baltics
The Cantabrian Mountains and the Alps stopped the migration of chub
What are the markers used in phylogenetic studies?
Mitochondrial (mtDNA) or chloroplast (cpDNA) vs nuclear genomic markers
What are considerations for phylogenetic markers?
Polymorphism, recombination, mode of inheritance
What are methods used in phylogeography?
Coalescent theory
Nested Clade Analysis (not much any more)
What are the advantages of using mtDNA as a marker?
Effectively neutral markers
High mutation rate means that variation will usually be present
High copy number allows for ease of amplification from limited or archived samples
Effective population size is ¼ that of diploid nuclear genes, so genetic drift occurs faster
No recombination so each uniparentally inherited haplotype has only one ancestor in previous generation
What are the disadvantages of using mtDNA as a marker?
Uniparentally (usually maternally) inherited, so if the sexes differ (e.g. in dispersal) it gives no information about one sex
In plants, mtDNA is less variable and recombines; cpDNA variation is higher, but not as high as in animal mtDNA
What is an overview of using nuclear DNA as a marker?
Nuclear DNA markers are recombining and can be under selection
More complete but far more complicated picture
More difficult to obtain from archived specimens (fewer copies).
Use is increasing with the development of RADseq and genome resequencing
What is an overview of Nested Clade phylogenetic analysis?
Invented by Templeton
Aims to identify past demographic events that have shaped the history of a population or populations
Geographically contextualized gene genealogies
Infer demographic history of each taxon
Prone to false inferences
Not used any more
What is the pattern of use of Nested Clade analysis over time?
Use began in the late 1990s, rose sharply in the early 2000s and peaked around 2008, then declined sharply; by 2020 no publications used it
What is an overview of Coalescent theory?
The tracing of allelic ancestries back to their most recent common ancestor
mtDNA lineages will coalesce on average 4-fold faster than recombining nuclear markers
With nuclear DNA sequences recombination is possible within and between alleles
The number of potential ancestors of an individual doubles with each generation back
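The 4-fold difference in coalescence time follows from the ¼ effective population size of mtDNA. A Wright-Fisher simulation can show this. Two sampled lineages trace back one generation at a time and coalesce with probability 1/(number of gene copies), so the expected waiting time equals the number of gene copies. This is a minimal sketch under stated assumptions: a neutral locus, no recombination within the locus, an equal sex ratio, and a hypothetical population of 100 adults. Function names are hypothetical.

```python
import random


def pairwise_coalescence_time(gene_copies: int, rng: random.Random) -> int:
    """Generations back until two sampled lineages share a parental gene copy.

    Each generation the second lineage picks the same parent as the first
    with probability 1/gene_copies (Wright-Fisher model).
    """
    generations = 1
    while rng.randrange(gene_copies) != 0:
        generations += 1
    return generations


def mean_coalescence_time(gene_copies: int, reps: int = 2000, seed: int = 1) -> float:
    rng = random.Random(seed)
    return sum(pairwise_coalescence_time(gene_copies, rng) for _ in range(reps)) / reps


N = 100                                  # hypothetical: 100 adults, equal sex ratio
nuclear = mean_coalescence_time(2 * N)   # autosomal: 2 copies per diploid individual
mito = mean_coalescence_time(N // 2)     # mtDNA: 1 copy per female, passed on only by mothers
ratio = nuclear / mito                   # expected to be close to 4 (2N / (N/2))
print(round(nuclear), round(mito), round(ratio, 2))

# Potential (pedigree) ancestors double with each generation back
potential_ancestors = [2 ** g for g in range(1, 6)]   # 2, 4, 8, 16, 32
```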
What is statistical phylogeography?
Based on coalescent models for parameter estimation and hypothesis testing
What is a hypothetical example of the use of statistical phylogeography?
One might want to test two models:
Model A that posits that extant populations in the focal taxon arose from a single population that persisted since before the last glacial maximum (LGM),
Model B that posits that extant populations descended from two isolated populations that both persisted since before the LGM.
A summary statistic is calculated from simulated data sets under each model to obtain a distribution of the summary statistic under each respective model.
In this scheme, the probabilities of both models are evaluated with respect to the summary statistic calculated from the empirical data (Knowles, 2001).
What are the rough methods for statistical phylogeography?
1: Summary statistic calculated from simulated datasets under each model to obtain a distribution of the summary statistic under different models.
Probabilities of each model are then evaluated with respect to the summary statistic estimated from empirical data
2: Full Likelihood/Bayesian approach
3: Approximate Bayesian Computation
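Method 3 (Approximate Bayesian Computation) can be sketched as rejection sampling applied to the Model A vs Model B example. The procedure is: pick a model at random (equal priors), simulate a summary statistic under it, and keep the draw only if the statistic lands close to the observed value. The accepted fractions then approximate the posterior probability of each model. Everything below is a toy built on stated assumptions, not the Knowles (2001) analysis:
- the summary statistic is the mean number of differences between sequence pairs sampled from two regions;
- the pairs are treated as independent, which is a simplification because real samples share one genealogy;
- under Model A, lineages coalesce freely in one population;
- under Model B, lineages from the two regions cannot coalesce until before a split `t_split` generations ago;
- all parameter values, and the observed value 5.8, are hypothetical.

```python
import math
import random


def poisson(lam: float, rng: random.Random) -> int:
    """Knuth's method for a Poisson draw (adequate for small lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_stat(model: str, rng: random.Random, n_pairs: int = 20,
                  gene_copies: int = 10_000, mu: float = 1e-4, t_split: int = 20_000) -> float:
    """Mean pairwise differences between sequences sampled from the two regions."""
    diffs = 0
    for _ in range(n_pairs):
        t = rng.expovariate(1 / gene_copies)  # coalescence time in a population of this size
        if model == "B":
            t += t_split                      # isolated since the split: no coalescence before it
        diffs += poisson(2 * mu * t, rng)     # mutations accumulate on both branches
    return diffs / n_pairs


def abc_model_choice(observed: float, n_sims: int = 5_000,
                     tolerance: float = 0.5, seed: int = 42) -> dict:
    rng = random.Random(seed)
    accepted = {"A": 0, "B": 0}
    for _ in range(n_sims):
        model = rng.choice("AB")              # equal prior probability for each model
        if abs(simulate_stat(model, rng) - observed) <= tolerance:
            accepted[model] += 1
    total = sum(accepted.values())
    return {m: (n / total if total else float("nan")) for m, n in accepted.items()}


print(abc_model_choice(observed=5.8))         # hypothetical observed summary statistic
```

Under these settings, Model A predicts about 2 differences per pair and Model B about 6. An observed value of 5.8 therefore strongly favours Model B. Real ABC studies use several summary statistics, a proper coalescent simulator and priors on the parameters.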
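What is lineage sorting?
The sorting of ancestral alleles (lineages) among descendant populations by genetic drift, tending towards an equilibrium between mutation and drift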
When is lineage sorting fastest?
The lineage sorting effect is greatest with small population sizes (alleles are more likely to be lost by chance)
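Why small populations sort lineages faster can be shown with a neutral Wright-Fisher drift simulation. Each generation, the 2N gene copies are resampled from the previous generation's allele frequency. We count the generations until one of two starting alleles is lost by chance. This is a sketch under stated assumptions: a diploid population, no mutation or selection, and a starting frequency of 0.5. Function names are hypothetical.

```python
import random


def generations_until_loss(pop_size: int, rng: random.Random, p0: float = 0.5) -> int:
    """Generations of neutral drift until one of two alleles is lost (the other fixed)."""
    copies = 2 * pop_size                     # diploid: two gene copies per individual
    count = round(p0 * copies)
    generations = 0
    while 0 < count < copies:
        p = count / copies
        count = sum(rng.random() < p for _ in range(copies))  # binomial resampling
        generations += 1
    return generations


def mean_loss_time(pop_size: int, reps: int = 50, seed: int = 7) -> float:
    rng = random.Random(seed)
    return sum(generations_until_loss(pop_size, rng) for _ in range(reps)) / reps


small, large = mean_loss_time(5), mean_loss_time(50)
print(small, large)   # diffusion theory predicts roughly 2.8 x N generations on average
```

A population ten times smaller loses an allele roughly ten times sooner. Haplotype sharing between small isolated populations is therefore erased more quickly.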
What impacts the number of haplotypes in a population?
Number of haplotypes in a population is a function of current and historical effective population sizes
What does a haplotype network show?
The most common and widespread haplotype sits in the centre
Haplotypes one mutational step away branch off from it
Circle size represents the number of individuals carrying that haplotype
Perpendicular tick marks on a connecting line show that multiple mutations have occurred along it
What can impact haplotype spread?
Few species exist as single, undifferentiated populations
Vicariance and dispersal
What is vicariance?
Process of separation due to environmental events (eg. formation of mountain ranges, sea-level changes)
What is important about vicariance and dispersal?
Together with dispersal ability, vicariance is an important determinant of a species' natural geographic range
What factors affect divergence between populations?
Gene flow between species (inter-specific hybridization)
Genetic drift and gene flow
What can impact genetic drift and gene flow?
Isolation by distance
Physical barriers
What is the use of haplotype trees in phylogeography?
Correlations between haplotype trees and geographical information can allow inference to be made about genetic processes occurring between populations
What is a haplotype network?
A haplotype network in which each number represents a different haplotype, and the size of a circle is approximately proportional to the number of individuals sequenced containing that haplotype
What is the general key for a haplotype network?
Solid circles represent haplotypes that were either not sampled, or are extinct.
Lines connecting haplotypes indicate single mutational differences.
Circles are coloured for reference to their geographic distribution
What does a branch mean in a haplotype tree?
Each branch = 1 mutational step
What happens when you mix geography and gene tree?
Congruence of geography and gene tree – the most ancient haplotypes located at centre of tree and are geographically the most widespread, most recent haplotypes at tips of tree and localised geographically
What can break down the congruence between geography and the gene tree?
Congruence breaks down if there is persistence and differential sorting of haplotypes
Gene flow can break it down further
What predictions can be made about haplotype networks?
High frequency haplotypes likely to be older
Within a network, older haplotypes are more likely to be interior, whereas newer haplotypes are likely to be peripheral
Haplotypes with multiple connections are likely to be older
Older haplotypes are expected to have a broader geographical distribution (more time to disperse)
Haplotypes with only one connection are likely to be connected to haplotypes from the same population, as they evolved recently and have had less time to disperse
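The predictions above can be applied with a simple one-step network. Link every pair of aligned haplotypes that differ at exactly one site, then rank haplotypes by number of connections and frequency (more of both suggests an older haplotype). This is a minimal sketch. Unlike real network methods, it does not infer unsampled or extinct intermediate haplotypes. The haplotype sequences and counts are hypothetical.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of differing sites between two aligned haplotypes."""
    return sum(x != y for x, y in zip(a, b))


def one_step_network(haplotypes: dict[str, str]) -> dict[str, set[str]]:
    """Connect every pair of haplotypes separated by a single mutation."""
    links = {name: set() for name in haplotypes}
    for (n1, s1), (n2, s2) in combinations(haplotypes.items(), 2):
        if hamming(s1, s2) == 1:
            links[n1].add(n2)
            links[n2].add(n1)
    return links


def rank_by_inferred_age(haplotypes: dict[str, str], counts: dict[str, int]) -> list[str]:
    """Order haplotypes by connections, then frequency (highest = likely oldest)."""
    links = one_step_network(haplotypes)
    return sorted(haplotypes, key=lambda h: (len(links[h]), counts[h]), reverse=True)


haps = {"H1": "ACGTA", "H2": "ACGTT", "H3": "ACCTA", "H4": "TCGTA", "H5": "TCGTC"}
counts = {"H1": 12, "H2": 3, "H3": 2, "H4": 5, "H5": 1}
print(one_step_network(haps))
print(rank_by_inferred_age(haps, counts))
```

In this made-up example, H1 is frequent and has three connections, so it ranks as the likely ancestral, central haplotype. H5 is rare and has a single link, so it is peripheral.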
What did the haplotypes in African buffalo show about their behaviour?
Chobe location geographically distant from other sites
Chobe genotypes are scattered across different parts of the haplotype map
FST = 0.08
High levels of migration
What did the haplotypes in Impala show about their behaviour?
Chobe location geographically distant from other sites
FST = 0.1
Fragmentation or isolation by distance
Chobe genotypes are clustered together on the haplotype map
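The FST values above (0.08 and 0.1) measure how much of the total genetic variation lies between subpopulations. The cards do not say which estimator the buffalo and impala study used. This sketch uses the textbook definition FST = (HT - HS) / HT for a single biallelic locus, with equally weighted subpopulations, as an assumption. Here HS is the mean expected heterozygosity within subpopulations and HT is the expected heterozygosity of the pooled population. The function name `fst` is hypothetical.

```python
def fst(subpop_freqs: list[float]) -> float:
    """FST = (HT - HS) / HT for one biallelic locus, equally weighted subpopulations.

    subpop_freqs: frequency of one allele in each subpopulation.
    """
    k = len(subpop_freqs)
    hs = sum(2 * p * (1 - p) for p in subpop_freqs) / k  # mean within-subpopulation heterozygosity
    p_bar = sum(subpop_freqs) / k                         # allele frequency if all were pooled
    ht = 2 * p_bar * (1 - p_bar)                          # heterozygosity of the pooled population
    if ht == 0:
        return 0.0                                        # monomorphic locus: no differentiation
    return (ht - hs) / ht


print(fst([0.5, 0.5]))   # identical subpopulations -> 0.0
print(fst([0.3, 0.7]))   # moderate differentiation -> about 0.16
print(fst([0.0, 1.0]))   # fixed for different alleles -> 1.0
```

High migration keeps allele frequencies similar across sites and pushes FST towards 0. Isolation lets frequencies drift apart and pushes it upwards.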
What is an overview of the biogeography of freshwater fish in the south-eastern USA?
Gulf coast and Atlantic coast freshwater drainages
Large faunal differences between the two drainages, with the biggest split between western and eastern drainages
What pattern of mtDNA variation characterises freshwater fish taxa in the south-eastern USA?
A pronounced pattern of intraspecific mtDNA concordance among taxa characterises freshwater fish in the southeastern U.S
Distinct mtDNA lineages depending on drainage basin, seen in species such as bowfin (Amia calva), spotted sunfish (Lepomis punctatus) and redear sunfish (Lepomis microlophus)
What can be predicted from the distribution of genetic diversity in south-eastern US freshwater fish?
Theoretically, the same factors that are inferred to have influenced the distribution of genetic diversity in southeastern American freshwater fishes, should have had similar effects in the marine and coastal environment.
What hypothesis was proposed to explain distinct Atlantic and Gulf coast subpopulations?
During glacial advances, an enlarged Floridian peninsula may have contributed to the separation of some Atlantic and Gulf coast populations through creation of a rather isolated pocket of estuarine habitat in the Western Gulf of Mexico.
What was used to test whether an enlarged Floridian peninsula may have contributed to the separation of Atlantic and Gulf coast populations?
10 unrelated species or species complexes analysed using mitochondrial DNA.
What did they discover about the Gulf and Atlantic subpopulations?
Overall, among the 10 coastal and marine species or species complexes surveyed, at least 5 and as many as 8 evidence a fundamental mtDNA subdivision involving Atlantic versus Gulf coast populations
What does the outcome of the mtDNA sequencing suggest about the splitting of the populations?
This concordance within and among the faunas of the marine and freshwater environments provides a compelling case for a strong influence of historical biogeographic factors within the southeastern United States
Which species were the exceptions, showing no genetic difference between Atlantic and Gulf populations?
Two species, the hardhead catfish and the American eel showed no phylogenetic subdivision between the Gulf and Atlantic
What species have shown differences between the two populations?
Horseshoe crab, American oyster, toadfish and black sea bass
What is an example of phylogeography in Hawaii?
Clermontia - its phylogenetic position can be used to infer potential dispersal events between islands, which can then be combined with a molecular clock to show which is most likely
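The molecular clock step rests on one piece of arithmetic. Two lineages separated T years ago have each accumulated changes at rate r, so their pairwise divergence is d = 2rT, and T = d / (2r). This is a strict-clock sketch. The distance and rate values below are hypothetical and are not from the Clermontia study. The function name is also hypothetical.

```python
def divergence_time(distance: float, rate: float) -> float:
    """Years since two lineages split under a strict molecular clock.

    distance: pairwise sequence divergence (substitutions per site)
    rate: substitution rate per site per year along one lineage; the factor 2
          accounts for changes accumulating on both diverging branches.
    """
    return distance / (2 * rate)


# Hypothetical values for illustration only
t = divergence_time(distance=0.01, rate=5e-9)
print(f"{t / 1e6:.1f} million years")
```

Comparing an estimate like this with the ages of the islands involved shows which dispersal scenarios are possible. For example, a lineage cannot have colonised an island before the island existed.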
What is an overview of the distribution of tigers?
Tigers historically ranged across Eurasia from the Sunda Islands, west through the Indian subcontinent to the Indus river and north along the Pacific seaboard and a wide swath of central Asia from the Russian Far East to eastern Turkey.
What is an overview of the distribution of the Caspian tiger?
The Caspian tiger was thought to be a separate subspecies from the others on morphological grounds.
Became extinct in February of 1970 when the last survivor was shot in Hakkari province, Turkey
What are the three distinct routes by which the Caspian tiger could have reached its Central Asian range?
A) A southern route, via the Indian subcontinent south of the Himalayan plateau
B) A northern route, settling first the Amur region and then traversing Siberia westward, north of the Mongolian steppe
C) Via the historical “Silk Road” through the Gansu corridor, between the Himalayan plateau and the Mongolian Gobi desert
Which tiger subspecies would the Caspian tiger show affinity with if it followed route A?
A close molecular affinity would exist between Caspian tigers, P. t. virgata, and Bengal tigers, P. t. tigris
Which tiger subspecies would the Caspian tiger show affinity with if it followed route B?
A northerly migration would predict genetic admixture and similarity of P. t. virgata with South China tigers P. t. amoyensis, as a result of range overlap during the postulated migration.
Which tiger subspecies would the Caspian tiger show affinity with if it followed route C?
The Amur tiger (P. t. altaica) and the Caspian tiger (P. t. virgata) are sister taxa to the Indochinese tiger (P. t. corbetti) from which they are separated by six mitochondrial steps (five for P. t. virgata).