Evolutionary Genetics Flashcards
What is the goal of phylogenetics?
- To infer evolutionary relationships between species
- Gives insight into timing and sequence of evolutionary events
What are some practical applications of phylogeny?
- Flu vaccines:
- Phylogeny is used to predict which flu strains will cause the most disease in the upcoming flu season (the dominant cluster), and this prediction informs how the vaccine is produced
- Criminal cases of HIV transmission:
- HIV is genetically extremely diverse; there is a lot of diversity even within an individual patient, because the virus keeps evolving
- This diversity can be used as evidence of whether one individual infected another with HIV (the viral sequences from the two people should be closely related)
- Determining the origin of diseases:
- Many emerging pathogens originally infected only certain non-human organisms
- Predicting the function of a gene:
- When a gene has been sequenced but its function is unknown, phylogeny can be used to predict its function
- If the protein product of the unknown gene is similar to a protein in other organisms (e.g. model organisms), we can hypothesise that the unknown gene has a similar function
What is a weighted vs unweighted tree?
Weighted Trees:
- There is a different amount of evolutionary change across each of the lineages
- The branch lengths indicate the amount of evolutionary change along that lineage
Unweighted trees:
- Shows that lineages share a common ancestor but makes no assumptions regarding the rates of genetic evolution in the branches
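One way to picture the difference is in how each kind of tree could be written down. The sketch below uses a made-up nested representation (not any library's format): the unweighted tree stores only the grouping, while the weighted tree pairs every child with a branch length. The taxon names and lengths are invented for illustration.

```python
# Unweighted tree: only the branching pattern (which lineages share an ancestor).
unweighted = (("A", "B"), "C")

# Weighted tree: each child is stored as (child, branch_length).
# The lengths are invented values standing for amounts of evolutionary change.
weighted = [([("A", 1.0), ("B", 3.0)], 0.5), ("C", 4.0)]


def leaves(node):
    """List the taxa at the tips of an unweighted tree, left to right."""
    if isinstance(node, str):
        return [node]
    tips = []
    for child in node:
        tips.extend(leaves(child))
    return tips


def root_to_tip(node, so_far=0.0):
    """Total branch length from the root to every tip of a weighted tree."""
    if isinstance(node, str):
        return {node: so_far}
    totals = {}
    for child, length in node:
        totals.update(root_to_tip(child, so_far + length))
    return totals


print(leaves(unweighted))
print(root_to_tip(weighted))
```

Only the weighted tree can answer "how much change separates the root from taxon B?"; the unweighted tree can only say that A and B group together.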
What information do rooted trees give?
- The species at the root is the most recent common ancestor (MRCA) of every OTU (operational taxonomic unit) on the tree
- You can easily map evolutionary time using fossil records onto these trees
What data is used to build phylogenetic trees?
- Molecular sequence data:
- The most common way is DNA sequence alignment
- Amino acids of proteins can also be aligned (better if species are more distantly related)
- Gene content:
- What genes an organism has/does not have (presence/absence data)
- Genetic distance:
- DNA-DNA hybridisation
- Immunology
- Rare traits:
- Gene order
- Introns
- SINEs and LINEs
- Morphological data:
- Measure number of vertebrae etc.
- Used less and less (unreliable)
What is the pairwise model of DNA change?
- The simplest means of determining genetic distance = the proportion of differences between any two taxa
- Calculated as: number of differences seen / number of nucleotide sites examined
- The pairwise differences for all taxa can be assembled into a distance matrix
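A minimal sketch of this calculation, assuming the sequences are already aligned to the same length and contain no gaps. The function names and the three-taxon alignment are made up for illustration.

```python
from itertools import combinations


def p_distance(seq1, seq2):
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(1 for x, y in zip(seq1, seq2) if x != y)
    return differences / len(seq1)


def distance_matrix(alignment):
    """Map every unordered pair of taxa to its pairwise distance."""
    return {
        (t1, t2): p_distance(alignment[t1], alignment[t2])
        for t1, t2 in combinations(sorted(alignment), 2)
    }


# Invented 10-site alignment for three hypothetical taxa.
alignment = {
    "taxon_a": "ACGTACGTAC",
    "taxon_b": "ACGTACGTTC",
    "taxon_c": "ACCTACGATC",
}

for pair, distance in distance_matrix(alignment).items():
    print(pair, distance)
```

Here taxon_a and taxon_b differ at 1 of 10 sites (distance 0.1), while taxon_a and taxon_c differ at 3 of 10 (distance 0.3).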
What is the two-parameter model of DNA change?
- In this model there are two parameters:
1. The rate of transitions (α): changes between ‘like’ bases, i.e. purine to purine (A↔G) or pyrimidine to pyrimidine (C↔T)
2. The rate of transversions (β): changes between ‘unlike’ bases, i.e. between a purine and a pyrimidine
- Each type of change gets its own probability
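To make the two categories concrete, the sketch below sorts each difference between two aligned sequences into a transition (purine to purine, A↔G, or pyrimidine to pyrimidine, C↔T) or a transversion (purine to pyrimidine or the reverse). It assumes uppercase A/C/G/T only; the function names and sequences are illustrative, and it only counts observed differences rather than estimating the rates themselves.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_change(base1, base2):
    """Label a pair of aligned bases as 'none', 'transition' or 'transversion'."""
    if base1 == base2:
        return "none"
    pair = {base1, base2}
    same_class = pair <= PURINES or pair <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def change_proportions(seq1, seq2):
    """Return (proportion of transitions, proportion of transversions)."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    counts = {"none": 0, "transition": 0, "transversion": 0}
    for x, y in zip(seq1, seq2):
        counts[classify_change(x, y)] += 1
    n_sites = len(seq1)
    return counts["transition"] / n_sites, counts["transversion"] / n_sites


# Invented example: 2 transitions and 2 transversions across 8 sites.
print(change_proportions("AACCGGTT", "AGCTGCTA"))
```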
How do we test the reliability of a phylogenetic tree through bootstrapping?
- Once a tree has been constructed, a bootstrapping procedure can be used to assess how believable and reliable its nodes are
- This is done by making many (500+) new alignments from the original alignment: randomly sample one site, then sample again (with replacement), until the new alignment has the same number of sites as the original
- Each of these resampled alignments is used to build a pseudo-tree
- You look for similarities between the pseudo-trees and the original tree
- e.g. If the original tree has a and b as most closely related, what proportion of the pseudo-trees also have a and b as most closely related?
- If 75% of the pseudo-trees show the same grouping of a and b, and of c and d, both nodes would have the number 0.75 placed on them
- This means the node is 75% believable (it has 75% bootstrap support)
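The resampling step can be sketched as follows. This is a simplification under stated assumptions: instead of building a full pseudo-tree for each replicate, it uses a stand-in (`closest_pair`, a hypothetical helper) that only reports which two taxa have the fewest differences. The alignment is invented, and ties go to the first pair in alphabetical order.

```python
import random
from itertools import combinations


def resample_sites(alignment, rng):
    """Build one pseudo-alignment by sampling sites with replacement."""
    n_sites = len(next(iter(alignment.values())))
    picks = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {taxon: "".join(seq[i] for i in picks) for taxon, seq in alignment.items()}


def closest_pair(alignment):
    """Stand-in for tree building: the pair of taxa with the fewest differences."""
    def n_differences(pair):
        seq1, seq2 = alignment[pair[0]], alignment[pair[1]]
        return sum(x != y for x, y in zip(seq1, seq2))
    return min(combinations(sorted(alignment), 2), key=n_differences)


def bootstrap_support(alignment, pair, replicates=500, seed=0):
    """Proportion of pseudo-alignments in which `pair` is the closest pair."""
    rng = random.Random(seed)
    target = tuple(sorted(pair))
    hits = sum(
        closest_pair(resample_sites(alignment, rng)) == target
        for _ in range(replicates)
    )
    return hits / replicates


# Invented alignment for four hypothetical taxa a, b, c, d.
alignment = {
    "a": "ACGTACGTAC",
    "b": "ACGTACGTTC",
    "c": "TCGAACCTAG",
    "d": "TGGATCCTTG",
}
print(bootstrap_support(alignment, ("a", "b")))
```

A real analysis would rebuild a complete tree from every pseudo-alignment and score each node of the original tree, but the resampling and the "proportion of replicates that agree" logic are the same.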
What are the features of morphological trait variation?
- Variation is continuous and discordant
- It is possible to cluster people on the basis of any one trait- but the resulting classification does not allow one to predict clustering for other traits
What proportion of genetic variance is due to (continent) population-dependent differences?
- Only 10-15%
- (85% of variance between individuals is random and occurs within populations)
- This still corresponds to <300,000 variable sites
What genetic variations do we use to determine populations?
- SNPs:
- Single nucleotide polymorphisms
- Present in the human genome; not necessarily related to phenotype
- Used as signals of relatedness and evolutionary history
- Okay measure (only a small amount are continent-specific)
- Haplotypes:
- Long sections of SNPs that group together (SNPs that are linked)
- Another measure that can be used to determine relatedness
- Best measure: 29% are continent-specific
- CNV:
- Copy number variation
- Genes that we have multiple copies of in the genome
- Not very useful (mutating too rapidly)
Is there more genetic variation in African or non-African populations?
- When grouping populations there is the greatest amount of variation in African populations as these are the oldest, ancestral populations
- There was a genetic bottleneck in the populations that left Africa
- This is because only a small subset of the African population migrated out of Africa, so the populations descended from them carry less genetic diversity
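The bottleneck effect can be illustrated with a toy simulation: draw a small group of founders from a larger population and compare how many distinct alleles each carries. All numbers here (1,000 individuals, 50 alleles, 20 founders) are invented, and the model ignores mutation and later population growth.

```python
import random


def founder_sample(population, n_founders, rng):
    """Pick a small random subset of individuals to found a new population."""
    return rng.sample(population, n_founders)


def distinct_alleles(population):
    """Count how many different alleles are present in a population."""
    return len(set(population))


rng = random.Random(1)

# Hypothetical source population: each individual carries one of 50 made-up alleles.
source = [f"allele_{rng.randrange(50)}" for _ in range(1000)]

# Only a small subset migrates and founds the new population.
founders = founder_sample(source, 20, rng)

print("alleles in source population:", distinct_alleles(source))
print("alleles in founder population:", distinct_alleles(founders))
```

Because the founders are a subset of the source population, they can never carry more distinct alleles than the source, and 20 founders can carry at most 20 of them.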
What is the International HapMap Project?
- Aimed to identify similarities and differences between humans
- Used SNPs
- Only sampled 270 individuals from 4 populations (HGDP has better sampling)
What is the Human Genome Diversity Project?
- Similar to HapMap
- Used more SNP loci
- 1043 individuals from 51 populations (much better sampling)
How do association studies work?
- The idea: if a mutation arises and is inherited within a group, the individuals who carry it may share a phenotype that relates to that genotype
- The SNP itself does not cause the disease; rather, it is located close to (linked to) the gene that does, e.g. a gene that causes heart disease when mutated
- The SNP can be used as a signal: individuals who inherit the SNP most likely also inherit the gene mutation being studied, which may raise the risk of cancer, heart disease, etc.
- If a SNP is found to correlate with a disease, the genes surrounding the SNP can be studied to determine if they are causative
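Testing whether a SNP correlates with a disease can be sketched as a 2x2 comparison of cases and controls. The example below computes a Pearson chi-square statistic by hand; the counts are invented, and real association studies add steps not shown here (such as correcting for testing many SNPs at once).

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]].

    a: cases carrying the SNP allele      b: cases without it
    c: controls carrying the SNP allele   d: controls without it
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one observation")
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


# Invented counts: 100 cases and 100 controls.
statistic = chi_square_2x2(60, 40, 40, 60)
print(statistic)  # 8.0
```

With one degree of freedom, a statistic above about 3.84 corresponds to p < 0.05, so these made-up counts would suggest an association. As the card notes, that only flags the region around the SNP for further study; it does not show that the SNP is causative.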