S2: W6 (Dr. Hanlie) Flashcards

1
Q

Thing to note on genome size?

A

Size of genome doesn’t reflect the ¹complexity & ²size of organisms.

2
Q

Whole genome DNA extraction attributes? (2)

A

• Can be done from various tissues/samples, e.g., blood & other body fluids.
• Extract whole genome of organism.

3
Q

Why extract the whole genome/DNA?

A

It’s because you want to examine a piece of the DNA.

4
Q

How do you examine a piece of DNA?

A

PCR.

5
Q

What does PCR do?

A

It amplifies (makes many copies of) the desired piece/region of DNA.

6
Q

PCR process steps? (3)

A

• Denaturation.
• Annealing.
• Elongation.
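These three steps make up one PCR cycle, and each cycle can at most double the number of copies of the target region, which is why PCR amplifies so strongly. A minimal Python sketch, assuming a constant per-cycle efficiency (the `efficiency` parameter is a hypothetical simplification; real reactions slow down and plateau after many cycles):

```python
def pcr_copies(start_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Expected copies of the target region after a number of PCR cycles.

    Assumes every cycle multiplies the copy number by (1 + efficiency);
    efficiency = 1.0 means perfect doubling each cycle.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return start_copies * (1.0 + efficiency) ** cycles


# One template copy after 30 perfect cycles: roughly a billion copies.
print(pcr_copies(1, 30))  # 1073741824.0
```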

7
Q

Primers attributes? (3)

A

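• Short, single-stranded DNA sequences, typically about 18-30 nucleotides long.
• Forward primer (5’→3’).
• Reverse primer (3’→5’ relative to the forward strand, i.e., it anneals to the opposite strand).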

8
Q

Purpose of primers?

A

To anneal to the template DNA just before (flanking) your region of interest, giving the polymerase a starting point for copying that region.

9
Q

Gel electrophoresis attribute?

A

The ladder’s bands mark known fragment sizes (no. of bp), so you can estimate the size of your own bands by comparing them to the ladder.
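Because ladder bands have known sizes, an unknown band’s size can be read off by comparing its migration distance with the ladder’s. A rough sketch, assuming migration distance is roughly linear in log10(fragment size) within the ladder’s range; the ladder sizes and distances below are made-up illustrative values, not a real commercial ladder:

```python
import math


def estimate_size(ladder, distance_mm):
    """Estimate a band's size (bp) from a ladder using a log-linear fit.

    ladder: list of (size_bp, migration_distance_mm) pairs for the ladder bands.
    Assumes log10(size) falls roughly linearly with migration distance, which
    only holds within the ladder's size range.
    """
    xs = [d for _, d in ladder]
    ys = [math.log10(s) for s, _ in ladder]
    n = len(ladder)
    mx, my = sum(xs) / n, sum(ys) / n
    # Ordinary least-squares line through (distance, log10 size).
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return 10 ** (intercept + slope * distance_mm)


# Hypothetical ladder: larger fragments travel a shorter distance.
ladder = [(1000, 10.0), (500, 20.0), (250, 30.0), (125, 40.0)]
print(round(estimate_size(ladder, 25.0)))  # 354
```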

10
Q

Types of sequencing? (2)

A

• DNA sequencing (Sanger method).
• Next-Generation sequencing.

11
Q

DNA sequencing (Sanger method) attributes? (2)

A

• Used for standard PCR products (one region at a time).
• Results come as an electropherogram (chromatogram) at the end.

12
Q

Next-Generation sequencing?

A

= sequencing many DNA fragments/gene regions simultaneously (massively parallel).

13
Q

What does the choice of sequencing method depend on? (2)

A

• Research goals.
• Budget.

14
Q

Why use molecular characters to study evolutionary patterns? (2)

A

• Homoplasy.
• Rare events.

15
Q

Explain Homoplasy as a reason we use molecular data to study evolutionary patterns?

A

Molecular data avoid being misled by convergence in morphological characters.

16
Q

Explain Rare events as a reason we use molecular data to study evolutionary patterns? (2)

A

• Show duplications, insertions/deletions, rearrangements.
• Very informative, as these events can’t be seen in ecological or morphological data.

17
Q

Why is homoplasy problematic in molecular data? (2)

A

• Only 4 bases (A, T, G, C).
• Mutations are common (and you have to track them).

18
Q

Gene/Region alignment?

A

= statement of homology.

19
Q

Why do we do Gene/Region alignment?

A

To line up homologous positions, so that similar (homologous) things are compared & grouped together.

20
Q

Why could deletions be informative? (2)

A

• Give information on gene region of interest.
• Give information on species.

21
Q

Gene vs Gene region vs Gene fragment?

A

● Gene
= theoretical region.

● Gene region
= can incorporate many genes & is relative.

● Gene fragment
= piece of a gene.

  • When in doubt, just use the word “region”.
22
Q

Kinds of DNA to consider using for phylogenetic study? (3)

A

• Mitochondrial.
• Chloroplast.
• Nuclear.

23
Q

mtDNA attributes? (3)

A

• Maternally inherited (usually).
• Circular.
• Evolves faster than nuclear genes.

24
Q

cpDNA attributes? (2)

A

• Maternally inherited (usually).
• Circular.

25
Q

nDNA attributes? (3)

A

• Biparentally inherited.
• Linear chromosomes.
• Evolves slowly, so it suits slower (older) evolutionary patterns.

26
Q

Why is mtDNA usually maternally inherited?

A

It’s because the egg supplies nearly all the mitochondria, & the few sperm mitochondria that enter the egg are usually destroyed after fertilization.

27
Q

What does your choice of DNA depend on?

A

Your research question.

28
Q

Eg of where choice of DNA is important?

A

If you want to get the family history of dogs, use nuclear DNA.

29
Q

mtDNA in plants attributes? (4)

A

• Circular.
• Maternally inherited.
• Very unstable.
• Highly variable (many rearrangements).

30
Q

What does the instability of plant mtDNA mean for phylogenetic studies?

A

It’s not informative.

31
Q

Which DNA type is informative in plants?

A

cpDNA.

32
Q

rRNA genes attributes? (3)

A

• Highly conserved coding region.
• Useful at family level & higher.
• Includes small-subunit (SSU) & large-subunit (LSU) rRNA genes.

33
Q

How do you choose which region to use?

A

Depends on your research question (eg in forensics, conservation).

34
Q

Thing to note on “How do you choose which region to use”?

A

When examining the speed of mutations, look at the gene region, particularly mtDNA vs nDNA. The deeper the relationships, the slower-evolving the region you need.

35
Q

When examining speed of mutations, what do we look at?

A

The gene regions, particularly the mtDNA & nDNA.

36
Q

mtDNA attributes in terms of speed of mutation? (2)

A

• Faster mutation rates.
• Useful at the individual & population level.

37
Q

nDNA attributes in terms of speed of mutations? (2)

A

• Slow mutation rates.
• Useful from family level to deeper relationships (very old origins).

38
Q

Applications of molecular phylogenetic studies? (6)

A

• Clarify monophyly & the classification/delimitation of taxa (eg. genera, species).

• Interpretation of morphological evolutionary patterns.

• Trace the evolutionary history of a species & explain its current distribution (phylogeography).

• Provide a basis for interpretation of modes of speciation.

• Provide a basis for conservation prioritization.

• Enable tracing of sources of human diseases.

39
Q

Eg of 1st Application of molecular phylogenetic studies?

A

Olive ridley sea turtle vs Kemp’s ridley sea turtle.

40
Q

What could it mean when you have incongruence between nuclear & organelle (mt/cp) genomes?

A

Could be hybridization and/or introgression.

41
Q

What do you mean when you say “Incongruence between nuclear & organelle genomes”?

A

We mean that the phylogenies of nuclear DNA & mt/cp DNA don’t match.

42
Q

Why are more sequences not enough?

A

Chloroplast capture hypothesis.

43
Q

Chloroplast capture hypothesis?

A

= where you have inter-species hybridization & subsequent backcrosses.

44
Q

Result of Chloroplast capture hypothesis?

A

You have a new combination of nuclear & chloroplast genomes.

45
Q

Thing to note on Incongruency?

A

Just know that there are many reasons for incongruency & know what to do when you do get incongruency.

46
Q

What do different Phylogenetic Inferences (PI) depend on? (2)

A

• Data that you’re working with.
• Research question.

47
Q

Types of PI methods? (4)

A

• Neighbour joining (NJ).
• Maximum Parsimony (MP).
• Maximum Likelihood (ML).
• Bayesian Inference (BI).

48
Q

PI methods attributes? (5)

A

• Different techniques to reconstruct evolutionary relationships.
• Employ different algorithms.
• Common inference methods include: NJ, MP, ML & BI.
• Different criteria, assumptions & interpretations.
• Each with own pros & cons.

49
Q

What do you mean by “Employ different algorithms”? (2)

A

We mean that the algorithms differ in their:

• underlying principles.
• computational requirements.

50
Q

NJ attributes? (5)

A

• Distance method.
• Allows rates of evolution to differ between lineages (doesn’t assume a constant rate).
• Based on substitution models (at nucleotide/amino acid sites) to estimate genetic distance from sequence data.
• Outdated.
• Branch lengths represent distance, not specific mutations.

51
Q

“Neighbour” from NJ?

A

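= a pair of taxa (closely or distantly related) joined through a single internal node; at each step NJ joins the pair of neighbours that minimizes the total tree length.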
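To make “neighbours” concrete, here is a minimal neighbour-joining sketch in Python that uses the standard Q-criterion (Saitou & Nei) to pick the pair to join next. It assumes a complete, symmetric distance matrix (e.g., distances corrected with a substitution model); the internal node names (node1, node2, etc.) and the 5-taxon matrix are illustrative only.

```python
from itertools import combinations


def neighbor_joining(labels, dist):
    """Minimal neighbour-joining sketch.

    labels: taxon names; dist: symmetric matrix (list of lists) of distances.
    Returns (joins, last_edge): each join is (taxon_i, taxon_j, branch_i, branch_j,
    new_node); last_edge is (node_a, node_b, length) linking the final two nodes.
    """
    d = {(a, b): float(dist[i][j])
         for i, a in enumerate(labels) for j, b in enumerate(labels)}
    nodes = list(labels)
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence: summed distance from each node to all the others.
        r = {a: sum(d[(a, b)] for b in nodes if b != a) for a in nodes}
        # Q-criterion: the pair with the smallest Q are joined as neighbours.
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[p] - r[p[0]] - r[p[1]])
        # Branch lengths from each joined node to their new internal node.
        bi = 0.5 * d[(i, j)] + (r[i] - r[j]) / (2 * (n - 2))
        bj = d[(i, j)] - bi
        u = f"node{len(joins) + 1}"
        # Distances from the new internal node to every remaining node.
        for k in nodes:
            if k not in (i, j):
                d[(u, k)] = d[(k, u)] = 0.5 * (d[(i, k)] + d[(j, k)] - d[(i, j)])
        d[(u, u)] = 0.0
        nodes = [k for k in nodes if k not in (i, j)] + [u]
        joins.append((i, j, bi, bj, u))
    a, b = nodes
    return joins, (a, b, d[(a, b)])


labels = ["a", "b", "c", "d", "e"]
dist = [[0, 5, 9, 9, 8],
        [5, 0, 10, 10, 9],
        [9, 10, 0, 8, 7],
        [9, 10, 8, 0, 3],
        [8, 9, 7, 3, 0]]
joins, last = neighbor_joining(labels, dist)
print(joins[0])  # ('a', 'b', 2.0, 3.0, 'node1')
```

Here `a` and `b` are joined first with unequal branch lengths (2 and 3) to their new ancestor, which is allowed because NJ does not assume a constant rate of evolution.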

52
Q

Types of distance methods? (2)

A

• NJ.
• UPGMA.

53
Q

UPGMA?

A

= distance method that assumes a constant rate of evolution (molecular clock), so all tips are the same distance from the root.
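For contrast with NJ, a minimal UPGMA sketch: it repeatedly merges the closest pair of clusters and places the new node at half their distance, so every tip ends up equally far from the root (the constant-rate assumption). The 4-taxon distance matrix is hypothetical.

```python
def upgma(labels, dist):
    """Minimal UPGMA sketch.

    labels: taxon names; dist: symmetric distance matrix (list of lists).
    Returns merges as (cluster_a, cluster_b, height), where height is the distance
    from the new node down to every tip in it (half the distance between the clusters).
    """
    size = {lab: 1 for lab in labels}  # cluster name -> number of tips
    d = {(a, b): float(dist[i][j])
         for i, a in enumerate(labels) for j, b in enumerate(labels) if i != j}
    merges = []
    while len(size) > 1:
        # Join the closest pair of current clusters.
        a, b = min(((x, y) for x in size for y in size if x < y), key=lambda p: d[p])
        merges.append((a, b, d[(a, b)] / 2))
        new = f"({a},{b})"
        # Distance to the new cluster = size-weighted average of the old distances.
        for k in size:
            if k not in (a, b):
                d[(new, k)] = d[(k, new)] = (
                    size[a] * d[(a, k)] + size[b] * d[(b, k)]) / (size[a] + size[b])
        size[new] = size.pop(a) + size.pop(b)
    return merges


labels = ["a", "b", "c", "d"]
dist = [[0, 2, 4, 8],
        [2, 0, 4, 8],
        [4, 4, 0, 8],
        [8, 8, 8, 0]]
merges = upgma(labels, dist)
for merge in merges:
    print(merge)
```

The root ends up at height 4.0, and every tip sits exactly 4.0 units below it, which is what a constant rate of evolution implies.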

54
Q

Types of clustering? (2)

A

• K-means clustering.
• Hierarchical clustering.

55
Q

K-means clustering?

A

• requires the no. of clusters to be pre-defined (you tell it).
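A toy 1-D k-means sketch showing that `k` has to be supplied up front. The data values are made up, and the evenly spaced initialisation is a simplifying assumption (library implementations typically use smarter starts such as k-means++).

```python
def kmeans_1d(values, k, iterations=100):
    """Toy 1-D k-means: k (the number of clusters) must be given in advance."""
    data = sorted(values)
    # Start the centroids at k evenly spaced values from the sorted data.
    centroids = [data[i * (len(data) - 1) // max(k - 1, 1)] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        # Assign each value to its nearest centroid.
        for v in data:
            nearest = min(range(k), key=lambda c: abs(v - centroids[c]))
            clusters[nearest].append(v)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        new = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new == centroids:  # converged: assignments no longer change
            break
        centroids = new
    return centroids, clusters


centroids, clusters = kmeans_1d([1.0, 1.2, 0.8, 5.0, 5.3, 4.9], k=2)
print(clusters)  # [[0.8, 1.0, 1.2], [4.9, 5.0, 5.3]]
```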

56
Q

Hierarchical clustering?

A

• gives you a dendrogram (no need to pre-define the number of clusters; you decide where to cut the tree).

57
Q

Distance method vs Clustering method?

A

● Distance method
= takes into account the evolutionary processes.

● Clustering method
= doesn’t take into account the evolutionary processes.

58
Q

NJ Pros? (5)

A

• Speed & efficiency.

• Robustness to model assumptions.

• Versatility.

• Interpretability.

• Ease of implementation.

59
Q

Explain speed & efficiency as NJ pro?

A

It’s suitable for large datasets: its stepwise, heuristic nature makes it fast, & it can handle high-dimensional distance matrices.

60
Q

Explain Robustness to model assumptions as NJ pro?

A

It’s less sensitive to model misspecification or violations because it doesn’t use complex evolutionary models.

61
Q

Explain Versatility as NJ pro?

A

It can be used with a wide range of distance matrices.

62
Q

Explain Interpretability as NJ pro?

A

Branch length is proportional to the estimated distances between taxa.

63
Q

Explain Ease of implementation as NJ pro?

A

It’s simple to implement & understand, even with limited computational resources/expertise.

64
Q

NJ Cons? (5)

A

• Sensitive to long branch attraction.

• Lack of statistical support.

• Inability to incorporate evolutionary models.

• Limited accuracy.

• Dependence on distance metrics.

65
Q

Explain Sensitive to long branch attraction as NJ con?

A

It can give a biased phylogenetic inference (long branches wrongly grouped together) because it doesn’t consider individual mutations, only overall distances.

66
Q

Explain Lack of statistical support as NJ con?

A

It gives no built-in measure of confidence in the inferred tree topology (support must be assessed separately, e.g., by bootstrapping).

67
Q

Explain Inability to incorporate evolutionary models as NJ con?

A

It doesn’t consider substitution rate heterogeneity among sites, which can lead to biased estimates of ¹branch lengths & ²evolutionary relationships.

68
Q

Explain Limited accuracy as NJ con?

A

Accuracy is limited in datasets with complex evolutionary patterns/high levels of sequence divergence.

69
Q

Explain Dependence on distance metrics as NJ con?

A

Results depend on the distance metric chosen, & the most suitable metric varies with the biological system under study, which can make results unreliable.

70
Q

MP attributes? (6)

A

• Character-based approach.
• Based on optimality criterion of Parsimony (minimum tree length).
• Branch lengths represent the no. of mutations.
• Unique mutations (autapomorphies) are not informative.
• Prone to long branch attraction.
• Only shared derived characters (synapomorphies) are parsimony-informative (they define the relationships).
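Which alignment columns are parsimony-informative can be checked mechanically: a column counts only if at least two different states each occur in at least two taxa. A minimal sketch, assuming an ungapped alignment stored as a dict of equal-length strings (the 4-taxon alignment is hypothetical):

```python
def parsimony_informative_sites(alignment):
    """Return indices of parsimony-informative columns.

    A column is informative if at least two different character states each
    occur in at least two taxa. A state found in only one taxon (a unique
    mutation) cannot favour one tree over another.
    """
    length = len(next(iter(alignment.values())))
    informative = []
    for col in range(length):
        counts = {}
        for seq in alignment.values():
            counts[seq[col]] = counts.get(seq[col], 0) + 1
        if sum(1 for c in counts.values() if c >= 2) >= 2:
            informative.append(col)
    return informative


alignment = {  # hypothetical 4-taxon alignment
    "taxon1": "AAGCT",
    "taxon2": "AAGCA",
    "taxon3": "ACTCA",
    "taxon4": "ACTGA",
}
print(parsimony_informative_sites(alignment))  # [1, 2]
```

Columns 3 and 4 each contain a unique mutation (G in taxon4, T in taxon1), so they are not informative.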

71
Q

Steps involved in MP? (2)

A

• Searches for the tree topology with the lowest parsimony score (unrooted).

• Optimizes character states across taxa to construct the phylogenetic tree.
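The scoring step can be sketched with Fitch’s algorithm, a standard way to count the minimum number of changes a site needs on a given tree; the tree search itself is left out. Trees are written as nested tuples, and the taxa and sequences are hypothetical:

```python
def fitch_changes(tree, states):
    """Minimum number of changes for one site on a binary tree (Fitch's algorithm).

    tree: nested 2-tuples of taxon names, e.g. (("t1", "t2"), ("t3", "t4")).
    states: dict taxon -> character state at this site.
    """
    def visit(node):
        if isinstance(node, str):  # leaf: its observed state, no changes yet
            return {states[node]}, 0
        left, cost_l = visit(node[0])
        right, cost_r = visit(node[1])
        common = left & right
        if common:  # children can share a state: no extra change
            return common, cost_l + cost_r
        return left | right, cost_l + cost_r + 1  # they disagree: one change needed
    return visit(tree)[1]


def parsimony_score(tree, alignment):
    """Tree length: Fitch changes summed over every site of the alignment."""
    length = len(next(iter(alignment.values())))
    return sum(fitch_changes(tree, {t: seq[i] for t, seq in alignment.items()})
               for i in range(length))


alignment = {  # hypothetical 4-taxon alignment
    "taxon1": "AAGCT",
    "taxon2": "AAGCA",
    "taxon3": "ACTCA",
    "taxon4": "ACTGA",
}
tree_a = (("taxon1", "taxon2"), ("taxon3", "taxon4"))
tree_b = (("taxon1", "taxon3"), ("taxon2", "taxon4"))
print(parsimony_score(tree_a, alignment), parsimony_score(tree_b, alignment))  # 4 6
```

MP would prefer `tree_a` (4 steps) over `tree_b` (6 steps). Fitch’s algorithm runs on a rooted binary tree, but the score does not depend on where the root is placed, so it also scores unrooted trees.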

72
Q

Thing to note on Steps involved in MP?

A

It seeks the shortest tree possible (fewest evolutionary changes).

73
Q

MP assumption?

A

Every mutation is visible in the data (no hidden, repeated changes at the same site).

74
Q

Long branch attraction attributes? (2)

A

• Homoplasy on long branches looks like shared mutations.
• The more data you collect, the more strongly the wrong tree is supported.

75
Q

MP pros? (5)

A

• Intuitive interpretation.

• Robustness to model assumptions.

• Applicability to diverse data types.

• Computationally efficient for small-medium sized datasets.

• Ease of interpretation.

76
Q

Explain Intuitive interpretation as MP pro?

A

It’s straightforward & aims to reconstruct the tree topology with the fewest evolutionary changes.

77
Q

Explain Robustness to model assumptions as MP pro?

A

It’s less sensitive to model misspecification as there are no complex evolutionary models.

78
Q

Explain Applicability to diverse data types as MP pro?

A

It can analyze different types of data.

79
Q

Explain Ease of interpretation as MP pro?

A

Accessible to researchers with limited computational resources/expertise.

80
Q

MP Cons? (5)

A

• Sensitivity to Homoplasy.

• Potentially suboptimal solutions.

• Inability to incorporate evolutionary rates.

• Limited statistical support.

• Not suitable for evolutionary testing.

81
Q

Explain Sensitivity to homoplasy as MP con?

A

It assumes that similar character states are due to shared ancestry, so convergent similarities (homoplasy) can mislead it.

82
Q

Explain Potentially suboptimal solutions as MP con?

A

Because the search is heuristic, it may converge on a suboptimal tree instead of the most parsimonious one, especially for large or complex datasets.

83
Q

Explain Inability to incorporate evolutionary rates as MP con?

A

It doesn’t consider the rate of evolution/variation among sites, which can lead to biased estimates of ¹branch lengths & ²divergence times.

84
Q

Explain Limited statistical support as MP con?

A

It’s challenging to assess the robustness of the inferred phylogeny.

85
Q

Explain Not suitable for evolutionary testing as MP con?

A

Because it relies only on the parsimony score, it has no statistical model for testing evolutionary hypotheses.

86
Q

Differences between NJ & MP? (2)

A

• Branch lengths.
• Node support.

87
Q

NJ vs MP in terms of node support?

A

● NJ
= bootstrap values indicating the proportion of times a particular clade is recovered in bootstrap replicates.

● MP
= assesses support for nodes based on alternative analyses or additional statistical tests.

88
Q

NJ node support?

A

Bootstrap values indicate the proportion of times a particular clade is recovered in bootstrap replicates.
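Bootstrap values come from resampling alignment columns with replacement, re-running the same tree method on each replicate, and counting how often a clade reappears. A sketch of that loop, assuming a toy stand-in for tree inference (the hypothetical `closest_pair`, which just groups the two most similar taxa); in practice the same NJ (or ML) analysis would be run on every replicate:

```python
import random


def bootstrap_replicate(alignment, rng):
    """One bootstrap replicate: resample alignment columns with replacement."""
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {taxon: "".join(seq[c] for c in cols) for taxon, seq in alignment.items()}


def closest_pair(alignment):
    """Toy stand-in for tree inference: the two most similar taxa form a clade."""
    def p_dist(a, b):
        return sum(x != y for x, y in zip(a, b)) / len(a)
    taxa = sorted(alignment)
    pairs = [(a, b) for i, a in enumerate(taxa) for b in taxa[i + 1:]]
    best = min(pairs, key=lambda ab: p_dist(alignment[ab[0]], alignment[ab[1]]))
    return {frozenset(best)}


def bootstrap_support(alignment, infer_clades, clade, replicates=100, seed=1):
    """Percentage of replicates in which `clade` (a frozenset of taxa) is recovered."""
    rng = random.Random(seed)
    hits = sum(clade in infer_clades(bootstrap_replicate(alignment, rng))
               for _ in range(replicates))
    return 100.0 * hits / replicates


alignment = {  # hypothetical, deliberately clear-cut data
    "taxon1": "AAGTAAGT",
    "taxon2": "AAGTAAGT",
    "taxon3": "CCTGCCTG",
    "taxon4": "GTCAGTCA",
}
support = bootstrap_support(alignment, closest_pair, frozenset({"taxon1", "taxon2"}))
print(support)  # 100.0
```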

89
Q

MP Node support?

A

Assesses support for nodes based on alternative analyses or additional statistical tests.

90
Q

NJ vs MP in terms of branch lengths?

A

● NJ
= represents distances, not specific mutations.

● MP
= represents the no. of mutations.

91
Q

Which is better NJ or MP?

A

It depends on your research objectives.

92
Q

Why does choosing NJ or MP depend on my research objectives? (4)

A

You have to consider:

• Underlying principles & model assumptions.
• Data characteristics.
• Computational requirements.
• Trade-offs between accuracy & efficiency.

93
Q

NJ vs MP in terms of the underlying principles & model assumptions?

A

● NJ
= distance based.

● MP
= character based.

94
Q

NJ vs MP in terms of computational requirements?

A

● NJ
= computationally efficient.

● MP
= computationally intensive.

95
Q

Why is MP computationally intensive?

A

It’s because it involves complex heuristic search strategies.

96
Q

Model of evolution?

A

= estimates of the relative probabilities of substitutions.

97
Q

What do the estimates of model evolution need information on? (4)

A

• Relative proportion of nucleotides.

• Relative frequency of transitions & transversions.

• Frequency of invariant sites.

• Differences in mutation rates between sites.
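Two of these quantities can be computed directly from sequences: the relative proportion of nucleotides and the transition/transversion counts. A minimal sketch for two aligned sequences (the example sequences are made up; gaps and ambiguous bases are skipped):

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def nucleotide_proportions(seq):
    """Relative proportion of A, C, G and T (other characters ignored)."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    return {b: bases.count(b) / len(bases) for b in "ACGT"}


def count_substitutions(seq1, seq2):
    """Count transitions and transversions between two aligned sequences.

    Transition: purine <-> purine (A <-> G) or pyrimidine <-> pyrimidine (C <-> T).
    Transversion: purine <-> pyrimidine. Gaps/ambiguous bases are skipped.
    """
    transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    return transitions, transversions


print(nucleotide_proportions("AGCTTACG"))            # each base 0.25
print(count_substitutions("AGCTTACG", "GGTTAACA"))   # (3, 1)
```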

98
Q

ML & BI attributes? (6)

A

• Incorporate model selection.
• Optimize models while constructing the trees.
• Operate based on likelihood & probability.
• Incorporate complex substitution models.
• Less sensitive to long branch attraction.
• Often the best choice (when the model fits & computing resources allow).

99
Q

Phylogenetic Inference = …?

A

A hypothesis: no matter how complex the analysis, the result still remains a hypothesis.

100
Q

When to use NJ (“criteria”)? (2)

A

• When working with distance-based data.
• When you can account for long branch attraction.

101
Q

ML node support attributes? (2)

A

• Value tells you how well supported a relationship (e.g., sister taxa) is.
• Uses the bootstrap method (value out of 100).

102
Q

BI node support attributes? (4)

A

• Posterior probabilities (0-1).
• Value of 1 is great.
• 0.9-0.95 is trustworthy.
• <0.9 is unresolved.

103
Q

BI node support of < 0.9?

A

Unresolved. You cannot trust the relationship between species.

104
Q

BI node support of 0.9-0.95?

A

You can trust the relationship between species.

105
Q

ML & BI in terms of branch length?

A

Branch length represents the expected number of substitutions per site.

106
Q

Likelihood in ML?

A

= used to evaluate different trees.

107
Q

Probability?

A
108
Q

NJ uses which model?

A

Jukes-Cantor (JC) model (the simplest substitution model).
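The JC correction turns the observed proportion of differing sites (p) into an estimate of substitutions per site, d = -(3/4) ln(1 - 4p/3), correcting for multiple hits at the same site. A minimal sketch, assuming aligned, ungapped sequences of equal length:

```python
import math


def p_distance(seq1, seq2):
    """Proportion of aligned sites that differ (gaps not handled)."""
    return sum(a != b for a, b in zip(seq1, seq2)) / len(seq1)


def jukes_cantor_distance(seq1, seq2):
    """JC-corrected distance: d = -(3/4) ln(1 - (4/3) p).

    Assumes equal base frequencies and equal rates for all substitution types.
    """
    p = p_distance(seq1, seq2)
    if p >= 0.75:
        raise ValueError("p >= 0.75: sequences are saturated; JC distance undefined")
    return -0.75 * math.log(1 - 4.0 * p / 3.0)


# One difference in 10 sites: p = 0.1, but the corrected distance is a bit larger.
print(round(jukes_cantor_distance("AAAAAAAAAA", "AAAAAAAAAC"), 4))  # 0.1073
```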

109
Q

Criteria for estimating the relative probabilities? (4)

A

Must have information on:

• Relative proportion of nucleotides (A:C:G:T).

• Relative frequency of transitions & transversions.

• Frequency of invariant sites.

• Differences in mutation rates between sites (model dependent).

110
Q

What make ML & BI special? (7)

A

• Probabilistic methods.

• Maximize the likelihood of observed sequence data under specified substitution model.

• Substitution models are more complex & flexible.

• Directly optimize model parameters during tree construction.

• Can handle wide range of evolutionary scenarios.

• Preferred for accurate model parameter estimation.

• Computationally intensive & time-consuming, especially for large datasets.

111
Q

ML “method”/equations? (3)

A

● You’d like to know:

P (tree|data)

  • i.e., what is the probability of the tree given the data.

● To do this, you need to consider ALL possible trees

= not feasible for >10 taxa.

● So, we calculate:

P (data|tree)

  • i.e., the probability of the data given a specific tree (which tree best suits the data).
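The number of possible unrooted, fully resolved trees grows as (2n - 5)!!, which is why scoring every possible tree quickly becomes infeasible. A short sketch:

```python
def number_of_unrooted_trees(n_taxa: int) -> int:
    """Count distinct unrooted, fully resolved (binary) trees for n taxa: (2n - 5)!!."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    for k in range(3, 2 * n_taxa - 4, 2):  # multiplies 3 x 5 x ... x (2n - 5)
        count *= k
    return count


for n in (4, 5, 10, 20):
    print(n, number_of_unrooted_trees(n))
```

Ten taxa already give 2,027,025 possible unrooted trees, so in practice ML programs search tree space heuristically instead of exhaustively.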
112
Q

ML equations to note? (2)

A

● P (tree|data)

● P (data|tree)

113
Q

ML attributes? (5)

A

• Powerful method used extensively in statistics.

• Prefers hypotheses (tree) with the highest probability given the observed data.

• Very computationally intensive for phylogenies.

• Corrects for multiple hits & reduces the danger of long branch attraction.

• Accurately reconstructs relationships in highly diverged or rapidly evolving groups.

114
Q

What must you be given for ML to produce the preferred tree? (2)

A

• Dataset (an alignment).
• A model of character evolution.

115
Q

What is the preferred tree in ML?

A

= the tree that has the highest probability of having generated the observed data.

116
Q

Probability of data given the tree definition?

A

= the likelihood; the best tree is the one that maximizes the likelihood of the data given the ¹tree topology, a ²set of branch lengths & an ³evolutionary model.
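A minimal sketch of P(data|tree) in the simplest possible case: two aligned sequences joined by a single branch of length t (expected substitutions per site) under the JC model. The topology is trivial here, so only the branch length is optimised, using a plain grid search (an assumption for clarity; real programs use numerical optimisers):

```python
import math


def jc_log_likelihood(seq1, seq2, t):
    """log P(data | tree) for two aligned sequences separated by one branch of
    length t (expected substitutions per site) under the Jukes-Cantor model."""
    e = math.exp(-4.0 * t / 3.0)
    p_same = 0.25 + 0.75 * e  # P(site unchanged along the branch)
    p_diff = 0.25 - 0.25 * e  # P(site changed to one specific other base)
    logl = 0.0
    for a, b in zip(seq1, seq2):
        # Equal base frequency (1/4) for the starting base, times the change probability.
        logl += math.log(0.25) + math.log(p_same if a == b else p_diff)
    return logl


def ml_branch_length(seq1, seq2, step=0.0001, max_t=2.0):
    """Pick the branch length with the highest likelihood on a simple grid."""
    grid = [step * i for i in range(1, int(max_t / step) + 1)]
    return max(grid, key=lambda t: jc_log_likelihood(seq1, seq2, t))


t_hat = ml_branch_length("AAAAAAAAAA", "AAAAAAAAAC")
print(round(t_hat, 3))  # 0.107
```

For one difference in 10 sites, the ML branch length comes out at about 0.107, the same value as the JC-corrected distance for these two sequences.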

117
Q

BI attributes? (4)

A

• Probability of the tree given the data.

• Estimates trees & obtains measures of uncertainty for each branch.

• Optimal hypothesis maximizes the posterior probability (by measuring all its uncertainties).

• Posterior probability for a hypothesis is proportional to likelihood multiplied by the prior probability of that hypothesis.

118
Q

Posterior probability?

A

= the probability of the hypothesis (tree) after seeing the data; this is the end goal.

119
Q

Prior probability?

A

= a scientist’s beliefs before having seen the data.

120
Q

Eg of prior probability?

A

Rolling a fair die: before rolling, you believe each face has a 1/6 chance of landing face up.

121
Q

How does ML optimize parameters?

A

Optimizes parameters using a numerical optimization algorithm.

122
Q

How does BI optimize parameters?

A

Optimizes parameters using MCMC sampling.

123
Q

Bayesian approach attributes? (9)

A

• Allows complex models of sequence evolution to be implemented.

• Doesn’t need bootstrapping to assess confidence in the nodes.

• Reports on posterior probabilities for branches.

• Feed it lots of prior information to get posterior probability.

• Specifies a model & prior distribution.

• Integrates the product of these quantities (likelihood × prior) over all possible parameter values to determine the posterior probability of each tree.

• Relies on MCMC to approximate the posterior probability distribution.

• Chain is constructed that moves through different trees & evolution models.

• Estimates probability that any particular tree is the true evolutionary tree for the observed data.
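A toy Metropolis sampler shows how an MCMC chain that moves between trees ends up visiting each tree in proportion to its posterior probability. It assumes a small, fixed set of candidate trees with hypothetical unnormalised posterior values (prior × likelihood); real BI programs also propose changes to branch lengths & model parameters and discard an initial burn-in:

```python
import math
import random


def metropolis_tree_sampler(log_posterior, steps=100_000, seed=0):
    """Toy Metropolis MCMC over a fixed set of (at least two) candidate trees.

    log_posterior: dict tree -> log(prior x likelihood), needed only up to a constant,
    so P[data] never has to be computed. Returns the fraction of steps spent on each tree.
    """
    rng = random.Random(seed)
    trees = list(log_posterior)
    current = trees[0]
    visits = {t: 0 for t in trees}
    for _ in range(steps):
        # Symmetric proposal: jump to a different tree chosen uniformly at random.
        proposal = rng.choice([t for t in trees if t != current])
        log_ratio = log_posterior[proposal] - log_posterior[current]
        # Accept with probability min(1, posterior ratio); otherwise stay put.
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            current = proposal
        visits[current] += 1
    return {t: n / steps for t, n in visits.items()}


# Hypothetical unnormalised posteriors (prior x likelihood) in the ratio 6 : 3 : 1.
log_post = {"T1": math.log(6), "T2": math.log(3), "T3": math.log(1)}
freqs = metropolis_tree_sampler(log_post)
print({t: round(f, 2) for t, f in freqs.items()})
```

With weights 6 : 3 : 1 the chain spends roughly 60%, 30% and 10% of its steps on T1, T2 and T3.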

124
Q

MCMC stands for?

A

Markov Chain Monte Carlo.

125
Q

BI equations? (3)

A

● Eqn 1

P [tree|data] = P [tree & data] / P [data]

● Eqn 2

P [tree & data] = P [tree] × P [data|tree]

● Eqn 3
• obtained by substituting Eqn 2 into Eqn 1.

P [tree|data] = ( P [tree] × P [data|tree] ) / P [data]
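Bayes’ theorem can be checked numerically for a finite set of candidate trees, where P [data] is the sum of the joint probabilities over all trees. A minimal sketch with hypothetical priors and likelihoods:

```python
def posterior_probabilities(prior, likelihood):
    """Bayes' theorem over a finite set of candidate trees.

    prior: tree -> P[tree];  likelihood: tree -> P[data|tree].
    """
    joint = {t: prior[t] * likelihood[t] for t in prior}  # P[tree & data]
    p_data = sum(joint.values())                          # P[data]
    return {t: joint[t] / p_data for t in joint}          # P[tree|data]


# Hypothetical numbers: equal (flat) priors over three trees.
prior = {"T1": 1 / 3, "T2": 1 / 3, "T3": 1 / 3}
likelihood = {"T1": 0.006, "T2": 0.003, "T3": 0.001}
post = posterior_probabilities(prior, likelihood)
print({t: round(p, 3) for t, p in post.items()})  # {'T1': 0.6, 'T2': 0.3, 'T3': 0.1}
```

With flat priors the posterior simply follows the likelihoods; an informative prior would shift these values.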

126
Q

P [tree|data] ?

A

= probability that the tree is true given the data (the posterior probability).

127
Q

P [tree & data] ?

A

= joint probability of the particular tree & data (alignment).

128
Q

Joint probability equation?

A

P [tree & data] = P [tree] × P [data|tree]

129
Q

Joint probability?

A

= the product of the probability of the tree & conditional probability of the data given that tree.

130
Q

Bayes theorem equation simply?

A

Posterior probability = (prior probability × likelihood) / P [data], i.e., posterior ∝ prior × likelihood.

131
Q

Joint probability equation simply?

A

Joint probability = prior probability × likelihood (the numerator of Bayes theorem).

132
Q

P [tree] ?

A

= the prior probability of the tree.

133
Q

P [data|tree] , i.e., conditional probability?

A

= the likelihood.