Lecture 12 - Phylogenetic Models Flashcards

Question 1

Q

is there a constant rate of mutation seen in all branches

Question 2

Q

are multiple changes seen at individual sites

Question 3

Q

what are possible factors in models of molecular evolution

Answer

A

different substitution preferences
different rates at different sequence positions
different rates on different branches of the tree

Question 4

Q

what are some measures of evolutionary distance

Answer

A

fractional alignment/p-distance
poisson distance

Question 5

Q

what is the calculation of fractional alignment/p-distance

Answer

A

p = D/L
- D is the number of aligned sites at which the two sequences differ
- L is the number of aligned sites compared
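
A minimal Python sketch of p = D/L for two pre-aligned sequences. The function name `p_distance` and the choice to skip gapped columns are assumptions made for illustration; they are not part of the card.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of compared sites at which two aligned sequences differ (p = D / L)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Assumption of this sketch: columns with a gap ('-') in either sequence are skipped.
    pairs = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != "-" and b != "-"
    ]
    if not pairs:
        raise ValueError("no comparable sites")
    differences = sum(1 for a, b in pairs if a != b)  # D
    return differences / len(pairs)                   # D / L


# Two of the ten aligned sites differ, so D = 2 and L = 10.
p = p_distance("ACGTACGTAC", "ACGTTCGTAA")
```

Because p only counts sites that currently differ, it underestimates the true number of changes once some sites have changed more than once.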

Question 6

Q

what does Poisson distance account for

Answer

A

multiple substitutions at individual sites

Question 7

Q

what is the probability of one of two aligned positions changing [Poisson distance]

Answer

A

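p = 1 - e^{-2rt}
- r is the substitution rate per site
- t is the time since the two sequences diverged (2t in total across both lineages)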

Question 8

Q

what is the calculation for the Poisson distance (d_p)

Answer

A

d_p = -ln(1-p)
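
The formula follows from the previous card: if p = 1 - e^{-2rt}, then solving for the exponent gives 2rt = -ln(1 - p), which is the expected number of substitutions per site. A small sketch, with `poisson_distance` as a hypothetical helper name:

```python
import math


def poisson_distance(p: float) -> float:
    """Poisson-corrected distance d_p = -ln(1 - p) for an observed p-distance."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    return -math.log(1.0 - p)


# Round trip with illustrative values of r and t (assumptions, not data):
r, t = 0.01, 10.0
p = 1.0 - math.exp(-2.0 * r * t)   # probability that a site differs
d_p = poisson_distance(p)          # recovers 2 * r * t
```

The correction is small for small p and grows quickly as p approaches 1, where repeated changes at the same site hide most of the substitutions.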

Question 9

Q

what is the goal of nucleotide models

Answer

A

to effectively represent nucleotide changes within a set of sequences

Question 10

Q

what are the assumptions of the Jukes-Cantor Model

Answer

A

all sites are independent
rates of evolution are the same at all sites
all substitutions are equally likely, and occur at rate α
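
Under these assumptions, the textbook Jukes-Cantor correction converts an observed p-distance into an estimated number of substitutions per site: d = -(3/4) ln(1 - (4/3) p). The formula is not stated on the card; the sketch below assumes that standard form, and `jc69_distance` is a hypothetical name.

```python
import math


def jc69_distance(p: float) -> float:
    """Jukes-Cantor distance d = -(3/4) * ln(1 - (4/3) * p)."""
    # With four equally likely bases, p saturates at 3/4, so the
    # correction is undefined for p >= 0.75.
    if not 0.0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


d = jc69_distance(0.2)  # slightly larger than the raw p-distance of 0.2
```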

Question 11

Q

what is the chance of a nucleotide not changing in the Jukes-Cantor model

Answer

A

1 - 3α per unit time, since each of the three possible substitutions occurs at rate α

Question 12

Q

what does the Kimura Two-Parameter (K2P) model account for

Answer

A

different rates for transitions (α) and transversions (β)
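
To make the two-parameter idea concrete, the sketch below counts transitions and transversions separately and applies the standard K2P distance, d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q), where P and Q are the observed proportions of transitions and transversions. The formula and the name `k2p_distance` are not from the card and are assumed here for illustration.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a: str, seq_b: str) -> float:
    """Kimura two-parameter distance between two aligned DNA sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = sites = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes (a choice of this sketch)
        sites += 1
        if a == b:
            continue
        # Transition: purine <-> purine or pyrimidine <-> pyrimidine.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    P = transitions / sites
    Q = transversions / sites
    # math.log raises ValueError if the sequences are too divergent.
    return -0.5 * math.log(1 - 2 * P - Q) - 0.25 * math.log(1 - 2 * Q)


# One transition (A<->G) and one transversion (C<->A) over ten sites.
d = k2p_distance("ACGTACGTAC", "GCGTACGTAA")
```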

Question 13

Q

which occur at a lower rate: transitions or transversions

Answer

A

transversions

Question 14

Q

what does the HKY85 model account for

Answer

A

unequal nucleotide composition (base frequencies), in addition to different rates for transitions and transversions

Question 15

Q

what does the Generalized Time-Reversible (GTR) model account for

Answer

A

nucleotide composition and a separate rate for each of the six reversible substitution types (two transitions, four transversions)

Question 16

Q

what can be used to correct for different rates at different positions

Answer

A

the Gamma distribution
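
One common way to use the Gamma distribution here is to draw a relative rate for each site from a Gamma with mean 1, so that a single shape parameter controls how unevenly rates are spread across positions. The sketch below is an illustration under that assumption; `sample_site_rates` and its parameters are hypothetical names, not part of the card.

```python
import random


def sample_site_rates(n_sites: int, shape: float, seed: int = 0) -> list[float]:
    """Draw one relative rate per site from Gamma(shape, scale = 1 / shape)."""
    if shape <= 0:
        raise ValueError("shape must be positive")
    rng = random.Random(seed)
    # random.gammavariate(alpha, beta) has mean alpha * beta,
    # so scale = 1 / shape keeps the average rate at 1.
    return [rng.gammavariate(shape, 1.0 / shape) for _ in range(n_sites)]


# Small shape: most sites evolve slowly and a few evolve very fast.
heterogeneous = sample_site_rates(20000, shape=0.5)
# Large shape: rates are close to 1 at every site.
nearly_uniform = sample_site_rates(20000, shape=50.0)
```

Each sampled rate multiplies the branch length at its site, so the substitution model is unchanged and only the speed of evolution varies by position.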

Question 17

Q

how are protein models commonly derived

Answer

A

using empirically derived substitution matrices

Question 18

Q

how does parameter number affect choosing a nucleotide model

Answer

A

too few -> inaccuracy, convergence upon the wrong tree
too many -> reduces statistical power, the ability to reject a hypothesis

Question 19

Q

what is overfitting

Answer

A

fitting a model with so many parameters that it captures the natural statistical variation in the data rather than the underlying signal

Question 20

Q

what are Modeltest and Prottest

Answer

A

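programs that compare candidate substitution models and identify the best-fitting one (Modeltest for nucleotide models, Prottest for protein models)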

Question 21

Q

what is the AIC (Akaike Information Criterion) and BIC (Bayesian Information Criterion) used for

Answer

A

measures of relative model quality that balance goodness of fit (likelihood) against the number of parameters

Question 22

Q

how does AIC/BIC inform which model to choose

Answer

A

the model with the lowest AIC/BIC is selected
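
The selection rule can be sketched with the standard definitions AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L, where k is the number of free parameters, n the number of sites, and ln L the maximized log-likelihood. The log-likelihood values and the alignment length below are made up for illustration, and the parameter counts cover substitution-model parameters only (branch lengths excluded).

```python
import math


def aic(log_likelihood: float, k: int) -> float:
    """Akaike Information Criterion: 2k - 2 ln L."""
    return 2 * k - 2 * log_likelihood


def bic(log_likelihood: float, k: int, n: int) -> float:
    """Bayesian Information Criterion: k ln(n) - 2 ln L."""
    return k * math.log(n) - 2 * log_likelihood


n_sites = 500  # hypothetical alignment length

# model name -> (hypothetical maximized log-likelihood, free parameters)
candidates = {
    "JC69": (-2050.0, 0),
    "K2P": (-2030.0, 1),
    "HKY85": (-2020.0, 4),
    "GTR": (-2018.5, 8),
}

aic_scores = {name: aic(lnl, k) for name, (lnl, k) in candidates.items()}
bic_scores = {name: bic(lnl, k, n_sites) for name, (lnl, k) in candidates.items()}

# The model with the lowest score is selected.
best_by_aic = min(aic_scores, key=aic_scores.get)
best_by_bic = min(bic_scores, key=bic_scores.get)
```

In this made-up example GTR has the highest likelihood, but its extra parameters are penalized enough that a simpler model scores lower. That is the trade-off between too few and too many parameters described on the earlier cards.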