Test 1 Flashcards

1
Q

What are the major approaches for studying adaptation?

A
  1. Controlled experiments: often short time span.
  2. Comparative analysis: natural experiments (not controlled)
2
Q

Experiment

A

An act or an operation carried out under controlled conditions in order to test, establish, or illustrate some known or suggested fact.
- Usually direct manipulation.
- used to test inferences

3
Q

Natural Experiment

A

Comparison of systems that occur naturally but differ in one or a few parameters of interest, with other variables mostly equivalent
- does not involve direct manipulation

4
Q

Inference

A

The act of proceeding logically from one or more premises or observations considered to be accurate to another premise whose truth is believed to follow from that of the former.
- involve induction or deduction

5
Q

Deduction

A

Inference in which the conclusion follows necessarily from the premises
- Proceeds from general knowledge to specific conclusion
- so long as the premises are true, the deduced conclusion will necessarily be true

6
Q

Induction

A

Inference about general conditions from specific observations.
- Conclusions drawn from induction are not necessarily true based on logic alone and must be tested.

7
Q

Hypothesis

A

A tentative statement about the natural world leading to deductions that can be tested.

8
Q

Prediction

A

A specific claim about what will be observed if a particular hypothesis is correct.

9
Q

Hypothetico-deductive method

A

An approach involving the generation of explicit predictions that can be tested by making new observations
- Common method in evolutionary biology and other sciences in which experimentation may be difficult

10
Q

Experimental Science

A
  • Uses deduction to form testable hypotheses
  • Tests hypotheses by direct experiment
11
Q

Discovery Science

A
  • Not trying to test a specific hypothesis. Often data first
  • Based on discovering things about the world
12
Q

Theoretical Science

A
  • Uses deduction from mathematics to draw conclusions about the world
  • Can be tested by making predictions about future observations
13
Q

Historical Science

A

Deals with unique past events that cannot be observed directly or replicated exactly
- Relies on hypothetico-deductive method

14
Q

Micro-evolution

A

Refers to changes in allele frequencies that occur over (short) time within a population

15
Q

Macro-evolution

A

Refers to changes that occur at or above the level of species, usually over long time spans (millions of years)

16
Q

What is an evolutionary tree?

A

A diagram showing the series of branchings and relatedness among species over evolutionary time.
- phylogeny

17
Q

What is phylogenetics?

A

The reconstruction and analysis of evolutionary trees; a distinct discipline within evolutionary biology.

18
Q

Topology =

A

Branching Pattern

19
Q

Internal nodes =

A

hypothetical ancestors

20
Q

Cladogram

A

Topology only, branch lengths have no meaning

21
Q

Phylogram

A

Branch lengths signify the amount of divergence

22
Q

Species that all go back to a shared ancestor have been evolving for the same amount of time: True or False

A

True

23
Q

Any reconstructed tree is a ____ about the relationships and patterns of branching

A

Hypothesis

24
Q

What is the general workflow of phylogeny reconstruction?

A
  1. Data collection and analysis
  2. Choose a method (parsimony, likelihood, etc…)
  3. Searching for the best tree(s) (=hypothesis)
  4. Evaluating/Interpreting the tree(s)
25
Guideline for Character Selection (3)
1. Inheritability 2. Independence 3. Homology
26
What are the major approaches to phylogeny reconstruction? (3)
1. Parsimony-based methods (Cladistics) 2. Model-based methods (maximum likelihood, Bayesian) 3. Distance-based methods (phenetics)
27
Phenetics (Phylogeny Reconstruction)
- Groupings based on overall similarity - Commonly used for genetic data - Analysis can be completed rapidly - Based on distance data
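A minimal sketch of the kind of distance data a phenetic analysis starts from, assuming aligned DNA sequences of equal length. The function names and the toy sequences are hypothetical, and the uncorrected p-distance is only one of several possible distance measures.

```python
def p_distance(seq_a, seq_b):
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return differences / len(seq_a)


def distance_matrix(alignment):
    """Pairwise p-distances for a dict of name -> aligned sequence."""
    names = list(alignment)
    return {(a, b): p_distance(alignment[a], alignment[b])
            for a in names for b in names}


# Toy alignment (made-up sequences, for illustration only).
toy = {"sp1": "ACGTACGT", "sp2": "ACGTACGA", "sp3": "TCGAACGA"}
matrix = distance_matrix(toy)
```

A clustering method would then group the taxa with the smallest overall distances, without asking whether the similarity is ancestral or derived.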
28
Cladistics (Phylogeny Reconstruction)
- Does not group all information together into a single measure of similarity - Only shared derived characters (synapomorphies) are informative to infer relationships - Unique, unshared characters (autapomorphies) and shared ancestral characters (symplesiomorphies) are uninformative and are ignored.
29
Apomorphy
Derived character state
30
Plesiomorphy
Ancestral character state
31
Symplesiomorphy
Character state shared by taxa, inherited unchanged from common ancestor
32
Synapomorphy
Character state shared by taxa and derived rather than representing the ancestral state.
33
Homoplasy
Similar characters that evolved independently in different taxa (e.g., convergent evolution)
34
Monophyletic Group = natural clade
A group composed of a collection of organisms, including the most recent common ancestor of all those organisms and all the descendants of that most recent common ancestor
35
Paraphyletic Group
A group composed of a collection of organisms, including the most recent common ancestor of all those organisms, but omits some of the descendants of the most recent common ancestor - Some descendants appear very different
36
Polyphyletic Group
A group composed of a collection of organisms in which the most recent common ancestor of all the included organisms is not included, usually because the common ancestor lacks the characteristics of the group. - puts distant relatives together to the exclusion of closer relatives
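The group definitions above can be checked mechanically on a rooted tree. The sketch below assumes a hypothetical encoding in which a tip is a string and an internal node is a tuple of subtrees; a group is monophyletic when some node has exactly that set of tips as its descendants.

```python
def tips(tree):
    """Set of tip names below a node (tip = str, internal node = tuple)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(tips(child) for child in tree))


def is_monophyletic(tree, group):
    """True if some node's descendants are exactly the members of group."""
    group = frozenset(group)
    if tips(tree) == group:
        return True
    if isinstance(tree, str):
        return False
    return any(is_monophyletic(child, group) for child in tree)


# Simplified amniote tree, used only as an illustration.
amniotes = (("lizard", ("crocodile", "bird")), "mammal")
archosaurs_ok = is_monophyletic(amniotes, {"crocodile", "bird"})
reptiles_ok = is_monophyletic(amniotes, {"lizard", "crocodile"})
```

Here the second group leaves out birds, a descendant of the common ancestor of lizards and crocodiles, so it is not monophyletic (it is paraphyletic).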
37
OTU: Operational Taxonomic Unit
Ex. species, tip of the tree
38
Outgroup Analysis
For a given character with two or more states within the ingroup, the state occurring in the outgroup is assumed to be the plesiomorphic (ancestral) state.
39
What are the 2 rules for outgroup selection?
1. Looking for the closest relative as first choice of outgroup - sister group 2. Always choose multiple outgroups (outgroups are also evolving, so need to compare more than one; some members of your outgroup could be part of the ingroup)
40
Principle of Parsimony
- When there is conflicting evidence, the hypothesis supported by the maximum number of characters is preferable, or - The tree that requires the minimum number of hypothetical evolutionary changes to explain the data is the preferred tree
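As a rough illustration of the second formulation, the minimum number of changes for one unordered character on a fixed tree can be counted with Fitch's algorithm. That algorithm is not named in these cards and is used here only as an example; the nested-tuple tree encoding and the function name are assumptions.

```python
def fitch_changes(tree, states):
    """Return (possible_states, changes) for one character on a rooted binary tree.

    tree   : a tip name (str) or a 2-tuple of subtrees (hypothetical encoding)
    states : dict mapping tip name -> observed character state
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch_changes(left, states)
    right_set, right_changes = fitch_changes(right, states)
    shared = left_set & right_set
    if shared:
        # The children agree on at least one state: no extra change needed.
        return shared, left_changes + right_changes
    # The children disagree: one change must occur on a branch below this node.
    return left_set | right_set, left_changes + right_changes + 1


states = {"A": "0", "B": "0", "C": "1", "D": "1"}
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))
steps_1 = fitch_changes(tree_1, states)[1]
steps_2 = fitch_changes(tree_2, states)[1]
```

For this single character, tree_1 needs fewer changes than tree_2, so parsimony prefers tree_1. A real analysis sums the steps over all characters.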
41
Occam's Razor
The simplest explanation is preferred
42
Exhaustive Search
Look at all possible trees, choose the shortest ones. Good for small number of taxa
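To see why an exhaustive search only suits a small number of taxa, the sketch below counts the unrooted, fully bifurcating trees for n taxa using the standard formula (2n - 5)!!, the product of the odd numbers from 3 to 2n - 5. The function name is an assumption.

```python
def count_unrooted_trees(n_taxa):
    """Number of distinct unrooted bifurcating trees for n_taxa labelled tips."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    for odd in range(3, 2 * n_taxa - 4, 2):  # odd numbers 3, 5, ..., 2n - 5
        count *= odd
    return count


tree_counts = {n: count_unrooted_trees(n) for n in (4, 5, 10)}
```

Ten taxa already give about two million candidate trees, which is why larger data sets rely on heuristic searches.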
43
Heuristic Search
Look at a manageable subset of all trees. Often uses a method for generating an initial tree and examines other trees in comparison with it. Repeated many times to converge on the MP (most parsimonious) trees.
44
Newick Format
Phylogenetic trees represented in a linear form with a series of parentheses enclosing names and separated by commas.
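A minimal sketch of writing a topology-only Newick string, assuming a hypothetical nested-tuple encoding of the tree (tip = string, internal node = tuple of subtrees). Branch lengths and node labels, which the format also supports, are left out.

```python
def to_newick(tree):
    """Serialize a nested-tuple tree to Newick, without the closing semicolon."""
    if isinstance(tree, str):
        return tree
    return "(" + ",".join(to_newick(child) for child in tree) + ")"


newick = to_newick((("human", "chimp"), "gorilla")) + ";"
# newick == "((human,chimp),gorilla);"
```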
45
Unordered Characters
Any character state can change to any other. Ex. DNA sequences
46
Ordered Characters
Characters with only 2 states are automatically ordered - Specific order, with transitions allowed only between adjacent states
47
Model-based Methods (Maximum Likelihood)
Incorporate models of molecular evolution in an attempt to account for unequal probabilities of different changes in evolutionary history and recover the "hidden" changes
48
The principle of likelihood
Maximum likelihood methods in phylogenetics evaluate a hypothesis (tree) in terms of the probability that the proposed evolutionary model and phylogenetic tree would give rise to the observed data set
- Requires: data, a model, and a tree
- Goal: find the topology and branch lengths of the tree that give the greatest probability of observing the DNA sequences in our data
49
Challenges of maximum likelihood
1. Unknown ancestral sequences 2. Many parameters to estimate
50
The principle of the likelihood method
Like parsimony, likelihood is a character-based (or discrete) method which carries out calculations on individual residues (nucleotides) of the sequences - This means that it is slow and computationally intensive
51
Substitution model parameters for phylogenetic hypothesis
1. Relative rate parameters 2. Base (nucleotide) frequency parameters 3. Parameters for rate heterogeneity
52
Tree parameters for phylogenetic hypothesis
1. topology 2. branch lengths
53
DNA sequence data is... (4)
1. Readily available 2. Many characters (genes to genomes) 3. Relatively easy to model 4. Can be modeled to retrieve time of divergence
54
DNA Substitution -Transitions
Interchanges of two-ring purines (A ↔ G), or of one-ring pyrimidines (C ↔ T)
55
DNA Substitution - Transversions
Interchanges of purine for pyrimidine, or vice versa
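The two substitution classes can be told apart with a small helper. This is only a sketch; the function name and return strings are assumptions.

```python
PURINES = {"A", "G"}       # two-ring bases
PYRIMIDINES = {"C", "T"}   # one-ring bases


def classify_substitution(base_a, base_b):
    """Classify a nucleotide change as a transition or a transversion."""
    a, b = base_a.upper(), base_b.upper()
    valid = PURINES | PYRIMIDINES
    if a not in valid or b not in valid:
        raise ValueError("expected one of A, C, G, T")
    if a == b:
        return "identical"
    same_class = (a in PURINES) == (b in PURINES)
    return "transition" if same_class else "transversion"


examples = {pair: classify_substitution(*pair) for pair in ("AG", "CT", "AC", "GT")}
```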
56
Protein coding genes - Synonymous
Silent; no change in the amino acid
57
Protein coding genes - Non-synonymous (missense)
Replacement of amino acid
58
Protein coding genes - Non-synonymous (nonsense)
Change of an amino acid codon into a stop codon
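The three categories above (synonymous, missense, nonsense) can be sketched with the standard genetic code. The function name and the extra "stop-loss" label for a stop codon changing into an amino acid are assumptions, not part of these cards.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_codon_change(old_codon, new_codon):
    """Classify a codon substitution by its effect on the protein."""
    old_aa = CODON_TABLE[old_codon.upper()]
    new_aa = CODON_TABLE[new_codon.upper()]
    if old_aa == new_aa:
        return "synonymous"
    if new_aa == "*":
        return "nonsense"
    if old_aa == "*":
        return "stop-loss"  # hypothetical label, outside the three card categories
    return "missense"


silent = classify_codon_change("CTT", "CTC")    # Leu -> Leu
missense = classify_codon_change("GAA", "GTA")  # Glu -> Val
nonsense = classify_codon_change("TGG", "TGA")  # Trp -> stop
```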
59
Sequence Substitution Models
Describes in probabilistic terms the process by which a nucleotide changes into another nucleotide over time.
60
Jukes-Cantor - Substitution Model (JC)
- Simplest of the substitution models Assumes: 1. Only one substitution rate for all bases 2. Equal base composition
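Under these two assumptions, the observed proportion of differing sites p can be corrected for multiple hits with the standard Jukes-Cantor distance, d = -(3/4) ln(1 - 4p/3). A small sketch, with a hypothetical function name:

```python
import math


def jukes_cantor_distance(p):
    """Estimated substitutions per site from the observed proportion of differences."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the correction to be defined")
    return -0.75 * math.log(1 - 4 * p / 3)


observed = 0.30
corrected = jukes_cantor_distance(observed)  # larger than the observed 0.30
```

The corrected distance always exceeds the observed one, and the gap widens as p approaches the saturation value of 0.75.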
61
Kimura's 2 Parameter Model - Substitution Model (K2P)
Assumes 1. Two substitution rates: transition is not equal to transversion 2. Equal base composition - Estimates the proportion of transitions and transversions
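A sketch of the corresponding distance correction, using the standard K2P formula d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q), where P and Q are the observed proportions of sites showing transitions and transversions. The function name and the example values are assumptions.

```python
import math


def k2p_distance(p_transitions, q_transversions):
    """Kimura two-parameter distance from observed transition/transversion proportions."""
    first = 1 - 2 * p_transitions - q_transversions
    second = 1 - 2 * q_transversions
    if first <= 0 or second <= 0:
        raise ValueError("differences too large for the K2P correction")
    return -0.5 * math.log(first) - 0.25 * math.log(second)


d = k2p_distance(0.10, 0.05)  # observed total difference is 0.15
```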
62
Hasegawa-Kishino-Yano - Substitution model (HKY85)
Estimates the transition/transversion ratio and allows for unequal base frequencies
63
General-Time-Reversible - Substitution Model
Each substitution type in the matrix is allowed to have its own rate
64
2 ways to accommodate rate heterogeneity across sites
1. Proportion of invariant sites 2. Site-specific rates and gamma distribution
65
Proportion of Invariant Sites
- Takes into account sites that can never vary - Assumes characters fall into two categories: 1. Not variable 2. Variable - All variable sites are assumed to be evolving at the same rate - A correction for the proportion of sites which are unable to change
66
Gamma Distribution
A correction for variable rates across sites. By adjusting the shape parameter, the gamma distribution accommodates varying degrees of rate variability
67
More complex models provide better fit with observed data, however...
- More parameters introduce larger variance - The statistical uncertainty about each parameter increases, because all parameters are estimated from the observed data and the amount of data remains constant
68
Model selection is a trade-off between ____ and ___. Therefore chosen model should...
Model selection is a trade-off between accuracy and complexity - therefore chosen model should have enough parameters to adequately explain the data - but no more
69
What is the problem with estimating genetic difference
Pairwise differences can become saturated over time. DNA substitution models were designed to reveal the “hidden” variation due to multiple changes at each site
70
Indel =
Insertions or deletions
71
How do you control the introduction of indels
Use gap penalty
72
Gap penalties
1 gap = x mismatches
- High for adding a gap (gap opening penalty)
- Low for extending a gap (gap extension penalty)
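A sketch of how the two penalties combine into an affine gap cost. The default values and the convention that the opening penalty covers the first gap position are assumptions; alignment programs differ on both.

```python
def affine_gap_cost(length, gap_open=10.0, gap_extend=0.5):
    """Cost of one gap of the given length under an affine penalty scheme."""
    if length <= 0:
        return 0.0
    # Opening is charged once; every further position pays the cheaper extension.
    return gap_open + gap_extend * (length - 1)


one_long_gap = affine_gap_cost(4)          # 10 + 3 * 0.5
four_short_gaps = 4 * affine_gap_cost(1)   # four separate openings
```

Because opening is expensive and extending is cheap, one long indel is favored over many scattered single-site gaps.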
73
Multiple Alignments - Clustal
1. Calculate all pairwise similarity scores 2. Create similarity matrix and use algorithm to cluster sequences 3. Create an alignment of clusters via a consensus method 4. Create progressive multiple alignment by sequentially aligning groups of sequences, according to their branching order in clustering
74
What are 3 things that may help you align your sequences
1. Tree - simultaneous estimation of alignment and tree is very slow; many programs use some kind of tree during the alignment process 2. Structure - secondary (loops, compensatory changes), intron-exon boundaries 3. Proteins and genetic code
75
Are there models for gaps
No - there is no current reliable model for gaps - operationally, gaps are often treated as unknown
76
Consistency Index
Measures how well an individual character fits on a phylogenetic tree - only applicable to parsimony analysis and most frequently reported when using morphological data
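For a single character, the consistency index is usually computed as CI = m / s, where m is the minimum number of steps the character could require on any tree (number of states minus one) and s is the number of steps it requires on the tree being evaluated. A sketch with hypothetical names:

```python
def consistency_index(min_steps, observed_steps):
    """CI of one character: 1.0 means no homoplasy on this tree."""
    if min_steps <= 0 or observed_steps < min_steps:
        raise ValueError("need 0 < min_steps <= observed_steps")
    return min_steps / observed_steps


perfect_fit = consistency_index(1, 1)  # binary character, one change
homoplastic = consistency_index(1, 2)  # same character needing two changes
```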
77
CI of a tree =
Average of all characters
78
Bootstrapping - Nodal Evaluation
Use the observed sample to estimate the population distribution - With the pseudo data you then find the best trees - Examine the appearance frequency of a particular node - The frequencies = bootstrap proportions - The higher the better
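The pseudo data are typically built by resampling alignment columns (characters) with replacement until the replicate has the original length. The sketch below covers only that resampling step; the function name and the toy alignment are assumptions, and tree building on each replicate is left out.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one pseudo-replicate: columns drawn with replacement, same length."""
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    # The same column indices are used for every taxon, so columns stay intact.
    return {name: "".join(seq[i] for i in columns)
            for name, seq in alignment.items()}


alignment = {"A": "ACGTAC", "B": "ACGTTC", "C": "TCGTAA"}  # made-up data
replicate = bootstrap_alignment(alignment, random.Random(42))
```

Repeating this many times, finding the best tree for each replicate, and counting how often a node appears gives the bootstrap proportion for that node.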
79
What bootstrap values are reliable?
Above 70%; 90% or 95% is preferred
80
Comparing alternative topologies - Templeton's test
- Nonparametric test - Uses the Wilcoxon signed rank test on the relative number of steps required by each character on each of the respective trees - Only applicable for parsimony
81
2 ways to measure differences between trees
1. Total difference in steps 2. The distribution of different steps in many different characters or concentrated on a few
82
Kishino and Hasegawa test (KH test)
- Extends the Templeton test to the likelihood method - parametric - Used to compare 2 trees - Examines the variance in site likelihood differences for the trees being compared
83
Shimodaira and Hasegawa test (SH test)
- Designed for multiple tests as a correction of KH test - Applied to the problem of comparing a priori specified hypotheses to the ML tree - test prone to type II error (false negative - too conservative)
84
Congruence Studies
Seek out common phylogenetic patterns in multiple, independent data sets
85
What are the three consensus techniques
1. Strict consensus 2. 50% majority-rule consensus 3. Adams consensus
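A sketch of the first two techniques, assuming each input tree is represented as a set of clades and each clade as a frozenset of tip names (a hypothetical encoding). Adams consensus works differently and is not covered here.

```python
from collections import Counter


def consensus_clades(trees, strict=False):
    """Clades found in every tree (strict) or in more than half of them."""
    counts = Counter(clade for tree in trees for clade in tree)
    n_trees = len(trees)
    if strict:
        return {clade for clade, n in counts.items() if n == n_trees}
    return {clade for clade, n in counts.items() if n > n_trees / 2}


trees = [
    {frozenset("AB"), frozenset("ABC")},
    {frozenset("AB"), frozenset("ABD")},
    {frozenset("AC"), frozenset("ABC")},
]
majority = consensus_clades(trees)
strict = consensus_clades(trees, strict=True)
```

In this toy set no clade appears in all three trees, so the strict consensus is unresolved, while the majority-rule consensus keeps the two clades found in two of three trees.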
86
What should you always do to evaluate nodal support
Bootstrap
87
When comparing alternative topologies, which test should you use for parsimony?
Templeton's test
88
When comparing alternative topologies, which test should you use for likelihood?
SH test
89
When you have multiple equally parsimonious trees you may...
report a consensus tree
90
Bayesian Method
Statistical inference methodology. - Main feature is to use probability distributions (from genomic sequence data) to describe the uncertainty of all unknowns, including the model parameters.
91
Bayesian Inferences are based upon....
Posterior probability of a hypothesis
92
Bayes' formula calculates...
posterior probability
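A sketch of Bayes' formula over a small, discrete set of candidate trees: posterior = prior x likelihood / sum over all trees of (prior x likelihood). The tree names and the numbers are made up for illustration.

```python
def posterior_probabilities(priors, likelihoods):
    """Apply Bayes' formula to a dict of hypotheses (here: candidate trees)."""
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(joint.values())  # probability of the data over all hypotheses
    return {h: value / total for h, value in joint.items()}


priors = {"tree1": 1 / 3, "tree2": 1 / 3, "tree3": 1 / 3}   # flat prior
likelihoods = {"tree1": 0.02, "tree2": 0.01, "tree3": 0.01}  # P(data | tree)
posterior = posterior_probabilities(priors, likelihoods)
```

With real data the denominator sums over an enormous number of trees and parameter values, which is why it is approximated by sampling rather than computed directly.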
93
On a posterior probability density curve, the top of the curve represents...
Maximum likelihood estimate
94
On a posterior probability density curve, the area under the curve represents
Posterior probability
95
The more parameters you add to Bayes' formula the ___ it is to compute
Harder
96
In equations the tree is always the...
Hypothesis
97
Markov Chain Monte Carlo (MCMC) - Steps of the algorithm (MHG)
1. New tree is proposed by stochastically perturbing the current tree 2. The acceptance ratio is calculated 3. A number between 0 and 1 is randomly chosen 4. If the acceptance ratio is greater than the random number, the new tree is accepted, and is then subjected to further perturbation 5. Repeat thousands or millions of times
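The steps above can be sketched with a toy sampler. Instead of perturbing real trees, this example assumes three hypothetical trees with known relative posterior weights and a symmetric proposal (jump to one of the other trees at random), so the acceptance ratio reduces to weight(new) / weight(current). The names, weights, and burn-in length are all assumptions.

```python
import random
from collections import Counter


def metropolis(weights, n_steps, rng, start, burn_in=0):
    """Return the fraction of post-burn-in steps spent on each state."""
    states = list(weights)
    current = start
    visits = Counter()
    for step in range(n_steps):
        # 1. Propose a new state by randomly perturbing the current one.
        proposal = rng.choice([s for s in states if s != current])
        # 2. Acceptance ratio (symmetric proposal, so only the weights matter).
        ratio = weights[proposal] / weights[current]
        # 3-4. Draw a number in [0, 1); accept if the ratio exceeds it.
        if ratio > rng.random():
            current = proposal
        if step >= burn_in:
            visits[current] += 1
    kept = n_steps - burn_in
    return {s: visits[s] / kept for s in states}


weights = {"tree1": 0.6, "tree2": 0.3, "tree3": 0.1}
frequencies = metropolis(weights, 20000, random.Random(0), "tree3", burn_in=1000)
```

The visit frequencies should land close to the weights, which is the sense in which time spent on a tree approximates its posterior probability.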
98
What is burn-in (MCMC)
- Where the robot (the chain) starts affects the estimates - It takes time for the influence of the starting point to fade, so samples from this initial period are discarded - Runs are repeated multiple times because there is variation between them
99
What is a potential problem with MCMC and what is the solution
"Poor mixing" - the chain spends long periods of time stuck in one place. Solution: run more than one chain simultaneously
100
The proportion of time that is spent visiting one particular tree in MCMC is...
A valid approximation to the posterior probability
101
What are the advantages of MCMC (5)
1. Can accommodate uncertainty in phylogenetic reconstruction 2. Intuitive measure of support 3. Can employ complex models of substitution 4. Integrate over "nuisance" parameters 5. Computationally efficient
102
What makes a character parsimony-informative?
There are at least two states of the character, and each of those states occurs in at least two taxa.
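A small check of this rule for one alignment column, assuming the column is given as a string with one character state per taxon. The function name is hypothetical.

```python
from collections import Counter


def is_parsimony_informative(column):
    """True if at least two states each occur in at least two taxa."""
    counts = Counter(column)
    repeated_states = sum(1 for n in counts.values() if n >= 2)
    return repeated_states >= 2


checks = {col: is_parsimony_informative(col)
          for col in ("AAGG", "AAAG", "AAAA", "ACGT")}
```

A column such as AAAG is variable but uninformative: the single G is an autapomorphy and costs one step on every tree.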