Lecture 3 DA Flashcards

1
Q

What is the purpose of an alignment (4)?

A
  • Find homologues
  • See if a homologue has an associated protein structure
  • Determine function
  • Determine evolutionary relationships
2
Q

What is the purpose of multiple sequence alignments (2)?

A
  • Elucidate functional information within protein sequences
  • Perform evolutionary analysis

3
Q

In a pairwise alignment, what does positioning of two amino acids at the same point imply? What is the best way to determine the implication, and can this always be done?

A

They perform the same role in homologous proteins.
Can be determined by performing structural alignment, where the amino acids are aligned in 3D. May not be possible if no 3D structure is available.

4
Q

What can be done to increase the accuracy of pairwise alignments? What does this reveal? Which kind of alignment is usually worthwhile to perform?

A

Adding more sequences to an alignment. Can reveal patterns that aren’t obvious in a pairwise alignment. Worthwhile to perform MSA.

5
Q

Describe how an MSA is done (6).

A
  • Find sequences of interest (e.g. with BLAST).
  • Prune if necessary.
  • Run multiple alignment algorithm.
  • Inspect output.
  • Remove disruptive sequences, and repeat.
  • Identify key conserved amino acids.
6
Q

How is scoring done in MSA (3)?

A
  • Alignment is arranged so a maximum number of characters in each sequence is matched.
  • Scores are assigned according to the sum of pairs.
  • Each column is scored by summing all possible matches, mismatches, and gaps.
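
As a rough illustration of sum-of-pairs scoring, the sketch below scores every pair of characters in each column and adds the column scores together. The match, mismatch, and gap values are hypothetical and chosen only for the example; real tools typically use a substitution matrix and more elaborate gap penalties.

```python
from itertools import combinations

# Hypothetical scoring scheme, chosen only for illustration.
MATCH, MISMATCH, GAP = 1, -1, -2

def pair_score(a, b):
    """Score one pair of characters taken from the same column."""
    if a == "-" and b == "-":
        return 0  # assumption: a gap aligned with a gap is ignored
    if a == "-" or b == "-":
        return GAP
    return MATCH if a == b else MISMATCH

def column_score(column):
    """Sum the scores of all possible pairs in one alignment column."""
    return sum(pair_score(a, b) for a, b in combinations(column, 2))

def sum_of_pairs(alignment):
    """Total score: the sum of the column scores across the alignment."""
    return sum(column_score(col) for col in zip(*alignment))

alignment = ["ACG-", "ACGT", "A-GT"]
print(sum_of_pairs(alignment))
```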
7
Q

What are some disadvantages of MSA (2)?

A
  • MSA is computationally expensive.
  • Finding the optimal alignment is difficult for as few as 4 sequences; for more than 20 it is effectively impossible.

8
Q

How does the CLUSTAL algorithm work (4)?

A
  • Begin by pairwise alignment.
  • Build a phylogenetic guide tree.
  • Take most closely related sequences and align them, forming a consensus.
  • Repeat with the next most closely related sequence.
9
Q

What is an advantage and disadvantage of the CLUSTAL algorithm?

A

Advantage
- Results in near optimal alignment.
Disadvantage
- If an early error is made, it is preserved.

10
Q

What is a major problem with the CLUSTAL algorithm?

A

Selecting an appropriate substitution matrix for alignments that consist of both divergent and closely related sequences.

11
Q

From an MSA, what do highly conserved residues suggest?

A

Correspondence to an active site.

12
Q

From an MSA, where are insertions and deletions often found?

A

In surface loops.

13
Q

From an MSA, what do conserved patterns of hydrophobicity with a spacing of 2 indicate? What about 4?

A

A spacing of 2 indicates a β-sheet.

A spacing of 4 indicates an α-helix.

14
Q

What is the terminal node?

A

End point.

15
Q

What is an internal node?

A

Hypothetical ancestor.

16
Q

What is a root?

A

Common ancestor.

17
Q

What are the three basic assumptions of cladistics?

A
  • Any group of organisms is related by descent from a common ancestor.
  • Bifurcating pattern of cladogenesis.
  • Changes in characteristics occur in lineages over time.
18
Q

Can phylogenetic trees be rooted, or unrooted? What sequence is better to use? What can be done with this sequence, and what is it called?

A

They can be rooted, or unrooted.
It is better to use a sequence that is more divergent from all other sequences. The tree can be rooted at this sequence, called an outgroup.

19
Q

What is characteristic of fully resolved trees?

A

They are binary, with no more than 2 branches at each node.

20
Q

What is a problem with increases in taxa?

A

The number of possible trees grows explosively (faster than exponentially) as taxa are added, making it hard to know whether the tree drawn is the true tree.
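
To get a feel for how quickly the count grows, the sketch below uses the standard counting formulas for fully resolved (binary) trees: (2n-5)!! unrooted trees and (2n-3)!! rooted trees for n taxa. The function names are made up for this example.

```python
def double_factorial(k):
    """Product k * (k - 2) * (k - 4) * ... down to 1 (returns 1 for k <= 1)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result

def unrooted_trees(n):
    """Number of distinct unrooted binary trees for n >= 3 taxa."""
    if n < 3:
        raise ValueError("need at least 3 taxa")
    return double_factorial(2 * n - 5)

def rooted_trees(n):
    """Number of distinct rooted binary trees for n >= 2 taxa."""
    if n < 2:
        raise ValueError("need at least 2 taxa")
    return double_factorial(2 * n - 3)

for n in (4, 5, 10):
    print(n, unrooted_trees(n), rooted_trees(n))
```

With only 10 taxa there are already more than two million unrooted topologies.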

21
Q

What are three ways to build trees?

A
  • Distance matrix method
  • Maximum parsimony method
  • Maximum likelihood method
22
Q

Describe the distance matrix method.

A

It is a clustering method, for a set of species. You choose the two most similar, add a node, then add the next most similar.
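
A minimal sketch of this kind of clustering is shown below. It assumes average linkage (as in UPGMA) for the distance from a new node to the remaining clusters, which the notes do not specify, and the distances are invented for the example.

```python
from itertools import combinations

def build_tree(distances):
    """Cluster taxa by repeatedly joining the two closest clusters.

    distances maps frozenset({a, b}) to the distance between taxa a and b.
    Returns nested tuples describing the order in which nodes were added.
    """
    taxa = sorted({taxon for pair in distances for taxon in pair})
    sizes = {taxon: 1 for taxon in taxa}  # cluster -> number of taxa in it
    dist = dict(distances)

    while len(sizes) > 1:
        # Choose the two most similar clusters.
        a, b = min(combinations(sizes, 2), key=lambda p: dist[frozenset(p)])
        merged = (a, b)  # the new internal node
        # Assumption: distance to the new node is the size-weighted mean.
        for other in sizes:
            if other in (a, b):
                continue
            total = (dist[frozenset((a, other))] * sizes[a]
                     + dist[frozenset((b, other))] * sizes[b])
            dist[frozenset((merged, other))] = total / (sizes[a] + sizes[b])
        sizes[merged] = sizes.pop(a) + sizes.pop(b)
    return next(iter(sizes))

# Hypothetical pairwise distances, for illustration only.
example = {
    frozenset(("human", "chimp")): 1,
    frozenset(("human", "mouse")): 6,
    frozenset(("chimp", "mouse")): 6,
    frozenset(("human", "chicken")): 10,
    frozenset(("chimp", "chicken")): 10,
    frozenset(("mouse", "chicken")): 10,
}
print(build_tree(example))
```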

23
Q

What is a disadvantage of the distance matrix method?

A

Very simplistic, makes assumptions that may not be true.

24
Q

Describe neighbour joining. What does it assume?

A

The tree is built stepwise. It does not assume that all taxa have the same evolutionary rate; it can detect rate differences and correct for them.

25
Q

What kind of tree does neighbour joining create?

A

It creates an unrooted tree.

If a rooted tree is needed, an outgroup must be determined.

26
Q

What is the character based method based on?

A

Sequences rather than distances.

27
Q

How are trees constructed in character based methods?

A

By searching all tree topologies, looking for the one that requires the fewest changes.
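
One common way to count the changes a given topology requires is Fitch's algorithm; the notes do not name a specific algorithm, so treat its use here as an assumption. The sketch scores the three possible groupings of four made-up sequences and keeps the one needing the fewest changes.

```python
def fitch(tree, states):
    """Return (possible ancestral states, changes) for one alignment site.

    tree is either a sequence name (leaf) or a pair (left, right).
    states maps each sequence name to its character at this site.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_changes = fitch(tree[0], states)
    right_set, right_changes = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_changes + right_changes
    # No shared state: one change is needed at this node.
    return left_set | right_set, left_changes + right_changes + 1

def parsimony_score(tree, alignment):
    """Total number of changes the tree requires, summed over all sites."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(length)
    )

# Hypothetical sequences and candidate topologies, for illustration only.
alignment = {"seq1": "AAG", "seq2": "AAG", "seq3": "CAT", "seq4": "CAT"}
trees = [
    (("seq1", "seq2"), ("seq3", "seq4")),
    (("seq1", "seq3"), ("seq2", "seq4")),
    (("seq1", "seq4"), ("seq2", "seq3")),
]
best = min(trees, key=lambda t: parsimony_score(t, alignment))
print(best, parsimony_score(best, alignment))
```

With four sequences all three groupings can be checked directly; with more taxa the number of topologies makes an exhaustive search impractical.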

28
Q

What is a disadvantage of character based methods?

A

Computationally expensive.

29
Q

What principle is the character based method based on?

A

Occam’s razor: the simplest explanation is taken to be the correct one.

30
Q

How does maximum likelihood work?

A

Searches for the evolutionary model that has the highest likelihood of producing the observed data.
Every position in the alignment is scored, then summed.

31
Q

What method does maximum likelihood use, and what is its disadvantage?

A

Uses a substitution method incorporating probability. Computationally expensive.

32
Q

What is bootstrapping?

A

Way of statistically validating the tree.

33
Q

How does bootstrapping work?

A

The data are slightly perturbed and resampled (usually 1,000 times), and the number of times each node appears is reported.
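
In the usual phylogenetic bootstrap, the perturbation is resampling alignment columns with replacement. The sketch below assumes that scheme. `clades_of` is a hypothetical, caller-supplied function that builds a tree from an alignment and returns its clades; it stands in for whichever tree-building method is used.

```python
import random

def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement, keeping the same length."""
    length = len(alignment[0])
    columns = [rng.randrange(length) for _ in range(length)]
    return ["".join(seq[i] for i in columns) for seq in alignment]

def bootstrap_support(alignment, clades_of, replicates=1000, seed=0):
    """Count how often each clade appears across the bootstrap replicates.

    clades_of(alignment) is hypothetical: it should return the clades of the
    tree built from that alignment as a set of frozensets of row indices.
    """
    rng = random.Random(seed)
    counts = {}
    for _ in range(replicates):
        for clade in clades_of(bootstrap_replicate(alignment, rng)):
            counts[clade] = counts.get(clade, 0) + 1
    return counts
```

A clade that is recovered in most replicates is well supported by the data; one that appears only occasionally is not.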

34
Q

What statistics given by bootstrapping give a 95% probability of a node’s correct position?

A

Statistics are hard to define.

If a node is present in 700-1000 of the 1,000 resamplings, there is roughly a 95% probability that it is in the correct position.

35
Q

What are α-globin sequence analyses used for?

A

To estimate divergence times, serving as a molecular clock.

36
Q

What is used as a calibration point for divergence times?

A

The split between humans and cows, at 80 mya.

37
Q

What is assumed in divergence times? Is it true?

A

A linear relationship between time and mutation accumulation. This is not entirely true, but it works well enough for models and forms the basis of the molecular clock.
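
Under the linear assumption, a calibration point fixes the rate and any other distance can then be converted to a time. The sketch below uses the 80 mya human-cow split from the notes; the distance values are hypothetical, and the factor of 2 reflects that changes accumulate along both lineages after a split.

```python
CALIBRATION_MYA = 80.0  # human-cow split, taken from the notes

def clock_rate(calibration_distance, calibration_mya=CALIBRATION_MYA):
    """Substitutions per site per million years along a single lineage."""
    return calibration_distance / (2 * calibration_mya)

def divergence_time(distance, rate):
    """Estimated time since two sequences diverged, in millions of years."""
    return distance / (2 * rate)

# Hypothetical distances (substitutions per site), for illustration only.
rate = clock_rate(0.16)             # human vs cow
print(divergence_time(0.08, rate))  # another pair, half as distant
```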