Lecture 3 - RH Flashcards
Why do we perform alignments?
To find homologues
To see whether a homologue has a known protein structure
To determine function
To determine evolutionary relationships
How much information can be obtained from aa sequence, pair of homologues, and many aligned sequences?
aa sequence = very little
pair of homologues = some info
many aligned sequences = a lot of information
Why are MSAs performed?
To elucidate functional info within protein sequence
To perform evolutionary analysis
In a pairwise alignment what does the positioning of 2 aa’s at the same point imply?
That they have the same role in homologous proteins
What happens when more sequences are added to an alignment?
The alignment becomes more accurate and reveals features that are not obvious in a pairwise alignment
How is an MSA performed?
Find sequences you wish to align
Prune them if necessary
Run multi-alignment algorithm
Inspect the output
Remove disruptive sequences and repeat
Identify conserved aa’s
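The last step above (identifying conserved aa's) can be sketched roughly as follows. This is only an illustration: the function name, the toy alignment, the `threshold` parameter, and the choice to skip columns containing gaps are all assumptions, not part of the lecture.

```python
from collections import Counter


def conserved_columns(alignment, threshold=1.0):
    """Return (column index, residue) pairs for conserved columns.

    A column counts as conserved when its most common residue reaches
    `threshold` (a fraction of sequences). Columns with gaps are skipped,
    which is an assumption made for this sketch.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    hits = []
    for i in range(length):
        column = [seq[i] for seq in alignment]
        if "-" in column:
            continue
        residue, count = Counter(column).most_common(1)[0]
        if count / len(column) >= threshold:
            hits.append((i, residue))
    return hits


# Hypothetical toy alignment, one string per aligned sequence.
toy_alignment = [
    "MKV-LHG",
    "MRVALHG",
    "MKVAIHG",
]
print(conserved_columns(toy_alignment))
```

Lowering `threshold` (for example to 0.6) would also report columns that are mostly, but not fully, conserved.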
How are alignments scored?
Alignment arranged so a maximum number of characters in each sequence are matched
Scoring is done according to the Sum of Pairs (SP)
Each column is scored by summing all possible matches, mismatches, and gaps.
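A minimal sketch of Sum of Pairs scoring, assuming a very simple scheme: the values of `MATCH`, `MISMATCH`, and `GAP` are hypothetical, and a real scorer would normally take residue pair scores from a substitution matrix instead.

```python
from itertools import combinations

# Hypothetical scores chosen only for illustration.
MATCH, MISMATCH, GAP = 1, -1, -2


def pair_score(a, b):
    """Score one pair of characters taken from the same column."""
    if a == "-" and b == "-":
        return 0  # assumption: two aligned gaps cost nothing
    if a == "-" or b == "-":
        return GAP
    return MATCH if a == b else MISMATCH


def column_score(column):
    """Sum the scores of every possible pair within one column."""
    return sum(pair_score(a, b) for a, b in combinations(column, 2))


def sum_of_pairs(alignment):
    """Add up the column scores across the whole alignment."""
    return sum(column_score(col) for col in zip(*alignment))


print(sum_of_pairs(["ACG", "A-G", "ATG"]))  # 3 + (-5) + 3 = 1
```

With three sequences each column contributes three pairs, so a fully matched column scores 3 and the middle column (C, gap, T) scores -5 under these assumed values.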
What does the sum of pairs result indicate?
2^n, where n is the final Sum of Pairs score. This number is how many times more likely it is that the sequences match because of homology rather than pure chance (this reading assumes the scores are log-odds values in bits).
What can MSAs tell us?
Most highly conserved residues may correspond to the active site.
Insertions and deletions are probably in surface loops
Conserved pattern of hydrophobicity with spacing 2 may indicate beta sheet
Spacing 4 may indicate alpha helix
What are some ways to construct phylogenetic trees?
Distance matrix methods
Maximum parsimony methods
Maximum likelihood methods
What is Neighbour Joining?
Similar to UPGMA using stepwise build
Corrects for unequal evolutionary rates between lineages
Creates an unrooted tree
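One stepwise join in Neighbour Joining can be sketched as below. The Q criterion and branch length formulas are the standard ones for the method; the function name and the five-taxon distance matrix are made up for illustration. A full implementation would then replace the joined pair with a new node and repeat.

```python
def nj_pick_pair(names, dist):
    """Pick the next pair to join and the branch length of each member.

    Q(i, j) = (n - 2) * d(i, j) - sum_k d(i, k) - sum_k d(j, k)
    The pair with the smallest Q is joined. Requires n > 2 taxa.
    """
    n = len(names)
    totals = [sum(row) for row in dist]  # total distance of each taxon
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * dist[i][j] - totals[i] - totals[j]
            if best is None or q < best[0]:
                best = (q, i, j)
    _, i, j = best
    # Unequal branch lengths are where the rate correction shows up.
    limb_i = 0.5 * dist[i][j] + (totals[i] - totals[j]) / (2 * (n - 2))
    limb_j = dist[i][j] - limb_i
    return names[i], names[j], limb_i, limb_j


# Hypothetical symmetric distance matrix for taxa a to e.
names = ["a", "b", "c", "d", "e"]
dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(nj_pick_pair(names, dist))
```

Note that d and e are the closest pair by raw distance (3), yet a and b are joined first: unlike UPGMA, the Q criterion takes each taxon's distance to all the others into account.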
What is character based maximum parsimony? What is the problem with this method?
Based on sequence characters rather than distances
Trees are constructed by searching all possible tree topologies and looking for the one that requires the fewest changes
Problem: computationally expensive, so not all sites are used
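To make "fewest changes" concrete, here is a sketch that counts the minimum number of changes for one given topology using Fitch's algorithm (a standard counting method, not named in the lecture). The tree encoding as nested tuples, the taxon names, and the toy sequences are assumptions; searching over all topologies is not shown.

```python
def fitch(tree, states):
    """Return (possible ancestral states, changes) for one site.

    `tree` is a taxon name (leaf) or a (left, right) tuple of subtrees.
    `states` maps each taxon name to its character at this site.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # No shared state: one change is needed on this part of the tree.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Total number of changes over all sites for one topology."""
    length = len(next(iter(alignment.values())))
    total = 0
    for i in range(length):
        site = {taxon: seq[i] for taxon, seq in alignment.items()}
        total += fitch(tree, site)[1]
    return total


# Hypothetical toy data: four taxa, three sites.
alignment = {"A": "AAG", "B": "AAG", "C": "ACT", "D": "ACT"}
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))
print(parsimony_score(tree_1, alignment), parsimony_score(tree_2, alignment))
```

Here `tree_1` needs 2 changes and `tree_2` needs 4, so parsimony prefers `tree_1`. The first site is identical in all taxa and adds nothing to either score.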
What is maximum likelihood based on?
Searches for the evolutionary model that has the highest likelihood of producing the observed data
Uses a substitution model that incorporates probability.
In practice every position in alignment is scored based on probability.
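As a rough illustration of scoring every position by probability, the sketch below uses the Jukes-Cantor (JC69) substitution model on just two sequences and finds the distance with the highest likelihood by a simple grid search. The choice of JC69, the two-sequence setting, the function names, and the toy sequences are all assumptions; real maximum likelihood programs do this over whole trees.

```python
import math


def jc69_log_likelihood(seq1, seq2, d):
    """Log-likelihood of two aligned DNA sequences at distance d (d > 0)."""
    e = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * e  # probability a base is unchanged
    p_diff = 0.25 - 0.25 * e  # probability of one specific other base
    log_l = 0.0
    for a, b in zip(seq1, seq2):
        # 0.25 is the assumed equal frequency of the starting base.
        log_l += math.log(0.25 * (p_same if a == b else p_diff))
    return log_l


def ml_distance(seq1, seq2, step=0.001, max_d=3.0):
    """Grid search for the distance that maximises the likelihood."""
    grid = [step * k for k in range(1, int(max_d / step) + 1)]
    return max(grid, key=lambda d: jc69_log_likelihood(seq1, seq2, d))


# Hypothetical toy sequences differing at 2 of 10 positions.
seq_a = "ACGTACGTAC"
seq_b = "ACGTACGTTT"
print(ml_distance(seq_a, seq_b))
```

For this model the best distance also has a closed form, -3/4 * ln(1 - 4p/3), where p is the fraction of differing sites, so the grid result can be checked against it.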
What is bootstrapping and what is it used for?
A method of statistically validating a tree.
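The alignment columns are resampled with replacement (generally 1000 times), so each replicate differs slightly from the original, and a tree is built from each replicate.
The statistics are hard to define, but if a node is present in 700 of 1000 replicates, that is taken to mean roughly a 95% probability that it is in the correct position.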
*Low bootstrap numbers are bad news
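The resampling step can be sketched as follows, assuming the usual approach of drawing alignment columns with replacement. The function names and the `has_node` callback are hypothetical: `has_node` stands in for "build a tree from this replicate and check whether the node of interest is present", which is not implemented here.

```python
import random


def bootstrap_replicates(alignment, n_replicates=1000, seed=0):
    """Yield resampled alignments built from randomly drawn columns."""
    rng = random.Random(seed)
    length = len(alignment[0])
    for _ in range(n_replicates):
        # Draw column indices with replacement; some repeat, some drop out.
        cols = [rng.randrange(length) for _ in range(length)]
        yield ["".join(seq[c] for c in cols) for seq in alignment]


def bootstrap_support(alignment, has_node, n_replicates=1000, seed=0):
    """Fraction of replicates in which the node of interest appears."""
    hits = sum(
        1
        for replicate in bootstrap_replicates(alignment, n_replicates, seed)
        if has_node(replicate)
    )
    return hits / n_replicates


# Hypothetical toy alignment; the lambda is a placeholder tree check.
toy = ["ACGT", "ACGA", "TCGA"]
print(bootstrap_support(toy, lambda replicate: True, n_replicates=10))
```

A node found in 700 of 1000 replicates would give a support value of 0.7 from `bootstrap_support`.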