Introduction to sequence analysis Flashcards
What is a global sequence alignment?
- sequence comparison along the entire length of the two sequences being aligned
- best for highly similar sequences of similar length
- as the degree of sequence similarity declines, global alignment methods tend to miss important biological relationships
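A minimal sketch of how a global alignment can be scored, using the Needleman-Wunsch dynamic programming approach. The function name and the match, mismatch, and gap values are illustrative assumptions, not values from these cards.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best score for aligning a and b along their entire length."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Leading gaps are penalised, because both sequences must be covered
    # from the first to the last position.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap in b
            left = score[i][j - 1] + gap    # gap in a
            score[i][j] = max(diag, up, left)
    # The answer is in the bottom-right cell: the end of both sequences.
    return score[-1][-1]


print(global_alignment_score("GATTACA", "GATACA"))
```

Because the score is read from the last cell, every position of both sequences contributes, which is why unrelated flanking regions drag a global score down.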
Character-based methods for building a phylogenetic tree
- ML (maximum likelihood)
- MP (maximum parsimony)
Define the terms homology and homologs
Homology - The presence of a similar feature because of descent from a common ancestor (defines evolutionary relationships)
Homologs - Genes that share a common ancestor. Genes either are or are not homologous (homology is not measured in degrees)
3 widely used MSA programs
- ClustalW
- T-COFFEE
- MAFFT
Why do we perform sequence analysis?
- discover function
- study evolution
- find crucial features
- identify cause of disease
What is a speciation event?
Speciation is a lineage-splitting event that produces two or more separate species
What does the p-value of an alignment mean?
The probability of getting an alignment with this score or better by chance. For a significant alignment it should be close to zero.
What is a taxon?
A set or group of organisms, most often species, at the end of a branch
What is taken into consideration when scoring two aligned sequences?
- the identity of the aligned amino acids (AAs)
- the chemical properties of the AAs
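A small sketch of how both criteria can feed into a score. The grouping of amino acids and every score value here are simplified, hypothetical choices for illustration, not PAM or BLOSUM values.

```python
# Hypothetical, simplified grouping of the 20 amino acids by chemical property.
GROUPS = {
    "hydrophobic": set("AVLIMFWC"),
    "polar": set("STNQYGP"),
    "positive": set("KRH"),
    "negative": set("DE"),
}


def pair_score(x, y, identical=4, same_group=1, different=-2):
    """Score one aligned pair: identity first, then shared chemical group."""
    if x == y:
        return identical
    for members in GROUPS.values():
        if x in members and y in members:
            return same_group
    return different


def alignment_score(s1, s2, gap=-4):
    """Sum the pair scores over two aligned sequences ('-' marks a gap)."""
    if len(s1) != len(s2):
        raise ValueError("aligned sequences must have equal length")
    total = 0
    for x, y in zip(s1, s2):
        if x == "-" or y == "-":
            total += gap
        else:
            total += pair_score(x, y)
    return total


print(alignment_score("KLD", "RLE"))
```

Real substitution matrices derive their values from observed substitutions rather than from hand-made groups, but they reward conservative replacements in the same spirit.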
Why do we do multiple alignments?
- to identify conserved regions, patterns, and domains
- to identify new members of protein families
- to predict structure and function of new protein sequences
- as a preliminary step in molecular evolution analysis using phylogenetic methods for constructing phylogenetic trees
What is a clade?
A group of organisms that includes an ancestor and all descendants of that ancestor, irrespective of how closely they may or may not resemble one another
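As an illustration, a rooted tree can be written as nested tuples, and a set of taxa forms a clade when it matches exactly the leaves below some node. The tree, taxon names, and function names below are assumptions made for this sketch.

```python
def leaves(tree):
    """Return the set of leaf names below a node (a leaf is a plain string)."""
    if isinstance(tree, str):
        return {tree}
    result = set()
    for child in tree:
        result |= leaves(child)
    return result


def is_clade(tree, taxa):
    """True if taxa are exactly all descendants of some node in the rooted tree."""
    taxa = set(taxa)
    if leaves(tree) == taxa:
        return True
    if isinstance(tree, str):
        return False
    return any(is_clade(child, taxa) for child in tree)


# Hypothetical example tree: ((human, chimp), mouse), chicken
example_tree = ((("human", "chimp"), "mouse"), "chicken")
print(is_clade(example_tree, {"human", "chimp"}))
print(is_clade(example_tree, {"human", "mouse"}))
```

The second query fails because the smallest node containing human and mouse also contains chimp, so the group would leave out a descendant.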
What kind of alignment does BLAST perform?
A local sequence alignment
Distance-based methods for building a phylogenetic tree
- UPGMA (unweighted pair group method with arithmetic mean)
- NJ (neighbor joining)
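A compact sketch of UPGMA, the simpler of the two methods: repeatedly join the two closest clusters and average their distances to everything else. The input format (a dict keyed by name pairs) and the nested-tuple output are assumptions of this sketch, and branch lengths are left out.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster taxa with UPGMA and return the tree as nested tuples.

    names: list of taxon labels.
    dist:  dict mapping (name_a, name_b) to a distance, one entry per pair.
    """
    # Each cluster id maps to (subtree, number of leaves in it).
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    index = {name: i for i, name in enumerate(names)}
    d = {frozenset((index[a], index[b])): v for (a, b), v in dist.items()}
    next_id = len(names)
    while len(clusters) > 1:
        # Join the pair of clusters with the smallest distance.
        i, j = min(combinations(sorted(clusters), 2), key=lambda p: d[frozenset(p)])
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)
        # Distance from the merged cluster to each remaining cluster is the
        # average weighted by cluster size (the "arithmetic mean" in UPGMA).
        for k in clusters:
            d[frozenset((next_id, k))] = (
                n_i * d[frozenset((i, k))] + n_j * d[frozenset((j, k))]
            ) / (n_i + n_j)
        clusters[next_id] = ((tree_i, tree_j), n_i + n_j)
        next_id += 1
    (tree, _), = clusters.values()
    return tree


# Hypothetical distances for four taxa.
example = {("A", "B"): 2, ("C", "D"): 4, ("A", "C"): 8,
           ("A", "D"): 8, ("B", "C"): 8, ("B", "D"): 8}
print(upgma(["A", "B", "C", "D"], example))
```

Neighbor joining follows the same join-and-update loop but picks pairs with a corrected criterion, so it does not assume equal evolutionary rates across lineages.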
I want to compare sequences of different lengths. Which alignment should I use?
Local sequence alignment
My two sequences are really similar and also have about the same length. Which alignment should I use?
Global sequence alignment
What does the E-value of an alignment tell us?
It tells us how many alignments (or how many sequences) with this score or better we expect to find by chance. The lower the E-value, the more significant the hit.
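The E-value and p-value cards can be tied together with the Karlin-Altschul statistics used by BLAST. As a sketch, with m the query length, n the database length, S the raw alignment score, and K and λ parameters that depend on the scoring system (these symbols are introduced here, not taken from the cards):

```latex
E = K \, m \, n \, e^{-\lambda S}, \qquad P = 1 - e^{-E}
```

For small E the two are nearly equal (P ≈ E), so an E-value of 0.001 corresponds to a p-value of about 0.001. The E-value also grows with database size, which is why the same score is less significant in a larger database.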
What is a local sequence alignment?
- sequence comparisons intended to find the most similar regions in the two sequences being aligned
- regions outside the area of local alignment are excluded
- more than one local alignment could be generated for any two sequences being compared
- best for sequences that share some similarity, or for sequences of different lengths
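A minimal sketch of local alignment scoring with the Smith-Waterman approach. The function name and scoring values are illustrative assumptions. The key difference from a global alignment is that cell scores never drop below zero and the best cell anywhere in the table is reported.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best score of any pair of regions from a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Flooring at 0 lets an alignment start fresh at any position,
            # so regions outside the similar stretch are excluded.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# The shared region GATTACA is found despite unrelated flanks.
print(local_alignment_score("TTTTGATTACATTTT", "CCCGATTACACCC"))
```

Only the best score is returned here; recovering the aligned regions themselves, or several local alignments, would need a traceback over the full table.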
What does “character-based methods” of phylogenetic tree building mean?
Use the aligned characters, such as DNA or protein sequences, directly during tree inference
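To make "directly" concrete, here is a sketch of how maximum parsimony can score a given tree from the aligned characters themselves, using Fitch's counting rule on a rooted binary tree. The tree encoding (nested pairs), the toy alignment, and the function names are assumptions of this sketch. A full MP search would also compare many candidate trees.

```python
def fitch(tree, states):
    """Return (possible ancestral states, minimum changes) for one column."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_states, left_changes = fitch(left, states)
    right_states, right_changes = fitch(right, states)
    common = left_states & right_states
    if common:
        # The children can agree on a state: no extra change at this node.
        return common, left_changes + right_changes
    # The children disagree: one substitution is needed on a branch below.
    return left_states | right_states, left_changes + right_changes + 1


def parsimony_score(tree, alignment):
    """Sum the minimum number of changes over all aligned columns."""
    length = len(next(iter(alignment.values())))
    total = 0
    for col in range(length):
        column = {name: seq[col] for name, seq in alignment.items()}
        total += fitch(tree, column)[1]
    return total


# Hypothetical aligned sequences for four taxa.
aln = {"A": "AAG", "B": "AAG", "C": "ACG", "D": "ACT"}
print(parsimony_score((("A", "B"), ("C", "D")), aln))
print(parsimony_score((("A", "C"), ("B", "D")), aln))
```

The tree that needs fewer changes is preferred under parsimony. No distance matrix is computed at any point, which is the contrast with distance-based methods.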
Definition of homology
the presence of a similar feature because of descent from a common ancestor (defines evolutionary relationships)
Definition of orthologs
- Homologs in different species that perform the same function
- most likely have the same domains and 3D structure
- can be used to predict gene function in novel genes
Which are the two major scoring systems used for proteins?
- PAM/Dayhoff
- BLOSUM series (Blocks Substitution Matrix)
What does “distance-based methods” of phylogenetic tree building mean?
Transform the sequence data into pairwise distances (dissimilarities), and then use the matrix during tree building
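A sketch of the transformation step, using the p-distance (the fraction of compared positions that differ) as one simple choice of dissimilarity. The function names and the decision to skip gapped columns are assumptions. Corrected distances such as Jukes-Cantor are common alternatives.

```python
def p_distance(s1, s2):
    """Fraction of differing positions between two aligned sequences."""
    if len(s1) != len(s2):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differences = 0
    for x, y in zip(s1, s2):
        if x == "-" or y == "-":
            continue  # columns with a gap are skipped in this sketch
        compared += 1
        if x != y:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differences / compared


def distance_matrix(aligned):
    """Map every (name_a, name_b) pair to its p-distance."""
    names = list(aligned)
    return {(a, b): p_distance(aligned[a], aligned[b]) for a in names for b in names}


# Hypothetical aligned sequences.
aln = {"seq1": "ACGT", "seq2": "ACGA", "seq3": "AC-T"}
matrix = distance_matrix(aln)
print(matrix[("seq1", "seq2")])
```

Once the matrix exists, the tree-building step works from these numbers alone and no longer looks at the individual characters.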
Definition of paralogs
- Homologs in the same species that most likely have different functions
- “homologs that diverged after gene duplication”
- provide insight into “evolutionary innovation”
- Gene A duplicates to gene A’ -> there is no evolutionary pressure on A’ because a gene already performs the task, so A’ can take on new functions
Why do you perform phylogeny at the end of an MSA?
A phylogenetic tree helps represent evolutionary relationships between genes, proteins, and organisms that are believed to share common ancestry
What does BLAST stand for?
Basic Local Alignment Search Tool
State “true” or “false” for each alternative
a) Human and mouse histamine H1 receptor (H1R) are orthologs
b) Human HRH1 and human HRH2 are paralogs
c) Orthologs and paralogs are homologs
All are true
What is a root?
A basal node
What is a node?
A common ancestor / the point at which branches connect
What are branches?
Lines within the tree
What is a cluster?
A cluster is a group of things placed together on the basis of their resemblance to one another, irrespective of their evolutionary relationship