Introduction to sequence analysis Flashcards

1
Q

What is a global sequence alignment?

A
  • sequence comparison along the entire length of the two sequences being aligned
  • best for highly similar sequences of similar length
  • as the degree of sequence similarity declines, global alignment methods tend to miss important biological relationships
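
A minimal sketch of how a global alignment can be scored end to end, using the classic Needleman-Wunsch dynamic programming recurrence. The function name and the match, mismatch, and gap values are illustrative assumptions, not values from these cards; real tools use substitution matrices and more refined gap penalties.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch score with a linear gap penalty (toy scoring values)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # Global alignment: leading gaps are penalised, so the first row and
    # column grow by one gap penalty per position.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap in b
            left = score[i][j - 1] + gap    # gap in a
            score[i][j] = max(diag, up, left)

    # The score covers the entire length of both sequences.
    return score[-1][-1]


print(global_alignment_score("GATTACA", "GATTACA"))
print(global_alignment_score("ACGT", "AGT"))
```

Because every position of both sequences must be accounted for, a large length difference is paid for entirely in gap penalties, which is one reason global alignment suits sequences of similar length.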
1
Q

Character-based methods for building a phylogenetic tree

A
  • ML (maximum likelihood)
  • MP (maximum parsimony)
2
Q

Define the terms homology and homologs

A

Homology - The presence of a similar feature because of descent from a common ancestor (defines evolutionary relationships)

Homologs - Genes that share a common ancestor; genes either are or are not homologous (homology is not measured in degrees)

3
Q

3 widely used MSA programs

A
  • ClustalW
  • T-COFFEE
  • MAFFT
4
Q

Why do we perform sequence analysis?

A
  • discover function
  • study evolution
  • find crucial features
  • identify cause of disease
4
Q

What is a speciation event?

A

Speciation is a lineage-splitting event that produces two or more separate species

5
Q

What does the p-value of an alignment mean?

A

The probability of obtaining an alignment with this score or better by chance. For a significant alignment it should be close to zero.

6
Q

What is a taxon?

A

A set or group of organisms, most often species, at the end of a branch

8
Q

What is taken into consideration when scoring two aligned sequences?

A
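  • the kind of amino acid (AA) at each aligned position
  • the chemical properties of the paired AAs (chemically similar residues score higher than dissimilar ones)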
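
A toy sketch of how both criteria can enter a score: identical residues score highest, residues from the same chemical class score lower, and residues from different classes are penalised. The class grouping and the values 2, 1, and -1 are hypothetical, chosen only for illustration; they are not taken from PAM or BLOSUM, and the sketch assumes an ungapped alignment.

```python
# Hypothetical chemical classes, for illustration only.
CLASSES = {
    "hydrophobic": set("AVLIMFW"),
    "polar": set("STNQYC"),
    "positive": set("KRH"),
    "negative": set("DE"),
    "special": set("GP"),
}
AA_CLASS = {aa: name for name, members in CLASSES.items() for aa in members}


def pair_score(x, y):
    """Score one aligned pair of amino acids with toy values."""
    if x == y:
        return 2                      # same amino acid
    if AA_CLASS[x] == AA_CLASS[y]:
        return 1                      # different amino acid, similar chemistry
    return -1                         # dissimilar chemistry


def alignment_score(a, b):
    """Sum the pair scores of two ungapped, equal-length aligned sequences."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    return sum(pair_score(x, y) for x, y in zip(a, b))


print(alignment_score("LKD", "IKE"))
```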
8
Q

Why do we do multiple alignments?

A
  • to identify conserved regions, patterns, and domains
  • to identify new members of protein families
  • to predict structure and function of new protein sequences
  • as a preliminary step in molecular evolution analysis using phylogenetic methods for constructing phylogenetic trees
8
Q

What is a clade?

A

A group of organisms that includes an ancestor and all descendants of that ancestor, irrespective of how closely they may or may not resemble one another

10
Q

What kind of alignment does BLAST perform?

A

A local sequence alignment

11
Q

Distance-based methods for building a phylogenetic tree

A
  • UPGMA (unweighted pair group method with arithmetic mean)
  • NJ (neighbor joining)
12
Q

I want to compare sequences of different lengths. Which alignment should I use?

A

Local sequence alignment

12
Q

My two sequences are really similar and also have about the same length. Which alignment should I use?

A

Global sequence alignment

13
Q

What does the E-value of an alignment tell us?

A

The number of alignments (hits) with this score or better that we expect to find by chance in a database search of this size.
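
The E-value and the p-value are closely related. Assuming the number of chance hits follows a Poisson distribution, as in the statistics used by BLAST, the probability of seeing at least one chance hit is p = 1 - exp(-E). A small sketch (the function name is an illustrative choice):

```python
import math


def evalue_to_pvalue(e_value):
    """Probability of at least one chance hit, given the expected number E."""
    # -expm1(-E) equals 1 - exp(-E) but stays accurate for very small E.
    return -math.expm1(-e_value)


for e in (10, 1, 0.01, 1e-5):
    print(e, evalue_to_pvalue(e))
```

For small values the two are nearly identical, so an E-value of 0.01 corresponds to a p-value of roughly 0.01, while large E-values push the p-value towards 1.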

14
Q

What is a local sequence alignment?

A
  • sequence comparisons intended to find the most similar regions in the two sequences being aligned
  • regions outside the area of local alignment are excluded
  • more than one local alignment could be generated for any two sequences being compared
  • best for sequences that share some similarity, or for sequences of different lengths
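
A minimal sketch of local alignment scoring with the Smith-Waterman recurrence. It differs from the global case in two ways: cell scores are never allowed to drop below zero, and the result is the best cell anywhere in the table rather than the last one. The function name and the scoring values are illustrative assumptions.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman score of the best local alignment (toy scoring values)."""
    best = 0
    prev = [0] * (len(b) + 1)          # previous row of the DP table
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Flooring at 0 lets an alignment restart anywhere, so regions
            # outside the best-matching stretch do not count against it.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# The shared GATTACA core is found despite unrelated flanking regions.
print(local_alignment_score("CCCCGATTACACCCC", "GGGATTACAGGG"))
```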
15
Q

What does “character-based methods” of phylogenetic tree building mean?

A

Use the aligned characters, such as DNA or protein sequences, directly during tree inference
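
As one illustration of working directly on the characters, maximum parsimony (MP) prefers the tree that needs the fewest changes to explain the alignment. The sketch below counts those changes for a given rooted binary tree with Fitch's algorithm. The nested-tuple tree encoding, the function names, and the example sequences are all hypothetical.

```python
def site_changes(tree, states):
    """Fitch pass for one alignment column: return (possible states, changes)."""
    if isinstance(tree, str):              # leaf: look up its character
        return {states[tree]}, 0
    left, right = tree
    left_set, left_changes = site_changes(left, states)
    right_set, right_changes = site_changes(right, states)
    shared = left_set & right_set
    if shared:                             # children agree: no extra change
        return shared, left_changes + right_changes
    # Children disagree: keep both options and count one change.
    return left_set | right_set, left_changes + right_changes + 1


def parsimony_score(tree, aligned):
    """Total number of changes over all columns of an alignment."""
    length = len(next(iter(aligned.values())))
    return sum(
        site_changes(tree, {name: seq[i] for name, seq in aligned.items()})[1]
        for i in range(length)
    )


aligned = {"a": "AAG", "b": "AAG", "c": "ACG", "d": "ACT"}
print(parsimony_score((("a", "b"), ("c", "d")), aligned))
print(parsimony_score((("a", "c"), ("b", "d")), aligned))
```

The tree with the lower score is the more parsimonious explanation of the same aligned characters.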

17
Q

Definition of homology

A

the presence of a similar feature because of descent from a common ancestor (defines evolutionary relationships)

18
Q

Definition of orthologs

A

  • homologs in different species that perform the same function
  • most likely have the same domains and 3D structure
  • can be used to predict gene function in novel genes

20
Q

Which are the two major scoring systems used for proteins?

A
  • PAM/Dayhoff
  • BLOSUM series (Blocks Substitution Matrix)
21
Q

What does “distance-based methods” of phylogenetic tree building mean?

A

Transform the sequence data into pairwise distances (dissimilarities), then use the resulting distance matrix during tree building
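
A minimal sketch of the first step, turning aligned sequences into a pairwise distance matrix. It uses the p-distance (the fraction of compared positions that differ), which is only one of several possible distance measures; the function names and example sequences are illustrative assumptions.

```python
def p_distance(a, b):
    """Fraction of differing positions, skipping columns with a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        raise ValueError("no ungapped positions to compare")
    return sum(x != y for x, y in compared) / len(compared)


def distance_matrix(aligned):
    """Map every pair of sequence names to its p-distance."""
    names = list(aligned)
    return {(p, q): p_distance(aligned[p], aligned[q]) for p in names for q in names}


aligned = {"s1": "ACGT", "s2": "ACGA", "s3": "AC-T"}
matrix = distance_matrix(aligned)
print(matrix[("s1", "s2")])
```

A method such as UPGMA or NJ would then work from this matrix alone, without looking at the sequences again.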

23
Q

Definition of paralogs

A

  • homologs in the same species that most likely have different functions
  • “homologs that diverged after gene duplication”
  • provide insight into “evolutionary innovation”
  • gene A duplicates to give gene A’: there is no evolutionary pressure on A’ because there is already a gene performing the task, so A’ can take on new functions

24
Q

Why do you perform phylogeny at the end of an MSA?

A

A phylogenetic tree helps represent the evolutionary relationships between genes, proteins, and organisms that are believed to share common ancestry

25
Q

What does BLAST stand for?

A

Basic Local Alignment Search Tool

26
Q

State “true” or “false” for each alternative

a) Human and mouse histamine H1 receptor (HRH1) genes are orthologs
b) Human HRH1 and Human HRH2 are paralogs
c) Orthologs and paralogs are homologs

A

All are true

27
Q

What is a root?

A

The basal node of the tree: the most recent common ancestor of all taxa in the tree

28
Q

What is a node?

A

A common ancestor / the point at which branches connect

29
Q

What are branches?

A

The lines within the tree that connect nodes to each other and to the taxa

30
Q

What is a cluster?

A

A cluster is a group of things placed together on the basis of their resemblance to one another, irrespective of their evolutionary relationship