Chapter 5-Bioinformatics Flashcards

1
Q

Explain the molecular clock hypothesis

A
  1. Mutations accumulate randomly over time at a relatively constant rate of N base pairs per year
  2. Most mutations are neutral, so natural selection neither favours nor disfavours them
  3. When an individual has progeny, these mutations are passed on to the next generation.
  4. Genetic difference between any two species is proportional to the time since these species last shared a common ancestor
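Point 4 can be turned into a rough calculation. If both lineages accumulate substitutions at the same rate r per site per year, the expected difference per site after t years is about d = 2rt, so t is about d / (2r). The sketch below assumes a strict clock and ignores repeated substitutions at the same site; the function name and the example numbers are illustrative, not measured values.

```python
def divergence_time_years(differences, aligned_sites, rate_per_site_per_year):
    """Estimate time since the last common ancestor under a strict molecular clock.

    Assumes both lineages mutate at the same constant rate and that no site
    has changed more than once (no multiple-hit correction).
    """
    if aligned_sites <= 0 or rate_per_site_per_year <= 0:
        raise ValueError("aligned_sites and rate_per_site_per_year must be positive")
    per_site_difference = differences / aligned_sites
    # Differences build up along BOTH lineages, hence the factor of 2.
    return per_site_difference / (2 * rate_per_site_per_year)


# Hypothetical input: 12 differences over 1000 aligned sites, rate 1e-9 per site per year.
t = divergence_time_years(12, 1000, 1e-9)
print(f"{t:.2e} years")
```

Doubling the observed difference doubles the estimate, which is the proportionality stated in point 4.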
2
Q

What is the mutation rate in humans?

A

0.5-1 mutations per gigabase pair (10^9 bp) per generation

3
Q

Define homologues

A

Homologues are sequences that share a common evolutionary history (originated from a common ancestor)

4
Q

Define orthologues

A

Homologous sequences in different species that arose from a common ancestral gene during speciation and usually retain the same function

5
Q

Define paralogs

A

Homologous sequences within a single species that arose by gene duplication

6
Q

Define sequence alignment

A

Process of lining up 2 or more sequences to achieve maximal levels of identity for the purpose of assessing the degree of similarity and the possibility of homology

7
Q

Functions of pairwise sequence alignment

A
  1. Identify matching and differing amino acids or nucleotides at equivalent positions
  2. Identify domains or motifs shared between proteins
  3. Evaluate if two proteins or genes have a similar sequence
8
Q

Difference between pairwise and multiple sequence alignment

A

Pairwise: compares 2 sequences
Multiple: compares 3 or more sequences

9
Q

What is local sequence alignment

A
  • the optimal similarity score is determined over subregions of the 2 sequences rather than over their entire length
  • useful in identifying protein domains
10
Q

What is global sequence alignment

A
  • optimal similarity score is determined over the entire length of the 2 sequences
  • useful in assessing whether genes or proteins are homologous
11
Q

How to calculate alignment score

A

Alignment score = sum of match scores + sum of mismatch scores + sum of gap penalties

Mismatch scores and gap penalties have negative values.
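A short sketch of the sum above, scoring an alignment that has already been written out with `-` for gaps. The scoring values (+1, -1, -2) and the simple per-position gap penalty are assumptions chosen for illustration; real schemes often use a substitution matrix and separate gap-open and gap-extend penalties.

```python
def alignment_score(row_a, row_b, match=1, mismatch=-1, gap=-2):
    """Score two aligned rows of equal length, column by column."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    score = 0
    for a, b in zip(row_a, row_b):
        if a == "-" or b == "-":
            score += gap        # gap penalty (negative)
        elif a == b:
            score += match      # match score (positive)
        else:
            score += mismatch   # mismatch score (negative)
    return score


# 5 matches, 1 mismatch, 1 gap: 5*1 + 1*(-1) + 1*(-2) = 2
print(alignment_score("GATTACA", "GA-TGCA"))
```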

12
Q

What is Needleman-Wunsch alignment?

A

Global sequence alignment
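As a minimal sketch of the dynamic programming behind Needleman-Wunsch: every cell of the table holds the best score for aligning two prefixes, and the traceback from the bottom-right corner recovers one optimal end-to-end alignment. The scoring values and the linear gap penalty are assumptions for illustration.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for one optimal global alignment."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    # First row and column: a prefix aligned entirely against gaps.
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                              score[i - 1][j] + gap,      # gap in b
                              score[i][j - 1] + gap)      # gap in a
    # Traceback from the bottom-right corner to the top-left corner.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


print(needleman_wunsch("ACGT", "AGT"))
```

The table has (n+1) x (m+1) cells, so time and memory grow with the product of the two sequence lengths.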

13
Q

What is the Smith-Waterman alignment?

A

Local alignment
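A minimal sketch of Smith-Waterman for comparison with the global case. The differences are that cell scores are never allowed to drop below 0, and the traceback starts at the highest-scoring cell and stops at the first 0. The scoring values are assumptions for illustration.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for one best local alignment."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(0,                          # start a fresh local alignment
                              score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)
    # Traceback from the best cell until a cell with score 0 is reached.
    top, bottom = [], []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


# Only the shared ACGT core is reported; the unrelated flanks are ignored.
print(smith_waterman("TTTACGTAAA", "GGGACGTCCC"))
```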

14
Q

What do the values assigned to PAM mean?

A
  • the value n in PAM-n is the number of accepted amino acid substitutions per 100 amino acids
15
Q

PAM family extrapolation formula

A

PAM-n = (PAM-1)^n, i.e. the PAM-1 mutation probability matrix multiplied by itself n times
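The power in this formula is a matrix power applied to the mutation probability matrix (the scoring matrix is derived from it afterwards as log-odds). A sketch with a made-up 3-letter alphabet follows; the real PAM-1 matrix is 20 x 20, and the numbers in `toy_pam1` are invented purely for illustration.

```python
def mat_mul(x, y):
    """Multiply two square matrices given as lists of rows."""
    size = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def mat_pow(matrix, n):
    """Raise a square matrix to the n-th power by repeated multiplication."""
    size = len(matrix)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, matrix)
    return result


# Hypothetical "PAM-1": each residue has a 1% chance of being replaced per step.
toy_pam1 = [[0.990, 0.005, 0.005],
            [0.005, 0.990, 0.005],
            [0.005, 0.005, 0.990]]

toy_pam250 = mat_pow(toy_pam1, 250)
for row in toy_pam250:
    print([round(p, 3) for p in row])
```

Each row still sums to 1 after extrapolation, and the diagonal shrinks as n grows: the chance that a residue is unchanged falls with evolutionary distance.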

16
Q

PAM vs BLOSUM

A
  1. PAM matrices are based on global alignments of closely related proteins while BLOSUM matrices are based on local alignments
  2. All PAM matrices are extrapolated from PAM-1 while all BLOSUM matrices are based on observed alignments
17
Q

Arrange the following in increasing order of divergence

BLOSUM-62, BLOSUM-45, BLOSUM-80

A

BLOSUM 80, BLOSUM 62, BLOSUM 45

18
Q

How should you choose the appropriate aa substitution matrix?

A
  • base the choice on the expected degree of sequence divergence between the sequences being compared
19
Q

Which algorithms are exact?

A

Needleman-Wunsch and Smith-Waterman

- optimal, but time-consuming to compute

20
Q

Which alignment algorithms are heuristic methods?

A

Pairwise: BLAST
Multiple alignment: all widely used multiple alignment programmes e.g. MAFFT
- based on shortcuts and assumptions rather than an exhaustive search
- not optimal, but fast

21
Q

Function of BLAST

A
  • a heuristic local alignment programme
  • allows scientists to compare new sequences with databases containing many characterised genes
  • results can provide valuable functional and evolutionary info
22
Q

How to interpret the E-value in BLAST

A
  • the number of alignments with a score at least this high that would be expected by chance in a database of that size
  • a smaller E indicates a more significant alignment
  • E < 0.02: sequences are probably homologous
  • 0.02 < E < 1: homology cannot be ruled out
  • E > 1: the match is probably due to chance
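For context on why a smaller E is more significant, a sketch of the Karlin-Altschul relationship E = K * m * n * exp(-lambda * S), where S is the raw alignment score, m the query length and n the database length. K and lambda depend on the scoring matrix and gap penalties; the default values below are placeholders of a typical magnitude and should be treated as assumptions, not as BLAST's settings for any particular search.

```python
import math


def expected_chance_hits(raw_score, query_len, database_len, lam=0.267, k=0.041):
    """Expected number of chance alignments scoring at least raw_score.

    lam and k are statistical parameters of the scoring system; the defaults
    here are illustrative placeholders only.
    """
    return k * query_len * database_len * math.exp(-lam * raw_score)


# E falls exponentially as the score rises and grows linearly with database size.
for s in (40, 60, 80):
    print(s, f"{expected_chance_hits(s, 300, 1_000_000):.3g}")
```

Because E scales with database size, the same alignment score can be significant against a small database and insignificant against a large one.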
23
Q

Importance of multiple sequence alignment

A
  1. Links proteins at the amino acid level, making it possible to identify conserved features, predict functionally important residues and identify positions that affect the biochemical properties of the protein
  2. Basis for phylogenetic tree construction
  3. Allows generalisation of sequences to profiles
24
Q

Uses of phylogeny

A
  1. Identifying orthologs and paralogs in gene families
  2. Discover population history and species history
  3. Estimate divergence times assuming molecular clock
25
Q

Problem with multiple sequence alignment and how it can be overcome

A
  • the high dimensionality of the solution space makes the optimal solution hard to compute with dynamic programming
  • solution: use iterative algorithms
26
Q

Steps in CLUSTAL

A
  1. Calculate pairwise alignments for every sequence pair and calculate sequence distances
  2. Using distances, estimate a guide tree using neighbour joining
  3. Progressively align the sequences in the order given by the guide tree, starting with the most closely related
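Step 1 ends with one distance per sequence pair. A simple choice is the fraction of compared positions that differ (the p-distance). The sketch below assumes the rows are already aligned to a common length with `-` for gaps, so it skips the pairwise alignment itself; the sequence names and toy data are hypothetical, and CLUSTAL's actual distance measure may differ.

```python
from itertools import combinations


def p_distance(row_a, row_b):
    """Fraction of ungapped aligned positions at which two rows differ."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    compared = differing = 0
    for a, b in zip(row_a, row_b):
        if a == "-" or b == "-":
            continue  # ignore columns where either row has a gap
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return differing / compared


def distance_matrix(aligned):
    """Map each unordered pair of sequence names to its p-distance."""
    return {frozenset((x, y)): p_distance(aligned[x], aligned[y])
            for x, y in combinations(aligned, 2)}


toy = {"seq1": "ACGTACGT", "seq2": "ACGTACGA", "seq3": "AC-TTCGA"}
for pair, dist in distance_matrix(toy).items():
    print(sorted(pair), round(dist, 3))
```

A table of this shape is the input a tree-building method needs for step 2.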
27
Q

Steps in neighbour-joining

A
  1. Find the 2 nodes whose distance to each other is smallest relative to their distances to all other nodes
  2. Replace these 2 nodes with a common ancestral node
  3. Compute the distances between the new ancestral node and each remaining node
  4. Repeat steps 1-3 until only 2 nodes remain and connect them with an edge
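The four steps can be sketched as follows. One standard way to make "minimal relative distance" precise is the criterion Q(i, j) = (n - 2) * d(i, j) - r(i) - r(j), where r(x) is the sum of distances from x to all other nodes; the pair with the smallest Q is joined. The node labels `U1`, `U2`, ... and the five-taxon distance table are illustrative assumptions.

```python
from itertools import combinations


def neighbour_joining(names, dist):
    """Return tree edges as (node, node, branch_length).

    names: list of leaf labels; dist: dict mapping frozenset({a, b}) -> distance.
    """
    if len(names) < 2:
        raise ValueError("need at least two sequences")
    d = dict(dist)
    nodes = list(names)
    edges = []
    counter = 0
    while len(nodes) > 2:
        n = len(nodes)
        total = {x: sum(d[frozenset((x, y))] for y in nodes if y != x) for x in nodes}
        # Step 1: pick the pair with the smallest Q value.
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[frozenset(p)] - total[p[0]] - total[p[1]])
        # Step 2: replace the pair with a new ancestral node u.
        counter += 1
        u = f"U{counter}"
        dij = d[frozenset((i, j))]
        len_i = dij / 2 + (total[i] - total[j]) / (2 * (n - 2))
        edges.append((u, i, len_i))
        edges.append((u, j, dij - len_i))
        # Step 3: distances from u to every remaining node.
        for k in nodes:
            if k not in (i, j):
                d[frozenset((u, k))] = (d[frozenset((i, k))] + d[frozenset((j, k))] - dij) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    # Step 4: connect the last two nodes with an edge.
    a, b = nodes
    edges.append((a, b, d[frozenset((a, b))]))
    return edges


raw = {("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8, ("b", "c"): 10,
       ("b", "d"): 10, ("b", "e"): 9, ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3}
tree = neighbour_joining(list("abcde"), {frozenset(p): v for p, v in raw.items()})
for edge in tree:
    print(edge)
```

When several pairs tie on Q, this sketch takes the first one it encounters; other tie-breaking rules give an equally valid tree.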
28
Q

Types of multiple sequence alignments

A

Progressive and iterative