Chapter 5-Bioinformatics Flashcards by Tang Yun

Explain the molecular clock hypothesis

Mutations accumulate randomly over time at a relatively constant rate of N base pairs per year
Most of these mutations are neutral, so natural selection neither favours nor disfavours them
When an individual has progeny, these mutations are passed on to the next generation
The genetic difference between any two species is therefore proportional to the time since they last shared a common ancestor
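
The proportionality above can be turned into a rough dating calculation. The sketch below assumes a strict clock and ignores repeated mutations at the same site; the function name and the numbers are hypothetical, chosen only for illustration.

```python
def divergence_time_years(differences: int, sites_compared: int,
                          rate_per_site_per_year: float) -> float:
    """Estimate the time since two lineages shared a common ancestor."""
    if sites_compared <= 0 or rate_per_site_per_year <= 0:
        raise ValueError("sites_compared and rate must be positive")
    # Observed genetic distance: differences per compared site.
    distance = differences / sites_compared
    # Both lineages accumulate mutations independently after the split,
    # so the distance grows at twice the per-lineage rate.
    return distance / (2 * rate_per_site_per_year)


# Hypothetical example: 120 differences over 10,000 aligned sites,
# with an assumed rate of 1e-9 mutations per site per year.
t = divergence_time_years(120, 10_000, 1e-9)
print(f"{t:.2e} years")
```

The factor of 2 is the step that is easiest to forget: the measured difference is the sum of the changes on both branches leading back to the ancestor.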

What is the mutation rate in humans?

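About 0.5-1 mutations per gigabase pair (10^9 bp) per year; per generation the rate is roughly an order of magnitude higher (on the order of 10^-8 per base pair)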

Define homologues

Homologues are sequences that share a common evolutionary history (originated from a common ancestor)

Define orthologues

Homologous sequences in different species that arose from a common ancestral gene during speciation and usually retain the same function

Define paralogues

Homologous sequences within a single species that arose by gene duplication

Define sequence alignment

Process of lining up 2 or more sequences to achieve maximal levels of identity for the purpose of assessing the degree of similarity and the possibility of homology

Functions of pairwise sequence alignment

Identify identical and differing amino acids (aa) or nucleotides at equivalent positions
Identify domains or motifs shared between proteins
Evaluate if two proteins or genes have a similar sequence

Difference between pairwise and multiple sequence alignment

Pairwise: compares 2 sequences
Multiple: compares 3 or more sequences

What is local sequence alignment?

The optimal similarity score is determined over subregions of the 2 sequences rather than over their entire length
Useful for identifying shared protein domains

What is global sequence alignment?

The optimal similarity score is determined over the entire length of the 2 sequences
Useful for assessing whether genes or proteins are homologous

How to calculate the alignment score

Alignment score = sum of match scores + sum of mismatch scores + sum of gap penalties

- Gap penalties and mismatch scores have negative values
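
As a worked illustration of the formula, the sketch below scores two already-aligned sequences column by column. The scoring values (match +1, mismatch -1, gap -2) and the use of a simple linear gap penalty are assumptions for the example, not values from the cards.

```python
def alignment_score(aligned_a: str, aligned_b: str,
                    match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Sum match scores, mismatch scores and gap penalties over all columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    score = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            score += gap        # gap penalty (negative)
        elif x == y:
            score += match      # match score (positive)
        else:
            score += mismatch   # mismatch score (negative)
    return score


# 5 matches, 1 mismatch, 1 gap: 5*1 + 1*(-1) + 1*(-2) = 2
print(alignment_score("GATTACA", "GA-TGCA"))
```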

What is Needleman-Wunsch alignment?

A dynamic programming algorithm for global sequence alignment
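
A minimal sketch of the Needleman-Wunsch recurrence, returning only the optimal global score (no traceback). The scoring values and the linear gap penalty are assumptions for illustration.

```python
def needleman_wunsch_score(a: str, b: str,
                           match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Optimal global alignment score of a and b by dynamic programming."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against nothing costs one gap per residue.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap in b
            left = score[i][j - 1] + gap    # gap in a
            score[i][j] = max(diag, up, left)
    # Global alignment: the answer is in the bottom-right cell.
    return score[-1][-1]


print(needleman_wunsch_score("AC", "AGC"))  # two matches and one gap
```

Because every cell is filled, the run time grows with the product of the two sequence lengths, which is why this exact method is slow on long sequences.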

What is the Smith-Waterman alignment?

A dynamic programming algorithm for local sequence alignment
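
A minimal sketch of the Smith-Waterman recurrence, returning only the best local score. It differs from the global case in two ways: cell values are never allowed to drop below 0, and the answer is the maximum over the whole matrix. The scoring values are assumptions for illustration.

```python
def smith_waterman_score(a: str, b: str,
                         match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between any subregions of a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The 0 lets a new local alignment start at any cell.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# The shared subregion "ACG" is found despite unrelated flanking residues.
print(smith_waterman_score("TTTACGTTT", "GGACGGG"))
```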

What do the values assigned to PAM mean?

The number n in PAM-n is the number of accepted aa substitutions (point accepted mutations) per 100 aa

PAM family extrapolation formula

PAM-n = (PAM-1)^n, i.e. the PAM-1 mutation probability matrix multiplied by itself n times
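
The extrapolation is a matrix power. The sketch below uses a made-up 2-letter alphabet instead of the real 20 x 20 amino acid matrix, so the numbers are purely illustrative; the helper names are hypothetical.

```python
def mat_mul(x, y):
    """Multiply two square matrices given as lists of rows."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(m, n):
    """Raise square matrix m to the n-th power by repeated multiplication."""
    size = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, m)
    return result


# Toy "PAM-1": each letter has a 1% chance of being replaced per step.
pam1_toy = [[0.99, 0.01],
            [0.01, 0.99]]
pam250_toy = mat_pow(pam1_toy, 250)
for row in pam250_toy:
    print([round(p, 4) for p in row])
```

Each row still sums to 1 after extrapolation, and the off-diagonal probabilities grow with n, which is why high-numbered PAM matrices suit more divergent sequences.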

PAM vs BLOSUM

PAM matrices are based on global alignments of closely related proteins, while BLOSUM matrices are based on local alignments (conserved blocks) of more divergent proteins
All PAM matrices are extrapolated from PAM-1, while each BLOSUM matrix is derived directly from observed alignments

Arrange the following in increasing order of divergence

BLOSUM-62, BLOSUM-45, BLOSUM-80

BLOSUM-80, BLOSUM-62, BLOSUM-45 (a lower BLOSUM number suits more divergent sequences)

How should you choose the appropriate aa substitution matrix?

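Choose according to the expected degree of sequence divergence: low PAM or high BLOSUM numbers for closely related sequences, high PAM or low BLOSUM numbers for distantly related ones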

Which algorithms are exact?

Needleman-Wunsch and Smith-Waterman

- Both are guaranteed to find the optimal alignment, but are time-consuming to compute

Which alignment algorithms are heuristic methods?

Pairwise: BLAST
Multiple alignment: all widely used multiple alignment programmes e.g. MAFFT
- rely on shortcuts and simplifying assumptions instead of an exhaustive search
- not guaranteed to be optimal, but fast

Function of BLAST

A heuristic local alignment programme
Allows scientists to compare new sequences with databases containing many characterised genes
Results can provide valuable functional and evolutionary information

How to interpret the E-value in BLAST

The E-value is the number of alignments with a score at least this high that would be expected by chance in a search of the database
A smaller E indicates a more significant alignment
E < 0.02: the sequence is probably homologous
E > 1: the match is probably due to chance
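
E-values of this kind are commonly modelled with the Karlin-Altschul formula E = K * m * n * exp(-lambda * S), where S is the raw alignment score, m the query length and n the database length. The sketch below only illustrates the shape of that relationship; the `lam` and `k` values are placeholder assumptions, since the real values depend on the scoring matrix and gap penalties.

```python
import math


def expect_value(raw_score: float, query_len: int, db_len: int,
                 lam: float, k: float) -> float:
    """Expected number of chance alignments scoring at least raw_score."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


# Placeholder parameters, for illustration only.
LAM, K = 0.267, 0.041

for s in (40, 80, 120):
    e = expect_value(s, query_len=300, db_len=50_000_000, lam=LAM, k=K)
    print(f"score {s}: E = {e:.3g}")
```

Two things follow from the formula: E falls exponentially as the score rises, and the same score is less significant when the database is larger.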

Importance of multiple sequence alignment

Links proteins at the aa level, making it possible to identify conserved features, predict functionally important residues and identify positions that affect the biochemical properties of the protein
Basis for phylogenetic tree construction
Allows generalisation of sequences to profiles

Uses of phylogeny

Identifying orthologues and paralogues in gene families
Reconstructing population history and species history
Estimating divergence times, assuming a molecular clock

Problem with multiple sequence alignment and how it can be overcome

- Problem: the solution space is high-dimensional, which makes the optimal solution hard to calculate with dynamic programming
- Solution: use iterative algorithms

Steps in CLUSTAL

1. Calculate pairwise alignments for every sequence pair and calculate sequence distances
2. Using the distances, estimate a guide tree by neighbour joining
3. Using the guide tree, align the sequences in that order

Steps in neighbour-joining

1. Find the 2 nodes with the minimal distance from each other relative to their distances to all other nodes
2. Replace these 2 nodes with a common ancestral node
3. Compute the pairwise distances between each remaining node and the new ancestral node
4. Repeat steps 1-3 until only 2 nodes remain, then connect them with an edge
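
The steps above can be sketched as follows. This is a minimal version of the standard neighbour-joining criterion, in which the "relative distance" of a pair is (n - 2) * d(a, b) minus the summed distances of a and of b to all other nodes. The function name, the `anc0`, `anc1`, ... labels and the example distances are assumptions for illustration.

```python
from itertools import combinations


def neighbour_joining(dist):
    """dist: {node: {other_node: distance}}. Returns (node, node, length) edges."""
    dm = {a: dict(row) for a, row in dist.items()}  # work on a copy
    edges, next_id = [], 0
    while len(dm) > 2:
        n = len(dm)
        total = {a: sum(dm[a][b] for b in dm if b != a) for a in dm}
        # Step 1: pick the pair that is close together but far from the rest.
        a, b = min(combinations(dm, 2),
                   key=lambda p: (n - 2) * dm[p[0]][p[1]] - total[p[0]] - total[p[1]])
        # Step 2: replace the pair with a common ancestral node.
        anc = f"anc{next_id}"
        next_id += 1
        len_a = 0.5 * dm[a][b] + (total[a] - total[b]) / (2 * (n - 2))
        edges += [(a, anc, len_a), (b, anc, dm[a][b] - len_a)]
        # Step 3: distances from the ancestral node to every remaining node.
        dm[anc] = {}
        for c in dm:
            if c not in (a, b, anc):
                dm[anc][c] = dm[c][anc] = 0.5 * (dm[a][c] + dm[b][c] - dm[a][b])
        for gone in (a, b):
            del dm[gone]
        for row in dm.values():
            row.pop(a, None)
            row.pop(b, None)
    # Step 4 ends here: connect the last two nodes with an edge.
    x, y = dm
    edges.append((x, y, dm[x][y]))
    return edges


rows = {"a": [0, 5, 9, 9, 8], "b": [5, 0, 10, 10, 9], "c": [9, 10, 0, 8, 7],
        "d": [9, 10, 8, 0, 3], "e": [8, 9, 7, 3, 0]}
example = {p: {q: rows[p][i] for i, q in enumerate(rows) if q != p} for p in rows}
for edge in neighbour_joining(example):
    print(edge)
```

The result is an unrooted tree given as a list of edges; ties in step 1 are broken by whichever pair is encountered first.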

Types of multiple sequence alignments

Progressive and iterative

(28 cards)