Lecture 2 Flashcards
When comparing two nucleotide sequences what do we need to keep in mind?
We have to keep in mind that they are the result of mutation during replication (genotypic level) and selection (phenotypic level).
Could you give an example of how evolution occurs at the genotypic and phenotypic levels?
At the sequence level, GAC changes to AAC (genotype); this mutation results in the antibody not binding anymore, and therefore it binds to HIV instead (phenotype).
How many nucleotides are there for RNA and DNA?
Four each. DNA: T, C, A, G; RNA: U, C, A, G
How many amino acids are there?
20
What is a codon?
Three nucleotides which encode one amino acid
To change the phenotype, at least how many nucleotides should be changed?
1
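A minimal sketch of why a single nucleotide change can be enough, using the GAC to AAC example from the earlier card. The three-entry table is only an excerpt of the standard genetic code, and the names `CODON_TABLE` and `translate` are made up for this illustration.

```python
# Excerpt of the standard genetic code (DNA codons); only the codons used here.
CODON_TABLE = {"GAC": "Asp", "GAT": "Asp", "AAC": "Asn"}


def translate(codon: str) -> str:
    """Return the amino acid encoded by one DNA codon from the excerpt."""
    return CODON_TABLE[codon]


before, after = "GAC", "AAC"
# Count the positions at which the two codons differ.
n_changes = sum(x != y for x, y in zip(before, after))
print(n_changes, translate(before), "->", translate(after))
```

Not every single change alters the amino acid: in the same excerpt, GAC and GAT both encode Asp.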
What could happen to DNA/RNA when they are copied?
mutations, insertions, deletions, repeats, inversions, inverted repeats
To compare two sequences, what do we need to know about them initially?
which positions in the sequences correspond to each other.
A correct alignment represents — events such as — and — ( — and —)
actual, substitutions, indels, insertions and deletions
Sequences with shared ancestry are referred to as what?
homologous
To be able to align sequences, what idea do we rely on?
That there is a common ancestor from which the genes evolved. The ancestor had a certain nucleotide or amino acid at a certain position, which could have changed during the evolutionary history.
Can we be sure of the alignment?
No, but we choose the model with the highest probability or score.
What are the types of alignments?
Pairwise: protein-protein, DNA-DNA, RNA-RNA, or DNA or RNA with a protein (shifts within a codon)
Multiple sequence alignment
What are two strategies for alignments? Explain each.
Global: aligns one sequence to the other from start to end. Local: finds the longest subsequences with the highest similarity.
What are strategies for finding a pairwise sequence alignment?
e.g. different methods
Qualitative method: dot-matrix method
Exact methods via dynamic programming: Needleman-Wunsch for global and Smith-Waterman for local
Heuristic and fast methods: word methods like BLAST
Align CTG and CTAAG, CTAAGAAG and CTAAG, and ATC and CTAAG using the dot-matrix method and say what each pattern shows.
slide 15
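A small sketch of how a dot matrix for the first pair in the exercise could be built. It does not reproduce the slide; putting the first sequence on the rows and marking identical characters with `*` are assumptions made here.

```python
def dot_matrix(seq_a: str, seq_b: str) -> list[str]:
    """One text row per character of seq_a; '*' marks identical characters."""
    return ["".join("*" if a == b else "." for b in seq_b) for a in seq_a]


for row in dot_matrix("CTG", "CTAAG"):
    print(row)
# *....
# .*...
# ....*
```

The diagonal starts at CT and then jumps two columns before G, which is the kind of break an indel produces in the plot.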
What are the pros and cons of the dot-matrix method?
Pros: visually easy to identify sequence features such as indels, repeats, inversions and inverted repeats. Cons: time consuming and, because it is qualitative, it doesn't give an optimal alignment.
In quantitative methods, how do we decide whether to accept, for instance, a mutation or a gap?
By assigning costs to different actions in the alignment process
What are the 3 possible outcomes when characters are compared at one position?
match, mismatch, gap
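A sketch of how costs turn an alignment into a number: each column is a match, a mismatch or a gap, and the column scores are summed. The values +1, -1 and -2 are example choices made here, not values from the lecture.

```python
MATCH, MISMATCH, GAP = 1, -1, -2  # assumed example costs


def score_alignment(row_a: str, row_b: str) -> int:
    """Sum the column scores of two gapped rows of equal length ('-' is a gap)."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    total = 0
    for a, b in zip(row_a, row_b):
        if a == "-" or b == "-":
            total += GAP
        elif a == b:
            total += MATCH
        else:
            total += MISMATCH
    return total


print(score_alignment("CT--G", "CTAAG"))  # 3 matches and 2 gaps: 3 - 4 = -1
```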
What is the total number of alignments between a sequence of length m and a sequence of length n<m? Explain how you got to this.
slide 21
What is dynamic programming?
It is breaking down a complicated problem into simpler subproblems, solving them recursively, and finding the optimal solution to the subproblems.
Explain how the Smith-Waterman algorithm works. How many steps does it include?
If we have two sequences A and B, we calculate the optimal alignment up to each point only once, and we build a matrix for the two sequences with seqA as rows and seqB as columns. The field (i, j) holds the score of the optimal alignment with the nucleotides a_i and b_j as the end of the alignment. Finally, we find the best path through the complete matrix.
It requires m*n steps.
What are the pros and cons of the Smith-Waterman algorithm?
Pros: it is fast compared with brute force; it finds the optimal alignment, or one of multiple alignments with the same highest score. Cons: it is only for local alignments; only pairwise alignment is possible; it is still too slow for scanning against big libraries.
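The matrix fill and the final path search can be sketched as follows. This is a minimal version with a linear gap cost; the scores (match +2, mismatch -1, gap -2) are assumptions chosen for the example, not the lecture's values.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best score, aligned part of a, aligned part of b)."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # H[i][j]: best local score ending at a[i-1], b[j-1]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):               # each of the m*n cells is filled exactly once
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the highest cell until a cell with score 0 is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("CTAAGAAG", "CTAAG"))  # (10, 'CTAAG', 'CTAAG')
```

Clamping every cell at 0 is what makes the alignment local: a poorly scoring prefix is dropped instead of being carried along.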
Align AATC and AGAC according to Needleman-Wunsch.
slide 31c
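A sketch of the global alignment for this exercise. It does not reproduce the slide: the scores (match +1, mismatch -1, gap -2) are assumptions, and a different scoring scheme may give a different optimal alignment.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Return (score, aligned a, aligned b) for one optimal global alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    F = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):   # first column: a aligned against gaps only
        F[i][0] = i * gap
    for j in range(1, cols):   # first row: b aligned against gaps only
        F[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s, F[i - 1][j] + gap, F[i][j - 1] + gap)
    # Trace back from the bottom-right corner to the top-left corner.
    i, j = len(a), len(b)
    out_a, out_b = [], []
    while i > 0 or j > 0:
        s = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return F[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("AATC", "AGAC"))  # (0, 'AATC', 'AGAC')
```

With these costs the ungapped alignment wins, because any gapped alignment of two equal-length sequences needs at least two gaps.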
How does BLAST work?
It aligns the query sequence against all known sequences from GenBank, which is a collection of DNA sequences, and it reports the best alignment.
Explain in detail how the BLAST algorithm works.
1. Split the query into sequences (words) of length k.
2. Search for the k-letter words in the database sequences; similar words are ok but are scored.
3. Only keep the sequences above the set threshold.
4. Expand the k-letter words to the right and left.
5. Stop if the score drops below a certain threshold.
6. Keep only the pairwise alignments which are above the threshold.
7. Report the database sequences (they don't have to be exact matches, they can be similar).
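The seed-and-extend idea behind these steps can be sketched with a toy search. This is not the real BLAST implementation: it uses exact word matches only (no scored similar words), extends without gaps, and the names `toy_blast`, `extend`, `drop` and `min_score`, the scores and the three database sequences are all invented for the example.

```python
def kmers(seq, k):
    """Every overlapping word of length k, with its start position."""
    return [(i, seq[i:i + k]) for i in range(len(seq) - k + 1)]


def extend(query, subject, qi, si, k, match=1, mismatch=-1, drop=2):
    """Grow an exact seed to the right and left without gaps.

    Extension in one direction stops once the running score falls more
    than `drop` below the best score seen in that direction.
    """
    def walk(q, s, step):
        score = best = 0
        while 0 <= q < len(query) and 0 <= s < len(subject):
            score += match if query[q] == subject[s] else mismatch
            best = max(best, score)
            if best - score > drop:
                break
            q += step
            s += step
        return best

    return k * match + walk(qi + k, si + k, 1) + walk(qi - 1, si - 1, -1)


def toy_blast(query, database, k=3, min_score=4):
    """Return (name, score) pairs for sequences scoring at least min_score."""
    hits = []
    for name, subject in database.items():
        index = {}
        for si, word in kmers(subject, k):       # index the database words
            index.setdefault(word, []).append(si)
        best = 0
        for qi, word in kmers(query, k):         # look up each query word
            for si in index.get(word, []):
                best = max(best, extend(query, subject, qi, si, k))
        if best >= min_score:                    # keep only good alignments
            hits.append((name, best))
    return sorted(hits, key=lambda hit: -hit[1])  # report best hits first


database = {"s1": "GGCTAAGTT", "s2": "ACACACAC", "s3": "TTCTAACCC"}
print(toy_blast("CTAAG", database))  # [('s1', 5), ('s3', 4)]
```

Sequence s3 is reported although it is not an exact match, and s2 never produces a seed, so no extension work is spent on it.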
What are the pros and cons of BLAST?
Pros: up to 50-100 times faster than direct alignment; allows searches for exact matches but also for similarity up to a pre-defined threshold. Cons: doesn't guarantee the optimal pairwise alignments of the query and database sequences; you can only find a match when the gene/sequence is available in the database.
Why do we need MSA?
to reconstruct the evolutionary history of individuals in a phylogeny, to assess the sequence conservation of proteins
What do the columns in the MSA represent?
They represent amino acids or nucleotides which have descended from the same position in the sequence of the common ancestor
What do we want in an MSA?
An MSA which correctly represents the evolutionary history of a set of sequences
Name and explain two approaches for MSA, including their advantages/disadvantages.
1- Ad hoc: define a reference strain for the genome and pairwise align all sequences with the reference strain.
advantage: position numbering is the same for each sequence
disadvantage: only possible when one knows which species the sequences come from
2- Extending Smith-Waterman to more dimensions: instead of a 2D matrix, we need a k-dimensional matrix; additionally, we need a scoring scheme which accounts for all sequences at each alignment position.
disadvantage: extremely slow; for instance, k sequences of length m require m^k steps
3- Different algorithms such as MUSCLE and Malign, which are implemented in alignment viewers such as AliView
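To get a feeling for the m^k cost of approach 2, a short calculation helps. The sequence length of 100 is an arbitrary choice for illustration, and `dp_cells` is a name made up here.

```python
def dp_cells(m: int, k: int) -> int:
    """Number of cells in a k-dimensional matrix for k sequences of length m."""
    return m ** k


for k in (2, 3, 4, 10):
    print(k, dp_cells(100, k))
# 2 10000
# 3 1000000
# 4 100000000
# 10 100000000000000000000
```

Every extra sequence multiplies the work by m, which is why exact multi-dimensional alignment is not practical beyond a handful of sequences.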
What are the weaknesses and strengths of the dot-matrix method, Smith-Waterman, Needleman-Wunsch and BLAST?
Dot matrix: gives visual hints for indels but doesn't give an optimal alignment
Smith-Waterman and Needleman-Wunsch: faster than brute force and give an optimal alignment
BLAST: not optimal
With which alignment methods do you get an optimal alignment?
Needleman-Wunsch and Smith-Waterman