Lecture 2 Flashcards

1
Q

When comparing two nucleotide sequences, what do we need to keep in mind?

A

We have to keep in mind that they are the result of mutation during replication (genotypic level) and selection (phenotypic level).

2
Q

Could you give an example of how evolution occurs at the genotypic and phenotypic levels?

A

At the sequence level, GAC changes to AAC. This mutation results in the antibody no longer binding, and therefore it binds to HIV instead.

3
Q

How many nucleotides are there for RNA and DNA?

A

Four each. DNA: T, C, A, G; RNA: U, C, A, G

4
Q

How many amino acids are there?

A

20

5
Q

What is a codon?

A

Three nucleotides that encode one amino acid
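
Sketch (not from the lecture): reading a sequence codon by codon, using the GAC to AAC change from card 2. The table below is deliberately partial and only holds the four codons this example needs; the function names are made up for illustration.

```python
# Partial codon table: only the codons used in this example (an assumption;
# the full genetic code has 64 codons).
CODON_TABLE = {"ATG": "Met", "GAC": "Asp", "AAC": "Asn", "TAA": "Stop"}


def split_codons(seq):
    """Split a nucleotide sequence into consecutive, non-overlapping triplets.

    Trailing nucleotides that do not fill a whole codon are dropped.
    """
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def translate(seq):
    """Translate each codon; codons missing from the partial table become '?'."""
    return [CODON_TABLE.get(codon, "?") for codon in split_codons(seq)]


print(translate("ATGGACTAA"))  # ['Met', 'Asp', 'Stop']
print(translate("ATGAACTAA"))  # ['Met', 'Asn', 'Stop']
```

One changed nucleotide (G to A) is enough to change the encoded amino acid here.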

6
Q

To change the phenotype, at least how many nucleotides should be changed?

A

1

7
Q

What could happen to DNA/RNA when they are copied?

A

mutations, insertions, deletions, repeats, inversions, inverted repeats

8
Q

To compare two sequences, what do we need to know about them initially?

A

which positions in the sequences correspond to each other.

9
Q

A correct alignment represents — events such as — and — ( — and —)

A

actual, substitutions, indels, insertions and deletions

10
Q

Sequences with shared ancestry are referred to as what?

A

homologous

11
Q

To be able to align sequences, on what idea do we rely?

A

That there is a common ancestor from which the genes evolved. The ancestor had a certain nucleotide or amino acid at a certain position, which could have changed during the evolutionary history.

12
Q

Could we be sure of the alignment?

A

No, but we choose the model with the highest probability or score.

13
Q

What are types of alignments?

A

Pairwise: protein/protein, DNA/DNA, RNA/RNA, or DNA or RNA with a protein (which has to account for shifts within a codon)
Multiple sequence alignment

14
Q

What are two strategies for alignments? Explain each.

A

Global: aligns one sequence to the other from start to end. Local: finds the longest subsequences with the highest similarity.

15
Q

What are strategies for finding a pairwise sequence alignment?

(e.g. different methods)

A

Qualitative method: dot-matrix method
Exact methods via dynamic programming: Needleman-Wunsch for global and Smith-Waterman for local alignment
Heuristic and fast methods: word methods like BLAST

16
Q

Align CTG and CTAAG, CTAAGAAG and CTAAG, and ATC and CTAAG using the dot-matrix method, and say what each pattern shows.
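
Sketch (not from the lecture): one possible way to draw the dot matrices for the three pairs on this card. The function names `dot_matrix` and `show` are made up for illustration; a dot is placed wherever the two letters are identical, with no window or threshold.

```python
def dot_matrix(seq_a, seq_b):
    """Return one string per letter of seq_a: '*' where it equals the
    letter of seq_b in that column, '.' otherwise."""
    return ["".join("*" if a == b else "." for b in seq_b) for a in seq_a]


def show(seq_a, seq_b):
    """Print the dot matrix with seq_b as the header and seq_a down the side."""
    print("  " + seq_b)
    for letter, row in zip(seq_a, dot_matrix(seq_a, seq_b)):
        print(letter + " " + row)
    print()


show("CTG", "CTAAG")       # diagonal that breaks and resumes further right
show("CTAAGAAG", "CTAAG")  # main diagonal plus a second, parallel diagonal
show("ATC", "CTAAG")       # dots running along an anti-diagonal
```

Reading the plots under the usual interpretation: a diagonal that jumps sideways suggests an indel, parallel diagonals suggest a repeat, and an anti-diagonal suggests an inversion.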

17
Q

What are the pros and cons of dot matrix?

A

Pros: visually easy to identify sequence features such as indels, repeats, inversions and inverted repeats. Cons: time-consuming and, because it is qualitative, it does not give an optimal alignment.

18
Q

In quantitative methods, how do we decide whether to accept, for instance, a mutation or a gap?

A

By assigning costs to different actions in the alignment process

19
Q

What are 3 possibilities of characters being compared at one position?

A

match, mismatch, gap

20
Q

What is the total number of alignments between a sequence of length m and a sequence of length n<m? Explain how you got to this.

21
Q

What is dynamic programming?

A

It means breaking a complicated problem down into simpler subproblems, solving them in a recursive manner, and finding the optimal solution to each subproblem.

22
Q

Explain how the Smith-Waterman algorithm works. How many steps does it include?

A

If we have two sequences A and B, we calculate the optimal alignment up to each point only once, and we build a matrix for the two sequences with seqA as the rows and seqB as the columns. The field (i, j) corresponds to the score of the optimal alignment with the nucleotides a_i and b_j as the end of the alignment. Finally, we find the best path through the complete matrix.
It requires m*n steps.
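
Sketch (not from the lecture): a minimal Smith-Waterman implementation with a linear gap penalty. The scores (match +1, mismatch -1, gap -1) and the function name are assumptions chosen for illustration; the lecture may use a different scoring scheme.

```python
def smith_waterman(a, b, match=1, mismatch=-1, gap=-1):
    """Return (best score, aligned part of a, aligned part of b) for one
    optimal local alignment. Scoring values are illustrative assumptions."""
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j]: best score of a local alignment ending at a[i-1] and b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill every cell exactly once: (len(a) * len(b)) steps
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a cell with score 0
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("GGCTAAGTT", "ACTAAGC"))  # (5, 'CTAAG', 'CTAAG')
```

When several alignments share the highest score, this sketch returns only one of them (the first maximum found, preferring diagonal moves in the traceback).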

23
Q

What are the pros and cons of the smith waterman algorithm?

A

Pros: it is fast compared with brute force; it finds the optimal alignment, or one of multiple alignments with the same highest score. Cons: it is only for local alignments; only pairwise alignment is possible; it is still too slow for scanning against big libraries.

24
Q

Align AATC and AGAC according to Needleman-Wunsch.
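
Sketch (not from the lecture): a minimal Needleman-Wunsch implementation applied to the two sequences on this card. The card gives no scoring scheme, so match +1, mismatch -1 and gap -1 are assumptions; a different scheme can give a different alignment.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Return (score, aligned a, aligned b) for one optimal global alignment.
    Scoring values are illustrative assumptions."""
    rows, cols = len(a) + 1, len(b) + 1
    # F[i][j]: best score for aligning the prefixes a[:i] and b[:j]
    F = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        F[i][0] = i * gap  # prefix of a aligned against gaps only
    for j in range(1, cols):
        F[0][j] = j * gap  # prefix of b aligned against gaps only

    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s, F[i - 1][j] + gap, F[i][j - 1] + gap)

    # Trace back from the bottom-right corner to the top-left corner
    i, j = len(a), len(b)
    out_a, out_b = [], []
    while i > 0 or j > 0:
        s = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return F[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))


score, top, bottom = needleman_wunsch("AATC", "AGAC")
print(score)   # 1
print(top)     # A-ATC
print(bottom)  # AGA-C
```

Under these assumed scores the gapped alignment (score 1) beats the ungapped one (score 0). Other alignments may reach the same score; the traceback here reports only one.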

25
Q

How does BLAST work?

A

It aligns the query sequence against all known sequences from GenBank, which is a collection of DNA sequences, and reports the best alignment.

26
Q

Explain in detail how the BLAST algorithm works.

A

  1. Split the query into words of length k.
  2. Search for the k-letter words in the database sequences; similar words are OK but are scored.
  3. Keep only the sequences above the set threshold.
  4. Expand the k-letter words to the right and left.
  5. Stop if the score drops below a certain threshold.
  6. Keep only the pairwise alignments that are above the threshold.
  7. Report the database sequences (they do not have to be exact matches; they can be similar).

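
Sketch (not from the lecture, and not the real BLAST code): a toy seed-and-extend search that follows the steps above in simplified form. Assumptions: seeds must match exactly (real BLAST also accepts similar words via a scoring matrix), extension is ungapped, and all names, scores and thresholds (`toy_blast`, `drop`, `min_score`, the example `DATABASE`) are made up for illustration.

```python
def words(seq, k):
    """Step 1: all overlapping k-letter words of seq with their start positions."""
    return [(seq[i:i + k], i) for i in range(len(seq) - k + 1)]


def extend(query, subject, q, s, k, match=1, mismatch=-1, drop=2):
    """Steps 4-5: extend an exact seed to the right and left without gaps.
    Extension stops once the score falls more than `drop` below its best."""
    def walk(i, j, step):
        gain = best_gain = 0
        while 0 <= i < len(query) and 0 <= j < len(subject):
            gain += match if query[i] == subject[j] else mismatch
            if gain > best_gain:
                best_gain = gain
            elif best_gain - gain > drop:
                break
            i += step
            j += step
        return best_gain

    return k * match + walk(q + k, s + k, 1) + walk(q - 1, s - 1, -1)


def toy_blast(query, database, k=3, min_score=5):
    """Steps 2, 6 and 7: find seeds, keep sequences whose best extended seed
    reaches min_score, and report them sorted by score."""
    hits = {}
    for name, subject in database.items():
        best = 0
        for word, q in words(query, k):
            s = subject.find(word)
            while s != -1:  # visit every occurrence of the word
                best = max(best, extend(query, subject, q, s, k))
                s = subject.find(word, s + 1)
        if best >= min_score:
            hits[name] = best
    return sorted(hits.items(), key=lambda item: -item[1])


DATABASE = {"seq1": "GGCTAAGTT", "seq2": "ACGTACGT", "seq3": "TTCTAACTT"}
print(toy_blast("CTAAG", DATABASE))               # [('seq1', 5)]
print(toy_blast("CTAAG", DATABASE, min_score=4))  # [('seq1', 5), ('seq3', 4)]
```

Because only seeded regions are extended, a search like this is fast but can miss the optimal alignment, which matches the trade-off described on the next card.
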
27
Q

What are the pros and cons of blast?

A

Pros: up to 50-100 times faster than direct alignment; allows searches for exact matches but also for similarity up to a predefined threshold. Cons: does not guarantee the optimal pairwise alignment of the query and database sequences; you can only find a match when the gene/sequence is available in the database.

28
Q

Why do we need MSA?

A

to reconstruct the evolutionary history of individuals in a phylogeny, to assess the sequence conservation of proteins

29
Q

What do the columns in the MSA represent?

A

They represent amino acids or nucleotides that have descended from the same position in the sequence of the common ancestor.

30
Q

What do we want in an MSA?

A

An MSA that correctly represents the evolutionary history of a set of sequences

31
Q

Name and explain two approaches for MSA, including their advantage/ disadvantages.

A

1- Ad hoc: define a reference strain for the genome and pairwise align all sequences with the reference strain.
advantage: position numbering is the same for each sequence
disadvantage: only possible when one knows which species the sequences come from
2- Extending Smith-Waterman to more dimensions: instead of a 2D matrix, we need a k-dimensional matrix; additionally, we need a scoring scheme that accounts for all sequences at each alignment position.
disadvantage: extremely slow; for instance, k sequences of length m require m^k steps
3- Different algorithms such as MUSCLE and Malign, which are implemented in alignment viewers such as AliView

32
Q

What are the weaknesses and strengths of the dot-matrix method, Smith-Waterman, Needleman-Wunsch and BLAST?

A

Dot matrix: gives visual hints for indels but does not give an optimal alignment
Smith-Waterman and Needleman-Wunsch: faster than brute force and give an optimal alignment
BLAST: does not guarantee an optimal alignment

33
Q

With which alignment methods do you get an optimal alignment?

A

Needleman-Wunsch and Smith-Waterman