Part 2 - Lecture 1 - Global Pairwise Sequence Alignment Flashcards

1
Q

In global pairwise sequence alignment, what does "sequence" mean?

A

a string over an assumed alphabet: DNA (A, C, G, T), RNA, or amino acids

2
Q

What are the four criteria that define an alignment?

A
  1. one sequence is positioned above the other
  2. spaces may be inserted into the sequences
  3. spaces may not appear on top of each other
  4. after inserting spaces, the sequences must have the same length
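
The criteria above can be checked mechanically. A minimal sketch, assuming the two rows are given as strings and a space is written as "-" (the function name and the gap symbol are illustrative choices, not from the lecture):

```python
def is_alignment(top: str, bottom: str, gap: str = "-") -> bool:
    """Return True if two gapped rows form a valid pairwise alignment."""
    # After inserting spaces, both rows must have the same length.
    if len(top) != len(bottom):
        return False
    # A space may never sit on top of another space.
    return all(not (a == gap and b == gap) for a, b in zip(top, bottom))


print(is_alignment("AC-GT", "ACGGT"))  # spaces never stacked, equal lengths
print(is_alignment("AC-GT", "AC-GT"))  # a space sits on top of a space
```

The first criterion (one sequence above the other) is implicit in passing the rows as a pair.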
3
Q

What does global mean in global pairwise sequence alignment?

A

we align the sequences over their entire length, end to end

4
Q

What does pairwise mean in global pairwise sequence alignment?

A

we restrict our attention to two sequences at a time (other methods can align more than two)

5
Q

Out of these four examples which are alignments and which are not?

A

-all are alignments except the one in the top left corner, because it has spaces on top of each other

6
Q

What should alignments reveal?

A

biological relationships

7
Q

Why might we align sequences?

A

-to see whether known sequences align well with ours, e.g. to check whether we have discovered a new gene
-to find the parts that do not align at all
-to gather biological and evolutionary insights from the parts that align well

8
Q

What is sequence similarity strong evidence of?

A

similar biological function

9
Q

What are some sources of biological differences?

A

-substitution (point mutation)
-insertion or deletion of a short sequence (indel): we cannot tell whether something was inserted in one sequence or deleted in the other, so we call it an indel

10
Q

What is a segmental duplication?

A

duplicated blocks of genomic DNA ranging in size from 1 to 200 kb

11
Q

What is an inversion?

A

when a section of DNA breaks off and reattaches to the chromosome in reversed order

12
Q

What is a transposition?

A

a discrete section of DNA is moved from one location in the genome to another

13
Q

What is a translocation?

A

a piece of one chromosome breaks off and attaches to another chromosome

14
Q

What are some sources of technical differences in alignment?

A

-sequencing machines make mistakes
-different technologies lead to different errors:
  Illumina has fewer indels and more SNVs and substitutions, partly from PCR
  PacBio and Nanopore have more indels because they are long-read technologies
-PCR is a major factor

15
Q

How do we score alignments, and what do high scores indicate?

A

-high scores indicate better alignments
-each position is scored separately, and the alignment score is the sum over positions

-identity (match) = +1
-substitution (mismatch) = -u
-indel = -delta
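
The scoring scheme above can be sketched as a small function. This assumes the alignment is given as two equal-length rows with "-" for a space; the function name and the example penalty values (u = 0.5, delta = 1.0) are illustrative assumptions, not values from the lecture:

```python
def score_alignment(top: str, bottom: str, u: float, delta: float,
                    gap: str = "-") -> float:
    """Sum the per-position scores: match +1, mismatch -u, indel -delta."""
    if len(top) != len(bottom):
        raise ValueError("rows of an alignment must have the same length")
    score = 0.0
    for a, b in zip(top, bottom):
        if a == gap or b == gap:
            score -= delta   # indel: a letter sits opposite a space
        elif a == b:
            score += 1.0     # identity (match)
        else:
            score -= u       # substitution (mismatch)
    return score


# Three matches, one mismatch (C/T), one indel (-/G): 3 - 0.5 - 1.0
print(score_alignment("AC-GT", "ATGGT", u=0.5, delta=1.0))
```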

16
Q

What defines "best" when we are trying to get the best alignment?

A

modeling, probability, and statistics define what "best" means

17
Q

What finds the best alignment?

A

algorithms

18
Q

What is an alignment matrix?

A

if the two sequences have lengths n and m, the matrix has n + 1 rows and m + 1 columns: one row or column per letter, plus one extra row and one extra column for the space added at the beginning of each sequence

19
Q

What do alignments correspond to?

A

paths

20
Q

What does an alignment path look like with scoring?

A

each step along the path adds a score, for example:
match is +1
indel is -delta, e.g. -1
substitution is -u, e.g. -0.15

21
Q

How do you calculate the best score for alignment?

A
  1. calculate the best score for prefixes of the two sequences
  2. update incrementally from there, extending the prefixes one letter at a time
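
The two steps above can be sketched as a matrix fill. This is a minimal illustration assuming match = +1, mismatch = -u, and indel = -delta; the function name and the example penalty values are hypothetical:

```python
def best_score(a: str, b: str, u: float, delta: float) -> float:
    """Best global alignment score of a and b, built up from prefixes."""
    n, m = len(a), len(b)
    # T[i][j] holds the best score for aligning a[:i] with b[:j].
    T = [[0.0] * (m + 1) for _ in range(n + 1)]
    # A prefix aligned against the empty prefix is all indels.
    for i in range(1, n + 1):
        T[i][0] = -delta * i
    for j in range(1, m + 1):
        T[0][j] = -delta * j
    # Update incrementally: each cell depends on three neighbors.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = 1.0 if a[i - 1] == b[j - 1] else -u
            T[i][j] = max(T[i - 1][j - 1] + pair,  # letter over letter
                          T[i - 1][j] - delta,     # letter of a over a space
                          T[i][j - 1] - delta)     # letter of b over a space
    return T[n][m]


print(best_score("ACGT", "AGT", u=0.5, delta=1.0))
```

The bottom-right cell holds the score for the full sequences, which is what makes the alignment global.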
22
Q

If T is a function that gives the score of the best alignment sequence what is the first step to formalization?

A
23
Q

What is some formal notation for general recurrence relation?

A
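
One common way to write such a recurrence, assuming the scoring match = +1, mismatch = -u, indel = -delta, and writing a_i and b_j for the i-th and j-th letters of the two sequences (this notation is an assumption, not taken from the lecture):

```latex
\begin{aligned}
T(0,0) &= 0, \qquad T(i,0) = -i\,\delta, \qquad T(0,j) = -j\,\delta,\\
T(i,j) &= \max
\begin{cases}
T(i-1,\,j-1) + s(a_i, b_j) \\
T(i-1,\,j) - \delta \\
T(i,\,j-1) - \delta
\end{cases}
\qquad
s(a_i, b_j) =
\begin{cases}
+1 & \text{if } a_i = b_j \\
-u & \text{otherwise}
\end{cases}
\end{aligned}
```

Here T(i, j) is the best score for aligning the first i letters of one sequence with the first j letters of the other.
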
24
Q
A
25
Q

What is the approximate size of the human genome?

A

about 3 billion bases

26
Q
A
27
Q

When we calculate the score in each cell of the matrix how do we keep a record of which neighboring cell we used?

A

we can represent this as an arrow pointing back (left), up, or diagonally back and up

28
Q

What does following the arrows allow us to do?

A

write out the alignment

29
Q

What does moving upward mean?

A

inserting a space in the sequence written along the top

30
Q

What does moving backward mean?

A

inserting a space in the sequence written on the left

31
Q

What does moving diagonally mean?

A

no space is inserted: one letter is placed on top of the other (a match or a substitution)

32
Q

How do you perform a traceback?

A
  1. start in the bottom right
  2. follow the arrows to the top left
  3. each arrow adds a position to the alignment
  4. moving past a row or column consumes the letter for that row or column
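
The traceback steps above can be sketched as follows. This assumes one sequence is written along the top (columns) and the other on the left (rows), with match = +1, mismatch = -u, indel = -delta; the function name, arrow labels, and tie-breaking order (diagonal first) are illustrative choices:

```python
def align(top: str, left: str, u: float, delta: float, gap: str = "-"):
    """Fill the matrix with arrows, then trace back from the bottom right."""
    rows, cols = len(left) + 1, len(top) + 1
    T = [[0.0] * cols for _ in range(rows)]
    arrow = [[None] * cols for _ in range(rows)]
    for i in range(1, rows):          # first column: only "up" is possible
        T[i][0], arrow[i][0] = -delta * i, "up"
    for j in range(1, cols):          # first row: only "back" is possible
        T[0][j], arrow[0][j] = -delta * j, "back"
    for i in range(1, rows):
        for j in range(1, cols):
            pair = 1.0 if left[i - 1] == top[j - 1] else -u
            options = [(T[i - 1][j - 1] + pair, "diag"),
                       (T[i - 1][j] - delta, "up"),
                       (T[i][j - 1] - delta, "back")]
            # Record both the best score and the neighbor it came from.
            T[i][j], arrow[i][j] = max(options, key=lambda o: o[0])

    # Traceback: start bottom right, follow arrows to the top left.
    i, j = rows - 1, cols - 1
    out_top, out_left = [], []
    while i > 0 or j > 0:
        move = arrow[i][j]
        if move == "diag":            # letter over letter, no space
            out_top.append(top[j - 1]); out_left.append(left[i - 1])
            i, j = i - 1, j - 1
        elif move == "up":            # space in the top sequence
            out_top.append(gap); out_left.append(left[i - 1])
            i -= 1
        else:                         # "back": space in the left sequence
            out_top.append(top[j - 1]); out_left.append(gap)
            j -= 1
    # Positions were collected end to start, so reverse them.
    return "".join(reversed(out_top)), "".join(reversed(out_left)), T[-1][-1]


aligned_top, aligned_left, score = align("ACGT", "AGT", u=0.5, delta=1.0)
print(aligned_top)
print(aligned_left)
print(score)
```

When several neighbors give the same score, more than one best alignment exists; this sketch simply keeps the first option in the list.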
33
Q

What is time complexity?

A

-a function of sequence length: how the total amount of work scales as the sequences get longer

34
Q

What is the time complexity for global pairwise alignment?

A

-O(n^2): for two sequences of length n we compute about n^2 entries in the matrix (O(nm) for lengths n and m)
-the amount of work for each individual cell is constant: we look at its three neighboring cells (up, back, diagonal) and choose the score based on those

35
Q

What is space complexity?

A

the amount of memory we need as a function of sequence length; here it is proportional to the size of the alignment matrix, because all values must be stored

36
Q

What is the space complexity for global pairwise alignment?

A

O(n^2): the required space is quadratic because it scales like the square of the length of the sequences
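
To get a feel for quadratic growth, the number of matrix cells can be computed directly. A small sketch, assuming an (n + 1) by (m + 1) matrix; the function name and the chosen lengths are illustrative, with 3 billion taken from the genome-size card:

```python
def matrix_cells(n: int, m: int) -> int:
    """Number of cells in the alignment matrix for lengths n and m."""
    # One row or column per letter, plus one for the leading space.
    return (n + 1) * (m + 1)


for length in (1_000, 1_000_000, 3_000_000_000):
    cells = matrix_cells(length, length)
    print(f"{length:>13,} bases -> {cells:.3e} cells")
```

Multiplying the length by 1,000 multiplies the number of cells by about 1,000,000, which is why full-matrix alignment of genome-scale sequences is impractical.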

37
Q

What is never less than the space complexity?

A

time complexity: every value we store takes at least one unit of work to compute and write, so time is at least as large as space

38
Q
A