Part 2 - Lecture 1 - Global Pairwise Sequence Alignment Flashcards
In global pairwise sequence alignment, what does "sequence" mean?
A string over an assumed alphabet: DNA (A, C, G, T), RNA (A, C, G, U), or amino acids
What are the four criteria that define an alignment?
- one sequence is positioned above the other
- spaces may be inserted into the sequences
- two spaces may not appear on top of each other
- after inserting spaces, the sequences must have the same length
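A minimal sketch that checks these criteria for a candidate alignment, assuming the two gapped rows are given as strings and that `-` stands for an inserted space. The function name `is_alignment` is hypothetical, chosen for illustration.

```python
GAP = "-"  # assumption: '-' marks an inserted space


def is_alignment(top: str, bottom: str, seq1: str, seq2: str) -> bool:
    """Return True if `top` (gapped seq1) placed above `bottom` (gapped seq2) is an alignment."""
    # Criterion 1 is implicit: `top` is the row positioned above `bottom`.
    # Criterion 4: after inserting spaces, both rows must have the same length.
    if len(top) != len(bottom):
        return False
    # Criterion 2: only spaces may be inserted, so removing them must give back the originals.
    if top.replace(GAP, "") != seq1 or bottom.replace(GAP, "") != seq2:
        return False
    # Criterion 3: no column may contain a space on top of a space.
    return all(not (a == GAP and b == GAP) for a, b in zip(top, bottom))
```

For example, `is_alignment("AC-GT", "ACTG-", "ACGT", "ACTG")` is `True`, while `"AC--GT"` over `"AC-TG-"` fails because its third column stacks two spaces.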
What does global mean in global pairwise sequence alignment?
that we align the sequences along their entire length, end to end
What does pairwise mean in global pairwise sequence alignment?
we restrict our attention to two sequences at a time (multiple sequence alignment methods handle more)
Out of these four examples, which are alignments and which are not?
- all are alignments except the top-left one, because it has two spaces on top of each other
What should alignments reveal?
biological relationships
Why might we align sequences?
- Do known sequences align well with ours? (e.g., to check whether we have discovered a new gene)
- What about the parts that do not align at all?
- Parts that align well can give biological and evolutionary insights
What is sequence similarity strong evidence of?
similar biological function
What are some sources of biological differences?
- substitution (point mutation)
- insertion or deletion of a short sequence (indel): we cannot tell whether a segment was inserted into one sequence or deleted from the other, so we call it an indel
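As a small illustration, each column of an alignment can be labeled by the kind of difference it shows. This sketch assumes `-` marks a space; `classify_columns` is a hypothetical helper name. A column with a space in either row is labeled "indel" because, as noted above, the alignment alone cannot say whether it was an insertion or a deletion.

```python
GAP = "-"  # assumption: '-' marks an inserted space


def classify_columns(top: str, bottom: str) -> list[str]:
    """Label each aligned column as 'match', 'substitution', or 'indel'."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same length")
    labels = []
    for a, b in zip(top, bottom):
        if a == GAP and b == GAP:
            # Not allowed in a valid alignment.
            raise ValueError("two spaces may not be aligned")
        if a == GAP or b == GAP:
            labels.append("indel")  # insertion in one row or deletion in the other
        elif a == b:
            labels.append("match")
        else:
            labels.append("substitution")  # point mutation
    return labels
```

For `"AC-GT"` over `"ACTGA"` this gives three matches, one indel (third column), and one substitution (last column).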
What is a segmental duplication?
duplicated blocks of genomic DNA ranging in size from 1-200kb
What is an inversion?
when a section of DNA breaks off and reattaches to the chromosome in reversed order
What is a transposition?
a discrete section of DNA is moved from one location in the genome to another
What is a translocation?
One piece of a chromosome breaks off and attaches to another chromosome
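The three rearrangements above can be modeled, very roughly, as string operations on a sequence. This is a simplified sketch with hypothetical function names and 0-based, end-exclusive indices. `invert` only reverses the order of the segment, as in the card; in real double-stranded DNA, an inverted segment read on one strand appears as the reverse complement.

```python
def invert(seq: str, start: int, end: int) -> str:
    """Inversion: the segment seq[start:end] is put back in reversed order."""
    return seq[:start] + seq[start:end][::-1] + seq[end:]


def transpose(seq: str, start: int, end: int, dest: int) -> str:
    """Transposition: cut seq[start:end] and reinsert it at index `dest` of what remains."""
    segment = seq[start:end]
    rest = seq[:start] + seq[end:]
    return rest[:dest] + segment + rest[dest:]


def translocate(chrom_a: str, chrom_b: str, start: int, end: int, dest: int) -> tuple[str, str]:
    """Translocation: move chrom_a[start:end] onto chrom_b at index `dest`."""
    segment = chrom_a[start:end]
    new_a = chrom_a[:start] + chrom_a[end:]
    new_b = chrom_b[:dest] + segment + chrom_b[dest:]
    return new_a, new_b
```

For example, `invert("AACCGGTT", 2, 6)` yields `"AAGGCCTT"`, and `transpose("AACCGGTT", 0, 2, 6)` moves the leading `"AA"` to the end, giving `"CCGGTTAA"`.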
What are some sources of technical differences in alignment?
- sequencing machines make mistakes
- different technologies lead to different errors: Illumina has fewer indels and more substitution errors (SNVs), including substitutions introduced by PCR
- PacBio and Nanopore, which are long-read sequencing technologies, have more indel errors
- PCR amplification is a major source of errors
How do we score alignments, and what do high scores indicate?
- high scores indicate better alignments
- each aligned position (column) is scored separately, and the alignment score is the sum over all positions
- identity (match) = +1
- substitution (mismatch) = -μ
- indel = -σ
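A minimal sketch of this scoring scheme, assuming `-` marks a space and that the mismatch and indel penalties are passed in as `mu` and `sigma` (hypothetical parameter names). Each column contributes independently, and the column scores are summed.

```python
GAP = "-"  # assumption: '-' marks an inserted space


def alignment_score(top: str, bottom: str, mu: float, sigma: float) -> float:
    """Score an alignment column by column: match +1, mismatch -mu, indel -sigma."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same length")
    score = 0.0
    for a, b in zip(top, bottom):
        if a == GAP and b == GAP:
            raise ValueError("two spaces may not be aligned")
        if a == GAP or b == GAP:
            score -= sigma  # indel
        elif a == b:
            score += 1  # identity (match)
        else:
            score -= mu  # substitution (mismatch)
    return score
```

For `"AC-GT"` over `"ACTGA"` (three matches, one indel, one mismatch) with `mu=1` and `sigma=2`, the score is 3 - 2 - 1 = 0.0; with `mu=0.5` and `sigma=1` it is 1.5.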