Dot Plot Flashcards
Dot plot
Used to determine the similarity and variability between sequences
Compares two sequences (pairwise sequence alignment) or more than two sequences (multiple sequence alignment)
Similarity of the two sequences depends on
- The number and length of matching segments in the matrix
- The longer the diagonal line, the higher the similarity between the sequences
(Insertions and deletions disrupt the diagonal)
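Worked example: a minimal sketch of how matching segments show up as diagonal runs. The function names (dot_matrix, longest_diagonal) and the example sequences are illustrative assumptions, not taken from any of the tools listed in these cards.

```python
def dot_matrix(seq_x, seq_y):
    """Return a matrix with 1 where seq_y[i] == seq_x[j], else 0.

    seq_x runs along the horizontal axis, seq_y along the vertical axis.
    """
    return [[1 if a == b else 0 for b in seq_x] for a in seq_y]


def longest_diagonal(matrix):
    """Length of the longest unbroken run of dots on any forward diagonal."""
    rows = len(matrix)
    cols = len(matrix[0]) if rows else 0
    # run[i + 1][j + 1] holds the length of the diagonal run ending at (i, j).
    run = [[0] * (cols + 1) for _ in range(rows + 1)]
    best = 0
    for i in range(rows):
        for j in range(cols):
            if matrix[i][j]:
                run[i + 1][j + 1] = run[i][j] + 1
                best = max(best, run[i + 1][j + 1])
    return best


# A single substitution breaks the main diagonal into two shorter runs.
matrix = dot_matrix("GATTACA", "GATCACA")
for row in matrix:
    print("".join("*" if cell else "." for cell in row))
print(longest_diagonal(matrix))  # 3
```

Comparing a sequence with itself gives an unbroken main diagonal as long as the sequence.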
Diagonal lines
Principal diagonal
Sub diagonal
Forward subdiagonal
Backward subdiagonal
The direction of the sequences on the axes will determine
The direction of the line on the dot plot
What causes multiple lines to be plotted
Frameshifts
Inverted repeat sequences
Software used to create dot plots
Anacon
D-Genies
Dotlet
Dotmatcher
Dotplot (archived)
Frameshift mutation
Also called a framing error or reading frame shift
Caused by insertion or deletion of a number of nucleotides not divisible by three in a DNA sequence
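Worked example: a small sketch of why an insertion or deletion shifts the reading frame only when its length is not a multiple of three. The helper names (codons, is_frameshift) and the example sequence are assumptions made for illustration.

```python
def codons(seq):
    """Split a DNA string into complete triplets, dropping trailing bases."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def is_frameshift(indel_length):
    """An indel shifts the frame only if its length is not a multiple of 3."""
    return indel_length % 3 != 0


original = "ATGGCCTTAGGA"
# Hypothetical mutant: a single C inserted after the fourth base.
mutant = original[:4] + "C" + original[4:]

print(codons(original))  # ['ATG', 'GCC', 'TTA', 'GGA']
print(codons(mutant))    # ['ATG', 'GCC', 'CTT', 'AGG']
print(is_frameshift(1), is_frameshift(3))  # True False
```

Every codon downstream of the insertion changes, which is why a frameshift shows up on a dot plot as the match jumping to a different diagonal.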
Inverted Repeat Sequences
Copies of nucleic acid sequence arranged in opposing orientation
May lie in tandem (directly adjacent to each other)
Separated by some sequence that is not part of the repeat (hyphenated)
Palindromic repeats
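Worked example: a sketch of checking for inverted repeats by comparing one arm with the reverse complement of the other. The names (reverse_complement, is_palindromic, find_inverted_repeats), the arm length and the gap limit are illustrative assumptions; only the bases A, C, G and T are handled.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complement each base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


def is_palindromic(seq):
    """A palindromic repeat reads the same as its own reverse complement."""
    return seq == reverse_complement(seq)


def find_inverted_repeats(seq, arm, max_gap):
    """Return (left_start, right_start, gap) for each inverted repeat found.

    The right arm must equal the reverse complement of the left arm and
    start at most max_gap bases after the left arm ends.
    """
    hits = []
    for i in range(len(seq) - arm + 1):
        target = reverse_complement(seq[i:i + arm])
        for gap in range(max_gap + 1):
            j = i + arm + gap
            if seq[j:j + arm] == target:
                hits.append((i, j, gap))
    return hits


print(is_palindromic("GAATTC"))  # True
# Two 5-base arms separated by a 4-base spacer.
print(find_inverted_repeats("TTACGAAAACGTAA", arm=5, max_gap=4))
```

A gap of 0 corresponds to arms lying in tandem; a larger gap corresponds to arms separated by unrelated sequence.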
Advantages of dot plots
- Reveals the presence of insertions/deletions
- Reveals direct & inverted repeats that are difficult to find
Disadvantages of dot plots
Computational programs don’t show an actual alignment
Doesn’t return a score to show how optimal a given alignment is
Applications of dot plot
- Sequence alignment: identify regions of similarity and dissimilarity between 2 sequences
- Genome assembly: used to compare two genomes or different regions of the same genome to identify structural variations
- Repeat analysis: used to identify repetitive elements within a sequence which is useful for genome annotation and analysis
- Identification of conserved domains: used to identify conserved domains within a protein sequence providing insights into the function of the protein.
- Phylogenetic analysis: used to compare the similarity between sequences from different organisms, which can be useful for constructing phylogenetic trees and inferring evolutionary relationships.
Limitations of dot plot
- Sensitivity to sequence length and complexity: plots of sequences that are highly repetitive or have complex structure are hard to interpret because of the high number of dots and the lack of clear patterns.
- Sensitivity to sequence alignment: The interpretation of a dot plot can be highly dependent on the alignment of the sequences being compared. If the alignment is poor or incorrect, the resulting dot plot may be difficult to interpret or misleading.
- Limited scalability: dot plots become unwieldy for large sequences or datasets, because the plot size increases with the square of the sequence length. This makes large datasets difficult to visualize and analyze.
- Limited ability to identify subtle similarities: While dot plots can be useful for identifying regions of high similarity between sequences, they may not be sensitive enough to identify more subtle similarities or differences between sequences.
- Dependence on chosen window size and threshold: The interpretation of a dot plot can be highly influenced by the choice of window size and threshold used to generate the plot. Different window sizes and thresholds may result in different patterns in the dot plot, leading to different interpretations of the data.
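Worked example: a sketch of how window size and threshold filter a dot plot. A dot is kept only when at least `threshold` of the `window` characters match along the diagonal. The function name windowed_dots and its parameter names are assumptions for illustration and do not mirror any specific tool's options.

```python
def windowed_dots(seq_x, seq_y, window=3, threshold=3):
    """Return (i, j) window start positions that pass the match threshold.

    i indexes seq_y (vertical axis), j indexes seq_x (horizontal axis).
    """
    dots = []
    for i in range(len(seq_y) - window + 1):
        for j in range(len(seq_x) - window + 1):
            matches = sum(
                seq_y[i + k] == seq_x[j + k] for k in range(window)
            )
            if matches >= threshold:
                dots.append((i, j))
    return dots


a, b = "GATTACAGATTACA", "GATCACAGATTACA"
strict = windowed_dots(a, b, window=3, threshold=3)
relaxed = windowed_dots(a, b, window=3, threshold=2)
# Lowering the threshold can only keep the same dots or add more.
print(len(strict), len(relaxed))
```

A stricter threshold removes background noise but can also hide weakly conserved regions, which is the trade-off this card describes.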
Sequence alignment
Sequence alignment is a fundamental technique in bioinformatics that is used to compare two or more sequences of DNA, RNA, or protein.
The goal of sequence alignment is to identify regions of similarity and difference between the sequences, which can provide insights into their evolutionary relationships, functional similarities, and structural features.
Pairwise alignment
Pairwise alignment is the comparison of two sequences to identify regions of similarity and difference.
Eg: Needleman-Wunsch algorithm
Multiple sequence alignment
The comparison of three or more sequences to identify regions of similarity and difference. Eg: progressive alignment algorithms
Principle of pairwise alignment
Reveals homology between sequences
Global alignment
Tries to align two related sequences over their entire length
Aligns all letters from query and target
Suitable for closely related sequences
Eg: Needleman-Wunsch algorithm
EMBOSS Needle Nucleotide Sequences
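Worked example: a compact sketch of Needleman-Wunsch global alignment with a linear gap penalty. The scoring values (match=1, mismatch=-1, gap=-1) are assumptions chosen for illustration; real tools such as EMBOSS Needle use substitution matrices and separate gap open and extend penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Return (score, aligned_a, aligned_b) for a global alignment."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    # First row and column: aligning a prefix against nothing costs gaps.
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
    # Trace back from the bottom-right corner to the top-left corner.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


print(needleman_wunsch("ACGT", "AGT"))  # (2, 'ACGT', 'A-GT')
```

Every letter of both sequences appears in the output, which is what makes the alignment global.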
The higher the number of gaps in an alignment, the ….
Lower the similarity
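Worked example: a sketch that counts gaps and identical columns in a finished pairwise alignment. Percent identity is used here as a simple stand-in for similarity, which is an assumption; the function name alignment_stats is also illustrative.

```python
def alignment_stats(top, bottom):
    """Return gap count and fraction of identical columns for two aligned rows."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have equal length")
    if not top:
        return {"gaps": 0, "identity": 0.0}
    columns = list(zip(top, bottom))
    gaps = sum(1 for x, y in columns if x == "-" or y == "-")
    identical = sum(1 for x, y in columns if x == y and x != "-")
    return {"gaps": gaps, "identity": identical / len(columns)}


print(alignment_stats("ACGT", "ACGT"))  # {'gaps': 0, 'identity': 1.0}
print(alignment_stats("ACGT", "A-GT"))  # {'gaps': 1, 'identity': 0.75}
```

Each gapped column is a column that cannot count as identical, so identity falls as gaps accumulate.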
Available web servers for pairwise global alignment
EMBL-EBI EMBOSS
NCBI
Local alignment
Aligns regions with the highest similarity
Suitable for more divergent sequences
Eg: Smith-Waterman algorithm
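Worked example: a compact sketch of Smith-Waterman local alignment. It differs from the global method in two ways: cell scores are never allowed to drop below zero, and the traceback starts at the highest-scoring cell and stops at a zero. The scoring values (match=2, mismatch=-1, gap=-2) are assumptions for illustration.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for the best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                0,                          # start a fresh local alignment
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)
    # Trace back from the best cell until the score falls to zero.
    top, bottom = [], []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


# Only the shared core is reported; the unrelated flanks are ignored.
print(smith_waterman("CCCCGATTACACCCC", "TTTGATTACATTT"))
# (14, 'GATTACA', 'GATTACA')
```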
For determining conserved patterns in DNA or RNA sequences
Identical sequences
Homologous
Similar conserved motifs
Conserved residues and conserved secondary protein structures
The function of most proteins is assigned based on
Homology to other known proteins rather than on the basis of result from biochemical or functional assays
Advantages of multiple sequence alignment over pairwise alignment
Multiple alignments have regions that show consistent patterns of insertions and deletions
They are very powerful because a third sequence can help align two sequences that cannot be aligned reliably on their own
Motif
Nucleic acid sequence
Has some biological significance such as being DNA binding sites for regulatory proteins
Approaches in doing multiple sequence alignment
- Exact approach: employs dynamic programming, although the matrix is multidimensional. Aims to maximize the summed alignment score of each pair of sequences.
- Progressive sequence alignment: calculates pairwise alignment scores between all the sequences, then progressively adds more sequences to the alignment.
- Consistency-based method: uses information from all the pairwise alignments when scoring residue pairs, so that the final multiple alignment agrees with as many of the pairwise alignments as possible. Eg: T-Coffee, ProbCons
- Iterative approach: uses an initial alignment to guide the search for a better alignment, repeatedly realigning subsets of the sequences until the score stops improving. Eg: the iterative refinement stages of MUSCLE and MAFFT
- Structure-based method: uses the three-dimensional structure of proteins or nucleic acids to guide the alignment of their sequences. The structural alignment can be performed using various algorithms, such as DALI, CE, and TM-align, which compare the shapes and properties of the molecules to identify similar regions.
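Worked example: the exact approach above is described as maximizing the summed score of each pair of sequences. A minimal sketch of that sum-of-pairs score for an existing multiple alignment follows. The scoring values and the function names (pair_score, sum_of_pairs) are assumptions for illustration, and the sketch only scores an alignment; it does not search for the best one.

```python
from itertools import combinations


def pair_score(x, y, match=1, mismatch=-1, gap=-2):
    """Score one pair of characters taken from the same alignment column."""
    if x == "-" and y == "-":
        return 0  # two gaps facing each other are ignored
    if x == "-" or y == "-":
        return gap
    return match if x == y else mismatch


def sum_of_pairs(alignment):
    """Sum pair_score over every pair of rows in every column."""
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all aligned rows must have the same length")
    total = 0
    for column in zip(*alignment):
        for x, y in combinations(column, 2):
            total += pair_score(x, y)
    return total


print(sum_of_pairs(["ACGT", "ACGT", "ACGT"]))  # 12
print(sum_of_pairs(["AC-T", "ACGT", "A-GT"]))  # 0
```

The number of row pairs per column grows quadratically with the number of sequences, and the exact search space grows much faster still, which is why heuristic approaches such as progressive alignment are used for large sets.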
Benefit and limitation of progressive sequence alignment
B: Permits the rapid alignment of hundreds of sequences
L: Final alignment depends on the order in which the sequences were joined