Dot Plot Flashcards

1
Q

Dot plot

A

Used to determine the similarity and variability between sequences

Compares two sequences (pairwise sequence alignment) or more sequences (multiple sequence alignment)

2
Q

Similarity of the two sequences depends on

A
  1. The number and length of matching segments in the matrix
  2. The longer the diagonal line, the higher the similarity between the sequences
    (Insertions and deletions disrupt the diagonal)
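
A minimal sketch of the idea on this card, assuming a simple exact-match dot plot with no window or threshold. The function names (`dot_matrix`, `longest_diagonal`, `render`) are hypothetical and chosen only for illustration. It shows how a single deletion breaks one long diagonal into shorter segments.

```python
def dot_matrix(seq_a, seq_b):
    """Return a matrix with True wherever seq_a[i] equals seq_b[j]."""
    return [[a == b for b in seq_b] for a in seq_a]


def longest_diagonal(matrix):
    """Length of the longest unbroken run of dots along any forward diagonal."""
    rows = len(matrix)
    cols = len(matrix[0]) if rows else 0
    best = 0
    prev = [0] * (cols + 1)  # run lengths ending in the previous row
    for i in range(rows):
        curr = [0] * (cols + 1)
        for j in range(cols):
            if matrix[i][j]:
                # Extend the run that ended one step up and to the left.
                curr[j + 1] = prev[j] + 1
                best = max(best, curr[j + 1])
        prev = curr
    return best


def render(matrix):
    """Draw the plot as text: '*' for a dot, '.' for no match."""
    return "\n".join("".join("*" if cell else "." for cell in row) for row in matrix)


identical = dot_matrix("GATTACA", "GATTACA")
with_deletion = dot_matrix("GATTACA", "GATACA")  # one T deleted

print(render(with_deletion))
print(longest_diagonal(identical))      # 7: one unbroken principal diagonal
print(longest_diagonal(with_deletion))  # 4: the deletion splits the diagonal
```

The longest run is only a rough proxy for similarity; real tools also consider the number of matching segments.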
3
Q

Diagonal lines

A

Principal diagonal
Sub diagonal
Forward subdiagonal
Backward subdiagonal

4
Q

The direction of the sequences on the axes will determine

A

The direction of the line on the dot plot

5
Q

What causes multiple lines to be plotted

A

Frameshifts
Inverted repeat sequences

6
Q

Software used to create dot plots

A

Anacon
D-Genies
Dotlet
Dotmatcher
Dot plot archived

7
Q

Frameshift mutation

A

Also called a framing error or reading frame shift

Caused by insertions or deletions of a number of nucleotides that is not divisible by three in a DNA sequence

8
Q

Inverted Repeat Sequences

A

Copies of a nucleic acid sequence arranged in opposing orientation

May lie in tandem
May be separated by some sequence that is not part of the repeat (hyphenated)

Palindromic repeats
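
A short sketch of how inverted repeats can be recognised, assuming DNA written with the letters A, C, G and T only. The helper names and the `arm` parameter (the length of each repeat copy) are hypothetical. A palindromic repeat is the case where the sequence equals its own reverse complement; a hyphenated inverted repeat has a spacer between the two copies.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complement each base, then reverse the strand."""
    return seq.translate(COMPLEMENT)[::-1]


def is_palindromic(seq):
    """True when the sequence reads the same on both strands."""
    return seq == reverse_complement(seq)


def find_inverted_repeats(seq, arm=4):
    """Return (i, j) start positions where seq[j:j+arm] is the reverse
    complement of seq[i:i+arm] and the two copies do not overlap."""
    hits = []
    last_start = len(seq) - arm
    for i in range(last_start + 1):
        target = reverse_complement(seq[i:i + arm])
        for j in range(i + arm, last_start + 1):
            if seq[j:j + arm] == target:
                hits.append((i, j))
    return hits


print(is_palindromic("GAATTC"))               # True (the EcoRI site)
print(find_inverted_repeats("ACGGTTTCCGT"))   # [(0, 7)]: ACGG ... CCGT with spacer TTT
```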

9
Q

Advantages of dot plots

A
  1. Reveal the presence of insertions/deletions
  2. Reveal direct and inverted repeats that are difficult to find
10
Q

Disadvantages of dot plots

A

Dot plot programs do not show an actual alignment
They do not return a score to show how optimal a given alignment is

11
Q

Applications of dot plot

A
  1. Sequence alignment: identify regions of similarity and dissimilarity between two sequences
  2. Genome assembly: used to compare two genomes or different regions of the same genome to identify structural variations
  3. Repeat analysis: used to identify repetitive elements within a sequence which is useful for genome annotation and analysis
  4. Identification of conserved domains: used to identify conserved domains within a protein sequence providing insights into the function of the protein.
  5. Phylogenetic analysis: used to compare the similarity between sequences from different organisms, which can be useful for constructing phylogenetic trees and inferring evolutionary relationships.
12
Q

Limitations of dot plot

A
  1. Sensitivity to sequence length and complexity: plots of sequences that are highly repetitive or have complex structure are difficult to interpret because of the high number of dots and the lack of clear patterns.
  2. Sensitivity to sequence alignment: The interpretation of a dot plot can be highly dependent on the alignment of the sequences being compared. If the alignment is poor or incorrect, the resulting dot plot may be difficult to interpret or misleading.
  3. Limited scalability: dot plots become unwieldy for large sequences or datasets, as the plot size increases with the square of the sequence length. This can make it difficult to visualize and analyze large datasets using dot plots.
  4. Limited ability to identify subtle similarities: While dot plots can be useful for identifying regions of high similarity between sequences, they may not be sensitive enough to identify more subtle similarities or differences between sequences.
  5. Dependence on chosen window size and threshold: The interpretation of a dot plot can be highly influenced by the choice of window size and threshold used to generate the plot. Different window sizes and thresholds may result in different patterns in the dot plot, leading to different interpretations of the data.
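
The dependence on window size and threshold can be sketched as follows. This is an illustrative filter, not the scheme of any particular tool: the function name `windowed_dot_plot` and its parameters are assumptions, and matches are counted as plain identities rather than with a scoring matrix.

```python
def windowed_dot_plot(seq_a, seq_b, window=3, threshold=2):
    """Place a dot at (i, j) when at least `threshold` of the `window`
    positions starting at i in seq_a and j in seq_b are identical."""
    dots = set()
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            matches = sum(seq_a[i + k] == seq_b[j + k] for k in range(window))
            if matches >= threshold:
                dots.add((i, j))
    return dots


seq = "ACGTACGT"
noisy = windowed_dot_plot(seq, seq, window=1, threshold=1)
strict = windowed_dot_plot(seq, seq, window=4, threshold=4)
print(len(noisy))   # 16 dots: every single-base match is plotted
print(len(strict))  # 7 dots: only the diagonal and the two repeat hits remain
```

The same pair of sequences gives a crowded plot or a sparse one depending only on these two parameters, which is why they have to be reported alongside the plot.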
13
Q

Sequence alignment

A

Sequence alignment is a fundamental technique in bioinformatics that is used to compare two or more sequences of DNA, RNA, or protein.

The goal of sequence alignment is to identify regions of similarity and difference between the sequences, which can provide insights into their evolutionary relationships, functional similarities, and structural features.

14
Q

Pairwise alignment

A

Pairwise alignment is the comparison of two sequences to identify regions of similarity and difference.
E.g., the Needleman-Wunsch algorithm

15
Q

Multiple sequence alignment

A

The comparison of three or more sequences to identify regions of similarity and difference.
E.g., progressive alignment algorithms

16
Q

Principle of pairwise alignment

A

Reveals homology between sequences

17
Q

Global alignment

A

Tries to align two related sequences over their entire length
Aligns all letters from query and target
Suitable for closely related sequences

E.g., the Needleman-Wunsch algorithm
EMBOSS Needle (nucleotide sequences)
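
A compact sketch of Needleman-Wunsch global alignment, assuming a simple scoring scheme (match +1, mismatch -1, linear gap penalty -1). These values are illustrative assumptions, not the defaults of EMBOSS Needle, which uses a substitution matrix and separate gap open and extend penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Return (score, aligned_a, aligned_b) for a global alignment."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    # First row and column: aligning a prefix against nothing costs gaps.
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner so every letter is aligned.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


total, top, bottom = needleman_wunsch("GATTACA", "GATACA")
print(total)   # 5: six matches and one gap
print(top)
print(bottom)
```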

18
Q

The higher the number of gaps in an alignment, the ...

A

Lower the similarity

19
Q

Available web servers for pairwise global alignment

A

EMBL-EBI EMBOSS
NCBI

20
Q

Local alignment

A

Aligns regions with the highest similarity
Suitable for more divergent sequences

E.g., the Smith-Waterman algorithm

For determining conserved patterns in DNA or RNA sequences
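
A sketch of Smith-Waterman local alignment under assumed scores (match +2, mismatch -1, linear gap penalty -2); the values are illustrative only. The difference from global alignment is that cell scores are floored at zero and the traceback starts at the best cell and stops at the first zero, so only the most similar region is reported.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for the best local alignment."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # The 0 lets an alignment restart instead of carrying a penalty.
            score[i][j] = max(0,
                              score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)
    # Trace back from the highest-scoring cell until a zero is reached.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i -= 1; j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("AAAAGATTACATTTT", "CCCGATTACAGGG"))
# (14, 'GATTACA', 'GATTACA'): the unrelated flanks are left out
```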

21
Q

Identical sequences

A

Homologous
Similar conserved motifs
Conserved residues and conserved secondary protein structures

22
Q

The function of most proteins is assigned based on

A

Homology to other known proteins rather than on the basis of results from biochemical or functional assays

23
Q

Advantages of multiple sequence alignment over pairwise alignment

A

Multiple alignments have regions that show consistent patterns of insertions and deletions

They are very powerful, since three sequences can be aligned when two sequences can't be aligned together

24
Q

Motif

A

A nucleic acid sequence
Has some biological significance, such as being a DNA binding site for a regulatory protein

25
Q

Approaches in doing multiple sequence alignment

A
  1. Exact approach:
    Employs dynamic programming, although the matrix is multidimensional
    Aims to maximize the summed alignment score of each pair of sequences
  2. Progressive sequence alignment:
    Calculates pairwise sequence alignment scores between all the sequences, then progressively adds more sequences to the alignment
  3. Consistency-based method:
    A method is consistent if it produces the same alignment when applied to multiple sequences that are known to be homologous.
  4. Iterative approach:
    Uses an initial alignment to guide the search for a better alignment. An example is the iterative refinement algorithm.
  5. Structure-based method:
    Uses the three-dimensional structure of proteins or nucleic acids to guide the alignment of their sequences.
    The structural alignment can be performed using various algorithms, such as DALI, CE, and TM-align, which compare the shapes and properties of the molecules to identify similar regions.
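
The objective of the exact approach, the score summed over every pair of sequences, can be sketched as a sum-of-pairs score. The scoring values (match +1, mismatch -1, gap -1, gap against gap 0) and the function names are assumptions made for illustration.

```python
from itertools import combinations


def pair_score(x, y, match=1, mismatch=-1, gap=-1):
    """Score one pair of symbols taken from the same alignment column."""
    if x == "-" and y == "-":
        return 0  # two gaps facing each other carry no information
    if x == "-" or y == "-":
        return gap
    return match if x == y else mismatch


def sum_of_pairs(alignment):
    """Sum pair_score over every pair of rows in every column."""
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all aligned rows must have the same length")
    total = 0
    for column in zip(*alignment):
        for x, y in combinations(column, 2):
            total += pair_score(x, y)
    return total


print(sum_of_pairs(["ACGT", "A-GT", "ACGA"]))  # 4
```

An exact method searches for the alignment that maximizes this kind of score; the sketch only evaluates one given alignment.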
26
Q

Benefit and limitation of progressive sequence alignment

A

B: Permits the rapid alignment of hundreds of sequences

L: Final alignment depends on the order in which the sequences are joined