Alignment Flashcards
What is sequence alignment?
A way of arranging DNA, RNA, or protein sequences to identify regions of similarity that may reflect functional, structural, or evolutionary relationships between the sequences.
What is global alignment?
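Aligning two sequences end to end to find the best match across their entire length. It works best when the sequences are of similar length and related throughout.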
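A minimal sketch of how a global alignment score can be computed with the Needleman-Wunsch dynamic programming method. The function name and the match, mismatch, and gap values are illustrative assumptions, not part of the flashcard.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch score: both sequences are aligned end to end.

    The scoring values are illustrative assumptions.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # Leading gaps are penalized, because the whole sequence must be aligned.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap in b
            left = score[i][j - 1] + gap    # gap in a
            score[i][j] = max(diag, up, left)

    # The global score is read from the bottom-right cell.
    return score[-1][-1]


print(global_alignment_score("ACGT", "AGT"))  # three matches and one gap
```

Because the score is taken from the final cell, every residue of both sequences contributes to it, including any gaps at the ends.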
What is local alignment?
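Aligning only the most similar regions within the sequences and ignoring the rest. It is useful when sequences share a domain or motif but differ elsewhere.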
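A minimal sketch of local alignment scoring in the style of the Smith-Waterman method. The function name and scoring values are illustrative assumptions.

```python
def local_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Smith-Waterman score: best-scoring pair of subsequences.

    The scoring values are illustrative assumptions.
    """
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Flooring at 0 lets an alignment restart anywhere,
            # so poorly matching flanks are simply left out.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    # The local score is the highest cell anywhere in the table.
    return best


print(local_alignment_score("TTTTGATTACATTTT", "CCCGATTACACCC"))
```

Here only the shared GATTACA region contributes to the score; the unrelated flanking residues are not penalized.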
Pairwise Sequence Alignment
Alignment of two sequences, used to decide whether two proteins or genes are related structurally, functionally, or evolutionarily.
Multiple Sequence Alignment
Aligns three or more sequences; can be either global or local.
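T/F: When carrying out alignment, protein sequences are more informative than DNA.
True. Protein sequences are more conserved than DNA sequences: the genetic code is degenerate, so many DNA changes do not alter the encoded amino acid, and a 20-letter alphabet makes chance matches less likely than a 4-letter one.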
When is DNA alignment appropriate?
To confirm the identity of a DNA sequence
To study non-coding regions
To study DNA polymorphisms (changes in DNA sequences)
What are homologs?
Homologs are genes or proteins that share a common evolutionary origin, that is, they descend from the same ancestral sequence.
What are orthologs?
Orthologs are homologous genes in different species that evolved from a common ancestral gene through a speciation event. They usually retain the same function.
What are paralogs?
Paralogs are homologous genes within the same species that arose from a gene duplication event. They have similar sequences but may have diverged in function, potentially taking on new roles or functions within the same organism.
What is percentage identity and how do you calculate it?
The percentage of positions in an alignment that have identical amino acids in both protein sequences. Calculate it as (number of identical positions / number of aligned positions) × 100.
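A short sketch of the calculation, assuming the two sequences are already aligned, gaps are written as "-", and the denominator is the full alignment length (tools differ on this convention). The function name and example sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percentage of alignment columns holding the same residue.

    Assumes pre-aligned, equal-length strings with '-' for gaps, and
    uses the full alignment length as the denominator (one of several
    conventions in use).
    """
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("sequences must be aligned, non-empty, and equal in length")
    identical = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-"
    )
    return 100.0 * identical / len(aligned_a)


# Hypothetical aligned pair: 6 identical columns out of 8.
print(percent_identity("MKT-AYIA", "MKTQAYLA"))  # 75.0
```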
What is percentage similarity and how do you calculate it?
The percentage of aligned positions whose residues are either identical or can readily substitute for each other (conservative substitutions).
For example, amino acids with similar properties (like charge or size) are counted as positive matches. Calculate it as ((identical + similar positions) / number of aligned positions) × 100.
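A rough sketch of the calculation. The grouping of amino acids below is a simplified, hypothetical classification by broad physicochemical properties; real tools usually count a pair as similar when its substitution-matrix score is positive. The denominator is again assumed to be the full alignment length.

```python
# Hypothetical, simplified grouping by broad physicochemical class.
SIMILARITY_GROUPS = ["AVLIM", "FWY", "KRH", "DE", "STNQ", "C", "G", "P"]
GROUP_OF = {aa: idx for idx, group in enumerate(SIMILARITY_GROUPS) for aa in group}


def percent_similarity(aligned_a, aligned_b):
    """Percentage of columns that are identical or in the same group.

    Assumes pre-aligned, equal-length strings with '-' for gaps.
    """
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("sequences must be aligned, non-empty, and equal in length")
    positives = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            continue  # a gap is never a positive match
        if x == y or (x in GROUP_OF and GROUP_OF[x] == GROUP_OF.get(y)):
            positives += 1
    return 100.0 * positives / len(aligned_a)


# Hypothetical aligned pair: 6 identical columns plus I/L as a similar pair.
print(percent_similarity("MKT-AYIA", "MKTQAYLA"))  # 87.5
```

Similarity is always at least as high as identity for the same alignment, since every identical position also counts as similar.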
Explain the need for scoring in alignment programs
Scoring in alignment programs helps evaluate the quality of sequence alignments, aiding in the identification of evolutionarily related sequences.
What are the differences in scoring DNA and protein sequences?
DNA and protein sequences differ in scoring due to distinct alphabets: DNA (A, C, G, T) and proteins (20 amino acids).
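Protein scoring matrices such as BLOSUM account for amino acid properties and observed substitution frequencies, while DNA scoring usually relies on simple match/mismatch values and may distinguish transitions (purine to purine or pyrimidine to pyrimidine) from transversions (purine to pyrimidine or the reverse).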
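A small sketch of a DNA scoring scheme that treats transitions (A/G or C/T changes) more leniently than transversions (a purine swapped for a pyrimidine or the reverse). The function names and score values are illustrative assumptions.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(x, y):
    """Label a pair of bases as a match, transition, or transversion."""
    if x not in PURINES | PYRIMIDINES or y not in PURINES | PYRIMIDINES:
        raise ValueError("expected bases from A, C, G, T")
    if x == y:
        return "match"
    if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
        return "transition"      # same chemical class
    return "transversion"        # purine <-> pyrimidine


def dna_score(x, y, match=1, transition=-1, transversion=-2):
    """Score one aligned base pair; the values are illustrative assumptions."""
    kind = classify_substitution(x, y)
    return {"match": match, "transition": transition, "transversion": transversion}[kind]


print(classify_substitution("A", "G"), dna_score("A", "G"))
print(classify_substitution("A", "T"), dna_score("A", "T"))
```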
Explain the possible need for assigning different values to amino acid substitutions and the scoring matrices used.
Different values for amino acid substitutions reflect the degree of biochemical similarity or dissimilarity, allowing the consideration of conservative and non-conservative changes in protein sequences.
For instance, conservative substitutions are scored higher than non-conservative ones because they are more likely to be tolerated in protein structures without affecting function. Scoring matrices such as PAM and BLOSUM encode these values.
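A few entries from the BLOSUM62 matrix illustrate how substitution values differ. Only a handful of pairs are included here, and the helper names are hypothetical; a real program would load the full matrix.

```python
# A small excerpt of BLOSUM62 (the full matrix covers all residue pairs).
BLOSUM62_EXCERPT = {
    ("L", "L"): 4,
    ("D", "D"): 6,
    ("L", "I"): 2,   # conservative: both hydrophobic
    ("D", "E"): 2,   # conservative: both acidic
    ("L", "D"): -4,  # non-conservative: hydrophobic vs. acidic
}


def pair_score(x, y):
    """Look up a pair in either order; raises KeyError if not in the excerpt."""
    if (x, y) in BLOSUM62_EXCERPT:
        return BLOSUM62_EXCERPT[(x, y)]
    return BLOSUM62_EXCERPT[(y, x)]


def ungapped_score(seq_a, seq_b):
    """Sum pair scores over two equal-length, gap-free sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    return sum(pair_score(x, y) for x, y in zip(seq_a, seq_b))


print(ungapped_score("LD", "IE"))  # two conservative substitutions
print(ungapped_score("LD", "DL"))  # two non-conservative substitutions
```

The conservative pairs score positively and the non-conservative pair scores negatively, so an alignment built from conservative changes ranks higher.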