Bioinformatics 1 Flashcards
What are two types of pairwise alignments?
Global alignment and local alignment
What is a global alignment?
Finds the best-scoring alignment across the entire length of both sequences - key for building an MSA.
What are global alignments used for?
Similar proteins, e.g. the same protein from different species.
What is a local alignment?
Tries to identify any common domains/regions between sequences and aligns them - these will be surrounded by unaligned regions.
What happens if you use a local alignment on very similar proteins?
It will effectively just do a global alignment.
What are local alignments used for?
Protein sequences with common domains and aligning cDNA to the genomic sequence.
Why do you need to use a local alignment for aligning cDNA to the genome and not a global alignment?
A global alignment will try to force an end-to-end alignment. The exons match, but the cDNA has no introns, so the forced alignment would be wrong.
A local alignment gives a separate alignment for each exon match in the genome - this identifies the location of each exon in the genome.
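Note = a minimal Python sketch of the difference, assuming a toy scoring scheme (match +1, mismatch -1, gap -1) and made-up sequences; the function name is hypothetical. It only returns the best score, not the alignment itself, and is not how BLAST is implemented.

```python
def align_score(a, b, match=1, mismatch=-1, gap=-1, local=False):
    """Best alignment score by dynamic programming.

    local=False: global (end-to-end) alignment, Needleman-Wunsch style.
    local=True:  local (best sub-region) alignment, Smith-Waterman style.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # Global: gaps at the start of either sequence are penalised.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if local:
                # Local: a negative running score is reset to zero,
                # so unaligned flanking regions cost nothing.
                cell = max(cell, 0)
                best = max(best, cell)
            score[i][j] = cell
    return best if local else score[-1][-1]


exon = "GATTACA"                 # toy "cDNA" fragment
genome = "CCCCCGATTACACCCCC"     # the same fragment inside unrelated flanks

print(align_score(exon, genome, local=True))   # 7
print(align_score(exon, genome, local=False))  # -3
```

The local score only counts the matching region, while the global score is dragged down by the gaps needed to span the flanking sequence.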
What database does blastn use?
nucleotide
What database does blastp use?
amino acid
What is the query type and database for tblastn?
query = amino acid; database = translated nucleotide
What is the query type and database for blastx?
query = translated nucleotide; database = amino acid
What is the query type and database for tblastx?
query = translated nucleotide; database = translated nucleotide
Which one of the BLAST programs does not compare sequences at the protein level (no protein or translated sequences involved)?
blastn
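Note = the program cards above can be collected into a small lookup table. This is only a revision aid written as a Python sketch; the dictionary and function names are hypothetical and nothing here calls BLAST itself.

```python
# (query type, database type) for each BLAST program, as on the cards above.
BLAST_PROGRAMS = {
    "blastn":  ("nucleotide", "nucleotide"),
    "blastp":  ("amino acid", "amino acid"),
    "blastx":  ("translated nucleotide", "amino acid"),
    "tblastn": ("amino acid", "translated nucleotide"),
    "tblastx": ("translated nucleotide", "translated nucleotide"),
}


def compares_as_protein(program):
    """True if either side of the comparison is protein or translated."""
    query, database = BLAST_PROGRAMS[program]
    return query != "nucleotide" or database != "nucleotide"


nucleotide_only = [p for p in BLAST_PROGRAMS if not compares_as_protein(p)]
print(nucleotide_only)  # ['blastn']
```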
Why can a non exact match be as informative as a perfect match in an alignment?
Because amino acids mutate during evolution - a mutated amino acid is more likely to be maintained if it keeps the chemical/physical properties of the original amino acid. This means that substitutions between sequences can be scored.
How do you build a substitution matrix?
By choosing a group of similar proteins, aligning them, and scoring each amino acid pairing based on how frequently it is observed in the alignments.
What was the first substitution matrix and when was it developed?
PAM - point accepted mutation developed in the 1970s.
Note = by Margaret Dayhoff
How was PAM developed?
Did a global alignment on closely related proteins and looked at the sequence differences between them - used this to derive a scoring matrix for how often one amino acid mutated to another at a given position.
What type of proteins does PAM only work with?
Only works with closely related proteins.
What is the difference between PAM30 and PAM70?
The higher number in the naming scheme denotes lower sequence similarity and larger evolutionary distance.
What does BLOSUM stand for?
BLOcks SUbstitution Matrix
What kind of proteins does BLOSUM work with?
Evolutionarily divergent proteins.
Explain the basis of BLOSUM.
Built from MSAs of evolutionarily divergent proteins - looks at the conserved regions (blocks) of the proteins. Conserved regions are normally functionally important, so there is more pressure for any substituted amino acid to maintain similar properties.
How does a BLOSUM62 matrix work?
It gives the likelihood of the two aligned amino acids substituting for one another. Summing these values attributes a score to every alignment - so you can see which alignment is the best.
What does a positive BLOSUM62 score mean?
Conservative substitution - likely to happen
e.g. Lys/Arg = +2
What does a negative BLOSUM62 score mean?
Unlikely to happen.
e.g. Gly/Leu = -4
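Note = a minimal Python sketch of scoring an ungapped alignment from pairwise matrix values. Only the Lys/Arg and Gly/Leu scores come from these cards; the Lys/Lys and Gly/Gly values are assumed from the standard BLOSUM62 table, and the names used are hypothetical.

```python
# Tiny excerpt of BLOSUM62; keys are unordered residue pairs.
BLOSUM62_EXCERPT = {
    frozenset("KR"): 2,   # Lys/Arg, from the cards above
    frozenset("GL"): -4,  # Gly/Leu, from the cards above
    frozenset("K"): 5,    # Lys/Lys, assumed from the standard table
    frozenset("G"): 6,    # Gly/Gly, assumed from the standard table
}


def ungapped_score(seq_a, seq_b, matrix=BLOSUM62_EXCERPT):
    """Sum the matrix score of each aligned residue pair (no gaps)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("ungapped alignment needs equal-length sequences")
    return sum(matrix[frozenset((x, y))] for x, y in zip(seq_a, seq_b))


print(ungapped_score("KG", "KG"))  # 11 (two identities)
print(ungapped_score("KG", "RG"))  # 8  (conservative Lys/Arg substitution)
print(ungapped_score("KG", "RL"))  # -2 (Gly/Leu is penalised)
```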
What is the log odds?
The logarithm of the odds - here, the ratio of how often a pairing is observed to how often it would be expected by chance.
What is the equation for log odds?
log(observed frequency of AA pairing/expected random frequency of pairing)
What log base does BLAST use?
base 2
If the frequency of Met = 1/100 and Leu = 1/10, what is the log odds of a Met/Leu pairing if the observed frequency is 1/500?
random frequency = 1/100*1/10 = 1/1000
(1/500)/(1/1000) = 2
log2(2) = 1
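Note = the same arithmetic as a short Python sketch. The function name is hypothetical, and the expected frequency is taken as the simple product of the two residue frequencies, as on this card.

```python
import math


def log_odds(observed, freq_a, freq_b):
    """log2 of observed pair frequency over the frequency expected by chance."""
    expected = freq_a * freq_b  # chance of the two residues pairing at random
    return math.log2(observed / expected)


# Met = 1/100, Leu = 1/10, observed Met/Leu pairing = 1/500
value = log_odds(1 / 500, 1 / 100, 1 / 10)
print(round(value, 6))  # 1.0
```

A value above 0 means the pairing is seen more often than chance predicts; a value below 0 means it is seen less often.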
How are log odds simplified?
They are converted to integers.
How is precision of the score maintained when converting the score to an integer?
It is multiplied by a scaling factor.
What are scores that have been scaled and converted to integers called?
Raw scores
What scaling factor is used for BLOSUM matrices?
2
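Note = a small Python sketch of why scaling before rounding keeps precision. The log-odds values 0.7 and 1.2 are made up for illustration, and the function name is hypothetical.

```python
def to_raw_score(log_odds_bits, scale=2):
    """Multiply a log-odds value by the scaling factor, then round to an integer."""
    return round(log_odds_bits * scale)


# Without scaling, two different log-odds values collapse to the same integer.
print(to_raw_score(0.7, scale=1), to_raw_score(1.2, scale=1))  # 1 1
# With the scaling factor of 2 they stay distinguishable.
print(to_raw_score(0.7), to_raw_score(1.2))                    # 1 2
```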
How is the normalised score calculated from the raw score, and why is this done?
Use the matrix specific constant lambda.
Done because raw scores can be misleading as the scaling factors are arbitrary.
What is lambda?
It is approximately the inverse of the original scaling factor - however, the value may differ due to integer rounding errors.
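Note = a hedged Python sketch of the normalisation, using the usual Karlin-Altschul form: bit score = (lambda * raw score - ln K) / ln 2. K is a second matrix-specific constant not covered on these cards. The values passed in below are idealised assumptions (lambda = ln(2)/2 for a matrix in exact half-bit units, K = 1); the values BLAST actually uses differ and depend on the matrix and gap costs.

```python
import math


def bit_score(raw_score, lam, k):
    """Convert a raw score to a normalised bit score."""
    return (lam * raw_score - math.log(k)) / math.log(2)


# Idealised half-bit matrix: lambda undoes the scaling factor of 2,
# so a raw score of 10 corresponds to about 5 bits.
ideal_lambda = math.log(2) / 2
print(round(bit_score(10, lam=ideal_lambda, k=1.0), 6))  # 5.0
```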
What does BLAST also introduce to achieve the best possible alignment?
Gaps
What is the penalty for opening a gap and why is it so high?
Penalty = -11
From an evolutionary perspective, insertions and deletions are costly, so the penalty is high.
What is the penalty for extending a gap?
-1
What is the penalty for opening a gap and extending a gap in nucleotide alignments?
Opening = -5; extending = -2
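Note = a minimal Python sketch of an affine gap penalty using the values on these cards. It assumes the opening penalty is charged once per gap and the extension penalty for every gapped position; conventions differ between tools, so treat that as an assumption. The function name is hypothetical.

```python
def gap_penalty(length, opening=-11, extending=-1):
    """Total penalty for one gap spanning `length` positions."""
    if length <= 0:
        return 0
    return opening + length * extending


# Protein defaults from the cards: one long gap vs. several short ones.
print(gap_penalty(3))                        # -14
print(3 * gap_penalty(1))                    # -36
# Nucleotide values from the cards.
print(gap_penalty(3, opening=-5, extending=-2))  # -11
```

Because opening is much more expensive than extending, a single long gap is preferred over many scattered short gaps.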
Define BLAST score.
The score is calculated by incrementing for matches/similarities and decrementing for mismatches and gaps.
Define BLAST identity.
The number of residues that are identical in the alignment.
Define BLAST gaps.
The number of gaps in the alignment.
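Note = a short Python sketch of counting identities and gaps, assuming the alignment is given as two equal-length strings with "-" marking a gap position. The sequences and the function name are made up for illustration.

```python
def alignment_stats(aligned_a, aligned_b, gap_char="-"):
    """Return (identities, gaps, alignment length) for two aligned strings."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    identities = 0
    gaps = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap_char or y == gap_char:
            gaps += 1          # a gapped column
        elif x == y:
            identities += 1    # identical residues
    return identities, gaps, len(aligned_a)


print(alignment_stats("ACG-TA", "ACGTTA"))  # (5, 1, 6)
```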
Define BLAST e-value.
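It is the number of alignments with a score at least this high that would be expected by chance in a database of this size - the lower the E-value, the more significant the alignment.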
What does the E-value take into account?
Size of the query sequence, length of the alignment, overall score and the size of the queried database.
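Note = a hedged Python sketch of how these quantities combine, using the standard relation E = m * n * 2^(-bit score), where m is the query length and n the total database length. BLAST itself uses adjusted ("effective") lengths, so this is a simplification; the numbers and the function name are made up for illustration.

```python
def e_value(bit_score, query_len, db_len):
    """Expected number of chance alignments scoring at least `bit_score`."""
    return query_len * db_len * 2 ** (-bit_score)


small_db = e_value(bit_score=40, query_len=100, db_len=1_000_000)
large_db = e_value(bit_score=40, query_len=100, db_len=2_000_000)
better_hit = e_value(bit_score=41, query_len=100, db_len=1_000_000)

print(large_db / small_db)    # 2.0 - doubling the database doubles E
print(better_hit / small_db)  # 0.5 - one extra bit of score halves E
```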