Sequencing Databases Flashcards
Define homology.
Similarity believed to have arisen as a result of descent from a common ancestral sequence
What is % identity?
The percentage of aligned positions at which both sequences have the same residue
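A minimal sketch of the calculation, assuming the two sequences are already aligned, have equal length, and use "-" for gaps. The function name and the choice to skip gapped columns are illustrative assumptions; some tools divide by the full alignment length instead.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of aligned (non-gap) columns where both residues are identical."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only the columns where both sequences have a residue (no gap).
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


# Five residue-to-residue columns, four of them identical.
print(percent_identity("ACDE-F", "ACNEWF"))
```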
What is a limitation of relying on % identity alone?
It does not distinguish substitutions between similar amino acids from substitutions between very different ones
How do similar amino acids score in a comparison matrix?
More positively than dissimilar amino acids
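To illustrate why a comparison matrix adds information beyond % identity, here is a small sketch. The score table is a tiny excerpt in the style of BLOSUM62 and should be treated as illustrative values only; the names `SCORES`, `pair_score` and `alignment_score` are made up for this example.

```python
# Tiny illustrative excerpt of a substitution matrix (BLOSUM62-style values).
SCORES = {
    ("L", "L"): 4, ("I", "I"): 4, ("D", "D"): 6,
    ("L", "I"): 2,   # leucine vs isoleucine: chemically similar
    ("L", "D"): -4,  # leucine vs aspartate: very different
    ("I", "D"): -3,
}


def pair_score(a: str, b: str) -> int:
    """Look up a residue pair in either order (the matrix is symmetric)."""
    return SCORES[(a, b)] if (a, b) in SCORES else SCORES[(b, a)]


def alignment_score(seq_a: str, seq_b: str) -> int:
    """Sum of pair scores over an ungapped alignment of equal-length sequences."""
    return sum(pair_score(a, b) for a, b in zip(seq_a, seq_b))


# Both comparisons are 50% identical, but the scores differ.
print(alignment_score("LD", "ID"))  # conservative change L -> I
print(alignment_score("LD", "DD"))  # drastic change L -> D
```

Both pairs share one identical residue out of two, yet the conservative substitution gives the higher total.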
What is the E-value?
The expected number of hits in the database that would, by pure chance, score equal to or better than the one observed
Smaller = more significant
What does the E-value allow for?
Length of query sequence & size of database
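The dependence on query length and database size can be seen in the standard Karlin-Altschul relation E = m * n * 2^(-S'), where m is the query length, n is the total database length and S' is the bit score. The sketch below is only an approximation: real BLAST uses corrected "effective" lengths, and the function and parameter names here are invented for illustration.

```python
def expected_hits(bit_score: float, query_len: int, db_len: int) -> float:
    """Approximate E-value from a bit score: E = m * n * 2**(-S')."""
    return query_len * db_len * 2.0 ** (-bit_score)


# The same bit score becomes less significant in a larger database.
small_db = expected_hits(bit_score=20, query_len=100, db_len=10_240)
large_db = expected_hits(bit_score=20, query_len=100, db_len=20_480)
print(small_db, large_db)
```

Doubling the database size doubles the E-value, which is why a score that looks convincing against a small database may not be significant against a large one.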
Why are statistical models not perfect?
Biased AA compositions or repetitive sequences, e.g. transmembrane helices, can give high scores by chance
How are repetitive sequences corrected for?
By ‘masking out’ repetitive sequences (converting them to Xs) and choosing a strict E-value cutoff for significance
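As a rough illustration of masking, the sketch below replaces long runs of a single repeated residue with Xs. This is a deliberately simplistic stand-in, not the algorithm used by real low-complexity filters such as SEG or DUST; the function name and the `min_run` threshold are assumptions made for the example.

```python
import re


def mask_runs(seq: str, min_run: int = 5) -> str:
    """Replace any run of min_run or more identical residues with Xs."""
    # (.) captures one residue; \1{k,} requires at least k further copies of it.
    pattern = re.compile(r"(.)\1{" + str(min_run - 1) + r",}")
    return pattern.sub(lambda m: "X" * len(m.group(0)), seq)


print(mask_runs("MKTQQQQQQAL"))  # the run of six Q residues is masked
print(mask_runs("MKTQQAL"))      # a short run is left alone
```

Masked positions keep their length, so alignment coordinates still refer to the original sequence.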
What does the search Blastp do?
Compares an AA query sequence against a protein sequence database
What does the search Blastn do?
Compares a nucleotide query sequence against a nucleotide sequence database
What does the search Blastx do?
Compares a nucleotide query sequence, translated in all six reading frames, against a protein sequence database
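The idea of translating in all reading frames can be sketched as follows: three frames on the given strand plus three on its reverse complement. This example assumes the standard genetic code and an uppercase DNA sequence over A, C, G, T; the helper names are invented here and are not part of BLAST.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G (third base fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna: str) -> str:
    """Complement each base, then reverse the strand."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna: str) -> str:
    """Translate complete codons only; unknown codons become X, stops become *."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna: str) -> dict:
    """Return the translations for frames +1, +2, +3, -1, -2, -3."""
    frames = {}
    for sign, strand in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{sign}{offset + 1}"] = translate(strand[offset:])
    return frames


print(six_frame_translation("ATGGCC"))
```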
What does the search TBlastn do?
Compares a protein query sequence against a nucleotide sequence database dynamically translated in all reading frames
What does the search TBlastx do?
Compares the 6-frame translations of a nucleotide query sequence against the 6-frame translations of a nucleotide sequence database
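The five search types above can be summarised as a small lookup, shown here as a revision aid. The function `choose_program` and its `compare_as_protein` flag are hypothetical names for this sketch and do not correspond to any BLAST option.

```python
# Program choice when query and database types differ or are both protein.
PROGRAM_FOR = {
    ("protein", "protein"): "blastp",
    ("nucleotide", "protein"): "blastx",    # query is translated
    ("protein", "nucleotide"): "tblastn",   # database is translated
}


def choose_program(query_type: str, db_type: str, compare_as_protein: bool = False) -> str:
    """Pick the BLAST search type from the query and database sequence types."""
    if query_type == "nucleotide" and db_type == "nucleotide":
        # Either compare nucleotides directly or translate both sides.
        return "tblastx" if compare_as_protein else "blastn"
    return PROGRAM_FOR[(query_type, db_type)]


print(choose_program("nucleotide", "protein"))
print(choose_program("nucleotide", "nucleotide", compare_as_protein=True))
```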