BLAST searching Flashcards
3 requirements for biological sequence database searching
Sensitivity (Finding as many of the true matches as possible)
Selectivity (Exclusion of incorrect hits)
Speed (How long it takes)
Heuristics
Shortcuts that solve a problem quickly and give a result that is good enough to be useful within the time available, without guaranteeing the best possible answer.
Local vs global alignment
Local alignment identifies short regions of similarity, such as conserved domains within proteins. Global alignment compares sequences over their entire length and suits homologous genes that are conserved along their full length.
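The difference can be seen by scoring the same pair of sequences both ways. This is a minimal sketch, not BLAST itself: the function name `alignment_score` and the scores (match +1, mismatch -1, gap -2) are assumptions chosen for illustration. With `local=True` it follows the Smith-Waterman idea (scores never fall below 0 and the best cell anywhere is reported); with `local=False` it follows the Needleman-Wunsch idea (end gaps are penalised and the final cell is reported).

```python
def alignment_score(a, b, local, match=1, mismatch=-1, gap=-2):
    """Best alignment score of a and b (local or global), by dynamic programming."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    if not local:
        # Global alignment: leading gaps cost something.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap

    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            cell = max(
                score[i - 1][j - 1] + pair,  # align the two residues
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
            if local:
                # Local alignment: a bad stretch resets the score to 0.
                cell = max(cell, 0)
                best = max(best, cell)
            score[i][j] = cell

    # Local: best cell anywhere. Global: the bottom-right cell.
    return best if local else score[-1][-1]


a = "TTTTACGTACGTCCCC"
b = "GGACGTACGTAA"
local_score = alignment_score(a, b, local=True)
global_score = alignment_score(a, b, local=False)
```

Here the two sequences share only the core `ACGTACGT`, so the local score rewards that region alone, while the global score is pulled down by the unrelated flanks.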
Steps in BLAST algorithm
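Read the input (query) sequence, break it into short words (seeds), score candidate words with a substitution matrix and keep those above a threshold, search the database for matches to those words, then extend each match in both directions until the score drops a set amount below its best value.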
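The seed-and-extend idea can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the real BLAST implementation: it uses exact word matches only (no neighbourhood words scored with a substitution matrix), a plain match/mismatch score, and ungapped extension. The names `find_seeds`, `extend_seed`, `x_drop` and the default scores are hypothetical.

```python
def find_seeds(query, subject, w):
    """Index every length-w word of the query, then scan the subject for exact hits."""
    index = {}
    for i in range(len(query) - w + 1):
        index.setdefault(query[i:i + w], []).append(i)
    hits = []
    for j in range(len(subject) - w + 1):
        for i in index.get(subject[j:j + w], []):
            hits.append((i, j))  # (query position, subject position)
    return hits


def _extend_one_way(query, subject, i, j, step, x_drop, match, mismatch):
    """Walk in one direction; return (best score gained, positions kept)."""
    score = best = 0
    length = best_len = 0
    while 0 <= i < len(query) and 0 <= j < len(subject):
        score += match if query[i] == subject[j] else mismatch
        length += 1
        if score > best:
            best, best_len = score, length
        elif best - score > x_drop:
            break  # fallen too far below the best score seen: stop
        i += step
        j += step
    return best, best_len


def extend_seed(query, subject, i, j, w, x_drop=3, match=1, mismatch=-2):
    """Extend a seed both ways without gaps.

    Returns (score, query start, subject start, length).
    """
    right, rlen = _extend_one_way(query, subject, i + w, j + w, 1,
                                  x_drop, match, mismatch)
    left, llen = _extend_one_way(query, subject, i - 1, j - 1, -1,
                                 x_drop, match, mismatch)
    return (w * match + left + right, i - llen, j - llen, w + llen + rlen)


query = "CCACGTTGCAGG"
subject = "TTACGTTGCATT"
seeds = find_seeds(query, subject, w=4)
hsp = extend_seed(query, subject, *seeds[0], w=4)
```

Several seeds can fall inside the same conserved region; extending any of them here recovers the same segment, which is why real implementations merge or skip overlapping hits.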
Scoring matrices in BLAST
BLOSUM (BLOcks SUbstitution Matrix) - based on substitutions observed in conserved protein blocks.
PAM (Point Accepted Mutation) - based on a model of accepted mutations accumulating over evolutionary time.
BLAST programmes
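BLASTN - nucleotide query against a nucleotide database
BLASTP - protein query against a protein database
BLASTX - translated nucleotide query against a protein database
TBLASTN - protein query against a translated nucleotide database
TBLASTX - translated nucleotide query against a translated nucleotide database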
What is E value in BLAST results?
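The expected number of hits with a score at least this high that would occur by chance in a database of this size. It measures statistical significance rather than similarity itself; the closer to 0, the more significant the hit.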
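For a single alignment the E value is usually written as E = K * m * n * exp(-lambda * S), where S is the raw score, m the query length and n the database length. The sketch below only illustrates that formula. The function names are hypothetical, and the `lam` and `k` values are illustrative assumptions, since the real values depend on the scoring matrix and gap penalties in use.

```python
import math


def bit_score(raw_score, lam, k):
    """Normalise a raw score so it can be compared across scoring systems."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value(raw_score, query_len, db_len, lam, k):
    """Expected number of chance hits scoring at least raw_score."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


# Illustrative parameters only (assumed, not taken from any BLAST run).
LAM, K = 0.318, 0.134
m, n = 250, 50_000_000

weak = e_value(40, m, n, LAM, K)
strong = e_value(120, m, n, LAM, K)
```

A higher raw score gives a smaller E value, and searching a database twice as large doubles the E value for the same score, which is why E values from different databases are not directly comparable.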
Why are protein sequences preferred in BLAST?
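Protein sequences are more conserved than nucleotide sequences: many DNA mutations are silent at the amino-acid level, so proteins show fewer changes (bacteria). The 20-letter amino-acid alphabet also makes chance matches less likely than the 4-letter nucleotide alphabet.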
High scoring segment pairs (HSPs)
Ungapped local alignments between a query and a database sequence with scores above a predefined threshold.