Bioinformatics 2 Flashcards
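What is FASTA?
A plain-text format for nucleotide or protein sequences: a header line beginning with “>”, followed by one or more lines of sequence. It is the usual format for entering a query sequence into a BLAST search.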
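A FASTA record is simple enough to read with a few lines of code. The sketch below is a minimal parser written for these notes, not a standard tool: the function name `parse_fasta` and the example sequence are made up, and it assumes well-formed input (blank lines are skipped).

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {header: sequence} (hypothetical helper)."""
    records: dict[str, str] = {}
    header = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record.
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:].strip()
            chunks = []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Made-up example record: one header, sequence split over two lines.
example = """>query_protein hypothetical example
MKTAYIAKQR
QISFVKSHFS
"""
print(parse_fasta(example))
# {'query_protein hypothetical example': 'MKTAYIAKQRQISFVKSHFS'}
```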
What is systems biology?
Systems biology is a systematic approach to understanding all the genes and their expression under different conditions. This is possible only when all the proteins in a particular family are known.
Why use BLAST?
Finding model organisms for the study of disease
Example: cystic fibrosis
BLAST helps you to find homologous genes and proteins
Homologous Proteins (or genes)
Have a common ancestor (they're related)
Have similar structures
Have similar functions
What are the criteria for considering two sequences to be homologous?
Proteins are homologous if
Their amino acid sequences are at least 25% identical
DNA sequences are homologous if
they are at least 70% identical
Note that the sequences must be over 100 a.a. (or bp) in length for these thresholds to apply
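The rule of thumb above can be written as a small check. This is only a sketch of the flashcard criteria (the name `looks_homologous` is hypothetical); it returns `None` when the sequences are too short for the rule to say anything, rather than answering "not homologous".

```python
from typing import Optional

# Rule-of-thumb thresholds from the flashcard above.
MIN_IDENTITY = {"protein": 25.0, "dna": 70.0}  # percent identity
MIN_LENGTH = 100  # amino acids (protein) or base pairs (DNA)


def looks_homologous(identity_pct: float, length: int, kind: str) -> Optional[bool]:
    """Apply the flashcard homology rule; None means the rule does not apply."""
    if kind not in MIN_IDENTITY:
        raise ValueError("kind must be 'protein' or 'dna'")
    if length <= MIN_LENGTH:
        return None  # too short: identity alone is not informative here
    return identity_pct >= MIN_IDENTITY[kind]


print(looks_homologous(32.0, 250, "protein"))  # True
print(looks_homologous(60.0, 500, "dna"))      # False
print(looks_homologous(90.0, 80, "dna"))       # None
```

Returning `None` keeps "too short to judge" separate from "below the threshold", which matches the note that the thresholds only hold for sequences over 100 residues.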
What does BLAST do?
BLAST takes a query sequence
Compares it with millions of sequences in the GenBank databases
By constructing local alignments
Lists those that appear to be similar to the query sequence
The “hit list”
Tells you why it thinks they are homologous
BLAST makes suggestions
YOU make the conclusions
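BLAST does not fully align the query against every database sequence. As a simplified picture, it first looks for short "words" shared by the query and a database sequence, then extends those seeds into local alignments. The sketch below shows only an exact-match seeding step for DNA. The function name `find_seeds` and the word size are illustrative assumptions; real BLAST uses program-specific word sizes and, for proteins, also accepts similar (not only identical) words.

```python
from collections import defaultdict


def find_seeds(query: str, subject: str, word_size: int = 11) -> list[tuple[int, int]]:
    """Return (query_pos, subject_pos) pairs where a word of length word_size matches exactly."""
    # Index every word in the query by its start position(s).
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(query) - word_size + 1):
        index[query[i:i + word_size]].append(i)
    # Scan the subject and report every exact word hit (a "seed").
    seeds = []
    for j in range(len(subject) - word_size + 1):
        for i in index.get(subject[j:j + word_size], []):
            seeds.append((i, j))
    return seeds


print(find_seeds("GATTACAGATTACA", "CCGATTACATT", word_size=5))
# [(0, 2), (7, 2), (1, 3), (8, 3), (2, 4), (9, 4)]
```

Each seed would then be extended in both directions, and only extensions that score well enough are reported as hits.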
How do I input a query into BLAST?
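Choose which “flavor” of BLAST to use, depending on whether the query and the database are nucleotide or protein sequences
Enter the query sequence (for example, in FASTA format)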
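The main BLAST flavors differ in what kind of sequence the query and the database hold: blastn (nucleotide vs nucleotide), blastp (protein vs protein), blastx (translated nucleotide query vs protein database), tblastn (protein query vs translated nucleotide database) and tblastx (translated nucleotide query vs translated nucleotide database). The helper below is a hypothetical sketch that encodes that choice; it is not part of any BLAST tool.

```python
def choose_blast_program(query: str, database: str, translate_both: bool = False) -> str:
    """Pick a BLAST flavor from molecule types ('nucleotide' or 'protein')."""
    programs = {
        ("nucleotide", "nucleotide"): "blastn",  # DNA query vs DNA database
        ("protein", "protein"): "blastp",        # protein query vs protein database
        ("nucleotide", "protein"): "blastx",     # translated DNA query vs proteins
        ("protein", "nucleotide"): "tblastn",    # protein query vs translated DNA database
    }
    if translate_both:
        # tblastx translates both sides and compares them as proteins.
        if (query, database) != ("nucleotide", "nucleotide"):
            raise ValueError("tblastx needs a nucleotide query and a nucleotide database")
        return "tblastx"
    try:
        return programs[(query, database)]
    except KeyError:
        raise ValueError("query and database must be 'nucleotide' or 'protein'") from None


print(choose_blast_program("protein", "nucleotide"))  # tblastn
```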
How do I interpret the results of a BLAST search?
BLAST creates local alignments
What is a local alignment?
BLAST looks for similarities between regions of two sequences
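The exact way to find the best local alignment is Smith-Waterman dynamic programming; BLAST is a faster heuristic that approximates it. The sketch below computes only the best local alignment score, under simplified scoring assumptions (match +2, mismatch -1, linear gap -2). Real BLAST uses substitution matrices such as BLOSUM62 and separate gap-opening and gap-extension penalties.

```python
def local_alignment_score(a: str, b: str, match: int = 2,
                          mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score (Smith-Waterman, linear gap penalty)."""
    prev = [0] * (len(b) + 1)  # previous row of the DP table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            # Flooring at 0 lets an alignment start anywhere: this is what makes it local.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


# Only the shared region GATTACA (7 matches x 2) contributes; the flanks are ignored.
print(local_alignment_score("TTTGATTACATTT", "CCCGATTACACCC"))  # 14
```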
The BLAST output (graphic display)
How good is the match? (the color reflects the alignment score)
Red = excellent!
Pink = pretty good
Green = OK, but look at other factors
Blue = bad
Black = really bad!
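The colors come from score ranges. The thresholds below (black below 40, blue 40 to 50, green 50 to 80, pink/magenta 80 to 200, red 200 and above) are assumed from NCBI's color key; check the key printed on your own results page before relying on them. The function name `score_color` is hypothetical.

```python
def score_color(score: float) -> str:
    """Map an alignment score to the graphic-display color (assumed NCBI thresholds)."""
    if score >= 200:
        return "red"    # excellent
    elif score >= 80:
        return "pink"   # pretty good (shown as magenta)
    elif score >= 50:
        return "green"  # OK, but look at other factors
    elif score >= 40:
        return "blue"   # bad
    else:
        return "black"  # really bad


print(score_color(120))  # pink
```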
How long are the matched segments?
Longer = better
The hit list
BLAST lists the best matches (hits). For each hit, BLAST provides:
Accession number – links to the GenBank flatfile
Description
“G” = genome link
E-value – an indicator of how good a match the hit is to the query sequence
Score – links to the alignment
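A hit list is essentially a table of records sorted so the best matches come first. The sketch below models that with a small dataclass; the accessions and numbers are made-up placeholders, not real BLAST results, and ranking by E-value (then by score) is an assumption about how to order ties.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    accession: str    # on the real results page this links to the GenBank flatfile
    description: str
    score: float
    e_value: float


def rank_hits(hits: list[Hit]) -> list[Hit]:
    """Best hits first: lowest E-value, then highest score."""
    return sorted(hits, key=lambda h: (h.e_value, -h.score))


# Hypothetical hits for illustration only.
hits = [
    Hit("HYPO_0002", "hypothetical weak hit", 45.0, 0.3),
    Hit("HYPO_0001", "hypothetical strong hit", 310.0, 1e-80),
    Hit("HYPO_0003", "hypothetical moderate hit", 120.0, 2e-25),
]
for h in rank_hits(hits):
    print(h.accession, h.e_value)
# HYPO_0001 1e-80
# HYPO_0003 2e-25
# HYPO_0002 0.3
```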
What is an E-value?
E-value
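The number of matches this good that you would expect to find by chance alone in a database of this size (roughly, the chance that the match is random)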
The lower the E-value, the more significant the match
E = 10⁻⁴ is considered the cutoff point
E = 0 means the E-value is too small to display: the match is essentially impossible by chance, so the sequences are almost certainly homologous
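Behind the E-value is the Karlin-Altschul statistic, E = K·m·n·e^(-λS), where S is the raw alignment score, m and n are the lengths of the query and the database, and K and λ are constants that depend on the scoring system. The sketch below assumes that formula; the K and λ values in the example are illustrative placeholders, and real BLAST also corrects m and n to "effective" lengths. It shows why a higher score gives a lower E-value, and applies the 10⁻⁴ cutoff from the card above.

```python
import math


def expected_value(raw_score: float, query_len: int, db_len: int,
                   K: float, lam: float) -> float:
    """Karlin-Altschul E-value: expected number of chance hits scoring >= raw_score."""
    return K * query_len * db_len * math.exp(-lam * raw_score)


def is_significant(e_value: float, cutoff: float = 1e-4) -> bool:
    """Treat matches at or below the cutoff as significant."""
    return e_value <= cutoff


# Illustrative placeholder constants and lengths, not values from a real search.
K, LAM = 0.041, 0.267
for s in (50, 100, 150):
    e = expected_value(s, query_len=300, db_len=1_000_000, K=K, lam=LAM)
    print(s, f"{e:.2e}", is_significant(e))
```

Doubling the database size doubles E, which is why the same alignment can look less significant when searched against a larger database.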
The Alignment
Look for:
Long regions of alignment
With few gaps
% identity should be >25% for proteins (>70% for DNA)
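Percent identity and the number of gaps can be read straight off an alignment. The sketch below assumes the two aligned sequences are given as equal-length strings with "-" marking gaps (the function name `alignment_stats` is hypothetical). Identity is counted over the full alignment length, gaps included, and each run of consecutive "-" counts as one gap.

```python
def alignment_stats(aligned_a: str, aligned_b: str) -> tuple[float, int]:
    """Return (percent identity, number of gaps) for two aligned sequences."""
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    identical = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    # Count gap openings: a run of consecutive '-' counts once.
    gaps = 0
    for seq in (aligned_a, aligned_b):
        in_gap = False
        for ch in seq:
            if ch == "-" and not in_gap:
                gaps += 1
            in_gap = ch == "-"
    identity = 100.0 * identical / len(aligned_a)
    return identity, gaps


print(alignment_stats("GATT-ACA", "GATTTACA"))  # (87.5, 1)
```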
How do I draw a conclusion from a BLAST search?
Look at E-value
Look at graphic display
If necessary, look at alignment
Make your best guess!