Shane - Lecture 1 Flashcards
What can you infer if two sequences are similar?
They probably have the same ancestor, share the same structure and have a similar biological function
What qualifies as a homologue?
(2)
A sequence that is more than 100 amino acids or nucleotides long
With at least 25% identical amino acids or 70% identical nucleotides
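A minimal sketch of the rule of thumb above, assuming the two sequences are already aligned (gaps written as "-") and that identity is counted over gap-free columns only. Other definitions of percent identity use a different denominator, and the function names are illustrative.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of gap-free alignment columns where both residues match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only the columns where neither sequence has a gap character.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def likely_homologue(aligned_a: str, aligned_b: str, is_protein: bool) -> bool:
    """Rule of thumb: more than 100 residues and >=25% (protein) or >=70% (DNA) identity."""
    aligned_length = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"
    )
    threshold = 25.0 if is_protein else 70.0
    return aligned_length > 100 and percent_identity(aligned_a, aligned_b) >= threshold
```

For example, percent_identity("ACGT", "ACGA") gives 75.0, but likely_homologue rejects that pair because the alignment is far shorter than 100 positions.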
What is the twilight zone?
(2)
Protein sequences with between 0 and 20% identical amino acids
Similarity in this range is not significant -> it could have arisen by chance
What is E-value?
Expectation value
What does BLAST do?
Searches a database of your choice for sequences that have homology or a shared ancestry with the sequence you have entered
What does an E-value quantify?
How likely it is that the match arose by chance
What does an E-value close to zero mean?
It is very unlikely that the similarity arose due to chance
What does an E-value close to 1 mean?
It is very likely that the similarity arose due to chance
Write a note on BLAST
(4)
Quick heuristic alignment algorithm
Found on the National Center for Biotechnology Information (NCBI) website
Matches DNA sequences to other DNA sequences
Uses either a gene sequence or a protein sequence
How does BLAST work?
(4)
It tries to find a small match first then expands on this match
It divides the sequence up into shorter parts e.g. 11 nucleotides and tries to match them (heuristic)
There might be some mismatches when the match extends
BLAST will keep extending the match until mismatches become too significant
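The seed-and-extend idea above can be sketched as follows. This is a simplified illustration, not BLAST's actual implementation: it only extends to the right, allows no gaps, and the match, mismatch and drop_off values are assumed for the example rather than taken from BLAST's defaults.

```python
def find_seeds(query: str, subject: str, word_size: int = 11):
    """Return (query_pos, subject_pos) pairs where a query word occurs exactly in the subject."""
    index = {}
    for j in range(len(subject) - word_size + 1):
        index.setdefault(subject[j:j + word_size], []).append(j)
    seeds = []
    for i in range(len(query) - word_size + 1):
        for j in index.get(query[i:i + word_size], []):
            seeds.append((i, j))
    return seeds


def extend_right(query, subject, i, j, word_size=11, match=1, mismatch=-3, drop_off=5):
    """Extend a seed to the right; stop once the score falls more than drop_off below the best."""
    score = best = word_size * match  # the seed itself is an exact match
    best_len = word_size
    qi, sj = i + word_size, j + word_size
    while qi < len(query) and sj < len(subject):
        score += match if query[qi] == subject[sj] else mismatch
        qi += 1
        sj += 1
        if score > best:
            best, best_len = score, qi - i
        elif best - score > drop_off:
            break  # mismatches have become too costly
    return best, query[i:i + best_len]
```

With query "AAAACCCC", subject "TTAAAACCGG" and word_size=4, the first seed is (0, 2) and extending it returns a score of 6 for the segment "AAAACC".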
What does BLAST stand for?
Basic
Local
Alignment
Search
Tool
List some uses of BLAST
To identify an unknown sequence by trying to match it to something known
Get clues about the function/structure of a protein by finding similar proteins
Map a sequence in a genome
What are the two types of BLAST searches?
Nucleotide BLAST
Protein BLAST
Which type of BLAST is better to use?
Protein BLAST searches are more sensitive and more biologically significant
How do you decide what type of BLAST to use?
(2)
Do you have a nucleotide sequence or a peptide sequence?
Do you want a close match or an identical match?
What tool can you use to translate a nucleotide sequence into a peptide sequence?
ExPASy Translate tool
Why is it recommended to use a protein search instead of a gene search?
Gene sequences might be identical but have different functions
Give examples of specialised BLAST searches
BlastX
tblastn
Write a note on BlastX
(4)
You have a gene sequence but want a protein sequence
BlastX translates your nucleotide sequence into protein and searches a protein database for matches
BlastX translates the sequences in all six reading frames
BlastX is often the first analysis performed on newly determined nucleotide sequences
How many reading frames does any piece of DNA have?
Six reading frames
Explain what is meant by DNA having six reading frames
(3)
In one direction a DNA sequence can start from three different points = 3 reading frames
In the other direction (the reverse complement strand) a DNA sequence can start from three different points = 3 reading frames
= 6 reading frames
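A short sketch of the six reading frames, assuming an unambiguous DNA string (A, C, G, T only). The frame labels "+1" to "+3" and "-1" to "-3" are a common convention, used here for illustration.

```python
def reverse_complement(dna: str) -> str:
    """Return the reverse complement strand, read 5' to 3'."""
    complement = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(complement[base] for base in reversed(dna.upper()))


def six_frames(dna: str) -> dict:
    """Split a DNA sequence into codons for all six reading frames."""
    frames = {}
    for sign, strand in (("+", dna.upper()), ("-", reverse_complement(dna))):
        for offset in range(3):  # three possible start points per strand
            codons = [strand[k:k + 3] for k in range(offset, len(strand) - 2, 3)]
            frames[f"{sign}{offset + 1}"] = codons
    return frames
```

For "ATGAAATAG", frame +1 reads ATG AAA TAG, frame +2 reads TGA AAT, and frame -1 reads CTA TTT CAT.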
What indicates a stop codon in a BLASTX search?
They will be highlighted in red
How can you tell when BLASTX has used the wrong reading frame?
The stop codon will appear too early
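One way to see why an early stop codon signals the wrong frame is to count how many codons each frame reads before its first stop. The sketch below assumes the standard stop codons (TAA, TAG, TGA) and checks only the three forward frames. The function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons_before_first_stop(dna: str, offset: int) -> int:
    """Count codons read from `offset` before the first stop codon or the end of the sequence."""
    count = 0
    for k in range(offset, len(dna) - 2, 3):
        if dna[k:k + 3].upper() in STOP_CODONS:
            break
        count += 1
    return count


def best_forward_frame(dna: str) -> int:
    """Return the forward offset (0, 1 or 2) with the longest stop-free run."""
    return max(range(3), key=lambda offset: codons_before_first_stop(dna, offset))
```

For "ATGATAGCTGAAGCTTAA", offset 0 reads five codons before the stop, while offset 1 hits TGA immediately and offset 2 hits TGA after two codons, so offset 0 is the plausible frame.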
Write a note on tblastn
(5)
Searches a translated nucleotide database using a protein query
Rarely used
The reverse of BlastX
Reads 6-frame translations
Finds homologous protein coding regions in unannotated nucleotide sequences
What does a sequence with refseq_protein in front of it indicate?
It indicates the protein has been sequenced really well and is really reliable
What three things should you look out for in front of a sequence that proves it’s of good quality?
refseq_protein
swissprot
pdb
Who allocates swissprot headers?
Swissprot protein sequences
A European protein database
Who allocates ‘pdb’?
Protein Data Bank
Sequences from RCSB protein data bank with experimentally determined structures
What should you consider when choosing an algorithm?
How close a match you are looking for: a close match or a perfect match
List the three protein-protein BLAST algorithms
Blastp
PSI-BLAST
PHI-BLAST
What is Blastp used for?
For finding similar proteins (standard protein-protein comparison)
What is PSI-Blast used for?
Distantly related proteins
There are some allowed mutations
What is PHI-Blast used for?
(2)
Used when you know protein family has a signature pattern, active site, structural domain etc
Looking for another protein with a specific pattern
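To illustrate what "a specific pattern" means, the sketch below searches a protein string for the P-loop (Walker A) motif, written in PROSITE notation as [AG]-x(4)-G-K-[ST]. This is only the pattern-matching step done with a regular expression, not PHI-BLAST itself, and the example sequence is made up.

```python
import re

# P-loop motif as a regex: A or G, any four residues, G, K, then S or T.
P_LOOP = re.compile(r"[AG].{4}GK[ST]")


def find_pattern(protein: str, pattern=P_LOOP):
    """Return (start, matched_text) for every non-overlapping occurrence of the pattern."""
    return [(m.start(), m.group()) for m in pattern.finditer(protein)]
```

For the made-up sequence "MMGPPPPGKSAA", find_pattern reports one match, "GPPPPGKS", starting at position 2 (0-based).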
What are the three nucleotide BLAST algorithms?
blastn
Megablast
Discontiguous megablast
What is blastn used for?
Used to find similar nucleotide sequences
What is megablast used for?
(3)
Used to find highly similar nucleotide sequences
Very fast
Used to identify nucleotide sequences
What is a discontiguous megablast used for?
Used to find possible homologies
Finds more dissimilar sequences than megablast does
What is the BLAST max score?
The highest score of any single aligned segment from the same database sequence
What is the BLAST total score?
The sum of the scores of all aligned segments from the same database sequence
What does it mean if total score and max score are the same?
Only a single alignment is present
What is query coverage?
What percentage of the query sequence is aligned
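A sketch of how max score, total score and query coverage relate, assuming a hit is described by a list of aligned segments, each given as (score, query_start, query_end) with 1-based inclusive coordinates. This input format is an assumption for the example, not BLAST's output format.

```python
def summarise_hit(segments, query_length):
    """Return (max_score, total_score, query_coverage_percent) for one database hit."""
    max_score = max(score for score, _, _ in segments)
    total_score = sum(score for score, _, _ in segments)
    # Collect every query position covered by at least one segment,
    # so overlapping segments are not counted twice.
    covered = set()
    for _, start, end in segments:
        covered.update(range(start, end + 1))
    query_coverage = 100.0 * len(covered) / query_length
    return max_score, total_score, query_coverage
```

With segments [(200, 1, 50), (80, 41, 100)] and a 200-residue query, the result is a max score of 200, a total score of 280 and 50.0% coverage. With a single segment, max score and total score are equal.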
What is the E-value?
Number of matches with the same score expected by chance
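The expected number of chance matches can be estimated from the bit score with the Karlin-Altschul relation E = m * n * 2^(-S'), where m is the query length, n is the database length and S' is the bit score. The sketch below uses raw lengths for simplicity, whereas BLAST itself uses adjusted "effective" lengths, so real E-values will differ somewhat.

```python
def expected_hits(bit_score: float, query_length: int, database_length: int) -> float:
    """Estimate the E-value: chance hits expected with at least this bit score."""
    search_space = query_length * database_length  # m * n
    return search_space * 2.0 ** (-bit_score)
```

For example, a bit score of 10 with m = 32 and n = 32 gives E = 1.0, and every extra bit halves the E-value.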
List the three protein databases
Reference proteins (refseq_protein)
Swissprot protein sequences (swissprot)
Protein Data Bank proteins (pdb)