Lecture 2 Flashcards
What is protein bioinformatics
Analysis of protein sequences and structures to gain insight into the properties and function of a protein
What do we use to compare sequences
BLAST (from the NCBI website)
It searches the database for other sequences that match the query you submit
What can you put in the query of a BLAST search
The accession number
The GI number
The bare sequence
Or the FASTA-formatted sequence
What is a GI number (gi)
It’s a simple series of digits assigned to each sequence processed by NCBI
What is FASTA format, and how long are the lines
Starts with a > followed by a single-line description of the sequence
All sequence lines are shorter than 80 characters
No blank lines
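The three rules on this card can be checked mechanically. The following is a minimal sketch, not an official parser: the function name parse_fasta and the example record are made up for illustration, and the checks simply mirror the rules as stated above.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (description, sequence) pairs.

    Enforces the rules from the flashcard: a '>' description line first,
    sequence lines shorter than 80 characters, and no blank lines.
    """
    records = []
    description = None
    chunks = []
    for line in text.splitlines():
        if not line.strip():
            raise ValueError("blank lines are not allowed")
        if line.startswith(">"):
            # A new description line closes the previous record, if any.
            if description is not None:
                records.append((description, "".join(chunks)))
            description = line[1:].strip()
            chunks = []
        else:
            if description is None:
                raise ValueError("a record must start with a '>' line")
            if len(line) >= 80:
                raise ValueError("sequence lines must be shorter than 80 characters")
            chunks.append(line.strip())
    if description is not None:
        records.append((description, "".join(chunks)))
    return records


# Hypothetical record, for illustration only.
example = ">example_protein made-up record\nMKTAYIAKQR\nQISFVKSHFS\n"
print(parse_fasta(example))
```

The sequence lines are joined back into one string, since the line breaks carry no meaning.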
What do B, U, X, Z, * and - stand for in FASTA
B: aspartate or asparagine
U: selenocysteine
X: any amino acid residue
Z: glutamate or glutamine
*: translation stop
-: gap of any length, used to align the sequence better
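The special codes on this card fit naturally into a lookup table. This is a small sketch for study purposes; the names FASTA_SPECIAL_CODES and annotate_special are invented here and do not come from any library.

```python
# Special FASTA symbols from the flashcard, keyed by the character used.
FASTA_SPECIAL_CODES = {
    "B": "aspartate or asparagine",
    "U": "selenocysteine",
    "X": "any amino acid",
    "Z": "glutamate or glutamine",
    "*": "translation stop",
    "-": "gap of any length",
}


def annotate_special(sequence):
    """Return (position, symbol, meaning) for each special symbol, 1-based."""
    return [
        (position, symbol, FASTA_SPECIAL_CODES[symbol])
        for position, symbol in enumerate(sequence, start=1)
        if symbol in FASTA_SPECIAL_CODES
    ]


for hit in annotate_special("MKB-XC*"):
    print(hit)
```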
What is selenocysteine
An additional amino acid: some organisms, including bacteria, hijack one of the three stop codons and use it to insert selenocysteine (UGA) or pyrrolysine (UAG)
What are metagenomic proteins
Extract RNA/DNA from a bulk sample (like ocean water)
Take the resulting sequences and run blastp on them
When does Quick BLASTP (accelerated protein-protein BLAST) work best
When the target is more than 50% identical to the query
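The 50% figure refers to percent identity between two aligned sequences. Below is a rough sketch of one way to compute it, assuming identity is counted as identical columns divided by all alignment columns (tools differ on the denominator). The function name and example strings are made up.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns where both sequences have the same residue.

    Both inputs must already be aligned (same length, '-' marks a gap).
    Assumption: the denominator is the full alignment length, gaps included.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


# 6 identical columns out of 8.
print(percent_identity("MKTAYIAK", "MKSAY-AK"))  # 75.0
```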
What are the other types of BLAST
PSI-BLAST (builds a position-specific scoring matrix from the results of the first run)
PHI-BLAST (alignments are limited to those that match a pattern in the query)
DELTA-BLAST (position-specific scoring using the results of a conserved domain database search)
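To make "position-specific scoring matrix" concrete, here is a simplified sketch of building one from a set of aligned hits. It is not how PSI-BLAST actually does it: the uniform background frequency, the flat pseudocount, and the names build_pssm and score_with_pssm are all assumptions chosen to keep the example short.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned_hits, pseudocount=1.0):
    """Build one log-odds score table per alignment column.

    Assumptions: uniform background (1/20 per residue) and a flat
    pseudocount so unseen residues get a finite negative score.
    """
    length = len(aligned_hits[0])
    if any(len(hit) != length for hit in aligned_hits):
        raise ValueError("all hits must be aligned to the same length")
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for column in zip(*aligned_hits):
        # Count only standard residues; gaps and special codes are ignored.
        counts = Counter(res for res in column if res in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def score_with_pssm(pssm, sequence):
    """Sum the column-specific scores for a sequence of matching length."""
    return sum(column[res] for column, res in zip(pssm, sequence))


# Made-up hits standing in for the results of a first search.
hits = ["MKC", "MRC", "MKC"]
pssm = build_pssm(hits)
print(score_with_pssm(pssm, "MKC") > score_with_pssm(pssm, "AAA"))
```

Unlike a single matrix such as BLOSUM62, the same residue can score differently at each position, which is the point of the position-specific approach.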
What is BLOSUM62
A substitution matrix that assigns a score to each aligned pair of residues
Negatively charged amino acids
Aspartate, glutamate
Positively charged amino acids
Lysine, histidine, arginine
Polar uncharged amino acids
Serine, threonine, asparagine, glutamine
Amino acids with hydrophobic side chains
Leucine, valine, isoleucine, alanine, methionine, phenylalanine, tyrosine, tryptophan
Which amino acids are special cases
Cysteine, selenocysteine (U), glycine, proline (helix breaker)
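The groupings on the last five cards can be written as a lookup keyed by one-letter code. This is only a study aid that follows the classification used in these cards (other sources group some residues differently); the names RESIDUE_CLASSES, CLASS_OF and classify are invented for the example.

```python
# Residue groups exactly as listed on the flashcards, by one-letter code.
RESIDUE_CLASSES = {
    "negatively charged": "DE",    # aspartate, glutamate
    "positively charged": "KHR",   # lysine, histidine, arginine
    "polar uncharged": "STNQ",     # serine, threonine, asparagine, glutamine
    "hydrophobic": "LVIAMFYW",     # leucine ... tryptophan
    "special case": "CUGP",        # cysteine, selenocysteine, glycine, proline
}

# Invert the table so each residue maps to its class name.
CLASS_OF = {
    residue: name
    for name, residues in RESIDUE_CLASSES.items()
    for residue in residues
}


def classify(sequence):
    """Return the class name for each residue in the sequence."""
    return [CLASS_OF.get(residue, "unclassified") for residue in sequence]


print(classify("DKSLC"))
```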
Why do the unique amino acids get a higher score in BLOSUM
Because they are so distinctive, they tend to sit at a position for a reason, so matching them earns a higher score
When scoring, what is the effect of putting gaps in the sequence to line up the amino acids
The gapped position gets a score of -1
In scoring, cysteine paired with any other amino acid gets what score
A negative score (in BLOSUM62 the one exception is alanine, which scores 0)
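The scoring cards can be tied together with a small worked example. The sketch below uses a handful of BLOSUM62 entries (only the residues C, D, E, K and L) and the flat -1 gap score from the card above; BLAST itself uses separate gap-opening and gap-extension penalties, so treat the gap handling as a simplification. The names BLOSUM62_SUBSET, pair_score and alignment_score are made up.

```python
# A few BLOSUM62 entries; the matrix is symmetric, so each pair is stored once.
BLOSUM62_SUBSET = {
    ("C", "C"): 9, ("C", "D"): -3, ("C", "E"): -4, ("C", "K"): -3, ("C", "L"): -1,
    ("D", "D"): 6, ("D", "E"): 2, ("D", "K"): -1, ("D", "L"): -4,
    ("E", "E"): 5, ("E", "K"): 1, ("E", "L"): -3,
    ("K", "K"): 5, ("K", "L"): -2,
    ("L", "L"): 4,
}

GAP_SCORE = -1  # flat per-position gap score, as stated on the flashcard


def pair_score(a, b):
    """Score one alignment column: gap score or substitution matrix lookup."""
    if a == "-" or b == "-":
        return GAP_SCORE
    key = (a, b) if (a, b) in BLOSUM62_SUBSET else (b, a)
    return BLOSUM62_SUBSET[key]


def alignment_score(aligned_a, aligned_b):
    """Sum the column scores of two already-aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    return sum(pair_score(a, b) for a, b in zip(aligned_a, aligned_b))


# C/C = 9, D/E = 2, E/gap = -1, K/K = 5
print(alignment_score("CDEK", "CE-K"))  # 15
```

Note how the cysteine match (9) outscores the other identical pairs, while cysteine against D, E, K or L is negative, matching the two cards above.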