Lecture 2 Flashcards
What is protein bioinformatics
Analysis of protein sequences and structures to gain insight into the properties and function of the protein
What do we use to compare sequences
BLAST (from the NCBI website)
It looks for other sequences in the database that match the one you put in
What can you put in the query of a BLAST search
The accession number
The GI number
The bare sequence
Or the FASTA formatted sequence
What is a GI number (gi)
It’s a simple series of digits assigned to each sequence processed by NCBI
What is FASTA format
How long are the lines
Starts with > followed by a single-line description of the sequence
All lines of sequence are shorter than 80 characters
No blank lines
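As a rough illustration of the layout described on this card, here is a minimal sketch of a FASTA reader. The function name `parse_fasta` and the example record are made up for illustration; the sequence is not a real protein entry.

```python
def parse_fasta(text):
    """Return a list of (description, sequence) tuples from FASTA-formatted text."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate stray blank lines, although the format discourages them
        if line.startswith(">"):
            # A new description line closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        else:
            if header is None:
                raise ValueError("sequence data found before the first '>' line")
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical record, split over two short sequence lines.
example = """>example_protein hypothetical sequence for illustration
MKTAYIAKQRQISFVKSHFSRQ
LEERLGLIEVQ
"""
records = parse_fasta(example)
```

The sequence lines are joined back together, so line length only matters for how the file is written, not for the sequence itself.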
What do B U X Z * - stand for in FASTA
Aspartate/asparagine
Selenocysteine
Any amino acid residue
Glutamate/glutamine
Translation stop
Gap of indeterminate length (used to align the sequence better)
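The codes on this card can be kept in a small lookup table. The sketch below is only an illustration; the names `SPECIAL_CODES` and `describe_special_codes` and the test sequence are invented here.

```python
# Special one-letter codes from the card above and what they stand for.
SPECIAL_CODES = {
    "B": "aspartate or asparagine",
    "U": "selenocysteine",
    "X": "any amino acid residue",
    "Z": "glutamate or glutamine",
    "*": "translation stop",
    "-": "gap of indeterminate length",
}


def describe_special_codes(sequence):
    """Return (position, code, meaning) for each special code, using 1-based positions."""
    return [
        (position, code, SPECIAL_CODES[code])
        for position, code in enumerate(sequence.upper(), start=1)
        if code in SPECIAL_CODES
    ]


# Made-up sequence containing several special codes.
hits = describe_special_codes("MKBX-AU*")
```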
What is selenocysteine
An extra amino acid: bacteria (and other organisms) hijack 1 of the 3 stop codons and recode it as selenocysteine (UGA) or pyrrolysine (UAG)
What are metagenomic proteins
Extract RNA/DNA from a bulk sample (like ocean water)
Take the resulting sequences and run blastp on them
When does Quick BLASTP (accelerated protein-protein BLAST) work best
If the target is more than 50% identical
What are the other types of BLAST
PSI-BLAST (builds a position-specific scoring matrix from the results of the first run)
PHI-BLAST (alignments are limited to those that match a pattern in the query)
DELTA-BLAST (position-specific scoring using the results of a conserved domain database search)
What is BLOSUM62
A substitution matrix that assigns a score to each aligned pair of residues
Negatively charged amino acids
Aspartate glutamate
Positively charged amino acids
Lysine, histidine, arginine
Polar uncharged amino acids
Serine, threonine, asparagine, glutamine
Amino acids with hydrophobic side chains
Leucine, valine, isoleucine, alanine, methionine, phenylalanine, tyrosine, tryptophan
Which amino acids are special cases
Cysteine, selenocysteine (U), glycine, proline (helix breaker)
Why do the unique amino acids get a higher score in BLOSUM
Because they are so unique, they are at that position for a reason, so matching them gets a higher score
When scoring, what is the effect of putting gaps in the sequence to match the amino acids
Each gap gets a score of -1
In scoring, cysteine with any other amino acid gets what score
A negative score (in BLOSUM62 the one exception is alanine, which scores 0)
Why does the algorithm really like to align tryptophans with tryptophans? (give it a very high score)
Because it’s such an unusual amino acid
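To make the scoring cards above concrete, here is a minimal sketch that sums pair scores over an alignment. The dictionary holds only a handful of BLOSUM62 entries (not the full matrix), the -1 gap score follows the card above rather than BLAST's default gap costs, and the names `BLOSUM62_SUBSET` and `score_alignment` are invented for this example.

```python
# A small subset of BLOSUM62 scores; the real matrix covers every residue pair.
BLOSUM62_SUBSET = {
    ("W", "W"): 11,  # tryptophan with tryptophan: the highest score in the matrix
    ("C", "C"): 9,   # cysteine with cysteine
    ("F", "Y"): 3,   # conservative substitution
    ("L", "I"): 2,   # conservative substitution
    ("A", "A"): 4,
    ("C", "W"): -2,  # cysteine with a different residue
}


def pair_score(a, b, matrix=BLOSUM62_SUBSET):
    """Look up a pair score; the matrix is symmetric, so try both orders."""
    if (a, b) in matrix:
        return matrix[(a, b)]
    if (b, a) in matrix:
        return matrix[(b, a)]
    raise KeyError(f"pair {a}/{b} is not in this illustrative subset")


def score_alignment(seq_a, seq_b, gap_score=-1):
    """Sum the pair scores column by column; any column with '-' gets gap_score."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    total = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            total += gap_score
        else:
            total += pair_score(a, b)
    return total


# Made-up alignment: W/W, C/C, F/Y, one gap, L/I.
total = score_alignment("WCF-L", "WCYAI")
```

Here the W/W and C/C columns contribute most of the total, which is the point of the cards: rare or special residues dominate the score when they line up.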
In the graphic summary of a blastp, the first line is
If one line starts later (is shorter) than the other lines, what does this mean
The top hit
The first part of that sequence doesn’t match, so it shows as a gap
What is the expected or E value of a blastp
Tells the number of hits (matches) expected to be got by chance
It’s used to create a threshold of significance (like how likely is it that it got aligned by chance)
If low that means that the sequence is a signifanct match and should be
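One commonly quoted relationship between a bit score S' and the E value is E = m * n * 2^(-S'), where m is the query length and n is the total length of the database. The sketch below uses that formula only; BLAST itself works with corrected "effective" lengths, so treat the numbers as illustrative. The function name and the lengths are assumptions made for this example.

```python
def expected_hits(bit_score, query_length, database_length):
    """E value from a bit score: E = m * n * 2 ** (-S')."""
    return query_length * database_length * 2.0 ** (-bit_score)


# Hypothetical search: a 250-residue query against 100 million database residues.
weak_hit = expected_hits(30, 250, 100_000_000)
strong_hit = expected_hits(60, 250, 100_000_000)
```

A higher bit score gives a much lower E value, so `strong_hit` is far less likely to be a chance match than `weak_hit`.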
What do the positives mean in a blastp sequence alignment
A + in the alignment means it aligned a conservative substitution
Ex. F to Y: it marked these with a plus because they have similar properties but are different amino acids
If in an alignment AV is lined up with
--
How many gaps is it
2
What is a blastp clustal alignment
Shows all the different sequence alignments of all matches
What is similarity between sequences quantified by
% identity
% similarity (similar amino acids, e.g. leucine and isoleucine)
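Both percentages can be counted column by column over an alignment. In this minimal sketch, "similar" means the two residues fall in the same side-chain group from the cards above; that is a simplification, since BLAST counts a pair as a positive when its substitution-matrix score is positive. The names `GROUPS` and `identity_and_similarity` and the aligned sequences are made up.

```python
# Side-chain groups taken from the flashcards above (a simplifying assumption).
GROUPS = [set("DE"), set("KHR"), set("STNQ"), set("LVIAMFYW")]


def same_group(a, b):
    """True if both residues belong to the same side-chain group."""
    return any(a in group and b in group for group in GROUPS)


def identity_and_similarity(seq_a, seq_b):
    """Return (% identity, % similarity) over the full alignment length."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    identical = similar = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # gap columns count toward the length but never match
        if a == b:
            identical += 1
            similar += 1  # identical residues also count as similar
        elif same_group(a, b):
            similar += 1
    length = len(seq_a)
    return 100 * identical / length, 100 * similar / length


# Made-up alignment: L/I similar, K/K identical, D/E similar, F/Y similar, gap, S/S identical.
percent_identity, percent_similarity = identity_and_similarity("LKDF-S", "IKEYAS")
```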
What does homologous mean for matching sequences
The products of 2 genes have a shared ancestry
Meaning it matches sequences that may have come from a common ancestor
In a table with amino acids and their preference to adopt a specific secondary structure, what does a value greater than one mean
Shows that the amino acid has a tendency to adopt that secondary structure
What are the helix breakers
Glycine and proline
What are IDRs
Intrinsically disordered regions
Why would a protein have intrinsically disordered regions
Exposes short linear motifs that mediate protein-protein interactions
Allows for regulation of the protein's function through PTMs at the IDR
Regulates the protein's half-life by engaging proteins that have been targeted for degradation by the proteasome (so ubiquitin is added to the IDR)
Adopts different conformations when binding to different interaction partners
What are traits of intrinsically disordered proteins (IDPs)
They are fully disordered
Can be boiled and stay soluble (instead of precipitating)
IDRs are ____ than loops and turns
Longer
Example of a protein with IDR
PP2B/calcineurin
What are sequence signatures
A sequence that has certain key amino acids in specific positions, where they are needed for a specific role (like folding in a specific way or giving a specific property)
[LMFY]
{EF}
x
Mean what in a sequence signature
Any amino acid in the square brackets
Any amino acid except the ones in the curly braces
Any amino acid
What are motifs
How long are they
Short sequence pattern that has a specific function
Usually 3-8 aa, max is 20 aa
Give example of motifs
Transit peptides (N-terminal sequence that takes the protein to a specific area in the cell)
Binding sequence (the sequence makes the protein form a specific complex with another protein)
A motif that is recognized for covalent modification
What are domains
A region of the protein's polypeptide chain that folds independently and has a specific function
Like a parts list for proteins
Ex. SH2/SH3 domains
What does the website PROSITE tell us
About a protein's signatures, domains, and motifs
What do < and > mean in PROSITE
Amino terminal element
Carboxy terminal element
What does x(2,4) mean in PROSITE
x-x
Or x-x-x
Or x-x-x-x
So any number from 2 to 4 of any amino acid
What is the rule for ranges like x(2,4)
Ranges are only allowed for x, and not at the amino or carboxy terminus of the pattern unless anchored to the terminus
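The PROSITE notation on the cards above maps closely onto regular expressions. Below is a minimal sketch of a converter covering only the elements mentioned here ([..], {..}, x, ranges such as x(2,4), and the < and > anchors); the function name `prosite_to_regex` and both example patterns are hypothetical, and real PROSITE entries can use syntax this sketch does not handle.

```python
import re


def prosite_to_regex(pattern):
    """Convert a PROSITE-style pattern into a Python regular expression string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):   # anchored to the amino terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):     # anchored to the carboxy terminus
            suffix, element = "$", element[:-1]
        # Split an element such as x(2,4) into its core and optional repeat counts.
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        if match is None:
            raise ValueError(f"cannot parse element {element!r}")
        core, low, high = match.groups()
        if core == "x":
            core_re = "."                        # any amino acid
        elif core.startswith("[") and core.endswith("]"):
            core_re = core                       # any residue listed in the brackets
        elif core.startswith("{") and core.endswith("}"):
            core_re = "[^" + core[1:-1] + "]"    # any residue except those listed
        else:
            core_re = re.escape(core)            # a literal residue
        if low and high:
            repeat = "{" + low + "," + high + "}"
        elif low:
            repeat = "{" + low + "}"
        else:
            repeat = ""
        parts.append(prefix + core_re + repeat + suffix)
    return "".join(parts)


# Hypothetical pattern assembled from the elements on the cards above.
regex = prosite_to_regex("[LMFY]-x(2,4)-{EF}-x")
found = re.search(regex, "AAMGGGAQ")
```

Compiling the result with `re.compile` and calling `finditer` on a protein sequence would list every place the signature occurs.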
What website lets us see transmembrane regions/prediction of a protein
DeepTMHMM
What are SLiMs
Short linear interaction motifs
They drive specific protein-protein interactions
Give 2 examples of what a SLiM does
The motif RVxF on one protein docks PP1 (protein phosphatase 1) onto that protein
It’s a 5 residue motif
Peroxisome targeting: signals are located at the C-termini of the protein (ex. SKL-COO-)
This makes it go to the peroxisome
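Both SLiM examples on this card can be searched for with very simple patterns. The sketch below treats RVxF literally as R, V, any residue, F, and checks for SKL at the very end of the sequence; real motif definitions are usually more permissive, and the function name `find_slims` and the test sequence are invented for illustration.

```python
import re

# Literal reading of the card: R, V, any residue, F.
RVXF = re.compile(r"RV.F")


def find_slims(sequence):
    """Report RVxF positions (1-based) and whether the sequence ends in SKL."""
    return {
        "RVxF_positions": [match.start() + 1 for match in RVXF.finditer(sequence)],
        "peroxisomal_SKL": sequence.endswith("SKL"),
    }


# Made-up sequence with one RVxF match (RVSF) and a C-terminal SKL.
result = find_slims("MAAKRVSFDGSSKL")
```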
What is pY
Phosphotyrosine
Once a transit peptide takes its protein to a certain area in the cell what happens
A protease cleaves the transit peptide
What are transit peptides used for
To send proteins to the chloroplast or mitochondria, or for secretion from the cell