Quiz 1 - BINF Flashcards
What is the Worldwide Protein Data Bank (wwPDB)?
Has 3D structures of proteins and nucleic acids, ligand interactions, and mutations, plus links to other protein databases
UniProtKB was
curated by humans; it has gene-specific info and is validated, but has no nucleotide sequences, so it is protein-focused
The NCBI has
a large but redundant collection: genes and genomes of any organism, mRNA, microRNA, and anything that has ever been sequenced; covers both proteins and nucleotides
RefSeq is
large but non-redundant; has genes and genomes, mRNA, and microRNA, and is good for BLAST searches; nucleotide-focused
FASTA format is
a simple text format for a sequence plus a line of metadata
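A minimal sketch of reading FASTA text, assuming the usual convention that each record starts with a `>` header line (the metadata) followed by one or more sequence lines. The function name `parse_fasta` and the example records are made up for illustration.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            # Sequence data may be wrapped over several lines.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical records, not real database entries.
example = """>seq1 hypothetical example protein
MKTAYIAKQR
QISFVKSHFS
>seq2 another made-up record
GATTACA
"""
records = parse_fasta(example)
```

Each tuple keeps the metadata line and the joined sequence together, so wrapped sequence lines come back as one string.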
BLAST stands for
Basic Local Alignment Search Tool
What is sequence alignment used for?
To find sequence similarity, common motifs, point mutations, and insertions and deletions
What are the three types of sequence alignment?
Global, local, multiple
What does global sequence alignment do?
Determines the best alignment over the entire length of two sequences
Best when sequences are similar
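One classic way to compute a global alignment is dynamic programming in the style of Needleman-Wunsch. That algorithm is not named in these cards, so treat this as an illustrative sketch. It assumes a simple scoring of +1 for a match and -1 for a mismatch or gap, and returns only the best score, not the alignment itself.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best score for aligning all of a against all of b."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against nothing costs one gap per residue.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap in b
            left = score[i][j - 1] + gap    # gap in a
            score[i][j] = max(diag, up, left)
    # The bottom-right cell covers the entire length of both sequences.
    return score[-1][-1]
```

For example, `global_alignment_score("ACGT", "AGT")` is 2 under this scoring: three matches and one gap.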
What does local sequence alignment do?
Looks at stretches of sequence that are shorter than the whole thing
Good for comparing very different sequences that share regions of similarity
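A local alignment can be sketched with the same kind of dynamic programming, in the style of Smith-Waterman (again, a name not used in these cards). The one change is that a cell is never allowed to go below zero, so an alignment can start and stop anywhere. The +1/-1/-1 scoring is an assumption.

```python
def local_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best score over all pairs of substrings of a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Flooring at 0 lets a new local alignment start at any cell.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

With `"TTTTGATTACATTTT"` and `"CCCGATTACACCC"` the score is 7, coming from the shared `GATTACA` stretch even though the flanking regions do not match.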
What does multiple sequence alignment do?
Aligns more than 2 sequences
Good when looking for conserved sequences or patterns in a protein family
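Once several sequences are aligned, conserved positions can be read off column by column. A small sketch, assuming the alignment is already done and gaps are written as `-`; the three sequences are invented, not a real protein family.

```python
def conserved_columns(aligned):
    """Return indices of columns where every sequence has the same residue."""
    if not aligned:
        return []
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    return [
        i
        for i, column in enumerate(zip(*aligned))
        # A column of identical gap characters does not count as conserved.
        if len(set(column)) == 1 and column[0] != "-"
    ]


# Hypothetical pre-aligned sequences.
msa = ["MKT-LV", "MRTALV", "MKTGLV"]
conserved = conserved_columns(msa)
```

Here the conserved columns are 0, 2, 4, and 5 (M, T, L, V).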
The math framework for sequence alignment is good for
establishing residue-to-residue correspondences between sequences while preserving the order of the residues
The math allows for
the introduction of gaps, so a residue can correspond to nothing in the other sequence
Alignment scores, explain
How to determine the best alignment: matches are +1, mismatches and gaps are -1
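The +1/-1/-1 scheme can be applied directly to an alignment that is already written out. A short sketch, assuming the two rows have equal length and use `-` for a gap:

```python
def alignment_score(row_a, row_b, match=1, mismatch=-1, gap=-1):
    """Sum column scores for two rows of an existing alignment."""
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have the same length")
    total = 0
    for x, y in zip(row_a, row_b):
        if x == "-" or y == "-":
            total += gap        # residue-to-nothing
        elif x == y:
            total += match
        else:
            total += mismatch
    return total
```

For example, `alignment_score("GATTACA", "GA-TACA")` gives 5: six matches and one gap.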
So BLAST uses a matching word to start the alignment and
high-scoring words are extended in either direction until the alignment score drops
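The seed-and-extend idea can be sketched as a toy. This is not BLAST's actual implementation: it assumes exact word matches as seeds, ungapped extension, +1/-1 scoring, and a made-up `drop` parameter that stops the extension once the score falls that far below its best value.

```python
def _extend(query, subject, qi, sj, step, match, mismatch, drop):
    """Walk from (qi, sj) in direction step; return (gain, length) at the best point."""
    gain = best_gain = 0
    length = best_len = 0
    while 0 <= qi < len(query) and 0 <= sj < len(subject):
        gain += match if query[qi] == subject[sj] else mismatch
        length += 1
        if gain > best_gain:
            best_gain, best_len = gain, length
        elif best_gain - gain >= drop:
            break  # score has dropped too far; stop extending
        qi += step
        sj += step
    return best_gain, best_len


def seed_and_extend(query, subject, w=3, match=1, mismatch=-1, drop=2):
    """Return (score, query_start, subject_start, length) of the best ungapped hit."""
    # Index every word of length w in the subject.
    index = {}
    for j in range(len(subject) - w + 1):
        index.setdefault(subject[j:j + w], []).append(j)
    best = (0, 0, 0, 0)
    for i in range(len(query) - w + 1):
        for j in index.get(query[i:i + w], []):
            left_gain, left_len = _extend(query, subject, i - 1, j - 1, -1, match, mismatch, drop)
            right_gain, right_len = _extend(query, subject, i + w, j + w, 1, match, mismatch, drop)
            score = w * match + left_gain + right_gain
            hit = (score, i - left_len, j - left_len, left_len + w + right_len)
            if hit[0] > best[0]:
                best = hit
    return best
```

With `seed_and_extend("CCGATTACAGG", "AAGATTACATT")` the best hit is `(7, 2, 2, 7)`: the seed grows into the full shared `GATTACA` stretch and stops when the flanking mismatches pull the score down.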
s=
w=
p value =
e=
s= alignment score
w= word length
p value = probability that an alignment with a score greater than or equal to s occurred by chance
e = expectation cut-off (a very small e means a highly significant match)
e must be at most
1e-3
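The p value and e are closely related. Assuming the number of chance hits follows a Poisson distribution (the standard assumption behind BLAST statistics, not stated in these cards), p = 1 - exp(-e), so for small values the two are nearly equal. The function names below are made up for illustration.

```python
import math


def p_from_e(e_value):
    """Probability of at least one chance hit, given the expected number e."""
    # expm1 keeps precision when e is very small.
    return -math.expm1(-e_value)


def is_significant(e_value, cutoff=1e-3):
    """Apply the cut-off from the card: e must be at most 1e-3."""
    return e_value <= cutoff
```

At the cut-off itself, `p_from_e(1e-3)` is about 0.0009995, which is why small e values and p values are often read interchangeably.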
What makes BLAST so quick?
It doesn't try to extend or link discontinuous segments
Doesn't generate a global alignment
Requires words to seed the search
Advanced programming
The Dayhoff (PAM) scoring matrix uses a mutation data matrix, so
it is derived using PAM (point accepted mutation) data
What are gap penalties?
Opening a gap is worse than extending one, so opening a gap is -5 while extending it by 1 is -1
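A sketch of how the two penalties combine for a gap of a given length. The convention here is an assumption: the first gap position costs the opening penalty and each further position costs the extension penalty. Some tools instead charge opening plus extension for the first position.

```python
def gap_penalty(length, gap_open=-5, gap_extend=-1):
    """Total penalty for one contiguous gap of the given length."""
    if length <= 0:
        return 0
    # Assumed convention: opening covers the first position.
    return gap_open + (length - 1) * gap_extend
```

Under this convention one gap of length 4 costs -8, while four separate gaps of length 1 cost -20, so the scoring favors a few long gaps over many short ones.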
How was the Dayhoff (PAM) matrix derived?
They looked at proteins that were 85% identical, aligned them manually, then calculated the probability of each residue changing
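The counting step can be sketched as below. This is a heavy simplification and not the actual Dayhoff procedure: it only tallies residue pairs in already-aligned sequences and turns the counts into frequencies. The function name and the example pair are invented.

```python
from collections import Counter, defaultdict


def substitution_probabilities(aligned_pairs):
    """Estimate, for each residue, how often it is paired with each other residue."""
    counts = defaultdict(Counter)
    for seq_a, seq_b in aligned_pairs:
        for x, y in zip(seq_a, seq_b):
            if x == "-" or y == "-":
                continue  # gaps are ignored in this sketch
            # Count both directions, since the direction of change is unknown.
            counts[x][y] += 1
            counts[y][x] += 1
    probs = {}
    for x, row in counts.items():
        total = sum(row.values())
        probs[x] = {y: n / total for y, n in row.items()}
    return probs


# Hypothetical aligned pair, not real protein data.
probs = substitution_probabilities([("ALLA", "ALIA")])
```

In this tiny example L is paired with L twice and with I once, so the estimate for L changing to I is 1/3.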