L5 - Bioinformatics 2 Flashcards
why do we search for similarity in sequences
proteins with similar sequence and structure tend to share function, which helps to identify the functions of unknown proteins
what is homology
2 sequences sharing similarity due to a common ancestor
what are the 2 types of homology
Paralogues:
homologous genes that originated from duplication of an ancestral gene
= same species; the duplicated copies then change (‘gene divergence’); the duplication can be due to transposons
Orthologues:
homologous genes in different species that diverged from a single ancestral gene after a speciation event
= same ancestor but different species
what is preferable to study between protein and DNA sequences
Protein
only ~25% sequence identity is needed for protein similarity to be a good indicator of a common ancestor
~70% is needed for DNA
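A minimal sketch of how a percent-identity figure like the ones above could be computed from two already-aligned sequences. The function name and the choice of denominator (only columns without a gap) are assumptions for illustration; tools differ in how they define identity.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of gap-free columns where both aligned sequences carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Assumption: columns containing a gap ('-') are left out of the denominator.
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


print(percent_identity("ACDE", "ACDF"))
```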
what is pairwise alignment
comparing 2 sequences to measure their similarity on a quantitative scale
Goal is to find the best way to align 2 sequences via a scoring system
+1 for similar/identical
-1 for mismatch or gap
in practice, scoring uses substitution matrices: statistical probabilities based on real-world data on how often one amino acid is found to substitute for another naturally
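The simple +1 / -1 scheme from this card can be sketched as below. This assumes the two sequences are already aligned (same length, '-' for a gap); the function name is made up for illustration, and a real tool would look scores up in a substitution matrix instead of using a flat match/penalty pair.

```python
def score_alignment(aligned_a: str, aligned_b: str, match: int = 1, penalty: int = -1) -> int:
    """Score an existing alignment: +1 per identical column, -1 per mismatch or gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    total = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-" or x != y:
            total += penalty  # mismatch or gap
        else:
            total += match    # identical residues
    return total


print(score_alignment("GATTACA", "GAT-ACA"))  # 6 matches, 1 gap
```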
what is Needleman-Wunsch global alignment
method to align 2 sequences along their entire length
tries to find the best overall match between 2 sequences from start to end
= works best on similar length proteins
= gives a single result = top global alignment
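slow on long sequences or whole databases, as its dynamic programming table effectively considers every possible alignment (work grows with length of sequence 1 × length of sequence 2)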
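A minimal sketch of Needleman-Wunsch, assuming the flat +1 match / -1 mismatch / -1 gap scheme from the pairwise alignment card rather than a substitution matrix. It fills the full table, then traces back one best global alignment.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Return (score, aligned_a, aligned_b) for one optimal global alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # First row and column: aligning a prefix against nothing costs one gap per residue.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    # Each cell is the best of: diagonal (match/mismatch), up (gap in b), left (gap in a).
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Traceback from the bottom-right corner to the top-left corner.
    i, j = len(a), len(b)
    out_a, out_b = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
            match if a[i - 1] == b[j - 1] else mismatch
        ):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))


best, top, bottom = needleman_wunsch("GATTACA", "GATACA")
print(best)
print(top)
print(bottom)
```

The two nested loops are where the cost comes from: every cell of the table is filled, so the work grows with the product of the two sequence lengths.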
what is BLAST
Basic Local Alignment Search Tool
Compares 2 sequences and finds regions that are similar or shared between them
uses short matching ‘seed’ sequences, extends these matches, and scores the resulting regions of similarity
= result includes multiple hits/matching regions ranked by similarity score
Analogy:
Think of BLAST as a tool to compare books.
Instead of matching the entire book, it identifies similar paragraphs or sentences between two books, even if the books are of different lengths.
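The seed-and-extend idea can be sketched as a toy, assuming exact k-letter seeds, ungapped extension, and a simple drop-off rule for stopping. All names and parameters here are made up for illustration; real BLAST also uses substitution matrices, gapped extension, and statistics that this sketch leaves out.

```python
def extend(query, subject, qi, sj, step, match, mismatch, drop):
    """Walk one direction (step = +1 or -1); return (best extra score, length that reached it)."""
    best = score = 0
    best_len = length = 0
    while 0 <= qi < len(query) and 0 <= sj < len(subject):
        score += match if query[qi] == subject[sj] else mismatch
        length += 1
        if score > best:
            best, best_len = score, length
        elif best - score >= drop:
            break  # score fell too far below the best seen, stop extending
        qi += step
        sj += step
    return best, best_len


def seed_and_extend(query, subject, k=3, match=1, mismatch=-1, drop=2):
    """Return hits as (score, query_start, subject_start, length), best first."""
    # Index every k-letter word of the subject by its start positions.
    index = {}
    for j in range(len(subject) - k + 1):
        index.setdefault(subject[j:j + k], []).append(j)
    hits = set()
    for i in range(len(query) - k + 1):
        for j in index.get(query[i:i + k], []):
            right_score, right_len = extend(query, subject, i + k, j + k, 1, match, mismatch, drop)
            left_score, left_len = extend(query, subject, i - 1, j - 1, -1, match, mismatch, drop)
            total = left_score + k * match + right_score
            hits.add((total, i - left_len, j - left_len, left_len + k + right_len))
    return sorted(hits, reverse=True)


for hit in seed_and_extend("CCCGATTACAGGG", "TTGATTACATT"):
    print(hit)
```

Like the book analogy, the two inputs have different lengths and only the shared "paragraph" is reported, together with weaker chance matches ranked below it.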
what is BLASTp
the protein version of BLAST (protein query against a protein database); the nucleotide equivalent is BLASTn
you can select which database you search and can change the sensitivity
what is the E value in BLASTp
number of hits of that quality (score) expected to occur by chance in a database of that size
= high E value = the results are likely due to chance; low E value (close to 0) = significant match
“How likely is this match to occur by chance?”
what is the Bit score
statistical measure of the quality of the alignment between query sequence and database sequence, normalised so scores can be compared between searches (higher = better)
“How strong is the alignment?”
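A rough sketch of how the bit score and the E value relate, using the standard BLAST statistics relation E = m × n × 2^(-bit score). This formula is not stated on the cards; here m and n are taken as the raw query and database lengths, which is a simplifying assumption, since BLAST itself uses adjusted "effective" lengths.

```python
def expected_hits(bit_score: float, query_len: int, db_len: int) -> float:
    """Approximate E value: chance hits expected at this bit score for this search space."""
    # Assumption: raw lengths stand in for BLAST's effective lengths.
    return query_len * db_len * 2.0 ** (-bit_score)


# Each extra bit halves the E value; a bigger database raises it.
for bits in (20, 40, 60):
    print(bits, expected_hits(bits, query_len=300, db_len=1_000_000))
```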
what is PSI-BLAST
Position Specific Iterative BLAST
more sensitive than standard BLASTp
builds off the initial search from BLASTp to build a custom, position-specific scoring matrix to help find distant evolutionary relationships
= considers probability of finding specific amino acids at each position in the alignment
matrix then used to search database AGAIN
= this process continues until no new hits are found above a threshold
detects subtle similarities that might be missed
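A toy sketch of one round of the position-specific scoring idea, assuming the hits are already aligned without gaps, a uniform background frequency, and a pseudocount of 1. The function names and those choices are illustrative assumptions, not how PSI-BLAST itself is implemented.

```python
import math
from collections import Counter


def build_pssm(aligned_hits, alphabet, pseudocount=1.0):
    """One log-odds column per alignment position: log2(observed frequency / background)."""
    background = 1.0 / len(alphabet)  # assumption: every residue equally likely by chance
    total = len(aligned_hits) + pseudocount * len(alphabet)
    pssm = []
    for pos in range(len(aligned_hits[0])):
        counts = Counter(seq[pos] for seq in aligned_hits)
        pssm.append({
            residue: math.log2(((counts[residue] + pseudocount) / total) / background)
            for residue in alphabet
        })
    return pssm


def score_with_pssm(pssm, seq):
    """Sum the position-specific score of each residue in seq."""
    return sum(column[residue] for column, residue in zip(pssm, seq))


hits = ["ACD", "ACE", "ACD"]  # made-up aligned hits from a first search
pssm = build_pssm(hits, alphabet="ACDE")
print(score_with_pssm(pssm, "ACD"))
print(score_with_pssm(pssm, "EEA"))
```

In the iterative scheme described above, sequences scoring above the threshold would be added to the hits and the matrix rebuilt, until no new hits appear.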
what does ChIP-seq do
looks at how proteins interact with DNA
= how they bind to specific regions
what does GO(Gene Ontology) do
vocab to describe gene products in terms of function, location in the cell and involvement in processes
= used to analyse large datasets and find patterns
what does NOW do
accurately predict the 3D structure of proteins from the AA sequence