L10&11 - Bioinformatics Flashcards
What is bioinformatics?
“Science of storing, retrieving and analysing large amounts of biological data”
Combines biology, computer science, information engineering, mathematics and statistics to analyze and interpret biological data
What are protein domains?
Domains are distinct functional or structural sites in a protein sequence and may contain one or more motifs (short recurring patterns in a protein)
What are protein families?
A group of proteins that share a common evolutionary origin, reflected by their related functions and similarities in sequence or structure
What is a database?
A structured set of data held in a computer, especially one that is accessible in many ways
Scoring when comparing sequences
Scores:
2 for a match
0 for a mismatch
-1 for an insertion or deletion (gap)
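Worked example (a minimal sketch, not a full aligner): the function below sums the scores above over two sequences that are already aligned. The function name and the use of `-` to mark a gap are assumptions made for illustration.

```python
MATCH, MISMATCH, GAP = 2, 0, -1  # scores taken from the card above

def score_alignment(a: str, b: str) -> int:
    """Score two already-aligned sequences of equal length ('-' marks a gap)."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    total = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            total += GAP        # insertion or deletion
        elif x == y:
            total += MATCH
        else:
            total += MISMATCH
    return total

# Six matches and one gap: 6 * 2 + (-1) = 11
print(score_alignment("GATTACA", "GA-TACA"))
```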
Difference between local and global alignments
Global = aligns the sequences end to end, measuring similarity across their full length (each sequence treated as one unit)
Local = finds the regions of highest similarity within parts of the sequences
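Worked example (a sketch under stated assumptions): the difference can be seen with a small dynamic-programming scorer. The global branch follows the Needleman-Wunsch recurrence and the local branch follows Smith-Waterman; neither algorithm is named on the cards, so treat them as one common way to compute these scores. The function name and default scores (2, 0, -1, as on the scoring card) are illustrative.

```python
def align_score(a, b, local=False, match=2, mismatch=0, gap=-1):
    """Return the best global (default) or local alignment score of a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    if not local:
        # Global: gaps at the start of either sequence are penalised.
        for i in range(1, rows):
            H[i][0] = i * gap
        for j in range(1, cols):
            H[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if local:
                cell = max(cell, 0)      # local: a region may restart anywhere
                best = max(best, cell)
            H[i][j] = cell
    # Global score is the bottom-right cell; local score is the best cell anywhere.
    return best if local else H[-1][-1]

print(align_score("GGGGACGT", "ACGT"))              # global: pays for 4 gaps
print(align_score("GGGGACGT", "ACGT", local=True))  # local: only the ACGT region
```

Here the global score is 4 (four matches minus four gaps) while the local score is 8, because the local alignment ignores the unmatched GGGG prefix.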
What is BLAST?
Basic Local Alignment Search Tool
Theory: finds short fixed-length segments (words) shared by the query and a database sequence, then extends each hit in both directions; extended segment pairs that score above a pre-set threshold are reported
Lots of different programs available
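Worked example (a toy sketch, not how BLAST is actually implemented): the seed-and-extend idea in the theory line can be illustrated with exact word matches and ungapped extension. The function name, word size, scores, drop-off limit and reporting threshold are all illustrative assumptions; real BLAST adds many refinements, such as gapped extension and statistical scoring.

```python
def seed_and_extend(query, subject, word=3, match=2, mismatch=-3,
                    drop=4, min_score=8):
    """Return sorted (score, query_start, subject_start, length) segment pairs."""
    # Index every fixed-length word of the subject by its start position.
    index = {}
    for j in range(len(subject) - word + 1):
        index.setdefault(subject[j:j + word], []).append(j)

    def extend(i, j, step):
        """Walk outward from (i, j); return (best score gain, length at best)."""
        score = best = 0
        n = best_len = 0
        while 0 <= i < len(query) and 0 <= j < len(subject):
            score += match if query[i] == subject[j] else mismatch
            n += 1
            if score > best:
                best, best_len = score, n
            elif best - score > drop:
                break                      # score fell too far below the best
            i += step
            j += step
        return best, best_len

    hits = set()
    for i in range(len(query) - word + 1):
        for j in index.get(query[i:i + word], []):      # each seed hit
            right, r_len = extend(i + word, j + word, +1)
            left, l_len = extend(i - 1, j - 1, -1)
            score = word * match + left + right
            if score >= min_score:                      # keep high scorers only
                hits.add((score, i - l_len, j - l_len, l_len + word + r_len))
    return sorted(hits, reverse=True)

for hit in seed_and_extend("ACGTACGT", "TTTTACGTACGTTTTT"):
    print(hit)
```

Several seeds on the same diagonal extend to the same segment pair, which is why the results are collected in a set.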
Programs available in BLAST
blastp, blastn, blastx, tblastn, tblastx
blastp
An amino acid query sequence against a protein database
blastn
A nucleotide query sequence against a nucleotide sequence database
blastx
A nucleotide query sequence translated in all reading frames against a protein sequence database
tblastn
A protein query sequence against a nucleotide sequence database dynamically translated in all reading frames
tblastx
The six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database
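Worked example (a minimal sketch): "translated in all reading frames" means three frames on the forward strand plus three on the reverse complement. The code below assumes the standard genetic code and plain A/C/G/T input; the function names and the frame labels (+1 to +3, -1 to -3) are illustrative, and codons containing other characters are shown as X.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna: str) -> str:
    return dna.translate(COMPLEMENT)[::-1]

def translate(dna: str) -> str:
    """Translate complete codons only; '*' is a stop, 'X' an unknown codon."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))

def six_frames(dna: str) -> dict:
    """Return the six-frame translation of a nucleotide sequence."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames

for name, protein in six_frames("ATGGCCTAA").items():
    print(name, protein)
```

For ATGGCCTAA, frame +1 reads ATG GCC TAA and gives `MA*`, while frame -1 reads the reverse complement TTAGGCCAT and gives `LGH`.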
What is the expect (E) value in BLAST?
A parameter that describes the number of hits one can “expect” to see by chance when searching a database of a particular size
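Related to a p-value, but it is an expected count of chance hits rather than a probability, so it can be greater than 1
The smaller the better
If we manually increase the E-value threshold we get more ‘hits’, but they may not be better (more alignments are reported, but many may be poor)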
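Worked example (a sketch under simplifying assumptions): in the standard BLAST statistics the E value for a bit score S' is E = m × n × 2^(-S'), where m is the query length and n the database length, and the matching p-value is 1 - e^(-E). The code below uses raw lengths, whereas BLAST applies corrected "effective" lengths, so the numbers are only indicative. Function names are illustrative.

```python
import math

def expect_value(bit_score: float, query_len: int, db_len: int) -> float:
    """Expected number of chance hits scoring at least bit_score."""
    return query_len * db_len * 2.0 ** (-bit_score)

def p_value(e_value: float) -> float:
    """Probability of at least one chance hit, given the E value."""
    return -math.expm1(-e_value)   # 1 - exp(-E), accurate for small E

# The same score is less significant in a larger database.
for db_len in (1_000_000, 1_000_000_000):
    e = expect_value(50, 300, db_len)
    print(db_len, e, p_value(e))
```

For small E the two values are almost equal, which is why the E value is often read like a p-value; they diverge as E approaches and exceeds 1.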
Other parameters in BLAST
Low complexity filter - masks regions of low complexity (high-frequency simple repeats), which can be problematic in alignments
Lots of other settings can be altered for specific searches, so the tool is highly configurable