Sequence Similarity Searching Flashcards
What is database structure determined by?
The requirements of designers/users
Complete this statement: databases can be local or…?
Remote
Complete this statement: querying can be manual or…?
Automated
What must providers such as NCBI/EBI balance across users?
Demand on computation resources
What does sequence similarity in DNA/proteins suggest?
Common ancestry
What might common ancestry imply?
Common function
What is the name given to homologs separated by a speciation event?
Orthologs
What is the name given to homologs separated by a duplication event?
Paralogs
Paralogs and orthologs are two types of homologous sequence, true or false?
True
What does the alignment or equivalencing of bases enable?
Maximisation of similarity
What could a database query look like?
Could simply be a sequence (DNA/protein)
Could be a logical structure, e.g. human + mitochondrial + HVS2
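A logical query of this kind can be pictured as an AND across several fields. The sketch below is a minimal illustration only: the records, field names and the matches helper are hypothetical and do not reflect how any real sequence database stores or indexes its entries.

```python
# Hypothetical records; the field names and values are illustrative only.
records = [
    {"id": "seq1", "organism": "human", "genome": "mitochondrial", "region": "HVS2"},
    {"id": "seq2", "organism": "human", "genome": "mitochondrial", "region": "HVS1"},
    {"id": "seq3", "organism": "mouse", "genome": "mitochondrial", "region": "HVS2"},
    {"id": "seq4", "organism": "human", "genome": "nuclear", "region": "BRCA1"},
]


def matches(record, **criteria):
    """Return True only if the record satisfies every criterion (logical AND)."""
    return all(record.get(field) == value for field, value in criteria.items())


# human + mitochondrial + HVS2
hits = [
    r["id"]
    for r in records
    if matches(r, organism="human", genome="mitochondrial", region="HVS2")
]
print(hits)
```

Only the record that satisfies all three terms is returned; relaxing any one term widens the result set.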
Why do sequence databases require specialised search tools?
Because of their size and the similarity between the sequences they contain
Is quantification of biological similarity easy or difficult?
Can be difficult
What can searching sequence databases for similar sequences predict about novel sequences?
Possible functions
What can alignments of sequences contain?
Mismatches and gaps
How are mismatches and gaps interpreted in sequence alignments?
As substitutions and indels respectively
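As a small illustration of this interpretation, the sketch below takes two rows of an alignment that has already been made and counts matches, mismatches (read as substitutions) and gap positions (read as indels). The function name, the "-" gap character and the example rows are assumptions chosen for the example.

```python
def summarise_alignment(row_a, row_b, gap="-"):
    """Count matches, substitutions and indel positions in two aligned rows."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    matches = substitutions = indel_positions = 0
    for x, y in zip(row_a, row_b):
        if x == gap or y == gap:
            indel_positions += 1   # a gap is read as an insertion or deletion
        elif x == y:
            matches += 1
        else:
            substitutions += 1     # a mismatch is read as a substitution
    return {
        "matches": matches,
        "substitutions": substitutions,
        "indel_positions": indel_positions,
    }


summary = summarise_alignment("GATTACA", "GAT-ACC")
print(summary)
```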
What do alignment algorithms ideally try to identify about sequences?
The most likely evolutionary ‘path’ between sequences
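One common way to approximate the most likely path is to search for the highest-scoring alignment by dynamic programming. The sketch below uses a Needleman-Wunsch-style global alignment, a technique the cards do not name, with a simple per-position gap penalty. The scoring values (match, mismatch, gap) are arbitrary assumptions, and only the best score is returned, not the alignment itself.

```python
def global_alignment_score(seq_a, seq_b, match=1, mismatch=-1, gap=-2):
    """Best global alignment score under a simple per-position gap penalty."""
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against nothing costs one gap per position.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align the two residues
                score[i - 1][j] + gap,       # gap in seq_b
                score[i][j - 1] + gap,       # gap in seq_a
            )
    return score[-1][-1]


print(global_alignment_score("GATTACA", "GATTACA"))
print(global_alignment_score("GATTACA", "GATACA"))
```

Each cell holds the best score for aligning two prefixes, so the bottom-right cell holds the best score for the full sequences under the chosen scoring scheme.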
What are databases?
Searchable collections of information
What does how we search databases depend on?
Database access, design and location
What does the quantification of sequence similarity require?
Alignment
What is the constant gap penalty?
Opening a gap of any size attracts a constant (a) negative score
= -a
What is the proportional gap penalty?
Opening a gap attracts a penalty proportional to its length (L)
= -(aL)
What is the affine gap penalty?
Opening a gap attracts a constant penalty (a); extending it attracts a further penalty (b) for each position, so the extension cost is proportional to the gap’s length (L)
= -(a + bL), where a >> b
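The three gap penalty formulas can be compared directly. The sketch below implements them as written on the cards; the function names and the example values of a and b are assumptions chosen only to make the comparison concrete.

```python
def constant_gap_penalty(length, a):
    """Constant penalty: -a for a gap of any size."""
    return -a if length > 0 else 0


def proportional_gap_penalty(length, a):
    """Proportional penalty: -(aL)."""
    return -(a * length)


def affine_gap_penalty(length, a, b):
    """Affine penalty: -(a + bL), with a much larger than b."""
    return -(a + b * length) if length > 0 else 0


# Illustrative values only: a = 10 (opening), b = 1 (extension).
one_long_gap = affine_gap_penalty(6, a=10, b=1)
two_short_gaps = 2 * affine_gap_penalty(3, a=10, b=1)
print(one_long_gap, two_short_gaps)
```

With an affine penalty, one gap of six positions costs less than two gaps of three positions, which is one reason this scheme fits the idea that a single indel event can span several residues.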
What type of gap penalty is generally the most relevant biologically?
Affine
What does the choice of gap penalty depend on?
Software
What do amino acid side chains share?
Chemical properties (acidic/basic etc.)
What is the accepted theory about amino acid substitutions?
Chemically similar amino acids substitute more readily than chemically dissimilar amino acids
What is ‘built into’ amino acid substitution matrices?
Physico-chemical classification of amino acids
In the PAM250 (accepted point mutation) substitution matrix, what do similar amino acids score?
+ve score
In the PAM250 (accepted point mutation) substitution matrix, what do dissimilar amino acids score?
-ve score
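To show how such a matrix is used, the sketch below scores a gap-free alignment by summing one matrix entry per aligned pair. The scores are toy values invented for the example and are NOT the real PAM250 entries; the names TOY_MATRIX, pair_score and ungapped_score are likewise hypothetical.

```python
# Toy scores for illustration only; these are NOT the real PAM250 values.
TOY_MATRIX = {
    ("D", "D"): 4,
    ("E", "E"): 4,
    ("L", "L"): 4,
    ("D", "E"): 2,    # both acidic: similar, so a positive score
    ("D", "L"): -3,   # acidic versus hydrophobic: dissimilar, so a negative score
    ("E", "L"): -3,
}


def pair_score(x, y, matrix=TOY_MATRIX):
    """Look up a pair in either order, since the matrix is symmetric."""
    if (x, y) in matrix:
        return matrix[(x, y)]
    return matrix[(y, x)]


def ungapped_score(row_a, row_b, matrix=TOY_MATRIX):
    """Sum the substitution scores over a gap-free alignment."""
    if len(row_a) != len(row_b):
        raise ValueError("rows must have equal length")
    return sum(pair_score(x, y, matrix) for x, y in zip(row_a, row_b))


print(ungapped_score("DEL", "EEL"))
print(ungapped_score("DEL", "LEL"))
```

Replacing D with the chemically similar E lowers the total only slightly, whereas replacing it with the dissimilar L lowers it much more.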
What is the PAM250 (accepted point mutation) substitution matrix based on alignments of?
Closely related proteins
What are accepted point mutation substitution matrices extrapolated to?
Large (PAM120) and very large (PAM250) evolutionary distances
What is BLOSUM62 (blocks substitution matrix) based on alignments of?
Gap free alignments of short protein motifs (blocks)
What do the numbers represent in BLOSUM62 (blocks substitution matrix)?
The level of sequence identity in the block alignments used to build the matrix (BLOSUM62 = 62%)
BLOSUM is continually updated, true or false?
False, it is no longer updated but is still widely used
BLOSUM62 (blocks substitution matrix) has no extrapolation. What does this mean for distant relationships?
It is more reliable for distant relationships
What is the default amino acid substitution matrix of BLAST?
BLOSUM62 (blocks substitution matrix)
What are the types of BLAST searches and their uses?
Nucleotide query versus nucleotide database, e.g. what gene is this?
Protein query versus protein database, e.g. what protein is this?
Translated nucleotide query versus protein database, e.g. does this DNA sequence code for a known protein?
Protein query versus translated nucleotide database, e.g. can we identify a DNA sequence that might encode this protein?
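These four combinations correspond to the standard BLAST program names blastn, blastp, blastx and tblastn. The sketch below is a simple lookup from query and database type to program name; the dictionary, the function and the type labels are hypothetical conveniences, not part of BLAST itself.

```python
# Maps (query type, database type) to the usual BLAST program name.
BLAST_PROGRAMS = {
    ("nucleotide", "nucleotide"): "blastn",
    ("protein", "protein"): "blastp",
    ("translated nucleotide", "protein"): "blastx",
    ("protein", "translated nucleotide"): "tblastn",
}


def choose_blast_program(query_type, database_type):
    """Return the program name for a query/database pairing, or raise if unknown."""
    try:
        return BLAST_PROGRAMS[(query_type, database_type)]
    except KeyError:
        raise ValueError(
            f"no program listed for {query_type!r} versus {database_type!r}"
        ) from None


print(choose_blast_program("translated nucleotide", "protein"))
```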
What are the 4 sections in the results page of a BLAST search?
Search information (including RID)
Graphical summary (conserved domain search)
Results table (hyperlinked to alignments)
Alignments (download links)
Databases contain more information than can be searched practically by observation, true or false?
True
Most databases are relational. What does this mean?
The data are organised into tables with defined inter-relationships
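A minimal sketch of this idea, using Python's built-in sqlite3 module with an in-memory database: two tables are linked by a shared key, and a join follows that relationship. The table names, column names and rows are made up for illustration and do not describe the schema of any real sequence database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two hypothetical tables linked by organism_id.
conn.executescript(
    """
    CREATE TABLE organisms (
        organism_id INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE sequences (
        sequence_id INTEGER PRIMARY KEY,
        organism_id INTEGER NOT NULL REFERENCES organisms(organism_id),
        residues TEXT NOT NULL
    );
    INSERT INTO organisms VALUES (1, 'human');
    INSERT INTO organisms VALUES (2, 'mouse');
    INSERT INTO sequences VALUES (1, 1, 'GATTACA');
    INSERT INTO sequences VALUES (2, 1, 'GATACA');
    INSERT INTO sequences VALUES (3, 2, 'GCTTACA');
    """
)

# The join uses the defined relationship to find all human sequences.
rows = conn.execute(
    "SELECT s.sequence_id, s.residues "
    "FROM sequences AS s "
    "JOIN organisms AS o ON o.organism_id = s.organism_id "
    "WHERE o.name = ? "
    "ORDER BY s.sequence_id",
    ("human",),
).fetchall()
conn.close()
print(rows)
```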
Does the manual querying of remote databases require specialist knowledge?
No, it requires little specialist knowledge
What might automated querying of local databases enable?
Greater throughput and flexibility
Cheaper hardware for databases can increase locally available resources but what does it also make quite costly?
Administration