Bioinformatics Flashcards
What is systems biology?
The study of the interactions between components of biological systems and the function and behaviour they provide.
What is the cycle of biological science discovery?
New hypothesis
Experiment
New data
Model construction
Model analysis
Biological insight
What is sequence identity?
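The extent to which two aligned sequences match exactly, usually given as the percentage of identical residues. 100% identity is a perfectly matched sequence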
What is sequence homology?
Similarity between sequences that is due to shared evolutionary ancestry. The match is usually partial rather than perfect
What are gene predictions built around?
Pattern recognition
What search does DNA fingerprinting use?
Homology search using microsatellites (small arrays of tandem repeats)
What is the evolutionary theory based on?
The similarities in biological sequences
What is sequence annotation?
The process of identifying features in a sequence and assigning them a function, often by identifying similarities between different biological sequences
How does a sequence homology search work?
It compares an unknown sequence against a database of known sequences
Which databases contain DNA sequences?
ENA/EMBL
GenBank
DDBJ
Which databases contain protein sequences?
UniProtKB
RefSeqP
What is the algorithm used to compare an unknown sequence to known sequences?
A pairwise sequence alignment
What are the two types of alignment and what do they mean?
Global: aligns the sequences over their whole length
Local: aligns only the matching domains and subsequences, so other parts of the sequences can be unrelated
How are alignments produced?
A score is produced for each match or mismatch and the scores are added up. If the total score reaches a threshold, the alignment is reported.
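The scoring idea on this card can be sketched in Python. This is a minimal illustration, not how any particular tool is implemented: the match, mismatch and gap values and the threshold are assumed numbers, and the function name alignment_score is made up for the example. With local=False it fills a Needleman-Wunsch style table (global); with local=True it fills a Smith-Waterman style table (local).

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Return the best global or local alignment score of a and b.

    The scoring values are illustrative assumptions, not tool defaults.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    if not local:
        # A global alignment must pay for gaps at the start of either sequence.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap

    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            cell = max(
                score[i - 1][j - 1] + pair,  # align the two residues
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
            if local:
                # A local alignment can restart anywhere, so never go below 0.
                cell = max(cell, 0)
                best = max(best, cell)
            score[i][j] = cell

    # Global: score of aligning everything. Local: best-scoring region.
    return best if local else score[-1][-1]


THRESHOLD = 3  # hypothetical reporting threshold
local_score = alignment_score("TTTACGTTTT", "GGACGGG", local=True)
reported = local_score >= THRESHOLD
```

Here the two sequences only share the short region ACG, so the local score picks out that region while the rest is ignored.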
What are expressed sequence tags (ESTs)?
Short sequences of cDNA produced from mRNA, so they contain only exons, not introns.
Only a local alignment can be used with these sequences
What is the standard format that programs require?
FASTA format
What is FASTA format?
> description of the sequence
The sequence on the subsequent lines
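A small Python sketch of reading this format is shown below. The function name parse_fasta and the two example records are made up for illustration, and the sketch assumes each description line is unique.

```python
def parse_fasta(text):
    """Return a dict mapping each description line to its full sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts: everything after ">" is the description.
            header = line[1:].strip()
            records[header] = []
        elif header is not None:
            # Sequence lines are joined together until the next ">".
            records[header].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


# Hypothetical example input, not real database records.
example = """>seq1 hypothetical example record
ACGTACGT
ACGT
>seq2 another made-up record
MKTAYIAK
"""
records = parse_fasta(example)
```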
What is BLAST?
Basic local alignment search tool
What is blastn?
Searches a nucleotide database using a nucleotide query
What is blastp?
Searches a protein database using a protein query
What is blastx?
Searches a protein database using a translated nucleotide query
What is tblastn?
Searches a translated nucleotide database using a protein query
What is tblastx?
Searches a translated nucleotide database using a translated nucleotide query
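The five programs can be summarised as a lookup table. The Python below is only a memory aid built from the cards above; the names BLAST_PROGRAMS and pick_blast_program are made up for the example and are not part of BLAST itself.

```python
# (query type, database type) -> BLAST program, as described on the cards.
BLAST_PROGRAMS = {
    ("nucleotide", "nucleotide"): "blastn",
    ("protein", "protein"): "blastp",
    ("translated nucleotide", "protein"): "blastx",
    ("protein", "translated nucleotide"): "tblastn",
    ("translated nucleotide", "translated nucleotide"): "tblastx",
}


def pick_blast_program(query, database):
    """Return the program name for a query/database combination."""
    try:
        return BLAST_PROGRAMS[(query, database)]
    except KeyError:
        raise ValueError(f"no BLAST program for {query!r} vs {database!r}") from None


program = pick_blast_program("protein", "translated nucleotide")
```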
What are the 5 BLAST outputs?
Score
Identities
Positives
Gaps
E-Value
What does the score show in a BLAST output?
The rewards for matches minus the penalties for mismatches (and gaps)
What does the identity show in a BLAST output?
The number of identical residues
What does the positives show in a BLAST output?
The number of similar residues
What does the gaps show in a BLAST output?
The number of gaps introduced to give the best alignment
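The identities, positives and gaps counts can be illustrated with a short Python sketch over two already-aligned sequences. The similarity groups used here are a simplified assumption for illustration only; BLAST itself counts a position as positive when its substitution matrix score is positive. The function name and example sequences are made up.

```python
def alignment_stats(query, subject,
                    similar_groups=("ILVM", "FYW", "KRH", "DE", "ST", "NQ")):
    """Count identities, positives and gaps in two aligned sequences.

    similar_groups is a hypothetical, simplified stand-in for a
    substitution matrix.
    """
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have the same length")

    identities = positives = gaps = 0
    for q, s in zip(query, subject):
        if q == "-" or s == "-":
            gaps += 1
        elif q == s:
            identities += 1
            positives += 1  # identical residues also count as positives
        elif any(q in group and s in group for group in similar_groups):
            positives += 1
    return {"identities": identities, "positives": positives,
            "gaps": gaps, "length": len(query)}


# Made-up aligned pair: "-" marks a gap introduced by the alignment.
stats = alignment_stats("MKV-LDE", "MRVALDD")
```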
What does the E-Value show in a BLAST output?
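The reliability of the alignment: the number of alignments with a score at least as good that would be expected to occur by chance in a database of that size. The lower the value, the more reliable the alignment.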
What is a good E-Value in a BLAST output?
A value less than 1e-3 (0.001)
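As a rough sketch, the E-value can be related to a bit score S' by E = m x n x 2^(-S'), where m is the query length and n is the total database length. The Python below assumes that relation and uses raw lengths, whereas BLAST applies corrected (effective) lengths, so its reported values will differ somewhat. The function names and numbers are made up for the example.

```python
def expect_value(bit_score, query_length, database_length):
    """Approximate E-value from a bit score: E = m * n * 2**(-S')."""
    return query_length * database_length * 2.0 ** (-bit_score)


def is_significant(e_value, threshold=1e-3):
    """Apply the rule of thumb from the card: below 1e-3 is a good hit."""
    return e_value < threshold


# Hypothetical numbers: a 100-residue query against 1,000,000 residues.
weak_hit = expect_value(bit_score=10, query_length=4, database_length=256)
strong_hit = expect_value(bit_score=50, query_length=100, database_length=1_000_000)
```

A higher bit score or a smaller database gives a lower E-value, which is why the same alignment can look more or less reliable depending on what was searched.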
Define similar residues
Residues that have the same chemical and physical properties
Name 6 different properties of amino acids
Hydrophobic
Aliphatic
Aromatic
Small
Charged
Polar
What is a protein domain?
A part of the protein structure that can evolve, function and exist independently of the rest of the protein. Domains are between 25 and 500 residues long and appear in evolutionarily related proteins
What are some example protein domain functions?
Ligand binding
Spanning the plasma membrane
Containing the catalytic site
DNA binding
Providing a surface to bind to other proteins
Give an example of a domain database
CDD
InterPro
What is a multiple sequence alignment?
It aligns several sequences
What is a multiple sequence alignment tool?
Clustal
What 2 algorithms are used in a multiple sequence alignment?
Position specific scoring matrices (PSSM)
Hidden Markov Model (HMM)
What is found in a Clustal output?
* Entirely conserved column
: Residues of roughly the same size and hydrophobicity
. Conserved size or hydrophobicity
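How such a line of symbols could be produced is sketched below in Python. The residue groups are the "strong" and "weak" groups as commonly listed for Clustal; treat the exact group lists as an assumption to check against the Clustal documentation. The function name and the example alignment are made up.

```python
# Residue groups for ":" (strong) and "." (weak). These lists are an
# assumption recalled from Clustal's documentation; verify before relying on them.
STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK",
                 "MILV", "MILF", "HY", "FYW"]
WEAK_GROUPS = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND",
               "SNDEQK", "NDEQHK", "NEQHRK", "FVLIM", "HFY"]


def conservation_line(aligned):
    """Return one symbol per alignment column: '*', ':', '.' or ' '."""
    symbols = []
    for column in zip(*aligned):
        residues = set(column)
        if "-" in residues:
            symbols.append(" ")      # columns with gaps get no symbol
        elif len(residues) == 1:
            symbols.append("*")      # entirely conserved column
        elif any(residues <= set(group) for group in STRONG_GROUPS):
            symbols.append(":")      # all residues fall in one strong group
        elif any(residues <= set(group) for group in WEAK_GROUPS):
            symbols.append(".")      # all residues fall in one weak group
        else:
            symbols.append(" ")
    return "".join(symbols)


# Made-up three-sequence alignment.
line = conservation_line(["MKTLS", "MRTIA", "MKTVG"])
```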
What does an output from Clustal of a good multiple sequence alignment contain?
10-30 residues
1-3 stars (*)
5-7 colons (:)
A few full stops (.)
Why are multiple sequence alignments useful?
To show sequence conservation, particularly in domains
To identify a particular conserved residue
To help determine secondary and tertiary structures
To build phylogenetic trees to show evolutionary origins
What are phylogenetic trees used for?
To construct an evolutionary relationship between species or sequences
What is a rooted phylogenetic tree?
A tree in which each node represents the most recent common ancestor of the branches below it. The length of the lines (branches) corresponds to time
What is an unrooted phylogenetic tree?
A tree that shows relatedness without making assumptions about ancestry. If an ancestor is identified, the tree can be converted to a rooted tree
How are phylogenetic trees rooted?
Using an outgroup: a sequence or species that is related to the groups in the tree, but less closely related to them than they are to each other.
The trees require related sequences or multiple sequence alignments
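One common first step from a multiple sequence alignment towards a tree is a table of pairwise distances. The Python sketch below uses the simple p-distance (the fraction of aligned positions that differ), which is only one of several possible distance measures and is an assumption here, not something the cards specify. The sequences and names are made up.

```python
def p_distance(a, b):
    """Fraction of aligned, gap-free positions at which a and b differ."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no gap-free positions to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(alignment):
    """Return {(name1, name2): p-distance} for every pair of sequences."""
    return {(m, n): p_distance(alignment[m], alignment[n])
            for m in alignment for n in alignment}


# Hypothetical aligned sequences, including a more distant outgroup.
alignment = {
    "human": "ACGTACGT",
    "chimp": "ACGTACGA",
    "outgroup": "TCGAACTA",
}
distances = distance_matrix(alignment)
```

In this made-up example the outgroup is further from both other sequences than they are from each other, which is the property that makes it usable for rooting.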
What 6 things can you find using bioinformatics?
Gene prediction
Sequence analysis
Protein structure prediction
Epidemiology
Microarray data analysis
Metabolic pathway modelling