Bioinformatics Flashcards
Define bioinformatics
the application of computers to problems in biology
What is the aim of bioinformatics?
Based on known protein structure and function, to enable understanding and modulation of protein function
How often does DNA data double?
every ~18 months
How often does structure data double?
every ~6 years
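The two doubling times imply very different growth rates. The sketch below assumes both kinds of data grow exponentially at a constant rate, which is a simplification. It compares how much each grows over the same 6-year period. `growth_factor` is a hypothetical helper name used only for illustration.

```python
def growth_factor(years: float, doubling_time_years: float) -> float:
    """Fold increase after `years`, assuming the data doubles every `doubling_time_years`."""
    return 2 ** (years / doubling_time_years)

# DNA data doubles every ~18 months (1.5 years); structure data every ~6 years.
dna = growth_factor(6, 1.5)
structure = growth_factor(6, 6)
print(f"Over 6 years: DNA data x{dna:.0f}, structure data x{structure:.0f}")
# Over 6 years: DNA data x16, structure data x2
```

So over the time structure data doubles once, DNA data doubles four times. This helps explain why sequences far outnumber solved structures.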
How big is the human genome?
3.2 Gbp
What percentage of the human genome is coding?
What percentage of the human genome is repeated sequences?
> 50%
What percentage of genes have alternative splicing?
~35%
What is a database?
a structured collection of data with some tool enabling it to be ‘queried’
What is a databank?
a collection of data (normally in simple text files) without an associated query tool
What types of databank are there?
primary, secondary and meta-databanks
What is a primary databank?
- simply contains sequence data (DNA or protein)
- may also have ‘feature’ information (splice sites, signal sequences, disulphides, active sites, etc.)
- DNA databanks may also contain translations (known or predicted)
What is a meta-databank?
a collection of links between databanks and databases
Give some examples of primary databanks
GenBank, EMBL, DDBJ, UniProtKB/SwissProt, PIR, PDB, Enzyme
What is the implication of imperfect gene-prediction methods?
a protein identified from genome data is hypothetical until verified by experiment
What information is found in PDB?
three-dimensional structural data (atomic coordinates) for proteins and nucleic acids
What information is found in Enzyme?
enzyme classifications (EC numbers)
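An EC number has four dot-separated levels: class, subclass, sub-subclass and serial number (for example, EC 3.4.21.4 is trypsin, a hydrolase). The sketch below splits a full EC number into those levels. The names `ECNumber` and `parse_ec` are hypothetical, and this sketch rejects partial EC numbers such as "3.4.-.-".

```python
from typing import NamedTuple

class ECNumber(NamedTuple):
    enzyme_class: int   # top level, e.g. 3 = hydrolases
    subclass: int
    sub_subclass: int
    serial: int         # the individual enzyme within the sub-subclass

def parse_ec(ec: str) -> ECNumber:
    """Split an EC number such as 'EC 3.4.21.4' into its four levels."""
    ec = ec.strip()
    if ec.upper().startswith("EC"):
        ec = ec[2:].strip()          # drop an optional 'EC' prefix
    parts = ec.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated fields, got {ec!r}")
    # int() raises ValueError on '-' placeholders, so partial numbers are rejected
    return ECNumber(*(int(p) for p in parts))

print(parse_ec("EC 3.4.21.4"))
```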
What is a secondary databank?
- these contain derived information
- patterns that characterise a protein family
- detailed annotation
What do the characters and symbols used in PROSITE patterns mean?
- the standard IUPAC one letter code for the amino acids is used
- the symbol ‘x’ is used for a position where any amino acid is accepted
- [ALT] stands for Ala or Leu or Thr
- {AM} stands for any amino acid except Ala and Met
- each element in a pattern is separated from its neighbour by ‘-’
- x(3) corresponds to x-x-x
- x(2,4) corresponds to x-x or x-x-x or x-x-x-x
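The symbol rules above map directly onto regular expressions, so a pattern can be searched for in a sequence. This is a minimal sketch that handles only the syntax listed above and assumes a pattern may end with a period, as PROSITE pattern lines do. `prosite_to_regex` is a hypothetical helper, and the example pattern and sequence are invented.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Convert a PROSITE-style pattern (only the syntax listed above) to a Python regex."""
    pattern = pattern.rstrip(".")  # PROSITE pattern lines end with a period
    parts = []
    for element in pattern.split("-"):  # elements are separated by '-'
        # Split off an optional repeat count such as (3) or (2,4)
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"cannot parse element {element!r}")
        residue, lo, hi = m.groups()
        if residue == "x":
            token = "."                          # any amino acid
        elif residue.startswith("[") and residue.endswith("]"):
            token = residue                      # [ALT]: Ala or Leu or Thr
        elif residue.startswith("{") and residue.endswith("}"):
            token = "[^" + residue[1:-1] + "]"   # {AM}: anything except Ala and Met
        else:
            token = residue                      # a single one-letter code
        if lo is not None:
            token += "{" + lo + ("," + hi if hi else "") + "}"  # x(3) or x(2,4)
        parts.append(token)
    return "".join(parts)

regex = prosite_to_regex("[ALT]-x(2,4)-{AM}-C.")
print(regex)                                   # [ALT].{2,4}[^AM]C
print(re.search(regex, "GGLKKWCGG").group())   # LKKWC
```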
What is a dotplot?
a graphical method that allows the comparison of two biological sequences and identification of regions of close similarity between them
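A dotplot places one sequence along each axis and marks a dot wherever the two sequences match, so runs of similarity show up as diagonal lines. This is a minimal sketch assuming a simple sliding-window rule: a dot is drawn at (i, j) when the windows starting there share at least `threshold` identical positions. The names `dot_plot` and `render` and the window rule are illustrative assumptions, not a specific tool's method.

```python
def dot_plot(seq_a: str, seq_b: str, window: int = 1, threshold: int = 1) -> list[list[bool]]:
    """Grid of dots: True where the windows at seq_a[i:] and seq_b[j:] share >= threshold matches."""
    rows = len(seq_a) - window + 1
    cols = len(seq_b) - window + 1
    grid = []
    for i in range(rows):
        row = []
        for j in range(cols):
            matches = sum(seq_a[i + k] == seq_b[j + k] for k in range(window))
            row.append(matches >= threshold)
        grid.append(row)
    return grid

def render(grid: list[list[bool]]) -> str:
    """Text rendering: '*' for a dot, '.' for no dot."""
    return "\n".join("".join("*" if cell else "." for cell in row) for row in grid)

print(render(dot_plot("ACGT", "ACGT")))
```

Larger windows with a stricter threshold filter out isolated chance matches, which matters for DNA, where any two random bases match about a quarter of the time.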
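What is a similarity measure?
a score of how alike two residues or sequences are; similarity (substitution) matrices give such a score for every pair of nucleotides or amino acids and are used to align nucleic acid or amino acid sequences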
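A similarity matrix is a lookup table of pair scores, and an alignment is scored by summing the scores of its aligned pairs. Real protein work uses published matrices such as BLOSUM or PAM. The small DNA matrix below is invented purely for illustration: +2 for identical bases, -1 for a transition (purine to purine or pyrimidine to pyrimidine), and -2 for any other mismatch. It scores an ungapped alignment only. `MATRIX` and `alignment_score` are hypothetical names.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def pair_score(a: str, b: str) -> int:
    """Hypothetical toy scores: match +2, transition -1, transversion -2."""
    if a == b:
        return 2
    if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
        return -1
    return -2

# Precompute the full 4x4 matrix as a lookup table keyed by base pairs
BASES = "ACGT"
MATRIX = {(a, b): pair_score(a, b) for a in BASES for b in BASES}

def alignment_score(seq_a: str, seq_b: str) -> int:
    """Sum the matrix scores of each aligned pair in an ungapped alignment."""
    if len(seq_a) != len(seq_b):
        raise ValueError("ungapped alignment needs sequences of equal length")
    return sum(MATRIX[(a, b)] for a, b in zip(seq_a, seq_b))

print(alignment_score("GATTACA", "GACTATA"))  # 8
```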