Theory for bioinformatics Flashcards
What are databases?
It is a collection of information stored on a computer medium so that it can be easily accessed and manipulated.
It is an electronic filing system.
What is a biological database?
A collection of biological data.
What are the uses of databases?
Handle, share and manage large volumes of biological data.
Store, maintain, enter, search, sort, retrieve and analyze, and present or display data.
What are the 3 general functions of databases?
Support large-scale analysis efforts
Make data easy to access and keep it up to date
Integrate knowledge from various fields of biology and medicine.
Fields
A particular item of data about a person or thing stored in a database. Usually, it is a column of the table.
Field type/attribute
The type of data held in a particular field, such as string, numeric or date.
Records and entries
A set of fields within a table that relate to a specific entity. Usually, it is a row of the table.
Primary key
A field whose value uniquely identifies each record in a table.
Secondary key
A field used to link up a table with other tables in a database.
Relational database
A complex database made up of many tables that are linked together by secondary keys.
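The terms in the cards above (fields, field types, records, primary and secondary keys) can be illustrated with Python's built-in sqlite3 module. This is a minimal sketch: the table names, columns and rows are hypothetical, and the secondary key is declared with SQL's FOREIGN KEY constraint, which is how SQL expresses a field that links one table to another.

```python
import sqlite3

# In-memory database for illustration; nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # make SQLite enforce the links between tables

# Each column is a field with a field type; each inserted row is a record.
conn.execute(
    """CREATE TABLE organism (
           organism_id INTEGER PRIMARY KEY,  -- primary key: unique for each record
           species     TEXT NOT NULL
       )"""
)
conn.execute(
    """CREATE TABLE seq_entry (
           accession   TEXT PRIMARY KEY,
           organism_id INTEGER NOT NULL,     -- secondary key pointing at organism
           seq_length  INTEGER,
           FOREIGN KEY (organism_id) REFERENCES organism (organism_id)
       )"""
)

conn.executemany("INSERT INTO organism VALUES (?, ?)",
                 [(1, "Homo sapiens"), (2, "Mus musculus")])
conn.executemany("INSERT INTO seq_entry VALUES (?, ?, ?)",
                 [("SEQ001", 1, 1200), ("SEQ002", 2, 950), ("SEQ003", 1, 430)])

# Follow the secondary key to combine data from both tables.
rows = conn.execute(
    """SELECT s.accession, o.species, s.seq_length
       FROM seq_entry AS s
       JOIN organism AS o ON s.organism_id = o.organism_id
       ORDER BY s.accession"""
).fetchall()

for row in rows:
    print(row)
```

This prints one line per record, starting with ('SEQ001', 'Homo sapiens', 1200). Because foreign keys are switched on, inserting a seq_entry row whose organism_id does not exist in the organism table raises sqlite3.IntegrityError.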
How do you choose the primary key in a relational database?
Choose a field whose values are never duplicated.
A primary key can also act as a secondary key if it links the table to another table.
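The "no duplicates" rule for choosing a primary key can be checked on existing data before a field is chosen. A minimal sketch, assuming records are held as Python dictionaries; the field names and values are hypothetical.

```python
def is_candidate_key(records, field):
    """Return True if no two records share the same value in `field`."""
    values = [record[field] for record in records]
    return len(values) == len(set(values))  # duplicates shrink the set


# Hypothetical records: the accession is unique, the species is not.
records = [
    {"accession": "SEQ001", "species": "Homo sapiens"},
    {"accession": "SEQ002", "species": "Mus musculus"},
    {"accession": "SEQ003", "species": "Homo sapiens"},
]

print(is_candidate_key(records, "accession"))  # True
print(is_candidate_key(records, "species"))    # False
```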
What are bibliographic databases?
They contain scientific literature
Examples: PubMed and ScienceDirect
What are taxonomic databases?
They contain the taxonomic classification of organisms.
Examples: Integrated Taxonomic Information System (itis.gov), Biodiversity Information Standards (tdwg.org) and NCBI Taxonomy.
What are nucleic acid databases?
They contain nucleic acid (DNA and RNA) information
Examples: NDB, EMBL-Bank, GenBank, DDBJ
What are genomic databases?
They contain genome level information
Examples: Ensembl Genome Browser, UCSC Genome Browser, WormBase, AceDB, Comprehensive Microbial Resource and FlyBase
What are protein databases?
They contain protein information
What are protein family, domain and functional site databases?
They contain classifications of proteins and information for identifying domains and functional sites.
What are enzyme/metabolic pathway databases?
They contain information on enzymes and metabolic pathways.
Binding site and gene promoter databases
Genes are regulated by promoters, which can switch a gene on or off and regulate how much gene product is made. Promoters are usually found 'upstream' of a gene.
Examples: DBTBS, EPD, PromEC, TRANSFAC
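Because promoters usually sit 'upstream' of a gene, a common first step when looking for one is to extract the stretch of sequence just before the gene's start. A minimal sketch, assuming a plain DNA string, 0-based half-open gene coordinates on the forward strand, and a hypothetical 10-base window (real analyses typically use much larger windows). For a gene on the reverse strand, upstream lies after the gene's forward-strand end, so that region is reverse-complemented.

```python
# Complement table for DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def upstream_region(genome, gene_start, gene_end, strand="+", window=10):
    """Return up to `window` bases upstream of a gene, read 5'->3' on the gene's strand.

    Coordinates are 0-based and half-open: the gene occupies genome[gene_start:gene_end].
    """
    if strand == "+":
        # Forward-strand gene: upstream is just before the start coordinate.
        return genome[max(0, gene_start - window):gene_start]
    # Reverse-strand gene: upstream lies just after gene_end on the forward strand.
    return reverse_complement(genome[gene_end:gene_end + window])


# Hypothetical toy genome: 14 bases, then a gene "ATGAAATAG", then 2 bases.
genome = "CCTTGACATATAAT" + "ATGAAATAG" + "GG"
print(upstream_region(genome, 14, 23))  # GACATATAAT
```

Near the edge of a sequence the function simply returns fewer than `window` bases rather than raising an error.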
Examples of protein databases
InterPro: Protein families and domains
EXProt: proteins with experimentally verified functions
Protein Information Resource (PIR)
SWISS-PROT/TrEMBL: curated (SWISS-PROT) and computer-annotated (TrEMBL) protein sequences
Examples of protein motifs and domains databases
Proteins have conserved regions (motifs) which may have functional significance.
These databases store protein families, motifs and structural domains.
Examples: BLOCKS (multiple alignments of conserved regions), CDD, eMOTIF, Pfam, PRINTS, ProDom, PROSITE, ProtoMAP
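PROSITE describes many motifs as patterns such as N-{P}-[ST]-{P} (its N-glycosylation site pattern), where x means any residue, [..] lists allowed residues, {..} lists forbidden residues, and (n) or (n,m) gives a repeat count. A minimal sketch that translates this common subset of the syntax into a Python regular expression and scans a sequence for it; it does not cover the full PROSITE specification, and the protein sequence is made up.

```python
import re


def prosite_to_regex(pattern):
    """Translate a simple PROSITE-style pattern into a Python regular expression.

    Handles: x (any residue), [..] (allowed residues), {..} (forbidden residues),
    (n) or (n,m) repeats, and < / > for sequence start / end.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = "^" if element.startswith("<") else ""
        suffix = "$" if element.endswith(">") else ""
        element = element.strip("<>")
        # Separate the residue part from an optional repeat count such as (2,4).
        match = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        residue, repeat = match.group(1), match.group(2)
        if residue == "x":
            regex = "."
        elif residue.startswith("{"):
            regex = "[^" + residue[1:-1] + "]"
        else:
            regex = residue  # single residue or [..] class: same syntax in regex
        if repeat:
            regex += "{" + repeat + "}"
        parts.append(prefix + regex + suffix)
    return "".join(parts)


def find_motif(sequence, pattern):
    """Return 0-based start positions of all (possibly overlapping) motif matches."""
    # A zero-width lookahead lets finditer report overlapping matches.
    regex = re.compile("(?=" + prosite_to_regex(pattern) + ")")
    return [m.start() for m in regex.finditer(sequence)]


print(prosite_to_regex("N-{P}-[ST]-{P}"))          # N[^P][ST][^P]
print(find_motif("MKNVSANLTG", "N-{P}-[ST]-{P}"))  # [2, 6]
```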
Protein structure databases
Proteins take on a 3-D structure
3-D data for some proteins is available due to techniques such as NMR and X-ray crystallography.
ASTRAL
PDB
SCOP
MMDB
Disease databases
OMIM
OMIA
HGMD
Tumor Gene Family databases
Many databases are actually interconnected with each other
Yep
Entrez from NCBI
It is a search and retrieval system that links to nucleotide, protein and literature information.
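Entrez can also be queried from a program through NCBI's E-utilities web service. A minimal sketch that only builds an ESearch request URL for an Entrez database such as PubMed; no network call is made, the helper name `esearch_url` and the search term are hypothetical, and real use should follow NCBI's published usage guidelines.

```python
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(db, term, retmax=20):
    """Build an E-utilities ESearch URL for an Entrez database such as 'pubmed'."""
    params = {"db": db, "term": term, "retmax": retmax}
    return ESEARCH_URL + "?" + urlencode(params)  # spaces become '+'


url = esearch_url("pubmed", "BRCA1 AND breast cancer", retmax=5)
print(url)
```

Passing the resulting URL to `urllib.request.urlopen` would fetch the matching record IDs (returned as XML by default); keeping URL construction separate makes it easy to inspect the query before sending it.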
SRS from EBI
It is a search and retrieval system that links 168 databases so they can be searched through a single interface.
3 types of databases
Primary databases - experimental results
Secondary databases - analysis of experimental results
Composite databases - aggregate of many databases; they link to other data items and combine and consolidate data.
Where does the information in primary databases come from?
-> Original biological data such as protein or nucleic acid sequences
-> From experiments
-> From literature and patents
-> Primary sequence data
-> Literature databases
Describe secondary databases
-> Curated and annotated
-> Add value to primary databases
-> List structural/functional motifs
Describe composite databases
-> Links to other data items
-> Combination of data
-> Consolidation of data
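The idea of a composite database (linking, combining and consolidating data from several sources) can be sketched as a merge keyed on a shared identifier. A minimal sketch under stated assumptions: the source names, accessions and fields are hypothetical, and the conflict rule (keep the first value seen) is one simple choice among many.

```python
def consolidate(*sources):
    """Merge several sources into one entry per accession.

    Each source is a (name, {accession: {field: value}}) pair. The merged entry
    lists the contributing sources; if two sources give the same field, the
    value from the first source is kept.
    """
    merged = {}
    for name, records in sources:
        for accession, fields in records.items():
            entry = merged.setdefault(accession, {"sources": []})
            entry["sources"].append(name)  # link back to where the data came from
            for field, value in fields.items():
                entry.setdefault(field, value)
    return merged


# Hypothetical sources sharing accession IDs.
sequence_db = {"X1": {"sequence": "ATGCCGTA"}, "X2": {"sequence": "ATGAAATAG"}}
annotation_db = {"X1": {"function": "putative kinase"}}

view = consolidate(("sequence_db", sequence_db), ("annotation_db", annotation_db))
print(view["X1"])
```

Recording the contributing sources in each entry keeps the link to the original databases, which helps when their data later change.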
Examples of primary databases
Nucleic acid
- EMBL
- GenBank
- DDBJ
Protein
- SWISS-PROT
- TrEMBL
- PIR
Common database search methods
Keyword matching, sequence similarity, motif searching and class searching
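Two of these search methods can be illustrated on a handful of toy records. A minimal sketch: keyword matching is a case-insensitive substring test on each record's description, and sequence similarity is approximated with the Jaccard similarity of shared 3-letter substrings (k-mers). This k-mer score is a swapped-in simplification; real database searches use alignment tools such as BLAST. All records and IDs are hypothetical.

```python
def keyword_search(records, keyword):
    """Return IDs of records whose description contains `keyword` (case-insensitive)."""
    kw = keyword.lower()
    return [rid for rid, (desc, _seq) in records.items() if kw in desc.lower()]


def kmers(seq, k=3):
    """Set of all overlapping substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_similarity(a, b, k=3):
    """Jaccard similarity of two sequences' k-mer sets, from 0.0 to 1.0."""
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    return len(ka & kb) / len(union) if union else 0.0


def similarity_search(records, query, k=3):
    """Return record IDs ranked by k-mer similarity to `query`, best first."""
    scores = {rid: kmer_similarity(query, seq, k) for rid, (_desc, seq) in records.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy records: ID -> (description, protein sequence).
records = {
    "R1": ("globin-like protein", "MVLSGAKW"),
    "R2": ("kinase domain", "MKTAYIAK"),
    "R3": ("globin variant", "MVLSGEKW"),
}

print(keyword_search(records, "globin"))       # ['R1', 'R3']
print(similarity_search(records, "MVLSGAKW"))  # ['R1', 'R3', 'R2']
```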
Problems with using biological databases
Incomplete information, data spread over multiple databases, redundant information, various errors, sometimes incorrect links, constant change
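Redundant information, one of the problems listed above, can be partly detected by grouping entries that carry exactly the same sequence. A minimal sketch, assuming records are (ID, sequence) pairs with hypothetical IDs; it only catches exact duplicates (ignoring letter case), so near-identical entries would need a similarity-based clustering step instead.

```python
def collapse_redundant(records):
    """Group record IDs that share an identical sequence (case-insensitive).

    Returns a dict mapping each distinct sequence to the IDs that carry it,
    so one representative can be kept and the others cross-referenced.
    """
    groups = {}
    for rid, seq in records:
        groups.setdefault(seq.upper(), []).append(rid)
    return groups


# Hypothetical entries: A1 and B7 hold the same sequence in different case.
records = [("A1", "ATGCCGTA"), ("B7", "atgccgta"), ("C2", "ATGAAATAG")]
groups = collapse_redundant(records)
redundant = {seq: ids for seq, ids in groups.items() if len(ids) > 1}
print(redundant)  # {'ATGCCGTA': ['A1', 'B7']}
```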
Retrieval system
Helps retrieve rich information from multiple databases