Week 8 (Lecture 14) - Databases Flashcards
Database
a structured collection of data held in computer storage
• especially one that incorporates software to make it accessible in a variety of ways
• any large collection of information
Database management
the organization and manipulation of data in a database
Database management system (DBMS)
a software package that provides all the functions required for database management
Database system
a database together with a database management system
What is a database?
a collection of data…
• structured
• searchable (index) --> table of contents
• updated periodically (release) --> new edition
• cross-referenced (hyperlinks) --> links with other databases
A database includes
tools (software) necessary for
• access
• updating
• information insertion
• information deletion
• etc.
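The four operations on this card can be sketched with Python's built-in sqlite3 module standing in for the database software. The table name `sequences`, its columns, and the accession values are made up for illustration and do not come from the lecture.

```python
import sqlite3

# In-memory database; table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sequences (accession TEXT PRIMARY KEY, organism TEXT, length INTEGER)"
)

# Insertion: add new records (made-up accession values).
conn.executemany(
    "INSERT INTO sequences VALUES (?, ?, ?)",
    [("X0001", "E. coli", 1200), ("X0002", "S. cerevisiae", 850)],
)

# Updating: correct an existing record.
conn.execute("UPDATE sequences SET length = ? WHERE accession = ?", (1250, "X0001"))

# Deletion: remove a record.
conn.execute("DELETE FROM sequences WHERE accession = ?", ("X0002",))

# Access: query what remains.
rows = conn.execute("SELECT accession, organism, length FROM sequences").fetchall()
print(rows)
```

Any DBMS offers equivalents of these four statements; SQLite is used here only because it ships with Python.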
Database storage management
- flat files
- relational databases
Flat file
- various means to encode a database model (most commonly a table) as a single file
- can be a plain text file or a binary file
- usually no structural relationships between the records
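As a rough illustration of a plain-text flat file, the sketch below reads a small tab-delimited table with Python's csv module. The column names and records are hypothetical, and an in-memory string stands in for a file on disk.

```python
import csv
import io

# A tiny tab-delimited flat file; contents are made up for illustration.
flat_file = io.StringIO(
    "accession\torganism\tlength\n"
    "X0001\tE. coli\t1200\n"
    "X0002\tS. cerevisiae\t850\n"
)

# Each line is one record; the file itself carries no links between records.
records = list(csv.DictReader(flat_file, delimiter="\t"))

# With no index, finding a record means scanning every line.
matches = [r for r in records if r["organism"] == "E. coli"]
print(matches)
```

Note that every value comes back as a string, since a plain-text flat file stores no type information.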
Relational database
- a database that has a collection of tables of data items, all of which are formally described and organized according to the relational model
- data in a single table represents a relation
- tables may have additionally defined relationships with each other
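A minimal sketch of two related tables, again using Python's sqlite3 module. The schema (`organism`, `gene`, and the `organism_id` foreign key) is an assumption chosen for illustration, not a schema from the lecture or from any real biological database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

# Each table represents one relation.
conn.execute("CREATE TABLE organism (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    """
    CREATE TABLE gene (
        id INTEGER PRIMARY KEY,
        symbol TEXT NOT NULL,
        organism_id INTEGER NOT NULL REFERENCES organism(id)
    )
    """
)

conn.execute("INSERT INTO organism VALUES (1, 'E. coli')")
conn.executemany(
    "INSERT INTO gene VALUES (?, ?, ?)",
    [(1, "lacZ", 1), (2, "recA", 1)],
)

# The defined relationship lets a query combine both tables.
joined = conn.execute(
    "SELECT gene.symbol, organism.name "
    "FROM gene JOIN organism ON gene.organism_id = organism.id "
    "ORDER BY gene.symbol"
).fetchall()
print(joined)
```

The foreign key is the "additionally defined relationship": a gene row cannot point to an organism that does not exist.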
Why biological databases?
• exponential growth in biological data
• data are no longer published in a conventional manner, but directly submitted to databases
-- genomic sequences
-- 3D structures
-- 2D gel analysis
-- MS analysis
-- microarrays
• essential tools for biological research
– the only way to publish massive amounts of data without using all the paper in the world
The first databases that emerged concentrated on
collecting and annotating nucleotide and protein sequences generated by the early sequencing techniques
Number of different biological databases
more than 1000
Size of databases - variable
< 100 Kb to > 20 Gb
• DNA: >20 Gb
• protein: 1 Gb
• 3D structure: 5 Gb
Update frequency
daily to annually to seldom to never (effectively abandoned)
• usually accessible through the web
Some databases in the field of molecular biology
- AATDB
- AceDB
- ACUTS
- ADB
- AFDB
Categories of databases for life sciences
- sequence (DNA, protein)
- genomics
- mutation/polymorphism
- protein domain/family
- proteomics (2D gel, mass spectrometry)
- 3D structure
- metabolic networks
- regulatory networks
- bibliography
- expression (microarrays…)
- specialized
NCBI
GenBank is maintained at the National Center for Biotechnology Information
• Maryland, USA
EMBL
European Molecular Biology Laboratory
• at the European Bioinformatics Institute
• Cambridge, UK
DDBJ
DNA Data Bank of Japan
• at National Institute of Genetics
• Mishima, Japan
Objectives of these databases
EMBL, GenBank, DDBJ
- to ensure that DNA sequence information is stored in a way that is PUBLICLY and FREELY accessible
- and can be retrieved and used by other researchers in the future
Literature databases
- Bookshelf
- PubMed
- PubMed Central
- OMIM