Lecture 4: What is Bioinformatics? Flashcards

Question 1

Q

What is Bioinformatics?

Answer

A

‘Bioinformatics is the acquisition, archiving, and interpretation (analysis) of molecular biology information’.

Bioinformatics is a multidisciplinary science at the crossroads of Biology, Computer Science, Statistics, Mathematical Modelling and Systems Science.

Question 2

Q

What is acquisition in regard to bioinformatics?

Answer

A

Acquisition (Analytical Platforms):
-DNA, RNA, Protein Sequence
-Metabolites
-Molecular Structures

Question 3

Q

What is archiving in regard to bioinformatics?

Answer

A

Archiving (Biological Databases):
-DNA, RNA, Protein Sequence
-Metabolites
-Molecular Structures

Question 4

Q

What is interpretation in regard to bioinformatics?

Answer

A

Interpretation (Data Analysis):
-Computational Genomics
-Gene Function Annotation
-Molecular Pathway Annotation
-And more …

Question 5

Q

What is a biological database?

Answer

A

A large, persistent collection of systematically organised data, managed by software that can retrieve and update records.

Biological databases are central to many bioinformatics applications.

Biological databases make it possible to access and systematically search a wide variety of biological data for an increasingly broad range of organisms.

Types of biological data include:
-Genomic, transcriptomic, and protein sequences.
-Genomic annotation, e.g. genes, transcription factor binding sites, gene function, pathways
-Phenotypes
-Protein Structure
-And more…
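The definition at the start of this answer (a persistent collection of organised data, managed by software that can retrieve and update records) can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the table layout and the accession EX000001 are hypothetical, and an in-memory database is used so nothing is written to disk (passing a file path instead of ":memory:" would make it persistent). Declaring the accession as the PRIMARY KEY also enforces the "one unique identifier per record" idea from the key-concepts card.

```python
import sqlite3

# In-memory database for illustration; a file path would make it persistent.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records ("
    " accession TEXT PRIMARY KEY,"  # unique identifier: no two records may share it
    " description TEXT,"
    " sequence TEXT)"
)

# Add a record (hypothetical accession, description and sequence).
conn.execute(
    "INSERT INTO records VALUES (?, ?, ?)",
    ("EX000001", "hypothetical gene", "ATGGCC"),
)

# Update the record through the database software.
conn.execute(
    "UPDATE records SET sequence = ? WHERE accession = ?",
    ("ATGGCCTAA", "EX000001"),
)

# Retrieve the record by its accession.
row = conn.execute(
    "SELECT description, sequence FROM records WHERE accession = ?",
    ("EX000001",),
).fetchone()
print(row)  # ('hypothetical gene', 'ATGGCCTAA')

# A second record reusing the same accession is rejected.
try:
    conn.execute(
        "INSERT INTO records VALUES (?, ?, ?)", ("EX000001", "duplicate", "AAA")
    )
except sqlite3.IntegrityError as err:
    print("rejected:", err)
```

Real biological databases are far larger and more elaborate, but the same two operations (retrieve by identifier, update in place) sit underneath them.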

Question 7

Q

What are some key concepts of biological databases?

Answer

A

To be easily identifiable, ALL records (gene/protein/metabolite names, sequences, data, etc.) need to have a UNIQUE IDENTIFIER, also known as an ACCESSION NUMBER.

In your analyses/reports you will need to cite the unique identifier/accession number so the reader knows exactly which data/organism/gene (etc.) you are working with.

An identifier unambiguously identifies a biological entity.

To be easily understood, information should be presented in a FIXED FORMAT/VOCABULARY. This helps us (and computers) read and extract the information we need.

In molecular biology, two sequence formats, GENBANK and FASTA, are frequently used.
FASTA is a de facto standard for any raw sequence.
GENBANK is the flat file format for gene sequences
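Because FASTA is a fixed format, a record can be produced mechanically. The sketch below writes one record as a ">" header line followed by the sequence. The function name to_fasta and the 60-character line width are assumptions for illustration (60 is a common wrapping convention, not a requirement of any particular tool).

```python
def to_fasta(identifier: str, sequence: str, description: str = "", width: int = 60) -> str:
    """Format one record as FASTA: a '>' header, then the sequence wrapped to `width` characters."""
    header = f">{identifier} {description}".rstrip()  # drop trailing space if no description
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([header, *lines]) + "\n"


# Hypothetical 90-base sequence, wrapped into lines of 60 and 30 characters.
record = to_fasta("seq1", "ATG" * 30, "hypothetical example")
print(record)
```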

Question 8

Q

Question: The sequence of a gene was updated in the Entrez database. What will happen to the gene identifier?

Answer

A

It remains unchanged.

Not all databases deal with versions the same way. Ensembl appends the version number to the ID, so the ID is formed of two components:
the gene identifier and
the version of the gene.
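A versioned ID of the form "stable ID + . + version" can be split so that two releases of the same gene are still recognised as the same entity. This is a minimal sketch: the helper name split_versioned_id is hypothetical, and the version numbers in the example are illustrative only.

```python
from typing import Optional, Tuple


def split_versioned_id(versioned_id: str) -> Tuple[str, Optional[int]]:
    """Split 'STABLEID.VERSION' into (stable_id, version); version is None if absent."""
    stable_id, sep, version = versioned_id.rpartition(".")
    if not sep or not version.isdigit():
        # No numeric version suffix: treat the whole string as the stable ID.
        return versioned_id, None
    return stable_id, int(version)


old = split_versioned_id("ENSG00000141510.16")  # illustrative version numbers
new = split_versioned_id("ENSG00000141510.17")
print(old, new)  # ('ENSG00000141510', 16) ('ENSG00000141510', 17)
print(old[0] == new[0])  # True: same gene, different version
```

Comparing only the stable part answers "is this the same gene?", while comparing the full ID answers "is this the same version of its sequence?".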

Question 9

Q

How do I choose the right biological data(base)?

Answer

A

1. Type of Biological System
-Cell culture
-Animal model
-Human

2. Level of organisation
-Organelles
-Single cells
-Tissues

3. Scope, depth and breadth of coverage
-Biased or partial, e.g. candidate gene
-Comprehensive, e.g. omics data

4. Genesis
-Computational predictions
-Experimental data

5. Levels of Curation
-Raw/archival data, e.g. SRA
-Curated data, e.g. RefSeq

6. Types of Curation
-Computationally curated, e.g. UniProt
-Community curated, e.g. GO
-Expert reviewed, e.g. RefSeq
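The criteria above can be treated as attributes to filter on when choosing a database. The sketch below records only the examples given on this card; attributes the card does not state are left as None rather than guessed. The DatabaseProfile class and the choose helper are hypothetical names, not part of any library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatabaseProfile:
    name: str
    curation_level: Optional[str] = None  # "raw/archival" or "curated"
    curation_type: Optional[str] = None   # "computational", "community" or "expert reviewed"


# Only the examples stated on this card; unknown attributes stay None.
catalogue = [
    DatabaseProfile("SRA", curation_level="raw/archival"),
    DatabaseProfile("RefSeq", curation_level="curated", curation_type="expert reviewed"),
    DatabaseProfile("UniProt", curation_type="computational"),
    DatabaseProfile("GO", curation_type="community"),
]


def choose(profiles, **criteria):
    """Return the names of profiles whose attributes match every given criterion."""
    return [
        p.name
        for p in profiles
        if all(getattr(p, key) == value for key, value in criteria.items())
    ]


print(choose(catalogue, curation_type="expert reviewed"))  # ['RefSeq']
print(choose(catalogue, curation_level="raw/archival"))    # ['SRA']
```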

Question 10

Q

Summary

Answer

A

Biological databases store different types of biological data, e.g. sequence, bibliography, graphs, etc.

Two key concepts allow storage, sharing and unambiguous interpretation of data:
1.Unique Identifiers
2.Fixed Formats and Vocabularies

Biological databases can be characterised by six attributes:
1-Biological System
2-Level of Organisation
3-Scope and Coverage
4-Genesis
5-Curation
6-Types of Curation

Question 11

Q

Biological databases covered in this lecture

Answer

A

PubMed – Bibliographic database

GenBank – Gene-centric sequence database

UniProt – Integrative portal focused on protein data

Gene Ontology – Gene function database

KEGG – Metabolic pathways database

Question 12

Q

Summary of characteristics of bibliographic databases

Answer

A

Credibility of a source: For a journal to be indexed in MEDLINE, Scopus, PubMed and Web of Science, it must meet rigorous review and selection criteria.

Convenience: Find information in all journals by doing a single search

Permanency: PubMed provides a unique identifier (PMID) for each paper and a permanent URL for sharing and citing it unambiguously.

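Impact: Web of Science and Scopus provide the number of citations of papers and citation metrics for journals, such as the Impact Factor (the average number of citations received in a given year by the articles a journal published in the previous two years).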
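The standard Journal Impact Factor for year Y is computed over a two-year window: citations received in Y by items published in Y-1 and Y-2, divided by the number of citable items published in Y-1 and Y-2. The sketch below is just that arithmetic; the function name and the journal figures are hypothetical.

```python
def impact_factor(citations_in_year: int, citable_items_prev_two_years: int) -> float:
    """Impact Factor for year Y: citations in Y to items from Y-1 and Y-2,
    divided by the number of citable items published in Y-1 and Y-2."""
    if citable_items_prev_two_years <= 0:
        raise ValueError("need at least one citable item in the two-year window")
    return citations_in_year / citable_items_prev_two_years


# Hypothetical journal: 600 citations in 2023 to its 2021-2022 papers, 200 citable items.
print(impact_factor(600, 200))  # 3.0
```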

Question 13

Q

Summary of sequence databases and formats

Answer

A

GenBank is a flat file database with the following characteristics:
-Human-readable format (GenBank)
-Archival in nature
-Reflective of the submitter's point of view (subjective)
-Redundant (multiple copies)
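To illustrate what "redundant" means, the sketch below groups hypothetical accessions whose sequences are exactly identical. It is deliberately simple: it only catches exact duplicates (ignoring letter case), whereas redundancy in an archival database also includes overlapping or near-identical entries. The function name and accessions are hypothetical.

```python
from collections import defaultdict


def group_identical(records):
    """Return groups of accessions (sorted) that share an identical sequence, ignoring case."""
    groups = defaultdict(list)
    for accession, sequence in records.items():
        groups[sequence.upper()].append(accession)
    return [sorted(accs) for accs in groups.values() if len(accs) > 1]


# Hypothetical submissions: two submitters deposited the same sequence.
records = {
    "EX000001": "ATGGCCAAGTAA",
    "EX000002": "atggccaagtaa",
    "EX000003": "ATGCCCTAG",
}
print(group_identical(records))  # [['EX000001', 'EX000002']]
```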

UniProt is a protein-focused database that combines two databases, Swiss-Prot and TrEMBL:
-Swiss-Prot is manually annotated and reviewed
-TrEMBL is automatically annotated and not reviewed
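In UniProtKB FASTA files, the header starts with "sp" for Swiss-Prot (reviewed) entries and "tr" for TrEMBL (unreviewed) entries, so review status can be read straight from the header. A minimal sketch, assuming well-formed headers; the function name is hypothetical and the second header uses a made-up accession.

```python
def review_status(header: str) -> str:
    """Classify a UniProtKB FASTA header as reviewed (Swiss-Prot) or unreviewed (TrEMBL)."""
    source = header.lstrip(">").split("|", 1)[0]  # text before the first '|'
    if source == "sp":
        return "reviewed (Swiss-Prot)"
    if source == "tr":
        return "unreviewed (TrEMBL)"
    return "unknown"


print(review_status(">sp|P69905|HBA_HUMAN Hemoglobin subunit alpha"))  # reviewed (Swiss-Prot)
print(review_status(">tr|EXAMPLE1|EXAMPLE1_HUMAN Hypothetical protein"))  # unreviewed (TrEMBL)
```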

FASTA is a machine-interpretable format that consists of:
-A greater-than symbol “>” starting every new entry, followed by a unique identifier (name), and
-The sequence on the following line(s)
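Because every entry starts with “>”, a FASTA file can be read with a short loop. The sketch below is an illustration, not a reference parser: it assumes the identifier is the first whitespace-separated word after “>”, joins wrapped sequence lines, ignores anything before the first header, and rejects duplicate identifiers (since each entry should be unique). The function name parse_fasta is hypothetical; in practice a library such as Biopython's SeqIO is commonly used instead.

```python
from typing import Dict, List, Optional


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {identifier: sequence}."""
    records: Dict[str, str] = {}
    current: Optional[str] = None
    chunks: List[str] = []

    def flush() -> None:
        # Store the record collected so far, refusing duplicate identifiers.
        if current is not None:
            if current in records:
                raise ValueError(f"duplicate identifier: {current}")
            records[current] = "".join(chunks)

    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            flush()
            words = line[1:].split()
            current = words[0] if words else ""
            chunks = []
        elif current is not None:
            chunks.append(line)  # a sequence may be wrapped over several lines
    flush()
    return records


example = """>seq1 hypothetical example
ATGGCC
AAGTAA
>seq2
MKV
"""
print(parse_fasta(example))  # {'seq1': 'ATGGCCAAGTAA', 'seq2': 'MKV'}
```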