Intro To Bioinformatics Flashcards
What is bioinformatics?
The term bioinformatics was coined by Paulien Hogeweg in 1979 for the study of informatic processes in biotic systems
- Bioinformatics is the field of science in which biology, computer science, and information technology merge into a single discipline (NCBI, 2009)
- In bioinformatics, computer databases are used to store, retrieve and assist in understanding biological information
Give examples of biological data in bioinformatics
- DNA (genome) = sequence, pathway
- RNA (transcriptome) = structure, interaction
- Protein (proteome) = evolution, mutations
What bioinformatics analyses can be applied to DNA?
- simple sequence analysis
- Gene finding
- regulatory regions
- whole genome annotations
- comparative genomics (analysis between species and strains)
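As a rough illustration of what "simple sequence analysis" can mean in practice, the sketch below computes GC content and the reverse complement of a DNA string. The function names are hypothetical and chosen only for this example; real projects usually rely on an established library instead.

```python
# Illustrative helpers only; the names are assumptions made for this sketch.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0  # avoid dividing by zero for an empty sequence
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(gc_content("ATGC"))
    print(reverse_complement("ATGC"))
```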
What bioinformatics analyses can be applied to RNA?
- Splice variants
- Tissue-specific expression
- structure
- single gene analysis (various cloning techniques)
- Experimental data involving thousands of genes simultaneously
- DNA chips, micro-arrays and expression array analysis
What bioinformatics analyses can be applied to proteins?
- homology
- conserved domains/regions
- structure determination (molecular modeling): 2D, 3D and quaternary structure
- protein function
- Analysis often involves 2D gels and mass spectrometers
What are the major Nucleotide Sequence databases?
- GenBank: National Center for Biotechnology Information (NCBI)
- GenBank is the NIH genetic sequence database. It is part of the International Nucleotide Sequence Database Collaboration, which comprises the DNA Data Bank of Japan (DDBJ), the European Molecular Biology Laboratory (EMBL), and GenBank at NCBI
- EMBL: European Molecular Biology Laboratory
- The European Molecular Biology Laboratory (EMBL) Nucleotide Sequence Database is the European equivalent of the U.S.’s GenBank
- DDBJ: DNA Data Bank of Japan
- The DNA Data Bank of Japan (DDBJ), which is based at Japan’s National Institute of Genetics, is the third in the trio of major nucleotide sequence databases
What are the major protein sequence databases?
- UniProt: Universal Protein Resource
- PIR: Protein Information Resource Databases
- Swiss-Prot
- ExPASy
Describe UniProt: the Universal Protein Resource
UniProt is a single database that combines information from the major international protein resources: the European Bioinformatics Institute (EBI), Cambridge, UK; the Protein Information Resource (PIR) at Georgetown University Medical Center (GUMC) and the National Biomedical Research Foundation (NBRF), Washington, D.C.; and the Swiss Institute of Bioinformatics (SIB), Geneva, Switzerland
Describe the PIR: Protein Information Resource Databases
PIR grew out of the Atlas of Protein Sequence and Structure (1965-1978), which was edited by Margaret Dayhoff
Describe Swiss-Prot
Swiss-Prot is the major European protein sequence database, from the Swiss Institute of Bioinformatics
Describe ExPASy: Expert Protein Analysis System
ExPASy is the new Swiss Institute of Bioinformatics (SIB) Resource Portal, which provides access to scientific databases and software tools in different areas of the life sciences, including proteomics, genomics, phylogeny, systems biology, population genetics and transcriptomics
What is the query sequence?
A sequence, either amino acid or nucleotide, chosen by the user to use in a BLAST search
- A query sequence can be typed or pasted into the query window on the search form
- BLAST searches require a minimum query sequence length of 15 nucleotides or amino acids
- A query sequence can be given in FASTA format, as a bare sequence, or as an identifier (accession number or GenInfo Identifier (GI))
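To make the FASTA option concrete, here is a minimal sketch that splits a single-record FASTA string into its header and sequence, then checks the 15-residue minimum mentioned above. The function names, the constant, and the example record are hypothetical. The sketch does not call BLAST or reproduce any of its input validation.

```python
from typing import Tuple

MIN_QUERY_LENGTH = 15  # minimum query length quoted in the card above


def parse_fasta(text: str) -> Tuple[str, str]:
    """Split a single-record FASTA string into (header, sequence)."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("FASTA input must start with a '>' header line")
    header = lines[0][1:]  # drop the leading '>'
    sequence = "".join(lines[1:]).upper()  # sequence lines may be wrapped
    return header, sequence


def is_long_enough(sequence: str, minimum: int = MIN_QUERY_LENGTH) -> bool:
    """Check the sequence against the assumed minimum query length."""
    return len(sequence) >= minimum


if __name__ == "__main__":
    # Made-up record used only for illustration.
    record = ">example_query made-up sequence\nATGGCCATTGTAATGGGCCGC\nTGAAAGGGTGCCCGATAG\n"
    header, sequence = parse_fasta(record)
    print(header, len(sequence), is_long_enough(sequence))
```

A bare sequence is the same input without the ">" header line, so it could be passed straight to `is_long_enough`.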
What is an alignment?
A presentation of two compared sequences showing the regions of greatest statistical similarity
What is the score value?
The score value is a measure of the quality of the alignment between the query sequence and the search results
- The higher the score, the better the alignment
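The idea that a score measures alignment quality can be sketched with a toy scoring scheme. The values below (match +1, mismatch -1, gap -2) are assumptions chosen for illustration. BLAST itself uses substitution matrices and separate gap-opening and gap-extension penalties, so this is not its actual scoring.

```python
def alignment_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Score two already-aligned sequences of equal length, with '-' marking gaps.

    The default scoring values are illustrative assumptions, not BLAST defaults.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    score = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            score += gap  # a gap in either sequence is penalized
        elif x == y:
            score += match
        else:
            score += mismatch
    return score


if __name__ == "__main__":
    print(alignment_score("ACGT", "ACGT"))  # identical sequences
    print(alignment_score("ACGT", "AC-T"))  # one gap
```

Under this scheme an identical pair scores higher than the same pair with a gap or a mismatch, which is the sense in which a higher score indicates a better alignment.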
What is the E-value?
The E-value refers to the expectation value
- The number of different alignments with scores equal to or better than the given alignment's score that are expected to occur in a database search by chance
- The lower the E value, the better the match
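The relationship between score and E-value can be sketched with the standard bit-score form of the Karlin-Altschul formula, E = m × n × 2^(-S'). Here m is the query length, n is the total database length, and S' is the bit score. The function name is hypothetical, and the sketch ignores the effective-length corrections that real BLAST applies, so its numbers will not match BLAST output exactly.

```python
def expect_value(bit_score: float, query_length: int, database_length: int) -> float:
    """Approximate E-value from a bit score: E = m * n * 2 ** (-S').

    Uses raw lengths; BLAST's effective-length adjustments are not modeled.
    """
    return query_length * database_length * 2.0 ** (-bit_score)


if __name__ == "__main__":
    # Same search space, increasing bit score: the expected number of
    # chance hits falls by half for every additional bit.
    for score in (20, 40, 60):
        print(score, expect_value(score, query_length=100, database_length=1_000_000))
```

The formula also shows why the E-value depends on database size: the same score is less significant when searching a larger database.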
What is genome annotation?
- Obtaining biological information from unprocessed sequence data
- The ultimate goal is to create a labeled genome, where biological information is linked to sequence
- There are two types: structural and functional annotation
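As a very simplified illustration of structural annotation, the sketch below labels open reading frames (ORFs) on the forward strand of a DNA string by scanning each reading frame for an ATG start codon followed by an in-frame stop codon. The function name and the `min_codons` threshold are assumptions made for this example. Real gene finders also handle the reverse strand, introns, and statistical models of coding sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 3):
    """Return (start, end) pairs, 0-based and end-exclusive, for forward-strand ORFs.

    An ORF here runs from the first ATG in a frame to the next in-frame stop
    codon, inclusive, and must span at least min_codons codons.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                end = i + 3  # include the stop codon
                if (end - start) // 3 >= min_codons:
                    orfs.append((start, end))
                start = None  # look for the next ORF in this frame
    return orfs


if __name__ == "__main__":
    dna = "CCATGAAATAGCC"
    for start, end in find_orfs(dna):
        print(start, end, dna[start:end])
```

Attaching coordinates like these to the sequence is the "labeling" step. Functional annotation would then assign a biological role to each labeled feature.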