Intro: Flow of Genetic Information Flashcards
Intro to Bioinformatics
How does genetic information flow?
DNA to DNA - replication
DNA to mRNA - transcription
mRNA to protein - translation
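The three steps of this flow can be imitated on strings. Below is a minimal Python sketch. It assumes the DNA given is the 5' to 3' coding strand, and it ignores introns, promoters and the search for a start codon. The function names `replicate`, `transcribe` and `translate` are illustrative only.

```python
# Central dogma on strings: a simplified sketch, assuming the input is the
# 5'->3' coding strand and ignoring introns, promoters and start-codon search.

DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Standard genetic code: codons enumerated in U, C, A, G order for the
# first, second and third positions; "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [x + y + z for x in BASES for y in BASES for z in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def replicate(dna: str) -> str:
    """DNA -> DNA: build the complementary strand, reported 5'->3'."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(dna))


def transcribe(coding_strand: str) -> str:
    """DNA -> mRNA: same sequence as the coding strand, with U in place of T."""
    return coding_strand.replace("T", "U")


def translate(mrna: str) -> str:
    """mRNA -> protein: read codons from position 0 until the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


dna = "ATGGCCTAA"
print(replicate(dna))              # TTAGGCCAT
print(transcribe(dna))             # AUGGCCUAA
print(translate(transcribe(dna)))  # MA
```

Real replication, transcription and translation involve many more factors. The sketch only captures the base-pairing and codon-reading logic.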
merging of biology, computer science, and information technology
bioinformatics
use of computers to gather, store, analyze, and integrate biological and genetic information
bioinformatics
an interdisciplinary field mainly involving molecular biology and genetics, computer science, mathematics, and statistics for analysis, exploration, integration, and exploitation of biological data
bioinformatics
examples of biological data
DNA sequences
Amino acid sequences
Protein structure
Omics data
Goals of Bioinformatics
enable discovery of new biological insights
create a global perspective
the first protein sequence database was the ________________
Atlas of Protein Sequence and Structure
when was the first protein sequence database recorded?
1970
DNA sequences accumulated in the literature and protein sequences became common; the focus shifted from protein to DNA
mid-1970s to 1980
What is the first nucleotide database?
GenBank
when was GenBank established
1980
Protein Information Resource (PIR) was established
1984
parallel advances in biology and computer science; bioinformatics goes online
1990s
genomics era
2000s
large biological data (omics)
present
first sequence of a protein
insulin
when was the insulin sequence published?
1950
the issue was not sequencing a protein itself but rather assembling the whole protein sequence from hundreds of small Edman peptide sequences
Edman degradation method (1987)
pioneered the application of computational methods to the field of biochemistry
Margaret Dayhoff
developed COMPROTEIN
Margaret Dayhoff
complete computer program for the IBM 7090 designed to determine protein primary structure using Edman peptide sequencing data
COMPROTEIN
this software, entirely coded in FORTRAN on punch cards, is the first occurrence of what we would today call a de novo sequence assembler
COMPROTEIN
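To see why assembling overlapping Edman peptides is a computational problem, here is a toy greedy overlap assembler in Python. It is not COMPROTEIN's actual algorithm, which these cards do not describe; it only illustrates the general de novo assembly idea of repeatedly merging the pair of fragments with the longest overlap. The fragments and the `min_len` threshold are made-up assumptions.

```python
def overlap(a: str, b: str, min_len: int = 2) -> int:
    """Length of the longest suffix of a equal to a prefix of b (at least min_len), else 0."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 2) -> list[str]:
    """Greedily merge overlapping fragments; returns the remaining contig(s)."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    # Drop fragments already contained in a longer fragment.
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]
    while len(frags) > 1:
        best_k, best_i, best_j = 0, 0, 0
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no overlaps left: the contigs stay separate
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for n, f in enumerate(frags) if n not in (best_i, best_j)] + [merged]
    return frags


# Made-up one-letter peptide fragments.
print(greedy_assemble(["MKTAY", "TAYIAK", "IAKQR"]))  # ['MKTAYIAKQR']
```

Greedy merging is simple but can be misled by repeats, which is one reason assembly remained a hard problem long after COMPROTEIN.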
COMPROTEIN was coded in ________
FORTRAN
first bioinformatics software
COMPROTEIN
investigated biomolecular sequences as carriers of information
Emile Zuckerkandl and Linus Pauling
introduced paleogenetics
Emile Zuckerkandl and Linus Pauling
developed first dynamic programming algorithm for pairwise protein sequence alignments
Needleman and Wunsch
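A minimal Python sketch of the Needleman and Wunsch idea follows. It fills a table in which each cell holds the best score for aligning two prefixes, then traces back from the bottom-right cell to recover one optimal global alignment. The scoring scheme (match +1, mismatch -1, linear gap penalty -1) is an assumption for illustration. Protein aligners normally score pairs with a substitution matrix instead.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -1) -> tuple[int, str, str]:
    """Global alignment by dynamic programming; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # F[i][j] = best score aligning a[:i] with b[:j]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap

    def pair_score(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + pair_score(i, j),  # align a[i-1] with b[j-1]
                          F[i - 1][j] + gap,                    # gap in b
                          F[i][j - 1] + gap)                    # gap in a

    # Traceback from the bottom-right cell to rebuild one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + pair_score(i, j):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return F[n][m], "".join(reversed(top)), "".join(reversed(bottom))


print(needleman_wunsch("ACGT", "AGT"))  # (2, 'ACGT', 'A-GT')
```

The table has (n+1) x (m+1) cells, so time and memory grow with the product of the two sequence lengths.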
development of the first probabilistic model of amino acid substitution
1978
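Substitution models of this kind are usually turned into alignment scores as log-odds ratios. The general form below is a sketch, not the exact 1978 derivation. It assumes that $q_{ab}$ is the observed frequency with which residues $a$ and $b$ are aligned in related proteins, that $p_a$ and $p_b$ are the background frequencies of each residue, and that $\lambda$ is a scaling constant.

$$
s(a,b) = \frac{1}{\lambda}\,\log\frac{q_{ab}}{p_a\,p_b}
$$

A positive score means the pair is observed more often than chance would predict; a negative score means it is observed less often.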
laid the groundwork for gene cloning
Berg
developed polymerase chain reaction
Mullis
promoted the free software philosophy; launched GNU, a free Unix-like operating system
Richard Stallman
advancement of new programming languages
Perl and Python
initiated the World Wide Web
Tim Berners-Lee
Framework of Bioinformatics
Collect statistics from biological data
build a computational model
solve the computational model with a program
test and evaluate a computational algorithm
3 main components
Data
Database
Data mining tools
obtained from gene and genomic sequencing
nucleic acid sequence
sequence is represented using the DNA alphabet (A, C, G, T)
nucleic acid sequence
can be obtained from protein sequencing and/or predicted from DNA sequence
Amino acid sequence
sequence is represented using one-letter amino acid codes
amino acid sequence
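The two alphabets can be checked programmatically. This Python sketch assumes only the four unambiguous DNA bases and the 20 standard one-letter amino acid codes. IUPAC ambiguity codes such as N or X are deliberately left out, and `classify_sequence` is a hypothetical helper name.

```python
# One-letter alphabets (assumption: only the 4 unambiguous DNA bases and the
# 20 standard amino acids; IUPAC ambiguity codes such as N or X are ignored).
DNA_ALPHABET = set("ACGT")
AMINO_ACID_ALPHABET = set("ACDEFGHIKLMNPQRSTVWY")


def classify_sequence(seq: str) -> str:
    """Guess whether a string is a nucleic acid or an amino acid sequence."""
    letters = set(seq.upper())
    if not letters:
        return "unknown"
    if letters <= DNA_ALPHABET:
        return "nucleic acid"
    if letters <= AMINO_ACID_ALPHABET:
        return "amino acid"
    return "unknown"


print(classify_sequence("ATGGCCTAA"))   # nucleic acid
print(classify_sequence("MKTAYIAKQR"))  # amino acid
```

A, C, G and T are also valid amino acid codes, so a short string such as "ACGT" is reported as nucleic acid even though it could be a tetrapeptide. In practice the file format or record metadata states the molecule type instead of leaving it to a guess.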
obtained from transcriptomic studies
Gene expression data
large, organized body of persistent data
biological database
usually associated with computerized software
biological database
designed to update, query, and retrieve components of the data stored within the system
biological database
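The store, update, query and retrieve cycle can be sketched with Python's built-in `sqlite3` module and an in-memory database. The table layout (its schema), the accession numbers and the organism labels are hypothetical placeholders, not a real resource.

```python
import sqlite3

# Throwaway in-memory database; the schema below is a hypothetical example.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sequence (
        accession TEXT PRIMARY KEY,
        organism  TEXT,
        molecule  TEXT CHECK (molecule IN ('DNA', 'protein')),
        residues  TEXT NOT NULL
    )
""")

# Store: data in computer-readable form.
conn.executemany(
    "INSERT INTO sequence VALUES (?, ?, ?, ?)",
    [
        ("SEQ0001", "Example organism A", "DNA", "ATGGCCTAA"),
        ("SEQ0002", "Example organism B", "protein", "MKTAYIAKQR"),
    ],
)

# Update: correct an existing entry.
conn.execute("UPDATE sequence SET residues = ? WHERE accession = ?",
             ("ATGGCCTGA", "SEQ0001"))
conn.commit()

# Query and retrieve: all DNA entries with their lengths.
rows = conn.execute(
    "SELECT accession, length(residues) FROM sequence "
    "WHERE molecule = ? ORDER BY accession",
    ("DNA",),
).fetchall()
print(rows)  # [('SEQ0001', 9)]
```

Real biological databases add curation, versioning and cross-references, but the underlying operations are the same.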
3 main purposes of a biological database
stores biological data in computer-readable form
stored data is accessed efficiently
data is available to the research community in a single place
contains raw sequence information only
primary database
contains derived information from the analysis of primary data
secondary database
amalgamates a variety of different database sources, which obviates the need to search multiple sources
composite database
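A composite database spares users from querying each source separately. The Python sketch below imitates that with in-memory dictionaries. The source names, the accession "SEQ0002" and the annotation values are hypothetical placeholders, not real records, and `composite_lookup` is an illustrative helper.

```python
# Hypothetical sources keyed by the same (made-up) accession.
SOURCES = {
    "primary": {"SEQ0002": {"residues": "MKTAYIAKQR"}},                     # raw sequence only
    "secondary": {"SEQ0002": {"derived_family": "hypothetical family X"}},  # derived annotation
    "structure": {},                                                        # no entry for this ID
}


def composite_lookup(accession: str, sources: dict) -> dict:
    """Query every source once and return the hits merged under their source names."""
    merged = {}
    for source_name, records in sources.items():
        record = records.get(accession)
        if record is not None:
            merged[source_name] = record
    return merged


print(composite_lookup("SEQ0002", SOURCES))
```

Real composite databases also reconcile identifiers and remove redundant entries across sources. This sketch assumes every source already uses the same accession.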
search engine and analysis tool for large biological data
data mining tools
process of biological data utilization
database mining
Applications of Bioinformatics
Sequence analysis
Phylogenetic analysis
prediction of protein secondary structure
Protein 3D structure prediction
Next-generation sequencing data analysis
deals with biological data, their collection, curation, distribution and analysis
bioinformatics
unit of distribution of a collection of some type of biological information
database
archive of information
database
logical organization or structure of that information, called a schema
database
these contain information collected from archival databases and inferred from analyses of their contents
derived databases
characteristic signature patterns of families of proteins
sequence motifs
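Signature patterns like these are often written in PROSITE-style syntax and searched with regular expressions. This Python sketch handles only a simplified subset of that syntax. It covers residues, [ABC] (any of), {ABC} (none of), x (any residue) and repeat counts, but not terminal anchors. The example pattern N-{P}-[ST]-{P} is the PROSITE N-glycosylation site pattern (PS00001); the protein sequence is made up.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Convert a simplified PROSITE-style pattern into a Python regular expression.

    Handles single residues, [ABC], {ABC}, x, and repeats like x(2) or x(2,4).
    Terminal anchors < and > are not handled in this sketch.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        core, low, high = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element).groups()
        if core == "x":
            regex = "."                          # any residue
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"      # any residue except these
        else:
            regex = core                         # a letter or [ABC] class is already valid regex
        if low:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(regex)
    return "".join(parts)


def find_motif(sequence: str, pattern: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched text) for every hit, including overlapping ones."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence)]


print(prosite_to_regex("N-{P}-[ST]-{P}"))           # N[^P][ST][^P]
print(find_motif("MKNGSAPNPTQ", "N-{P}-[ST]-{P}"))  # [(2, 'NGSA')]
```

Positions here are 0-based, whereas PROSITE reports 1-based positions. The lookahead lets overlapping hits be found, which a plain `finditer` would miss.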
connections between, and common features of, entries in archives
classifications or relationships
scientific literature itself is data
bibliographic database