Bioinformatics lecture Flashcards
The extent to which two sequences are the same
Identity
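As a rough illustration of identity, the sketch below computes percent identity for two sequences that are already aligned. The function name `percent_identity`, the "-" gap character, and the choice to skip gapped columns are assumptions made for this example; other conventions divide by the full alignment length.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of compared columns in which both sequences carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Assumption: columns where either sequence has a gap are not counted.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    # Three of the four columns match.
    print(percent_identity("ACGT", "ACGA"))
```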
Lining up two or more sequences to search for the maximal regions of identity in order to assess the extent of biological relatedness or homology
Alignment
The relatedness of sequences
Similarity
A fixed set of commands in a computer program
Algorithm
A space introduced in an alignment to compensate for insertions or deletions in one of the sequences being compared
Gap
Similarity attributed to descent from a common ancestor
Homology
The sequence presented for comparison with all other sequences in a selected database.
Query
The genetic sequence database sponsored by the National Institutes of Health.
GenBank
Describes the number of matches to the query expected by chance when searching a database of a particular size.
E-value (Expect value)
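To make the dependence on database size concrete, here is a minimal sketch of the standard relation between a normalized bit score S' and the expect value, E = m * n * 2^(-S'), where m is the query length and n is the database length. The function and parameter names are assumptions for illustration, and real search tools use corrected "effective" lengths that this sketch ignores.

```python
def expect_value(bit_score: float, query_len: int, db_len: int) -> float:
    """Expected number of chance hits with at least this bit score.

    Uses E = m * n * 2**(-S'), with raw lengths rather than the
    effective lengths a real search tool would compute.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


if __name__ == "__main__":
    # Doubling the database size doubles the number of chance matches.
    print(expect_value(40.0, 300, 1_000_000))
    print(expect_value(40.0, 300, 2_000_000))
```

A lower E-value means the match is less likely to be a chance hit.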
The study of evolutionary relatedness among species by comparing homologies and differences in gene sequences
Phylogenetics
- A field which uses computers to store and analyze molecular biological information.
- Application of the tools of computation and analysis to the capture and interpretation of biological data.
BIOINFORMATICS
- Allow the storage and management of large biological data sets
- Data is being generated at a much greater pace than its
analysis (Human Genome Project)
CREATION OF DATABASES
Determine relationships among members of large data sets
DEVELOPMENT OF ALGORITHMS AND STATISTICS
- Transcriptomics
- Microbiomics
- Metabolomics
- Genomics
- Proteomics
BRANCHES OF BIOINFORMATICS
- Retrieving DNA sequences from databases
- Computing nucleotide compositions
- Identifying restriction sites
- Designing polymerase chain reaction (PCR) primers
- Identifying open reading frames (ORFs)
- Predicting elements of DNA/RNA secondary structure
- Finding repeats
- Computing the optimal alignment between two or more DNA sequences
- Finding polymorphic sites in genes (single nucleotide polymorphisms, SNPs)
- Assembling sequence fragments
- Creation and visualization of 3D structure models for
biological molecules of significance.
BIOINFORMATICS APPLICATIONS
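Several of the applications listed above (nucleotide composition, restriction sites, open reading frames) can be sketched in a few lines. This is a simplified illustration, not a production tool: the function names are hypothetical, the default site GAATTC is the EcoRI recognition sequence, and the ORF search only scans the forward strand from ATG to the first in-frame stop codon.

```python
from collections import Counter

STOP_CODONS = {"TAA", "TAG", "TGA"}


def nucleotide_composition(seq: str) -> dict:
    """Count how often each base occurs."""
    return dict(Counter(seq.upper()))


def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def find_sites(seq: str, site: str = "GAATTC") -> list:
    """0-based start positions of a recognition site (overlaps allowed)."""
    seq, site = seq.upper(), site.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


def find_orfs(seq: str, min_codons: int = 1) -> list:
    """(start, end) pairs, end exclusive and including the stop codon."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                # Walk codon by codon until the first in-frame stop.
                for j in range(i + 3, len(seq) - 2, 3):
                    if seq[j:j + 3] in STOP_CODONS:
                        if (j - i) // 3 >= min_codons:
                            orfs.append((i, j + 3))
                        i = j  # resume scanning after this ORF
                        break
            i += 3
    return orfs


if __name__ == "__main__":
    dna = "CCATGAAATAGGAATTC"
    print(nucleotide_composition(dna), gc_content(dna))
    print(find_sites(dna), find_orfs(dna))
```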
- Microbial genome applications
- Molecular medicine
- Personalized medicine
- Gene therapy
- Drug development
- Antibiotic resistance
- Evolutionary studies
- Waste cleanup
- Biotechnology
- Climate change studies
- Alternative energy sources
- Crop improvement
- Forensic analysis
- Bio-weapon creation
- Insect resistance
- Improve nutritional quality
- Veterinary science
BIOINFORMATICS APPLICATIONS IN VARIOUS FIELDS
THREE EARLIEST DNA SEQUENCE AND PROTEIN DATABASES
- DDBJ (DNA DataBank of Japan)
- EMBL (European Molecular Biology Laboratory)
- GenBank (USA)
- Contain original data in the form of primary sequence data or structural data as submitted by the scientific community.
- Examples: GenBank, EMBL, DDBJ, SWISS-PROT and PIR
PRIMARY DATABASES
Contain information that has been processed and derived from the raw data available in primary databases
SECONDARY DATABASES
- A way of rearranging sequences of DNA, RNA or protein to identify regions of similarity.
SEQUENCE ALIGNMENT
To understand functional, structural, or
evolutionary relationships between the sequences
identify regions of similarity
TYPES OF SEQUENCE ALIGNMENT
- Pairwise - compare two sequences
- Multiple - compare more than two sequences
compare more than two sequences
o MUSCLE
o MAFFT
o CLUSTAL Omega
Multiple
compare two sequences
o EMBOSS WATER
o BLAST
Pairwise
Matching the residues (bases or amino
acids) of two sequences across their entire length
o matches the identical sequences
o The two sequences are treated as potentially
equivalent
Global alignment
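A global alignment score can be computed with dynamic programming in the style of the Needleman-Wunsch algorithm. The sketch below is a minimal version that returns only the optimal score; the scoring values (match +1, mismatch -1, gap -1) and the function name are assumptions chosen for illustration, and real tools typically use substitution matrices and affine gap penalties.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Best score for aligning a and b across their entire length."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against nothing costs one gap per residue.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap in b
            left = score[i][j - 1] + gap    # gap in a
            score[i][j] = max(diag, up, left)
    return score[-1][-1]


if __name__ == "__main__":
    print(global_alignment_score("ACGT", "ACT"))
```

Because every residue of both sequences must be placed, unmatched ends are penalized, which is what distinguishes this from a local alignment.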
Matching the regions of two sequences that have the most similarity with each other
o The two sequences may or may not be related
o Purpose
▪ To see whether a substring (a part) in one
sequence aligns well with a substring (a
part) in the other sequence
o Applications:
▪ Searching for local similarities in large
sequences (e.g., newly sequenced
genomes)
▪ Looking for conserved domains or motifs
in two proteins
Local alignment
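The substring-matching idea behind local alignment can be sketched with dynamic programming in the style of the Smith-Waterman algorithm. The key difference from a global alignment is that cell scores are never allowed to drop below zero, so an alignment can start and end anywhere. The scoring values (match +2, mismatch -1, gap -1) and the function name are assumptions for this example.

```python
def local_alignment_score(a: str, b: str, match: int = 2,
                          mismatch: int = -1, gap: int = -1) -> int:
    """Best score over all pairs of substrings of a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The 0 lets a new local alignment start at any cell.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    # Only the shared "ACGT" region contributes to the score.
    print(local_alignment_score("CCCCACGTCCCC", "TTACGTTT"))
```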
multiple sequence alignment tool
that arranges the sequences of DNA, RNA or protein to identify regions of similarity
MUSCLE
- finds regions of local similarity between sequences.
- compares primary biological sequence information, such as the amino acid sequences of proteins or the nucleotides of DNA sequences.
- compares a query sequence with a library or database of sequences, and identifies library sequences that resemble the query sequence above a certain threshold.
- can be used to infer functional and evolutionary relationships between sequences as well as help identify members of gene families.
- BLAST
BASIC LOCAL ALIGNMENT SEARCH TOOL (BLAST)
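The idea of comparing a query against a database of sequences can be illustrated with a toy word-lookup search. This is not BLAST's actual implementation: it only shows the general notion of indexing short words from the database and counting exact word hits for a query. The function names, the word size k=3, and the tiny database are all hypothetical.

```python
from collections import defaultdict


def build_word_index(database: dict, k: int = 3) -> dict:
    """Map every length-k word to the (sequence name, position) pairs where it occurs."""
    index = defaultdict(list)
    for name, seq in database.items():
        for pos in range(len(seq) - k + 1):
            index[seq[pos:pos + k]].append((name, pos))
    return index


def seed_hits(query: str, index: dict, k: int = 3) -> dict:
    """Count exact word matches between the query and each database sequence."""
    hits = defaultdict(int)
    for pos in range(len(query) - k + 1):
        for name, _ in index.get(query[pos:pos + k], []):
            hits[name] += 1
    return dict(hits)


if __name__ == "__main__":
    db = {"s1": "ACGTACGT", "s2": "TTTTTTTT"}  # hypothetical database
    print(seed_hits("CGTA", build_word_index(db)))
```

Sequences with many word hits are candidates for a more careful local alignment, and sequences with none can be skipped.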
- In BLAST
o you supply one or more query sequences and it compares nucleotide or protein sequences to sequence databases
- In a multiple alignment
o you supply multiple sequences to be aligned to identify regions of similarity that may be a consequence of functional, structural, or evolutionary relationships between the sequences
HOW BLAST DIFFERS FROM MULTIPLE SEQUENCE ALIGNMENT
you supply one or more query sequences and it compares nucleotide or protein sequences to
sequence databases
BLAST
you supply multiple sequences to be aligned to identify regions of similarity that may be a consequence of functional, structural, or evolutionary relationships between the sequences
multiple alignment