2: Introduction to Bioinformatics (FINALS) Flashcards
→ A field which uses computers to store and analyze molecular biological information
→ It is about finding and interpreting biological data online
→ Marriage between biology and informatics
→ Science of collecting and analyzing complex biological data
Bioinformatics
→ A field where biology, mathematics, statistics, computer science, information technology, and other health sciences are merged into a single discipline to process biological data
→ Uses complex machines to read biological data at a much faster rate than before
Bioinformatics
What are the 3 principal components of bioinformatics?
- Creation of Databases
- Development of Algorithms and Statistics
- The use of these tools for Analysis and Interpretation of various types of biological data
3 Principal Components of Bioinformatics:
→ Are like repositories or banks of biological information, designed to collect, archive, visualize, and arrange biological data
→ Allow the storage and management of large biological data sets
→ Enable scientists to have an intelligent data description, interpretation, or retrieval of data
Databases
3 Principal Components of Bioinformatics:
T or F
Data is being generated at a much greater pace than its analysis
T
Example of bioinformatics:
→ Started in the 1990s
→ Objective is to sequence the entire human genome
→ Consist of about 3.2 billion base pairs
→ Finished in 2003
Human Genome Project
3 Principal Components of Bioinformatics:
Determine relationships among members of large data sets
Development of algorithms and statistics
3 Principal Components of Bioinformatics: Development of algorithms and statistics
Large sets of data are organized so that relationships can be determined
Algorithm
3 Principal Components of Bioinformatics:
A concept under this is biological data
The use of these tools for analysis and interpretation of various types of biological data
3 Principal Components of Bioinformatics: The use of these tools for analysis and interpretation of various types of biological data
→ Including DNA, RNA, and protein sequences, protein structures, gene expression profiles, and biochemical pathways
Biological data
Sciences that attempt to describe a living organism in terms of “omics”
Branches of Bioinformatics
Branches of Bioinformatics:
Involves the description of the sequences of entire genomes
Genomics
Branches of Bioinformatics:
Study of all RNA molecules in a living organism
Transcriptomics
Branches of Bioinformatics:
→ Description of the entire complement of proteins in a living organism
→ Entire proteins found in a living organism
→ Study of the Sequence, 3D Structures, and other Properties of all Proteins
Proteomics
Branches of Bioinformatics:
→ Pertains to microbes like viruses, fungi, parasites, bacteria
→ Genomes of microorganisms are described within a specific environmental niche
Microbiomics
Branches of Bioinformatics:
→ Involves the description of chemical processes involving metabolites
Metabolomics
Familiarize yourself with the DNA/RNA Bioinformatics Applications
- Retrieving DNA sequences from databases
- Computing nucleotide compositions
- Identifying restriction sites
- Designing polymerase chain reaction (PCR) primers
- Identifying open reading frames (ORF)
- Finding repeats
- Computing the optimal alignment between 2 or more DNA sequences
- Finding polymorphic sites in genes (SNPs)
- Assembling sequence fragments
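Two of the applications listed above (computing nucleotide compositions and identifying restriction sites) are simple enough to sketch in plain Python. This is only an illustration, not a course-provided tool: the function names and the example sequence are made up here, and GAATTC is used because it is the EcoRI recognition sequence.

```python
from collections import Counter


def nucleotide_composition(seq: str) -> dict:
    """Count each of the four DNA bases (case-insensitive)."""
    counts = Counter(seq.upper())
    return {base: counts.get(base, 0) for base in "ACGT"}


def gc_content(seq: str) -> float:
    """Fraction of A/C/G/T bases that are G or C (0.0 if there are none)."""
    comp = nucleotide_composition(seq)
    total = sum(comp.values())
    return (comp["G"] + comp["C"]) / total if total else 0.0


def find_sites(seq: str, site: str) -> list:
    """0-based start positions of every occurrence of a recognition site."""
    seq, site = seq.upper(), site.upper()
    return [i for i in range(len(seq) - len(site) + 1) if seq.startswith(site, i)]


example = "GAATTCGCGAATTC"  # hypothetical sequence for illustration
print(nucleotide_composition(example))
print(find_sites(example, "GAATTC"))
```

Only the forward strand is searched here; a real restriction-site search would normally also consider the reverse complement for non-palindromic sites.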
Familiarize yourself with the other applications of bioinformatics given
- Sequence alignment and analysis
- Mapping and analyzing DNA, RNA, Protein, Amino acid, and Lipid sequences
- Creation and Visualization of 3D structure models for biological molecules of significance
- Genome annotation
- Genetic diseases
- Designer medicine
Familiarize Applications in Various Fields
(Just look at the PowerPoint slides for this one)
Why do we use Bioinformatics?
Saves time when doing real experiments
Importance of Bioinformatics:
T or F
A study should end with a simulated experiment on a computer instead of in a real environment
F (A study might START with a simulated experiment on a computer instead of in a real environment)
Importance of Bioinformatics: Identify the process
Simulated experiment on computer = ?
Primer optimized and used in amplification reaction =
Simulated experiment on computer = In Silico
Primer optimized and used in amplification reaction = Wet Lab
Importance of Bioinformatics: Identify whether “In Silico” or “Wet Lab”
Target Identification
In Silico
Importance of Bioinformatics: Identify whether “In Silico” or “Wet Lab”
Primer Characterization
Wet lab
Importance of Bioinformatics: Identify whether “In Silico” or “Wet Lab”
Assay Optimization
Wet lab
DNA/RNA Bioinformatics Applications:
→ A sequence that runs from a start codon (AUG) to a stop codon (UAG, UGA, or UAA)
→ predicting elements of DNA/RNA secondary structure
open reading frames (ORF)
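The ORF definition on this card (start at AUG, read codon by codon until UAG, UGA, or UAA) can be sketched as a small scan over the three forward reading frames. This is a simplified illustration: the function name is made up here, only the forward strand is scanned, and nested start codons inside an open ORF are ignored.

```python
STOP_CODONS = {"UAG", "UGA", "UAA"}


def find_orfs(rna: str) -> list:
    """Return every AUG...stop stretch found in the three forward frames."""
    rna = rna.upper().replace("T", "U")  # accept DNA-style input as well
    orfs = []
    for frame in range(3):
        start = None  # index of the AUG that opened the current ORF
        for i in range(frame, len(rna) - 2, 3):
            codon = rna[i:i + 3]
            if start is None and codon == "AUG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                orfs.append(rna[start:i + 3])  # include the stop codon
                start = None
    return orfs


print(find_orfs("CCAUGGCUUAAGG"))
```

An ORF that never reaches a stop codon before the sequence ends is not reported by this sketch.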
Three earliest DNA Sequences and Protein Databases?
- Nucleic acids
- Protein
- Other databases?
Three earliest DNA Sequences and Protein Databases:
What is the database for Nucleic acids
International Nucleotide Sequence Database
Three earliest DNA Sequences and Protein Databases:
Composition of International Nucleotide Sequence Database
- DDBJ (DNA DataBank of Japan)
- EMBL (European Molecular Biology Laboratory)
- GenBank (USA)
Three earliest DNA Sequences and Protein Databases:
What is the database for Protein?
Worldwide Protein Data Bank
Three earliest DNA Sequences and Protein Databases:
Familiarize the other databases
- Ensembl
- Human metabolome Database
- Gene Expression Databases
- Phenotypic Database
- RNA Databases
- Amino acid/protein Databases
- Protein-Protein and other Molecular Interactions
- Signal Transduction Pathway Databases
- Bacterial DNA Databases
T or F
In Gene Analysis Application, changes in the sequence of the gene being expressed always result in a normal and healthy person
F (A DISEASE MAY ARISE due to changes in the sequence of the gene being expressed)
Gene Analysis Application
T or F
Sickle cell anemia results from a point mutation that changes tyrosine to valine in the beta-globin chain
F (substitution of GLUTAMIC ACID to VALINE)
This refers to genetic characteristics
Genotype
This refers to Physical Characteristics
Phenotype
Gene Analysis Application:
→ leads to sickle cell anemia
→ A recessive trait
Single Nucleotide mutation
Gene Analysis Application: Single Nucleotide mutation
Normal Sequence: G-A-G (Glutamic Acid)
Mutated: G-U-G (Valine)
Which base was mutated?
A (the middle base A was replaced by U, changing glutamic acid to valine)
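The comparison on this card can be worked through in code: compare the two codons position by position, then look up the amino acid each one encodes. The sketch below is illustrative only; the function name is made up, and the lookup table deliberately contains just the two codons from the card (both entries agree with the standard genetic code).

```python
# Partial codon table: only the two codons used on the card.
CODON_TABLE = {"GAG": "Glutamic acid", "GUG": "Valine"}


def describe_point_mutation(normal: str, mutated: str) -> dict:
    """Report which single base changed and the amino acid before/after."""
    if len(normal) != 3 or len(mutated) != 3:
        raise ValueError("both codons must be exactly 3 bases long")
    diffs = [(i + 1, a, b) for i, (a, b) in enumerate(zip(normal, mutated)) if a != b]
    if len(diffs) != 1:
        raise ValueError("expected exactly one changed base")
    position, old_base, new_base = diffs[0]
    return {
        "position": position,  # 1-based position within the codon
        "base_change": f"{old_base}->{new_base}",
        "amino_acid_change": f"{CODON_TABLE[normal]}->{CODON_TABLE[mutated]}",
    }


print(describe_point_mutation("GAG", "GUG"))
```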
Gene Analysis Application: Single Nucleotide mutation
If the Father and Mother are Heterozygous for sickle cell gene, how many are:
a. children who will manifest the disease
b. normal children
c. children who are carriers
a. ¼ of children develop sickle cell disease
b. ¼ are normal
c. ½ are carriers
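The ¼ : ½ : ¼ split comes from a Punnett square: each parent passes on one of two alleles with equal chance. A minimal sketch, assuming the made-up labels "A" for the normal allele and "S" for the sickle allele:

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def cross(parent1: str, parent2: str) -> dict:
    """Expected genotype fractions among offspring of two parents.

    Each parent is a two-letter genotype such as "AS"; every pairing of one
    allele from each parent is treated as equally likely.
    """
    offspring = Counter("".join(sorted(a + b)) for a, b in product(parent1, parent2))
    total = sum(offspring.values())
    return {genotype: Fraction(n, total) for genotype, n in offspring.items()}


ratios = cross("AS", "AS")  # both parents heterozygous
print(ratios["SS"])  # children who manifest the disease
print(ratios["AA"])  # normal children
print(ratios["AS"])  # carriers
```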
What are the 2 Bioinformatic Activities?
- Finding DNA/Protein Sequence
- Sequence Alignment
To find gene or protein sequences online, what websites should be used?
- GenBank
- Protein Data Bank
To find gene sequences online, what website should be used?
GenBank
To find protein sequences online, what website should be used?
Protein Data Bank
→ A way of rearranging sequences of DNA, RNA, or protein to identify regions of similarity
Sequence Alignment
Sequence Alignment:
What are the 2 sequences involved when a sequence alignment is made?
Reference and Unknown Sequence
Sequence Alignment:
Reference sequence is also known as what?
Known sequence or Subject sequence
Sequence Alignment:
Unknown sequence is also known as?
Query sequence
Sequence Alignment:
Familiarize Importance of identifying regions of similarity
- To understand functional, structural or evolutionary relationships between the sequences
- To help identify dissimilar regions of the DNA sequence, which are useful for designing primers
Sequence Alignment: Familiarize Importance of identifying regions of similarity
If sequences are similar, what does it mean?
They have similar functions or structures
Sequence Alignment: Familiarize Importance of identifying regions of similarity
Can either mean belonging to the same group or distant relationship
Evolutionary relationship
Sequence Alignment: Familiarize Importance of identifying regions of similarity
T or F
Identifying similar regions of DNA sequence is useful for designing primers
F (Identifying DISSIMILAR REGIONS of DNA sequence is useful for designing primers)
2 Types of Sequence Alignment
- Pairwise
- Multiple
Types of Sequence Alignment:
Compare two sequences
Pairwise
Types of Sequence Alignment:
Compare more than two sequences
Multiple
What are the websites used in pairwise sequence alignment?
- EMBOSS WATER
- BLAST
What are the websites used in multiple sequence alignment?
- MUSCLE
- MAFFT
- CLUSTAL Omega
What are the Types of Pairwise Sequence Alignments?
- Global alignment
- Local alignment
Types of Pairwise Sequence Alignments
→ Matching the residues (bases or amino acids) of two sequences across their entire length
→ The whole of DNA is aligned
Global Alignment
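Global alignment is classically computed with dynamic programming (the Needleman-Wunsch approach), which scores the two sequences end to end. The sketch below returns only the best score, not the alignment itself; the function name and the scoring values (+1 match, -1 mismatch, -1 per gap position) are assumptions chosen for illustration, not values from the course.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Best end-to-end alignment score of a and b (Needleman-Wunsch style)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against nothing costs one gap per residue.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diagonal,               # align a[i-1] with b[j-1]
                              score[i - 1][j] + gap,  # gap in b
                              score[i][j - 1] + gap)  # gap in a
    return score[-1][-1]


print(global_alignment_score("ACGT", "ACT"))
```

Because every residue of both sequences must be accounted for, leftover ends are penalized as gaps; that is the practical difference from local alignment.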
Types of Pairwise Sequence Alignments
→ Matching of two sequences in the regions where they are most similar to each other
→ The two sequences may or may not be related
→ Used to see whether a substring (a part) of one sequence aligns well with a substring (a part) of the other sequence
Local alignment
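Local alignment is classically computed with the Smith-Waterman variant of dynamic programming: scores are never allowed to drop below zero, so the best-matching substrings stand out regardless of what surrounds them. A minimal score-only sketch follows; the function name and the scoring values are assumptions for illustration.

```python
def local_alignment_score(a: str, b: str, match: int = 1,
                          mismatch: int = -1, gap: int = -1) -> int:
    """Best score of any substring-to-substring alignment (Smith-Waterman style)."""
    best = 0
    previous = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The 0 lets an alignment restart anywhere instead of carrying a penalty.
            current[j] = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            best = max(best, current[j])
        previous = current
    return best


# The shared "ACG" is found even though the flanking regions differ completely.
print(local_alignment_score("TTTACGTTT", "GGGACGGGG"))
```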
Sequence Alignments
→ There are multiple sequences being aligned
→ The residues are colored so that differences can easily be seen
Multiple Sequence alignment: Clustal Omega
What type of Pairwise Sequence Alignments is appropriate for the given application:
Comparing two genes or proteins with the same function
Global Alignment
What type of Pairwise Sequence Alignments is appropriate for the given application:
Searching for local similarities in large sequences
Local alignment
What type of Pairwise Sequence Alignments is appropriate for the given application:
Looking for conserved domains or motifs in two proteins
Local Alignment
What type of Sequence Alignment is appropriate for the given application:
Determines if all of the sequences are identical by presence of ASTERISK
Multiple Sequence alignment: Clustal Omega
T or F
In Multiple Sequence Alignment with Clustal Omega, the absence of an asterisk means the sequences are similar
F (the absence of an asterisk means the sequences are DISSIMILAR / there is VARIATION)
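The asterisk rule can be sketched as a column-by-column check over sequences that are already aligned: a column gets "*" only when every sequence has the same residue there. This is a simplified illustration with a made-up function name; it reproduces only the asterisk, not the ":" and "." marks Clustal also prints for similar amino acids.

```python
def conservation_line(aligned: list) -> str:
    """Return '*' under fully identical columns and ' ' elsewhere."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("need at least one sequence, all of the same aligned length")
    marks = []
    for column in zip(*aligned):
        identical = len(set(column)) == 1 and column[0] != "-"
        marks.append("*" if identical else " ")
    return "".join(marks)


sequences = ["ACGT", "ACCT", "ACGT"]  # hypothetical aligned sequences
for seq in sequences:
    print(seq)
print(conservation_line(sequences))
```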
Pairwise Sequence Alignment: Emboss Water
Straight line indicates that the sequences are?
Similar
Pairwise Sequence Alignment: Emboss Water
Dot/period indicates that the sequences are?
Dissimilar
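The markup line between the two aligned sequences can be sketched as follows: a straight line where the residues are identical, a dot where they differ. This is an illustration of the convention described on these cards, not EMBOSS code; the function name is made up, and leaving a blank under a gap is an assumption of this sketch.

```python
def match_line(top: str, bottom: str) -> str:
    """Build the middle line of a pairwise alignment display."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    symbols = []
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            symbols.append(" ")  # gap: no residue to compare (assumption)
        elif x == y:
            symbols.append("|")  # identical residues
        else:
            symbols.append(".")  # different residues
    return "".join(symbols)


top, bottom = "ACGT-A", "ACCTGA"  # hypothetical aligned pair
print(top)
print(match_line(top, bottom))
print(bottom)
```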
Pairwise Sequence Alignment: Emboss Water
Meaning of Y in the sequence?
Any pyrimidine
Pairwise Sequence Alignment: Emboss Water
Meaning of R in the sequence?
Any purine
Pairwise Sequence Alignment: Emboss Water
Meaning of N in the sequence?
Any base
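The three ambiguity letters on these cards come from the IUPAC nucleotide code: R stands for a purine (A or G), Y for a pyrimidine (C or T), and N for any of the four bases. A minimal sketch that expands a DNA sequence containing these letters into every concrete sequence it could represent; the function name is made up, and only the three codes from the cards are included.

```python
from itertools import product

# Only the three ambiguity codes covered on the cards, plus the plain bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",    # any purine
    "Y": "CT",    # any pyrimidine
    "N": "ACGT",  # any base
}


def expand(seq: str) -> list:
    """List every unambiguous DNA sequence matching an ambiguous one."""
    choices = [IUPAC[letter] for letter in seq.upper()]
    return ["".join(combo) for combo in product(*choices)]


print(expand("RY"))
```

The number of expansions multiplies with each ambiguous position, so this is only practical for short sequences.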
Pairwise Sequence Alignment: Website
→ Finds regions of local similarity between sequences
→ Works on the amino acid sequences of proteins or the nucleotide sequences of DNA
→ Compares a query sequence with a library or database of sequences
→ Identifies library sequences that resemble the query sequence above a certain threshold
→ Used to identify uncharacterized genes
Basic Local Alignment Search Tool (BLAST)
Query, Database, Remarks of the program BLASTn?
Query: Nucleotide
Database: Nucleotide
Remarks: for high scoring matches
Query, Database, Remarks of the program BLASTp?
Query: Protein
Database: Protein
Remarks: uses substitution matrices
Query, Database, Remarks of the program BLASTx?
Query: Nucleotide (translated)
Database: Protein
Remarks: for novel DNA seqs and EST analysis
Query, Database, Remarks of the program tBLASTn?
Query: Protein
Database: Nucleotide (translated)
Remarks: for STS and EST assignments in databases
What are the 2 possible results of BLAST?
- Graphic Summary
- Sequences producing significant alignments
You supply multiple sequences to be aligned to identify regions of similarity that may be a consequence of functional, structural, or evolutionary relationships
What program is this?
Multiple Sequence Comparison by Log Expectation (MUSCLE)
Multiple Sequence Comparison by Log Expectation (MUSCLE):
T or F
Each sequence should have a definition line preceded by the “>” (greater-than) symbol
T
Multiple Sequence Comparison by Log Expectation (MUSCLE):
T or F
Choose a part of the sequences that is similar for the primer
F (DO NOT choose a part of the sequences that is similar; the regions should be dissimilar)
What type of sequence alignment should be used? What website?
Aligning the envelope genes of the 4 dengue viruses
Multiple Sequence Alignment: MUSCLE
BLAST or Multiple alignment
You supply one or more query sequences
BLAST
BLAST or Multiple alignment
Compares nucleotide or protein sequences to sequence databases
BLAST
BLAST or Multiple alignment
Used to infer functional and evolutionary relationships between sequences
BLAST (arguably both)
BLAST or Multiple alignment
Help identify members of gene families
BLAST
BLAST or Multiple alignment
You supply multiple sequences to be aligned to identify regions of similarity that may be a consequence of functional, structural, or evolutionary relationships between the sequences
Multiple alignment
DEFINITION OF TERMS IN BIOINFORMATICS:
→ a text-based, bioinformatic data format used to store nucleotide or amino acid sequences (e.g. Deoxyribonucleic Acid [DNA] or Ribonucleic Acid [RNA]).
→ pronounced “Fast A” (“fast-aye”) because the name is a shortening of “FAST-All”.
FASTA
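Because FASTA is plain text, a small reader is easy to sketch: each record starts with a definition line beginning with ">", and the following lines hold the sequence. The function name and the two example records are made up for illustration; libraries such as Biopython provide more robust parsers.

```python
def parse_fasta(text: str) -> dict:
    """Map each definition line (without the '>') to its full sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is None:
            raise ValueError("sequence data found before any '>' definition line")
        else:
            records[header].append(line)
    # Sequence lines may be wrapped, so join them back together.
    return {name: "".join(parts) for name, parts in records.items()}


example = """>seq1 hypothetical example
ACGTAC
GTTA
>seq2 hypothetical example
GGGCCC
"""
print(parse_fasta(example))
```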
DEFINITION OF TERMS IN BIOINFORMATICS:
A space that can be present in one of the aligned sequences where one or more residues have been deleted from that sequence
Gap
DEFINITION OF TERMS IN BIOINFORMATICS:
The input sequence that is being compared to others in the database aka sequence of interest
Query Sequence
DEFINITION OF TERMS IN BIOINFORMATICS:
The sequence you are comparing to
Subject Sequence
DEFINITION OF TERMS IN BIOINFORMATICS:
A diagram that depicts the lines of evolutionary descent of different species, organisms, or genes from a common ancestor
Phylogenetic tree
DEFINITION OF TERMS IN BIOINFORMATICS:
A series of digits that are assigned consecutively to each sequence record processed by NCBI
GI number
DEFINITION OF TERMS IN BIOINFORMATICS:
A unique identifier assigned to a record in sequence databases such as GenBank
Accession number
DEFINITION OF TERMS IN BIOINFORMATICS:
The process of deriving the structural and functional information of a protein or gene from a raw data set using different analysis, comparison, estimation, precision, and other mining techniques
Genome annotation
DEFINITION OF TERMS IN BIOINFORMATICS:
→ A set of values for quantifying one residue being substituted by another in an alignment
→ Calculated by adding substitution scores, defined for each aligned pair of letters, and gap scores for each run of letters in one segment aligned with null characters inserted into the other
Score
DEFINITION OF TERMS IN BIOINFORMATICS:
A parameter that describes the number of hits one can “expect” to see by chance when searching a database of a particular size
Expect Value
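The dependence on database size can be made concrete with the standard relation between a bit score S′ and the Expect value, E = m · n · 2^(−S′), where m is the query length and n is the total database length. The sketch below is a simplified illustration with a made-up function name; it ignores the effective-length corrections that real BLAST applies.

```python
def expect_value(bit_score: float, query_length: int, database_length: int) -> float:
    """Hits expected by chance at this bit score or better: E = m * n * 2**(-S')."""
    return query_length * database_length * 2.0 ** (-bit_score)


# The same bit score becomes less significant as the database grows.
small = expect_value(bit_score=40, query_length=100, database_length=1_000_000)
large = expect_value(bit_score=40, query_length=100, database_length=10_000_000)
print(small, large)
```

A lower Expect value means a match is less likely to have appeared by chance; each additional bit of score halves the value.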
Full name of BLAST?
Basic Local Alignment Search Tool
Full name of MUSCLE?
Multiple Sequence Comparison by Log Expectation