Bioinformatics Flashcards
An interdisciplinary field that combines biology, computer science, statistics, mathematics, and engineering to analyze and interpret biological data, particularly data from large datasets such as genomes or protein sequences
Bioinformatics
It is a widely used format for representing nucleotide or protein sequences.
FASTA
It consists of a header line starting with ‘>’, followed by the sequence data on subsequent lines.
FASTA
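A minimal sketch of reading FASTA text, assuming each record starts with a '>' header line and its sequence may wrap across several following lines. The function name parse_fasta and the example records are hypothetical; in practice an established parser such as Biopython's SeqIO is usually preferred.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) pairs."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                # finish the previous record before starting a new one
                records.append((header, "".join(chunks)))
            header = line[1:].strip()  # drop the leading '>'
            chunks = []
        else:
            if header is None:
                raise ValueError("sequence data found before the first '>' header")
            chunks.append(line)  # sequence lines are joined without separators
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Made-up example with one wrapped nucleotide record and one protein record
example = """>seq1 hypothetical example
ATGGCC
TTAGGA
>seq2 another made-up record
MKTAYIAK
"""
for header, seq in parse_fasta(example):
    print(header, seq)
```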
In sequence alignment, a ________ represents a position where one sequence has an insertion or deletion relative to another sequence.
Gap
____________ are introduced to optimize alignment and account for evolutionary changes.
Gaps
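To show where gaps come from, here is a sketch of Needleman-Wunsch global alignment with a linear gap penalty. The match, mismatch, and gap values are illustrative assumptions rather than standard defaults, and the name global_align is hypothetical. Gaps appear as '-' in the aligned output.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns (score, aligned_a, aligned_b); '-' marks a gap.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap  # b[:j] aligned entirely against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # a[i-1] against a gap in b
                score[i][j - 1] + gap,       # b[j-1] against a gap in a
            )
    # Trace back from the bottom-right cell to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
            match if a[i - 1] == b[j - 1] else mismatch
        ):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


score, top, bottom = global_align("GATTACA", "GATACA")
print(score)  # 4
print(top)
print(bottom)
```

Because GATTACA has one more base than GATACA, the alignment needs at least one gap; the best alignment scores 4 (six matches at +1, one gap at -2).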
It is the sequence for which you are seeking similarities or matches within a database
Query sequence
It’s the sequence you are using as a reference
Query sequence
It is the sequence (or sequences) in a database against which the query sequence is compared during sequence alignment or similarity searches
Subject sequence
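As a toy illustration of the query/subject distinction, the sketch below compares one query against every subject in a small made-up database and ranks the subjects by the best number of identical positions over all ungapped offsets. This brute-force scan is an assumption for teaching purposes; search tools such as BLAST use much faster seed-and-extend heuristics and gapped scoring. The function names and sequences are hypothetical.

```python
def best_ungapped_matches(query, subject):
    """Best count of identical positions over all ungapped placements of query on subject."""
    best = 0
    # offset = index in subject where query[0] lands (may be negative)
    for offset in range(-len(query) + 1, len(subject)):
        matches = 0
        for i, base in enumerate(query):
            j = offset + i
            if 0 <= j < len(subject) and subject[j] == base:
                matches += 1
        best = max(best, matches)
    return best


def search(query, database):
    """Compare the query against each subject; return (subject_id, matches), best first."""
    hits = [(sid, best_ungapped_matches(query, seq)) for sid, seq in database.items()]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical mini-database of subject sequences
database = {
    "subject_A": "TTGATTACAGG",
    "subject_B": "CCCCCCCCCC",
    "subject_C": "GATTTCA",
}
print(search("GATTACA", database))
# [('subject_A', 7), ('subject_C', 6), ('subject_B', 1)]
```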
It is a branching diagram that depicts the evolutionary relationships among a set of organisms, genes, or species
Phylogenetic tree
It shows the inferred evolutionary history and relatedness based on genetic or sequence data
Phylogenetic tree
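One simple way to infer a tree from pairwise distances is UPGMA (average-linkage clustering), which assumes a roughly constant rate of evolution across lineages. The sketch below, using a hypothetical upgma function and made-up distances, repeatedly merges the closest pair of clusters and returns the tree in Newick notation; branch lengths are omitted for brevity.

```python
def upgma(names, dist):
    """Build a rooted tree with UPGMA and return it as a Newick string (topology only).

    names: list of taxon labels
    dist:  dict mapping frozenset({x, y}) -> distance between taxa x and y
    """
    clusters = {name: 1 for name in names}  # cluster label -> number of leaves in it
    d = dict(dist)
    while len(clusters) > 1:
        # Find the closest pair of current clusters
        pair = min(
            (frozenset({x, y}) for x in clusters for y in clusters if x < y),
            key=lambda p: d[p],
        )
        x, y = sorted(pair)
        merged = f"({x},{y})"
        size = clusters[x] + clusters[y]
        # UPGMA rule: distance to the new cluster is the size-weighted average
        for z in clusters:
            if z not in pair:
                d[frozenset({merged, z})] = (
                    clusters[x] * d[frozenset({x, z})]
                    + clusters[y] * d[frozenset({y, z})]
                ) / size
        del clusters[x], clusters[y]
        clusters[merged] = size
    return next(iter(clusters)) + ";"


names = ["A", "B", "C", "D"]
pairs = {("A", "B"): 2, ("A", "C"): 6, ("A", "D"): 10,
         ("B", "C"): 6, ("B", "D"): 10, ("C", "D"): 10}
dist = {frozenset(p): v for p, v in pairs.items()}
print(upgma(names, dist))  # (((A,B),C),D);
```

A and B (distance 2) are joined first, then C, and D attaches last as the most distant taxon.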
It is a unique numerical identifier assigned to each sequence entry in the NCBI (National Center for Biotechnology Information) databases.
GI number
It provides a stable and unique way to refer to a specific sequence entry.
GI number
It is a unique identifier assigned to a sequence record in a public sequence database (such as GenBank, EMBL, or DDBJ)
Accession number
These typically consist of letters and numbers and are used to reference specific sequence entries.
Accession number
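NCBI records are commonly cited as accession.version, where the number after the dot identifies a specific version of the sequence. The sketch below splits such a string into its parts. The regular expression is a loose assumption (letters, an optional underscore, digits, an optional version) and does not cover every real accession format; split_accession is a hypothetical name.

```python
import re

# Assumed loose pattern: letters, optional underscore (RefSeq-style), digits,
# then an optional ".version" suffix. Real formats vary by database and record type.
ACCESSION_RE = re.compile(r"([A-Za-z]+_?\d+)(?:\.(\d+))?")


def split_accession(text):
    """Split 'accession.version' into (accession, version or None)."""
    m = ACCESSION_RE.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"does not look like an accession: {text!r}")
    version = int(m.group(2)) if m.group(2) else None
    return m.group(1), version


print(split_accession("NM_000546.6"))  # ('NM_000546', 6)
print(split_accession("U12345"))       # ('U12345', None)
```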
It involves identifying and labeling the features of a genome, such as genes, regulatory sequences, and other functional elements.
Genome annotation
This process helps in understanding the biological significance of the DNA sequence.
Genome annotation
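Real annotation pipelines combine many kinds of evidence, but one elementary step in locating protein-coding genes is scanning for open reading frames (ORFs): an ATG start codon followed in the same frame by a stop codon (TAA, TAG, or TGA in the standard genetic code). The sketch below is a simplification: it scans only the forward strand, skips ORFs nested inside a longer one, and uses the hypothetical names find_orfs and min_codons.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code


def find_orfs(dna, min_codons=2):
    """Return sorted (start, end) pairs of forward-strand ORFs.

    end is exclusive and includes the stop codon; min_codons counts
    codons from ATG up to, but not including, the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(dna):
            if dna[i:i + 3] == "ATG":
                # Walk codon by codon until an in-frame stop codon
                for j in range(i, len(dna) - 2, 3):
                    if dna[j:j + 3] in STOP_CODONS:
                        if (j - i) // 3 >= min_codons:
                            orfs.append((i, j + 3))
                        i = j  # resume scanning after this ORF's stop codon
                        break
                else:
                    break  # no in-frame stop before the end: stop scanning this frame
            i += 3
    return sorted(orfs)


seq = "CCATGAAATTTTAAGG"
for start, end in find_orfs(seq):
    print(start, end, seq[start:end])  # 2 14 ATGAAATTTTAA
```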
In sequence alignment or similarity searches, it is a numerical value that quantifies the level of similarity or quality of alignment between two sequences.
Score
Higher scores generally indicate more significant similarity. (T or F)
TRUE
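A sketch of how a score can be computed for an existing pairwise alignment under a simple scheme: +1 per match, -1 per mismatch, and -2 per gap position (all assumed values). Real tools typically use substitution matrices such as BLOSUM62 for proteins and often affine gap penalties; alignment_score is a hypothetical name.

```python
def alignment_score(aligned_a, aligned_b, match=1, mismatch=-1, gap=-2):
    """Score two already-aligned sequences of equal length; '-' marks a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    total = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            total += gap       # insertion or deletion
        elif x == y:
            total += match     # identical residues
        else:
            total += mismatch  # substitution
    return total


print(alignment_score("GATTACA", "GATTACA"))  # 7
print(alignment_score("GATTACA", "GA-TACA"))  # 4
print(alignment_score("GATTACA", "GACTATA"))  # 3
```

The identical pair gets the highest score, consistent with higher scores indicating greater similarity.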
It is a statistical measure that estimates the number of different alignments with scores equal to or better than a given score that would occur by chance in a database search.
Expect value (E-value)
A ___________ indicates a more significant match or similarity.
lower E-value
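For ungapped local alignments, Karlin-Altschul statistics give E = K m n e^(-λS), where S is the alignment score, m and n are the lengths of the query and the database, and K and λ are parameters set by the scoring system and residue composition. The sketch below evaluates that formula with made-up parameter values to show why higher scores give lower E-values; it is not how any particular search tool computes its reported values.

```python
import math


def expect_value(score, m, n, K, lam):
    """Karlin-Altschul estimate E = K * m * n * exp(-lam * score)."""
    return K * m * n * math.exp(-lam * score)


# Hypothetical parameters and lengths, chosen only for illustration
K, LAM = 0.1, 0.3
m, n = 300, 1_000_000  # query length, total database length
for s in (40, 60, 80):
    print(s, expect_value(s, m, n, K, LAM))
```

With λ = 0.3, each 20-point increase in score lowers E by a factor of e^6, roughly 400-fold.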
A field that uses computers to store and analyze molecular biological information
BIOINFORMATICS
It is about finding and interpreting biological data online
BIOINFORMATICS
It is a field in which biology, mathematics, statistics, computer science, information technology, and other related sciences are merged into a single discipline to process biological data
BIOINFORMATICS
It uses complex machines to read biological data at a much faster rate than before.
BIOINFORMATICS
There is a marriage between biology and informatics. (T or F)
TRUE
The science of collecting and analyzing complex biological data
BIOINFORMATICS
Allows the storage and management of large biological data sets
THE CREATION OF DATABASES
Data are being generated much faster than they can be analyzed (e.g., the Human Genome Project)
THE CREATION OF DATABASES
These are repositories, like a bank of biological information, designed to collect, archive, visualize, and organize biological data.
Databases
They enable scientists to describe, interpret, and retrieve data intelligently.
Databases
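A minimal sketch of the storage-and-retrieval idea using Python's built-in sqlite3 module: records are stored once and then retrieved by identifier or by organism. The table layout, the DEMO identifiers, and the sequences are invented placeholders, not the schema or content of any real database such as GenBank.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database for the sketch
conn.execute(
    "CREATE TABLE sequences (accession TEXT PRIMARY KEY, organism TEXT, sequence TEXT)"
)
records = [  # made-up placeholder records
    ("DEMO0001", "Homo sapiens", "ATGGCC"),
    ("DEMO0002", "Mus musculus", "ATGGCA"),
    ("DEMO0003", "Homo sapiens", "ATGTTT"),
]
conn.executemany("INSERT INTO sequences VALUES (?, ?, ?)", records)

# Retrieve one record by its identifier
row = conn.execute(
    "SELECT organism, sequence FROM sequences WHERE accession = ?", ("DEMO0002",)
).fetchone()
print(row)  # ('Mus musculus', 'ATGGCA')

# Retrieve all identifiers for one organism
human = [
    acc
    for (acc,) in conn.execute(
        "SELECT accession FROM sequences WHERE organism = ? ORDER BY accession",
        ("Homo sapiens",),
    )
]
print(human)  # ['DEMO0001', 'DEMO0003']
```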
Much data has been generated, especially since the completion of the ____________.
Human Genome Project
When was the Human Genome Project launched?
1990
Objective of the Human Genome Project
To sequence the entire human genome, which consists of about 3.2 billion base pairs.
It was completed in 2003. Because of this, there is a large amount of data that has to be interpreted or analyzed.
Human Genome Project
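To put 3.2 billion base pairs in perspective, here is a rough storage calculation, assuming one byte per base when stored as plain text versus two bits per base when packed (A, C, G, and T need only four codes). It ignores headers, ambiguous bases such as N, and the much larger raw sequencing data; pack_2bit is a hypothetical helper.

```python
GENOME_BP = 3.2e9  # approximate human genome size in base pairs

bytes_as_text = GENOME_BP * 1   # one ASCII character per base
bytes_2bit = GENOME_BP * 2 / 8  # two bits per base, eight bits per byte
print(f"{bytes_as_text / 1e9:.1f} GB as plain text")           # 3.2 GB as plain text
print(f"{bytes_2bit / 1e6:.0f} MB packed at 2 bits per base")  # 800 MB packed at 2 bits per base


def pack_2bit(seq):
    """Pack an A/C/G/T string into bytes, 2 bits per base (store len(seq) separately to decode)."""
    code = {"A": 0, "C": 1, "G": 2, "T": 3}
    value = 0
    for base in seq:
        value = (value << 2) | code[base]
    return value.to_bytes((2 * len(seq) + 7) // 8, "big")


print(pack_2bit("ACGT"))  # b'\x1b'
```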
Aside from the human genome, the genomes of many other organisms have also been completely sequenced, so there is an enormous amount of data that has to be understood. That is why databases have been created. (T or F)
TRUE
PRINCIPAL COMPONENTS OF BIOINFORMATICS
*THE CREATION OF DATABASES
*THE DEVELOPMENT OF ALGORITHMS AND STATISTICS
*THE USE OF THESE TOOLS FOR THE ANALYSIS AND INTERPRETATION OF VARIOUS TYPES OF BIOLOGICAL DATA
Determines relationships among members of large data sets
THE DEVELOPMENT OF ALGORITHMS AND STATISTICS
The procedure by which a large data set is organized so that relationships can be determined is called a(n) ________.
Algorithm
Algorithms are applied in ________.
Statistics
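As a small example of an algorithm that determines relationships among members of a data set, the sketch below computes pairwise p-distances (the proportion of differing positions) for sequences that are already aligned to equal length. The sequences and function names are hypothetical; the smallest distance marks the most closely related pair.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_matrix(seqs):
    """Pairwise p-distances for a dict of name -> aligned sequence."""
    return {(x, y): p_distance(seqs[x], seqs[y]) for x, y in combinations(sorted(seqs), 2)}


seqs = {"s1": "ACGTACGT", "s2": "ACGTACGA", "s3": "TCGAACGA"}
dm = distance_matrix(seqs)
for pair, d in dm.items():
    print(pair, d)
print(min(dm, key=dm.get))  # ('s1', 's2')
```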
Applied to data types including DNA, RNA, and protein sequences, protein structures, gene expression profiles, and biochemical pathways
THE USE OF THESE TOOLS FOR THE ANALYSIS AND INTERPRETATION OF VARIOUS TYPES OF BIOLOGICAL DATA
Sciences that attempt to describe a living organism in terms of ‘omics’
BRANCHES OF BIOINFORMATICS
BRANCHES OF BIOINFORMATICS
Genomics
Transcriptomics
Proteomics
Microbiomics
Metabolomics
IDENTIFY THE BRANCH OF BIOINFORMATICS
Involves describing the sequence of the entire genome of an organism
Genomics
IDENTIFY THE BRANCH OF BIOINFORMATICS
The study of all RNA molecules in a living organism
Transcriptomics
IDENTIFY THE BRANCH OF BIOINFORMATICS
The description of the entire complement of proteins in a living organism.
Proteomics