Bioinformatics Lecture Flashcards

Question 1

Q

Why do we need bioinformatics?

Answer

A

Managing biological data - sequencing technologies generate vast amounts of biological data that require computational tools for storage, analysis, and interpretation

Genomics and proteomics analysis - helps analyse DNA, RNA, and protein sequences, aiding in understanding gene function

Biomedical applications - supports disease research, drug discovery, and personalised medicine by identifying biomarkers, genetic variations, and drug targets

Systems biology - integrates multi-omics data to understand complex biological processes

Question 2

Q

Define sequence and describe sequence processing:

Answer

A

Sequence - a string of nucleotide or amino acid codes

Format - arrangement of data for computer input/output

Downstream - towards the 3' end of a nucleotide sequence
Upstream - towards the 5' end of a nucleotide sequence

Redundancy - a database is described as redundant when the same sequence may be found in it several times
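
To make format and redundancy concrete, here is a minimal Python sketch. It assumes the common FASTA text format (a header line starting with ">" followed by one or more sequence lines) and reports any sequence that is stored under more than one header. The function names parse_fasta and find_redundant and the three example entries are hypothetical, made up for this card.

```python
from collections import defaultdict


def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def find_redundant(records):
    """Return {sequence: [headers]} for sequences stored more than once."""
    seen = defaultdict(list)
    for header, seq in records:
        seen[seq].append(header)
    return {seq: names for seq, names in seen.items() if len(names) > 1}


# Hypothetical example entries; entry1 and entry3 hold the same sequence.
example = """>entry1
ATGGCC
TTAA
>entry2
ATGCGT
>entry3
atggccttaa
"""

records = parse_fasta(example)
duplicates = find_redundant(records)
print(duplicates)
```

Here entry1 and entry3 are reported together because they contain the same sequence once line breaks and letter case are ignored.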

Sequence processing:

raw data acquisition
quality control (removing sequencing errors)
preprocessing (trimming of low-quality reads + adapter removal)
alignment (mapping sequences to a reference genome)
variant calling (identifying mutations)
annotation (functional analysis)
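
The preprocessing step can be sketched in a few lines of Python. This is only an illustration, not how any particular trimming tool works: it assumes FASTQ-style quality strings in Phred+33 encoding, an arbitrary quality cutoff of 20, and exact adapter matching. The read, the quality string and the adapter sequence are made-up examples, and the function names are hypothetical.

```python
def phred_scores(quality_string, offset=33):
    """Decode a quality string into integer Phred scores (assumes Phred+33)."""
    return [ord(ch) - offset for ch in quality_string]


def trim_low_quality_tail(seq, quality_string, min_quality=20):
    """Drop bases from the 3' end while their score is below min_quality."""
    scores = phred_scores(quality_string)
    end = len(seq)
    while end > 0 and scores[end - 1] < min_quality:
        end -= 1
    return seq[:end], quality_string[:end]


def remove_adapter(seq, quality_string, adapter):
    """Cut the read at the first exact occurrence of the adapter, if any."""
    idx = seq.find(adapter)
    if idx == -1:
        return seq, quality_string
    return seq[:idx], quality_string[:idx]


# Made-up read: 8 bases of insert, a 7-base adapter, then 2 low-quality bases.
read = "ACGTACGTAGATCGGTT"
quality = "IIIIIIIIIIIIIII##"  # 'I' decodes to 40, '#' decodes to 2
adapter = "AGATCGG"            # example adapter, assumed for illustration

trimmed_read, trimmed_quality = trim_low_quality_tail(read, quality)
clean_read, clean_quality = remove_adapter(trimmed_read, trimmed_quality, adapter)
print(clean_read, clean_quality)
```

Real trimming tools also tolerate mismatches in the adapter and partial adapters at the read end, which this sketch does not attempt.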

Question 3

Q

Describe genomic data repositories:

Answer

A

Genomic Data Repositories:
GenBank (NCBI): Public database for nucleotide sequences.
Ensembl (EMBL-EBI): Genome browser providing annotated genomes.
European Nucleotide Archive (ENA): Stores raw sequencing data.
Protein Data Repositories:
UniProt: Comprehensive protein sequence and functional information.
Protein Data Bank (PDB): 3D structures of proteins, nucleic acids.
Biomedical & Variant Databases:
ClinVar: Links genetic variations to clinical conditions.
GWAS Catalog: Repository of genome-wide association study findings.
dbSNP: Repository for single nucleotide polymorphisms.

Question 4

Q

Explain the basic concepts associated with bioinformatics and how it is used in biomedical research

Answer

A

Biological Databases: Storage and retrieval of sequence, structure, and function-related data.

Computational Algorithms: Used for sequence alignment, genome assembly, and structure prediction.

Statistical & Machine Learning Methods: Applied to detect patterns, predict protein function, and classify diseases.

Systems Biology: Integration of multi-omics data to understand biological pathways

Biomedical research:

Disease Diagnostics: Identification of genetic variants linked to diseases.

Drug Discovery: Target identification, molecular docking, and drug repurposing.

Personalised Medicine: Tailoring treatments based on an individual’s genetic profile

Question 5

Q

Describe alignment, homology and similarity:

Answer

A

Alignment - a one-to-one matching of two or more sequences, so that each character in one sequence is paired with a single character of the other sequence or with a null character (a gap)

Alignment enables the researcher to determine if two sequences display sufficient similarity to justify the inference of homology

Similarity is a measurable quantity (for example, the percentage of matching positions in an alignment); homology is a conclusion drawn from these data that the two genes share a common evolutionary history

Genes are either homologous or not homologous
There are no degrees of homology as there are in similarity
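
Similarity, unlike homology, can be computed directly from an alignment. The Python sketch below calculates percent identity for two already-aligned sequences. It assumes "-" as the null character and counts every column that is not a gap in both sequences in the denominator; other conventions exist, so treat this as one possible definition. The function name and the example alignment are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of alignment columns in which both sequences share a residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap and b == gap:
            continue  # a column of two gaps carries no information
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Made-up alignment: 4 identical columns out of 6.
identity = percent_identity("ACGT-A", "ACCTGA")
print(round(identity, 1))
```

The number is a degree of similarity only. Whether it is high enough to infer homology is a separate judgement.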

Question 6

Q

Describe the approaches for sequence alignment:

Answer

A

Pairwise alignment - comparing 2 sequences using global or local alignment

Multiple sequence alignment - aligning multiple sequences simultaneously

Reference-based alignment - aligning sequences to a known reference genome

De novo assembly - constructing sequences without a reference

Question 7

Q

Describe the difference between local and global sequence alignment:

Answer

A

Local alignment:

stretches of sequences with the highest density of matches are aligned
finds the highest scoring alignment regardless of position and length
suitable for detecting conserved domains, motifs, or partial homologs

Global alignment:

used to align the entire sequence using as many characters as possible
finds the optimal alignment over the entire length of the sequences
best for similar length sequences with high similarity
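
The difference shows up clearly in the dynamic-programming scores. The sketch below computes a global score in the style of Needleman-Wunsch and a local score in the style of Smith-Waterman. The scoring values (match +1, mismatch -1, linear gap penalty -2) and the example sequences are assumptions chosen for illustration. Only the scores are computed, not the alignments themselves.

```python
def global_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best score for aligning all of a against all of b."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against nothing costs i gaps
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def local_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best score over all pairs of substrings of a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The extra 0 lets an alignment restart anywhere.
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


# Made-up sequences sharing the stretch ACGTACGT, flanked by unrelated bases.
seq_a = "GGGGACGTACGT"
seq_b = "ACGTACGTCCCC"
print("global:", global_score(seq_a, seq_b))
print("local:", local_score(seq_a, seq_b))
```

The local score rewards the shared stretch alone, while the global score is pulled down because the unrelated flanking bases must also be aligned.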

Question 8

Q

What are some tools used for sequence alignment?

Answer

A

BLAST:

Basic Local Alignment Search Tool
tries to find patches of regional similarity, rather than the best alignment between your entire query and an entire database sequence
Identifies homologous sequences
characteristics: local alignments, rapid, heuristic
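
The "rapid, heuristic" idea can be illustrated with a much simplified seed-and-extend search: find short exact word matches first, then extend each match while the characters still agree. This is a teaching sketch, not BLAST itself. The word length of 4, the ungapped exact-match extension, the function names and the example sequences are all assumptions.

```python
from collections import defaultdict


def index_words(seq, k):
    """Map every word of length k in seq to its start positions."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def seed_and_extend(query, subject, k=4):
    """Return (query_start, subject_start, length) hits, longest first."""
    index = index_words(subject, k)
    hits = set()
    for q in range(len(query) - k + 1):
        for s in index.get(query[q:q + k], []):
            # Extend the seed to the left while characters match.
            q_left, s_left = q, s
            while q_left > 0 and s_left > 0 and query[q_left - 1] == subject[s_left - 1]:
                q_left -= 1
                s_left -= 1
            # Extend the seed to the right while characters match.
            q_right, s_right = q + k, s + k
            while (q_right < len(query) and s_right < len(subject)
                   and query[q_right] == subject[s_right]):
                q_right += 1
                s_right += 1
            hits.add((q_left, s_left, q_right - q_left))
    return sorted(hits, key=lambda hit: -hit[2])


# Made-up sequences sharing the stretch ACGTACG.
query = "TTACGTACGG"
subject = "CCCACGTACGAAA"
hits = seed_and_extend(query, subject)
print(hits[0])
```

Because only positions that share an exact word are examined, most of the search space is never scored, which is where the speed comes from and also why the result is not guaranteed to be optimal.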

FASTA:

tries to find regions of similarity between the query and each database sequence
Alternative to BLAST for sequence searching (generally slower, but can be more sensitive)

Genome alignment:

BWA (Burrows-Wheeler Aligner): Fast alignment for large genomes.
Bowtie: High-speed short-read aligner

Question 9

Q

Describe multiple alignment and its usage:

Answer

A

Aligning three or more sequences to identify conserved regions, evolutionary relationships, and functional motifs

Usages:

Identification of functionally important sites
Demonstration of homology between sequences
Molecular phylogeny
Search for weak but significant similarities in sequence databases
Structure prediction
Function prediction
Design of primers for PCR (polymerase chain reaction)
Identification of related genes
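
As an example of the first usage, the Python sketch below scores each column of a finished multiple alignment by how often its most common residue occurs, then lists the fully conserved columns. The three aligned sequences are invented, "-" is assumed to be the gap character, and the function names and the threshold of 1.0 are illustrative choices.

```python
from collections import Counter


def column_conservation(alignment, gap="-"):
    """Fraction of sequences that share the most common residue, per column."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for column in zip(*alignment):
        residues = [ch for ch in column if ch != gap]
        if not residues:
            scores.append(0.0)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(column))
    return scores


def conserved_positions(alignment, threshold=1.0):
    """0-based indices of columns whose conservation reaches the threshold."""
    scores = column_conservation(alignment)
    return [i for i, score in enumerate(scores) if score >= threshold]


# Made-up protein alignment of three sequences.
alignment = [
    "MKT-LLV",
    "MKTALLV",
    "MRT-LIV",
]
print(conserved_positions(alignment))
```

Columns that stay identical across all sequences are candidates for functionally important sites, although real analyses usually use substitution-aware scores rather than plain identity.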

Question 10

Q

Evaluate the main bioinformatics online resources containing data and information and tools for analysis

Question 11

Q

Describe translational bioinformatics methods and tools:

Question 12

Q

Describe Clustal W with relation to multiple alignment:

Answer

A

Clustal W is a widely used algorithm for Multiple Sequence Alignment (MSA) of nucleotide or protein sequences

It aligns multiple sequences by identifying conserved regions, evolutionary relationships, and functional domains

Mechanism:

A. Calculate a matrix of pairwise distances based on pairwise alignments between the sequences
B. Use the result of A to build a guide tree, which is an inferred phylogeny for the sequences
C. Use the tree from B to guide the progressive alignment of the sequences
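
The first two steps of the mechanism can be sketched in Python. This is a simplified stand-in, not Clustal W's actual implementation. The pairwise distance here is 1 minus an identity estimated from the longest common subsequence (a very crude pairwise alignment), and the guide tree is built by average-linkage clustering (UPGMA style), whereas Clustal W builds its guide tree by neighbour joining. The sequence names, sequences and function names are invented.

```python
from itertools import combinations


def lcs_length(a, b):
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def pairwise_distances(seqs):
    """Distance for every pair of named sequences (assumed: 1 - identity)."""
    dist = {}
    for (name1, s1), (name2, s2) in combinations(seqs.items(), 2):
        identity = lcs_length(s1, s2) / max(len(s1), len(s2))
        dist[frozenset((name1, name2))] = 1.0 - identity
    return dist


def average_distance(members_x, members_y, dist):
    """Mean distance between the leaves of two clusters."""
    total = sum(dist[frozenset((x, y))] for x in members_x for y in members_y)
    return total / (len(members_x) * len(members_y))


def guide_tree(seqs):
    """Build a nested-tuple tree by repeatedly joining the closest clusters."""
    dist = pairwise_distances(seqs)
    clusters = [(name, [name]) for name in seqs]  # (subtree, leaf names)
    while len(clusters) > 1:
        x, y = min(combinations(clusters, 2),
                   key=lambda pair: average_distance(pair[0][1], pair[1][1], dist))
        clusters = [c for c in clusters if c is not x and c is not y]
        clusters.append(((x[0], y[0]), x[1] + y[1]))
    return clusters[0][0]


# Made-up sequences: seqA and seqB are nearly identical, seqC is unrelated.
sequences = {"seqA": "ACGTACGT", "seqB": "ACGTACGA", "seqC": "GGGGCCCC"}
distances = pairwise_distances(sequences)
tree = guide_tree(sequences)
print(tree)
```

In the final step, progressive alignment would follow this tree from the leaves inward: the closest pair (seqA and seqB) is aligned first, and seqC is then aligned to that result.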