Bioinformatics Lecture Flashcards

1
Q

Why do we need bioinformatics?

A

Managing biological data - sequencing technologies generate vast amounts of biological data that require computational tools for storage, analysis, and interpretation

Genomics and proteomics analysis - helps analyse DNA, RNA, and protein sequences, aiding in understanding gene function

Biomedical applications - supports disease research, drug discovery, and personalised medicine by identifying biomarkers, genetic variations, and drug targets

Systems biology - integrates multi-omics data to understand complex biological processes

2
Q

Define sequence and describe sequence processing:

A

Sequence is a string of nucleotides or amino acid codes

Format - arrangement of data for computer input/output

Downstream - towards the 3' end of a nucleotide sequence
Upstream - towards the 5' end of a nucleotide sequence

Redundancy - a database is described as redundant when the same sequence can appear in it several times

Sequence processing:

  • raw data acquisition
  • quality control by removing sequencing errors
  • preprocessing (trimming of low quality reads + adapter removal)
  • alignment (mapping sequences to reference genome)
  • variant calling (identifying mutations relative to the reference)
  • annotation (assigning functional information to the sequences and variants)
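
The preprocessing step above (adapter removal plus trimming of low-quality bases) can be sketched roughly as follows. This is only an illustration, not how any particular trimming tool works: it assumes Phred+33 quality encoding, an exact adapter match, and hypothetical function names and thresholds (`min_q`, `min_len`).

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into integer scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_string]


def trim_adapter(seq, qual, adapter):
    """Cut the read at the first exact occurrence of the adapter, if present."""
    pos = seq.find(adapter)
    if pos == -1:
        return seq, qual
    return seq[:pos], qual[:pos]


def trim_low_quality_tail(seq, qual, min_q=20):
    """Remove bases from the 3' end while their quality is below min_q."""
    scores = phred_scores(qual)
    end = len(seq)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], qual[:end]


def preprocess(seq, qual, adapter, min_q=20, min_len=5):
    """Adapter removal, then quality trimming; discard reads that end up too short."""
    seq, qual = trim_adapter(seq, qual, adapter)
    seq, qual = trim_low_quality_tail(seq, qual, min_q)
    if len(seq) < min_len:
        return None  # read rejected
    return seq, qual


# Hypothetical read: 8 genomic bases followed by a 10-base adapter.
read = "ACGTACGTAGATCGGAAG"
quality = "IIIIII##IIIIIIIIII"
print(preprocess(read, quality, adapter="AGATCGGAAG"))
```

Real pipelines usually also tolerate adapter mismatches and use sliding-window quality trimming; those refinements are left out here.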
3
Q

Describe genomic data repositories:

A

Genomic Data Repositories:
GenBank (NCBI): Public database for nucleotide sequences.
Ensembl (EBI): Genome browser providing annotated genomes.
European Nucleotide Archive (ENA): Stores raw sequencing data.

Protein Data Repositories:
UniProt: Comprehensive protein sequence and functional information.
Protein Data Bank (PDB): 3D structures of proteins, nucleic acids.

Biomedical & Variant Databases:
ClinVar: Links genetic variations to clinical conditions.
GWAS Catalog: Repository of genome-wide association study findings.
dbSNP: Repository for single nucleotide polymorphisms.

4
Q

Explain the basic concepts associated with bioinformatics and how it is used in biomedical research

A

Biological Databases: Storage and retrieval of sequence, structure, and function-related data.

Computational Algorithms: Used for sequence alignment, genome assembly, and structure prediction.

Statistical & Machine Learning Methods: Applied to detect patterns, predict protein function, and classify diseases.

Systems Biology: Integration of multi-omics data to understand biological pathways

Biomedical research:

Disease Diagnostics: Identification of genetic variants linked to diseases.

Drug Discovery: Target identification, molecular docking, and drug repurposing.

Personalised Medicine: Tailoring treatments based on an individual’s genetic profile

5
Q

Describe alignment, homology and similarity:

A

Alignment - one-to-one matching of 2 or more sequences, so that each character in one sequence is associated with a single character of the other sequence or with a null character (a gap)

Alignment enables the researcher to determine if two sequences display sufficient similarity to justify the inference of homology

Homology is a conclusion drawn from this data (e.g. % similarity) that the 2 genes share a common evolutionary origin

Genes are either homologous or not homologous
There are no degrees of homology as there are in similarity
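
Similarity, unlike homology, is a number that can be computed from an alignment. A minimal sketch, assuming the two sequences are already aligned with `-` as the null (gap) character, and assuming identity is reported over all alignment columns (other conventions divide by the shorter sequence length or ignore gap columns):

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of alignment columns in which both sequences have the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap and b == gap:
            continue  # a column with two gaps carries no information
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical pair: 4 identical columns out of 6.
print(round(percent_identity("ACGT-A", "ACCTGA"), 1))
```

The resulting percentage is evidence for or against homology; the decision itself remains a yes/no inference.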

6
Q

Describe the approaches for sequence alignment:

A

Pairwise alignment - comparing 2 sequences using global or local alignment

Multiple sequence alignment - aligning multiple sequences simultaneously

Reference based - aligning sequences to a known reference genome

De novo assembly - constructing sequences without a reference

7
Q

Describe the difference between local and global sequence alignment:

A

Local alignment:

  • stretches of sequences with the highest density of matches are aligned
  • finds the highest scoring alignment regardless of position and length
  • suitable for detecting conserved domains, motifs, or partial homologs

Global alignment:

  • used to align the entire sequence using as many characters as possible
  • finds the optimal alignment over the entire length of the sequences
  • best for similar length sequences with high similarity
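
The difference can be seen in the dynamic-programming recurrences usually associated with the two approaches (Needleman-Wunsch for global, Smith-Waterman for local). The sketch below only computes the optimal score, not the alignment itself, and the scoring values (match +1, mismatch -1, linear gap -2) are arbitrary assumptions chosen for illustration:

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Optimal alignment score of a and b; global by default, local if local=True."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # Global: leading gaps are penalised, so the borders accumulate gap costs.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap
            left = score[i][j - 1] + gap
            cell = max(diag, up, left)
            if local:
                # Local: a negative running score is reset to 0 (start a new alignment).
                cell = max(cell, 0)
                best = max(best, cell)
            score[i][j] = cell
    # Global score is read from the last cell; local score is the best cell anywhere.
    return best if local else score[-1][-1]


long_seq = "TTTGATTACATTT"
motif = "GATTACA"
print(alignment_score(long_seq, motif, local=True))   # 7: the motif matches in full
print(alignment_score(long_seq, motif, local=False))  # -5: flanking bases cost 6 gaps
```

The same pair scores well locally and poorly globally, which is why local alignment suits motifs and partial homologs while global alignment suits sequences of similar length.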
8
Q

What are some tools used for sequence alignment?

A

BLAST:

  • Basic Local Alignment Search Tool
  • finds regions of local similarity between your query and each database sequence, rather than aligning the two over their entire length
  • Identifies homologous sequences
  • characteristics: local alignments, rapid, heuristic

FASTA:

  • also a heuristic search tool: finds regions of local similarity between the query and each database sequence
  • Alternative to BLAST for sequence searching (generally slower, but can be more sensitive)
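
Both BLAST and FASTA are rapid because they first look for short exact word matches and only then examine those regions more closely. The sketch below shows that seeding idea only; it is not the BLAST or FASTA implementation, the function names are hypothetical, and the real tools go on to extend seeds, score them, and assess statistical significance.

```python
from collections import defaultdict


def index_words(sequence, k):
    """Map every word of length k in the sequence to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(sequence) - k + 1):
        index[sequence[i:i + k]].append(i)
    return index


def find_seeds(query, subject, k=3):
    """Return (query_pos, subject_pos) pairs where a k-letter word matches exactly."""
    index = index_words(subject, k)
    seeds = []
    for q in range(len(query) - k + 1):
        for s in index.get(query[q:q + k], []):
            seeds.append((q, s))
    return seeds


# Hypothetical sequences sharing the word GATT.
print(find_seeds("GATTACA", "CCGATTCC", k=4))  # [(0, 2)]
```

Only database sequences that produce seeds need a more expensive alignment, which is where the speed-up over a full dynamic-programming search comes from.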

Genome alignment:

BWA (Burrows-Wheeler Aligner): Fast alignment for large genomes.
Bowtie: High-speed short-read aligner

9
Q

Describe multiple alignment and its usage:

A

Aligning three or more sequences to identify conserved regions, evolutionary relationships, and functional motifs

Usages:

  • Identification of functionally important sites
  • Demonstration of homology between sequences
  • Molecular phylogeny
  • Search for weak but significant similarities in sequence databases
  • Structure prediction
  • Function prediction
  • Design of primers for PCR (polymerase chain reaction)
  • Identification of related genes
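
As an illustration of the first usage (finding functionally important sites), conserved columns can be read off a finished multiple alignment. A minimal sketch, assuming the alignment is given as equal-length strings with `-` for gaps; the conservation measure (fraction of sequences sharing the most common residue) is one simple choice among many:

```python
from collections import Counter


def column_conservation(alignment, gap="-"):
    """For each column, the fraction of sequences carrying the most common residue."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    fractions = []
    for column in zip(*alignment):
        residues = [c for c in column if c != gap]
        if not residues:
            fractions.append(0.0)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        fractions.append(top_count / len(column))
    return fractions


def conserved_positions(alignment, threshold=1.0):
    """0-based column indices whose conservation reaches the threshold."""
    return [i for i, f in enumerate(column_conservation(alignment)) if f >= threshold]


# Hypothetical alignment of three sequences.
msa = ["ACGT", "ACGA", "AC-T"]
print(conserved_positions(msa))  # [0, 1]
```

Fully conserved columns are candidates for functionally important sites, and stretches of them are natural targets for PCR primer design.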
10
Q

Evaluate the main bioinformatics online resources containing data and information and tools for analysis

A
11
Q

Describe translational bioinformatics methods and tools:

A
12
Q

Describe Clustal W with relation to multiple alignment:

A

Clustal W is a widely used algorithm for Multiple Sequence Alignment (MSA) of nucleotide or protein sequences

It aligns multiple sequences by identifying conserved regions, evolutionary relationships, and functional domains

Mechanism:

  • Calculate a matrix of pairwise distances based on pairwise alignments between the sequences
  • Use this distance matrix to build a guide tree, which approximates the phylogeny of the sequences
  • Use the guide tree to direct the progressive alignment of the sequences, starting with the most closely related
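
The first two steps can be sketched as below. This is a simplified stand-in, not Clustal W itself: distances here are plain fractions of differing positions between equal-length sequences (instead of scores from real pairwise alignments), and the tree is built by average-linkage clustering (UPGMA-style) rather than Clustal W's own tree-building method. The progressive alignment step is not implemented, and all names are hypothetical.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of positions that differ; assumes equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal-length sequences")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_matrix(seqs):
    """Step 1: pairwise distances, keyed by the unordered pair of sequence names."""
    return {frozenset((m, n)): p_distance(seqs[m], seqs[n])
            for m, n in combinations(seqs, 2)}


def guide_tree(seqs):
    """Step 2: join the closest clusters repeatedly; returns nested tuples of names."""
    dist = distance_matrix(seqs)
    clusters = [(name, [name]) for name in seqs]  # (subtree, leaf names)

    def average_distance(c1, c2):
        pairs = [dist[frozenset((m, n))] for m in c1[1] for n in c2[1]]
        return sum(pairs) / len(pairs)

    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: average_distance(clusters[ij[0]], clusters[ij[1]]))
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]


# Hypothetical sequences: s1/s2 and s3/s4 form two close pairs.
sequences = {"s1": "ACGTACGT", "s2": "ACGTACGA", "s3": "TTGTACCC", "s4": "TTGTACCA"}
print(guide_tree(sequences))  # (('s1', 's2'), ('s3', 's4'))
```

In the third step the tree would be read from the leaves inward: s1 is aligned with s2, s3 with s4, and the two resulting alignments are then aligned to each other.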