Bioinformatics Lecture Flashcards
Why do we need bioinformatics?
Managing biological data - sequencing technologies generate vast amounts of biological data that require computational tools for storage, analysis, and interpretation
Genomics and proteomics analysis - helps analyse DNA, RNA, and protein sequences, aiding in understanding gene function
Biomedical applications - supports disease research, drug discovery, and personalised medicine by identifying biomarkers, genetic variations, and drug targets
Systems biology - integrates multi-omics data to understand complex biological processes
Define sequence and describe sequence processing:
Sequence - a string of nucleotide or amino acid codes
Format - arrangement of data for computer input/output
Downstream - towards the 3′ end of a nucleotide sequence
Upstream - towards the 5′ end of a nucleotide sequence
Redundancy - a database is described as redundant when the same sequence may be found in it several times
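As an illustration of a sequence format, the sketch below parses FASTA, a common plain-text format in which each record starts with a ">" header line followed by one or more sequence lines. The function name parse_fasta is hypothetical, and it assumes the record ID is the first word of each header; in practice a library such as Biopython's SeqIO would normally be used.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict mapping record ID to sequence.

    Assumes the ID is the first whitespace-delimited word of each '>' header.
    """
    records = {}
    current_id = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Save the previous record before starting a new one
            if current_id is not None:
                records[current_id] = "".join(chunks)
            current_id = line[1:].split()[0]
            chunks = []
        else:
            # Sequence lines may be wrapped; join them and normalise case
            chunks.append(line.upper())
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records


example = """>seq1 example nucleotide record
ATGCGT
ACGTTA
>seq2 second record
GGCCAA
"""
print(parse_fasta(example))  # {'seq1': 'ATGCGTACGTTA', 'seq2': 'GGCCAA'}
```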
Sequence processing:
- raw data acquisition
- quality control (assessing read quality and flagging sequencing errors)
- preprocessing (trimming low-quality bases and removing adapter sequences)
- alignment (mapping reads to a reference genome)
- variant calling (identifying mutations relative to the reference)
- annotation (functional analysis of the identified variants and genes)
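The preprocessing step can be sketched as follows. This is a minimal illustration, assuming FASTQ quality strings use Phred+33 encoding, exact adapter matching, and a simple rule that drops bases from the 3′ end while their quality is below a threshold. The function names, the threshold of 20 and the example adapter are illustrative assumptions, not the behaviour of any particular trimming tool.

```python
def phred_scores(quality_string, offset=33):
    """Convert a FASTQ quality string to Phred scores (assumes Phred+33)."""
    return [ord(ch) - offset for ch in quality_string]


def trim_trailing(seq, qual, min_q=20):
    """Drop bases from the 3' end while their Phred score is below min_q."""
    scores = phred_scores(qual)
    end = len(seq)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], qual[:end]


def remove_adapter(seq, qual, adapter, min_overlap=3):
    """Cut the read at the first exact adapter occurrence, or at an adapter
    prefix (at least min_overlap >= 1 bases) running off the 3' end."""
    pos = seq.find(adapter)
    if pos != -1:
        return seq[:pos], qual[:pos]
    # Look for a partial adapter at the very end of the read, longest first
    for k in range(min(len(adapter) - 1, len(seq)), min_overlap - 1, -1):
        if seq.endswith(adapter[:k]):
            return seq[:-k], qual[:-k]
    return seq, qual


read = "ACGTACGTAGATCGGAAG"
qual = "IIIIII##IIIIIIIIII"   # 'I' = Q40, '#' = Q2
read, qual = remove_adapter(read, qual, adapter="AGATCGGAAGAGC")
read, qual = trim_trailing(read, qual, min_q=20)
print(read, qual)  # ACGTAC IIIIII
```

Adapters are removed before quality trimming here so that low-quality bases exposed at the new 3′ end are also trimmed.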
Describe genomic data repositories:
Genomic Data Repositories:
GenBank (NCBI): Public database for nucleotide sequences.
Ensembl (EBI): Genome browser providing annotated genomes.
European Nucleotide Archive (ENA): Stores raw sequencing data.
Protein Data Repositories:
UniProt: Comprehensive protein sequence and functional information.
Protein Data Bank (PDB): 3D structures of proteins and nucleic acids.
Biomedical & Variant Databases:
ClinVar: Links genetic variations to clinical conditions.
GWAS Catalog: Repository of genome-wide association study findings.
dbSNP: Repository for single nucleotide polymorphisms.
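Repositories such as GenBank can also be queried programmatically. The sketch below only builds a request URL for NCBI's E-utilities efetch service (it sends no request), using the db, id, rettype and retmode parameters; the helper name efetch_url and the example accession are illustrative, and real scripts should follow NCBI's E-utilities usage guidelines.

```python
from urllib.parse import urlencode

EUTILS_EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession, db="nucleotide", rettype="fasta"):
    """Build an E-utilities efetch URL for one record (no network access)."""
    params = {"db": db, "id": accession, "rettype": rettype, "retmode": "text"}
    return f"{EUTILS_EFETCH}?{urlencode(params)}"


# Example: a RefSeq nucleotide accession, requested in FASTA format
print(efetch_url("NM_000546"))
```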
Explain the basic concepts associated with bioinformatics and how it is used in biomedical research:
Biological Databases: Storage and retrieval of sequence, structure, and function-related data.
Computational Algorithms: Used for sequence alignment, genome assembly, and structure prediction.
Statistical & Machine Learning Methods: Applied to detect patterns, predict protein function, and classify diseases.
Systems Biology: Integration of multi-omics data to understand biological pathways
Biomedical research:
Disease Diagnostics: Identification of genetic variants linked to diseases.
Drug Discovery: Target identification, molecular docking, and drug repurposing.
Personalised Medicine: Tailoring treatments based on an individual’s genetic profile
Describe alignment, homology and similarity:
Alignment - one-to-one matching of two or more sequences, so that each character in one sequence is paired with a single character of the other sequence or with a null character (a gap)
Alignment enables the researcher to determine if two sequences display sufficient similarity to justify the inference of homology
Homology is a conclusion drawn from this data (e.g. % similarity) that the two genes share a common evolutionary history
Genes are either homologous or not homologous
Unlike similarity, homology has no degrees: two sequences can be 70% similar, but not 70% homologous
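Similarity, unlike homology, is a measurable quantity. A minimal sketch of its simplest form, percent identity between two already-aligned sequences, is shown below. It assumes "-" marks gaps and counts identity only over columns where neither sequence has a gap, which is one of several conventions; protein similarity scores often also credit conservative substitutions.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two aligned sequences of equal length,
    computed over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if x != gap and y != gap]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


print(percent_identity("ACGT-ACGT", "ACCTTACGA"))  # 75.0
```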
Describe the approaches for sequence alignment:
Pairwise alignment - comparing 2 sequences using global or local alignment
Multiple sequence alignment - aligning multiple sequences simultaneously
Reference-based alignment - aligning sequences to a known reference genome
De novo assembly - constructing sequences without a reference genome
Describe the difference between local and global sequence alignment:
Local alignment:
- stretches of sequences with the highest density of matches are aligned
- finds the highest scoring alignment regardless of position and length
- suitable for detecting conserved domains, motifs, or partial homologs
Global alignment:
- aligns the sequences along their entire length, using as many characters as possible
- finds the optimal alignment over the entire length of the sequences
- best for similar length sequences with high similarity
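The classic dynamic-programming algorithms are Needleman-Wunsch for global alignment and Smith-Waterman for local alignment. The sketch below computes only the optimal score (no traceback) for either mode, assuming a simple linear gap penalty; the scores match=1, mismatch=-1 and gap=-2 are illustrative choices, not standard defaults.

```python
def alignment_score(a, b, mode="global", match=1, mismatch=-1, gap=-2):
    """Optimal alignment score by dynamic programming.

    mode="global": Needleman-Wunsch (end-to-end alignment).
    mode="local":  Smith-Waterman (best-scoring pair of substrings).
    """
    if mode not in ("global", "local"):
        raise ValueError("mode must be 'global' or 'local'")
    rows, cols = len(a) + 1, len(b) + 1
    F = [[0] * cols for _ in range(rows)]
    if mode == "global":
        # Leading gaps are penalised in a global alignment
        for i in range(1, rows):
            F[i][0] = i * gap
        for j in range(1, cols):
            F[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = F[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = F[i - 1][j] + gap      # gap in b
            left = F[i][j - 1] + gap    # gap in a
            score = max(diag, up, left)
            if mode == "local":
                score = max(0, score)   # a local alignment may restart anywhere
                best = max(best, score)
            F[i][j] = score
    return F[-1][-1] if mode == "global" else best


print(alignment_score("TTTACGTAAA", "ACGT", mode="local"))   # 4
print(alignment_score("TTTACGTAAA", "ACGT", mode="global"))  # -8
```

The local score picks out the shared ACGT core, while the global score is pulled down by the gaps needed to span sequences of unequal length, which is why global alignment suits sequences of similar length.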
What are some tools used for sequence alignment?
BLAST:
- Basic Local Alignment Search Tool
- tries to find regions of local similarity between your query and database sequences, rather than aligning them end to end
- Identifies homologous sequences
- characteristics: local alignments, rapid, heuristic
FASTA:
- tries to find regional similarity between the query and database sequences
- alternative to BLAST for sequence searching (BLAST is generally the faster of the two)
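Both BLAST and FASTA are heuristic: instead of running a full dynamic-programming alignment against every database sequence, they first look for short exact word matches (called "words" in BLAST and "k-tuples" in FASTA) and only examine promising regions further. The sketch below shows just that seeding idea plus FASTA-style counting of hits per diagonal; the word size k=3 and the function names are illustrative assumptions, and the extension and scoring stages of the real tools are omitted.

```python
from collections import defaultdict


def word_index(seq, k):
    """Map every k-letter word in seq to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def seed_hits(query, subject, k=3):
    """Return (query_pos, subject_pos) pairs sharing an exact k-letter word."""
    index = word_index(subject, k)
    hits = []
    for q in range(len(query) - k + 1):
        for s in index.get(query[q:q + k], []):
            hits.append((q, s))
    return hits


def diagonal_counts(hits):
    """Count hits per diagonal (subject_pos - query_pos); a diagonal with many
    hits suggests a region of local similarity worth extending."""
    counts = defaultdict(int)
    for q, s in hits:
        counts[s - q] += 1
    return dict(counts)


hits = seed_hits("ACGTAC", "TTACGTACGG", k=3)
print(diagonal_counts(hits))  # {2: 4, 6: 1, -2: 1}
```

Diagonal 2 collects most of the hits because the whole query occurs in the subject starting at position 2.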
Genome alignment:
BWA (Burrows-Wheeler Aligner): Fast alignment of sequencing reads against large reference genomes.
Bowtie: High-speed short-read aligner
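BWA and Bowtie both index the reference genome with the Burrows-Wheeler transform (BWT) and an FM-index, which lets them locate a short read without scanning the whole genome. The sketch below shows only the core idea, counting exact matches by backward search, on a toy string; real aligners build the index far more efficiently, tolerate mismatches and gaps, and report positions via a suffix array.

```python
def bwt(text):
    """Burrows-Wheeler transform via sorted rotations (fine for toy inputs;
    real aligners build it from a suffix array)."""
    text += "$"  # unique end marker; '$' sorts before A, C, G and T
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def count_occurrences(reference, pattern):
    """Count exact matches of pattern in reference by FM-index backward search."""
    last = bwt(reference)        # last column of the sorted rotations
    first = sorted(last)         # first column = sorted characters
    C = {}                       # C[ch] = number of characters smaller than ch
    for i, ch in enumerate(first):
        C.setdefault(ch, i)

    def occ(ch, i):
        # Occurrences of ch in last[:i]; real indexes precompute these counts
        return last[:i].count(ch)

    lo, hi = 0, len(last)        # range of sorted rotations matching so far
    for ch in reversed(pattern): # extend the match one character leftwards
        if ch not in C:
            return 0
        lo = C[ch] + occ(ch, lo)
        hi = C[ch] + occ(ch, hi)
        if lo >= hi:
            return 0
    return hi - lo


print(count_occurrences("ACGTACGTTACG", "ACG"))  # 3
```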
Describe multiple alignment and its usage:
Aligning three or more sequences to identify conserved regions, evolutionary relationships, and functional motifs
Usages:
- Identification of functionally important sites
- Demonstration of homology between sequences
- Molecular phylogeny
- Search for weak but significant similarities in sequence databases
- Structure prediction
- Function prediction
- Design of primers for PCR (polymerase chain reaction)
- Identification of related genes
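Identifying functionally important sites often starts by measuring how conserved each column of a multiple alignment is. The sketch below assumes the sequences are already aligned (equal length, "-" for gaps) and uses the frequency of the most common residue per column; the "x" placeholder and the threshold are illustrative choices, and real tools use richer measures such as entropy or substitution-matrix-based scores.

```python
from collections import Counter


def column_conservation(msa):
    """For each MSA column, return (most common residue, fraction of sequences
    carrying it)."""
    if len({len(s) for s in msa}) != 1:
        raise ValueError("need at least one sequence, all of the same length")
    result = []
    for column in zip(*msa):
        residue, count = Counter(column).most_common(1)[0]
        result.append((residue, count / len(msa)))
    return result


def consensus(msa, threshold=1.0):
    """Consensus string: keep a residue whose frequency reaches threshold,
    otherwise write 'x' (an illustrative convention)."""
    return "".join(res if frac >= threshold and res != "-" else "x"
                   for res, frac in column_conservation(msa))


msa = ["ACGTTA",
       "ACGCTA",
       "ACGTCA",
       "-CGTTA"]
print(consensus(msa))                  # xCGxxA  (fully conserved columns only)
print(consensus(msa, threshold=0.75))  # ACGTTA
```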
Evaluate the main bioinformatics online resources containing data and information and tools for analysis
Describe translational bioinformatics methods and tools:
Describe Clustal W with relation to multiple alignment:
Clustal W is a widely used algorithm for Multiple Sequence Alignment (MSA) of nucleotide or protein sequences
The resulting alignment reveals conserved regions, evolutionary relationships, and functional domains
Mechanism:
- calculate a matrix of pairwise distances based on pairwise alignments between the sequences
- use this distance matrix to build a guide tree, an approximate phylogeny for the sequences (Clustal W uses the neighbour-joining method)
- use the guide tree to guide the progressive alignment, aligning the most closely related sequences first
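The first two steps of this mechanism can be sketched as below. Two simplifications are swapped in and are not what Clustal W itself does: distances come from the Jaccard distance between k-mer sets rather than from pairwise alignments, and the guide tree is built with UPGMA rather than neighbour-joining. The third step, progressive profile alignment along the tree, is omitted because it needs a full profile-alignment routine. All names and the example sequences are hypothetical.

```python
from itertools import combinations


def kmer_set(seq, k=2):
    """Set of all k-letter words in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def distance_matrix(seqs, k=2):
    """Step 1 (simplified): Jaccard distance between k-mer sets for each pair."""
    sets = {name: kmer_set(s, k) for name, s in seqs.items()}
    dist = {}
    for a, b in combinations(seqs, 2):
        union = sets[a] | sets[b]
        shared = sets[a] & sets[b]
        dist[frozenset((a, b))] = 1 - len(shared) / len(union) if union else 0.0
    return dist


def upgma_guide_tree(names, dist):
    """Step 2 (simplified): UPGMA guide tree, repeatedly merging the closest
    pair of clusters. Returns the tree as nested tuples."""
    clusters = {name: (name, 1) for name in names}  # label -> (subtree, size)
    d = dict(dist)
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        merged = f"({a},{b})"
        for other in clusters:
            # UPGMA: size-weighted average of the two old distances
            d[frozenset((merged, other))] = (
                d[frozenset((a, other))] * size_a
                + d[frozenset((b, other))] * size_b
            ) / (size_a + size_b)
        clusters[merged] = ((tree_a, tree_b), size_a + size_b)
    tree, _ = next(iter(clusters.values()))
    return tree


seqs = {"s1": "ACGTACGT", "s2": "ACGTACGA", "s3": "TTGCATGC", "s4": "TTGCATGG"}
print(upgma_guide_tree(list(seqs), distance_matrix(seqs)))
# (('s3', 's4'), ('s1', 's2'))
```

In a progressive aligner, the pairs joined first in this tree (s3 with s4, then s1 with s2) would be aligned first, and the two resulting profiles aligned last.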