L1-4: Overview of Systems Biology and Bioinformatics Flashcards
Every individual harbours ____ genetic variant sites.
4-5 million
Transcriptome
The full set of RNA molecules in one cell or a population of cells (i.e. expressed genes).
The aim of transcriptomic experiments is to identify ______.
differentially expressed genes
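As a rough illustration of what "differentially expressed" means, the sketch below computes a log2 fold change between two conditions and flags genes that change at least two-fold. The gene names, counts, pseudocount and threshold are all hypothetical. Real analyses use statistical models that account for replicate variance rather than a bare fold change.

```python
import math

def log2_fold_change(control_counts, treated_counts, pseudocount=1.0):
    """log2 of (mean treated / mean control); the pseudocount avoids log(0)."""
    mean_control = sum(control_counts) / len(control_counts)
    mean_treated = sum(treated_counts) / len(treated_counts)
    return math.log2((mean_treated + pseudocount) / (mean_control + pseudocount))

# Hypothetical normalised counts: gene -> (control replicates, treated replicates)
counts = {
    "geneA": ([10, 12, 11], [95, 90, 100]),
    "geneB": ([50, 48, 52], [49, 51, 50]),
}

lfc = {gene: log2_fold_change(c, t) for gene, (c, t) in counts.items()}

# Assumed cut-off: |log2 fold change| >= 1, i.e. at least a two-fold change
candidates = [gene for gene, value in lfc.items() if abs(value) >= 1.0]
print(lfc, candidates)
```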
The Human Cell Atlas project aim
identify (based on transcriptome) and locate every cell in the human body
Proteomics can be distinguished into which four main aspects?
Sequence
Structural
Functional and interaction
Expression
Genomics
Study of the function and structure of a genome
Genome
The complete set of all genes, regulatory sequences and non-coding regions within an organism’s DNA
Contig
A set of overlapping DNA segments that together represent a consensus region of DNA.
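A minimal sketch of how overlapping segments collapse into one contig, assuming the reads are error-free, already ordered, and share a suffix/prefix overlap of at least three bases. The reads and the function names are made up for illustration. Real assemblers handle errors, orientation and ordering.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b, or 0."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def merge_in_order(reads, min_len=3):
    """Extend the contig with the non-overlapping tail of each successive read."""
    contig = reads[0]
    for read in reads[1:]:
        k = overlap(contig, read, min_len)
        if k == 0:
            raise ValueError("reads do not overlap; they belong to separate contigs")
        contig += read[k:]
    return contig

reads = ["ATGCGTAC", "GTACCTGA", "CTGATTAG"]  # hypothetical reads
contig = merge_in_order(reads)
print(contig)
```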
Isolation of total genomic DNA - steps
- Mechanical disruption of cells/tissues (homogeniser, bead beater)
- Lysis of host cells (detergents such as SDS)
- Separation of DNA through enzymatic digestion of proteins, adsorption to and release of the DNA from a chromatographic matrix (resin), and deproteinisation of the DNA solution with organic solvents (phenol/chloroform)
- Precipitation of DNA with ethanol or isopropanol
Construction of a shotgun library
- Collections of short segments of DNA generated by digestion of genomic DNA with restriction enzymes (representing the entire genome) are ligated into vector plasmids.
- Millions of different recombinant molecules are generated and these are propagated in bacteria or yeast.
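The digestion step can be sketched in a few lines. The example assumes a single enzyme, EcoRI, which recognises GAATTC and cuts after the first G. The input sequence is invented, and only one strand is modelled.

```python
def digest(sequence, site="GAATTC", cut_offset=1):
    """Cut the sequence at every occurrence of the recognition site.

    cut_offset is the position of the cut within the site (1 = after the G for EcoRI).
    """
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(sequence[start:])  # remainder after the last cut
    return fragments

genome = "ATGGAATTCCGTAGAATTCTT"  # hypothetical sequence with two EcoRI sites
fragments = digest(genome)
print(fragments)
```

Joining the fragments in order reproduces the input, which is why the library as a whole represents the entire genome.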
Sanger Sequencing
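When DNA polymerase incorporates a dideoxynucleotide (ddNTP), the chain cannot be extended further (there is no 3'-OH group), so synthesis terminates at that base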
The dideoxynucleotides for each of the four bases can each have a different fluorescent label so the 4 reactions can be run in the same tube.
The reaction is run on a polyacrylamide gel and fluorescence detected by an automated sequencing machine.
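The read-out logic can be sketched as follows. Every terminated fragment carries the label of its final base, so sorting the fragments by length recovers the sequence. This is a simplified model that assumes one fragment per position and a template written 3' to 5'. The function name is illustrative.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def sanger_read(template):
    """Return the sequence read from size-sorted, end-labelled fragments."""
    # Newly synthesised strand, complementary to the template
    new_strand = "".join(COMPLEMENT[base] for base in template)
    # One terminated fragment per position: (length, label of terminal ddNTP)
    fragments = [(i + 1, base) for i, base in enumerate(new_strand)]
    fragments.reverse()  # the reaction tube holds fragments in no useful order
    # Electrophoresis separates by size; shortest fragments are detected first
    by_size = sorted(fragments)
    return "".join(label for _, label in by_size)

read = sanger_read("TACGGT")
print(read)
```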
Next-generation WGS sequencing techniques
Illumina sequencing
Roche 454
Ion Torrent
Illumina Sequencing
100-150 bp reads are used
Fragments are ligated to adapters and annealed to a slide. PCR is carried out and copies are separated into single strands.
The slide is flooded with fluorescently labelled, reversibly terminated nucleotides and DNA polymerase, so only one base is added per cycle
An image is taken: in each read location there will be a fluorescent signal indicating the base that has been added
Terminators are removed, allowing the next base to be added. The process is repeated, adding one nucleotide at a time with imaging in between.
Roche 454 sequencing
- DNA is fragmented, adapters added, annealed to beads and amplified by PCR. Each bead is placed in a single well of a slide.
- The slide is flooded with one of the four dNTPs. Incorporation of a nucleotide releases pyrophosphate, which is converted into a light signal.
- The dNTP is washed away, the next dNTP is added, and the process is repeated, cycling through the four dNTPs.
Ion Torrent: Proton/PGM Sequencing
- Ion Torrent does not make use of optical signals. The basis of ion torrent sequencing relies on the addition of a dNTP releasing a H+ ion.
- DNA is fragmented, adapters added and one molecule is placed on a bead and amplified by PCR. Each bead is placed in a single well of a slide. The slide is flooded with one of the four dNTPs.
- The pH is detected in each well. The release of a H+ ion will decrease the pH.
- The dNTPs are washed away and the process is repeated, cycling through the dNTPs.
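The flow cycle described above (shared by Roche 454 and Ion Torrent) can be sketched as a loop over dNTP flows. Each flow gives a signal proportional to the number of identical bases incorporated in a row. The flow order "TACG" and the function names are assumptions for illustration, and signals here are idealised integers with no noise.

```python
def flow_signals(new_strand, flow_order="TACG"):
    """Return (dNTP, number incorporated) for each flow until the strand is complete."""
    if set(new_strand) - set(flow_order):
        raise ValueError("strand contains bases that are never flowed")
    signals = []
    pos = 0
    flow = 0
    while pos < len(new_strand):
        base = flow_order[flow % len(flow_order)]
        run = 0
        # A homopolymer run is incorporated in a single flow, giving a stronger signal
        while pos < len(new_strand) and new_strand[pos] == base:
            run += 1
            pos += 1
        signals.append((base, run))
        flow += 1
    return signals

def call_bases(signals):
    """Rebuild the sequence from the per-flow signal strengths."""
    return "".join(base * run for base, run in signals)

signals = flow_signals("TTACGG")
print(signals, call_bases(signals))
```

A flow with zero incorporations produces no light (454) or no pH change (Ion Torrent). In practice, telling a run of 5 from a run of 6 is harder than in this idealised model.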
Methods for long sequence reads
Nanopore technology
SMRT sequencing
Nanopore technology
- A protein nanopore is set in an electrically resistant polymer membrane.
- An ionic current is passed through the nanopore by setting a voltage across this membrane.
- If an analyte passes through the pore or near its aperture, this event creates a characteristic disruption in current
SMRT sequencing
Based on DNA replication:
A fluorescent label on the terminal phosphate of each nucleotide is detected when DNA polymerase incorporates the nucleotide into the DNA
The two most commonly used high-throughput methods of measuring the transcriptome are:
microarrays and RNA sequencing
Advantages of RNA-Seq over microarrays
Comprehensive; microarrays require known sequences and an annotated genome.
Microarrays only reveal information about ORFs.
RNA-seq covers entire genome
Detects novel transcripts
Identifies structural variations (gene fusions and alternative splicing)
Illumina, Roche 454 and Ion Torrent generate ____ bp reads
100
RNAseq workflow
- Library preparation: isolation of RNAs and generation of cDNA, selection of fragment size, and addition of linkers
- Illumina paired read sequencing
Experimental considerations of designing RNAseq experiments
Library construction:
choosing the population of RNA to use
Sequencing depth (required read number and coverage)
Number of technical and biological replicates
Majority of the RNA in a cell is ____
ribosomal RNA
RNAseq library construction workflow
- Choosing the population of RNA to use
- Generation of a strand-specific library that retains the orientation of the original RNA transcript (allows identification of anti-sense transcripts or non-coding RNAs)
- Single or paired end reads (more confidence over intron splice sites)
Why is RNAseq sequencing depth important?
Adequate read depth is required to detect low-expression transcripts
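A back-of-the-envelope sketch of the read number and coverage relationship, using mean coverage = reads x read length / target size. The numbers are hypothetical. For RNA-seq the "target" is the transcriptome, and actual coverage per transcript varies with expression level, which is why low-expression transcripts need deeper sequencing.

```python
import math

def mean_coverage(num_reads, read_length_bp, target_size_bp):
    """Average number of times each base of the target is sequenced."""
    return num_reads * read_length_bp / target_size_bp

def reads_needed(target_coverage, read_length_bp, target_size_bp):
    """Smallest read count that reaches the requested mean coverage."""
    return math.ceil(target_coverage * target_size_bp / read_length_bp)

# Hypothetical run: 1 million 150 bp reads over a 5 Mb target
coverage = mean_coverage(1_000_000, 150, 5_000_000)
required = reads_needed(30, 150, 5_000_000)
print(coverage, required)
```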
Why is the number of technical and biological replicates important in RNAseq?
Increasing the number of biological replicates significantly increases the number of differentially expressed genes that can be detected
Chromatography
The separation of components in a mixture by passing the mixture, dissolved in a mobile phase, through a stationary phase. The analyte is separated from other molecules based on differential partitioning between the two phases.
Mass spectrometry
Identification is based on comparing the measured mass spectrum against reference mass spectra held in spectral libraries.
GC/LC-MS generates a _______.
Extracted Ion Chromatogram (EIC)
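A minimal sketch of how an EIC is derived: for each scan (one retention time), sum the intensity of peaks whose m/z falls within a tolerance window around the target. The scan data, target m/z and tolerance below are invented for illustration.

```python
def extracted_ion_chromatogram(scans, target_mz, tolerance=0.01):
    """Return (retention time, summed intensity near target_mz) for every scan."""
    eic = []
    for retention_time, peaks in scans:
        intensity = sum(i for mz, i in peaks if abs(mz - target_mz) <= tolerance)
        eic.append((retention_time, intensity))
    return eic

# Hypothetical scans: (retention time in min, [(m/z, intensity), ...])
scans = [
    (1.0, [(180.063, 100.0), (90.055, 20.0)]),
    (1.1, [(180.064, 900.0), (90.055, 25.0)]),
    (1.2, [(180.062, 300.0), (132.102, 40.0)]),
]

eic = extracted_ion_chromatogram(scans, target_mz=180.063)
apex_rt = max(eic, key=lambda point: point[1])[0]  # retention time of the peak apex
print(eic, apex_rt)
```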
Differences between GC and LC mass spectrometry
LC-MS detects a wider range of metabolites, including both hydrophilic and hydrophobic metabolites. However, it is less reliable and stable than GC-MS.
GC-MS is more favourable due to:
- High peak capacity
- Excellent repeatability
- Vast and readily-available electron ionised compound libraries, making compound identification easier.
Illumina sequencing workflow
- Sample preparation; fragmentation of DNA and addition of adaptors
- Cluster growth
- Sequencing using fluorescence optics
- Imaging
Variability in organoid expression can be due to
Changes in expression within cell types
Changes in the proportion of cells of each type within the organoid
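The two sources of variability can give the same bulk measurement. The sketch below models bulk expression of one gene as the proportion-weighted average of per-cell-type expression. The cell types and values are hypothetical.

```python
def bulk_expression(proportions, per_cell_expression):
    """Proportion-weighted mean expression of one gene across cell types."""
    return sum(proportions[cell] * per_cell_expression[cell] for cell in proportions)

baseline = bulk_expression({"neuron": 0.5, "glia": 0.5}, {"neuron": 10.0, "glia": 2.0})

# Case 1: expression rises within neurons, proportions unchanged
shifted_expression = bulk_expression({"neuron": 0.5, "glia": 0.5}, {"neuron": 14.0, "glia": 2.0})

# Case 2: expression per cell unchanged, but neurons make up more of the organoid
shifted_proportion = bulk_expression({"neuron": 0.75, "glia": 0.25}, {"neuron": 10.0, "glia": 2.0})

print(baseline, shifted_expression, shifted_proportion)
```

Both cases raise the bulk value by the same amount, so a bulk measurement alone cannot tell them apart.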
Differences between eukaryotes and prokaryotes
- DNA is linear in eukaryotes, circular in prokaryotes.
- DNA is associated with histones in eukaryotes, naked in prokaryotes
- Prokaryotes do not have introns
- Prokaryotic DNA is located in the cytoplasm (no nucleus)
- Prokaryotes don’t have membrane-bound organelles
- Prokaryotes - 70S ribosomes; eukaryotes 80S
- Prokaryotes reproduce through binary fission (asexual)
- Prokaryotic DNA is haploid
Prokaryotic genome size
Usually less than 5Mb
Human genome size
3400 Mb
PacBio RS II vs. Illumina NGS
PacBio RS II has higher error rates but results in a complete/closed genomic map
Multiplex sequencing
Allows large numbers of libraries to be pooled and sequenced simultaneously during a single run of a high throughput instrument.
Individual barcode sequences are added to each DNA fragment that can be identified and sorted
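Sorting pooled reads back into their libraries (demultiplexing) can be sketched as below. The example assumes the barcode sits at the start of each read and must match exactly. The barcodes, sample names and reads are hypothetical.

```python
def demultiplex(reads, barcodes):
    """Assign each read to a sample by its leading barcode and trim the barcode off."""
    bins = {sample: [] for sample in barcodes.values()}
    bins["undetermined"] = []
    for read in reads:
        for barcode, sample in barcodes.items():
            if read.startswith(barcode):
                bins[sample].append(read[len(barcode):])
                break
        else:
            # No barcode matched this read
            bins["undetermined"].append(read)
    return bins

barcodes = {"ACGT": "sample1", "TGCA": "sample2"}
reads = ["ACGTGGGATTC", "TGCACCCTTAG", "ACGTAAATTTC", "GGGGAAAACCC"]
bins = demultiplex(reads, barcodes)
print(bins)
```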
FASTQ format output by line
Line 1: Identifier (begins with @)
Line 2: Sequence
Line 3: +
Line 4: Quality score in ASCII encoded format
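A minimal sketch of reading one four-line record and decoding its quality string. It assumes the Phred+33 encoding (quality = ASCII code minus 33), which may not hold for older files. The record itself is made up.

```python
def parse_fastq_record(lines):
    """Parse one four-line FASTQ record into (identifier, sequence, quality scores)."""
    identifier, sequence, separator, quality = (line.strip() for line in lines)
    if not identifier.startswith("@") or not separator.startswith("+"):
        raise ValueError("not a valid FASTQ record")
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality lengths differ")
    scores = [ord(char) - 33 for char in quality]  # assumes Phred+33 encoding
    return identifier[1:], sequence, scores

def error_probability(phred_score):
    """Probability that the base call is wrong: 10^(-Q/10)."""
    return 10 ** (-phred_score / 10)

record = ["@read1", "ACGT", "+", "II5!"]
name, sequence, scores = parse_fastq_record(record)
print(name, sequence, scores, error_probability(scores[2]))
```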
Examples of genome annotation pipelines
RAST, NCBI Prokaryotic Genome Annotation Pipeline (PGAP), BASys
Protein purification techniques
Chromatography based techniques
Protein analysis techniques
ELISA, Western blotting, Protein microarray
Protein characterisation techniques
Mass spectrometry, gel-based approaches
Protein sequence analysis techniques
Edman sequencing
Protein quantification techniques
ICAT, SILAC, iTRAQ
Protein structural analysis techniques
NMR spectroscopy, x-ray crystallography
Primary metabolites
Distributed within all living organisms and intimately involved in essential life processes; they include ubiquitous compounds
Secondary metabolites
Have only restricted distributions and are often a specific characteristic of individual organisms and species.
May not directly participate in growth and development but influence ecological interactions
Advantages of analysing the metabolome
- The metabolome is the downstream product of gene expression so it reflects the functional level of the cell more directly.
- Changes in the metabolome are generally amplified relative to the proteome and transcriptome.
- It is estimated that metabolomics experiments are lower in costs compared to other ‘omics’ as they produce more information per experiment.
How does NMR work?
Nuclei placed in a strong magnetic field absorb and re-emit radiofrequency radiation at characteristic frequencies
Signal is proportional to the number of hydrogen nuclei (1H) giving rise to it
NMR can be used in metabolomics to
Accurately quantify metabolites in a complex sample relative to a spiked internal standard.
It has low sensitivity therefore needs lots of sample
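Quantification against a spiked internal standard can be sketched as follows. Because signal is proportional to the number of contributing hydrogens, each peak integral is divided by its proton count before taking the ratio. The integrals, proton counts and standard concentration below are hypothetical.

```python
def metabolite_concentration(integral, n_protons, std_integral, std_protons, std_conc_mm):
    """Concentration (mM) from peak integrals normalised by proton count."""
    per_proton_signal = integral / n_protons
    per_proton_standard = std_integral / std_protons
    return std_conc_mm * per_proton_signal / per_proton_standard

# Hypothetical: standard peak from 9 equivalent protons, spiked at 0.5 mM
concentration = metabolite_concentration(
    integral=6.0, n_protons=3,
    std_integral=9.0, std_protons=9,
    std_conc_mm=0.5,
)
print(concentration)
```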
LC-MS is used for which metabolites
Higher molecular mass or lower thermostability metabolites
What is unique about C. hepaticus compared to other Campylobacter species?
Unlike other Campylobacter species, C. hepaticus has glucose and polyhydroxybutyrate metabolism pathways. These genes may play a role in stress response in C. hepaticus and are putative virulence factors
C. hepaticus HV10 is predicted to be able to
biosynthesise most amino acids, except
L-cysteine and L-lysine
RNAseq comparative transcriptomic analysis between in vivo colonisation and in vitro conditions: steps
- RNA isolation
- RNAseq
- Read mapping to C. hepaticus HV10 complete genome
- Differential analysis of the two conditions
- Predict putative virulence genes in C. hepaticus
Metabolomics in food microbiology
Identification and quantification of microbial metabolites in food, e.g. early detection of food pathogens and food spoilage microorganisms
Proteomics in food microbiology
Identification and quantification of microbial proteins within a food matrix, e.g. identification of bioactive peptides and proteins which are nutritionally important
Transcriptomics in food microbiology
Discovering the functions of food microorganisms, e.g. identifying candidate genes involved in antibiotic resistance by studying the genes differentially expressed when cultivated in the presence of an antibiotic