Sequencing Methods Flashcards
what is PCR?
Polymerase Chain Reaction
a.k.a. a reaction to amplify DNA in a machine
why do we use PCR?
Amplifying DNA gives you more material to work with. This is very useful in forensics (where there is often little material), for diagnosing disease (infectious diseases and mutations), and for the study of DNA in general.
Materials for PCR are…
1) DNA template (what you want to amplify)
2) DNA primers
3) DNA polymerase
4) dNTPs (the building blocks)
5) Buffer solutions
6) Thermal cycling
DNA primers are…
short oligonucleotides (short sequences of nucleotides, usually fewer than 50 nucleotides) that flank the target region. They bind to their complementary sequence on the template.
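A minimal Python sketch of what "binding to the complementary sequence" means: a primer anneals where the template contains the primer's reverse complement. The function names and the example sequences are made up for illustration, and real primer design also considers melting temperature and mismatches.

```python
# Watson-Crick base pairing
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (5'->3')."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))

def binding_site(template, primer):
    """Return the 0-based index on the template strand where the primer
    can anneal (exact match only), or -1 if there is no such site."""
    return template.upper().find(reverse_complement(primer))

template = "ATGCCGTAAGCT"   # hypothetical template strand
primer = "GCTTAC"           # hypothetical primer
print(binding_site(template, primer))
```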
DNA polymerase is…
an enzyme that is essential for the synthesizing of new DNA strands
Buffer solutions are used for…
creating the optimal conditions for the enzymatic reactions
Thermal cycling entails…
temperature cycles to create the necessary conditions for the reactions
Method for PCR is:
1) Denaturation (95C): DNA separates into 2 strands because the heat breaks the hydrogen bonds
2) Annealing (50-65C): DNA primers bind to their complementary sequence on each strand
3) Extension (72C): the temperature is raised to the optimum of the DNA polymerase, which extends the primers into new strands
Repeat: usually 25-35 repeats
Note: temperatures can differ per primer or enzyme used
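Why 25-35 repeats are enough can be shown with a little arithmetic: in theory every cycle doubles the number of copies. A minimal sketch, assuming a simple model where `efficiency` (a made-up parameter name) is the fraction of templates copied per cycle; real reactions plateau as reagents run out.

```python
def pcr_copies(start_copies, cycles, efficiency=1.0):
    """Theoretical number of copies after a number of PCR cycles.

    efficiency=1.0 means perfect doubling every cycle (2 ** cycles).
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return start_copies * (1 + efficiency) ** cycles

for n in (25, 30, 35):
    print(n, "cycles:", f"{pcr_copies(1, n):.2e}", "copies from 1 template")
```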
Advantages of PCR
+ high sensitivity: can amplify really small portions of DNA
+ High specificity: primers can make the amplification really specific
+ Usually quick
Limitations of PCR
- Easily contaminated due to its sensitivity
- Poorly designed primers can cause non-specific or failed amplification
- It just amplifies, it does not sequence
What is Sanger Sequencing
it is a chain termination method to sequence DNA
Why do we use Sanger Sequencing?
It is used to sequence DNA. It is still used to validate the results of next generation sequencing technologies and can still be cost-effective for smaller projects
Materials needed for Sanger Sequencing are:
1) DNA template
2) Primers
3) DNA polymerase
4) dNTPs to synthesize DNA
5) ddNTPs with fluorescent dye to stop synthesis and find out what the sequence is
6) buffer for optimal conditions of reaction
Sanger Sequencing Method:
1) The DNA that needs to be sequenced is denatured into single strands
2) Primers will anneal to the DNA fragments
3) DNA synthesis –> the polymerase extends the strand with dNTPs until a ddNTP (which carries the dye) is incorporated and terminates synthesis, leading to many strands of varying lengths
4) Electrophoresis –> fragments are separated by size: longer strands stay near the top of the gel while shorter strands travel to the bottom. Each fragment ends in a dye-labelled ddNTP, so reading the dyes from bottom to top gives the sequence of the newly synthesized strand.
5) Data analysis –> using a chromatogram you can analyse the peaks associated with the corresponding nucleotides
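The read-out logic of the steps above can be sketched in Python. This is a toy model, not how real base-calling software works: it assumes one terminated fragment exists for every position, and the function names are made up for illustration.

```python
import random

def terminated_fragments(new_strand):
    """One fragment per position: each is a prefix of the newly
    synthesized strand whose last base is the dye-labelled ddNTP."""
    return [new_strand[:i] for i in range(1, len(new_strand) + 1)]

def read_sequence(fragments):
    """Electrophoresis sorts fragments by length (shortest run furthest);
    reading the terminal dye of each, short to long, gives the sequence."""
    ordered = sorted(fragments, key=len)
    return "".join(fragment[-1] for fragment in ordered)

fragments = terminated_fragments("GATTACA")
random.shuffle(fragments)          # in the tube the fragments are mixed
print(read_sequence(fragments))
```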
Advantages to Sanger Sequencing
+ high accuracy
+ Useful for small-scale projects
+ validated and reliable
Limitations of Sanger Sequencing
- Lower throughput
- Not as useful for de novo sequencing (NGS gives more coverage depth)
- Difficulties in high repeat area
Next Generation Sequencing
Next-generation sequencing (NGS) is a high-throughput technology that allows for the rapid sequencing of entire genomes or targeted DNA regions, enabling comprehensive genetic analysis and advancements in personalized medicine, research, and diagnostics.
(think of Illumina or Oxford Nanopore)
advantages of NGS vs. Sanger Sequencing
+ higher throughput (can sequence millions of DNA fragments simultaneously)
+ More comprehensive coverage, such as the entire genome or large targeted areas.
+ Cost effective per base and it is easier to scale up
+ Very fast for how much data it can provide
+ high coverage depth, better at finding rare variants
+ Versatility: WGS, WES, RNA-seq, epigenetics, metagenomics
+ better at detecting single nucleotide variations
+ can be used for de novo sequencing
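Coverage depth can be estimated with simple arithmetic: total sequenced bases divided by the size of the target. A minimal sketch; the read count, read length and the round genome size of 3 billion bases are illustrative assumptions, not values from a real run.

```python
def mean_coverage(n_reads, read_length, target_size):
    """Average number of times each base of the target is sequenced."""
    if target_size <= 0:
        raise ValueError("target_size must be positive")
    return n_reads * read_length / target_size

# Hypothetical run: 600 million reads of 150 bases on a ~3 Gb genome
depth = mean_coverage(600_000_000, 150, 3_000_000_000)
print(f"{depth:.0f}x mean coverage")
```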
Limitation NGS
It may not always be as accurate as Sanger Sequencing
WGS (what is it)?
Whole Genome Sequencing: a method to determine the complete DNA sequence of an organism (including coding and non-coding regions, extrachromosomal DNA and mitochondrial DNA)
Why do we use WGS?
WGS has comprehensive coverage and thus potentially gives a lot of information
Method WGS:
1) DNA isolation: blood, tissue and cultured cells
2) Library preparation: adaptors are ligated to ends of DNA fragments
3) Sequencing
4) Data analysis
5) Interpretation
Advantages for WGS
+ Comprehensive coverage, entire genome, regulatory elements, repetitive regions
+ Rather accurate
+ Useful for novel genetic variants not captured by targeted sequence approaches
Limitations WGS
- Need a lot of DNA and analysis resources
- High cost
- Difficult to interpret, in WGS we see a lot of variations, but there are a lot of VUS (variants of unknown significance) and the effects of non-coding regions are not always well understood.
WES (what is it?)
Whole Exome Sequencing: focuses on the protein-coding regions of the genome (exons), which make up about 1-2% of the human genome
Why use WES?
This type of sequencing looks at the parts of the genome in which variants tend to be more clearly disease causing and better understood in the context of disease compared to non-coding elements.
How does WES work?
1) Targeted enrichment: regions of interest are selectively captured using oligonucleotide probes that are complementary to the targeted regions (in hybrid-capture protocols this step usually comes after library preparation)
2) Library preparation
3) Sequencing
4) Data analysis
Advantages WES
+ more efficient because the sequenced regions are more likely to be disease causing
+ Is especially useful for detecting disease causing variants
+ Can also facilitate gene discovery
Limitations WES
- Incomplete coverage compared to WGS
- Limited detection of structural variants such as chromosomal rearrangements or inversions
- There are still many variants of unknown significance or variants that are significant but in non-coding areas
RNA Sequencing (what is it?)
Instead of sequencing the DNA we sequence the RNA
Why use RNA sequencing?
1) gives insight into actual gene expression, so what is actually transcribed
2) gives insight into alternative splicing
3) it can detect fusion genes
4) it gives insight into the functional consequences of variants
5) it allows us to detect imprinting effects by seeing which copies (alleles) of a gene are active, and which are not
Limitation RNA sequencing
RNA is not as stable as DNA
sc-RNA Sequencing (What is it?)
Single Cell RNA Sequencing: RNA sequencing of a single cell
Why use sc-RNA Sequencing
To find out what is being transcribed
How does sc-RNA Sequencing work?
1) Cell isolation
2) Library preparation –> extraction of RNA and reverse transcription into cDNA (complementary DNA), because cDNA tends to be more stable
3) PCR
4) Sequencing
5) Data analysis
Advantages sc-RNA sequencing
+ Insight into how cell types differ in expression
+ It allows for detection of rare cell types within cell populations
+ Allows for the study of dynamic changes such as the response of a cell to environmental stimuli
Limitations sc-RNA Sequencing
- Limited coverage
- More difficult to replicate
- More noise and amplification bias compared to bulk RNA sequencing
- Dissociating cells from tissue can lead to artifacts and stress response from cells
- More difficult to maintain cell integrity and viability
Bulk-RNA Sequencing (what is it)
Bulk RNA sequencing: sequences the pooled RNA of many cells (often multiple cell types) at once, giving an average expression profile
Why use Bulk-RNA sequencing?
Because it has some good advantages
(cost effective, population level analysis, easy to replicate, global expression of genes)
How does bulk-RNA sequencing work?
1) RNA extraction from cells and tissue
2) Library preparation
3) Sequencing
4) Data analysis
Advantages bulk-RNA sequencing
+ cost effective
+ population level analysis
+ Easier to replicate: because it's a larger sample that averages out variation, compared to a single-cell sample
+ Global expression of genes: including upregulation and downregulation of genes
Limitation of bulk-RNA sequencing
- Cannot distinguish cell to cell variability, such as subpopulation
- Limited detection of transcripts from rare cell types
- It is more of a snapshot and less able to detect dynamics
- Limited resolution for cell-to-cell interactions
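A toy Python example of why bulk data hides cell-to-cell variability: the bulk signal behaves like an average over all cells, so a gene that is strongly expressed in a rare subpopulation looks weakly expressed overall. The counts are invented for illustration and `bulk_signal` is a made-up helper name.

```python
from statistics import mean

def bulk_signal(per_cell_counts):
    """Bulk RNA-seq roughly reports the average expression over all cells."""
    return mean(per_cell_counts)

common_cells = [0] * 8     # hypothetical: gene is off in most cells
rare_cells = [40, 40]      # hypothetical: gene is highly expressed in a rare cell type

print("bulk average:", bulk_signal(common_cells + rare_cells))
print("rare cell type only:", bulk_signal(rare_cells))
```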
Targeted Gene Panels
focus on a selected set of genes or specific mutations known to be linked to a condition, in a given sample.
Limitation Targeted gene panels
- Limited information and other mutations might actually be causative
Advantage Targeted gene panel
+ It is great if you have an idea of what gene is involved and you want to confirm it.
SNP array (what is it?)
Single Nucleotide Polymorphism array: an array to detect SNPs in the DNA
SNP
occur when a single nucleotide (building block of DNA) is replaced with another
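As a small illustration, single-nucleotide differences can be found by comparing a sample to a reference position by position. This sketch assumes the two sequences are already aligned and of equal length; the sequences and the function name are made up.

```python
def find_snps(reference, sample):
    """Return (position, reference_base, sample_base) for every
    position where two aligned sequences differ by a single base."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        (i, ref_base, alt_base)
        for i, (ref_base, alt_base) in enumerate(zip(reference, sample))
        if ref_base != alt_base
    ]

print(find_snps("ACGTAC", "ACGTTC"))
```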
Why use SNP arrays?
Some SNPs are associated with disease
Advantages SNP array
+ High throughput
+ Cost effective
+ Highly reproducible
+ Well-established technology
Limitations SNP-array
- Limited variant detection
- Often allele specific
- Cannot be easily modified
- Limited detection of structural variants (balanced rearrangements such as inversions are missed)
Short Read WGS (what is it?)
Sequencing using short fragments (typically 50-300 bp). Technologies such as Illumina and pyrosequencing use this method.
Why use Short Read WGS?
has nice advantages –> 1) cost effective
2) high accuracy
3) robust protocols
Limitations Short Read WGS
- Limited for large structural variations and complex genomic rearrangements
- Limited information compared to long read
Illumina (Short read) what is it?
Sequencing by synthesis technology
Why use Illumina/Short read?
For high-throughput, accurate sequencing of short fragments
How does Illumina/Short read work?
1) Library preparation
2) cluster generation through bridge amplification
3) Sequencing: nucleotides have fluorescence which is captured by camera
4) data analysis
Advantages Illumina / short read
+ High throughput
+ High accuracy
+ Flexibility: can be used for WGS, WES and RNA-sequencing
Limitations Illumina/ short read
- GC bias (dependence between fragment count (read coverage) and GC (guanine and cytosine) content found in Illumina sequencing data.)
- Read length is not as long as other methods (50-300 bases per fragment)
- General short read sequencing limitations
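GC bias is assessed by comparing read coverage against the GC content of each fragment or window. The GC content itself is easy to compute; a minimal Python sketch with a made-up example sequence:

```python
def gc_content(seq):
    """Fraction of bases in a sequence that are G or C."""
    if not seq:
        raise ValueError("sequence must not be empty")
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

print(gc_content("GGCCAATT"))
```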
Pyrosequencing
Is a short-read sequencing method
Long read WGS (what is it?)
Sequencing of long fragments which can be multiple kilobases (1kb = 1000 bases)
Why use Long Read WGS?
advantages:
+ detection of larger structural variations
+ Better for complex regions
Limitations Long read WGS
- Higher error rate
- Lower throughput
- Higher quality and quantity of DNA is necessary
PacBio (what is it?)
a single molecule, real time Long-read sequencing method
How does PacBio work?
1) template preparation of circular DNA molecules
2) Put these molecules in SMRT cells, which contain wells in which fluorescence signals are captured
3) Real time sequencing
4) Data analysis
Why use PacBio?
Allows for sequencing of DNA and RNA molecules in real time
Advantages PacBio
+ Long-reads in the kilobases
+ High accuracy because of the use of circular DNA molecules (it is repeatedly sequenced instead of in clusters)
+ Direct detection
+ Can be used for de novo assembly
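Why repeated sequencing of the same circular molecule improves accuracy can be sketched as a majority vote per position. This is a simplification that assumes the passes are already aligned, equally long and contain only substitution errors; real consensus calling also has to handle insertions and deletions. The reads are invented.

```python
from collections import Counter

def consensus(passes):
    """Majority base at each position across repeated passes
    over the same molecule (passes must be aligned, equal length)."""
    return "".join(
        Counter(column).most_common(1)[0][0]
        for column in zip(*passes)
    )

# Five hypothetical passes, two of which contain a random error
passes = ["ACGTAC", "ACGAAC", "ACGTAC", "TCGTAC", "ACGTAC"]
print(consensus(passes))
```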
Limitations PacBio
- Slightly higher error rate compared to short-read sequencing
- High cost
- Need for high quality DNA
Oxford Nanopore (what is it)
Real time long read sequencing method that measures the changes in electrical current as DNA or RNA molecules pass through a nanopore
Why use Oxford Nanopore?
It’s portable and can do direct RNA sequencing
How does Oxford Nanopore work?
1) Library preparation
2) Loading into flow cells
3) Translocation through nanopores
4) Signal detection and base calling, because each base changes the electrical current in a different way
Advantages Oxford Nanopore
+ Long-reads
+ Real time sequencing
+ portability
+ Direct sequencing (which can be used for methylation detection)
Limitations Oxford Nanopore
- Relatively high error rate
- Throughput is relatively low
- Flow-cell life span is not that high
Karyotyping (what is it?)
A cytogenetic technique to visualize the chromosomes under a microscope
why use Karyotyping?
it can be used to assess the number, size, shape, banding patterns or other clear chromosomal abnormalities
How does karyotyping work?
1) sample collection: tissue biopsy of actively dividing cells
2) Cell culture and mitotic arrest: this is done by stimulating cell division and arresting the cells in metaphase with colchicine
3) Chromosome harvesting: add solution to swell cells and spread chromosomes
4) staining and banding: add dyes so banding patterns become visible
5) microscopic analysis
Advantages Karyotyping
+ detection of chromosomal abnormalities
+ helpful for diagnosis, for example of trisomy
Limitations Karyotyping
- difficult or impossible for small abnormalities
- requires living cells
- time consuming
- labor intensive
- interpretation is difficult and requires knowledge in cytogenetics
- Limited information, does not say anything on the molecular level
FISH (what is it)
Fluorescence in situ hybridization = a molecular cytogenetic technique
why use FISH?
Meant to visualize and map the location of specific DNA sequences in chromosomes, cells or tissue samples. If you suspect an abnormality, you can design a probe to confirm the abnormality
How does FISH work?
1) Probe design: you need a specific target sequence in the genome
2) sample preparation: immobilize cells and chromosomes
3) Denaturation and hybridization: Probe binds to complementary DNA strand (hybridization)
4) wash away the excess or unbound probes
5) Image analysis
Advantages FISH
+ Visual detection of DNA sequences
+ high specificity (ability to correctly identify lack of abnormalities in sample) and sensitivity (ability to correctly identify abnormalities in sample)
+ Multiplexing is possible
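Sensitivity and specificity as defined above can be written as simple ratios. A short Python sketch; the counts in the example are invented and do not describe a real FISH assay.

```python
def sensitivity(true_pos, false_neg):
    """Share of samples WITH the abnormality that test positive."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg, false_pos):
    """Share of samples WITHOUT the abnormality that test negative."""
    return true_neg / (true_neg + false_pos)

# Hypothetical validation set: 100 abnormal and 100 normal samples
print("sensitivity:", sensitivity(true_pos=90, false_neg=10))
print("specificity:", specificity(true_neg=95, false_pos=5))
```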
Limitations FISH
- Limited resolution: It is based on visual detection and thus on the quality of the microscope and researcher. There can be subjectivity in interpretation.
- Probe design is difficult and time consuming
CHIP-seq (what is it?)
Chromatin immunoprecipitation followed by sequencing:
Is a sequencing method meant to investigate protein-DNA interactions and chromatin (histones and proteins that regulate DNA)
Why do we use CHIP-Seq?
To study transcription factor binding, histone modifications and, indirectly, chromatin accessibility (how accessible the DNA around histones is can affect transcription)
How does CHIP-seq work?
1) cross linking: link proteins to DNA
2) Chromatin fragmentation: into fragments of about 200-500 base pairs
3) Immunoprecipitation: antibodies bind to the transcription factors or modified histones of interest. The antibody-protein-DNA complexes that form are captured
4) Purify DNA by washing away other proteins and antibodies
5) sequencing using NGS
6) Data analysis
Advantages CHIP-seq
+ genome wide coverage
+ high sensitivity and specificity
+ gives quantitative data, meaning that it gives information on the strength of the protein-DNA relations
+ Multiplexing is possible
Limitations CHIP-Seq
- Dependent on antibodies
- Cross linking efficiency can be variable
- Requires complex bioinformatic tools
Cut & Run (what is it?)
Cleavage Under Targets & Release Using Nuclease:
Is supposed to be a refinement of CHIP-seq and overcome its limitations
Why use Cut&Run?
Because it improves on CHIP-seq: less background noise, higher sensitivity and better resolution
How does Cut & Run work?
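very similar to CHIP-seq, but instead of fragmenting all chromatin and pulling down the protein, an antibody guides a nuclease (protein A-micrococcal nuclease) to the target protein, so only the DNA around the binding site is cut and released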
Advantages Cut & Run
+ Reduced background noise
+ Easier to use
+ Increased sensitivity
+ Improved resolution (more precise localization)
Limitations Cut & Run
- still dependent on antibodies
- Not suitable for all tissue types
ATAC-Seq (What is it?)
Assay for Transposase-Accessible Chromatin using sequencing:
A technique to assess genome-wide chromatin accessibility, used to study gene regulation
why use ATAC-Seq
Assessing chromatin accessibility on a genome-wide scale, to study gene regulation. This is done by measuring how accessible the DNA is to the Tn5 transposase enzyme, which cuts open chromatin regions and inserts sequencing adaptors there
How does ATAC-seq work?
1) Tn5 binds to accessible DNA
2) Accessible DNA is fragmented by Tn5
3) Sequencing adaptors are inserted into these regions
4) DNA fragments are amplified
5) Next generation sequencing (NGS)
Advantages ATAC-seq
+ high sensitivity: can detect subtle changes
+ does not require a lot of input material
+ relatively simple and fast
+ does not require antibodies
Limitations ATAC-Seq
- Bias in some types of regions
- High computational demands
- Less sensitive in regions near nuclear membrane
Nascent RNA-seq (what is it?)
a technique to study newly transcribed RNA, specifically nascent RNA, which is RNA that is still being made or has not undergone processing yet
Why use Nascent RNA-seq?
Unlike normal RNA sequencing, it sequences only RNA that has recently been made. It can be used to sequence the RNA that comes from regulatory elements
How does Nascent RNA-seq work?
1) isolation of Nascent RNA
2) RNA extraction and library preparation
3) Sequencing
4) Data analysis
Advantages Nascent RNA-seq
+ selective analysis and gives snapshot of transcriptional activities and dynamics
+ regulatory elements do produce nascent RNA so they can be measured using this method
+ High sensitivity
Limitations Nascent RNA-seq
- Cell state and stability can affect quality
- Complex
Chromosome Conformation Capture = 3C (what is it?)
Molecular biology technique to study physical interactions between genomic regions, such as regulatory elements and the genes they control
Why use 3C
To investigate the 3D structure of the chromosome; it can be used to study the folding and looping of the chromosome. It is 1 vs. 1, so you look at the interaction between 2 genomic loci
How does 3C work?
1) cross linking
2) Restriction digestion
3) Ligation
4) Cross link reversal and purification
5) detection and quantification
Advantages 3C
+ detection of long-range interactions
+ can be scaled up to genome-wide analysis (see 4C, 5C and Hi-C)
+ regulatory elements can be studied
Limitations 3C
- complexity
- requires prior knowledge of the loci to test (PCR primers are designed for both loci)
4C, 5C and Hi-C (what is it?)
4C: one vs. all: look at the interactions between 1 genomic locus and all other genomic loci
5C: many vs. many: chromatin interactions of multiple loci at the same time
Hi-C: all vs. all: genome wide, captures all possible interactions
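Genome-wide interaction data of this kind is commonly summarised as a contact matrix: the genome is cut into bins and each ligated read pair adds one count to the pair of bins it connects. A toy Python sketch for a single chromosome; the positions, bin size and function name are invented for illustration.

```python
def contact_matrix(read_pairs, n_bins, bin_size):
    """Count interactions between genomic bins.

    read_pairs: list of (position_1, position_2) for each ligated pair.
    Returns a symmetric n_bins x n_bins matrix of contact counts.
    """
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos_a, pos_b in read_pairs:
        i, j = pos_a // bin_size, pos_b // bin_size
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1   # keep the matrix symmetric
    return matrix

pairs = [(100, 2500), (150, 2900), (1200, 1300)]   # hypothetical read pairs
for row in contact_matrix(pairs, n_bins=3, bin_size=1000):
    print(row)
```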
GWAS (what is it?)
Genome wide association studies:
to identify genomic variants that are statistically associated with a risk for disease or a particular trait. This often includes SNPs, insertions and deletions (INDELs) and copy number variants (CNVs), which are larger structural variations.
Why use GWAS?
to statistically associate genomic variations with diseases or traits
How does GWAS work?
1) start with a population
2) Genotype the population (SNP array or WGS)
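3) Statistical association: test each variant for association with the disease or trait
4) Meta analysis: combine the results of multiple cohorts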
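The statistical association step can be illustrated for one variant with a chi-square test on allele counts in cases versus controls. This is a minimal sketch with invented counts; real GWAS typically use regression models with covariates and correct for testing a very large number of variants. The function name is made up.

```python
import math

def allele_chi2(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Chi-square test (1 degree of freedom) on a 2x2 table of allele counts.
    Returns (chi2 statistic, p-value)."""
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    total = sum(row_totals)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs at least one count")
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts: the alternative allele is more common in cases
print(allele_chi2(case_alt=60, case_ref=40, ctrl_alt=40, ctrl_ref=60))
```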
Advantages GWAS
+ can be used in the discovery of genetic variants
+ Hypothesis free approach
+ population level insight
+ Biological insight
+ Can be used for polygenic risk scores
Limitations GWAS
- Often have limited predictive power, especially for individuals within the population
- Limited in explaining complex traits and diseases
- Limited causality
- Difficult for rare variants (statistical power)
- Population stratification (some subgroups in the population might show a significant association while the full population does not, or the other way around. This can lead to false positives or false negatives).
PRS (what is it?)
Polygenic Risk Scores:
risk scores that combine many variants found in GWAS into one score per individual, such as the chance of having a trait or the odds ratio of a disease
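One common way to compute such a score is a weighted sum: for each variant, the number of risk alleles a person carries (0, 1 or 2) is multiplied by the effect size from the GWAS, and the results are added up. A minimal Python sketch; the variant names and effect sizes are hypothetical.

```python
def polygenic_risk_score(genotypes, effect_sizes):
    """Weighted sum of risk-allele counts.

    genotypes:    {variant_id: number of risk alleles (0, 1 or 2)}
    effect_sizes: {variant_id: GWAS effect size, e.g. log odds ratio}
    Variants without a known effect size are ignored.
    """
    return sum(
        effect_sizes[variant] * count
        for variant, count in genotypes.items()
        if variant in effect_sizes
    )

# Hypothetical individual and hypothetical GWAS effect sizes
genotypes = {"variant_a": 2, "variant_b": 0, "variant_c": 1}
effect_sizes = {"variant_a": 0.10, "variant_b": 0.25, "variant_c": -0.05}
print(round(polygenic_risk_score(genotypes, effect_sizes), 3))
```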
Why do we use PRS?
can be used for predicting disease, prevention strategies and personalized medicine
Advantages PRS
+ Improved risk prediction
+ Can lead to early detection and prevention
+ Informative for common diseases
Limitations PRS
- can only explain small proportion of variance in complex diseases
- can still be quite specific and non-generalizable for population
- utility might be low for some diseases
- ethics: privacy, informed consent, health insurance
- Mostly based on white European samples
- Can be difficult to communicate that it is about risk and this may have negative consequences for mood, behavior and stress.