Molecular diagnostics II Flashcards
What are the basic characteristics of NGS? (4)
- Millions of reads in parallel
- Short-read (<600 bp) or long-read (>10 kbp) technology
- Analysis of complex mixtures of DNA/RNA
- Enables genome wide approach
Which NGS method can be used for short reads?
Illumina
Which NGS methods can be used for long reads? (2)
- PacBio
- Nanopore
Describe the general workflow for NGS methods
Intake -> Isolate -> Library -> Sequence -> Report
What is enrichment and when does this take place in the NGS workflow?
Selection of the part of the DNA you want to sequence; it takes place during the library step
What is the definition of a ‘cluster’?
Single amplified molecule on a flow cell
What is the definition of a ‘read’?
Sequence read from a single cluster
What are the steps of DNA/RNA preparation for NGS? (5)
- Fractionate/size select
- End repair/phosphorylate
- A-overhang
- Add adaptors -> adaptor ligation
- Denature and amplify -> product ready
NGS: Fractioning is used for which kind of sequencing?
Short-read sequencing
NGS: What does phosphorylation allow for during DNA/RNA preparation?
For the addition of adaptors
NGS: What is important to consider when you start from RNA instead of DNA?
A reverse-transcriptase step is required to convert RNA into complementary DNA (cDNA)
NGS: How can you select for non-ribosomal RNA?
Selection of fragments that have a poly-A-tail
NGS: What are oligos on a flow cell?
Anchoring fragments for adaptor molecules attached to the sample fragments
NGS: What does the flow cell do with DNA molecules? How?
Amplification -> every cycle, a base is added -> base added is detected
NGS: What is meant by ‘bridge amplification’?
Generation of a cluster around the place where fragments attach to the flow cell
NGS: What does a patterned flow cell allow for?
Accumulation of clusters at a determined position
NGS: What are the advantages of a patterned flow cell? (3)
- Reduces overclustering
- Higher cluster density (data/mm²)
- No need to map clusters -> reduces runtime
NGS: What is exclusion amplification, as used by patterned flow cells?
As soon as a fragment lands in a well -> it is amplified immediately, so other fragments are excluded from that well
NGS can/can’t perform multiple reads of a single DNA molecule
Can
NGS: What is the advantage of sequencing adaptors?
Allows for addition of barcode in adaptor -> distinguish sequences from multiple samples -> simultaneous sequencing
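A minimal sketch of how barcodes separate pooled samples after sequencing. The barcode sequences, sample names and the `demultiplex` helper are hypothetical; real demultiplexing tools also tolerate sequencing errors in the index read.

```python
from collections import defaultdict

# Hypothetical barcode-to-sample table; real index sequences come from the library kit.
BARCODES = {"ACGT": "sample_1", "TGCA": "sample_2"}


def demultiplex(reads):
    """Group (index_read, sequence) pairs by the sample the index belongs to."""
    by_sample = defaultdict(list)
    for index_read, sequence in reads:
        # Exact matching only; unknown indexes go to an 'undetermined' bin.
        sample = BARCODES.get(index_read, "undetermined")
        by_sample[sample].append(sequence)
    return dict(by_sample)


pooled = [
    ("ACGT", "GATTACA"),
    ("TGCA", "CCGGTTA"),
    ("ACGT", "TTAGGC"),
    ("NNNN", "AAAA"),
]
result = demultiplex(pooled)
```

Each sample ends up with its own list of reads, which is what makes simultaneous sequencing of several samples possible.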
NGS: What does single read/single-end mean?
Sequencing a fragment from one side
NGS: What does paired-end sequencing mean?
Sequencing a fragment from both sides
NGS: What does a single index mean?
One index read (one sample-specific DNA code)
NGS: What does dual index mean?
Combined index reads
NGS: What does alignment of detected sequences back to a reference genome to determine its position allow for? (4)
Detection of:
- Heterozygous SNPs
- Insertion
- Deletion
- Homozygous SNPs
NGS: What is meant by ‘read depth’?
The number of reads covering the sequence that you are interested in
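Read depth at a single position can be sketched as counting the aligned reads that overlap it. The coordinates and the `depth_at` helper below are hypothetical and assume half-open intervals (start included, end excluded).

```python
def depth_at(position, reads):
    """Count reads whose aligned interval [start, end) covers the position."""
    return sum(1 for start, end in reads if start <= position < end)


# Hypothetical alignments given as (start, end) coordinates on the reference.
aligned_reads = [(100, 250), (180, 330), (200, 350), (400, 550)]

depth = depth_at(210, aligned_reads)      # covered by the first three reads
no_coverage = depth_at(375, aligned_reads)  # falls between reads
```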
NGS: What is meant by ‘error rate’?
Total number of mismatched bases after mapping / total number of aligned bases
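The ratio above can be sketched as follows, assuming gap-free alignments where each read is compared with a reference segment of the same length. The sequences and the `error_rate` helper are hypothetical; note that true variants are also counted as mismatches in this simple form.

```python
def error_rate(alignments):
    """Mismatched bases divided by aligned bases over (read, reference) pairs."""
    mismatches = 0
    aligned = 0
    for read, reference in alignments:
        if len(read) != len(reference):
            raise ValueError("sketch assumes gap-free, equal-length alignments")
        mismatches += sum(1 for r, g in zip(read, reference) if r != g)
        aligned += len(read)
    if aligned == 0:
        raise ValueError("no aligned bases")
    return mismatches / aligned


alignments = [
    ("ACGTACGT", "ACGTACGA"),  # 1 mismatch in 8 bases
    ("TTGGCCAA", "TTGGCCAA"),  # 0 mismatches in 8 bases
    ("GGGG", "GGCG"),          # 1 mismatch in 4 bases
]
rate = error_rate(alignments)  # 2 mismatches over 20 aligned bases
```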
NGS: What is meant by ‘quality score’?
Score for the sequence quality per base -> probability that the wrong base was called
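The card does not name the scale. Assuming the Phred scale that is conventionally used for per-base qualities, the score Q and the error probability P are related by Q = -10 * log10(P). A short sketch with hypothetical helper names:

```python
import math


def phred_to_error_probability(q):
    """Probability that the base call is wrong, given a Phred score."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p):
    """Phred score for a given probability of a wrong base call."""
    return -10 * math.log10(p)


p_q30 = phred_to_error_probability(30)    # about 1 error in 1,000 calls
q_1pct = error_probability_to_phred(0.01)  # about Q20
```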
NGS: Describe the process of target enrichment
- Probes (coupled to magnetic beads) are added and bind the target fragments
- The magnetic beads are pulled out together with the bound fragments; the rest is washed away
- Enriched sequences are sequenced
NGS: What is the advantage of enrichment?
Deeper sequencing of ROI
NGS: Name applications in which target enrichment is often used (4)
- Exome sequencing
- Gene panels
- Low abundant mutations
- mRNA
NGS: What are approaches to detect viruses using NGS? (3)
- Amplicons -> sequencing part of a sequence using PCR primers
- Target enrichment
- Shotgun metagenomics (sequence whole virus)
NGS: What encompasses the technical future of sequencing? (4)
- Amplification-free library prep
- Longer read length
- Single cell
- More combined assays
What is gene expression profiling?
Profiles RNA -> presence of genes AND expression levels
Gene expression profiling can be performed using? (2)
- Microarrays
- RNAseq
What are the functions of microarray techniques in AML diagnostics/prognostics? (2)
- Recognize known subgroups
- Identify novel subgroups
What encompasses the start of any gene expression profiling?
RNA isolation
Why is the preparation of intact high-quality RNA essential during gene expression profiling?
Critical for obtaining reproducible results
Which technique is used to verify the RNA quality and integrity?
Automated capillary-electrophoresis
Why does the automated capillary-electrophoresis plot always show two distinct peaks? What is this indicative of?
18S and 28S ribosomal RNA -> quality of RNA
RNA quality depends on…? (2)
- Type: organ, tumor, blood, BM
- Harvest and storage
After quality control, mRNAs are labelled. How?
Targeting probes to poly-A-tail
mRNA labeling: Which promoter is used when adding a long strand of Ts hybridized to As?
Oligo(dT)-T7
What is a microarray?
Glass slide on which DNA oligos are printed
Microarray: What do DNA oligos printed on the glass slide represent?
All genes present in the genome
Microarray: What does the intensity of labels (labelled RNA) at certain positions on the array show?
Whether and how strongly a gene is expressed
Microarrays: what types of arrays are there? (2)
- One colour -> only one sample possible
- Two colours -> allows inclusion of reference sample
What common types of study objectives are used when analyzing microarray data? (3)
- Class discovery
- Class comparison
- Class prediction
What is the study objective with class discovery?
Are there patterns of cases of disease X with similar expression?
What is the study objective with class comparison?
Comparing groups to one another -> which genes are specifically up- or downregulated in a particular group
What kind of learning is class discovery?
Unsupervised -> all data in one bin -> patients with similar expression profile cluster together
What kind of learning is class comparison?
Supervised -> class labels are added to the samples
What is the difference in prior knowledge taken into account with class discovery vs. class comparison?
- Class discovery -> no prior knowledge taken into account
- Class comparison -> knowledge about certain abnormalities is taken into account
Unsupervised learning can be performed using several tools, name 3
- Hierarchical clustering
- K-means
- Self-organizing maps
Clustering algorithms are based on…?
Different assumptions -> correlations between gene expressions
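As one example of a correlation-based measure, the sketch below computes the Pearson correlation between expression profiles and turns it into a distance (1 - r) that a clustering tool could use. The patient profiles are invented numbers and the choice of Pearson correlation is an assumption, since the card does not name a specific measure.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical expression values for four genes in three patients.
profiles = {
    "patient_A": [1.0, 2.0, 3.0, 4.0],
    "patient_B": [2.0, 4.0, 6.0, 8.0],  # same pattern as A, higher level
    "patient_C": [4.0, 3.0, 2.0, 1.0],  # opposite pattern
}

dist_ab = 1 - pearson(profiles["patient_A"], profiles["patient_B"])
dist_ac = 1 - pearson(profiles["patient_A"], profiles["patient_C"])
```

Patients A and B end up at distance 0 and would cluster together, while A and C end up at the maximum distance of 2.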
The performance of each clustering algorithm depends on…?
Properties of the input dataset
What are challenges of gene expression profiling in routine diagnostics? (2)
- RNA is highly sensitive to degradation
- Difficulties in standardizing protocols and techniques
Which NGS technique is able to directly sequence mRNA?
mRNA-seq
mRNA-seq allows for more data, name 5
- Expression levels
- Fusion transcripts
- Deletions
- Insertions
- Mutations
What does the number of reads represent in mRNA-seq?
Expression level of certain mRNA
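Raw read counts depend on how deeply a sample was sequenced, so they are usually normalized before comparison. Counts per million (CPM) is one common option and is used here as an assumption; the gene names and counts are hypothetical.

```python
def counts_per_million(read_counts):
    """Scale raw read counts so that they sum to one million per sample."""
    total = sum(read_counts.values())
    return {gene: count * 1_000_000 / total for gene, count in read_counts.items()}


# Hypothetical raw read counts for one sample.
counts = {"GENE_A": 500, "GENE_B": 1_500, "GENE_C": 8_000}
cpm = counts_per_million(counts)
```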
What are the advantages of RNA-seq over microarrays? (3)
- Characterize novel transcripts, splicing variants, expression levels of known transcripts
- Higher resolution
- Can apply the same experimental protocol to various purposes
What are the various purposes of RNA-seq? (3)
- Detecting SNPs
- Mapping exon junctions (splice variants)
- Detecting gene fusions
If you want to detect SNPs using array technology, which technique do you use?
SNP array
SNP arrays are used to genotype many variants at the same time. What is the range?
Mostly between 750,000 and 1.2 million SNPs per array
Probes attached to one bead are/aren’t for the same SNP
Are
SNP array: Why is each bead spotted in multifold (multiple times on an array)?
To increase accuracy and redundancy
SNP array: Describe the procedure (2)
- DNA normalization and whole genome amplification
- Hybridization on array + single base extension
SNP array: how are the SNPs qualified?
Green OR red label: homozygous for one variant
Both labels equal: heterozygous
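The colour-based call can be sketched as a threshold on the share of one colour in the total signal. The thresholds and the `call_genotype` helper are hypothetical; real genotyping software fits clusters per SNP instead of using fixed cut-offs.

```python
def call_genotype(red, green):
    """Call a genotype from two channel intensities (hypothetical thresholds)."""
    total = red + green
    if total == 0:
        return "no call"
    fraction_green = green / total
    if fraction_green <= 0.2:
        return "homozygous (red allele)"
    if fraction_green >= 0.8:
        return "homozygous (green allele)"
    if 0.35 <= fraction_green <= 0.65:
        return "heterozygous"
    return "no call"  # ambiguous signal between the expected clusters


calls = [
    call_genotype(1000, 50),   # almost only red
    call_genotype(40, 900),    # almost only green
    call_genotype(500, 520),   # both labels roughly equal
    call_genotype(700, 300),   # ambiguous
]
```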
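SNP array: Why can Illumina not detect C/G or A/T SNPs?
They use a two-colour system in which C and G share one label and A and T share the other -> the two alleles of such a SNP give the same colour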
Why can array technology also be used to look for copy number?
SNPs are spaced so densely and evenly over the chromosomes that their signal intensities can be read out as copy number along the genome
SNP array: Deletions of … bp can be detected?
~90,000
SNP array: Duplications of … bp can be detected?
~860,000
SNP array: Which genotypes are visible when using a range max of 90,000 bp?
Only A or B genotypes (No AB)
SNP array: What is visible when using a range max of 860,000 bp?
Extra band in B-allele frequency (ABB and AAB)
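The band patterns in the two cards above can be reproduced by listing every possible A/B combination for a given copy number. This sketch assumes that a deletion leaves one copy and a duplication gives three copies; `expected_baf_bands` is a hypothetical helper.

```python
from fractions import Fraction


def expected_baf_bands(copy_number):
    """Return {genotype: B-allele frequency} for all A/B combinations."""
    if copy_number < 1:
        raise ValueError("sketch needs at least one copy")
    bands = {}
    for b_copies in range(copy_number + 1):
        genotype = "A" * (copy_number - b_copies) + "B" * b_copies
        bands[genotype] = Fraction(b_copies, copy_number)
    return bands


deletion = expected_baf_bands(1)     # A and B only, no AB band
normal = expected_baf_bands(2)       # AA, AB, BB
duplication = expected_baf_bands(3)  # AAA, AAB, ABB, BBB
```

With three copies the heterozygous band at 1/2 splits into bands at 1/3 (AAB) and 2/3 (ABB), which is the extra band seen in the B-allele frequency plot.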
SNP array: what is meant by the log R ratio?
Normalized measure of signal intensity for each SNP marker
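The card only describes the log R ratio as a normalized intensity. Assuming the common definition log2(observed intensity / expected intensity), a sketch with idealized, hypothetical intensities looks like this; measured values on real arrays are noisier and less extreme.

```python
import math


def log_r_ratio(observed_intensity, expected_intensity):
    """log2 of observed over expected total signal intensity for one SNP."""
    return math.log2(observed_intensity / expected_intensity)


lrr_normal = log_r_ratio(1000, 1000)       # two copies: no change
lrr_deletion = log_r_ratio(500, 1000)      # half the signal: negative value
lrr_duplication = log_r_ratio(1500, 1000)  # 1.5x the signal: positive value
```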
In which scenarios do you apply array technology? (3)
- Rare alleles causing Mendelian disease
- Low-frequency variants with intermediate effect
- Common variants implicated in common disease by GWAS
Rare variants are being included in recent arrays, with a special focus on…(3)
- Known pathogenic variants
- Pharmacogenetic variants
- Rare, (potentially) pathogenic variants in genes of special interest
What is NOT covered by genotyping arrays? (2)
- Non-polymorphic sites of the genome
- De novo mutations
Why are de novo mutations not covered by genotyping arrays?
There are too many -> cannot all be covered
SNP array: rare variants are/aren’t more difficult to pick up than common variants
Are
Variants that are unsuitable for detection using array technology… (2)
- Rare variants with small effect sizes
- Examples of common variants influencing common disease are rare
Why is it advantageous to use arrays only when dealing with rare alleles with a strong effect size?
Rare variants are directly genotyped on the array -> able to screen for all known variants of any given disease as long as they are on the array -> very high accuracy
SNP array: Which kind of studies are performed to detect low frequency variants with intermediate effect & common variants implicated in common disease?
Genome-wide association studies (GWAS)
SNP array: GWAS are/aren’t hypothesis free
Are -> the only assumption is that a genetic component is involved
Why is there a huge multiple testing burden in GWAS?
~1 million independent SNPs for common variants -> big issues with power -> large cohorts needed
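The size of the burden can be illustrated with a Bonferroni correction, which is one standard way to correct for multiple tests and is an assumption here since the card does not name a method. The 0.05 significance level is likewise assumed.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise error at alpha."""
    return alpha / n_tests


# ~1 million independent SNPs, as stated on the card; alpha = 0.05 is assumed.
threshold = bonferroni_threshold(0.05, 1_000_000)


def is_genome_wide_significant(p_value):
    """True if a SNP's p-value survives the corrected threshold."""
    return p_value < threshold
```

A SNP has to reach a p-value around 5e-8 rather than 0.05, which is why large cohorts are needed to have enough power.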
Describe the steps of a GWAS analysis (3)
- Analyze all SNPs in 1 run
- Visualizing results in plot
- Select SNPs with strongest association per chromosome
If you found the SNPs with the strongest association per chromosome in a cohort, what is the next crucial step to perform?
Replication in a different cohort
What are the factors that determine power in GWAS? (3)
- Allele frequency of variant in population
- Effect size of variant
- Linkage disequilibrium of variant with true causal variant
What is meant with linkage disequilibrium?
Variants that lie close together tend to be inherited together (‘en bloc’) -> a variant can seem to be the causal variant while it is only co-inherited with the true causal variant
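Linkage disequilibrium between two variants is commonly quantified with D and r². The sketch below uses the standard definitions D = p_AB - p_A * p_B and r² = D² / (p_A (1 - p_A) p_B (1 - p_B)); the frequencies are invented and the helper name is hypothetical.

```python
def linkage_disequilibrium(p_ab, p_a, p_b):
    """Return (D, r_squared) from the haplotype frequency p_ab and the
    allele frequencies p_a and p_b of the two variants."""
    d = p_ab - p_a * p_b
    r_squared = d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, r_squared


# Alleles A and B always occur on the same haplotype: complete LD.
d_linked, r2_linked = linkage_disequilibrium(0.5, 0.5, 0.5)

# Alleles A and B occur together only as often as expected by chance: no LD.
d_independent, r2_independent = linkage_disequilibrium(0.25, 0.5, 0.5)
```

At r² = 1 the two variants carry the same information, so an association signal cannot tell which of them is causal.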
Will the causal variant be more present than the LD SNPs around it?
Yes
What are the applications of GWAS in infectious disease? (with respect to host genetics)
- Infection rate
- Severity of symptoms
- Immune response
What are the most tested immunological parameters using GWAS?
HLA, receptors, etc.