18.02.15 NGS applications Flashcards

1
Q

What is WGS?

A

Whole genome sequencing = sequencing of the entire genome - mtDNA + nuclear DNA.

2
Q

What is WES?

A

Whole exome sequencing covers the coding sequences of all annotated protein-coding genes (~23,000), equivalent to 1-2% of the total haploid genomic sequence (~30 Mb).
The exome contains ~85% of all DNA mutations known to have an effect on human disease.

3
Q

What are the advantages of WGS?

A
  1. Allows examination of SNVs, indels, SVs and CNVs in both coding and non-coding regions of the genome. WES omits regulatory regions such as promoters and enhancers.
  2. WGS has more reliable sequence coverage. Differences in the hybridisation efficiency of WES capture probes can result in regions of the genome with little or no coverage.
  3. Coverage uniformity with WGS is better than with WES.
  4. Regions of the genome with low sequence complexity restrict the ability to design useful WES capture baits, resulting in off-target capture effects.
  5. PCR amplification isn’t required during library prep, reducing the potential for GC bias. WES frequently requires PCR amplification, as the bulk input amount needed for capture is generally around 1 µg of DNA.
  6. Sequencing read length isn’t a limitation. Most target probes for exome-seq are designed to be less than 120 nt long, making it meaningless to sequence using a greater read length.
  7. A lower average read depth is required to achieve the same breadth of coverage as WES.
  8. WGS does not suffer from reference bias. WES capture probes/baits tend to preferentially enrich reference alleles at heterozygous sites, producing false-negative SNV calls.
  9. In WES the RefSeq collection is targeted, so current capture probes only target exons that have been identified so far. The exome will change as our understanding of the human genome improves.
4
Q

What are the advantages of WES?

A
  1. WES is targeted to protein-coding regions, so reads represent less than 2% of the genome. This reduces the cost of sequencing a targeted region at high depth and reduces storage and analysis costs (although the cost of WGS is likely to decrease more rapidly than that of WES).
  2. It reduces the data fatigue associated with interpreting the large amount of data produced by WGS, when only a relatively small subset of the variation that WGS detects has demonstrated health consequences.
  3. Reduced costs make it feasible to increase the number of samples sequenced, enabling large population-based comparisons.
  4. Until very recently the time, cost and technical expertise required to generate and analyse WGS data largely precluded serious consideration of its use outside research settings, but the situation is changing rapidly and this can no longer be assumed to be the case.
5
Q

Which types of diseases are more suited to WES/WGS technology?

A

WGS: Suitable for Mendelian and complex trait gene identification, as well as sporadic phenotypes caused by de novo SNVs or CNVs.

WES: Good for highly penetrant Mendelian disease gene identification.

6
Q

What types of variants are detected in WGS vs WES?

A

WGS: Uncovers all genetic and genomic variation (SNVs and CNVs). Discovery of functional coding and noncoding variation. ~3.5 million variants.

WES: Focuses on ~1% of the genome. Limited to coding and splice-site variants in annotated genes. ~20,000 variants.

7
Q

How were diseases traditionally mapped?

A

~3000 loci were identified through positional mapping, e.g. karyotyping, linkage and CNV analysis, followed by Sanger sequencing.

Difficulty identifying rare disease loci due to small number of cases/families to study, reduced penetrance, locus heterogeneity and diminished reproductive fitness.

8
Q

What other methods have been used for gene discovery? Briefly summarise.

A

GWAS: have contributed to the discovery of loci involved in complex traits but in almost all cases, they collectively account for only a small fraction of the observed heritability of the trait. Little is known about the extent to which rare alleles contribute to the heritability of complex traits.

NGS: a one-step approach, but it presents a challenge in variant interpretation. The number of variants identified in WGS/WES varies depending on the exome enrichment kit, sequencing platform, algorithms used for mapping and variant calling, quality filters and the % of reads showing the variant.

9
Q

How should variants be prioritised for suspected pathogenicity?

A
  1. Unique in patients/very rare in general population
  2. Located in protein-coding gene
  3. Directly affecting the function of the protein encoded by the mutated gene

Strategies for finding causal alleles against this background depend on factors such as the mode of inheritance of a trait, the pedigree or population structure, whether a phenotype arises owing to a de novo or inherited variant, and the extent of locus heterogeneity for a trait.
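
As a minimal sketch, the three prioritisation criteria in this card can be expressed as a simple filter. The `Variant` fields, the 0.1% allele-frequency cut-off and the set of "damaging" consequence labels are assumptions chosen for illustration; they are not taken from any particular pipeline, and the gene names are placeholders.

```python
from dataclasses import dataclass

@dataclass
class Variant:
    gene: str              # placeholder gene name
    population_af: float   # allele frequency in the general population (hypothetical field)
    in_coding_gene: bool   # True if the variant lies in a protein-coding gene
    consequence: str       # predicted effect label (hypothetical vocabulary)

# Assumption: consequence labels treated here as directly affecting protein function.
DAMAGING = {"missense", "nonsense", "frameshift", "splice_site"}

def prioritise(variants, max_af=0.001):
    """Keep variants that are rare, in a coding gene, and predicted to alter the protein."""
    return [
        v for v in variants
        if v.population_af <= max_af      # 1. unique or very rare
        and v.in_coding_gene              # 2. located in a protein-coding gene
        and v.consequence in DAMAGING     # 3. directly affects protein function
    ]

variants = [
    Variant("GENE_A", 0.0, True, "nonsense"),
    Variant("GENE_B", 0.12, True, "missense"),       # too common
    Variant("GENE_C", 0.0002, False, "intergenic"),  # not in a coding gene
    Variant("GENE_D", 0.0005, True, "synonymous"),   # no predicted protein change
]
candidates = prioritise(variants)
print([v.gene for v in candidates])
```

In practice the filter order and thresholds would be adapted to the mode of inheritance and pedigree structure mentioned above.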

10
Q

Which disorders does exome sequencing have the potential to aid in the accurate diagnosis of?

A

Mendelian disorders which:

1. Present with atypical manifestations
2. Are difficult to confirm using clinical or laboratory criteria alone, e.g. when symptoms are shared among multiple disorders
3. Require extensive or costly evaluation, e.g. a long list of candidate genes, such as for NSHL, CMT/IPN or ID syndromes

11
Q

What are the ethical considerations with WES?

A

Increased chance of incidental findings, i.e. clinically useful results unrelated to the primary aim of the investigation, e.g. susceptibility to cancer.

12
Q

Give examples of how WES strategies can be used to aid diagnosis in patients with rare disease.

A
  1. Sequencing and filtering across multiple, unrelated, affected individuals. This approach is used to identify novel variants in the same gene(s).
  2. Sequencing and filtering among multiple affected individuals from within a pedigree to identify a gene(s) with a novel variant in a shared region of the genome.
  3. Sequencing parent-child trios to identify de novo mutations.
  4. Sampling and comparing the extremes of the distribution for a quantitative phenotype.
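
The trio strategy in the list above can be sketched as a genotype comparison: a candidate de novo call is a child allele that neither parent carries. This is a simplified illustration with made-up sites and genotypes; real pipelines also weigh read depth and genotype quality, which are omitted here.

```python
def is_de_novo(child, mother, father):
    """Return True if the child carries an allele seen in neither parent.

    Genotypes are tuples of alleles, e.g. ("A", "G").
    """
    parental_alleles = set(mother) | set(father)
    return any(allele not in parental_alleles for allele in child)

# Toy genotype calls keyed by a hypothetical site label.
trio_calls = {
    "chr1:1000": {"child": ("A", "G"), "mother": ("A", "A"), "father": ("A", "A")},
    "chr2:2000": {"child": ("C", "T"), "mother": ("C", "T"), "father": ("C", "C")},
    "chr3:3000": {"child": ("G", "G"), "mother": ("G", "G"), "father": ("G", "G")},
}

de_novo_sites = [
    site for site, g in trio_calls.items()
    if is_de_novo(g["child"], g["mother"], g["father"])
]
print(de_novo_sites)
```

Only the first site qualifies: the child's G allele is absent from both parents, whereas the T at the second site is inherited from the mother.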
13
Q

What are the advantages of targeted NGS vs WGS/WES?

A

1) Slightly cheaper
2) Better coverage, as gaps can be filled with Sanger sequencing
3) The importance of each variant identified can be fully interpreted and reported, whereas variants have to be filtered in WES/WGS as there are so many
4) Less chance of incidental findings

14
Q

What are the general applications of targeted NGS?

A

Hereditary disorders
Therapeutic decision making for some somatic cancers
Infectious disease testing
Tumour profiling
MRD
Amplicon sequencing
Prenatal testing and screening - NIPT/PGD

15
Q

What are the advantages and disadvantages of using targeted NGS?

A

Only genes with a known disease relationship are included.
Inflexibility of targeted panels means redesign is required to incorporate novel targets.

The reduction in the cost of exome capture and sequencing reagents makes a virtual exome approach more viable: sequencing an exome and masking all but the desired data.
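
A minimal sketch of the masking idea behind a virtual exome: the whole exome is sequenced once, and analysis is restricted to a gene list that can be changed later without resequencing. The function name, record layout and gene names are hypothetical.

```python
def apply_virtual_panel(variants, panel_genes):
    """Return only the variants that fall in genes on the virtual panel."""
    panel = set(panel_genes)
    return [v for v in variants if v["gene"] in panel]

# Toy exome-wide variant list (placeholder genes and positions).
exome_variants = [
    {"gene": "GENE_A", "pos": "chr1:1000"},
    {"gene": "GENE_B", "pos": "chr2:2000"},
    {"gene": "GENE_C", "pos": "chr3:3000"},
]

# First-line analysis: everything outside the panel stays masked.
first_pass = apply_virtual_panel(exome_variants, ["GENE_A"])

# Extending the panel later reuses the same data.
second_pass = apply_virtual_panel(exome_variants, ["GENE_A", "GENE_C"])

print([v["gene"] for v in first_pass], [v["gene"] for v in second_pass])
```

Because the masked variants are never reviewed, the same filter also limits exposure to incidental findings.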

16
Q

What are the advantages of a virtual exome?

A
  1. Reducing the identification of incidental findings
  2. Flexibility in analysis
  3. Addition of new genes to analyse at no extra cost.
17
Q

What is the major issue with virtual exome testing?

A

Coverage to a diagnostically suitable level may not be achieved; depth of data is sacrificed for breadth of genes tested in this approach.

Targeted NGS may evolve in future into a disease-associated exome test of 2,000-3,000 genes. This would involve analysing a primary set of genes in the first instance and extending analysis to a broader set of genes if the result is negative.

18
Q

Summarise the potential uses of NGS in the prenatal context.

A

Combining genome sequencing of both parents, genome-wide maternal haplotyping and deep sequencing of maternal plasma DNA has allowed the genome sequence of an 18.5/40 human foetus to be determined.

Single gene disorders (NIPD): Some publications describe use of NGS to sequence the entire cell-free DNA in maternal plasma to provide a prenatal diagnosis of almost any inherited genetic condition (e.g. Fan et al 2012 Nature 487, 320-4).

In the 1st and 2nd trimesters this approach could be used to test for conditions that are not survivable or lead to medical complications. As technologies for pharmaceutical and surgical intervention improve, it may be possible to develop prenatal treatments or even cures for such conditions. Knowledge of fetal genotypes in the 3rd trimester enables diagnosis of conditions that would benefit from treatment immediately after delivery, e.g. phenylketonuria, galactosaemia.

Aneuploidies (NIPT): NGS for the detection of aneuploidies in cffDNA shows the high levels of sensitivity and specificity necessary for clinical application. To detect trisomies, DNA isolated from maternal plasma is sequenced; reads are mapped to the human genome and counted per chromosome. When 5-10 million reads are mapped, trisomies give a significantly higher number of reads mapping to a particular chromosome, relative to others.

This approach eliminates the miscarriage risk associated with invasive testing. Consequently, the number of women facing a decision regarding termination and the total number of terminations for fetal aneuploidy may increase.
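
The read-counting idea for aneuploidy detection described in this card can be sketched as follows. Scoring the chromosome 21 read fraction as a z-score against reference pregnancies is one common way to formalise "significantly higher", and is an assumption here rather than something the card specifies. All counts are invented toy numbers, with the other chromosomes lumped into a single "other" bin.

```python
from statistics import mean, stdev

def chromosome_fraction(counts, chrom):
    """Fraction of all mapped reads that map to the given chromosome."""
    return counts[chrom] / sum(counts.values())

def z_score(test_counts, reference_samples, chrom):
    """Standardise the test sample's read fraction against reference samples."""
    ref_fractions = [chromosome_fraction(c, chrom) for c in reference_samples]
    test_fraction = chromosome_fraction(test_counts, chrom)
    return (test_fraction - mean(ref_fractions)) / stdev(ref_fractions)

# Toy reference set assumed to come from euploid pregnancies.
reference_samples = [
    {"chr21": 1300, "other": 98700},
    {"chr21": 1290, "other": 98710},
    {"chr21": 1310, "other": 98690},
    {"chr21": 1300, "other": 98700},
]

suspected_trisomy = {"chr21": 1360, "other": 98640}
unremarkable = {"chr21": 1305, "other": 98695}

z_high = z_score(suspected_trisomy, reference_samples, "chr21")
z_normal = z_score(unremarkable, reference_samples, "chr21")
print(round(z_high, 2), round(z_normal, 2))
```

With these toy numbers the first sample scores well above a cut-off of 3 (an illustrative threshold) and the second falls well below it.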

19
Q

Briefly describe the use of NGS in PGD.

A

e.g. using the VeriSeq kit from Illumina to determine embryonic ploidy of all chromosomes. Results generated by this approach are comparable to those achieved with the widely used array-based 24sure technology, and the approach requires less DNA than array-based methods.

20
Q

Briefly describe the PAGE study

A

Aims: to discover novel causes of foetal abnormality and to develop methods for speedy feedback of clinically important information to clinicians/patients during pregnancy.

Participants: 2000 women undergoing invasive testing for increased nuchal translucency (>4 mm) or one or more structural abnormalities noted on ultrasound scan after 11/40. The study includes a combination of genetic testing, interviewing the parents to explore their opinions on genetic testing and performing a health economics evaluation to determine the cost of testing.

21
Q

What is ChIP-seq (Chromatin immunoprecipitation followed by sequencing)?

A

A technique for genome-wide profiling of DNA-binding proteins, histone modifications or nucleosomes; used for studying transcriptional regulation and epigenetic mechanisms.

22
Q

Why is ChIP-seq technology useful?

A

Nucleosome positioning and modification of DNA and histones is important in gene regulation and guides development and differentiation.

Chromatin states can influence transcription directly by altering DNA packaging to allow or prevent access to DNA-binding proteins, or they can modify the nucleosome surface to enhance or impede recruitment of effector protein complexes.

The main tool for investigating these mechanisms is chromatin immunoprecipitation (ChIP), which is a technique for assaying protein–DNA binding in vivo. In ChIP, antibodies are used to select specific proteins or nucleosomes, which enriches for DNA fragments that are bound to these proteins or nucleosomes.

23
Q

How are the DNA fragments in ChIP-seq sequenced?

A

In ChIP–seq, the DNA fragments of interest are sequenced directly instead of being hybridized on an array (ChIP–chip). The more precise mapping of protein-binding sites provided by ChIP–seq allows for a more accurate list of targets for transcription factors and enhancers, in addition to better identification of sequence motifs.

24
Q

How is analysis of ChIP-seq sequence performed?

A

For protein–DNA binding, the most common follow-up analysis is discovery of binding sequence motifs. ChIP–seq data is used to annotate the location of the peaks on the genome in relation to known genomic features, such as the transcriptional start site, exon–intron boundaries and the 3’ ends of genes. The transcriptional start sites of active genes, for instance, are known to be enriched with histone H3 trimethylated at lysine 4 (H3K4me3), and enhancers are enriched with histone H3 monomethylated at lysine 4 (H3K4me1).

Many studies have been performed using ChIP-seq and DNaseI hypersensitive site (DHS) mapping of which those of the ENCODE and FANTOM5 projects are most impressive, revealing genome wide profiles and binding sites for a range of DNA binding proteins.
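
The peak-annotation step described in this card (placing peaks relative to transcriptional start sites) can be sketched as a nearest-TSS lookup. This is a toy example on a single chromosome that ignores gene strand; the coordinates and gene names are invented.

```python
def nearest_tss(peak_summit, tss_by_gene):
    """Return (gene, offset) for the TSS closest to the peak summit.

    The offset is summit minus TSS, so it is positive when the peak lies
    at a higher coordinate than the TSS. Strand is ignored here.
    """
    gene = min(tss_by_gene, key=lambda g: abs(tss_by_gene[g] - peak_summit))
    return gene, peak_summit - tss_by_gene[gene]

# Hypothetical TSS coordinates on one chromosome.
tss_by_gene = {"GENE_A": 10_000, "GENE_B": 50_000}

print(nearest_tss(10_250, tss_by_gene))
print(nearest_tss(48_000, tss_by_gene))
```

A peak a few hundred bases from a TSS would be a candidate promoter-associated site, whereas distal peaks are more consistent with enhancers.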

25
Q

What is a transcriptome and what are the key aims of transcriptomics?

A

The transcriptome is the complete set of transcripts in a cell, and their quantity, for a specific developmental stage or physiological condition. The key aims of transcriptomics are to catalogue all species of transcript, including mRNAs, non-coding RNAs and small RNAs, to determine the transcriptional structure of genes i.e. their start sites, 5’ and 3’ ends, splicing patterns and other post-transcriptional modifications; and to quantify the changing expression levels of each transcript during development and under different conditions.

26
Q

What types of methods have been developed to deduce and quantify the transcriptome? What is the limitation of these?

A

Hybridization-based approaches typically involve incubating fluorescently labelled cDNA with custom-made microarrays or commercial high-density oligo microarrays.

Sequence-based approaches include: Sanger sequencing of cDNA or EST libraries, serial analysis of gene expression (SAGE), cap analysis of gene expression (CAGE) and massively parallel signature sequencing (MPSS).

The biggest limitation of the sequence-based approaches is that most are based on expensive Sanger sequencing technology, and a significant portion of the short tags cannot be uniquely mapped to the reference genome.

27
Q

What is RNA-seq technology?

A

RNA-Seq uses recently developed deep-sequencing technologies. In general, a population of RNA (total or fractionated, such as poly(A)+) is converted to a library of cDNA fragments with adaptors attached to one or both ends. Each molecule, with or without amplification, is then sequenced by single-end or paired-end sequencing. Following sequencing, the resulting reads are either aligned to a reference genome or reference transcripts, or assembled de novo without the genomic sequence, to produce a genome-scale transcription map that consists of the transcriptional structure and/or level of expression for each gene.

28
Q

What are the advantages of RNA-seq?

A

RNA-Seq is not limited to detecting transcripts that correspond to existing genomic sequence.

It can reveal the precise location of transcription boundaries to a single-base resolution.

30-bp short reads from RNA-Seq give information about how two exons are connected, whereas longer reads or paired-end short reads should reveal connectivity between multiple exons.

Reveal sequence variations e.g. SNPs in transcribed regions.

Highly accurate for quantifying expression levels as determined using quantitative PCR (qPCR) and spike-in RNA controls of known concentration.

There are no cloning steps, and with the Helicos technology there is no amplification step, therefore RNA-Seq requires less RNA sample.

The single-base resolution of RNA-Seq has the potential to revise many aspects of the existing gene annotation, including gene boundaries and introns for known genes as well as the identification of novel transcribed regions. As RNA-Seq is quantitative, it can be used to determine RNA expression levels more accurately than microarrays.

29
Q

What is the application of RNA-seq?

A

RNA-Seq, with its high resolution and sensitivity has revealed many novel transcribed regions and splicing isoforms of known genes. It has also mapped the 5’ and 3’ boundaries for many genes.

30
Q

What are the challenges of RNA-seq?

A

Library construction. Unlike small RNAs (e.g. miRNAs, Piwi-interacting RNAs (piRNAs), short interfering RNAs (siRNAs) and others), which can be directly sequenced after adaptor ligation, larger RNA molecules must be fragmented into smaller pieces (200–500 bp) to be compatible with most deep-sequencing technologies. The common fragmentation methods may introduce different biases in the outcome.

Another key consideration concerning library construction is whether or not to prepare strand-specific libraries, which have the advantage of yielding information about the orientation of transcripts valuable for transcriptome annotation, especially for regions with overlapping transcription from opposite directions.

31
Q

What is the principle of third generation sequencing methodologies?

A

Single Molecule, Real-Time (SMRT) sequencing from Pacific Biosciences utilises a sequencing-by-synthesis technology based on real-time imaging of fluorescently tagged nucleotides as they are incorporated into nascent DNA molecules from individual DNA templates.

32
Q

What are the advantages of third generation sequencing methodologies?

A

It avoids the PCR-based artefacts like uneven amplification and GC-bias. Monitoring systems can also be tuned to detect epigenetic changes. The average read length can be 30-200 times longer than the read length from an NGS instrument, making it easier to assemble genomes. The long read lengths also have more power to reveal complex SVs present in DNA samples, such as precise localisation of copy number variations relative to the reference sequence.

SMRT sequencing is also capable of identifying RNA base modifications in the same manner as it detects DNA base modifications, using a reverse transcriptase in place of DNA polymerase.

SMRT sequencing has mainly been used on small genomes, but once it becomes feasible for human genomes it is predicted that an entire human genome could be sequenced in ~1 hour.