NGS Flashcards
what is pyrosequencing?
A “sequencing by synthesis” method in which polymerase extends the DNA one dNTP at a time. When a dNTP is added to the open 3’ end of a DNA strand, pyrophosphate is released. A cocktail of enzymes couples this pyrophosphate to light emission by luciferase; the amount of light is proportional to the number of bases incorporated. Used for eg. mitochondrial point mutation analysis in MELAS, MERRF, NARP and Leber’s Hereditary Optic Neuropathy (LHON)
advantages of pyrosequencing?
quick and cheap; detects low-level variants (quantifiable down to ~5% variant level); detects heteroplasmy; can be used to detect methylation status
disadvantages of pyrosequencing?
only a short length of sequence is read; data can be complex to genotype depending on the type of variant analysis required; SNPs can affect primer binding sites
what is NGS library preparation?
fragmenting starting material and ligating adapters and indices to allow sequencing.
what is NGS enrichment?
Enrichment is needed to capture regions of interest for single genes, panels and exomes but not WGS. It may be amplicon (PCR) based or hybridisation based
describe amplicon (PCR-based) enrichment for NGS?
eg. Illumina Nextera XT: transposomes randomly cleave ds-DNA and ligate adapter oligos with different sequences to the 5’ ends. A limited-cycle PCR then adds indexes and full adapter sequences to the fragmented DNA for sequencing. eg. Qiagen: reduces bias by integrating unique molecular indices (UMI). Genomic DNA is fragmented and ligated with UMI and adapter. Target enrichment is performed by targeted PCR with a gene-specific primer and a universal primer to the adapter. Universal PCR then amplifies the library. After sequencing, reads with the same UMI are PCR duplicates and are removed, which helps to identify artefacts and CNVs
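The UMI duplicate-removal step can be sketched in Python. This is a minimal illustration and not any vendor's algorithm: the read fields (`chrom`, `start`, `umi`, `mean_qual`) are hypothetical, and reads are treated as PCR duplicates only when position and UMI match exactly. Real tools usually also tolerate sequencing errors within the UMI.

```python
from collections import defaultdict


def collapse_umi_duplicates(reads):
    """Keep one read per (chrom, start, UMI) group.

    Reads sharing a mapping position and a UMI are assumed to be PCR
    duplicates of one original molecule; the read with the highest mean
    quality is kept as the representative.
    """
    groups = defaultdict(list)
    for read in reads:
        key = (read["chrom"], read["start"], read["umi"])
        groups[key].append(read)
    # One representative per original molecule, in first-seen order.
    return [max(group, key=lambda r: r["mean_qual"]) for group in groups.values()]


# Hypothetical reads: the first two share position and UMI.
example_reads = [
    {"chrom": "chr1", "start": 100, "umi": "ACGT", "mean_qual": 30},
    {"chrom": "chr1", "start": 100, "umi": "ACGT", "mean_qual": 35},
    {"chrom": "chr1", "start": 100, "umi": "TTAG", "mean_qual": 32},
]
unique_reads = collapse_umi_duplicates(example_reads)
```

Counting unique molecules rather than raw reads is what makes depth-based comparisons less sensitive to amplification bias.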
describe hybridisation-based enrichment for NGS?
eg. Agilent SureSelect: fragment DNA, tag with adapters and barcodes and capture libraries with RNA or DNA-based oligos. Oligos anneal to specific regions of the genome. Hybridise sample with biotinylated RNA library baits and select target regions with magnetic streptavidin beads. Amplify and sequence. eg. Agilent HaloPlex: restriction digest, anneal ds-biotinylated oligos and capture with streptavidin-coated magnetic beads. PCR with common primers generates a library of enriched fragments
what are the advantages of amplicon (PCR) based enrichment for NGS?
cheap; low quantity needed; faster; useful for smaller regions; suitable for FFPE
what are the disadvantages of amplicon (PCR) based enrichment for NGS?
NAME?
what are the advantages of hybridisation based enrichment for NGS?
NAME?
what are the disadvantages of hybridisation based enrichment for NGS?
high quantity required; higher cost; longer prep time; difficult to distinguish pseudogenes
what is the main difference between second and 3rd generation NGS platforms?
2nd generation platforms amplify library molecules before sequencing, whereas 3rd generation platforms sequence single molecules without amplification
describe the general 2nd generation NGS process?
sequencing platform uses a series of automatically coordinated, repeating chemical reactions typically carried out in a flow cell or compartment which houses the immobilized templates and necessary reagents. Most platforms (with the exception of SOLiD) use ‘sequencing by synthesis’ - a repeated cyclical process which occurs within the flow cell and consists of nucleotide addition, washing and signal detection.
what are advantages of WGS compared to WES?
- detects SNVs, indels, SVs and CNVs in coding and non-coding regions, ~3.5 million variants (WES omits promoters and enhancers and is limited to coding and splice variants, ~20 000 variants)
- more uniform coverage
- easier to capture low sequence complexity
- PCR amplification not required, reducing GC bias
- not limited by sequencing read length (WES needs smaller target probes)
- no reference bias (WES preferentially enriches reference alleles at het sites, producing false calls)
- WGS captures everything whereas WES is limited to currently targeted genes
- WGS is suitable for complex trait gene identification as well as sporadic phenotypes caused by de novo variants (WES is suitable for highly penetrant Mendelian disease gene identification)
what are advantages of WES compared to WGS?
NAME?
what is targeted NGS?
used for disease-specific targeted tests for hereditary disorders and therapeutic decision making. Uses gene panels which are specific to certain disease types eg. clinical exome or mendeliome or custom-designed panels. Only known genes with an established phenotype are included. Can be used for tumour profiling, MRD (can see emergence of clones and allelic ratios), microbiology (disease outbreak, resistance, screening), NIPT and NIPD
what are the advantages of targeted NGS over WES/WGS?
NAME?
what are the main disadvantages of targeted NGS over WES/WGS?
NAME?
what is a virtual exome and what are the advantages?
sequencing an exome and masking all but the desired data. reduces incidental findings, gives flexible analysis and addition of genes at no extra cost. can analyse primary genes first, then broader analysis if negative
what are the disadvantages of a virtual exome?
coverage depth is sacrificed for breadth
how is NGS used for ct-DNA?
mutations on ctDNA can act as a cancer biomarker to identify cancer patients from a group of healthy individuals. more sensitive than tissue biopsy. eg. SEPT9 methylation has been approved by the FDA as a blood-based screening test for CRC. NGS of ctDNA can also be used for treatment selection, prognosis and MRD monitoring
how is NGS used for HLA typing?
knowledge of an individual's polymorphisms in the HLA region is essential for organ and stem-cell transplantation
What are the 4 NGS methods of detecting CNVs?
1. read-pair (RP) - able to identify almost all types of SVs but unable to detect the exact breakpoints. accuracy of RP methods is largely dependent on the insert size. poor performance for dups
2. split reads - detect the exact breakpoints of SVs >1 kb. poor performance for dups
3. read depth (RD) - more reliable for regions with deletions and duplications and can also count the number of copies, but it is difficult to identify the exact breakpoints with RD. enriched in segmental duplications
4. assembly - poor performance for dups
what is a benefit of using NGS for CNV instead of MLPA/array? what is the disadvantage?
MLPA/array is costly (array) and time-consuming, and only a subset of genes is tested (MLPA). NGS is high resolution, genome wide, provides positional info, detects UPD and LOH, is high throughput and detects balanced and unbalanced rearrangements. However, detection of large rearrangements such as copy-number variants (CNV) from NGS data is still challenging due to issues intrinsic to the technology, including short read lengths and GC-content bias. CNVs need to be confirmed. The challenge is to identify a tool able to detect CNVs from NGS panel data at single-exon resolution with sufficient sensitivity to be used as a screening step in a diagnostic setting
what is a phred score?
a measure of base call quality. >30 is good. Q30 = 1/1000 error rate so 99.9% accuracy; Q20 = 1/100, 99% accuracy; Q10 = 1/10, 90% accuracy
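The values above follow from the standard Phred definition, Q = -10 × log10(P), where P is the probability that a base call is wrong. A minimal Python sketch (the function names are illustrative, not from any library):

```python
import math


def phred_to_error_prob(q):
    """Probability that a base call is wrong for Phred score q."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Phred score for a base-call error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)


for q in (10, 20, 30):
    p = phred_to_error_prob(q)
    print(f"Q{q}: error rate {p:g}, accuracy {100 * (1 - p):g}%")
```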
what is a basic bioinformatic pipeline process?
quality control > alignment (data mapped to reference) > variant calling > annotation
why is it important that reads are aligned correctly?
NAME?
why is cluster density important?
Low cluster density can give very high quality data but causes a lower depth of coverage. Higher cluster density gives a better depth of coverage but can lead to lower quality reads. If cluster density is too high, the clusters become difficult to read and data can be lost.
what is a FASTQ file?
text-based format for storing both a nucleotide sequence and its corresponding quality scores. This is generally the input for most bioinformatic pipelines.
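A FASTQ record is four lines: an `@` header, the sequence, a `+` separator and one quality character per base. The sketch below is a simplified parser that assumes unwrapped four-line records and Phred+33 quality encoding (the usual encoding for current Illumina data); the read name and sequence are made up for illustration.

```python
def parse_fastq(text):
    """Parse four-line FASTQ records into dicts with decoded qualities."""
    lines = text.strip().splitlines()
    if len(lines) % 4 != 0:
        raise ValueError("expected four lines per FASTQ record")
    records = []
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record starting at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError("sequence and quality strings differ in length")
        records.append({
            "id": header[1:],
            "seq": seq,
            # Phred+33: quality score = ASCII code of the character minus 33
            "quals": [ord(c) - 33 for c in qual],
        })
    return records


example = "@read1\nACGT\n+\nII5!\n"
records = parse_fastq(example)
```

Here `I` decodes to Q40, `5` to Q20 and `!` to Q0.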
what is a BED file?
text file format used to store genomic regions as coordinates
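The first three BED columns are chromosome, start and end, with a 0-based start and an end that is not included in the region. A minimal Python sketch of reading one line follows; the coordinates and region name are hypothetical, and only the optional fourth (name) column is handled.

```python
def parse_bed_line(line):
    """Return (chrom, start, end, name) from a tab-separated BED line."""
    fields = line.rstrip("\n").split("\t")
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    name = fields[3] if len(fields) > 3 else None
    return chrom, start, end, name


def region_length(start, end):
    """Length in bases; start is 0-based and end is exclusive."""
    return end - start


chrom, start, end, name = parse_bed_line("chr17\t100\t250\tGENE_exon1\n")
length = region_length(start, end)
```

Because of the half-open convention, the first base of this region is position 101 in 1-based coordinates, as used in HGVS nomenclature and VCF files.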
what is a BAM file?
aligned/mapped reads and associated quality information. A BAM file (Binary Alignment Map) is a binary format for storing sequence data. Once a set of FASTQs has been aligned to a reference genome using an alignment algorithm, the output is a BAM file. These can be used in the analysis process to visualise variants or to check quality/coverage of an area. BAM files can be viewed in IGV
what is a CRAM file?
NAME?
what is a BCL file?
base calls per cycle, a binary file containing base call and quality for each tile in each cycle. The raw file produced by Illumina platforms (other than MiSeqs). These must be converted into FASTQs for bioinformatic analysis.
what things affect NGS alignment quality?
NAME?
give examples of annotation in the bioinformatics pipeline?
gene symbols, the transcript exon numbers, HGVS nomenclature and the variant consequence
what are quality steps used for in bioinformatics?
NAME?
why are paired-end reads better for repetitive regions and structural rearrangements eg. insertions, deletions and inversions?
the distance between each paired read is known and alignment algorithms can use this info to map the reads over repetitive regions more precisely
why is read length important?
if too short they will not accurately align. Long reads are good for structural variation, repetitive STRs and pseudogenes. Longer reads can provide more information about relative locations of specific base pairs. However, long read technology is expensive and is currently not commonplace in the NHS. Oxford Nanopore long-read technology is becoming more affordable but currently has an error rate that would be considered too high in most diagnostic settings.
how can you validate a bioinformatic pipeline?
- assess sensitivity and specificity against Genome in a Bottle
- at least 10 individuals (not Genome in a Bottle alone)
- sensitivity > 0.95
- 3 independent runs for reproducibility
- all validation samples should be downsampled to test the limit of detection eg. 20x, 30x
- specificity > 0.95
- known Sanger-confirmed insertions, deletions and delins should be run through the pipeline to assess complex variants
how would you select genes for a panel?
NAME?
what different types of target enrichment are available for NGS?
amplicon (PCR) based or hybridisation based
how do you include transcripts in design of a new NGS panel?
- Alamut contains transcripts that encompass all required exons for a gene
- NGS validation should include justification for selecting a transcript. if 2 transcripts each have something unique (eg. unique exons), both can be joined together in the BED file
- LRG is a universally accepted reference standard containing a fixed section and an updatable section where biological info can be updated
- list of transcripts is fed into software using a BED file with ROIs. these are tiled with RNA baits
- ROIs are checked on Alamut to ensure they span exon ± 50 bases
- pseudogenes may result in poorer tiling across some regions. During mapping of reads, more than one alignment usually results in both being discarded by the mapping software, so these regions may need to be Sanger-filled
what are the main steps of designing an ngs panel?
1. target enrichment
2. gene selection
3. transcript selection
4. design - ROIs tiled with RNA baits
5. DNA quality checks
6. barcoding samples - allows multiplexing which decreases cost
7. virtual panels? sub-panel analysis eg. HCM within CM panel
8. polymorphism list - gnomAD data can be excluded. should be reviewed and updated
describe validation for an NGS panel?
- required on all aspects of the testing process including method, sequencing and analysis
- need to understand technical weaknesses eg. homopolymer tract errors
- need to assess reproducibility and robustness eg. horizontal coverage, 3 independent runs for validation samples. run-to-run comparisons help to determine the level of multiplexing for adequate coverage. include positive controls. quality scores per base or read depth should be monitored
- sensitivity: <5% error rate at 95% confidence, which requires 60 unique variants compared with the new method in an independent blinded analysis
- validation should be documented in a laboratory-controlled document system
- UKGTN requires that new panels and addition of genes to existing panels should be validated using a ‘known normal control’ from the 1000 genomes project
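The figure of 60 variants can be checked with a short calculation. The sketch below assumes every variant tested is concordant (zero errors observed) and that variants are independent: if the true error rate were e, the chance of seeing no errors in n variants is (1 - e)^n, so the upper 95% confidence bound on e is 1 - 0.05^(1/n). The function names are illustrative only.

```python
import math


def upper_error_bound(n, confidence=0.95):
    """Upper confidence bound on the error rate after n variants with zero errors."""
    return 1 - (1 - confidence) ** (1 / n)


def min_variants(max_error, confidence=0.95):
    """Smallest n for which zero errors in n variants bounds the error rate below max_error."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - max_error))


n_required = min_variants(0.05)      # exact binomial calculation
bound_at_60 = upper_error_bound(60)  # bound achieved with 60 concordant variants
```

Under these assumptions the exact calculation gives 59, and the "rule of three" approximation (3 / 0.05) gives the 60 quoted on the card. A single discordant result would raise the number of variants needed.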
describe IQC and EQA for panel validation?
NAME?
do NGS variants require confirmation according to BPG?
NAME?
what should be included in an NGS report according to BPG?
- ACGS standards
- sequence data and clinical info
- HGVS reporting
- diagnostic yield for negative reports
- panel, reference sequence, OMIM#, splice & promoter ROI, method used including library prep, analyser and bioinformatics pipelines and software, coverage, VUS with clinical relevance, ?secondary findings according to local policy, dosage
how can NGS costs be reduced?
NAME?
what are challenges of NGS for counselling?
- clinical utility?
- VUS, incidental findings, variable penetrance, lack of literature
- lack of data sharing (however CVA is a good resource)
- ethical issues for relatives
- resources for resequencing, VUS follow-up, counselling and medical follow-up
- detection rates need to be weighed against risks of VUS (50-100 het variants per patient)
- more errors
- previously pathogenic variants will need to be downgraded as true variants are found
- risk estimates difficult for polygenic diseases
- negative reports - but still useful to rule out a diagnosis
what should pre-test genetic counselling involve?
NAME?
what should post-test genetic counselling involve?
NAME?
what is 3rd generation sequencing? how does it compare to 2nd generation?
sequencing of single DNA molecules without PCR amplification. 3rd gen generates reads of over 10 000 bp and is better at detecting structural variants
what are the advantages of 3rd generation sequencing?
- small amount of starting material
- higher throughput - hundreds to millions of reactions carried out
- lower cost per base
- longer read lengths >10 000 bp giving better mapping, phasing, CNV detection, insertions, dels and translocations, novel alternate splicing isoforms, chimeric transcripts
- de novo assembly (without ref sequence)
- better for repetitive sequence
- better for pseudogenes
- more uniform coverage and less sensitive to GC-content
- potential to detect epigenetic modifications such as methylation
what are the 3 types of 3rd generation technologies?
NAME?
describe 3rd generation sequencing by synthesis and give examples? eg. Pacific Biosciences SMRT
- reads the original single DNA molecule directly rather than amplified copies
- eg. Single molecule real time (SMRT) sequencing: single molecule template per well; polymerase incorporates fluorescent NTs, which are visualized with a laser and camera
- ADVANTAGES: fast; template sequenced multiple times; can detect methylated bases
- DISADVANTAGES: expensive and limited throughput
- eg. FRET (fluorescence resonance energy transfer) sequencing (Life Technologies)
describe 3rd generation nanopore sequencing?
- a single DNA molecule is threaded through a nanopore (biological or synthetic) and individual bases are detected as they pass through the nanopore
- reads up to 200 kb
- each base alters the current to a different degree
what is Synthetic Long Read 3rd gen sequencing? eg. Illumina
NAME?
what is 3rd generation mapping?
determines the large-scale sequence structure of DNA without sequencing every base. eg. BioNano optical mapping system, which uses fluorescently tagged probes attached at “nicked” restriction digest sites to fingerprint long DNA molecules. maps can be compared to a sequence assembly to construct scaffolds of how the sequences should be ordered and oriented along the chromosome, or compared to a reference genome to reveal structural changes, e.g. rearrangement/fusion of two chromosomes
give examples of referrals for which a karyotype may be needed?
NAME?
why might a balanced translocation carrier have a phenotype? how can you investigate this?
1. submicroscopic imbalance - can be investigated with FISH/array-CGH/optical genome mapping (OGM)
   eg. Miller-Dieker Syndrome (MDS) - 17p13.3 deletion - type 1 lissencephaly with facial dysmorphism. Patients with isolated lissencephaly had smaller deletions. LIS1 gene identified, which is deleted in the disease
2. gene disruption such as inversion
   eg. GOF - splicing exons together creating a novel chimeric gene such as BCR-ABL1, translated into a tyrosine kinase
   eg. LOF - coding sequence disrupted in a haploinsufficient gene such as DMD in X;autosome translocation
   constitutional translocations may give cancer risk if a TSG is disabled or an oncogene is separated from its controlling region eg. RUNX1 disruption can give rise to Familial Platelet Disorder with predisposition to AML and myelodysplastic syndrome. often a second RUNX1 hit leads to leukaemia progression
   identified by sequencing, FISH, RNA sequencing or aCGH
3. gene separated from cis regulatory elements such as promoter
how can autozygosity mapping identify a disease gene?
NAME?
what are potential issues with autozygosity mapping?
homozygous regions unrelated to the disease locus, and inflated LOD scores due to underestimating the extent of inbreeding
what NGS methods can be used to identify disease genes?
- targeted panels for clinically defined heterogeneous disease. requires known candidate genes, 100% coverage
- WES - 2% of genome. useful for genetically diverse cases or multiple inheritance patterns. less biased approach than targeted NGS, cheaper than WGS and quicker to analyse. however non-coding regions are not covered, rarely 100% coverage due to poor enrichment & mapping issues, poor coverage of repetitive and GC-rich regions, not as good at detecting structural variation
- WGS - unbiased, includes non-coding regions, less bias from GC-rich and repetitive regions, detects balanced chromosomal rearrangements and mosaic variants. HOWEVER it is costly, has limited coverage of STRs, and raises storage, security and data sharing issues
what should a pipeline take into account for filtering NGS variants?
NAME?
what are limitations of NGS for gene discovery?
NAME?
what future possibilities are there of using NGS for gene discovery?
- RNA sequencing - validate results
- NGS-based methylation profiling
- ChIP-seq - analyse protein interactions with DNA
- GTEx to look at gene expression in relevant tissues
- more understanding of regulatory non-coding RNA
- improved data sharing
- improved complex disease understanding eg. later onset and reduced penetrance
what is the calculation for posterior probability?
for a negative test result: a/(a+b) where a = prior probability (carrier) x conditional probability of the mutation not being detected by the test in a carrier, and b = prior probability (not a carrier) x conditional probability of a negative result in a non-carrier (approximately 1)
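A worked Python sketch of a/(a+b) for someone who has tested negative. The numbers are illustrative only (prior carrier risk 1 in 25, test detecting 90% of mutations), and non-carriers are assumed to always test negative unless a different specificity is passed in.

```python
def posterior_carrier_after_negative(prior, sensitivity, specificity=1.0):
    """Posterior carrier probability after a negative test result.

    a = P(carrier) * P(negative | carrier)
    b = P(not carrier) * P(negative | not carrier)
    """
    a = prior * (1 - sensitivity)
    b = (1 - prior) * specificity
    return a / (a + b)


# Illustrative values: prior 1/25, test detects 90% of mutations.
posterior = posterior_carrier_after_negative(prior=1 / 25, sensitivity=0.90)
print(f"residual carrier risk: about 1 in {round(1 / posterior)}")
```

With these values a = 0.04 × 0.1 = 0.004 and b = 0.96 × 1 = 0.96, so the posterior is 0.004/0.964, roughly 1 in 241.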
what is the confidence interval?
gives an indication of how uncertain we are about a measurement with regard to the true population value, usually 95%. if we were to repeat an experiment 100 times and calculate the 95% confidence interval each time, then about 95 of the intervals would be expected to contain the true population value.
what does it mean if the 95% confidence interval doesn’t span 1 for an odds ratio?
there is statistically significant association between exposure and outcome
how do you calculate the odds ratio?
2x2 table:
              outcome +   outcome -
exposed +         a           b
exposed -         c           d
where: a = number of exposed cases; b = number of exposed non-cases; c = number of unexposed cases; d = number of unexposed non-cases
Odds ratio (OR) = (a/c)/(b/d), which can be rewritten as ad/bc
OR > 1 suggests that exposure is positively associated with the adverse outcome
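A short Python sketch of the odds ratio together with an approximate 95% confidence interval. The interval uses the standard log (Woolf) method, SE(ln OR) = sqrt(1/a + 1/b + 1/c + 1/d), which is an addition here rather than something stated on the cards. The counts are made up and all four cells must be non-zero.

```python
import math


def odds_ratio(a, b, c, d):
    """OR = ad / bc for a 2x2 table of exposure against outcome."""
    return (a * d) / (b * c)


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Approximate confidence interval on the log scale (z = 1.96 for 95%)."""
    log_or = math.log(odds_ratio(a, b, c, d))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical counts: 20 exposed cases, 80 exposed non-cases,
# 10 unexposed cases, 90 unexposed non-cases.
or_value = odds_ratio(20, 80, 10, 90)
ci_low, ci_high = odds_ratio_ci(20, 80, 10, 90)
```

In this example the OR is 2.25 but the interval runs from just under 1 to about 5, so it spans 1 and the association would not be called statistically significant at the 5% level.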
define test sensitivity? how do you calculate it
the ability of a test to correctly identify individuals who are affected by a disease (the true positive rate). true positives / (true positives + false negatives)
define test specificity? how do you calculate it
the ability of a test to correctly identify individuals who are not affected by a disease (the true negative rate). true negatives / (true negatives + false positives)
how do you calculate positive predictive value? (PPV)
true positives / (true positives + false positives)
how do you calculate negative predictive value? (NPV)
true negatives / (true negatives + false negatives)
how does disorder prevalence affect PPV and NPV of a test?
higher prevalence means higher PPV and lower NPV
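The effect of prevalence can be seen by holding sensitivity and specificity fixed and applying Bayes' theorem. A minimal Python sketch; the test characteristics (99% sensitivity, 95% specificity) and the two prevalences are arbitrary examples.

```python
def ppv(sensitivity, specificity, prevalence):
    """P(affected | positive test)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


def npv(sensitivity, specificity, prevalence):
    """P(unaffected | negative test)."""
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


for prevalence in (0.01, 0.10):
    print(
        f"prevalence {prevalence:.0%}: "
        f"PPV {ppv(0.99, 0.95, prevalence):.3f}, "
        f"NPV {npv(0.99, 0.95, prevalence):.5f}"
    )
```

For a rare disorder most positive results are false positives even with a good test, which is why PPV falls sharply as prevalence drops while NPV stays close to 1.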
what might a dosage quotient outside of defined range indicate? how can this be checked?
NAME?
what is a polygenic score?
sum of the number of trait-associated alleles in an individual, weighted by per-allele effect sizes from a discovery GWAS. quantifies an individual’s genetic predisposition to a trait
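The weighted sum can be written out directly. In this Python sketch the SNP identifiers, effect sizes and genotype dosages are all made up; a real score would use effect sizes from a published GWAS and typically many more variants.

```python
def polygenic_score(effect_sizes, dosages):
    """Sum of (per-allele effect size x number of effect alleles carried).

    effect_sizes: dict of SNP id -> effect size (eg. log odds ratio)
    dosages: dict of SNP id -> 0, 1 or 2 copies of the effect allele
    SNPs missing from the genotype data are skipped.
    """
    return sum(
        beta * dosages[snp]
        for snp, beta in effect_sizes.items()
        if snp in dosages
    )


# Hypothetical effect sizes and one individual's genotypes.
weights = {"snp1": 0.20, "snp2": -0.10, "snp3": 0.05}
genotype = {"snp1": 2, "snp2": 1, "snp3": 0}
score = polygenic_score(weights, genotype)
```

The raw score only becomes interpretable when compared with the distribution of scores in a reference population.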
give an example of a risk prediction model
BOADICEA (Breast and Ovarian Analysis of Disease Incidence and Carrier Estimation Algorithm), which takes into account: FH; lifestyle; rare pathogenic variants; polygenic risk score; mammographic density
what are limitations of polygenic risk scores?
NAME?
how does CRISPR-Cas9 work?
REF!
what are the three main delivery strategies that could be used for clinical genome-editing applications?
NAME?
what are the limitations of CRISPR-Cas9 gene editing?
- accuracy - the ratio of on- versus off-target genetic changes
- precision - the fraction of on-target edits that produce the desired genetic outcome
- has the potential to create rearrangements that lead to cancer
- an immune response to bacterially derived editing proteins
- pre-existing antibodies against CRISPR components may cause inflammation
- unknown long-term safety and stability of genome-editing outcomes
what are possible ethical controversies of germline gene-editing?
NAME?