Session 2 Flashcards
How do you calculate odds ratio?
Why is it useful?
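OR = (number affected with variant/number unaffected with variant)/(number affected without variant/number unaffected without variant)
It measures the strength of association between a variant and a disease, e.g. in case-control studies. OR >1 suggests the variant is associated with increased odds of disease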
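As a worked sketch of the formula above, the function below computes the odds ratio from the four counts of a 2x2 table. The function name and the example counts are hypothetical and chosen only for illustration.

```python
def odds_ratio(affected_with, unaffected_with, affected_without, unaffected_without):
    """Odds ratio from the four cell counts of a 2x2 table."""
    odds_with = affected_with / unaffected_with            # odds of disease in carriers
    odds_without = affected_without / unaffected_without   # odds of disease in non-carriers
    return odds_with / odds_without


# Hypothetical counts: carriers 30 affected / 10 unaffected,
# non-carriers 20 affected / 40 unaffected.
example_or = odds_ratio(30, 10, 20, 40)
print(example_or)  # (30/10) / (20/40) = 6.0
```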
What is sensitivity?
How do you calculate?
Sensitivity is the ability of a test to correctly identify individuals who are affected by a disease (the true positive rate)
True positive/(true positive+false negative)
What is specificity?
How do you calculate?
Specificity is the ability of a test to correctly identify individuals who are not affected by a disease (the true negative rate)
True negative/(true negative+false positive)
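A minimal sketch of the sensitivity and specificity formulas from the two cards above. The counts are hypothetical, picked only to give round numbers.

```python
def sensitivity(true_pos, false_neg):
    """True positive rate: affected individuals correctly identified."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """True negative rate: unaffected individuals correctly identified."""
    return true_neg / (true_neg + false_pos)


# Hypothetical test results: 90 TP, 10 FN, 80 TN, 20 FP.
print(sensitivity(90, 10))  # 90 / 100
print(specificity(80, 20))  # 80 / 100
```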
What is PPV?
How do you calculate?
Positive predictive value (PPV)= The proportion of positive tests that are true positives
True positive/(true positive+false positive)
What is NPV?
How do you calculate?
Negative predictive value (NPV) = The proportion of negative tests that are true negatives
True negative/(true negative+false negative)
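The PPV and NPV formulas above can be sketched the same way. The counts are hypothetical; in practice both values depend on how common the disease is in the tested population.

```python
def ppv(true_pos, false_pos):
    """Proportion of positive tests that are true positives."""
    return true_pos / (true_pos + false_pos)


def npv(true_neg, false_neg):
    """Proportion of negative tests that are true negatives."""
    return true_neg / (true_neg + false_neg)


# Hypothetical test results: 45 TP, 5 FP, 95 TN, 5 FN.
print(ppv(45, 5))  # 45 / 50
print(npv(95, 5))  # 95 / 100
```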
What is MLPA?
Outline the principle
Multiplex Ligation-dependent Probe Amplification
DNA is hybridised to pairs of probes. Each probe carries a universal primer sequence for fragment amplification, and one of the pair also carries a stuffer sequence so that each target gives a fragment of a different length. The two probes bind directly beside each other on the target and a ligase joins them. Rounds of PCR with the universal primers then amplify the ligated probes, giving fragments of different sizes corresponding to the regions of interest. Peak heights relative to controls and reference probes are used to detect CNV
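A rough sketch of the relative peak height comparison, assuming a simplified normalisation: each test probe's peak height is divided by the summed reference probe heights in the same sample, then compared with the same ratio in a normal control. The probe names, peak heights and the 0.7/1.3 cut-offs are illustrative assumptions, not values from these notes, and real analysis software normalises in more elaborate ways.

```python
def dosage_quotients(sample, control, reference_probes):
    """Return {probe: dosage quotient} for every non-reference probe."""
    sample_ref = sum(sample[p] for p in reference_probes)
    control_ref = sum(control[p] for p in reference_probes)
    quotients = {}
    for probe, height in sample.items():
        if probe in reference_probes:
            continue
        sample_ratio = height / sample_ref            # intra-sample normalisation
        control_ratio = control[probe] / control_ref  # same ratio in the control
        quotients[probe] = sample_ratio / control_ratio
    return quotients


def classify(dq, loss_cutoff=0.7, gain_cutoff=1.3):
    """Assumed cut-offs, for illustration only."""
    if dq < loss_cutoff:
        return "deletion"
    if dq > gain_cutoff:
        return "duplication"
    return "normal"


# Hypothetical peak heights (arbitrary units).
sample = {"ref1": 1000, "ref2": 1000, "exon1": 500, "exon2": 1000, "exon3": 1500}
control = {"ref1": 1000, "ref2": 1000, "exon1": 1000, "exon2": 1000, "exon3": 1000}

for probe, dq in dosage_quotients(sample, control, {"ref1", "ref2"}).items():
    print(probe, round(dq, 2), classify(dq))
```

With these made-up numbers exon1 gives a quotient of 0.5 (heterozygous deletion pattern), exon2 gives 1.0 and exon3 gives 1.5 (duplication pattern).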
What is MS-MLPA?
Outline the principle
Methylation-specific Multiplex Ligation-dependent Probe Amplification
Similar to MLPA, but the reaction is split into two tubes. One tube undergoes standard MLPA to detect CNV. The other is treated with a methylation-sensitive restriction endonuclease at the ligation step: unmethylated DNA is cut, which stops amplification of that fragment. The dosage of methylated sequence can then be calculated, e.g. to detect UPD
What are the common file types from bioinformatics pipeline?
BCL - raw file produced by Illumina sequencer. Has base call per cycle for each tile on the flow cell
FASTQ - text based format of nucleotide sequence and quality
BAM - FASTQ files aligned to reference genome
CRAM - compressed BAM
VCF - Variant Call Format; lists the variants called from the BAM
Annotated VCF - VCF with extra useful annotations
What is most commonly used to check quality of FASTQ?
FastQC
What are some things measured by FastQC?
Per Base Sequence Quality score
Per Sequence Quality Scores
Per Base Sequence Content
Per Base GC Content
Per Sequence GC Content
Sequence Length Distribution
Overrepresented Sequences
What are the basic steps of an NGS pipeline?
Demultiplex
Alignment
Variant calling
Annotation
Name an alignment tool
BWA
What type of variants can be detected by SR-NGS?
SNVs
Indels
CNV (with caller)
Structural (if coverage of breakpoints)
Name a tool for variant calling
GATK (better for SNV)
What is main problem with Roche sequencing?
Variance of signal intensity for a homopolymer length is large, resulting in high error rates in insertion and deletion (indel) calls
What is Phred Score?
What is considered high quality?
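Phred-scaled score for the probability that a base has been called incorrectly: Q = -10 x log10(probability of error)
Phred >30 (less than a 1 in 1,000 chance that the base call is wrong, i.e. >99.9% accuracy)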
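A short sketch of the Phred relationship, Q = -10 x log10(P error), plus decoding of a FASTQ quality string. The ASCII offset of 33 (Phred+33) is an assumption that matches current Illumina/Sanger-style FASTQ files; the quality string is made up.

```python
import math


def phred_to_error_prob(q):
    """Probability that the base call is wrong for Phred score q."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Phred score for an error probability p."""
    return -10 * math.log10(p)


def decode_quality_string(qual, offset=33):
    """Convert a FASTQ quality string to a list of Phred scores (assumes Phred+33)."""
    return [ord(ch) - offset for ch in qual]


print(phred_to_error_prob(30))        # 1 in 1,000
print(error_prob_to_phred(0.01))      # Q20
print(decode_quality_string("II5!"))  # [40, 40, 20, 0]
```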
Why are paired reads better than single end?
Paired reads identify the relative positions of the two reads, making it easier to resolve structural rearrangements such as insertions, deletions or inversions
Improve the assembly of repetitive regions
What should be involved in a Bioinformatic pipeline validation?
Assess the pipeline’s output against a truth set, e.g. Genome in a Bottle
Sensitivity should be calculated from at least 10 individuals and be >95%
Data must be collected over 3 independent runs for reproducibility
Confirm ability to detect known variants (all types needed for the testing)
What is library preparation?
Process of fragmenting DNA and adding the adapters and indexes needed for sequencing.
For exome/targeted panels an enrichment step is also required to capture regions of interest (not needed for WGS)
What are the two main types of enrichment method?
Amplicon
Hybridisation
How does Amplicon enrichment work?
What are benefits and drawbacks?
PCR amplification of regions of interest while adding adapters and indexes
Cheaper and faster but preferential amplification leads to non-uniform coverage and bias, can introduce artefacts and cannot be used for CNV analysis
How does Hybridisation enrichment work?
What are benefits and drawbacks?
Fragmentation and adapter/index ligation happens first. Then oligo probes designed to target regions of interest are bound. Beads are used to pull out bound fragments.
Achieves much more uniform coverage and true representation with different fragments. CNV calling is possible. BUT needs more DNA, costs more and has longer prep time
How does Illumina sequencing work?
Sequencing by synthesis
Flow cell is covered in a lawn of oligos complementary to the adapters at either end of the fragments. Extension of the bound fragment on the lawn and rounds of bridge amplification lead to cluster formation.
Reverse strands are cleaved to leave only forward strands for read 1. Sequencing primers bind and, in each cycle, a fluorescently labelled nucleotide is added, with the fluorescence showing which nucleotide was incorporated. The indexes are then read using index primers by the same method.
Reverse strand is remade and forward removed. Process is repeated for read 2.
How does ion torrent sequencing work?
What are the disadvantages?
Template DNA is bound to beads and enriched, with each bead having its own well. Measures the change in pH caused by release of hydrogen ions during incorporation of a nucleotide.
Relatively poor performance at homopolymer regions. Higher rate of sequencing errors.
How does Roche 454 sequencing work?
What are the disadvantages?
The four nucleotides are added cyclically. When DNA polymerase incorporates one, pyrophosphate is released, which results in a chemiluminescent light signal
High reagent cost.
High error rates in homopolymer regions. Low capacity.
What are the advantages of WGS?
Allows examination of SNVs, indels, SV and CNVs in coding and non-coding regions of the genome
Detection of structural variants
WGS has more reliable sequence coverage
Coverage uniformity
WGS doesn’t suffer from the reference bias introduced by capture probes
Can go back to re-analyse later
What is ChIP-Seq?
Chromatin immunoprecipitation followed by sequencing
Used for studying transcriptional regulation and epigenetic mechanisms
What are the advantages of RNA-Seq?
Look at single base changes, splice changes, gene boundaries and expression levels.
What is western blotting?
Used to detect presence/absence of a protein, compare protein levels, assess purity or estimate relative molecular mass.
What is qPCR?
Quantitative PCR/ real time PCR
Amplification of DNA is monitored in real time, with simultaneous amplification, detection and continuous quantification of DNA templates during each PCR cycle using fluorescence. During the exponential phase of the reaction, the amount of product is directly proportional to the amount of template.
What is RT-qPCR?
Reverse transcription qPCR, using cDNA made from RNA samples
What are two types of detected chemistry for qPCR?
Non-specific fluorescent dyes that bind any dsDNA (e.g. SYBR Green) - small molecules that show very little fluorescence when free in solution, but whose fluorescence increases when bound to the minor groove of the accumulating dsDNA PCR product
Sequence-specific DNA probes consisting of oligonucleotides labelled with a fluorescent reporter and a quencher - e.g. TaqMan. Probe binds the region of interest and is cleaved by the 5’ exonuclease activity of the DNA polymerase during PCR, which separates reporter from quencher and releases fluorescence.
When can quantification occur in qPCR?
Only during the exponential phase. Above the cycle threshold (number of cycles at which fluorescence rises above the detectable threshold/background) but before the reagents start to become less available and the amount of amplified DNA starts to affect primer binding
What two types of quantification are used in qPCR?
Absolute quantification (standard curve analysis) - Ct values of standards of known concentration are plotted against the log of their concentration across a dilution series, and the test sample Ct is read off the curve
Relative quantification - determine fold-differences in expression levels of the target gene against housekeeper. Removes possible dilution error but requires PCR efficiency for both target and housekeeper to be the same
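Both approaches above can be sketched in a few lines. The standard curve is a plain least-squares fit of Ct against log10(concentration). The relative calculation shown is the 2^-ΔΔCt form, which is one common way to express fold-difference and assumes roughly 100% efficiency (product doubling each cycle) for both target and housekeeper. All Ct values and concentrations are made up for illustration.

```python
import math


def fit_standard_curve(concentrations, cts):
    """Least-squares fit of Ct = slope * log10(concentration) + intercept."""
    xs = [math.log10(c) for c in concentrations]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def quantity_from_ct(ct, slope, intercept):
    """Read an unknown sample's starting quantity off the standard curve."""
    return 10 ** ((ct - intercept) / slope)


def fold_change_ddct(ct_target_test, ct_ref_test, ct_target_control, ct_ref_control):
    """Relative expression by 2^-ddCt (assumes equal, ~100% PCR efficiencies)."""
    delta_test = ct_target_test - ct_ref_test            # normalise to housekeeper
    delta_control = ct_target_control - ct_ref_control
    return 2 ** -(delta_test - delta_control)


# Hypothetical 10-fold dilution series (copies per reaction) and Ct values.
slope, intercept = fit_standard_curve([1e5, 1e4, 1e3, 1e2], [20.0, 23.32, 26.64, 29.96])
print(round(slope, 2), round(intercept, 2))
print(round(quantity_from_ct(25.0, slope, intercept)))

# Hypothetical Cts: test sample (target 24, housekeeper 18), control (target 26, housekeeper 18).
print(fold_change_ddct(24, 18, 26, 18))  # ddCt = -2, so 4-fold higher in the test sample
```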
What are some applications for qPCR?
MRD (minimal residual disease) monitoring in haemato-oncology
Single base mutation detection
SNP Genotyping
Genomic Copy Number Measurement
What are some difficulties of using RNA?
- Short half life
- Specialist extraction kits and reagents required
- Ultra-clean laboratory areas required
- Limited expression patterns may mean that the required tissue is not available for analysis
What methods can be used for DNA sizing?
Agarose gel electrophoresis
Polyacrylamide gel electrophoresis (PAGE)
Pulsed-field gel electrophoresis
Capillary electrophoresis
Nanowire structures
Agilent Bioanalyzer
Southern blotting
What is triplet primed PCR?
Method for triplet/quad expansion sizing when the expansion is too large for conventional PCR.
Uses three primers (P1, P3, P4). P1 binds upstream of the repeat. P4 binds within the repeat at different places to make fragments of different sizes. P3 matches the 5’ tail of P4 and is used to amplify those fragments.
What is chimeric PCR?
Used for HD
Forward primer is before the expansion. Reverse primer is “chimeric”: the 5’ end is complementary to the sequence after the repeat and the 3’ end is complementary to 5 CAG repeats. Will bind at the end of the repeat and amplify the fragment accordingly. Can detect expansions up to 101 (+/-1)
What is inverse PCR?
Give an example use
PCR used to detect inversions. Uses primers which usually face away from each other but in an inversion will face each other and work in PCR
E.g. Haemophilia A - factor VIII intron 22 inversion
When is southern blotting useful?
Outline the method
Detection of large fragments not amplifiable by PCR and investigating methylation status
Input DNA (large amount)
Restriction digestion to isolate fragment of interest
Gel electrophoresis to separate and then denature to make single stranded
Transfer DNA to a membrane (absorbent material draws buffer up through the gel and membrane, carrying the DNA with it onto the positively charged membrane)
Fix DNA to membrane (e.g. baking or UV)
Add probe to label fragments
Wash unbound probe
Visualise fragments
What is a SNP?
Single nucleotide polymorphism - a single-base DNA sequence change occurring commonly in the population
What SNP pattern is seen for normal diploid?
3 bands - 1 homozygous B (at 1 B allele frequency ), 1 het A/B (at 0.5 B allele frequency) and 1 homozygous A (at 0 B allele frequency )
What SNP pattern is seen for a duplication?
Only het SNPs will be affected so that band splits making 4 bands.
Het bands now sit at 0.666 (if B allele duplicated) and 0.333 (if A allele duplicated) B allele frequency
What SNP pattern is seen for a deletion?
Loss of an allele means SNPs will be hemizygous for either A or B, so the middle band at 0.5 B allele frequency is lost
What SNP pattern is seen for a mosaic duplication?
Increasing mosaicism separates the heterozygous track further, until it makes the 4 bands of a non-mosaic duplication
What SNP pattern is seen for a mosaic deletion?
Increasing mosaicism separates the heterozygous track further, until it makes the 2 bands of a non-mosaic deletion
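The B allele frequencies in the duplication, deletion and mosaic cards above can be worked through with a small sketch. It assumes an idealised array where signal is exactly proportional to allele copy number, and that a given fraction of cells carries the change while the rest are normal A/B. Function names are hypothetical.

```python
def het_baf_mosaic_duplication(fraction):
    """Expected BAF of the two heterozygous tracks for a duplication in `fraction` of cells.

    Returns (BAF if the A allele is duplicated, BAF if the B allele is duplicated).
    """
    total_copies = 2 + fraction              # average copies per cell
    a_duplicated = 1 / total_copies          # one B copy among AAB/AB cells
    b_duplicated = (1 + fraction) / total_copies
    return a_duplicated, b_duplicated


def het_baf_mosaic_deletion(fraction):
    """Expected BAF of the two heterozygous tracks for a deletion in `fraction` of cells.

    Returns (BAF if the B allele is lost, BAF if the A allele is lost).
    """
    total_copies = 2 - fraction
    b_lost = (1 - fraction) / total_copies
    a_lost = 1 / total_copies
    return b_lost, a_lost


for f in (0.0, 0.5, 1.0):
    dup = [round(x, 3) for x in het_baf_mosaic_duplication(f)]
    dele = [round(x, 3) for x in het_baf_mosaic_deletion(f)]
    print(f, dup, dele)
```

At a fraction of 0 both tracks sit at 0.5. At a fraction of 1 the duplication tracks reach 1/3 and 2/3, and the deletion tracks merge with the homozygous bands at 0 and 1.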
What does maternal cell contamination look like on SNP array?
Muddied allele frequencies but normal copy number by LogR
What SNP pattern is seen for a nullisomy?
Very low (strongly negative) LogR ratio, as there is almost no signal, and SNP assignment not in tracks but random and not coloured
What types of UPD can SNP array detect?
Isodisomy - in full or partial with some heterodisomy
What can Copy number neutral LOH indicate?
What can be calculated from this?
Consanguinity
Identity by descent
What is third generation sequencing?
Single molecule sequencing
What are the three categories of third generation sequencing?
Sequencing by Synthesis
Nanopore
Synthetic long read
Outline an example LR sequencing by synthesis method
Single molecule real time (SMRT) sequencing (developed by Pacific Biosciences)
- Template fragments are processed into circular DNA molecule.
- 1 DNA polymerase on bottom of each zero-mode waveguide (ZMW)
- Fluorescent dNTPs are added at high concentration; they diffuse into the hole, then back up and out within microseconds. Unincorporated nucleotides are only in the detection volume of the ZMW for microseconds, resulting in a 100-fold reduction of background noise.
- When incorporated, the dNTP is held in the detection volume for a longer time
- Fluorescence detected and fluorophore attached to phosphate is cleaved
- Cycle is repeated
What are benefits and disadvantages of PacBio LR?
- very fast
- circular template allows the same molecule to be sequenced multiple times to build a consensus with fewer errors
- Methylation status
- Limited number of ZMW per cell
- Cost
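The consensus benefit listed above can be illustrated with a deliberately simplified sketch: a per-position majority vote across repeated passes of the same molecule. It assumes the passes are already aligned and of equal length, which real circular consensus software does not assume, and the read sequences are invented.

```python
from collections import Counter


def majority_consensus(passes):
    """Per-position majority vote across equal-length, pre-aligned passes."""
    if not passes:
        raise ValueError("need at least one pass")
    length = len(passes[0])
    if any(len(p) != length for p in passes):
        raise ValueError("this sketch assumes passes of equal length")
    consensus = []
    for column in zip(*passes):  # one column per template position
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)


# Five hypothetical passes, three of them carrying one random error each.
passes = ["ACGTTA", "ACGATA", "ACGTTA", "TCGTTA", "ACGTTG"]
print(majority_consensus(passes))  # ACGTTA
```

Because the errors fall at different positions in different passes, each is outvoted by the other four passes.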