Session 2 Flashcards
How do you calculate odds ratio?
Why is it useful?
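OR = (number affected with variant/number unaffected with variant)/(number affected without variant/number unaffected without variant)
It measures the strength of association between a variant and a disease, typically in case-control studies: OR > 1 indicates higher odds of disease in carriers, OR = 1 no association and OR < 1 lower odds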
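Worked example: the odds ratio formula above can be sketched in a few lines of Python. The function name and the counts are hypothetical, chosen only to illustrate the arithmetic.

```python
def odds_ratio(affected_with, unaffected_with, affected_without, unaffected_without):
    """Odds ratio from a 2x2 table of variant status against disease status."""
    odds_with = affected_with / unaffected_with            # odds of disease in carriers
    odds_without = affected_without / unaffected_without   # odds of disease in non-carriers
    return odds_with / odds_without


# Hypothetical counts: 30 affected and 10 unaffected carriers,
# 20 affected and 40 unaffected non-carriers.
print(odds_ratio(30, 10, 20, 40))  # 6.0
```

An OR of 6 here would mean carriers have six times the odds of being affected compared with non-carriers.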
What is sensitivity?
How do you calculate?
Sensitivity is the ability of a test to correctly identify individuals who are affected by a disease (the true positive rate)
True positive/(true positive+false negative)
What is specificity?
How do you calculate?
Specificity is the ability of a test to correctly identify individuals who are not affected by a disease (the true negative rate)
True negative/(true negative+false positive)
What is PPV?
How do you calculate?
Positive predictive value (PPV) = The proportion of positive tests that are true positives
True positive/(true positive+false positive)
What is NPV?
How do you calculate?
Negative predictive value (NPV) = The proportion of negative tests that are true negatives
True negative/(true negative+false negative)
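Worked example: the four formulas above all come from the same 2x2 table of test result against true disease status. A minimal sketch, with a hypothetical function name and made-up counts:

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, PPV and NPV from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # proportion of positive tests that are true
        "npv": tn / (tn + fn),          # proportion of negative tests that are true
    }


# Hypothetical counts for illustration only.
metrics = diagnostic_metrics(tp=90, fp=5, tn=95, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that sensitivity and specificity are properties of the test, whereas PPV and NPV also depend on how common the disease is in the population tested.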
What is MLPA?
Outline the principle
Multiplex Ligation-dependent Probe Amplification
DNA is hybridised to pairs of probes. Each probe carries a universal primer sequence for fragment amplification, and one of the pair also has a stuffer sequence so that each fragment has a different length. The two probes bind directly beside each other on the target and a ligase joins them. Rounds of PCR with the universal primers then amplify the ligated probes, giving fragments of different sizes corresponding to each region of interest. Peak heights relative to reference probes and control samples are used to detect CNV
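Worked example: the final step, comparing peak heights to reference probes and controls, is often expressed as a dosage quotient. The sketch below is one simple way to do this and is not any particular analysis package's method. The probe names, peak heights and the 0.7/1.3 cut-offs are assumptions for illustration; laboratories set their own validated thresholds.

```python
def dosage_quotients(sample_peaks, control_peaks, reference_probes):
    """Normalise each test probe to the mean reference-probe peak, then to the control."""
    def normalise(peaks):
        ref_mean = sum(peaks[p] for p in reference_probes) / len(reference_probes)
        return {p: h / ref_mean for p, h in peaks.items() if p not in reference_probes}

    sample_norm = normalise(sample_peaks)
    control_norm = normalise(control_peaks)
    return {p: sample_norm[p] / control_norm[p] for p in sample_norm}


def interpret(dq, loss_below=0.7, gain_above=1.3):
    """Illustrative thresholds only (assumed, not taken from the flashcards)."""
    if dq < loss_below:
        return "possible deletion"
    if dq > gain_above:
        return "possible duplication"
    return "normal dosage"


# Hypothetical peak heights.
control = {"ref1": 1000, "ref2": 1000, "exon1": 800, "exon2": 1200}
sample = {"ref1": 900, "ref2": 1100, "exon1": 400, "exon2": 1200}

for probe, dq in dosage_quotients(sample, control, ["ref1", "ref2"]).items():
    print(probe, round(dq, 2), interpret(dq))
```

In this made-up data, exon1 has a dosage quotient of about 0.5, the pattern expected for a heterozygous deletion, while exon2 is about 1.0.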
What is MS-MLPA?
Outline the principle
Methylation-specific Multiplex Ligation-dependent Probe Amplification
Mostly similar to MLPA, but the reaction is split into two tubes. One undergoes standard MLPA to detect CNV. The other is treated with a methylation-sensitive restriction endonuclease at the ligation stage, so probes bound to unmethylated DNA are cut and that fragment is not amplified. The dosage of methylated sequence can then be calculated, e.g. to detect abnormal methylation caused by UPD
What are the common file types from a bioinformatics pipeline?
BCL - raw file produced by an Illumina sequencer. Has the base call per cycle for each tile on the flow cell
FASTQ - text-based format of nucleotide sequence and per-base quality
BAM - binary alignment file containing the FASTQ reads aligned to a reference genome
CRAM - more highly compressed alternative to BAM (reference-based compression)
VCF - variant call format; the basic list of variants called from the BAM
Annotated VCF - VCF with extra useful annotations
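Worked example: a FASTQ file stores each read as four lines (an `@` header, the sequence, a `+` separator and a quality string of the same length as the sequence). The sketch below reads that layout from an in-memory example. The function name and the reads are hypothetical, and it assumes a well-formed, unwrapped file.

```python
import io


def read_fastq(handle):
    """Yield (read_id, sequence, quality_string) from a 4-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:            # end of file
            break
        sequence = handle.readline().rstrip("\n")
        handle.readline()         # '+' separator line, ignored here
        quality = handle.readline().rstrip("\n")
        yield header[1:], sequence, quality  # drop the leading '@'


# Two made-up reads for illustration.
example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\nII#I\n")
records = list(read_fastq(example))
for read_id, sequence, quality in records:
    print(read_id, sequence, quality)
```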
What is most commonly used to check the quality of FASTQ files?
FastQC
What are some things measured by FastQC?
Per Base Sequence Quality score
Per Sequence Quality Scores
Per Base Sequence Content
Per Base GC Content
Per Sequence GC Content
Sequence Length Distribution
Overrepresented Sequences
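Worked example: two of the measures above can be approximated by hand to see what they mean. This is a rough sketch, not FastQC's own implementation. The function names are hypothetical, and the quality strings are assumed to use Phred+33 encoding.

```python
def per_base_mean_quality(quality_strings, offset=33):
    """Mean Phred score at each read position (assumes Phred+33 encoding)."""
    totals, counts = [], []
    for quality in quality_strings:
        for i, char in enumerate(quality):
            if i == len(totals):      # first read long enough to reach this position
                totals.append(0)
                counts.append(0)
            totals[i] += ord(char) - offset
            counts[i] += 1
    return [total / count for total, count in zip(totals, counts)]


def gc_fraction(sequence):
    """Fraction of G and C bases in one read (per-sequence GC content)."""
    seq = sequence.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


# Made-up reads: 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
print(per_base_mean_quality(["II#I", "IIII"]))  # [40.0, 40.0, 21.0, 40.0]
print(gc_fraction("GGCA"))                      # 0.75
```

A drop in mean quality at one position, as at the third base here, is the kind of pattern the per-base quality plot is designed to show.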
What are the basic steps of an NGS pipeline?
Demultiplex
Alignment
Variant calling
Annotation
Name an alignment tool
BWA
What types of variant can be detected by short-read NGS (SR-NGS)?
SNVs
Indels
CNVs (with a dedicated CNV caller)
Structural variants (if the breakpoints are covered)
Name a tool for variant calling
GATK (better for SNV)
What is the main problem with Roche (454) sequencing?
The signal intensity for a given homopolymer length varies widely, resulting in high error rates in insertion and deletion (indel) calls
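What is a Phred score?
What is considered high quality?
A log-scaled quality score for each base call, derived from the estimated probability P that the base has been called incorrectly: Q = -10 x log10(P)
Phred ≥30 (at most a 1 in 1,000 chance of an incorrect call, i.e. 99.9% accuracy)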
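Worked example: the Phred scale is defined as Q = -10 x log10(P), where P is the estimated probability that the base call is wrong. The sketch below converts in both directions; the function names are hypothetical.

```python
import math


def phred_to_error_probability(q):
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p):
    """Phred score for an error probability p."""
    return -10 * math.log10(p)


for q in (10, 20, 30, 40):
    p = phred_to_error_probability(q)
    print(f"Q{q}: error probability {p:.4%}, accuracy {1 - p:.4%}")
```

Each step of 10 on the Phred scale is a tenfold drop in error probability, so Q20 is 1 in 100, Q30 is 1 in 1,000 and Q40 is 1 in 10,000.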
Why are paired-end reads better than single-end reads?
The two reads in a pair come from opposite ends of the same fragment, so their expected relative positions are known, making it easier to resolve structural rearrangements such as insertions, deletions or inversions
Improve the assembly of repetitive regions
What should be involved in a bioinformatics pipeline validation?
Assess the pipeline’s output against a truth set, e.g. Genome in a Bottle
Sensitivity should be calculated from at least 10 individuals and be >95%
Data must be collected over 3 independent runs for reproducibility
Confirm ability to detect known variants (all types needed for the testing)
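Worked example: comparing pipeline output against a truth set comes down to counting which truth-set variants were called and which were missed. A minimal sketch, assuming variants are represented as (chromosome, position, ref, alt) tuples; the function name and the variants are made up for illustration.

```python
def variant_sensitivity(called, truth):
    """Sensitivity of a call set against a truth set of (chrom, pos, ref, alt) tuples."""
    true_positives = called & truth    # truth-set variants that were called
    false_negatives = truth - called   # truth-set variants that were missed
    sensitivity = len(true_positives) / (len(true_positives) + len(false_negatives))
    return sensitivity, sorted(false_negatives)


# Hypothetical truth set and pipeline calls.
truth = {
    ("chr1", 1000, "A", "G"),
    ("chr1", 2000, "C", "T"),
    ("chr2", 3000, "G", "GA"),
    ("chr3", 4000, "TC", "T"),
}
called = {
    ("chr1", 1000, "A", "G"),
    ("chr1", 2000, "C", "T"),
    ("chr2", 3000, "G", "GA"),
    ("chr5", 5000, "A", "C"),   # not in the truth set (a false positive)
}

sensitivity, missed = variant_sensitivity(called, truth)
print(f"sensitivity = {sensitivity:.2f}, missed = {missed}")
```

In practice the comparison is restricted to the regions covered by both the test and the truth set, and variant representations usually need normalising before matching.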
What is library preparation?
Process of fragmenting DNA and adding the adapters and indexes needed for sequencing.
For exome/targeted panels an enrichment step is also required to capture the regions of interest (not needed for WGS)
What are the two main types of enrichment method?
Amplicon
Hybridisation
How does Amplicon enrichment work?
What are benefits and drawbacks?
PCR amplification of regions of interest while adding adapters and indexes
Cheaper and faster but preferential amplification leads to non-uniform coverage and bias, can introduce artefacts and cannot be used for CNV analysis
How does Hybridisation enrichment work?
What are benefits and drawbacks?
Fragmentation and adapter/index ligation happens first. Then oligo probes designed to target regions of interest are bound. Beads are used to pull out bound fragments.
Achieves much more uniform coverage and a truer representation of the sample, because the captured fragments differ from one another (random start and end points). CNV calling is possible. However, it needs more DNA, costs more and has a longer prep time