Lecture 12: DNA STRUCTURAL VARIATION: COPY NUMBER VARIANTS (CNVS) Flashcards
What are structural variations (SVs)?
= 3
- Rearrangements of the DNA in a genome resulting in novel breakpoint junctional events
- Copy number neutral
* Inversion or balanced translocation
- Copy number variants (CNVs)
* Deletion, duplication or triplication of a genomic segment on one chromosome of a homologous pair
DNA copy number variants (CNVs)
- CHROMOSOMAL DELETION = loss of genomic segment
- CHROMOSOMAL DUPLICATION = gain of genomic segment
what are CNVs? - copy number variants?
= 6
- CNVs were revealed following the completion of the human genome project
- Major source of genetic diversity
- Common: ~5-10% of the human genome contributes to CNVs
- Variable sizes: 50 bp – 3 Mb
- Mostly BENIGN POLYMORPHIC structural variation (SV) with NO PHENOTYPIC EFFECT
- But can have a ROLE in HUMAN DISEASE
* Phenotypes can range from COGNITIVE disabilities and CONGENITAL anomalies to PREDISPOSITIONS TO OBESITY AND CANCER
Structural variation map of the human genome
- CNVs are unevenly distributed in the genome;
- the pericentromeric and subtelomeric regions of chromosomes show a particularly HIGH RATE OF VARIATION
Role of genomic architecture on human genetic disease?
Structural features of the human genome, i.e. the genomic architecture, can result in REGION-SPECIFIC SUSCEPTIBILITY to REARRANGEMENTS and thus GENOMIC INSTABILITY, WHICH CAN RESULT IN HUMAN GENETIC DISEASE
DNA copy number variants and human genetic disease…
- DELETIONS and DUPLICATIONS are an important cause of HUMAN GENETIC DISEASE
* Loss or gain of:
1 – Whole chromosome (aneuploidy)
2 – Several adjacent genes in a contiguous gene syndrome (microdeletion/microduplication syndrome)
3 – Single gene
4 – Exons (part of a gene)
= GENOMIC DISORDERS
(size of the affected region decreases down the list above)
Genomic disorders:
Phenotypes can arise from several molecular mechanisms: 6
1 ➢Gene dosage - altering the copy number of a dosage sensitive gene
2. ◦ Haploinsufficiency
3. ◦ Triplosensitivity
- ➢Disruption of coding sequences e.g. exon deletion
- ➢Gene fusion event at the breakpoint generating a gain of function mutation
- ➢Perturbing long-range gene regulation, i.e. position effect
- ➢Deletion unmasking a recessive variant on the other allele
List Methods of Detecting CNVs… from Mbp to bp SIZE
Mbp (biggest)
1 * Karyotype
2 * FISH (metaphase & interphase)
3 * QF-PCR (Quantitative Fluorescent PCR)
4 * DNA microarray/ Array CGH
5 * MLPA
6 * NGS
bp
Quantitative Fluorescent PCR (QF-PCR)
what is it? what does it allow? what does it
=8
- Prenatal diagnosis of abnormal chromosome copy number (aneuploidies)
- Only a few aneuploidies are compatible with life.
- Chromosomes 13, 18, 21, X and Y
- Allows detection of approximately 85-90% of clinically significant chromosomal abnormalities detected at birth.
5* Trisomy 21 (Down syndrome)
6* Trisomy 18 (Edwards syndrome)
7* Trisomy 13 (Patau syndrome)
8* Sex chromosome X and Y abnormalities (Turner syndrome 45,X or Klinefelter syndrome 47,XXY)
QF-PCR
* Aimed at pregnant…who? why? what? how? for? =7
1 * Aimed at pregnant women with increased risk of chromosome abnormality
…2 * Maternal age
…3 * Altered serum metabolites
…4 * Ultrasound abnormalities of the fetus
5 * Undergo invasive sampling of amniotic fluid or CVS
6 * Fetal DNA is obtained
7 * QF-PCR performed using a multiplex of fluorescent, polymorphic STR (short tandem repeat) markers from chromosomes 13,18,21,X and Y
QF-PCR analysis…8
1 * PCR products are size separated by capillary electrophoresis using a Genetic Analyser (DNA sequencer)
2 * Markers differentiated by size and fluorescent tag
3 * Peak pattern and area for each allele are analysed
4 * Copy number is determined from the relative quantitation of the STR markers
….5* Normal diploid copy number - 2 alleles
………6* If alleles are heterozygous then ratio 1:1
….7* Abnormal trisomic copy number – 3 alleles
………8* ratio 2:1 or 1:2 (two peaks), or triallelic 1:1:1 (three peaks)
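The ratio rules above can be sketched in code. This is a minimal illustration, not a validated diagnostic rule: the function name `interpret_str_marker` and the cut-offs (0.8-1.4 for a balanced 1:1 pattern, at or below 0.65 or at or above 1.8 for a skewed pattern) are assumptions chosen for the example, and peak areas are assumed to be listed shortest allele first.

```python
def interpret_str_marker(peak_areas, balanced=(0.8, 1.4), skewed=(0.65, 1.8)):
    """Interpret one STR marker from its allele peak areas (shortest allele first).

    `balanced` and `skewed` are assumed, illustrative ratio cut-offs.
    """
    n = len(peak_areas)
    if n == 1:
        # One peak: homozygous marker, so it cannot show the copy number.
        return "uninformative"
    if n == 2:
        ratio = peak_areas[0] / peak_areas[1]
        if balanced[0] <= ratio <= balanced[1]:
            return "disomic 1:1"
        if ratio <= skewed[0] or ratio >= skewed[1]:
            return "trisomic 2:1 or 1:2"
        return "inconclusive"
    if n == 3:
        # Three peaks of similar area: triallelic pattern.
        smallest = min(peak_areas)
        if all(area / smallest <= balanced[1] for area in peak_areas):
            return "trisomic 1:1:1"
        return "inconclusive"
    raise ValueError("expected 1 to 3 allele peaks")


print(interpret_str_marker([1000, 950]))
print(interpret_str_marker([2000, 1000]))
print(interpret_str_marker([900, 1000, 950]))
```

In practice a result would rest on several informative markers per chromosome rather than on a single marker.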
QF-PCR allele patterns (diagrams on slides 15-17)
- Normal QF-PCR allele pattern for chromosomes 13, 18 and 21
- Abnormal QF-PCR allele pattern: trisomy 18, normal 13 and 21
- Abnormal QF-PCR allele pattern: trisomy 21, normal 13 and 18
- Marker D21S135: two peaks, allele ratio 1:1
Advantages of QF-PCR for prenatal diagnosis = 7
1 * Accurate
2 * Robust: low failure rate
3 * Rapid: 24-48 hours
4 * Minimise patient anxiety
5 * Cost effective (compared to FISH as less labour intensive)
6 * Suitable to automation / Medium throughput
7 * Commercial kits available
What is a DNA microarray? = 7
1 *DNA chip
2 *A collection of DNA spots attached to a solid surface (usually glass)
3 *Thousands/millions of spots are arrayed in orderly rows and columns
4 *Each DNA spot contains a specific DNA sequence, known as a probe (oligonucleotide)
5 *Probes are used as hybridisation targets
6 *Probe-target hybridization is usually detected and quantified by detection of fluorophore
7 *Determination of the relative abundance of nucleic acid sequences in the target
Genomic DNA Microarrays ..understanding = 5
- Whole genome analysis
- Microarray Molecular Karyotype
- 1000 x greater resolution
- Higher diagnostic yield - 15-20%
- Traditional karyotype – ~3%
(excluding Down syndrome and other recognisable chromosomal syndromes)
Microarrays: First-tier genetic test in place of karyotyping for individuals with…
1 ◦ Developmental delay
2 ◦ Intellectual disability
3 ◦ Autism
4 ◦ Multiple congenital anomalies
Microarrays can be used for:
- Prenatal diagnosis
- Pregnancy loss
- Cancer
Types of Chromosome Microarrays: 7
1 ▪Array CGH
…2 ▪ Genomic gains and losses
3 ▪SNP array
…4 ▪ Genomic gains and losses
…5▪ Copy neutral aberrations
…6 ▪ Regions of homozygosity (ROH)
…7 ▪ Regions of loss of heterozygosity (LOH)
Different resolutions of microarrays
- Manufacturers produce a variety of different arrays
- Greater number of markers = greater genomic coverage and greater resolution
12 x 300K (Illumina)
8 x 850K (Illumina)
24 x 750K (Illumina)
1 x 2.6M (Affymetrix)
1 x 750K (Affymetrix)
2 x 400K (Agilent)
4 x 180K (Agilent)
8 x 60K (Agilent)
Different platforms available
- Affymetrix
- Agilent Technologies
- Illumina
Types of Microarrays
Array-CGH vs SNP-Arrays =5
– Array-CGH
1 ▪ Patient genomic DNA compared to normal reference control DNA
2 ▪ Both DNAs are fragmented, fluorescently labelled in different colours, competitively hybridised onto the array and read with a fluorescence scanner.
SNP-Arrays:
3 ▪ Do not use a control reference DNA.
4 ▪ Fragmented patient DNA is hybridised to the array.
5 ▪ SNP allele frequency and absolute fluorescence levels are compared to a standard consisting of averaged results for multiple normal samples.
Array CGH process = 5
- Reference DNA + test DNA
- DNA labelling
◦ Reference control DNA labelled with Cy3
◦ Patient test DNA labelled with Cy5
- Co-hybridisation
- Signal detection
◦ Fluorescence ratio of Cy3 to Cy5 is directly quantitative
- Data analysis
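Why the dye ratio is quantitative can be shown with a short calculation. This is a sketch under stated assumptions: the ratio is taken as test (Cy5) over reference (Cy3) on a log2 scale, the reference is assumed to carry 2 copies, and the function names and the -0.5 / 0.4 calling cut-offs are illustrative, not values from the lecture or from any vendor software.

```python
import math


def expected_log2_ratio(test_copies, reference_copies=2):
    """Ideal log2(test/reference) signal for a given patient copy number."""
    return math.log2(test_copies / reference_copies)


def call_probe(cy5_test, cy3_reference, loss_cutoff=-0.5, gain_cutoff=0.4):
    """Call one probe from its two dye intensities (cut-offs are assumptions)."""
    ratio = math.log2(cy5_test / cy3_reference)
    if ratio <= loss_cutoff:
        return "loss"
    if ratio >= gain_cutoff:
        return "gain"
    return "no change"


for copies in (1, 2, 3):
    print(copies, round(expected_log2_ratio(copies), 3))

print(call_probe(500, 1000))   # half the reference signal
print(call_probe(1500, 1000))  # one and a half times the reference signal
```

Under these assumptions a heterozygous deletion sits near -1 and a heterozygous duplication near +0.58, so gains are harder to separate from noise than losses.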
Illumina SNP BeadArray technology = 6
- Each bead is covered with hundreds of thousands of copies of a specific oligonucleotide that act as the capture sequences.
- Each oligo binds to a complementary sequence in the sample.
- Single-base extension matching the sample DNA gives ALLELE SPECIFICITY
- Fluorescence excitation by scanner and signal capture.
- Intensity values measured for each colour
- Allelic ratios calculated.
The Infinium Whole Genome Genotyping Workflow = 7
- Easy workflow
- 200ng input DNA
- 50-mer hybridization
- PCR-free protocol
- High reproducibility (>99.9%)
- High call rates (>99%)
- Everything you need comes in the box.
The Infinium Whole Genome Genotyping Workflow = process..
- genomic DNA 200-400ng
DAY 1:
1. Make amplified DNA
2. Incubate amplified DNA
DAY 2:
3. Fragment amplified DNA
4. Precipitate and Resuspend
5. Prepare BeadChip
6. Hybridise samples on BeadChip
DAY 3:
7. Image BeadChip
8. Genotype and LOH/CN analysis
Karyostudio software
image labelled of the software and what is in it on slide 34
NxClinical Software BioDiscovery
slide 35
Detection of CNVs - 2 ways
- LOG R RATIO (LRR)
◦ Copy number
◦ Log R Ratio (LRR) is a normalised measure of the total signal intensity for the SNP.
◦ Any deviations from zero for LRR are evidence for a copy number change.
- B ALLELE FREQUENCY (BAF)
◦ BAF is a measure of the ‘allelic intensity ratio’
◦ ‘Proportion of hybridized sample that carries the B allele as designated by the Infinium Assay’
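The two metrics can be illustrated with a simplified calculation. This is a sketch only: `x` and `y` stand for the normalised A-allele and B-allele intensities of one SNP, `expected_r` for the total intensity expected from normal reference samples, and `simple_baf` is a deliberate simplification (the real assay derives BAF from genotype cluster positions rather than a plain intensity fraction). All names are hypothetical.

```python
import math


def log_r_ratio(x, y, expected_r):
    """LRR: log2 of observed total intensity over the expected total intensity."""
    return math.log2((x + y) / expected_r)


def simple_baf(x, y):
    """Simplified BAF: fraction of the total signal coming from the B allele."""
    return y / (x + y)


# Heterozygous AB SNP at normal copy number: LRR near 0, BAF near 0.5.
print(log_r_ratio(0.5, 0.5, expected_r=1.0), simple_baf(0.5, 0.5))

# Same SNP region with half the total signal: LRR drops below zero.
print(log_r_ratio(0.5, 0.0, expected_r=1.0), simple_baf(0.5, 0.0))
```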
Detection of CNVs (diagram: copy number, CN)
- CN = 2: NORMAL
- CN = 1: DELETION
- CN = 3: DUPLICATION
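The BAF bands expected for each copy number state follow from counting B alleles: with n copies, a SNP can carry 0 to n B alleles, so the ideal BAF values are k/n. A minimal sketch, with a hypothetical function name and ignoring noise and mosaicism:

```python
from fractions import Fraction


def expected_baf_bands(copy_number):
    """Ideal BAF values for a region present in `copy_number` copies."""
    if copy_number < 1:
        raise ValueError("copy number must be at least 1")
    return [Fraction(k, copy_number) for k in range(copy_number + 1)]


for cn, label in ((2, "normal"), (1, "deletion"), (3, "duplication")):
    bands = ", ".join(str(b) for b in expected_baf_bands(cn))
    print(f"CN={cn} ({label}): BAF bands at {bands}")
```

So a deletion loses the 0.5 band entirely, while a duplication splits it into bands at 1/3 and 2/3.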
SNP microarrays
Regions of homozygosity (ROH)
DIAGRAM ON SLIDE 38
- No copy number change
- LogR 0.00
- No heterozygous BAF
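A rough sketch of how a run of homozygous SNPs could be flagged from BAF values alone. The function name, the 0.25-0.75 window used to define a heterozygous call and the minimum run length of 5 SNPs are all assumptions for illustration; real software works with genomic positions and segment sizes in Mb, and also checks that LRR stays near zero so that a deletion is not mistaken for ROH.

```python
def find_roh(bafs, het_window=(0.25, 0.75), min_snps=5):
    """Return (first_index, last_index) for each run of non-heterozygous SNPs.

    `bafs` must be in genomic order; the thresholds are illustrative assumptions.
    """
    runs = []
    start = None
    for i, baf in enumerate(bafs):
        is_het = het_window[0] <= baf <= het_window[1]
        if not is_het:
            if start is None:
                start = i  # a homozygous run begins here
        else:
            if start is not None and i - start >= min_snps:
                runs.append((start, i - 1))
            start = None
    # Close a run that reaches the end of the data.
    if start is not None and len(bafs) - start >= min_snps:
        runs.append((start, len(bafs) - 1))
    return runs


bafs = [0.5, 0.0, 1.0, 0.02, 0.98, 1.0, 0.0, 0.5, 0.0]
print(find_roh(bafs))
```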
ROH – Diagnostic implications
Uniparental disomy (UPD) VS. Identity by descent = 5
‘Uniparental disomy (UPD)’
…2◦ Usually a single large ROH (or a couple of large ROH) on same chromosome
…3.◦ May not overlap the imprinted region
‘Identity by descent’
…4◦ Multiple ROH across different chromosomes
…5.◦ Arises when close ancestry/isolated ethnic population
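The distinction above can be written as a simple rule of thumb. This is only a sketch of the pattern described on the card, not a diagnostic algorithm: the function name, the input format (chromosome name, ROH length in Mb) and the 5 Mb cut-off for a "large" ROH are assumptions.

```python
def roh_pattern(roh_segments, min_mb=5.0):
    """Summarise which explanation a set of ROH segments is more consistent with.

    `roh_segments` is a list of (chromosome, length_mb) tuples; `min_mb` is an
    assumed size filter for "large" ROH.
    """
    chromosomes = {chrom for chrom, length_mb in roh_segments if length_mb >= min_mb}
    if not chromosomes:
        return "no large ROH"
    if len(chromosomes) == 1:
        return "consistent with UPD"
    return "consistent with identity by descent"


print(roh_pattern([("15", 22.0)]))
print(roh_pattern([("2", 12.0), ("7", 8.5), ("11", 30.0)]))
```

Follow-up testing (for example parental samples) would still be needed to confirm either explanation.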
Microarray data analysis =2
Benign variant found in the normal population
OR
Clinically significant CNV associated with the patient’s phenotype
Database investigations
- DGV (Database of Genomic Variants) http://dgv.tcag.ca
◦ Compare patient CNV to normal individuals
- DECIPHER https://decipher.sanger.ac.uk
◦ Compare patient CNV to other patient genotype/phenotype details
- ClinGen https://clinicalgenome.org
◦ Resource that defines the clinical relevance of genes and variants
◦ Dosage sensitivity scores for curated genes/regions (haploinsufficiency and triplosensitivity)
◦ Database of patient CNVs categorised by clinical significance
- UCSC Genome Browser http://genome.ucsc.edu
◦ View RefSeq/OMIM genes and numerous other information tracks
- PubMed http://www.ncbi.nlm.nih.gov/pubmed
◦ Research publications relevant to a particular phenotype/gene of interest
- gnomAD https://gnomad.broadinstitute.org/
◦ Genome Aggregation Database with exome and genome sequencing data from large scale projects
◦ V2.1.1 data from 125,748 exomes and 15,708 whole genomes from unrelated individuals
◦ Various disease specific and population studies
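Comparing a patient CNV with a database CNV usually comes down to asking how much the two intervals overlap. One common convention, shown here as an assumption rather than as any database's own method, is reciprocal overlap: the overlap must cover a given fraction of both intervals. Coordinates are assumed to be on the same chromosome and genome build, with half-open intervals (start included, end excluded).

```python
def reciprocal_overlap(a_start, a_end, b_start, b_end):
    """Smaller of the two overlap fractions for intervals A and B (0.0 if disjoint)."""
    overlap = min(a_end, b_end) - max(a_start, b_start)
    if overlap <= 0:
        return 0.0
    return min(overlap / (a_end - a_start), overlap / (b_end - b_start))


# Hypothetical coordinates: a patient deletion and a database entry.
patient = (1_000_000, 1_500_000)
database = (1_100_000, 1_600_000)
score = reciprocal_overlap(*patient, *database)
print(score, score >= 0.5)  # 0.5 is an assumed matching threshold
```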
Copy number variant (CNV) classifications = 5
- Pathogenic
* Variant contributes to the development of disease.
- Likely Pathogenic
* High likelihood that this variant is disease-causing
- Uncertain
* Not enough information at this time to support a more definitive classification of this variant
- Likely Benign
* Not expected to have a major effect on disease; however, the scientific evidence is currently insufficient to prove this conclusively
- Benign
* This variant does not cause disease
Reporting Microarray Results:
PATHOGENIC / LIKELY PATHOGENIC CNV = 4
1 ➢Testing family members may be appropriate.
2 ➢Test parents for recurrence risk.
3 ➢Rule out balanced rearrangement
4 ➢Genetic counselling recommended
Reporting Microarray Results
CNV UNCERTAIN SIGNIFICANCE = 2
1 ➢Testing parents may be helpful
2 ➢Genetic counselling may be appropriate.
Reporting Microarray Results
NO CNV DETECTED = 2
1 ➢Small sequence variants, balanced rearrangements and low-level mosaicism are not detected
2 ➢If a specific disorder is suspected then further testing may be appropriate.
Advantages of microarray compared to karyotyping = 5
1 *Much higher resolution and therefore higher diagnostic yield
2 * i.e. 10-15% more significant abnormalities detected in patients with intellectual disability, developmental delay, autism and multiple congenital abnormalities
3 *Cheaper than karyotyping because of ease of automation
4 *Robust technology
5 *Tissue culture not needed
Disadvantages of microarray = 5
1 *Unable to detect balanced rearrangements
2 *No positional information for a duplication – do not know if tandem or transposed elsewhere in the genome
3 *Can be time-consuming to analyse and report
4 *Many CNVs are novel with unknown clinical significance
5 *Can detect incidental findings (e.g. deletion of tumour suppressor genes)
Multiplex Ligation-dependent Probe Amplification (MLPA) = 7
- Multiplex PCR method to investigate the copy number of ~60 targets in one MLPA reaction
- Wide variety of commercial kits available (>350)
- Routine diagnostic tests used globally in many laboratories
- Only requires a thermocycler and capillary electrophoresis equipment
- Up to 96 samples can be processed simultaneously
- Results available within 24 hours.
- Targeted to specific regions of interest NOT whole genome analysis
What are MLPA probes? = 4
- MLPA probe mixes have probes that target specific genomic sequences.
- An MLPA probe consists of 2 parts: a left and a right probe oligonucleotide (LPO and RPO).
- The LPO and RPO contain PCR primer and DNA hybridisation sequences.
- An additional stuffer sequence in the RPO gives each probe a unique length.
MLPA procedure …steps 1 to 5
1. Sample denaturation and probe hybridisation
- In the first step, purified sample DNA is denatured. This is followed by overnight incubation with MLPA probe oligos. The LPO and RPO parts of each probe hybridise to immediately adjacent target DNA sequences.
2. Probe ligation
- The second step is the ligation of probe oligos that hybridised to immediately adjacent target sequences. No mismatches around the ligation site are permitted, making the ligation reaction highly specific. The number of probe ligation products is a measure of the number of target sequences in the sample.
3. Probe amplification
- In the third step, ligated probes are amplified in a multiplex PCR using a single universal primer pair. Only ligated probes are exponentially amplified, making the removal of unbound and non-ligated probes unnecessary.
4. Fragment separation
- The fourth step is fragment separation by capillary electrophoresis. PCR products are loaded onto a capillary electrophoresis device and separated by length. Each fragment corresponds to a specific MLPA probe.
Step 4: Separation by capillary electrophoresis
- each peak is the amplification product of a specific probe
- samples are compared to a control sample
- a difference in relative peak height or peak area indicates a copy number change of the probe-target sequence
5. Data analysis
- The final step is data analysis by Coffalyser.Net™. Relative copy numbers are determined by comparing the relative peak heights of reference probes and target probes in the test samples with those in reference samples with a known normal copy number. Advanced quality checks help to recognize unreliable data.
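The comparison described in the data analysis step can be sketched as a two-stage normalisation giving a dosage quotient (DQ). This is a simplified illustration and not how Coffalyser.Net is implemented: the function names are hypothetical, a single control sample stands in for the reference samples, and the interpretation ranges (0.80-1.20 normal, 0.40-0.65 heterozygous deletion, 1.30-1.65 heterozygous duplication) are assumed here for the example.

```python
def dosage_quotient(target_peak, ref_peaks, control_target_peak, control_ref_peaks):
    """Target probe signal relative to reference probes, test sample vs control."""
    # Stage 1: normalise the target probe within each sample.
    test_norm = target_peak / (sum(ref_peaks) / len(ref_peaks))
    control_norm = control_target_peak / (sum(control_ref_peaks) / len(control_ref_peaks))
    # Stage 2: compare the test sample with the normal-copy-number control.
    return test_norm / control_norm


def interpret_dq(dq):
    """Map a dosage quotient to a call using assumed, illustrative ranges."""
    if 0.80 <= dq <= 1.20:
        return "normal"
    if 0.40 <= dq <= 0.65:
        return "heterozygous deletion"
    if 1.30 <= dq <= 1.65:
        return "heterozygous duplication"
    return "ambiguous"


dq = dosage_quotient(500, [1000, 1000], 1000, [1000, 1000])
print(dq, interpret_dq(dq))
```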
Structural variants (SVs) and CNV detection by NGS?
(diagram on slide 56)
Many publications with different algorithms for detection of SVs by NGS
PROBLEM!!!
- Structural variants usually occur in regions with complex genomic architecture: flanked by repetitive sequence elements (low copy repeats, LCRs).
- Difficult to sequence.
- May be misidentified or missed entirely by existing methods using short-read sequencing
Why is the detection of CNVs from NGS data not currently a routine test? = 8
1 *CNVs detected with lower confidence than SNVs/indels
2 *Inconsistencies among different methods
3 *Lack of a high-quality reference for CNVs from exome sequencing (ES) data
4 *Most algorithms use depth of coverage assuming read depth is linearly correlated with the underlying true copy number
5 *However, read depth in NGS is variable
…. 6 * sample batching, GC content, PCR duplication bias, targeted depth, sequencing efficiency, and mappability
7 *Difficult to differentiate between technical artefacts and the real signal for a true copy number change.
8 *Detecting CNVs in polymorphic regions of the genome is challenging
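The depth-of-coverage assumption in point 4 can be made concrete with a minimal sketch. It assumes read depth scales linearly with copy number and corrects only for overall sequencing depth by dividing by each sample's median coverage; it does not model GC content, mappability or batch effects, which is exactly why such estimates are noisy. The function and parameter names are hypothetical.

```python
def estimate_copy_number(sample_depth, sample_median, ref_depth, ref_median, ploidy=2):
    """Estimate copy number of one region from read depth, assuming linearity.

    Depths are mean coverage over the region; medians are per-sample coverage
    across all regions and act as a simple library-size correction.
    """
    ratio = (sample_depth / sample_median) / (ref_depth / ref_median)
    return ploidy * ratio


# Region covered at half the expected depth in the patient: about 1 copy.
print(estimate_copy_number(50, 100, 100, 100))

# Same relative depth in patient and reference despite different sequencing depth.
print(estimate_copy_number(60, 120, 100, 200))
```

A region whose depth is shifted by a technical artefact would produce the same kind of ratio change as a true deletion or duplication, which this simple model cannot tell apart.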
Long-read sequencing?
slide 59