mixed SAQs Flashcards
a) Name three file types used in an NGS analysis pipeline (3)
3 from: FASTQ
BAM or SAM or CRAM
VCF
BED
b) For each of these file types describe their contents and use. (6) (FASTQ, BAM, VCF, BED)
FASTQ - text file containing sequence reads and their associated per-base quality scores.
Standard format containing all reads from the sequencing run. Can be analysed to generate quality metrics, and is used as input for read alignment tools.
BAM or SAM or CRAM - aligned/mapped reads and associated quality information (SAM is plain text; BAM and CRAM are compressed forms).
Output of read alignment. Can be analysed to generate quality metrics (e.g. coverage) and is used as input for variant calling.
VCF - header lines plus data lines, each containing information about a position in the genome, usually a variant. May also include annotations.
Output of variant calling. Annotations may be added prior to variant filtering and analysis.
BED - Genomic regions (chromosome, start and end)
Used to define the regions of interest for the assay.
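To make the FASTQ and BED descriptions concrete, here is a minimal Python sketch of reading both text formats. It assumes the common four-line FASTQ record with Phred+33 quality encoding (used by current Illumina output) and tab-separated BED lines with a 0-based start and exclusive end. The function names are illustrative, not taken from any particular tool.

```python
def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string; with Phred+33, '!' is Q0 and 'I' is Q40."""
    return [ord(ch) - offset for ch in quality]


def error_probability(q: int) -> float:
    """A Phred score Q means the base call has an error probability of 10^(-Q/10)."""
    return 10 ** (-q / 10)


def read_fastq(lines):
    """Yield (read_id, sequence, phred_scores) from four-line FASTQ records."""
    it = (line.rstrip("\n") for line in lines if line.strip())
    for header in it:
        sequence = next(it)
        next(it)  # the '+' separator line
        quality = next(it)
        yield header[1:], sequence, phred_scores(quality)


def bed_region_length(bed_line: str) -> int:
    """BED start is 0-based and end is exclusive, so length = end - start."""
    _, start, end = bed_line.split("\t")[:3]
    return int(end) - int(start)


record = ["@read1", "ACGT", "+", "II#5"]
print(list(read_fastq(record)))              # [('read1', 'ACGT', [40, 40, 2, 20])]
print(bed_region_length("chr1\t100\t250"))   # 150
```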
c) NGS analysis often involves aligning short DNA sequences (reads) to a reference genome. Give two reasons why a read might not align correctly to the reference. (2)
Two from:
Read maps to multiple locations in the reference genome (e.g. pseudogene)
Reference genome is incomplete so sequence is missing (e.g. centromeric regions)
Errors introduced during sequencing
Variants in the sequence compared to reference
d) Reads that do not map uniquely to the reference genome (i.e. map to more than one location) are given a mapping quality (MAPQ) of 0 and may be excluded from downstream analysis. Explain possible reasons for non-unique mapping and what impact this might have on the clinical use of NGS. (3)
Duplicated regions of the genome (segmental duplications, pseudogenes) can result in the same sequence being present in two or more locations in the genome. NGS reads from these duplicated regions will not map uniquely and therefore may be removed from downstream analyses. If a clinically relevant gene has a pseudogene, it may be difficult to get sufficient coverage of the gene for variant calling. Conversely, reads may be mis-mapped between the gene and its pseudogene, so a called variant may actually lie in the pseudogene rather than the gene itself. An alternative method, such as long-range PCR, may be required to confirm results in these genes.
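Filtering out non-uniquely mapped reads (mapping quality, MAPQ, of 0) can be sketched in Python on plain SAM text lines. This is a minimal illustration, not a pipeline step: it assumes uncompressed SAM input (in practice BAM/CRAM files are usually filtered with `samtools view -q` or read with a library such as pysam), and the `min_mapq` threshold is a hypothetical choice. How MAPQ is assigned to multi-mapping reads depends on the aligner; BWA, for example, reports 0 when a read aligns equally well to more than one place.

```python
# SAM data lines have 11 mandatory tab-separated fields:
# QNAME FLAG RNAME POS MAPQ CIGAR RNEXT PNEXT TLEN SEQ QUAL
MAPQ_FIELD = 4  # zero-based index of MAPQ


def mapq(sam_line: str) -> int:
    """Return the mapping quality of one SAM alignment line."""
    return int(sam_line.split("\t")[MAPQ_FIELD])


def split_by_mapq(sam_lines, min_mapq: int = 1):
    """Separate alignments into (kept, low_mapq); '@' header lines are skipped.

    Per the SAM specification MAPQ 255 means 'not available'; this sketch does
    not treat it specially, so with min_mapq=1 such reads are kept.
    """
    kept, low = [], []
    for line in sam_lines:
        if line.startswith("@"):
            continue
        (kept if mapq(line) >= min_mapq else low).append(line)
    return kept, low


def fraction_mapq0(sam_lines) -> float:
    """Fraction of alignments with MAPQ 0; a high value across a gene can
    point to a pseudogene or segmental duplication."""
    mapqs = [mapq(line) for line in sam_lines if not line.startswith("@")]
    return mapqs.count(0) / len(mapqs) if mapqs else 0.0
```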
e) Give an example of a gene and an associated genetic disorder that might be difficult to analyse by NGS because reads do not map uniquely to the reference (2)
Possible examples: SMN1 and spinal muscular atrophy, or PMS2 and Lynch syndrome
(SMN1 has a near-identical paralogue, SMN2; PMS2 has a pseudogene, PMS2CL)
Briefly describe paired-end sequencing and explain the advantages of paired-end over single-end sequencing for detecting variants associated with human disease. (4)
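Paired-end sequencing - both ends of each DNA fragment are sequenced, giving a pair of reads separated by an approximately known distance (the insert size).
Paired-end sequencing is useful for detecting structural variants (deletions, insertions, inversions and translocations): read pairs that map further apart than expected, in an unexpected orientation, or to different chromosomes indicate a rearrangement and give information about its position. This is not possible with single-end sequencing. Paired reads also improve alignment in repetitive regions, because a uniquely mapped read can anchor its mate. Structural variants are an important cause of genetic disease.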
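How read pairs reveal structural variants can be sketched as a simple classifier. This is an illustration under stated assumptions, not how any specific SV caller works: it assumes a standard forward-reverse (inward-facing) library, uses the distance between the leftmost read positions as a stand-in for insert size, and the `max_insert` cut-off of 1,000 bp is hypothetical; real callers estimate the expected insert-size distribution from the library itself.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    chrom1: str
    pos1: int      # leftmost aligned position of read 1
    strand1: str   # '+' or '-'
    chrom2: str
    pos2: int
    strand2: str


def classify_pair(pair: ReadPair, max_insert: int = 1000) -> str:
    """Classify a read pair by the structural-variant signal it carries.

    Assumes a forward-reverse library; max_insert is a hypothetical cut-off.
    """
    if pair.chrom1 != pair.chrom2:
        return "interchromosomal (possible translocation)"
    left, right = sorted([(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)])
    if left[1] == right[1]:
        return "same orientation (possible inversion)"
    if left[1] == "-":
        # strands differ and the left read is reverse: reads point away from each other
        return "outward-facing (possible tandem duplication)"
    if right[0] - left[0] > max_insert:
        return "larger than expected insert (possible deletion)"
    return "concordant"


print(classify_pair(ReadPair("chr1", 100, "+", "chr1", 400, "-")))     # concordant
print(classify_pair(ReadPair("chr1", 100, "+", "chr1", 50_000, "-")))  # larger than expected insert (possible deletion)
```

A single-end read carries none of these signals, since there is no mate whose position or orientation can be compared.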
Describe the underlying genetic cause of fragile X syndrome.
FRAX is an X-linked triplet repeat expansion disorder caused by expansion of a CGG repeat within the 5’ UTR of the FMR1 gene on the X chromosome. When the repeat expands beyond a threshold (>200 repeats, a full mutation), the FMR1 promoter becomes hypermethylated and the gene is silenced, so no FMRP protein is produced.
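The 200-repeat threshold above sits within a wider set of FMR1 CGG repeat categories that are commonly quoted (normal up to 44, intermediate 45-54, premutation 55-200, full mutation above 200). The Python sketch below encodes those cut-offs; treat them as an assumption to check against your laboratory's current guidelines, since boundaries and terminology can differ. Classifying a genotype by its largest allele is also a simplifying assumption that ignores mosaicism.

```python
# Commonly quoted FMR1 CGG repeat categories (check local guidelines).
# Each entry: (inclusive upper bound, label); anything larger is a full mutation.
FMR1_CATEGORIES = [
    (44, "normal"),
    (54, "intermediate"),
    (200, "premutation"),
]


def classify_fmr1_allele(cgg_repeats: int) -> str:
    if cgg_repeats < 0:
        raise ValueError("repeat number cannot be negative")
    for upper, label in FMR1_CATEGORIES:
        if cgg_repeats <= upper:
            return label
    return "full mutation"  # >200 repeats: promoter hypermethylation expected


def classify_fmr1_genotype(alleles: list[int]) -> str:
    """Classify by the largest allele (one allele in males, two in females)."""
    return classify_fmr1_allele(max(alleles))


print(classify_fmr1_genotype([29, 120]))  # premutation
```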
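Describe sizing PCR.
Sizing PCR is a standard PCR with forward and reverse primers flanking the repeat (one of which is fluorescently labelled). Products are separated by capillary electrophoresis and sized against a molecular weight ladder. Large expanded alleles amplify poorly or not at all, so an apparently homozygous female (or a male with no product) needs follow-up by TP-PCR.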
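Sizing against the ladder and converting product length to a repeat number can be sketched as follows. It assumes simple linear interpolation between the two ladder peaks that bracket the sample peak (fragment-analysis software generally offers more refined sizing methods) and a hypothetical assay in which the non-repeat flanking sequence contributes a fixed `flank_bp`; the real value depends on where the primers sit. The ladder positions in the example are made up.

```python
import bisect


def size_from_ladder(peak_scan: float, ladder: list[tuple[float, float]]) -> float:
    """Interpolate a fragment size (bp) from ladder peaks.

    ladder: (scan_position, size_bp) pairs sorted by scan position.
    """
    scans = [scan for scan, _ in ladder]
    if not scans[0] <= peak_scan <= scans[-1]:
        raise ValueError("peak lies outside the ladder range")
    i = max(1, bisect.bisect_left(scans, peak_scan))
    (s0, b0), (s1, b1) = ladder[i - 1], ladder[i]
    return b0 + (peak_scan - s0) * (b1 - b0) / (s1 - s0)


def repeats_from_size(size_bp: float, flank_bp: float) -> int:
    """Each CGG unit adds 3 bp to the product beyond the fixed flanking sequence."""
    return round((size_bp - flank_bp) / 3)


ladder = [(1000, 100), (2000, 200), (3000, 300)]  # hypothetical ladder peaks
size = size_from_ladder(2500, ladder)              # 250.0 bp
print(repeats_from_size(size, flank_bp=220))       # 10 (hypothetical flank)
```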
Describe TP-PCR (triplet-primed PCR).
TP-PCR uses forward and reverse primers (again, one of which is fluorescently labelled) plus a third primer that binds at multiple positions within the triplet repeat. The repeat-specific primer is added at a limiting concentration so that it is exhausted in the early PCR cycles; this avoids preferential amplification of smaller products. The TP-PCR products are also separated by capillary electrophoresis and sized. An expanded allele gives the classic ‘ski-slope’ pattern: a ladder of peaks 3 bp apart that tails off towards the larger end of the repeat.
a) List three differences between the nuclear and mitochondrial genomes.
The mitochondrial genome is a fraction of the size of the nuclear genome (~16.6 kb vs ~3 Gb)
The mitochondrial genome is a small circular molecule, present in many copies per cell
Mitochondrial DNA is inherited maternally only
mtDNA has no introns and very few genes (37: 13 protein-coding, 22 tRNA and 2 rRNA genes)
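Describe the inheritance patterns associated with mitochondrial disease.
Mitochondrial disease can be caused by pathogenic variants in the mtDNA itself (maternally inherited, although some, such as large single deletions, usually arise de novo) or by pathogenic variants in nuclear genes encoding mitochondrial proteins, including those involved in mtDNA maintenance (e.g. POLG). Nuclear causes follow Mendelian inheritance: autosomal recessive, autosomal dominant or X-linked.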
Define the terms heteroplasmy, homoplasmy and mitochondrial bottleneck.
Heteroplasmy – two or more different populations of mtDNA (e.g. wild-type and mutant) coexist within a cell.
Homoplasmy – all copies of the mtDNA within a cell are identical.
Mitochondrial bottleneck – a random shift in mtDNA mutant load between generations (and even between siblings), because the number of mtDNA molecules is sharply reduced in the female germline during oogenesis, so each oocyte is repopulated from a small, random sample.
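Heteroplasmy is usually expressed as the proportion of mtDNA copies carrying the variant, and the bottleneck can be illustrated by random sampling. The Python sketch below assumes heteroplasmy is estimated from sequencing read counts, uses hypothetical cut-offs for calling a variant homoplasmic or undetected, and models the bottleneck as a simple binomial draw of `n_molecules` mtDNA copies passed to an oocyte. It is a toy model, not a quantitative description of oogenesis.

```python
import random


def heteroplasmy_level(variant_reads: int, total_reads: int) -> float:
    """Fraction of mtDNA reads carrying the variant."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return variant_reads / total_reads


def describe_level(level: float, homoplasmic_cutoff: float = 0.95,
                   detection_limit: float = 0.05) -> str:
    """Label a heteroplasmy level; both cut-offs are hypothetical."""
    if level >= homoplasmic_cutoff:
        return "homoplasmic"
    if level >= detection_limit:
        return "heteroplasmic"
    return "not detected"


def simulate_bottleneck(mother_level: float, n_molecules: int,
                        rng: random.Random) -> float:
    """Toy model: each transmitted mtDNA copy is mutant with probability mother_level."""
    mutant = sum(1 for _ in range(n_molecules) if rng.random() < mother_level)
    return mutant / n_molecules


rng = random.Random(42)
# Five 'offspring' of a mother at 50% heteroplasmy: a narrow bottleneck
# (few molecules) spreads offspring levels much more widely than a wide one.
print([simulate_bottleneck(0.5, 10, rng) for _ in range(5)])
print([simulate_bottleneck(0.5, 1000, rng) for _ in range(5)])
```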
Describe three considerations for interpretation of pathogenicity that are unique to mtDNA variants.
Standard ACMG/AMP variant interpretation guidelines were written for nuclear genes and do not translate directly to mtDNA variants.
Inheritance pattern (maternal or nuclear)
Population databases used (mtDNA-specific resources such as MITOMAP, rather than relying on gnomAD alone)
Check heteroplasmy/homoplasmy levels in the proband versus the mother – a homoplasmic variant inherited from an unaffected homoplasmic mother is unlikely to be disease-causing (with the caveat of reduced-penetrance variants, such as the common LHON variants)
Clinicians have referred an adult presenting with optic neuropathy to the highly specialised mitochondrial diagnostic service. Describe the appropriate testing pathway and any relevant candidate genes and variants for targeted analysis
Optic neuropathy is a generic term and can be caused by pathogenic variants in mtDNA (for example in Leber’s hereditary optic neuropathy (LHON)) or in nuclear DNA. Three common LHON variants can be easily identified or excluded by targeted testing: m.11778G>A (MT-ND4), m.3460G>A (MT-ND1) and m.14484T>C (MT-ND6).
If these are negative, full gene screens of these three genes can follow.
If full gene screens are negative, a nuclear gene panel for inherited optic neuropathy (e.g. including OPA1) may be appropriate.
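The first step of this pathway, checking for the three common LHON variants before moving to full gene screening, can be sketched as a lookup. It assumes variants arrive as mtDNA HGVS-style strings such as `m.11778G>A`; the function names and the wording of the suggested next steps are illustrative only.

```python
# The three common LHON variants named above, with the gene each lies in.
COMMON_LHON_VARIANTS = {
    "m.11778G>A": "MT-ND4",
    "m.3460G>A": "MT-ND1",
    "m.14484T>C": "MT-ND6",
}


def common_lhon_hits(called_variants) -> dict[str, str]:
    """Return any common LHON variants present, mapped to their gene."""
    return {v: COMMON_LHON_VARIANTS[v] for v in called_variants
            if v in COMMON_LHON_VARIANTS}


def next_step(called_variants) -> str:
    hits = common_lhon_hits(called_variants)
    if hits:
        return "common LHON variant detected: " + ", ".join(
            f"{v} ({g})" for v, g in sorted(hits.items()))
    return ("common variants excluded: screen MT-ND1, MT-ND4 and MT-ND6 in full; "
            "if negative, consider a nuclear gene panel")


print(next_step(["m.11778G>A"]))  # common LHON variant detected: m.11778G>A (MT-ND4)
```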
Name the gene responsible for encoding mitochondrial DNA polymerase
POLG (polymerase gamma)
What is copy number variation?
A loss or gain of a region of the genome (could be single exon, multi-exon, whole
gene or multiple genes).
What types of genetic/genomic abnormalities can an oligo array NOT detect?
Uniparental disomy
Balanced translocations
Triploidy
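Describe the differences between a SNP array and an oligo array.
An oligo array uses patient DNA and a sex-matched control DNA, which compete for hybridisation to the probes on the array slide. The patient and control DNAs are labelled with different fluorophores (e.g. Cy5 and Cy3), and the ratio of the two signals at each probe, usually expressed as a log2 ratio, shows whether the patient has a gain or loss compared with the control sample.
SNP arrays use thousands of known SNP positions across the genome, and only the patient sample is hybridised; its signals are compared with a reference dataset. At each SNP the patient is genotyped as AA or BB (homozygous) or AB (heterozygous). Copy number is determined from the total signal intensity (log R ratio) together with the pattern of genotypes (B allele frequency): a deletion shows only homozygous genotypes across the region, while a duplication produces extra genotype states (AAB and ABB). Because they genotype SNPs, SNP arrays can also detect copy-neutral regions of homozygosity (e.g. uniparental isodisomy or consanguinity), which an oligo array cannot.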
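The expected signals behind these two approaches can be worked out from copy number alone. The Python sketch below computes the idealised log2 ratio against a diploid reference (the quantity an oligo array reports) and the B allele fractions possible at a SNP for a given copy number (the extra information a SNP array provides). These values assume a pure, non-mosaic sample; observed log2 ratios are usually compressed towards zero.

```python
import math
from fractions import Fraction


def expected_log2_ratio(copy_number: int, reference_copies: int = 2) -> float:
    """Idealised patient/reference log2 ratio for one genomic segment."""
    if copy_number == 0:
        return float("-inf")  # homozygous deletion: no patient signal
    return math.log2(copy_number / reference_copies)


def expected_bafs(copy_number: int) -> list[Fraction]:
    """Possible B allele fractions: with n copies, 0..n of them can carry the B allele."""
    if copy_number == 0:
        return []
    return [Fraction(b, copy_number) for b in range(copy_number + 1)]


for cn in (1, 2, 3):
    print(cn, round(expected_log2_ratio(cn), 2), [str(f) for f in expected_bafs(cn)])
# 1 -1.0 ['0', '1']                 deletion: homozygous-looking calls only
# 2 0.0 ['0', '1/2', '1']           normal diploid
# 3 0.58 ['0', '1/3', '2/3', '1']   duplication: extra heterozygous states
```

A region with B allele fractions only at 0 and 1 but a log2 ratio near 0 is copy-neutral loss of heterozygosity, which is why isodisomic UPD is visible on a SNP array but not on an oligo array.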
Briefly explain the use of the 3 resources/databases that you would use to aid interpretation of the clinical significance of a copy number change.
Database of Genomic Variants (DGV) – the DGV ‘gold standard’ track provides information on the frequency of your copy number variant in the population. For example, a CNV with a frequency of 0.80% in a population of 17,000 (about 136 individuals) would be too common to cause a rare, highly penetrant disorder.
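The DGV example is simple arithmetic: a frequency of 0.80% in a cohort of 17,000 corresponds to roughly 136 individuals carrying the CNV. A minimal Python sketch of the conversion in both directions (the function names are illustrative):

```python
def carriers_from_frequency(frequency_percent: float, cohort_size: int) -> int:
    """Approximate number of individuals carrying a CNV at a given frequency (%)."""
    return round(frequency_percent * cohort_size / 100)


def frequency_from_carriers(carriers: int, cohort_size: int) -> float:
    """Population frequency (%) of a CNV seen in `carriers` of `cohort_size` people."""
    return 100 * carriers / cohort_size


print(carriers_from_frequency(0.80, 17_000))  # 136
```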
ClinGen – This resource provides information on dosage sensitivity and gives a haploinsufficiency score and a triplosensitivity score for each gene or region in the CNV call. For example, a deletion that completely contains a gene with a haploinsufficiency score of 3 (sufficient evidence) would generally be classified as pathogenic.
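ClinGen reports haploinsufficiency (HI) and triplosensitivity (TS) on a fixed scale (0 no evidence, 1 little, 2 some, 3 sufficient evidence; 30 autosomal recessive; 40 dosage sensitivity unlikely). The Python sketch below records that scale and picks out genes with a score of 3 that lie wholly within a CNV, using HI for deletions and TS for duplications. The gene names and scores in the example are hypothetical, and a real classification would apply the full ACMG/ClinGen CNV scoring framework rather than this single check.

```python
CLINGEN_DOSAGE_SCORES = {
    0: "No evidence available",
    1: "Little evidence for dosage pathogenicity",
    2: "Some evidence for dosage pathogenicity",
    3: "Sufficient evidence for dosage pathogenicity",
    30: "Gene associated with autosomal recessive phenotype",
    40: "Dosage sensitivity unlikely",
}


def established_dosage_genes(genes: dict[str, tuple[int, int]], cnv_type: str) -> list[str]:
    """Return genes fully inside the CNV with sufficient dosage evidence.

    genes: gene -> (HI score, TS score); HI is used for deletions, TS for duplications.
    """
    if cnv_type not in ("deletion", "duplication"):
        raise ValueError("cnv_type must be 'deletion' or 'duplication'")
    index = 0 if cnv_type == "deletion" else 1
    return sorted(g for g, scores in genes.items() if scores[index] == 3)


# Hypothetical genes completely contained in a CNV call.
genes_in_cnv = {"GENE_A": (3, 1), "GENE_B": (1, 0), "GENE_C": (40, 2)}
for gene in established_dosage_genes(genes_in_cnv, "deletion"):
    print(gene, "-", CLINGEN_DOSAGE_SCORES[genes_in_cnv[gene][0]])
```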
DECIPHER – large database of patient CNVs and phenotypes contributed by clinical centres. It can be used to determine whether your CNV has been seen before, the phenotype of the patient(s) with this CNV, the reporting laboratory and any overlapping features with similar patients.
Briefly describe a known microdeletion syndrome region involving chromosome 16; include location, key gene(s) involved and provide two clinical features.
16p11.2 microdeletion syndrome: a recurrent deletion at 16p11.2 (short arm of chromosome 16) that includes the TBX6 gene. Features include intellectual disability and developmental delay, and some patients also show autistic behaviours.
Many newly described microdeletion or microduplication syndromes detected by
microarray are subject to reduced penetrance and variable expressivity. Define these
terms.
Reduced penetrance – Not all people with the genetic change will display the
features associated with that disorder.
Variable expressivity – The phenotype of the disorder is variable amongst
individuals, even those within the same family.
a) Give 3 clinical features of Prader Willi syndrome.
Intellectual disability
Obesity
Hypotonia in infancy
Hyperphagia
Short stature
Strabismus
Give 3 clinical features of Angelman syndrome.
Seizures
Characteristic hand movements
Inappropriate laughter
Define the chromosomal region associated with Prader-Willi syndrome (PWS) and Angelman syndrome (AS).
15q11.2-15q13