18.02.23 Finding disease-related genes using NGS Flashcards
Give an overview of rare disease. Which techniques and projects have already made a significant contribution to the detection of disease-causing genes?
Approx. 7,000 rare diseases (each affecting <1 in 2,000 individuals); ~80% of these are likely to have a genetic cause (Wright et al 2018), but no genetic cause has been identified in many patients.
Linkage analysis, array CGH and GWAS have contributed to the identification of many disease genes, but they have limitations, e.g. linkage analysis requires large families.
Large research projects have also utilised (or will utilise) NGS to identify new genes associated with human disease, e.g.
DDD project (Wright et al 2015)
100,000 Genomes Project (UK)
What are the advantages of WGS?
WGS covers the whole genome, including non-coding regions
More complete results
Reduces coverage problems in e.g. GC-rich/repetitive regions, as no capture step is needed
Can detect all variant types (some methods still in development)
Single nucleotide variants
Copy number variants
Structural variants including balanced rearrangements not detected by array (see Harripaul et al 2017)
Short tandem repeat detection is improving (though this is currently a limitation)
Mosaicism can be detected if coverage is good enough
Can sequence a cohort of affected individuals or use trios and apply filtering and statistical analysis to identify likely variants/genes
What is WES?
Sequencing all of the protein-coding regions of the genome (1-2% of the total genome)
Most common method used to date
What are the advantages of WES?
Less biased than a targeted approach
Cheaper than WGS (and quicker to analyse as less data)
What are the disadvantages of WES?
No coverage of the non-coding regions
Coverage of repetitive regions, GC-rich regions etc. can be poor, risking false-negative results.
Less suitable than WGS for detecting some variant types (e.g. structural variants)
What are the advantages of WGS?
Sequences the whole genome without capture bias, including non-coding regions
Can detect all types of variant (though calling e.g. STRs is not currently done routinely and requires highly specialised variant-calling strategies)
Includes balanced chromosomal rearrangements (e.g. Schluth-Bolard et al 2013)
What are the disadvantages of WGS?
More data generated, which can be costly to analyse
Approximately how many variant calls are generated through WES/WGS methods?
NGS methods, particularly WES/WGS, generate large numbers of variant calls (4-5 million variants per person, ~30,000 coding variants; Wright et al 2018).
The analysis strategy used in gene discovery will depend on the sample data set (e.g. trio vs proband-only cohort) and on prior knowledge about the likely mode of inheritance, penetrance etc.
Give examples of how variants can be filtered.
Quality metrics
MAF
Mutation type, e.g. LoF
Genes involved in a particular biological pathway
Protein-protein interactions
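The filters listed above are usually applied one after another. The following is a minimal Python sketch. It assumes each call has already been annotated with call quality, read depth, population MAF and a predicted consequence. The `Variant` class, the threshold values, the gene names and the consequence labels (written in the style of Sequence Ontology terms) are all illustrative assumptions. Protein-protein interaction filtering is not shown.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str
    qual: float        # call quality (e.g. Phred-scaled)
    depth: int         # read depth at the site
    maf: float         # population minor allele frequency
    consequence: str   # predicted effect, e.g. "stop_gained"


# Hypothetical set of loss-of-function (LoF) consequence labels
LOF_CONSEQUENCES = {"stop_gained", "frameshift_variant",
                    "splice_acceptor_variant", "splice_donor_variant"}


def filter_variants(variants, min_qual=30.0, min_depth=10, max_maf=0.001,
                    lof_only=True, pathway_genes=None):
    """Apply successive filters; thresholds are illustrative, not recommendations."""
    kept = []
    for v in variants:
        if v.qual < min_qual or v.depth < min_depth:
            continue  # fails quality metrics
        if v.maf > max_maf:
            continue  # too common in the population for a rare disease
        if lof_only and v.consequence not in LOF_CONSEQUENCES:
            continue  # not a predicted loss-of-function change
        if pathway_genes is not None and v.gene not in pathway_genes:
            continue  # gene not in the biological pathway of interest
        kept.append(v)
    return kept


calls = [
    Variant("GENE_A", 50.0, 35, 0.0, "stop_gained"),
    Variant("GENE_B", 12.0, 40, 0.0, "frameshift_variant"),  # low quality
    Variant("GENE_C", 60.0, 30, 0.05, "stop_gained"),        # too common
    Variant("GENE_D", 55.0, 28, 0.0001, "missense_variant"), # not LoF
]
print([v.gene for v in filter_variants(calls)])  # ['GENE_A']
```

The filters are cheap checks applied in a fixed order. In practice the order matters less than the thresholds, which depend on the expected inheritance model and penetrance.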
For what expected inheritance patterns is it useful to sequence siblings?
Autosomal recessive – affected siblings are sequenced to identify shared variation; compound heterozygosity is expected in the absence of consanguinity or of occurrence in an isolated population.
Consanguineous autosomal recessive – affected siblings are sequenced to identify shared homozygous variants.
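Both sibling strategies reduce to set operations on the siblings' genotype calls. Here is a minimal Python sketch. It assumes each sibling's calls are held as a dict that maps a hypothetical variant ID to (gene, genotype), with genotypes simplified to "het"/"hom". It also assumes all affected siblings share the same causal variants.

```python
from collections import defaultdict


def shared_homozygous(siblings):
    """Variant IDs homozygous in every affected sibling (consanguineous AR model)."""
    hom_sets = [{vid for vid, (_, gt) in sib.items() if gt == "hom"} for sib in siblings]
    return set.intersection(*hom_sets)


def compound_het_candidate_genes(siblings):
    """Genes in which all affected siblings share two or more heterozygous variants.

    Phase is NOT checked: confirming the variants are on different alleles
    (in trans) normally requires parental samples.
    """
    het_sets = [{vid for vid, (_, gt) in sib.items() if gt == "het"} for sib in siblings]
    shared = set.intersection(*het_sets)
    by_gene = defaultdict(set)
    for vid in shared:
        gene = siblings[0][vid][0]  # shared IDs are present in every sibling
        by_gene[gene].add(vid)
    return {gene for gene, vids in by_gene.items() if len(vids) >= 2}


# Hypothetical calls: variant ID -> (gene, genotype)
sib1 = {"v1": ("GENE_A", "hom"), "v2": ("GENE_B", "het"),
        "v3": ("GENE_B", "het"), "v4": ("GENE_C", "het")}
sib2 = {"v1": ("GENE_A", "hom"), "v2": ("GENE_B", "het"),
        "v3": ("GENE_B", "het"), "v5": ("GENE_C", "het")}
print(shared_homozygous([sib1, sib2]))             # {'v1'}
print(compound_het_candidate_genes([sib1, sib2]))  # {'GENE_B'}
```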
What strategy is useful for identifying a gene associated with X-linked recessive disease?
The favoured strategy is to analyse the two most remotely related affected male family members. Autosomal variants can be disregarded.
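A minimal Python sketch of this filter follows. It assumes calls are keyed by (chromosome, position, ref, alt) and annotated with a population MAF. The MAF cut-off and the example coordinates are hypothetical. Two remotely related males share only a small part of their X chromosome by descent, so the intersection of their rare X-linked variants is expected to be short.

```python
def xlinked_candidates(male_1, male_2, max_maf=0.001):
    """Rare X-chromosome variants shared by two distantly related affected males.

    Each input maps a variant key (chrom, pos, ref, alt) to its population MAF.
    Autosomal calls are simply discarded.
    """
    def rare_x(calls):
        return {key for key, maf in calls.items()
                if key[0] in ("X", "chrX") and maf <= max_maf}
    return rare_x(male_1) & rare_x(male_2)


# Hypothetical calls for two distantly related affected males
male_1 = {("X", 1000, "A", "G"): 0.0, ("X", 2000, "C", "T"): 0.0, ("1", 500, "G", "A"): 0.0}
male_2 = {("X", 1000, "A", "G"): 0.0, ("X", 3000, "T", "C"): 0.0, ("1", 500, "G", "A"): 0.0}
print(xlinked_candidates(male_1, male_2))  # {('X', 1000, 'A', 'G')}
```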
What strategy is useful for identifying de novo dominant mutations?
Analysis of data from unaffected parents-affected child trios generally produces a handful of de novo variants for further analysis; comparison of these variants between as few as two families will generally reduce these to a single candidate gene.
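The trio step and the cross-family step can both be sketched as set operations in Python. This sketch assumes each sample's calls are sets of hypothetical variant IDs and that a variant-to-gene lookup is available. Real de novo calling also checks read depth and genotype quality in the parents, which is left out here. The comparison is made at gene level because unrelated families rarely carry the exact same de novo variant.

```python
def de_novo(child, mother, father):
    """Variant IDs called in the child but in neither unaffected parent."""
    return child - mother - father


def shared_de_novo_genes(trios, gene_of):
    """Genes carrying a de novo variant in every family.

    trios: list of (child, mother, father) sets of variant IDs
    gene_of: dict mapping variant ID -> gene symbol
    """
    gene_sets = [{gene_of[v] for v in de_novo(*trio)} for trio in trios]
    return set.intersection(*gene_sets)


family_1 = ({"a1", "a2", "p1"}, {"p1"}, set())  # child, mother, father
family_2 = ({"b1", "b2", "p2"}, set(), {"p2"})
gene_of = {"a1": "GENE_X", "a2": "GENE_Y", "b1": "GENE_X", "b2": "GENE_Z"}
print(shared_de_novo_genes([family_1, family_2], gene_of))  # {'GENE_X'}
```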
How can mosaic mutations be identified?
The comparison of sequence data from a patient’s affected and unaffected tissue is frequently sufficient to identify de novo mosaic disease-causing mutations.
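The tissue comparison can be sketched in Python by contrasting the variant allele fraction (VAF) in each tissue. This assumes (alternate-allele reads, total reads) are available per variant for both tissues. The depth cut-off and VAF thresholds are illustrative assumptions, not validated values. The depth check reflects the fact that low-level mosaicism can only be seen with good coverage.

```python
def vaf(alt_reads, total_reads):
    """Variant allele fraction; 0.0 when there is no coverage."""
    return alt_reads / total_reads if total_reads else 0.0


def mosaic_candidates(affected, unaffected, min_depth=100, min_vaf=0.05, max_bg_vaf=0.01):
    """Sites present in affected tissue but essentially absent from unaffected tissue.

    Each input maps a variant ID to (alt_reads, total_reads).
    Thresholds are illustrative only.
    """
    hits = {}
    for vid, (alt, depth) in affected.items():
        bg_alt, bg_depth = unaffected.get(vid, (0, 0))
        if depth < min_depth or bg_depth < min_depth:
            continue  # too little coverage to call low-level mosaicism reliably
        if vaf(alt, depth) >= min_vaf and vaf(bg_alt, bg_depth) <= max_bg_vaf:
            hits[vid] = round(vaf(alt, depth), 3)
    return hits


affected = {"v1": (30, 200), "v2": (100, 200), "v3": (10, 40)}
unaffected = {"v1": (0, 180), "v2": (95, 190), "v3": (0, 150)}
print(mosaic_candidates(affected, unaffected))  # {'v1': 0.15}
```

In this toy data, v2 is present at ~50% in both tissues, so it looks germline rather than mosaic. v3 is dropped because coverage in the affected tissue is too low.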
Why are statistical methods often not reliable for identifying rare causative variants from NGS data? What specialist statistical methods exist to help identify disease-causing genes?
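Standard single-variant association tests (as used in GWAS) lack power for rare variants because each variant is seen in too few individuals; gene-based methods instead aggregate rare variants within a gene.
CAST (Cohort Allelic Sums Test) compares the total extent of rare variation in a specific gene among patients and controls.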
CMC (combined multivariate and collapsing method) or WST (weighted sums test): weighted to account for the fact that large genes are more likely to accumulate rare variation.
SKAT (sequence kernel association test), the C-alpha test and EREC (estimated regression coefficient test): test each variant for association with disease independently, then combine the results across the variants in a gene to identify disease-associated genes. This allows for the fact that some variants might be protective and would otherwise cancel out the signal from risk variants.
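The contrast between a collapsing test such as CAST and a per-variant test such as C-alpha can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation of either method. The CAST sketch collapses rare variants into a carrier/non-carrier indicator and applies a one-sided Fisher exact test; the choice of Fisher's test is an assumption. The C-alpha sketch uses the statistic as commonly formulated, with the null variance summed over variants. Function names and the toy counts are hypothetical.

```python
from math import comb, sqrt


def cast_pvalue(case_carriers, n_cases, control_carriers, n_controls):
    """CAST-style burden test (sketch).

    All rare variants in a gene are collapsed into one carrier/non-carrier
    indicator per person; returns a one-sided Fisher exact p-value for an
    excess of carriers among cases.
    """
    n_total = n_cases + n_controls
    k_total = case_carriers + control_carriers
    # P(X >= observed case carriers), X hypergeometric under no association
    tail = sum(
        comb(k_total, x) * comb(n_total - k_total, n_cases - x)
        for x in range(case_carriers, min(k_total, n_cases) + 1)
    )
    return tail / comb(n_total, n_cases)


def c_alpha(variant_counts, p0):
    """C-alpha statistic T and Z score (sketch).

    variant_counts: list of (y_i, n_i) per rare variant, where n_i copies were
    observed in total and y_i of them in cases.
    p0: expected proportion of copies in cases under the null.
    Deviations are squared, so excesses in cases (risk) and in controls
    (protective) both increase T instead of cancelling out.
    """
    base = p0 * (1 - p0)
    t = sum((y - n * p0) ** 2 - n * base for y, n in variant_counts)
    var = 0.0
    for _, n in variant_counts:
        # Null variance, assuming each y_i ~ Binomial(n_i, p0) independently
        for u in range(n + 1):
            prob = comb(n, u) * p0**u * (1 - p0) ** (n - u)
            var += ((u - n * p0) ** 2 - n * base) ** 2 * prob
    z = t / sqrt(var) if var > 0 else 0.0
    return t, z


# Hypothetical gene in 20 cases and 20 controls: two risk variants each carried
# by 4 different cases, two protective variants each carried by 4 different controls.
print(cast_pvalue(8, 20, 8, 20) > 0.5)  # True: no excess of carriers in cases
t, z = c_alpha([(4, 4), (4, 4), (0, 4), (0, 4)], p0=0.5)
print(t, round(z, 2))  # 12.0 4.9
```

In this toy gene, cases and controls carry rare variants at the same overall rate, so the collapsed test finds nothing. The C-alpha Z score is large because two variants are seen only in cases and two only in controls, which is the protective-variant situation described above.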
What are the limitations of using an NGS approach for gene discovery?
Interpretation of results is still challenging
Variants in non-coding regions are difficult to interpret without substantial additional functional work, which is not always possible
Validating the role of novel genes: MacArthur et al 2014 proposed standards for assessing the likely pathogenicity of genes and variants (Table 1 from MacArthur et al)
Cohort studies
Phenotypic data (HPO terms)
Incidental findings - careful consent required