19.02.23 Finding disease related genes using NGS Flashcards
Name 3 methods for gene discovery by NGS
1) Targeted panels
2) WES
3) WGS
What are targeted panels suitable for?
- Genetically highly heterogeneous but clinically relatively defined disease
- Ones that involve 100-200 genes, such as deafness, retinitis pigmentosa, X-linked intellectual disability, cardiomyopathy
What are the considerations for targeted panels?
- Requires list of candidate genes
- Usually provides close to 100% coverage of the targeted regions
- Requires knowledge of full systems biology involved in disease (including complete roles and redundancy of different genes/proteins)
What is WES suitable for?
- Genetically diverse cases and/or multiple patterns of inheritance, e.g. RP, which is 25% autosomal dominant, 20% AR and 10% X-linked, with mutations in many genes for each inheritance pattern; furthermore, many RP genes have not yet been identified
- With WES you sequence all protein coding regions of genome (~20,000 genes) (1-2% of total genome)
What are the advantages and disadvantages of WES?
- Advantages:
1) Less biased than targeted approach
2) Cheaper than WGS
3) Quicker to analyse than WGS as less data
- Disadvantages:
1) No coverage of non-coding regions
2) Not 100% length coverage, usually miss 5-15% of coding regions of interest
3) Might miss variants due to limitations of the technique, e.g. coverage of repetitive and GC-rich regions can be poor, increasing the likelihood of false-negative results
4) Not as good as WGS for detecting all types of variants (e.g. structural variants)
What is WGS?
- Sequence entire genome including gene regulatory regions that do not directly encode proteins
- Then use filtering and statistical analysis to identify likely variants/genes
What are the advantages and disadvantages of WGS?
- Advantages:
1) No bias
2) Includes non-coding regions
3) Fewer issues with GC rich/repetitive regions
4) Detects all types of variant, e.g. balanced chromosomal rearrangements not detected by array, and mosaic variants (if coverage sufficient)
- Disadvantages:
1) More data generated - costly to analyse
2) Currently limited coverage of short tandem repeats
List the factors that affect the analysis strategy of NGS
- Quality
- Allele freq
- Assumed inheritance pattern
- Predicted consequence
- Specialist statistical tests
- Systems biology/pathway analysis
- Variable expressivity /incomplete penetrance / environmental factors
Analysis strategy - Quality
- Variants initially filtered according to quality (e.g. minimum depth, variant call quality)
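The quality filter above can be sketched in a few lines of Python. This is a minimal illustration only: the `Variant` record, its field names and the thresholds (`MIN_DEPTH`, `MIN_QUAL`) are assumptions made for the example, not values taken from these notes.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str
    depth: int    # read depth at the site (comparable to a VCF DP value)
    qual: float   # variant call quality (comparable to the VCF QUAL column)


# Hypothetical thresholds, chosen only for illustration.
MIN_DEPTH = 20
MIN_QUAL = 30.0


def passes_quality(v, min_depth=MIN_DEPTH, min_qual=MIN_QUAL):
    """Keep a variant only if both depth and call quality meet the minimums."""
    return v.depth >= min_depth and v.qual >= min_qual


variants = [
    Variant("1", 1000, "A", "G", depth=45, qual=99.0),   # passes both checks
    Variant("2", 2000, "C", "T", depth=8, qual=60.0),    # fails on depth
    Variant("X", 3000, "G", "A", depth=30, qual=12.5),   # fails on quality
]

high_quality = [v for v in variants if passes_quality(v)]
```

In practice the thresholds would depend on the platform and the variant caller used.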
Analysis strategy - Allele frequency data
- Used to filter out variants present at high frequency in the “normal” population (e.g. gnomAD)
- Useful for rare disease; caution required for more common/non-Mendelian disorders and multi-hit processes, e.g. cancers
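As a rough sketch of allele frequency filtering, the snippet below keeps only variants that are rare in a reference population. The lookup table `population_af`, the variant keys and the cut-off `MAX_AF` are all hypothetical; a real analysis would take frequencies from a resource such as gnomAD and choose a cut-off suited to the disorder.

```python
# Hypothetical population allele frequencies keyed by (chrom, pos, ref, alt).
population_af = {
    ("1", 1000, "A", "G"): 0.25,      # common polymorphism
    ("2", 2000, "C", "T"): 0.00002,   # very rare
}

# Assumed cut-off for a rare Mendelian disorder; illustrative only.
MAX_AF = 0.001


def is_rare(key, af_table, max_af=MAX_AF):
    """Treat variants absent from the reference table as frequency 0 (rare)."""
    return af_table.get(key, 0.0) <= max_af


patient_variants = [
    ("1", 1000, "A", "G"),
    ("2", 2000, "C", "T"),
    ("3", 3000, "G", "A"),   # not present in the reference table
]

rare_variants = [key for key in patient_variants if is_rare(key, population_af)]
```

Treating absent variants as rare is a design choice: it keeps novel variants in play, but it also lets through sites that are simply poorly covered in the reference dataset.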
Analysis strategy - Assumed inheritance pattern
- influences selection and number of individuals to sequence, as well as analytical approach:
1) AR – affected siblings sequenced to identify shared variation i.e. homozygous variants for consanguineous families; compound heterozygosity in absence of consanguinity/occurrence in isolated population
2) XLR - analyse two most remotely related male relatives; autosomal variants disregarded
3) AD – mapping of gene to discrete chromosomal region (i.e. <2 Mb) may allow gene identification from analysis of single individual; larger genomic regions/ diseases not mapped require analysis of greater number of individuals
4) De novo dominant variants – analysis of unaffected parents-affected child trios identifies handful of de novo variants for further analysis; comparing variants in ≥2 families will often yield single candidate gene.
5) Mosaic variants – comparing sequence data from patient’s affected and unaffected tissue usually sufficient to identify de novo mosaic disease-causing variants
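The trio approach for de novo dominant variants (point 4 above) can be illustrated with a small sketch. Genotypes are represented as alternate allele counts, and all gene names, variant IDs and the helper `de_novo_candidates` are hypothetical; real pipelines also need to account for sequencing error and low parental coverage.

```python
def de_novo_candidates(child, mother, father):
    """Return variants present in the child but absent from both parents.

    Each argument maps (gene, variant_id) -> alternate allele count (0, 1 or 2).
    """
    return {
        key
        for key, alt_count in child.items()
        if alt_count > 0
        and mother.get(key, 0) == 0
        and father.get(key, 0) == 0
    }


# Hypothetical trio 1: the GENE_C variant is inherited from the mother.
child_1 = {("GENE_A", "v1"): 1, ("GENE_B", "v2"): 1, ("GENE_C", "v3"): 1}
mother_1 = {("GENE_C", "v3"): 1}
father_1 = {}

# Hypothetical trio 2: the GENE_D variant is inherited from the father.
child_2 = {("GENE_B", "v4"): 1, ("GENE_D", "v5"): 1}
mother_2 = {}
father_2 = {("GENE_D", "v5"): 1}

dn_1 = de_novo_candidates(child_1, mother_1, father_1)
dn_2 = de_novo_candidates(child_2, mother_2, father_2)

# Comparing families: genes carrying a de novo variant in both trios.
genes_1 = {gene for gene, _ in dn_1}
genes_2 = {gene for gene, _ in dn_2}
shared_genes = genes_1 & genes_2
```

Here each trio alone leaves more than one possibility open, but intersecting the two families narrows the list to a single candidate gene.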
Analysis strategy - Predicted consequence
- Variants may be filtered/stratified/prioritised according to predicted consequence (e.g. LoF, missense, splice, intronic)
- Many strategies initially focus on predicted LoF variants, as these are assumed to have the greatest functional impact
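One way to picture prioritisation by predicted consequence is a simple ranking table. The rank order below (LoF first, then splice, missense, intronic) and the variant labels are assumptions for illustration; the notes only state that LoF variants are usually examined first.

```python
# Hypothetical ranking: a lower number means the class is examined earlier.
CONSEQUENCE_RANK = {
    "LoF": 0,
    "splice": 1,
    "missense": 2,
    "intronic": 3,
}


def prioritise(variants):
    """Sort (variant_id, consequence) pairs by rank; unknown classes go last."""
    worst = len(CONSEQUENCE_RANK)
    return sorted(variants, key=lambda v: CONSEQUENCE_RANK.get(v[1], worst))


calls = [
    ("var1", "intronic"),
    ("var2", "missense"),
    ("var3", "LoF"),
    ("var4", "splice"),
]

ranked = prioritise(calls)

# A first pass restricted to predicted loss-of-function variants.
lof_first_pass = [v for v in ranked if v[1] == "LoF"]
```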
Analysis strategy - Specialist statistical tests
- identify gene association with disease
- only assess gene variations in aggregate - do not aid interpretation of pathogenicity of any specific variant
- provide evidence of whether gene under investigation is involved in disease being studied
1) CAST (cohort allelic sums test) = compares total extent of rare variation in a specific gene among patients and controls
2) CMC (combined multivariate and collapsing method) or WST (weighted sum test) = factors in gene size, i.e. large genes are more likely to accumulate rare variation
3) SKAT (sequence kernel association test), the C-alpha test, and EREC (estimated regression coefficient test) = test each variant for association with disease independently, then combine results across multiple DNA sequences to identify disease-associated genes (allows for the fact that some variants might be protective and cancel out risk variants)
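To make the idea of an aggregate (gene-level) test concrete, here is a simplified sketch in the spirit of CAST: each individual is collapsed to "carries at least one rare variant in the gene" or not, and carrier counts in patients and controls are compared. Using a one-sided Fisher's exact test for the comparison is an assumption made for this example, as are the counts; it is not presented as the exact published procedure.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact test for enrichment of carriers in cases.

    2x2 table: a = case carriers,    b = case non-carriers,
               c = control carriers, d = control non-carriers.
    Returns P(case carriers >= a) under the hypergeometric null.
    """
    n_cases = a + b
    n_controls = c + d
    carriers = a + c
    denom = comb(n_cases + n_controls, carriers)
    p = 0.0
    for k in range(a, min(n_cases, carriers) + 1):
        # math.comb returns 0 when more carriers are requested than controls exist.
        p += comb(n_cases, k) * comb(n_controls, carriers - k) / denom
    return p


# Hypothetical per-individual counts of rare variants in one gene.
case_counts = [1] * 10 + [0] * 90      # 10 of 100 patients carry a rare variant
control_counts = [1] * 2 + [0] * 98    # 2 of 100 controls carry a rare variant

case_carriers = sum(1 for n in case_counts if n > 0)
control_carriers = sum(1 for n in control_counts if n > 0)

p_value = fisher_one_sided(
    case_carriers,
    len(case_counts) - case_carriers,
    control_carriers,
    len(control_counts) - control_carriers,
)
```

As the notes say, a small p-value here supports involvement of the gene as a whole; it says nothing about whether any individual variant is pathogenic.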
Analysis strategy - Systems biology/pathway analysis
- variants filtered for genes in known biological pathways and/or protein-interactions with other genes previously associated with phenotype
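The filtering described above can be sketched with plain set operations. All gene names, pathway names and interaction pairs below are hypothetical placeholders; real analyses would draw on curated pathway and protein-interaction databases.

```python
# Hypothetical inputs.
known_disease_genes = {"GENE_K1", "GENE_K2"}
pathway_members = {
    "pathway_1": {"GENE_K1", "GENE_X", "GENE_Y"},
    "pathway_2": {"GENE_Z"},
}
interactions = {("GENE_K2", "GENE_W")}   # pairs of interacting gene products


def related_genes(known, pathways, interaction_pairs):
    """Collect genes sharing a pathway or an interaction with a known disease gene."""
    related = set()
    for members in pathways.values():
        if members & known:            # pathway contains a known disease gene
            related |= members
    for a, b in interaction_pairs:
        if a in known:
            related.add(b)
        if b in known:
            related.add(a)
    return related


related = related_genes(known_disease_genes, pathway_members, interactions)

# Genes carrying candidate variants in the patient (hypothetical).
candidate_genes = ["GENE_X", "GENE_Q", "GENE_W", "GENE_Z"]
kept = [g for g in candidate_genes if g in related]
```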
How do we validate the role of novel genes?
- MacArthur et al (2014) proposed standards for assessing likely pathogenicity of genes and variants
- Gene burden - does it show excess of rare de novo probably damaging variants that segregate in cases and not controls?
- Protein interactions - does the gene product interact with proteins that have previously been implicated in the disease?
- Biochem function - does the gene's product perform a biochem function shared with other known disease genes?
- Expression - is the gene expressed in tissues relevant to the disease of interest? or does it show altered expression in people with the disease?
- Gene disruption - is the gene or gene function altered in people with the disease?
- Rescue - can the cellular phenotype in patient-derived cells be rescued by addition of WT product?