19.02.23 Finding disease related genes using NGS Flashcards

1
Q

Name 3 methods for gene discovery by NGS

A

1) Targeted panels
2) WES
3) WGS

2
Q

What are targeted panels suitable for?

A
  • Diseases that are genetically highly heterogeneous but clinically relatively well defined
  • Typically those involving ~100-200 genes, such as deafness, retinitis pigmentosa (RP), X-linked intellectual disability and cardiomyopathy
3
Q

What are the considerations for targeted panels?

A
  • Requires a list of candidate genes
  • Usually provides close to 100% coverage of the targeted genes
  • Requires knowledge of the full systems biology involved in the disease (including the complete roles and redundancy of different genes/proteins)
4
Q

What is WES suitable for?

A
  • Genetically diverse cases and/or multiple patterns of inheritance, e.g. RP, which is ~25% autosomal dominant (AD), 20% autosomal recessive (AR) and 10% X-linked, with mutations in many genes for each inheritance pattern; in addition, many RP genes have yet to be identified
  • WES sequences all protein-coding regions of the genome (~20,000 genes; 1-2% of the total genome)
5
Q

What are the advantages and disadvantages of WES?

A
  • Advantages:
    1) Less biased than a targeted approach
    2) Cheaper than WGS
    3) Quicker to analyse than WGS, as less data is generated
  • Disadvantages:
    1) No coverage of non-coding regions
    2) Incomplete coverage: usually misses 5-15% of the coding regions of interest
    3) May miss variants due to technical limitations, e.g. coverage of repetitive and GC-rich regions can be poor, increasing the likelihood of false-negative results
    4) Not as good as WGS for detecting all types of variant (e.g. structural variants)
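The incomplete-coverage point can be made concrete with a small calculation. A minimal sketch, assuming per-base read depths over a target region are already available (the depths below are made up) and using an assumed 20x threshold for "covered":

```python
# Sketch: fraction of targeted coding bases covered at or above a depth threshold.
# The per-base depths and the 20x threshold are illustrative assumptions.

def fraction_covered(depths: list[int], min_depth: int = 20) -> float:
    """Return the fraction of target bases with read depth >= min_depth."""
    if not depths:
        raise ValueError("no target bases supplied")
    covered = sum(1 for d in depths if d >= min_depth)
    return covered / len(depths)


# Hypothetical depths for a 10-base target; two bases (e.g. GC-rich) are poorly covered.
target_depths = [35, 42, 50, 18, 5, 40, 38, 44, 29, 31]
frac = fraction_covered(target_depths)
print(f"{frac:.0%} of target bases at >=20x; {1 - frac:.0%} missed")
```

Bases below the threshold are where a true variant could go uncalled, which is one source of the false negatives described above.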
6
Q

What is WGS?

A
  • Sequences the entire genome, including non-coding regions such as gene regulatory elements that do not directly encode proteins
  • Filtering and statistical analysis are then used to identify likely causative variants/genes
7
Q

What are the advantages and disadvantages of WGS?

A
  • Advantages:
    1) Minimal bias (no target-capture step)
    2) Includes non-coding regions
    3) Fewer issues with GC-rich/repetitive regions
    4) Can detect a broad range of variant types, e.g. balanced chromosomal rearrangements (not detected by array) and mosaic variants (if coverage is sufficient)
  • Disadvantages:
    1) More data generated, so costly to analyse
    2) Currently limited coverage of short tandem repeats
8
Q

List the factors that affect the analysis strategy of NGS

A
  • Quality
  • Allele frequency
  • Assumed inheritance pattern
  • Predicted consequence
  • Specialist statistical tests
  • Systems biology/pathway analysis
  • Variable expressivity / incomplete penetrance / environmental factors
9
Q

Analysis strategy - Quality

A
  • Variants initially filtered according to quality (e.g. minimum depth, variant call quality)
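A quality filter of this kind reduces to a per-call threshold check. A minimal sketch, assuming each call carries a read depth and a Phred-scaled call quality (as in the VCF DP and QUAL fields); the `Variant` class, gene names and the thresholds of 10 reads and QUAL 30 are hypothetical choices for illustration:

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str
    depth: int    # read depth at the site (cf. VCF DP)
    qual: float   # Phred-scaled variant call quality (cf. VCF QUAL)


def passes_quality(v: Variant, min_depth: int = 10, min_qual: float = 30.0) -> bool:
    """Keep a call only if both depth and call quality meet the (assumed) thresholds."""
    return v.depth >= min_depth and v.qual >= min_qual


calls = [
    Variant("GENE_A", depth=45, qual=250.0),
    Variant("GENE_B", depth=6, qual=80.0),   # fails: too few reads
    Variant("GENE_C", depth=30, qual=12.0),  # fails: low call quality
]
kept = [v for v in calls if passes_quality(v)]
print([v.gene for v in kept])  # ['GENE_A']
```

In practice thresholds are tuned per pipeline and sequencing platform; the values here only show the shape of the filter.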
10
Q

Analysis strategy - Allele frequency data

A
  • Used to filter out variants present at high frequency in the "normal" population (e.g. gnomAD)
  • Useful for rare disease; caution is required when using it for more common/non-Mendelian disorders or multi-hit processes, e.g. cancers
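Allele-frequency filtering is a threshold on the population frequency of each variant. A minimal sketch, assuming each variant has already been annotated with a reference-population allele frequency (None when absent from the database); the variant IDs, frequencies and both cut-offs are hypothetical:

```python
# Sketch: drop variants that are too common in a reference population to cause rare disease.
# Thresholds are illustrative assumptions, not published cut-offs.
from typing import Optional


def is_rare(pop_af: Optional[float], max_af: float) -> bool:
    """A variant absent from the reference database (None) counts as rare."""
    return pop_af is None or pop_af <= max_af


# (variant_id, population allele frequency): hypothetical values
variants = [("var1", None), ("var2", 0.00002), ("var3", 0.004), ("var4", 0.15)]

# A stricter cut-off suits dominant disease (one allele suffices);
# a looser one suits recessive disease, where healthy carriers are more common.
dominant = [vid for vid, af in variants if is_rare(af, max_af=0.0001)]
recessive = [vid for vid, af in variants if is_rare(af, max_af=0.01)]
print(dominant)   # ['var1', 'var2']
print(recessive)  # ['var1', 'var2', 'var3']
```

This also shows why the caution above matters: a risk allele for a common or multi-hit disorder could sit at var3- or var4-like frequencies and be discarded.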
11
Q

Analysis strategy - Assumed inheritance pattern

A
  • Influences the selection and number of individuals to sequence, as well as the analytical approach:

1) AR – affected siblings are sequenced to identify shared variation, e.g. homozygous variants in consanguineous families; compound heterozygous variants in the absence of consanguinity/occurrence in an isolated population
2) XLR – analyse the two most remotely related affected male relatives; autosomal variants are disregarded
3) AD – mapping of the gene to a discrete chromosomal region (e.g. <2 Mb) may allow gene identification from analysis of a single individual; larger genomic regions, or diseases that have not been mapped, require analysis of a greater number of individuals
4) De novo dominant variants – analysis of trios (unaffected parents and affected child) identifies a handful of de novo variants for further analysis; comparing variants in ≥2 families will often yield a single candidate gene
5) Mosaic variants – comparing sequence data from a patient's affected and unaffected tissue is usually sufficient to identify de novo mosaic disease-causing variants
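Several of these strategies come down to genotype logic across related individuals. A minimal sketch, assuming a parent-child trio (rather than the affected-sibling design described for AR above) with genotypes coded as alternate-allele counts and phase inferred from parental genotypes; all variant IDs, gene names and genotypes are hypothetical:

```python
# Sketch: trio-based filtering for de novo, homozygous and compound heterozygous models.
# Genotypes are alt-allele counts (0 = hom ref, 1 = het, 2 = hom alt). Data are hypothetical.
from collections import defaultdict

# (variant_id, gene, child, mother, father)
trio = [
    ("v1", "GENE_A", 1, 0, 0),  # de novo heterozygous
    ("v2", "GENE_B", 1, 1, 0),  # maternally inherited het
    ("v3", "GENE_B", 1, 0, 1),  # paternally inherited het
    ("v4", "GENE_C", 1, 1, 0),  # single inherited het: not enough for AR
    ("v5", "GENE_D", 2, 1, 1),  # homozygous, both parents carriers
]


def de_novo(rows):
    """Child carries an alt allele absent from both parents."""
    return [vid for vid, _, c, m, f in rows if c >= 1 and m == 0 and f == 0]


def homozygous_recessive(rows):
    """Child hom alt, both parents het carriers (typical in consanguineous families)."""
    return [vid for vid, _, c, m, f in rows if c == 2 and m == 1 and f == 1]


def compound_het_genes(rows):
    """Genes where the child has >=1 het variant inherited from each parent (in trans)."""
    maternal, paternal = defaultdict(list), defaultdict(list)
    for vid, gene, c, m, f in rows:
        if c == 1 and m >= 1 and f == 0:
            maternal[gene].append(vid)
        elif c == 1 and f >= 1 and m == 0:
            paternal[gene].append(vid)
    return sorted(set(maternal) & set(paternal))


print(de_novo(trio))               # ['v1']
print(homozygous_recessive(trio))  # ['v5']
print(compound_het_genes(trio))    # ['GENE_B']
```

Requiring one variant from each parent is what distinguishes a true compound heterozygote from two variants on the same inherited chromosome.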

12
Q

Analysis strategy - Predicted consequence

A
  • Variants may be filtered/stratified/prioritised according to predicted consequence (e.g. LoF, missense, splice, intronic)
  • Many strategies initially focus on predicted LoF variants, as these are assumed to have the greatest functional impact
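Consequence-based prioritisation can be expressed as a sort over a severity ranking. A minimal sketch, assuming consequences are annotated with Sequence Ontology terms of the kind emitted by annotators such as Ensembl VEP; the numeric ranks, the choice of which terms count as LoF, and the variants are illustrative assumptions:

```python
# Sketch: rank variants by predicted consequence so LoF calls are reviewed first.
# The numeric ranks below are an illustrative assumption, not a standard ordering.
SEVERITY = {
    "stop_gained": 0,
    "frameshift_variant": 0,
    "splice_donor_variant": 0,
    "splice_acceptor_variant": 0,
    "missense_variant": 1,
    "splice_region_variant": 2,
    "synonymous_variant": 3,
    "intron_variant": 4,
}
LOF_TERMS = {term for term, rank in SEVERITY.items() if rank == 0}

variants = [
    ("v1", "intron_variant"),
    ("v2", "missense_variant"),
    ("v3", "stop_gained"),
    ("v4", "splice_region_variant"),
    ("v5", "frameshift_variant"),
]

# Unknown terms sort last; Python's sort is stable, so ties keep input order.
prioritised = sorted(variants, key=lambda v: SEVERITY.get(v[1], len(SEVERITY)))
lof_only = [vid for vid, csq in variants if csq in LOF_TERMS]
print([vid for vid, _ in prioritised])  # ['v3', 'v5', 'v2', 'v4', 'v1']
print(lof_only)                         # ['v3', 'v5']
```

Stratifying (sorting) keeps lower-ranked variants available for later review, whereas filtering (`lof_only`) discards them.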
13
Q

Analysis strategy - Specialist statistical tests

A
  • Identify gene association with disease
  • Only assess gene variation in aggregate, so do not aid interpretation of the pathogenicity of any specific variant
  • Provide evidence of whether the gene under investigation is involved in the disease being studied

1) CAST (Cohort Allelic Sums Test) = compares the total extent of rare variation in a specific gene between patients and controls
2) CMC (combined multivariate and collapsing method) or WST (weighted sums test) = factor in gene size, i.e. large genes are more likely to accumulate rare variation
3) SKAT (sequence kernel association test), the C-alpha test and EREC (estimated regression coefficient test) = test each variant for association with disease independently, then combine results across multiple variants to identify disease-associated genes (allowing for the fact that some variants might be protective and cancel out risk variants)
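The CAST idea can be sketched as a collapsing test on a 2x2 table. A minimal sketch, assuming each individual has already been scored as carrying or not carrying at least one rare qualifying variant in the gene; using a two-sided Fisher's exact test on the carrier counts is an assumption for illustration, and the cohort is hypothetical:

```python
# Sketch of a CAST-style collapsing test: carriers of >=1 rare variant in a gene are
# counted in cases and controls, and the 2x2 table is tested with Fisher's exact test.
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell with fixed margins.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # Sum tables no more probable than the observed one (small tolerance for rounding).
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def cast_test(case_carrier: list[bool], control_carrier: list[bool]) -> float:
    a = sum(case_carrier)             # cases carrying a rare variant
    b = len(case_carrier) - a         # cases without
    c = sum(control_carrier)          # controls carrying a rare variant
    d = len(control_carrier) - c      # controls without
    return fisher_exact_two_sided(a, b, c, d)


# Hypothetical cohort: 8/20 cases vs 1/20 controls carry a rare variant in the gene.
cases = [True] * 8 + [False] * 12
controls = [True] * 1 + [False] * 19
print(f"CAST p = {cast_test(cases, controls):.4f}")
```

Because each person is reduced to carrier/non-carrier, the test says nothing about which individual variant is pathogenic, matching the aggregate-only limitation noted above.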

14
Q

Analysis strategy - Systems biology/pathway analysis

A
  • Variants filtered for genes in known biological pathways and/or whose products interact with proteins encoded by genes previously associated with the phenotype
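Interaction-based filtering can be sketched as a neighbourhood lookup in a network. A minimal sketch, assuming an undirected list of protein-protein interactions and a set of genes already linked to the phenotype; all gene names and interactions are hypothetical placeholders for a curated pathway or interaction resource:

```python
# Sketch: keep candidate genes that directly interact with known disease genes.
# Gene names and interactions are hypothetical placeholders.
from collections import defaultdict

interactions = [("GENE_A", "GENE_X"), ("GENE_B", "GENE_Y"), ("GENE_X", "GENE_C")]
known_disease_genes = {"GENE_X"}
candidate_genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]

# Build an undirected adjacency map from the interaction list.
partners = defaultdict(set)
for g1, g2 in interactions:
    partners[g1].add(g2)
    partners[g2].add(g1)

prioritised = [g for g in candidate_genes if partners[g] & known_disease_genes]
print(prioritised)  # ['GENE_A', 'GENE_C']
```

Only first-degree partners are considered here; extending to whole pathways would replace the adjacency lookup with membership of a curated gene set.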
15
Q

How do we validate the role of novel genes?

A
  • MacArthur et al. (2014) proposed standards for assessing the likely pathogenicity of genes and variants
  • Gene burden - does the gene show an excess of rare (e.g. de novo) probably damaging variants in cases compared with controls?
  • Protein interactions - does the gene product interact with proteins that have previously been implicated in the disease?
  • Biochemical function - does the gene product perform a biochemical function shared with other known disease genes?
  • Expression - is the gene expressed in tissues relevant to the disease of interest, or does it show altered expression in people with the disease?
  • Gene disruption - is the gene or its function altered in people with the disease?
  • Rescue - can the cellular phenotype in patient-derived cells be rescued by addition of the wild-type (WT) product?
16
Q

What are the limitations of NGS for gene discovery?

A
  • Interpretation of results is challenging (non-coding regions are hard to interpret, functional assays are expensive and relevant tissue samples are often unavailable)
  • Cohort size - usually just the proband, a trio or small families, so data sharing between sites is needed
  • Phenotype data (HPO terms) - detailed phenotyping is essential
  • Incidental findings - always a risk with WES and WGS, so consent must address this risk
17
Q

What are the future directions and challenges?

A

1) Use of ‘omics’
- RNA-seq - good for functional validation
- Epigenomics analysis - NGS based methylation profiling
- ChIP-seq
2) Availability of genome-wide datasets to support interpretation of results
3) Interpretation of non-coding variants is a major challenge; improvements in understanding of the roles of regulatory non-coding RNAs in human disease will support interpretation
4) Improvements in data sharing essential to support interpretation