18.02.23 Finding disease-related genes using NGS Flashcards

1
Q

Give an overview of rare diseases. Which techniques and projects have already made a significant contribution to the detection of disease-causing genes?

A

Approx 7000 rare diseases (affecting <1 in 2000 individuals), ~80% of these likely to have a genetic cause (Wright et al 2018), but no genetic cause has been identified in many patients.

Linkage analysis, array CGH and GWAS studies have contributed to the identification of many disease genes, but they have limitations, e.g. linkage analysis requires large families.

Large research projects have also utilised (or will utilise) NGS to identify new genes associated with human disease, e.g.

DDD project (Wright et al 2015)

100,000 Genomes Project (UK)

2
Q

What are the advantages of WGS?

A

WGS covers the whole genome, including non-coding regions

More complete results than exome or targeted sequencing

Fewer coverage problems in, e.g., GC-rich or repetitive regions

Can detect all variant types (though some calling methods are still in development):

Single nucleotide variants

Copy number variants

Structural variants, including balanced rearrangements that are not detected by arrays (see Harripaul et al 2017)

Short tandem repeats: detection is improving, though this is currently a limitation

Mosaicism can be detected if coverage is good enough

Can sequence a cohort of affected individuals, or trios, and apply filtering and statistical analysis to identify the likely causative variants/genes
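The point that mosaicism is detectable only "if coverage is good enough" can be made concrete with a simple binomial model. This is a minimal sketch, assuming each read independently samples the variant allele with probability equal to the variant allele fraction (VAF) and ignoring sequencing error and mapping bias; the function name and the minimum alternate-read threshold are illustrative choices, not taken from any particular variant caller.

```python
from math import comb


def prob_detect_mosaic(depth: int, vaf: float, min_alt_reads: int) -> float:
    """Probability of seeing at least `min_alt_reads` alternate reads at a site
    sequenced to `depth`, if the true variant allele fraction is `vaf`.

    Assumption: simple binomial sampling of reads (no sequencing error,
    no mapping or capture bias). Hypothetical helper for illustration only.
    """
    # P(X < min_alt_reads) where X ~ Binomial(depth, vaf)
    p_below = sum(
        comb(depth, k) * vaf**k * (1 - vaf) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_below
```

For example, comparing `prob_detect_mosaic(30, 0.10, 5)` with `prob_detect_mosaic(100, 0.10, 5)` shows how much more likely a 10% mosaic variant is to pass a 5-read threshold at 100x depth than at 30x.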

3
Q

What is WES?

A

Whole-exome sequencing: sequencing all of the protein-coding regions (exons) of the genome (1-2% of the total genome)

The most common method used to date

4
Q

What are the advantages of WES?

A

Less biased than a targeted approach (e.g. a gene panel)

Cheaper than WGS (and quicker to analyse as less data)

5
Q

What are the disadvantages of WES?

A

No coverage of non-coding regions

Coverage of repetitive regions, GC-rich regions etc. can be poor, risking false-negative results.

Less effective than WGS for detecting some variant types (e.g. structural variants)

6
Q

What are the advantages of WGS?

A

Sequences the whole genome without bias, including non-coding regions

Can detect all types of variant (though calling, e.g., STRs is not currently done routinely and requires highly specialised variant calling strategies)

Includes balanced chromosomal rearrangements (e.g. Schluth-Bolard et al 2013)

7
Q

What are the disadvantages of WGS?

A

More data generated, which can be costly to analyse

8
Q

Approximately how many variant calls are generated through WES/WGS methods?

A

NGS methods, particularly WES/WGS, generate large numbers of variant calls (4-5 million variants per person, ~30,000 coding variants; Wright et al 2018).

The analysis strategy used in gene discovery will depend on the sample data set (eg trio vs proband only cohort) and the prior knowledge about the likely mode of inheritance, penetrance etc.

9
Q

Give examples of how variants can be filtered.

A
Quality metrics
Minor allele frequency (MAF)
Mutation type, e.g. loss of function (LoF)
Genes involved in a particular biological pathway
Protein-protein interactions
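
The filters listed above are usually applied in sequence, each one shrinking the candidate list. This is a minimal sketch, assuming variants have already been annotated; the `Variant` record, the default thresholds and the `gene_set` argument (standing in for a pathway or protein-interaction gene list) are hypothetical, and the consequence labels follow Sequence Ontology-style names chosen here as an example.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set

# Example consequence labels treated as loss of function (LoF); in practice
# these come from a variant annotation tool. Illustrative choice only.
LOF_CONSEQUENCES = {"stop_gained", "frameshift_variant",
                    "splice_acceptor_variant", "splice_donor_variant"}


@dataclass
class Variant:
    gene: str
    qual: float        # variant call quality score
    depth: int         # read depth at the site
    maf: float         # population minor allele frequency (0 if unseen)
    consequence: str   # predicted consequence label


def filter_variants(variants: Iterable[Variant],
                    min_qual: float = 30.0,     # hypothetical threshold
                    min_depth: int = 10,        # hypothetical threshold
                    max_maf: float = 0.001,     # hypothetical rarity cut-off
                    lof_only: bool = True,
                    gene_set: Optional[Set[str]] = None) -> List[Variant]:
    kept = []
    for v in variants:
        if v.qual < min_qual or v.depth < min_depth:   # quality metrics
            continue
        if v.maf > max_maf:                            # keep rare variants only
            continue
        if lof_only and v.consequence not in LOF_CONSEQUENCES:  # mutation type
            continue
        if gene_set is not None and v.gene not in gene_set:     # pathway / PPI list
            continue
        kept.append(v)
    return kept
```

Thresholds are deliberately parameters: a recessive analysis may tolerate a higher MAF than a dominant de novo analysis, so the same function can be reused with different settings.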
10
Q

For what expected inheritance patterns is it useful to sequence siblings?

A

Autosomal recessive: affected siblings are sequenced to identify shared variants; compound heterozygosity is expected in the absence of consanguinity or of occurrence in an isolated population.

Consanguineous autosomal recessive: affected siblings are sequenced to identify shared homozygous variants.
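The two sibling strategies above differ only in which biallelic pattern is accepted. This is a minimal sketch, assuming genotype calls have already been grouped by gene; the input layout and function name are hypothetical. Note that two shared heterozygous variants are only a possible compound heterozygote: the sketch does not check phase (that the variants are on different parental alleles), which in practice needs parental data or read-based phasing.

```python
from typing import Dict, List, Set

# Hypothetical layout for one sibling: gene -> {variant_id: "het" or "hom"}
SiblingCalls = Dict[str, Dict[str, str]]


def shared_recessive_candidates(siblings: List[SiblingCalls],
                                consanguineous: bool = False) -> Set[str]:
    """Genes in which every affected sibling shares a biallelic candidate:
    the same homozygous variant, or (outside consanguinity) at least two
    shared heterozygous variants. Phase is NOT checked."""
    genes_in_all = set.intersection(*(set(s) for s in siblings))
    result = set()
    for gene in genes_in_all:
        per_sib = [s[gene] for s in siblings]
        shared_ids = set.intersection(*(set(calls) for calls in per_sib))
        shared_hom = {v for v in shared_ids if all(c[v] == "hom" for c in per_sib)}
        shared_het = {v for v in shared_ids if all(c[v] == "het" for c in per_sib)}
        if shared_hom or (not consanguineous and len(shared_het) >= 2):
            result.add(gene)
    return result
```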

11
Q

What strategy is useful for identifying a gene associated with X-linked recessive disease?

A

The favoured strategy is to analyse the two most remotely related affected male family members and keep only the X-chromosome variants they share. Autosomal variants can be disregarded.
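Choosing the most remotely related males helps because distant relatives share fewer stretches of the X chromosome by descent, so fewer variants survive the intersection. This is a minimal sketch, assuming each male's calls are a set of `(chromosome, position, ref, alt)` tuples; the tuple layout and function name are hypothetical.

```python
from typing import Set, Tuple

# Hypothetical variant key: (chromosome, position, ref allele, alt allele)
VariantKey = Tuple[str, int, str, str]


def shared_x_linked_candidates(male_a: Set[VariantKey],
                               male_b: Set[VariantKey]) -> Set[VariantKey]:
    """Variants present in both affected males that lie on chromosome X.
    Autosomal variants are disregarded, as in the strategy above."""
    x_names = {"X", "chrX"}  # accept both common chromosome naming styles
    return {v for v in male_a & male_b if v[0] in x_names}
```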

12
Q

What strategy is useful for identifying de novo dominant mutations?

A

Analysis of data from trios (unaffected parents and an affected child) generally produces a handful of de novo variants for further analysis; comparing these variants between as few as two families will generally reduce them to a single candidate gene.
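The trio logic above can be sketched in two steps: call de novo variants per family, then intersect the genes they hit across families. This is a minimal sketch, assuming genotypes are given as alternate-allele counts (0, 1, 2) keyed by a variant ID, plus a variant-to-gene lookup; all names are hypothetical. Treating a variant missing from a parent's calls as homozygous reference is an assumption that real pipelines check against parental read depth.

```python
from typing import Dict, List, Set

# Hypothetical layout: variant_id -> number of alternate alleles (0, 1 or 2)
Genotypes = Dict[str, int]


def de_novo_variants(child: Genotypes, mother: Genotypes,
                     father: Genotypes) -> Set[str]:
    """Variants carried by the child but by neither parent.
    Missing parental calls are assumed to be homozygous reference."""
    return {v for v, gt in child.items()
            if gt > 0 and mother.get(v, 0) == 0 and father.get(v, 0) == 0}


def genes_hit(de_novos: Set[str], gene_of: Dict[str, str]) -> Set[str]:
    """Map de novo variant IDs to the genes they fall in."""
    return {gene_of[v] for v in de_novos if v in gene_of}


def recurrent_genes(per_family_genes: List[Set[str]]) -> Set[str]:
    """Genes hit by a de novo variant in every family compared."""
    return set.intersection(*per_family_genes)
```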

13
Q

How can mosaic mutations be identified?

A

The comparison of sequence data from a patient’s affected and unaffected tissue is frequently sufficient to identify de novo mosaic disease-causing mutations.
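The tissue comparison above amounts to looking for variants with supporting reads in affected tissue but none in well-covered unaffected tissue. This is a minimal sketch, assuming per-variant `(alt_reads, total_reads)` counts for each tissue; the input layout, function name and every threshold are hypothetical. A variant with substantial support in both tissues looks germline rather than mosaic, and a site with low depth in unaffected tissue cannot show true absence, so both are excluded.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: variant_id -> (alt_reads, total_reads)
ReadCounts = Dict[str, Tuple[int, int]]


def candidate_mosaic_variants(affected: ReadCounts, unaffected: ReadCounts,
                              min_alt_reads: int = 5,        # hypothetical
                              min_vaf: float = 0.02,         # hypothetical
                              max_unaffected_alt: int = 0,   # hypothetical
                              min_unaffected_depth: int = 30  # hypothetical
                              ) -> List[str]:
    hits = []
    for v, (alt, total) in affected.items():
        # Require real support for the variant in affected tissue
        if total == 0 or alt < min_alt_reads or alt / total < min_vaf:
            continue
        u_alt, u_total = unaffected.get(v, (0, 0))
        # Absence in unaffected tissue only counts if that site is well covered
        if u_total < min_unaffected_depth or u_alt > max_unaffected_alt:
            continue
        hits.append(v)
    return sorted(hits)
```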

14
Q

Why are statistical methods often not reliable for identifying rare causative variants from NGS data? What specialist statistical methods exist to help identify disease-causing genes?

A

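Standard single-variant association tests (as used in GWAS) have little power for rare variants, because each variant is seen in too few individuals; specialist methods therefore aggregate rare variation within a gene or region.

CAST (Cohort Allelic Sums Test) compares the total extent of rare variation in a specific gene among patients and controls.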

CMC (combined multivariate and collapsing method) or WST (weighted sums test): weighted to account for the fact that large genes have more chance to accumulate rare variation.

SKAT (sequence kernel association test), the C‑alpha test, and EREC (estimated regression coefficient test): test each variant for association with disease independently, then combine the results across the variants in a gene or region to identify disease-associated genes. This strategy allows for the fact that some variants might be protective and cancel out risk variants.
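The collapsing idea behind CAST can be illustrated with a simple carrier-based burden test: each individual is scored as a carrier if they have at least one rare qualifying variant in the gene, and carrier counts are compared between patients and controls. This is a minimal sketch in the spirit of CAST, not a reference implementation; it uses a one-sided Fisher's exact test (hypergeometric upper tail) as the comparison, and the function names and genotype-matrix layout are hypothetical.

```python
from math import comb
from typing import List


def carrier_count(genotypes: List[List[int]]) -> int:
    """Rows are individuals, columns are rare variants in one gene
    (alternate-allele counts). A carrier has any entry > 0."""
    return sum(1 for row in genotypes if any(g > 0 for g in row))


def fisher_one_sided(case_carriers: int, n_cases: int,
                     ctrl_carriers: int, n_controls: int) -> float:
    """One-sided Fisher's exact p-value for an excess of carriers in cases.

    X = number of carriers who are cases, X ~ Hypergeometric(
    population = n_cases + n_controls, successes = n_cases,
    draws = total carriers). Returns P(X >= case_carriers)."""
    total_carriers = case_carriers + ctrl_carriers
    n = n_cases + n_controls
    upper = min(total_carriers, n_cases)
    tail = sum(comb(n_cases, k) * comb(n_controls, total_carriers - k)
               for k in range(case_carriers, upper + 1))
    return tail / comb(n, total_carriers)
```

Because carriers are simply counted, this kind of burden test loses power when protective and risk variants are mixed in the same gene, which is the situation the SKAT-type methods above are designed for.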

15
Q

What are the limitations of using a NGS-approach for gene discovery?

A

Interpretation of results is still challenging

Variants in non-coding regions are difficult to interpret without substantial additional functional work, which is not always possible

Validating the role of novel genes: MacArthur et al 2014 proposed standards for assessing the likely pathogenicity of genes and variants (Table 1 of MacArthur et al 2014)

Cohort studies: large, well-characterised cohorts are needed

Phenotypic data: detailed, standardised phenotyping (e.g. HPO terms) is needed

Incidental findings: careful consent is required

16
Q

What criteria should be met ahead of publishing a new gene-disease association?

A

Publish only when variants in the same gene and similar clinical presentations have been confidently implicated in multiple unrelated individuals.

Where possible, apply statistical methods to compare the distribution of variants in patients with large matched control cohorts or well-calibrated null models.

Functional studies, if possible, to confirm the gene's role (these may be complex and expensive, and access to patient samples may not be possible).

A sufficiently large cohort
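One common form of "well-calibrated null model" for de novo variants compares the observed number of de novo mutations in a gene with the number expected from its mutation rate. This is a minimal sketch, assuming a per-gene, per-haploid-genome, per-generation mutation rate is available and that counts follow a Poisson distribution; the function names and the rate used in the usage note are hypothetical. Computing the tail as one minus the lower CDF loses precision for extremely small p-values, which is acceptable for illustration only.

```python
from math import exp, factorial


def poisson_upper_tail(observed: int, expected: float) -> float:
    """P(X >= observed) for X ~ Poisson(expected)."""
    cdf_below = sum(exp(-expected) * expected**k / factorial(k)
                    for k in range(observed))
    return max(0.0, 1.0 - cdf_below)


def de_novo_enrichment_p(observed: int, n_trios: int,
                         mu_per_haploid_gene: float) -> float:
    """Chance of seeing at least `observed` de novo mutations in one gene
    across `n_trios` probands under the mutation-rate null model.
    Each proband has two haploid genomes in which a de novo can arise."""
    expected = 2 * n_trios * mu_per_haploid_gene
    return poisson_upper_tail(observed, expected)
```

For example, with a hypothetical rate of 1e-5 and 1000 trios, `de_novo_enrichment_p(3, 1000, 1e-5)` is far smaller than `de_novo_enrichment_p(1, 1000, 1e-5)`, reflecting how unlikely recurrence is by chance. Genome-wide analyses must also correct such p-values for the number of genes tested.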

17
Q

What are the difficulties in achieving the criteria that should be met ahead of publishing a new gene-disease association? Are there any suggestions for how to overcome these?

A

Functional studies may be complex and expensive, access to patient samples may not be possible.

Most studies are performed in small families or groups of probands

Rare variants in single families are difficult to corroborate; e.g. a de novo loss-of-function variant in a likely disease-related gene in a single proband is still difficult to interpret without identification in a second family/case (Boycott et al 2013, Wright et al 2018)

Data sharing between studies is ESSENTIAL. Some tools already exist to support this such as Matchmaker Exchange (www.matchmakerexchange.org) which facilitates the matching of cases with similar phenotypic and genotypic profiles.
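The core idea of case matching, finding other cases with a shared candidate gene and overlapping phenotype, can be sketched locally. This is a minimal sketch, not the Matchmaker Exchange API (which connects separate databases over a network); the `Case` record, the Jaccard similarity score and the threshold are hypothetical choices, and more sophisticated tools use HPO-aware semantic similarity rather than simple term overlap.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class Case:
    case_id: str
    candidate_genes: Set[str]
    hpo_terms: Set[str] = field(default_factory=set)


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Shared terms divided by all terms (0.0 if both sets are empty)."""
    return len(a & b) / len(a | b) if (a or b) else 0.0


def find_matches(query: Case, others: List[Case],
                 min_similarity: float = 0.3  # hypothetical threshold
                 ) -> List[Tuple[str, str, float]]:
    """Return (case_id, shared_gene, phenotype_score), best scores first."""
    matches = []
    for other in others:
        if other.case_id == query.case_id:
            continue
        score = jaccard(query.hpo_terms, other.hpo_terms)
        for gene in sorted(query.candidate_genes & other.candidate_genes):
            if score >= min_similarity:
                matches.append((other.case_id, gene, score))
    return sorted(matches, key=lambda m: -m[2])
```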

18
Q

Other than WGS and WES methods, what other NGS-based techniques may help identify novel disease-causing genes?

A

RNA-seq
Epigenomics
ChIP-seq