Chapter 3 - Exome Sequencing Flashcards by Tobias H

Limiting factors of traditional gene-discovery strategies (linkage mapping and candidate gene resequencing)

-Availability of only a small number of cases
-Reduced penetrance
-Locus heterogeneity
-Substantially diminished reproductive fitness
-Responsible mutation may be de novo

Mendelian disorders

Inherited single-gene disorders such as cystic fibrosis and sickle cell anaemia

Coding variation analysis > massively parallel DNA sequencing >

Exome sequencing

Limitation of exome sequencing

It does not assess the impact of non-coding alleles, but it is well suited to the discovery of rare alleles underlying Mendelian phenotypes and complex traits

Why is exome sequencing effective for detecting rare alleles in Mendelian disorders?

Positional cloning studies have been successful for monogenic disorders
> most alleles underlying Mendelian disorders are protein coding
> a large fraction of rare protein-altering variants are predicted to have functional consequences
> splice acceptor and donor sites are enriched for highly functional variation (and are targeted in exome sequencing)

How is the exome defined?

By the entire RefSeq collection and a large number of hypothetical proteins (this definition has limitations)

Limitations of defining the exome

-incomplete overview of protein-coding exons
-capture probes vary in efficiency
-not all templates are sequenced efficiently
-not all sequences can be uniquely aligned to the reference genome

Wet-lab workflow for exome sequencing

Genomic DNA is sheared and used to construct an in vitro shotgun library
Library fragments are flanked by adapters
Enrichment for sequences corresponding to exons > aqueous-phase hybridization capture
Recovery of hybridized fragments by biotin-streptavidin pulldown and washing
Amplification and massively parallel sequencing
Mapping > calling of candidate causal variants

Bioinformatics steps in exome sequencing

Probe design
Quality control
Map reads
Determine variants
Annotate variants
Filter known variants
Exome comparison
Validation of candidate genes

Probe design

Designing probes for capturing exon fragments > unique and efficient probes

Quality control

High base quality and equal nucleotide frequencies across the sequence
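
As a rough illustration of these two checks, the sketch below computes the mean base quality and the nucleotide frequencies at each read position. It assumes Phred+33 encoded quality strings; the function name `per_position_stats` and the toy reads are hypothetical and not part of any particular QC tool.

```python
from collections import Counter

def per_position_stats(reads):
    """reads: list of (sequence, quality_string) pairs, Phred+33 assumed.

    Returns one (mean_quality, base_frequencies) tuple per read position.
    """
    length = max(len(seq) for seq, _ in reads)
    stats = []
    for i in range(length):
        # Convert each quality character to its Phred score.
        quals = [ord(qual[i]) - 33 for _, qual in reads if i < len(qual)]
        bases = Counter(seq[i] for seq, _ in reads if i < len(seq))
        total = sum(bases.values())
        freqs = {base: count / total for base, count in bases.items()}
        stats.append((sum(quals) / len(quals), freqs))
    return stats

# Toy input: two 4-base reads, one with a lower-quality third base.
toy_reads = [("ACGT", "IIII"), ("ACGA", "II5I")]
toy_stats = per_position_stats(toy_reads)
```

A position where the mean quality drops, or where one nucleotide dominates far more than at other positions, would be a reason to look more closely at the run.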

Mapping the reads (bwa)

Reads are mapped against the reference genome by an alignment algorithm
> unmapped reads are discarded, and non-uniquely mapped reads as well. Reads mapped with low confidence may cause problems
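
The discarding step can be sketched on SAM-formatted alignment lines. This is a minimal illustration, assuming the standard SAM column layout (FLAG in column 2, MAPQ in column 5) and using a low mapping quality as a stand-in for non-unique or low-confidence mapping; the function name `keep_read` and the threshold of 20 are hypothetical choices.

```python
UNMAPPED_FLAG = 0x4  # SAM flag bit set when the read is unmapped

def keep_read(sam_line, min_mapq=20):
    """Return True if the alignment is mapped with at least min_mapq."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])
    mapq = int(fields[4])
    if flag & UNMAPPED_FLAG:
        return False
    # Assumption: ambiguous placements receive a low mapping quality.
    return mapq >= min_mapq

sam_lines = [
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r3\t0\tchr1\t200\t0\t4M\t*\t0\t0\tACGT\tIIII",
]
kept = [line.split("\t")[0] for line in sam_lines if keep_read(line)]
```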

Determine variants (VarScan)

Detect differences from the reference genome: each difference is either a potential variant or a sequencing error.

VarScan criteria

At the position of the variant there are at least N reads (default 8)
Of those N reads, at least K reads carry the variant (default 2)
The average base quality at the position of the variant is at least Q (default 15)
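
The three thresholds can be written as a single check. This is only a sketch of the criteria listed on this card, not VarScan's actual implementation; the function and parameter names are hypothetical, and the defaults are the N, K and Q values given above.

```python
def passes_call_criteria(depth, variant_reads, avg_base_quality,
                         min_coverage=8, min_variant_reads=2,
                         min_avg_quality=15):
    """Apply the three card criteria to one candidate position."""
    return (depth >= min_coverage                    # at least N reads
            and variant_reads >= min_variant_reads   # at least K with variant
            and avg_base_quality >= min_avg_quality) # average quality >= Q

# Toy positions: (depth, reads with variant, average base quality)
well_supported = passes_call_criteria(20, 9, 30)
too_shallow = passes_call_criteria(5, 4, 30)
single_read = passes_call_criteria(20, 1, 30)
```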

Annotate variants

Each variant is assigned various properties: gene name, region, nucleotide position, type of mutation, number of reads, quality, etc.

Filter the known variants

Remove synonymous variants and variants which are present in public SNP databases or an in-house reference database because they are unlikely to cause the disorder
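
A minimal sketch of this filter, assuming each annotated variant is a dictionary with hypothetical `id`, `gene` and `effect` keys, and that the known variants (public SNP databases plus any in-house set) have already been loaded into a Python set of identifiers:

```python
def filter_known(variants, known_ids):
    """Keep variants that are non-synonymous and absent from known_ids."""
    return [
        v for v in variants
        if v["effect"] != "synonymous" and v["id"] not in known_ids
    ]

# Toy data; identifiers and gene names are made up for illustration.
variants = [
    {"id": "chr1:100:A>G", "gene": "GENE1", "effect": "missense"},
    {"id": "chr1:250:C>T", "gene": "GENE1", "effect": "synonymous"},
    {"id": "chr2:500:G>A", "gene": "GENE2", "effect": "nonsense"},
]
known_ids = {"chr2:500:G>A"}
remaining = filter_known(variants, known_ids)
```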

Exome comparison

Between different patients to find one or more affected genes in each of the patients (same variant is not required)
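
Because the same variant is not required, the comparison can be sketched at the gene level as a set intersection. The data layout (a dictionary from patient to filtered variants, each with a hypothetical `gene` key) and the function name are assumptions made for this example.

```python
def shared_candidate_genes(patient_variants):
    """Return genes with at least one remaining variant in every patient."""
    gene_sets = [
        {v["gene"] for v in variants}
        for variants in patient_variants.values()
    ]
    return set.intersection(*gene_sets) if gene_sets else set()

# Toy data: both patients carry different variants in GENE1.
patient_variants = {
    "patient_A": [{"gene": "GENE1", "id": "chr1:100:A>G"},
                  {"gene": "GENE3", "id": "chr3:10:T>C"}],
    "patient_B": [{"gene": "GENE1", "id": "chr1:180:G>T"},
                  {"gene": "GENE2", "id": "chr2:40:C>A"}],
}
shared = shared_candidate_genes(patient_variants)
```

Requiring the gene in every patient is the strictest form of the comparison; with locus heterogeneity a looser rule (for example, most patients) may be needed.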

Validation of candidate genes

Wet-lab validation, for example with Sanger sequencing, or comparison with sets of exomes and genomes

Factors that determine the strategy for identifying causal alleles > they affect the sample size needed for adequate power in the bioinformatics analysis

-mode of inheritance (exome sequencing is more efficient for recessive disorders > fewer genes carry two novel protein-altering alleles)
-pedigree or population structure
-phenotype arising de novo or inherited > screening family
-extent of locus heterogeneity for a trait
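
The recessive case in the first point can be sketched as a per-gene allele count. This is a simplified illustration: it assumes each remaining novel protein-altering variant carries hypothetical `gene` and `zygosity` keys, counts a homozygous variant as two alleles, and ignores phase, so two heterozygous variants on the same chromosome would be counted as well.

```python
from collections import defaultdict

def recessive_candidates(variants):
    """Return genes carrying at least two novel protein-altering alleles."""
    allele_count = defaultdict(int)
    for v in variants:
        # Homozygous variant = 2 alleles, heterozygous variant = 1 allele.
        allele_count[v["gene"]] += 2 if v["zygosity"] == "hom" else 1
    return {gene for gene, n in allele_count.items() if n >= 2}

# Toy data for one patient; gene names are made up.
novel_variants = [
    {"gene": "GENE1", "zygosity": "het"},
    {"gene": "GENE1", "zygosity": "het"},  # possible compound heterozygote
    {"gene": "GENE2", "zygosity": "hom"},
    {"gene": "GENE3", "zygosity": "het"},  # a single allele is not enough
]
candidates = recessive_candidates(novel_variants)
```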

Data filtering steps

discrete filtering: by comparing variants among individuals and against public databases/controls
Stratification of variants

Assessing the novelty of an allele

-Sets of polymorphisms from public databases such as dbSNP and the 1000 Genomes Project
> from unaffected individuals

Filtering

Eliminating candidate genes by assuming any allele found in the filter set cannot be causative

Assumption behind filtering against dbSNP, and problems with the assumption

The filter set (controls) contains no alleles from individuals with the disease phenotype
-Problems
> dbSNP is contaminated with a small number of pathogenic alleles
> some pathogenic alleles have a higher minor allele frequency: the pathogenic variant also occurs in control exomes > risk of eliminating truly pathogenic alleles

Stratification of candidates (name the groups)

-by mutation type > predicted impact/deleteriousness
-by segmental duplications > variants found in segmental duplications are discarded
-by pseudogenes: dysfunctional relatives of genes that have lost their protein-coding ability
-by function: predicted role of the protein product
-by functional impact: for non-synonymous alleles > impact on phenotype prediction

(technical) Failure reasons of exome sequencing

-part or all of the causative gene is not in the target definition
-inadequate coverage of the region that contains the causal variant
-the causal variant is covered but not accurately called
-false variants in a gene are called because of mismapped reads or alignment errors

Failure with discrete filtering

Reduced power due to genetic heterogeneity or false-positive calls (for example from processed pseudogenes or segmental duplications)

Reader Ch.3 (26 cards)