Chapter 3 - Exome Sequencing Flashcards
Reader Ch.3
Limiting factors of traditional gene-discovery strategies (linkage mapping and candidate gene resequencing)
-Availability of small number of cases
-Reduced penetrance
-Locus heterogeneity
-Substantially diminished reproductive fitness
-Responsible mutation may be de novo
Mendelian disorders
Inherited single-gene disorders such as cystic fibrosis and sickle cell anaemia
Coding variation analysis > massively parallel DNA sequencing >
Exome sequencing
Limitation of exome sequencing
It does not assess the impact of non-coding alleles, but it does allow discovery of rare alleles underlying Mendelian phenotypes and complex traits
Why is exome sequencing effective for detecting rare alleles in Mendelian disorders?
Positional cloning studies are successful for monogenic disorders
> most alleles underlying Mendelian disorders are protein coding
> large fraction of the rare protein altering variants are predicted to have functional consequences
> splice acceptor and donor sites are enriched for highly functional variation (targeted in exome sequencing)
How is the exome defined?
By the entire RefSeq and a large number of hypothetical proteins (this has limitations)
Limitations of defining the exome
-incomplete overview of protein-coding exons
-variety in efficiency of capture probes
-not all templates are sequenced efficiently
-not all sequences can be uniquely aligned to the reference genome
Wet-lab workflow for exome sequencing
- Genomic DNA is sheared and used to construct an in vitro shotgun library
- library fragments are flanked by adapters
- enrichment for sequences corresponding to exons > aqueous-phase hybridization capture
- recovery of hybridized fragments by biotin-streptavidin pulldown and washing
- amplification and massively parallel sequencing
- Mapping > calling of candidate causal variants
Bioinformatics steps in exome sequencing
- Probe design
- Quality control
- Map reads
- Determine variants
- Annotate variants
- Filter known variants
- Exome comparison
- Validation of candidate genes
Probe design
Designing probes for capturing exon fragments > unique and efficient probes
Quality control
High base quality and equal nucleotide frequencies across the sequence
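A minimal sketch of these two checks, assuming Phred+33 encoded quality strings and reads of equal length. The function names and the toy reads are hypothetical and only illustrate the idea.

```python
from collections import Counter

def mean_base_quality(quality_string, offset=33):
    """Mean Phred score of one read, assuming Phred+33 ASCII encoding."""
    scores = [ord(ch) - offset for ch in quality_string]
    return sum(scores) / len(scores)

def nucleotide_frequencies(reads):
    """Fraction of A, C, G and T at each position across equal-length reads."""
    length = len(reads[0])
    freqs = []
    for pos in range(length):
        counts = Counter(read[pos] for read in reads)
        total = sum(counts.values())
        freqs.append({base: counts[base] / total for base in "ACGT"})
    return freqs

# Toy data (hypothetical): four reads of length 4
reads = ["ACGT", "CGTA", "GTAC", "TACG"]
quals = "IIII"  # 'I' is ASCII 73, so Phred 40 under the Phred+33 assumption

print(mean_base_quality(quals))
print(nucleotide_frequencies(reads)[0])
```

Roughly equal frequencies at every position and a high mean quality suggest a clean run; strong skew at particular positions may point to adapter remnants or other bias.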
Mapping the reads (BWA)
Reads are aligned to the reference genome by a mapping algorithm
> Unmapped reads are discarded, as are reads that do not map uniquely. Low-confidence alignments may cause problems
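The discard step can be sketched on SAM-formatted alignment lines. This assumes the standard SAM column order (FLAG in column 2, MAPQ in column 5) and treats a mapping quality of 0 as "not uniquely mapped", which is how BWA commonly reports multi-mapping reads. The function name, the threshold, and the toy records are hypothetical.

```python
UNMAPPED_FLAG = 0x4  # SAM flag bit that marks an unmapped read

def keep_alignment(sam_line, min_mapq=1):
    """Return True for mapped reads whose MAPQ is at least min_mapq."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])   # column 2: bitwise FLAG
    mapq = int(fields[4])   # column 5: mapping quality
    is_unmapped = bool(flag & UNMAPPED_FLAG)
    return (not is_unmapped) and mapq >= min_mapq

# Toy SAM records (hypothetical)
toy_lines = [
    "read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",  # uniquely mapped
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",         # unmapped
    "read3\t0\tchr1\t200\t0\t4M\t*\t0\t0\tACGT\tIIII",   # MAPQ 0, ambiguous
]

kept = [line.split("\t")[0] for line in toy_lines if keep_alignment(line)]
print(kept)
```

Raising `min_mapq` is one way to also drop the low-confidence alignments mentioned above, at the cost of losing coverage.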
Determine variants (VarScan)
Detects differences from the reference genome; each difference is either a potential variant or a sequencing error.
VarScan criteria
- At least N reads cover the variant position (default 8)
- Of those N reads, at least K carry the variant (default 2)
- Average base quality at the variant position is at least Q (default 15)
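The three criteria can be sketched as a single check. This is an illustration of the thresholds on this card, not VarScan's own code. It assumes the pileup at one position is given as (base, Phred quality) pairs and that the average quality is taken over all reads at that position. The names `passes_criteria`, `n_min`, `k_min` and `q_min` are hypothetical.

```python
def passes_criteria(pileup, variant_base, n_min=8, k_min=2, q_min=15):
    """Apply the N, K and Q thresholds to one position.

    pileup: list of (base, phred_quality) tuples, one per covering read.
    """
    # Criterion 1: at least N reads at the position
    if len(pileup) < n_min:
        return False
    # Criterion 2: at least K of those reads carry the variant
    supporting = sum(1 for base, _ in pileup if base == variant_base)
    if supporting < k_min:
        return False
    # Criterion 3: average base quality at the position is at least Q
    avg_quality = sum(quality for _, quality in pileup) / len(pileup)
    return avg_quality >= q_min

# Toy pileups (hypothetical); reference base is A, candidate variant is G
good = [("A", 30)] * 7 + [("G", 30)] * 3          # 10 reads, 3 support G
low_coverage = [("A", 30)] * 5 + [("G", 30)] * 2  # only 7 reads
low_quality = [("A", 10)] * 7 + [("G", 10)] * 3   # average quality 10

print(passes_criteria(good, "G"))
print(passes_criteria(low_coverage, "G"))
print(passes_criteria(low_quality, "G"))
```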
Annotate variants
Each variant is assigned various properties: gene name, region, nucleotide position, type of mutation, number of reads, quality, etc.
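One possible way to hold these properties is a small record type. The class name, field names, and example values below are hypothetical; real annotation tools define their own output columns.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AnnotatedVariant:
    gene_name: str
    region: str         # e.g. "exonic" or "splice site"
    position: int       # nucleotide position
    mutation_type: str  # e.g. "missense" or "nonsense"
    read_count: int     # number of reads supporting the call
    quality: float      # quality score of the call

# Example record with made-up values
variant = AnnotatedVariant(
    gene_name="GENE1",
    region="exonic",
    position=12345,
    mutation_type="missense",
    read_count=12,
    quality=32.5,
)
print(variant)
```

Keeping the annotation in one structured record makes the later steps (filtering known variants, comparing exomes) simple lookups on named fields.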