Impact Of NGS Flashcards
What are the steps involved in NGS
DNA extraction
Purification
Fragmentation/shearing
If WES - target baits
Adapters
Flow cell loading
Bridge PCR
Sequencing by synthesis
Why is NGS used over Sanger sequencing - and what is one reason to not use NGS
NGS is faster and cheaper
However, it may give too much information, so Sanger sequencing is used for simpler tests, e.g. a single gene test
What are the uses of NGS in the diagnostic lab
It can be used for diagnosis, management, treatment
It can inform clinical trials
It can be used to predict pathogenicity and inform life choices
It can be used for prenatal testing
What are the benefits of the 100,000 Genomes Project
It sequences rare disease and cancer genomes
This brings benefits of genetics to patients to aid diagnosis and treatment
It can facilitate new discoveries and medical insights, for example through molecular research and clinical trials
It facilitates personalised medicine
What is the Genomics England PanelApp
It’s an app where professionals suggest gene panels and the wider community reviews them
What are Genomics England’s variant classification tiers
Tier 1 – known pathogenic, protein truncating
Tier 2 – protein altering (missense), intronic (splice site)
Tier 3 – loss of function in genes not on the panel
Why use long read technology/What are the drawbacks of short read technology
Short read relies on PCR
PCR performs poorly on sequences with high GC content, and deletions and repeat regions can be poorly sequenced
Therefore there is a lot of DNA missing that long read technology has been able to identify
Describe PacBio SMRT library construction
Fragmentation – selection of large fragments around 30 to 80 kbp
End repair and adapter ligation
Adapters are circular hairpins at the end, containing primer binding sites at which DNA polymerase attaches
Describe PacBio SMRT Zero Mode Waveguide sequencing
The zero mode waveguide guides light into a very small space so that it dissipates
Each DNA base has a fluorescent tag
As bases are incorporated the tag is cleaved and fluorescence diffuses out
The light is detected, resulting in a base call
What are some of the applications of PacBio
It helps fill in the missing regions of DNA that short reads could not identify e.g. high GC regions and repeat regions
It can detect structural variants such as copy number variations, repeats and inversions
It can identify mobile genetic elements and identify the alleles across a chromosome
Describe what a repeat expansion disease is and give some examples
These are diseases caused by the expansion of short repeated sequences; the expansions occur in UTRs, coding exons and introns
Examples include
Spinocerebellar ataxia – hereditary, progressive, degenerative, fatal
Huntington’s disease – CAG repeat in the huntingtin gene (HTT)
Normal = <27, intermediate = 27 to 35, pathogenic = >35
Expanded protein is toxic and accumulates in neurons causing cell death
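The repeat thresholds on this card can be written as a small lookup. This is only a sketch of the card's three bands; the function name `classify_cag_repeats` is made up for illustration and is not a clinical reporting tool.

```python
def classify_cag_repeats(repeat_count: int) -> str:
    """Classify an HTT CAG repeat count using the three bands on this card."""
    if repeat_count < 0:
        raise ValueError("repeat count cannot be negative")
    if repeat_count < 27:
        return "normal"
    if repeat_count <= 35:
        return "intermediate"
    return "pathogenic"


# Boundary values show where each band starts and ends.
for count in (20, 26, 27, 35, 36, 45):
    print(count, classify_cag_repeats(count))
```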
How is Huntington’s disease identified
Traditionally via PCR and electrophoresis
However this shows size but not the sequence info or structure
PacBio
Can find interruptions in the sequence for example CAG > CAA
Can measure size and identify new somatic repeat expansions (somatic instability)
Describe how PacBio is used in RNA sequencing
It can sequence the entire RNA at once
Different isoforms of mRNA can be sequenced, showing alternative splice sites
In contrast, short-read NGS only shows a reconstruction of one long consensus of all isoforms
What is the dark genome and what are the camouflaged regions
These are regions that standard short-read sequencing cannot resolve; a region can be dark
By depth - poor sequence depth
By quality - poor sequence quality
Camouflaged regions are usually due to some kind of repeat (duplicated) region, which makes variants difficult to recover
What is Oxford nanopore sequencing
This is ultra-long-read sequencing, with reads up to 2 Mb
DNA passes through a nanopore and the base sequence is detected as an electrical signal
Each base has its own electrical pattern
What are the advantages and disadvantages of Oxford nanopore sequencing
Advantages – it is one single machine, no other expensive machines are needed and it is scalable by using multiple flow cells
Disadvantages – it is expensive and has a high error rate
Describe GBA mutations
The GBA gene codes for the enzyme glucocerebrosidase
Biallelic mutations can cause Gaucher’s disease (rare)
This causes increased deposition of glucocerebroside in macrophages because of enzyme deficiency
Monoallelic mutations are a risk factor for Parkinson’s disease
It is difficult to study via NGS as it has a pseudogene (GBAP1) and a duplicated region, so there is a lot of repeated sequence
What does genetic association mean
This is when a variant allele appears at a higher frequency in unrelated subjects with a disease (cases) versus controls
Describe the GWAS study design
It compares equal-sized, matched case and control groups genotyped by SNP microarray
Statistical tests include chi squared, run with software such as PLINK
Quality control must be done on SNP and sample data, batch effect and population structure
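A chi-squared association test on one SNP can be sketched from a 2x2 table of allele counts. The counts below are invented for illustration and `allelic_chi_squared` is a hypothetical helper, not a PLINK function; it assumes no cell or margin of the table is zero.

```python
import math


def allelic_chi_squared(case_alt, case_ref, control_alt, control_ref):
    """Pearson chi-squared test (1 degree of freedom) on a 2x2 allele count table.

    Returns (chi2, p_value, odds_ratio). Assumes every row and column total,
    and the product case_ref * control_alt, is non-zero.
    """
    observed = [[case_alt, case_ref], [control_alt, control_ref]]
    row_totals = [case_alt + case_ref, control_alt + control_ref]
    col_totals = [case_alt + control_alt, case_ref + control_ref]
    total = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # With 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (case_alt * control_ref) / (case_ref * control_alt)
    return chi2, p_value, odds_ratio


# Invented counts: the variant allele is more frequent in cases than in controls.
chi2, p_value, odds_ratio = allelic_chi_squared(300, 700, 200, 800)
print(f"chi2={chi2:.2f} p={p_value:.2e} OR={odds_ratio:.2f}")
```

In a real GWAS this test is repeated for every SNP, which is why the resulting p-values need a stringent significance threshold.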
Describe SNP quality control
Remove SNPs not typed in many subjects (unreliable genotype)
Remove SNPs with low minor allele frequencies (these SNPs have little detection power, while genotyping errors have a large effect)
Exclude SNPs that deviate from Hardy-Weinberg equilibrium
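The three SNP filters can be sketched for a single SNP from its genotype counts. The cut-offs used here (call rate 0.95, minor allele frequency 0.01, Hardy-Weinberg p-value 1e-6) are illustrative assumptions, not values from the card, and `snp_passes_qc` is a hypothetical name.

```python
import math


def snp_passes_qc(n_aa, n_ab, n_bb, n_missing,
                  min_call_rate=0.95, min_maf=0.01, min_hwe_p=1e-6):
    """Return True if a SNP passes call rate, MAF and Hardy-Weinberg filters."""
    typed = n_aa + n_ab + n_bb
    if typed == 0:
        return False

    # Filter 1: SNP not typed in many subjects.
    call_rate = typed / (typed + n_missing)
    if call_rate < min_call_rate:
        return False

    # Filter 2: low minor allele frequency (monomorphic SNPs also fail).
    freq_a = (2 * n_aa + n_ab) / (2 * typed)
    maf = min(freq_a, 1 - freq_a)
    if maf == 0 or maf < min_maf:
        return False

    # Filter 3: chi-squared test (1 df) of observed genotype counts against
    # the Hardy-Weinberg expectations p^2, 2pq, q^2.
    freq_b = 1 - freq_a
    expected = [typed * freq_a ** 2, 2 * typed * freq_a * freq_b, typed * freq_b ** 2]
    observed = [n_aa, n_ab, n_bb]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    hwe_p = math.erfc(math.sqrt(chi2 / 2))
    return hwe_p >= min_hwe_p


print(snp_passes_qc(360, 480, 160, 0))    # genotypes match HWE exactly
print(snp_passes_qc(500, 0, 500, 0))      # no heterozygotes at all
```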
Describe NGS sample data quality control
Remove samples with high levels of missing genotype
Remove samples with very high or very low heterozygosity (unlikely genotypically)
Check that the reported sex matches the genotype data by checking X chromosome heterozygosity
(If 10 males, then all 10 should show no X heterozygosity, as males have a single X chromosome)
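The sample checks can be sketched as a function that flags samples and gives the reason. The field names (`missing_rate`, `het_rate`, `x_het_rate`) and all thresholds are assumptions made for this example; real pipelines usually infer sex from an inbreeding-style statistic on the X chromosome rather than a raw heterozygosity cut-off.

```python
from statistics import mean, stdev


def flag_samples(samples, max_missing=0.05, het_sd_limit=3.0, max_male_x_het=0.05):
    """Return {sample id: [reasons]} for samples failing any check.

    Needs at least two samples so a standard deviation can be computed.
    """
    het_values = [s["het_rate"] for s in samples]
    het_mean, het_sd = mean(het_values), stdev(het_values)

    flagged = {}
    for s in samples:
        reasons = []
        if s["missing_rate"] > max_missing:
            reasons.append("high missingness")
        if abs(s["het_rate"] - het_mean) > het_sd_limit * het_sd:
            reasons.append("outlying heterozygosity")
        # Males carry one X, so they should show almost no X heterozygosity.
        inferred_sex = "M" if s["x_het_rate"] <= max_male_x_het else "F"
        if inferred_sex != s["reported_sex"]:
            reasons.append("sex mismatch")
        if reasons:
            flagged[s["id"]] = reasons
    return flagged


# Invented samples: S2 has many missing genotypes, S3 is reported male but
# shows X heterozygosity typical of a female sample.
example_samples = [
    {"id": "S1", "missing_rate": 0.01, "het_rate": 0.30, "reported_sex": "M", "x_het_rate": 0.00},
    {"id": "S2", "missing_rate": 0.10, "het_rate": 0.31, "reported_sex": "F", "x_het_rate": 0.25},
    {"id": "S3", "missing_rate": 0.02, "het_rate": 0.29, "reported_sex": "M", "x_het_rate": 0.22},
    {"id": "S4", "missing_rate": 0.01, "het_rate": 0.30, "reported_sex": "F", "x_het_rate": 0.24},
]
flagged = flag_samples(example_samples)
print(flagged)
```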
What is meant by population structure
If one ethnicity is over-represented among the cases, any SNP that is common in that population may become falsely associated with the disease
This affects the Q-Q plot
What is a Q-Q plot
This is a plot of the observed versus expected p-values
Most values follow the x = y line, with deviations at the lowest p-values
If the curve lifts off the line early, that indicates a population structure problem = genomic inflation
This inflation can be calculated and accounted for, e.g. by multidimensional scaling
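One common way to put a number on the inflation is the genomic inflation factor (lambda): the median observed chi-squared statistic divided by the median expected under no association. The card does not name this statistic, so treat it as an assumed choice; `genomic_inflation` is a hypothetical helper and all p-values must be greater than 0 and at most 1.

```python
from statistics import NormalDist, median

# Median of a chi-squared distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation(p_values):
    """Estimate lambda from a list of association p-values (0 < p <= 1)."""
    normal = NormalDist()
    # A 1-df chi-squared statistic is a squared standard normal deviate,
    # so each two-sided p-value maps back to z^2 via the normal quantile.
    chi2_stats = [normal.inv_cdf(p / 2) ** 2 for p in p_values]
    return median(chi2_stats) / CHI2_1DF_MEDIAN


# Evenly spread p-values mimic a study with no inflation (lambda close to 1).
uniform_p = [(i - 0.5) / 1001 for i in range(1, 1002)]
print(round(genomic_inflation(uniform_p), 3))

# Systematically smaller p-values give lambda well above 1.
print(genomic_inflation([p ** 2 for p in uniform_p]) > 1)
```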
What are batch effects
Variations in sample processing which can induce systematic differences in genotype calls
This causes false associations, especially when cases and controls are processed in separate batches
What is a Manhattan Plot and Regional Plot
Manhattan plot x axis - chromosome location
Manhattan plot y axis - -log10(p-value)
Regional plot zooms into region of interest showing gene labels, individual SNPs and the p-value
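The two Manhattan plot axes can be computed before any plotting. This sketch lays the chromosomes end to end for the x axis and takes -log10 of each p-value for the y axis; the chromosome lengths in the example are toy numbers, not real genome coordinates, and `manhattan_coordinates` is a hypothetical helper.

```python
import math


def manhattan_coordinates(results, chromosome_lengths):
    """Convert (chromosome, position, p_value) rows into (x, y) plot points.

    chromosome_lengths is an ordered list of (chromosome, length) pairs.
    """
    offsets = {}
    running_total = 0
    for chromosome, length in chromosome_lengths:
        offsets[chromosome] = running_total
        running_total += length

    return [
        (offsets[chromosome] + position, -math.log10(p_value))
        for chromosome, position, p_value in results
    ]


toy_lengths = [("1", 100), ("2", 80)]
toy_results = [("1", 50, 0.01), ("2", 10, 1e-8)]
points = manhattan_coordinates(toy_results, toy_lengths)
print(points)
```

A regional plot uses the same y axis but restricts the x axis to positions around one association signal.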
What are the advantages of GWAS
Strong associations are found and easily replicated
It has led to identifying new genes
It has impacted clinical treatment for example Metformin and Type II diabetes
What are the disadvantages of GWAS
It describes associations and not the cause
Most associated SNPs have no known effect, are in genes of unknown function or are not in genes
There is little clinical relevance
What is Mendelian Randomisation
It is a test of causal effects in studies with confounding factors
It uses genetic variation with a known effect on traits or exposures related to the disease, for example a genetic variant which has the same effect as a drug would
E.g. a variant causing lower LDL rather than using a statin
Give an example of how to investigate if low LDL-C levels cause less cardiovascular events
Instead of using statins you can find a genotype that reduces LDL
The control group would be people homozygous for the normal variant
While the group studied would be people homozygous for the beneficial variant
This shows whether lower LDL itself, rather than the drug, reduces CV events
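One standard way to turn such a comparison into an estimate is the Wald ratio: the variant's effect on the outcome divided by its effect on the exposure. The card does not name an estimator, so this is an assumed choice, and the effect sizes below are invented.

```python
import math


def wald_ratio(beta_variant_outcome, beta_variant_exposure):
    """Causal effect of the exposure on the outcome, per unit of exposure."""
    if beta_variant_exposure == 0:
        raise ValueError("the variant must change the exposure")
    return beta_variant_outcome / beta_variant_exposure


# Invented numbers: each copy of the variant lowers LDL-C by 0.5 mmol/L and
# lowers the log odds of a cardiovascular event by 0.2.
log_odds_per_mmol = wald_ratio(-0.2, -0.5)
print(round(log_odds_per_mmol, 3))             # log odds per 1 mmol/L higher LDL-C
print(round(math.exp(-log_odds_per_mmol), 3))  # odds ratio per 1 mmol/L lower LDL-C
```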
What are the assumptions of Mendelian randomisation
It is completely random if someone is bb or BB (unrelated to mate choice)
That you have not picked a variant or biomarker that causes the disease directly
That there is no population stratification
GWAS has provided suitable SNPs to select for
Give an example of how to investigate cannabis and schizophrenia with Mendelian randomisation
If cannabis is the exposure causing schizophrenia
You find a genotype that increases cannabis use versus the other allele causing no change
This study found an odds ratio of 1.37
What is a polygenic risk score and how is it calculated
Adding up the risk alleles present and combining the score
Allele load = 0,1,2 and adjusted for effect size
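The calculation is a weighted sum: each SNP's allele load (0, 1 or 2) multiplied by its effect size, then added up. The SNP names and effect sizes below are made up for illustration.

```python
def polygenic_risk_score(allele_counts, effect_sizes):
    """Sum of risk allele count (0, 1 or 2) times effect size over all SNPs.

    Raises KeyError if a SNP in effect_sizes has no genotype.
    """
    score = 0.0
    for snp, effect_size in effect_sizes.items():
        count = allele_counts[snp]
        if count not in (0, 1, 2):
            raise ValueError(f"allele load for {snp} must be 0, 1 or 2")
        score += count * effect_size
    return score


# Hypothetical SNP identifiers and effect sizes (e.g. log odds ratios from a GWAS).
effect_sizes = {"snp_a": 0.2, "snp_b": 0.1, "snp_c": -0.05}
person = {"snp_a": 2, "snp_b": 1, "snp_c": 0}
print(polygenic_risk_score(person, effect_sizes))
```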
What are the limitations of polygenic risk score
It is sensitive to ethnicity
The effect size is affected by the environment
It is limited by the power of association of the SNP
It assumes that they are additive
It does not take into account other variants that may directly cause the disease
How can polygenic risk scores be used in clinical trials and how is a threshold determined
You split the group into clear case control groups
If the cases have a higher polygenic risk score that doesn’t overlap with controls this is a good score to use as a threshold
If the controls have a very low polygenic risk score and this does not overlap with cases this can be a threshold where you can be more sure that there is no concern
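The two thresholds described above can be read straight from the score distributions: the highest control score marks where only cases remain, and the lowest case score marks where only controls remain. The scores are invented and both helper names are hypothetical.

```python
def non_overlapping_thresholds(case_scores, control_scores):
    """Return (low_threshold, high_threshold) from case and control scores.

    Scores below low_threshold were seen only in controls;
    scores above high_threshold were seen only in cases.
    """
    return min(case_scores), max(control_scores)


def risk_band(score, low_threshold, high_threshold):
    """Place a score relative to the two thresholds."""
    if score > high_threshold:
        return "above all controls"
    if score < low_threshold:
        return "below all cases"
    return "overlap zone"


case_scores = [0.4, 0.7, 0.9]
control_scores = [0.1, 0.3, 0.5]
low, high = non_overlapping_thresholds(case_scores, control_scores)
print(low, high)
print(risk_band(0.8, low, high), risk_band(0.2, low, high), risk_band(0.45, low, high))
```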
When should you and shouldn’t you use Polygenic risk scores
You can use it to identify cases and controls in large groups
You can use it to see if certain biomarkers are increased in higher risk groups
You cannot use it to identify cases in individuals
It is not useful when the disease has other important variants
It cannot substitute for family history risk