Impact Of NGS Flashcards

1
Q

What are the steps involved in NGS

A
DNA extraction
Purification
Fragmentation/shearing
If WES: target enrichment with capture baits
Adapter ligation
Flow cell loading
Bridge PCR (cluster generation)
Sequencing by synthesis
2
Q

Why is NGS used over Sanger sequencing - and what is one reason to not use NGS

A

NGS is faster and cheaper per base sequenced

However, it may give more information than is needed, so Sanger sequencing is still used for simpler tests, e.g. a single-gene test

3
Q

What are the uses of NGS in the diagnostic lab

A

It can be used for diagnosis, management and treatment

It can inform clinical trials

It can be used to predict pathogenicity and inform life choices

It can be used for prenatal testing

4
Q

What are the benefits of the 100,000 Genomes Project

A

It sequences rare disease and cancer genomes

This brings the benefits of genetics to patients to aid diagnosis and treatment

It can facilitate new discoveries and medical insights, for example through molecular research and clinical trials

It facilitates personalised medicine

5
Q

What is the Genomics England PanelApp

A

It is an app where professionals suggest gene panels and the wider community reviews them

6
Q

What are Genomics England’s variant classification tiers

A

Tier 1 – known pathogenic, protein truncating
Tier 2 – protein altering (missense), intronic (splice site)
Tier 3 – loss of function in genes not on the panel

7
Q

Why use long read technology/What are the drawbacks of short read technology

A

Short read relies on PCR
PCR performs poorly on sequences with high GC content, and deletions and repeat regions can be poorly sequenced

Therefore a lot of DNA is missed by short reads, which long read technology has been able to identify

8
Q

Describe PacBio SMRT library construction

A

Fragmentation – selection of large fragments around 30 to 80 kbp

End repair and adapter ligation
Adapters are hairpin loops ligated to each end, making the template circular, and contain primer binding sites at which DNA polymerase attaches

9
Q

Describe PacBio SMRT Zero Mode Waveguide sequencing

A

The zero mode waveguide guides light into a very small space, where it dissipates

Each type of nucleotide carries its own fluorescent tag

As bases are incorporated the tag is cleaved and fluorescence diffuses out
The light is detected resulting in base call

10
Q

What are some of the applications of PacBio

A

It helps fill in the missing regions of DNA that short reads could not identify, e.g. high GC regions and repeat regions

It can detect structural variants such as copy number variations, repeats and inversions

It can identify mobile genetic elements and identify which alleles lie together across a chromosome (phasing)

11
Q

Describe what a repeat expansion disease is and give some examples

A

Diseases caused by a short tandem repeat expanding beyond a threshold number of copies
These repeats occur in UTRs, coding exons and introns

Examples include
Spinocerebellar ataxia – hereditary, progressive, degenerative, fatal

Huntington’s disease – CAG repeat in the huntingtin gene (HTT)
Normal = <27, intermediate = 27 to 35, pathogenic = >35
Expanded protein is toxic and accumulates in neurons, causing cell death
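
The HTT repeat cut-offs on this card can be written as a small lookup. This is a minimal sketch that uses only the three ranges stated above; the function name is hypothetical and the sketch is not a clinical reporting rule.

```python
def classify_htt_cag(repeats: int) -> str:
    """Classify an HTT CAG repeat count using the cut-offs on this card."""
    if repeats < 0:
        raise ValueError("repeat count cannot be negative")
    if repeats < 27:
        return "normal"
    if repeats <= 35:
        return "intermediate"
    return "pathogenic"


# Print the category for counts on either side of each boundary.
for n in (20, 27, 35, 36, 42):
    print(n, classify_htt_cag(n))
```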

12
Q

How is Huntington’s disease identified

A

Traditionally via PCR and electrophoresis
However, this shows the repeat size but not the sequence or structure

PacBio
Can find interruptions in the sequence, for example CAG > CAA
Can measure size and identify new somatic repeat expansions (somatic instability)

13
Q

Describe how PacBio is used in RNA sequencing

A

It can sequence an entire RNA transcript in a single read
Different isoforms of an mRNA can be sequenced, showing alternative splice sites

In contrast, short-read NGS only shows a reconstruction of one long consensus of all isoforms

14
Q

What is the dark genome and what are camouflaged regions

A

Regions of the genome can be dark
By depth - poor sequencing depth
By quality - poor sequence quality

Camouflaged regions are usually due to some kind of repeat region, or contain variants that are difficult to recover

15
Q

What is Oxford nanopore sequencing

A

This is ultra-long-read sequencing, with reads of up to 2 Mb
DNA passes through a nanopore and the base sequence is detected as an electrical signal
Each base has its own electrical pattern

16
Q

What are the advantages and disadvantages of Oxford nanopore sequencing

A

Advantages – it is one single machine, no other expensive machines are needed and it is scalable by using multiple flow cells

Disadvantages – it is expensive and has a high error rate

17
Q

Describe GBA mutations

A

The GBA gene codes for the enzyme glucocerebrosidase

Biallelic mutations can cause Gaucher’s disease (rare)

This causes increased deposition of glucocerebroside in macrophages because of the enzyme deficiency

Monoallelic mutations are a risk factor for Parkinson’s disease

GBA is difficult to study via NGS because it has a pseudogene (GBAP1) and a duplicated region, so there is a lot of repeated sequence

18
Q

What does genetic association mean

A

This is when a variant allele appears at a higher frequency in unrelated subjects with a disease (cases) than in controls
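
A difference in allele frequency between cases and controls is usually checked with a test on a 2x2 table of allele counts. The sketch below assumes a simple Pearson chi-squared test with 1 degree of freedom; the function name and the counts are hypothetical, and real analyses would normally use a dedicated tool.

```python
import math


def allelic_chi_squared(case_alt, case_ref, control_alt, control_ref):
    """Pearson chi-squared test (1 df) on a 2x2 table of allele counts.

    Assumes every row and column total is non-zero.
    """
    table = [[case_alt, case_ref], [control_alt, control_ref]]
    total = case_alt + case_ref + control_alt + control_ref
    row_sums = [sum(row) for row in table]
    col_sums = [case_alt + control_alt, case_ref + control_ref]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom the upper-tail p-value is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: variant allele frequency 30% in cases, 20% in controls.
chi2, p = allelic_chi_squared(300, 700, 200, 800)
print(f"chi2 = {chi2:.2f}, p = {p:.2e}")
```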

19
Q

Describe the GWAS study design

A

It compares equal numbers of matched cases and controls, genotyped by SNP microarray

Statistical tests include the chi-squared test, run with tools such as PLINK

Quality control must be done on SNP and sample data, batch effect and population structure

20
Q

Describe SNP quality control

A

Remove SNPs not typed in many subjects (unreliable genotyping)

Remove SNPs with low minor allele frequencies (these SNPs have no detection power, while genotyping errors have a large effect)

Exclude SNPs that deviate from Hardy-Weinberg equilibrium
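
The three SNP filters can be sketched as one pass/fail check on genotype counts. The thresholds (call rate 0.95, minor allele frequency 0.01, Hardy-Weinberg p-value 1e-6) are assumptions chosen for illustration, not values from the card, and the function names are hypothetical.

```python
import math


def hwe_p_value(n_aa, n_ab, n_bb):
    """Chi-squared (1 df) test for Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    observed = [n_aa, n_ab, n_bb]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return math.erfc(math.sqrt(chi2 / 2))


def snp_passes_qc(n_aa, n_ab, n_bb, n_missing,
                  min_call_rate=0.95, min_maf=0.01, min_hwe_p=1e-6):
    """Return True if a SNP survives call-rate, MAF and HWE filters."""
    typed = n_aa + n_ab + n_bb
    if typed == 0:
        return False
    call_rate = typed / (typed + n_missing)
    freq_a = (2 * n_aa + n_ab) / (2 * typed)
    maf = min(freq_a, 1 - freq_a)
    if call_rate < min_call_rate or maf < min_maf:
        return False
    return hwe_p_value(n_aa, n_ab, n_bb) >= min_hwe_p


# Hypothetical SNP: well typed, common, and in Hardy-Weinberg proportions.
print(snp_passes_qc(490, 420, 90, n_missing=0))
```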

21
Q

Describe NGS sample data quality control

A

Remove samples with high levels of missing genotypes

Remove samples with very high or very low heterozygosity (biologically unlikely)

Check that reported sex matches the data by checking X chromosome heterozygosity
(Males have one X, so if 10 males are reported then 10 samples should show no X heterozygosity)
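
The sex check can be sketched as follows, assuming that males, having a single X chromosome, show essentially no heterozygous X genotypes. The 0.02 cut-off and the function names are hypothetical; a female with unusually low heterozygosity would be misclassified by such a simple rule.

```python
def x_heterozygosity(x_genotypes):
    """Fraction of called X-chromosome genotypes that are heterozygous.

    Each genotype is a pair of alleles; None marks a missing call.
    """
    called = [g for g in x_genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes")
    het = sum(1 for a, b in called if a != b)
    return het / len(called)


def sex_check(reported_sex, x_genotypes, max_male_het=0.02):
    """Return (matches_reported_sex, inferred_sex) for one sample."""
    rate = x_heterozygosity(x_genotypes)
    inferred = "M" if rate <= max_male_het else "F"
    return inferred == reported_sex, inferred


# Hypothetical sample reported as male but with heterozygous X genotypes.
sample = [("A", "G"), ("C", "C"), ("T", "C")]
print(sex_check("M", sample))
```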

22
Q

What is meant by population structure

A

If one ethnicity is over-represented among the cases, any SNP that is common in that population may appear to be associated with the disease

This affects the Q-Q plot

23
Q

What is a Q-Q plot

A

This is a plot of the observed versus expected p-values
Most values follow the x = y line, deviating only at the lowest p-values

If the curve lifts away from the line early, that indicates a population structure problem = genomic inflation

This inflation can be calculated and accounted for, e.g. by multidimensional scaling
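
One way to put numbers on this is sketched below: the Q-Q points pair sorted observed and expected -log10(p) values, and inflation is summarised as the median observed chi-squared statistic divided by the median expected under no association. The median-based factor (often written lambda) is a common choice but is an assumption here, since the card does not say how the inflation is calculated; the function names are hypothetical.

```python
import math
from statistics import NormalDist, median

CHI2_1DF_MEDIAN = 0.4549364  # median of a chi-squared distribution with 1 df


def qq_points(p_values):
    """Pair expected and observed -log10(p), both sorted ascending."""
    n = len(p_values)
    observed = sorted(-math.log10(p) for p in p_values)
    expected = sorted(-math.log10((i + 0.5) / n) for i in range(n))
    return list(zip(expected, observed))


def genomic_inflation(p_values):
    """Median observed chi-squared (1 df) over its expected median."""
    norm = NormalDist()
    # Two-sided p-value -> z score -> chi-squared statistic with 1 df.
    chi2 = [norm.inv_cdf(p / 2) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN


# Evenly spread p-values stand in for a study with no inflation.
null_p = [(i + 0.5) / 1000 for i in range(1000)]
print(round(genomic_inflation(null_p), 3))
```

A value close to 1 suggests little inflation; values well above 1 suggest the test statistics are inflated across the genome.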

24
Q

What are batch effects

A

Variations in sample processing which can induce variation in genotype calls

This causes false associations, especially when cases and controls are processed in separate batches

25
Q

What is a Manhattan Plot and Regional Plot

A

Manhattan plot
X axis - chromosomal location
Y axis - -log10(p-value)

Regional plot zooms into a region of interest, showing gene labels, individual SNPs and their p-values
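
The values behind a Manhattan plot can be prepared as below. This is a sketch only: the SNP names and results are hypothetical, and the 5e-8 significance threshold is a widely used convention that is not stated on the card.

```python
import math

GENOME_WIDE_P = 5e-8  # conventional threshold; an assumption, not from the card


def manhattan_points(results):
    """Turn (snp_id, chromosome, position, p_value) rows into plot points.

    Rows are ordered by chromosome then position (the x axis), and each
    point carries y = -log10(p).
    """
    rows = []
    for snp_id, chrom, pos, p in sorted(results, key=lambda r: (r[1], r[2])):
        rows.append({
            "snp": snp_id,
            "chrom": chrom,
            "pos": pos,
            "y": -math.log10(p),
            "significant": p < GENOME_WIDE_P,
        })
    return rows


# Hypothetical association results.
results = [("snp_b", 2, 500, 1e-9), ("snp_a", 1, 100, 0.01)]
for point in manhattan_points(results):
    print(point)
```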

26
Q

What are the advantages of GWAS

A

Strong associations are found and easily replicated
It has led to identifying new genes
It has impacted clinical treatment for example Metformin and Type II diabetes

27
Q

What are the disadvantages of GWAS

A

It describes associations and not the cause
Most associated SNPs have no known effect, are in genes of unknown function or are NOT in genes
There is little clinical relevance

28
Q

What is Mendelian Randomisation

A

It is a test of causal effects in studies where there are confounding factors

It uses genetic variation with a known effect on traits or exposures related to the disease, for example a genetic variant which has the same effect as a drug would
E.g. a variant causing lower LDL rather than using a statin

29
Q

Give an example of how to investigate if low LDL-C levels cause fewer cardiovascular events

A

Instead of using statins you can find a genotype that reduces LDL

The control group would be homozygous for the normal variant
While the study group would be homozygous for the beneficial variant

This tests whether lower LDL itself, rather than the drug, reduces CV events
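
One common way to turn this comparison into an estimate is the Wald ratio: the variant's effect on the outcome divided by its effect on the exposure. The card does not name this estimator, so treat it as an assumption; the effect sizes below are invented for illustration only.

```python
import math


def wald_ratio(beta_variant_outcome, beta_variant_exposure):
    """Estimated effect of the exposure on the outcome, per unit of exposure.

    beta_variant_exposure: change in exposure (e.g. LDL-C) per allele.
    beta_variant_outcome: change in outcome (e.g. log odds of a CV event)
    per allele.
    """
    if beta_variant_exposure == 0:
        raise ValueError("variant must have an effect on the exposure")
    return beta_variant_outcome / beta_variant_exposure


# Hypothetical variant: lowers LDL-C by 0.5 units and log odds of events by 0.2.
log_odds_per_unit_ldl = wald_ratio(-0.2, -0.5)
# Odds ratio for a 1 unit lower LDL-C under these made-up numbers.
print(math.exp(-log_odds_per_unit_ldl))
```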

30
Q

What are the assumptions of Mendelian randomisation

A

It is completely random if someone is bb or BB (unrelated to mate choice)

That the variant is not a biomarker that causes the disease directly

That there is no population stratification

GWAS has provided suitable SNPs to select for

31
Q

Give an example of how to investigate cannabis and schizophrenia with Mendelian randomisation

A

If cannabis is the exposure causing schizophrenia

You find a genotype that increases cannabis use vs the other allele causing no change

This study found an odds ratio of 1.37

32
Q

What is a polygenic risk score and how is it calculated

A

By adding up the risk alleles present and combining them into one score

Allele load (0, 1 or 2 copies of each risk allele) is weighted by effect size
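
The calculation is a weighted sum: for each SNP, the number of risk alleles carried (0, 1 or 2) is multiplied by that SNP's effect size and the products are added up. The sketch below uses hypothetical SNP names and effect sizes, and treating a missing SNP as 0 copies is a simplifying assumption.

```python
def polygenic_risk_score(dosages, effect_sizes):
    """Sum of risk-allele counts weighted by per-SNP effect size.

    dosages: SNP id -> number of risk alleles carried (0, 1 or 2).
    effect_sizes: SNP id -> effect size, e.g. log odds ratio from a GWAS.
    """
    score = 0.0
    for snp, beta in effect_sizes.items():
        dosage = dosages.get(snp, 0)  # assumption: missing SNP counts as 0
        if dosage not in (0, 1, 2):
            raise ValueError(f"invalid allele count for {snp}: {dosage}")
        score += dosage * beta
    return score


# Hypothetical effect sizes and one person's genotypes.
effects = {"snp_a": 0.2, "snp_b": 0.5, "snp_c": -0.1}
person = {"snp_a": 2, "snp_b": 1, "snp_c": 0}
print(polygenic_risk_score(person, effects))
```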

33
Q

What are the limitations of polygenic risk score

A

It is sensitive to ethnicity
The effect size is affected by the environment
It is limited by the power of association of the SNP
It assumes that they are additive
Does not take into account other variants that may directly cause the disease

34
Q

How can polygenic risk scores be used in clinical trials and how is a threshold determined

A

You split the group into clear case and control groups

If the cases have higher polygenic risk scores that do not overlap with those of the controls, the score above which only cases are found is a good threshold to use

If the controls have very low polygenic risk scores that do not overlap with those of the cases, the score below which only controls are found can be a threshold where you can be more sure that there is no concern
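
The two thresholds can be read off the score distributions directly. This is a minimal sketch with made-up scores and a hypothetical function name; a real study would estimate thresholds from much larger groups and validate them on separate data.

```python
def non_overlapping_thresholds(case_scores, control_scores):
    """Find score cut-offs beyond which only one group was observed.

    high_risk_above: highest control score, if some cases exceed it.
    low_risk_below: lowest case score, if some controls fall under it.
    """
    high = max(control_scores)
    low = min(case_scores)
    return {
        "high_risk_above": high if any(s > high for s in case_scores) else None,
        "low_risk_below": low if any(s < low for s in control_scores) else None,
    }


# Hypothetical polygenic risk scores.
cases = [0.8, 1.2, 1.5, 2.0]
controls = [0.1, 0.4, 0.9, 1.0]
print(non_overlapping_thresholds(cases, controls))
```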

35
Q

When should you and shouldn’t you use Polygenic risk scores

A

You can use it to identify cases and controls in large groups
You can use it to see if certain biomarkers are increased in higher risk groups

You cannot use it to identify cases in individuals
It is not useful when the disease has other important variants
It cannot substitute for family history risk