Duncan - variant nomenclature and analysis Flashcards

1
Q

how much variation do we see in the average human genome?

A

Compared to a reference human genome, a person’s ~6 billion-nucleotide genome sequence will have:

5,000,000 single nucleotide variants (SNVs) that involve ~5,000,000 nucleotides

600,000 insertion/deletion variants (2+ nucleotides) that involve ~2,000,000 nucleotides

25,000 structural variants (such as CNVs) that involve >20,000,000 nucleotides

2
Q

what is the basic structure of a gene?

A

Start codon - ATG, codes for the amino acid methionine, sets the reading frame and initiates translation

Exons - code for the protein, contribute to the final mRNA molecule that determines the order of amino acids

Introns - non-coding, don’t contribute to the final mRNA molecule, removed by splicing

Stop codon - three options: UGA, UAG, UAA in mRNA (TGA, TAG, TAA in DNA)

3
Q

one possible type of variant is a nonsense variant -
what is this?
how does the cell deal with it/the resultant mRNA?

is it likely to be disease causing?

A

Changes a codon that codes for an amino acid into a stop codon, so the protein ends prematurely.
These mRNAs are then targeted by NMD (nonsense-mediated decay), a quality-control system that prevents production of faulty proteins and protects the cell from ‘aberrantly’ functioning proteins. The likelihood of NMD should be considered when assessing variants that result in shortened proteins.

mRNA that escapes NMD may produce a protein that retains some functionality, potentially not causing disease. However, if the mRNA doesn’t escape NMD, you essentially lose that protein (it’s not expressed), which is likely to be disease causing.

4
Q

how can you, generally, identify whether or not a nonsense mutation will result in mRNA that manages to escape NMD?

A

General rules to identify aberrant mRNAs that may escape NMD:
if the DNA variant is present in the last exon
if the DNA variant is located in the last 50 nucleotides of the penultimate exon
(then the mRNA may escape NMD)
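
The two rules above can be written as a small check. This is only a rough sketch, not a validated predictor: the function name, the list of exon lengths and the coordinate conventions are all made up for illustration.

```python
def may_escape_nmd(exon_lengths, exon_index, pos_in_exon, window=50):
    """Apply the two rules of thumb for NMD escape.

    exon_lengths: lengths (in nucleotides) of the gene's exons, in order
    exon_index:   0-based index of the exon that contains the variant
    pos_in_exon:  1-based position of the variant within that exon
    window:       size of the escape window at the end of the penultimate exon
    """
    last = len(exon_lengths) - 1
    if exon_index == last:
        # Rule 1: variant is in the last exon
        return True
    if exon_index == last - 1:
        # Rule 2: variant is in the last `window` nucleotides of the penultimate exon
        distance_to_exon_end = exon_lengths[exon_index] - pos_in_exon
        return distance_to_exon_end < window
    return False


# Hypothetical three-exon gene with exons of 100, 200 and 150 nucleotides
print(may_escape_nmd([100, 200, 150], exon_index=1, pos_in_exon=151))
```

A `True` result only means the mRNA may escape NMD; it says nothing about whether the truncated protein still works.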

5
Q

name 6 types of variants, and the consequences they can have

A

Stop and start variants -
Occur in the stop or start codons
In start codon: translation not initiated, no protein product, probably disease causing
In stop codon: translation continues into the normally untranslated sequence 3’ of the coding region, resulting in a protein with additional amino acids that are likely to interfere with structure + function and cause disease

Missense variants -
Most common, it’s just a substitution: a single nucleotide is swapped AND this changes the amino acid coded for
May or may not be pathogenic

Nonsense variants -
Change a codon into a stop codon, so the protein ends prematurely.
These mRNAs are then targeted by NMD - nonsense-mediated decay (more later)

Deletion variants -
Result in a frameshift (unless the number of nucleotides deleted is a multiple of 3) and are therefore almost always disease-causing
Can be just 1 nucleotide or an entire gene (entire gene = CNV)
After a frameshift the sequence codes for different amino acids than WT, so you’ve got a novel protein with new or lost functions…
But the shifted reading frame often gives a new stop codon within the first 200 codons, so you get a truncated protein that may be targeted by NMD

Duplications -
Addition of nucleotides = frameshift (unless a multiple of 3 is added) = altered amino acid sequence = likely to be disease causing
Also often gives a premature stop codon and truncated protein. Same as deletions in terms of consequences

6
Q

RNA splicing - what are donor and acceptor sites?

what are donor and acceptor variants like/are they likely to cause disease?

A

Donor = the exonic and intronic sequences flanking the 5’ end of an intron; the intron typically starts with GT
Acceptor = the exonic and intronic sequences flanking the 3’ end of an intron; the intron typically ends with AG

Donor splice site variants -
Change in donor splice site = not recognised by splice machinery so removal of the intronic sequence is not initiated, it gets included in the mRNA, altering protein function and structure
Can also cause a frameshift so you get an early stop codon and a truncated protein
Often disease causing

Acceptor splice site variants -
Results in exclusion of exon. Donor site is recognised and removal of intron initiated but acceptor site is never reached, so the exon is removed too
Very likely to be disease causing as the exon may encode vital parts of the protein (active sites/binding sites etc…)

7
Q

how does the spliceosome work?

A

objective = removal of introns from pre-mRNA

small nuclear RNA (snRNA) molecules bind to specific proteins, forming a small nuclear ribonucleoprotein complex (snRNP)
this combines with other snRNPs, forming the spliceosome. snRNPs recognise and bind to the acceptor and donor sites, the intron is looped out and excised

8
Q

are donor/acceptor splice site variants likely to be pathogenic?

A

Donor/acceptor sites = 15% of recorded pathogenic variants, as can lead to aberrant splicing, excluding exons or including intronic sequences

9
Q

when naming a variant, what are the three components involved (and where does the second one come from)?

A
  1. gene name
  2. reference sequence (represents normal WT)
  3. Variant description

a human genome reference sequence is used as the WT reference. The human genome was first sequenced in 2003, but it had gaps and errors and is constantly updated. The latest version is GRCh38, ‘Genome Reference Consortium Human Build 38’

there are multiple versions of the human genome sequence.
These are combined to form consensus sequences and are updated as more data is gathered, so different references will differ slightly, which is why you must state which one you used

10
Q

why must you know which gene reference sequence you are using?

A

The DNA sequence of genes is predicted from human genome sequences and sometimes confirmed via assay.

But as sequence data and knowledge increases, the gene DNA sequences are regularly updated.

Updates can include new exons, longer introns, additional nucleotides etc
Therefore, for accurate variant naming you must know which gene reference sequence you are using

We normally use references supported by sequencing of the corresponding mRNA transcripts as this provides good knowledge of intron exon boundaries

11
Q

in terms of the reference used in variant naming, what are the three options?

A

NM_xxx = based on mRNA transcripts, so does not include introns

NG_xxx = genomic sequence of a gene, includes introns

NP_xxx = protein sequence based on NM_xxx sequence

12
Q

how is a gene’s reference sequence annotated (as in each base is given an annotation to tell you where it is in the sequence, how does this system work)?

A

the c. (coding DNA) numbering

each nucleotide has a c. number, with c.1 being the A of the ATG start codon

nucleotides in the exons are then just numbered in order (c.1, c.2, c.3 etc…)

nucleotides in introns are numbered based on how far they are from the nearest coding nucleotide, so if the first exon in a gene ends at c.5, the first nucleotide of the intron is c.5+1
the last intronic nucleotide would be c.6-1

you’d include the base too, so c.5A or c.5+3T etc…
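
To see the intron numbering in action, here is a small sketch that labels every nucleotide of one intron. The function name and inputs are hypothetical, and sending the middle nucleotide of an odd-length intron to the ‘+’ side is an assumption of this sketch.

```python
def intron_labels(last_exon_pos, intron_length):
    """Label each intronic nucleotide relative to the nearest coding nucleotide.

    last_exon_pos: c. number of the last nucleotide of the upstream exon
    intron_length: number of nucleotides in the intron
    """
    labels = []
    for i in range(1, intron_length + 1):
        from_upstream = i                        # distance from the upstream exon
        from_downstream = intron_length - i + 1  # distance from the downstream exon
        if from_upstream <= from_downstream:
            # Nearer the upstream exon (or a tie): count forwards with '+'
            labels.append(f"c.{last_exon_pos}+{from_upstream}")
        else:
            # Nearer the downstream exon: count backwards with '-'
            labels.append(f"c.{last_exon_pos + 1}-{from_downstream}")
    return labels


# Exon 1 ends at c.5 and is followed by a (toy) 6-nucleotide intron
print(intron_labels(5, 6))
# ['c.5+1', 'c.5+2', 'c.5+3', 'c.6-3', 'c.6-2', 'c.6-1']
```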

13
Q

how are amino acids labelled?

A

Named by position, start codon being 1, then followed by single letter or three letter code e.g. p.A23 or p.Ala23 = an alanine at position 23

14
Q

what would c.11G>A mean?

say this change makes the OG glycine become an aspartic acid, how would you write this at the protein level?

A

Nucleotide 11 (coding nucleotide) was a G and has been changed to an A (> means change/substitution)

at protein level:
11th nucleotide is in the fourth codon, so fourth amino acid = position 4
p.Gly4Asp - same idea as above, you just don’t use the arrow (>) to indicate the change
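
The step from coding nucleotide to codon number is just integer arithmetic on groups of three. A minimal sketch (the function names are made up for illustration):

```python
def codon_number(c_pos):
    """1-based codon (amino acid) number that contains coding nucleotide c_pos."""
    return (c_pos - 1) // 3 + 1


def position_in_codon(c_pos):
    """Position (1, 2 or 3) of coding nucleotide c_pos within its codon."""
    return (c_pos - 1) % 3 + 1


print(codon_number(11))       # 4, so c.11 sits in the codon for amino acid 4
print(position_in_codon(11))  # 2, the middle base of that codon
print(f"p.Gly{codon_number(11)}Asp")  # p.Gly4Asp
```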

15
Q

in ‘p.(Gly4Asp)’ what do the brackets indicate?

A

Brackets indicate this protein change is a prediction based on the DNA sequence and has not been experimentally confirmed

16
Q

for a gene we have two alleles.

how would I indicate:
  1. two WT alleles in terms of DNA sequence?
  2. two WT alleles in terms of protein/Aa sequence?
  3. one WT one variant?

A
  1. c.[=];[=]
    the ‘=’ is for WT, the ‘[ ]’ show it’s the allele you’re talking about, the ‘;’ separates them
  2. p.[(=)];[(=)]
    the extra bracket for the protein = only DNA sequencing has been done
  3. If there is a variant, just shove the code in the brackets in place of the =. If it’s on both alleles, replace both ‘=’
17
Q

if you’ve got two variants in the same gene, how do you indicate which allele each variant is on?

A

Can’t tell from your initial DNA sequencing, may need to look at the parents. Three situations: the two variants are on the same allele, on different alleles, or you don’t know.

Same: c.[20G>C;59T>A];[=] shove both in the same square brackets because it’s the same allele, just separate the variants by a ‘;’ THIS IS KNOWN AS CIS

Different: c.[20G>C];[59T>A] easy, put each variant in different square brackets for different alleles (this is known as trans)

Unknown -
c.[20G>C(;)59T>A] put it all in the same square bracket, separating the two variants by (;) remember, because ( ) brackets kind of indicate ‘unconfirmed’

same rules apply to protein, don’t forget to use [( )] if only DNA sequencing has been done
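
The three layouts can be generated from a tiny helper. This is a sketch that simply reproduces the forms written on this card; the function name and the `phase` labels are hypothetical.

```python
def describe_two_variants(first, second, phase, prefix="c"):
    """Build the allele description for two variants in the same gene.

    first, second: variant descriptions without the prefix, e.g. "20G>C"
    phase: "cis" (same allele), "trans" (different alleles) or "unknown"
    prefix: "c" for coding DNA, "p" for protein
    """
    if phase == "cis":
        # Both variants on one allele, the other allele is wild type
        return f"{prefix}.[{first};{second}];[=]"
    if phase == "trans":
        # One variant per allele
        return f"{prefix}.[{first}];[{second}]"
    if phase == "unknown":
        # (;) marks that the phase has not been confirmed
        return f"{prefix}.[{first}(;){second}]"
    raise ValueError("phase must be 'cis', 'trans' or 'unknown'")


for phase in ("cis", "trans", "unknown"):
    print(phase, describe_two_variants("20G>C", "59T>A", phase))
```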

18
Q

how do you describe a deletion? e.g. of the 13th and 14th coding nucleotides

A

The position is described as c.13_14.
That it is a deletion of these nucleotides is described by the abbreviation del.
So it is described as c.13_14del
Alleles are again separated using square brackets, with an equals symbol for wild type.
If only one nucleotide is deleted then of course only that nucleotide is listed,
for example, c.14del

c.[13_14del];[=]

19
Q

if there has been a deletion, how do I write this at the protein level?

A

You need to indicate there has been a frameshift, so you simply put the amino acid from which the sequence changes (you don’t need to put what it changes to) followed by ‘fs’ for frameshift

E.g if the deletion left the first four codons normal, and the first change in amino acid was in amino acid 5, originally a valine, you’d put:
p.[(Val5fs)];[(=)]

20
Q

for a nonsense variant, i.e. you’ve got a premature stop codon, how do you write this at DNA level vs protein level?

A

indicated in the DNA sequence as you would a regular substitution/single nucleotide change, e.g. c.[19G>T];[=] (codon 7 covers c.19 to c.21, and GGA to TGA turns a glycine codon into a stop codon)

At the protein level, you indicate which amino acid has changed to give a premature stop codon with a *

p.[(Gly7*)];[(=)]
Glycine at position 7 was replaced with a stop codon

21
Q

what does ‘p.[(Arg12*)];[(=)]’ tell you?

A

arginine at position 12 has been replaced by a stop codon, the other allele is presumed WT

22
Q

are changes within splice sites and intronic variants treated the same?

what about if you’re writing at the protein level?

A

yes they are

just use the correct term to refer to the nucleotide position (as in with the + or -)

E.g.
G to C change 45 nucleotides into the intron, in the 3’ direction from the last coding nucleotide of the exon, c.548
c.[548+45G>C];[=]

at the protein level -
the protein change cannot be known for certain from the DNA alone, so the nomenclature uses a ?
p.[(?)];[(=)]

23
Q

define pathogenic, benign and VUS

A

Pathogenic - it has a negative effect on the production or function of the protein and so causes disease

Benign - the protein produced pretty much functions normally, does not cause a problem. Same as a polymorphism. These can be very common, seen in 50% of the population, or as rare as 0.0001% of the population; some benign polymorphisms may be confined to a single family

VUS - variant of uncertain significance, basically not sure if it’s a problem

24
Q

when classifying a variant, the Association for Clinical Genomic Science (ACGS) guidelines are followed.

briefly describe how this system works?

A

Evidence on a variant is gathered and compared to the guidelines, using a point-based system. Based on how many benign or pathogenic criteria are met by the variant, it is classified from class 1 to class 5

Each criterion is given a code, beginning with P or B for whether meeting that criterion brings a variant closer to being classified as pathogenic or benign

25
Q

what are the five classifications that a variant can fall into?

A

class 1 - benign
class 2 - likely benign
class 3 - VUS
class 4 - likely pathogenic
class 5 - pathogenic

26
Q

what makes something class 1 or 2?

A

Class 1 - benign
Seen in high frequencies in the population (normally)
Some experimental evidence showing the variant does not have an ‘adverse’ effect on the protein. Not linked to patient phenotype

Class 2 - likely benign
This means they are benign, but the evidence is less strong than for class one variants. So seen in normal population, but not as frequently as a class one variant, often not located in a functional domain of the protein

1 and 2 not reported to patient - not related to disease

27
Q

what is the problem with a class 3 result and what is done with it?

A

Class 3 - unknown significance

Evidence for P or B is inconclusive - variants of unknown significance
Awkward to report to patients - doesn’t give diagnosis or confirm that they don’t have a pathogenic mutation
Often further work needed, such as testing more genes, or checking whether affected and unaffected family members also have the variant

Those closer to class 4 are reported, those closer to class 2 are not. I.e. you report these when you think further evidence will confirm the variants are pathogenic. If not you’re just causing anxiety for the patient

28
Q

if you get a class 1/2/3 result, can you exclude genetic disorders?

A

you cannot exclude genetic disorders just because you haven’t yet found a pathogenic variant. You can only confirm genetic disorders

29
Q

what makes a variant class 4 or 5?

what is done with them?

A

Class 4 - likely pathogenic
No/low population frequency
Previous pathogenic variants in the region may have been identified before

Class 5 - pathogenic
Segregates with the disease in family (seen in a family only in members with the disease)

Functional study/experiments show the protein isn’t working properly/confirms adverse effects

All evidence in agreement
Truncating change - is often class 5. Whether due to a nonsense variant or a frameshift-causing one, you lose a chunk of the protein and it’s likely to cause a problem…

4 and 5 reported to patient and clinicians as confirms a genetic diagnosis of disease and may help with treatment/allow for family testing/inform on risks to offspring

30
Q

there (in my notes) are 6 pieces of evidence required to classify a variant. what are they?

A

frequency/how common the variant is in normal population

How the protein structure/function is affected
Does it result in LOF/altered properties, or alter splicing?

Has it been identified in other patients with the same disease

Has it been inherited, or newly arisen (de novo), and if inherited, do other family members with the variant also have the disease?

Does it make sense - i.e., is the function of the gene/protein in question linked to the patient’s clinical symptoms?

Is the variant in a mutational hotspot/vital functional protein domain (regions where any changes often result in disease because the correct sequence is so important)?

31
Q

before variant analysis, what can be some major clues as to where a variant will fall?

A

The type of variant can be a major clue -

Nonsense / frameshift-causing variants = often pathogenic as result in a truncated protein that is severely affected

Synonymous/silent - almost always benign obviously

Missense - depends on how drastic the amino acid change is. Was it a swap to another amino acid with similar properties? Or is it different in size/charge/hydrophobicity and likely to affect DNA binding/protein folding + shape etc…?

32
Q

how would you check the frequency of a variant in the population?

what classifications would you apply depending on the result?

A

You look in databases like gnomAD, with loads of variants collected from loads of whole genome sequencing projects that only involved people free of severe disease (as far as could be told), and see how common your variant in question is in the healthy population

If the variant is found in healthy population - you CANNOT apply ‘PM2’, and it’s unlikely/less likely to be pathogenic
If the variant is not found in healthy population - you CAN apply ‘PM2’

BA1 = when the variant is in >5% of the general population. It is highly unlikely to cause disease (you don’t see more than 5% of people with a certain rare genetic disorder) and is therefore benign. Such variants are typically filtered out of sequencing results in the first place. Stand-alone: if BA1 is applicable, it’s enough to classify a variant as benign
Caveat - can see high frequencies of disease causing variants in recessive genes in heterozygotes, e.g. CFTR pathogenic variants seen in 1 in 25

BS1 - similar, but less ‘strong’. Applied to variants when the allele frequency is greater than expected for the disorder
Can only be used when you’ve got knowledge of the disease occurrence/frequency (on top of the frequency of the variant in the normal population). You can apply BS1 if the frequency of the variant is higher than the frequency of the disease
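
The frequency rules on this card can be lined up as a simple decision function. This is a rough sketch only: the function name and inputs are made up, it ignores the recessive-gene caveat, and real classification follows the full guidelines rather than three if-statements.

```python
def frequency_criteria(allele_freq, disease_freq=None):
    """Suggest frequency-based criteria codes from the rules on this card.

    allele_freq:  frequency of the variant in the control population (0 to 1)
    disease_freq: frequency of the disorder (0 to 1), or None if not known
    """
    if allele_freq > 0.05:
        # Stand-alone benign: variant is in more than 5% of the population
        return ["BA1"]
    if disease_freq is not None and allele_freq > disease_freq:
        # Variant is more common than the disease it would have to cause
        return ["BS1"]
    if allele_freq == 0:
        # Not seen in the healthy population at all
        return ["PM2"]
    return []


print(frequency_criteria(0.12))                        # ['BA1']
print(frequency_criteria(0.001, disease_freq=0.0001))  # ['BS1']
print(frequency_criteria(0.0))                         # ['PM2']
```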

33
Q

when can PM2 be applied to a recessive gene?

A

Can be applied to recessive genes if there are no homozygotes for the variant in gnomAD/the normal control populations and the frequency of the variant is less than the disease incidence

34
Q

what kind of computational evidence can be gathered when classifying variants?

A

an algorithm uses information to determine whether the amino acid change/s seen in a variant will affect the gene product/protein

  1. Look at if there are changes to evolutionarily conserved sites - as in if the amino acid that’s been altered is one that’s the same across a load of species, it’s probably very important for structure/function that it stays as that amino acid. It’s not as much of a concern if its at a site that’s more flexible where species have some different amino acids there
  2. Also takes into account what the amino acid changed to and from - were they of similar size/charge/hydrophobicity etc… because the more they differ in properties, the more likely that variant is to alter protein shape/structure/function

gives you a REVEL score:
0 to 0.39 = benign
0.4 to 0.69 = uncertain
0.7 to 1 = pathogenic

35
Q

what classifications can be applied to what revel scores?

A

0 to 0.39 = Benign
0.4 to 0.69 = uncertain
0.7 to 1 = pathogenic
PP3 may only be applied with a pathogenic score
BP4 may be applied with a benign score
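
As a quick sketch, the score bands and their codes can be looked up like this. The function name is hypothetical, and treating anything below 0.4 as benign (the card leaves a gap between 0.39 and 0.4) is an assumption.

```python
def revel_criterion(score):
    """Map a REVEL score (0 to 1) to the band and criterion code on this card."""
    if not 0 <= score <= 1:
        raise ValueError("REVEL scores run from 0 to 1")
    if score >= 0.7:
        return ("pathogenic", "PP3")
    if score < 0.4:
        return ("benign", "BP4")
    # Uncertain band: neither PP3 nor BP4 applies
    return ("uncertain", None)


print(revel_criterion(0.85))  # ('pathogenic', 'PP3')
print(revel_criterion(0.5))   # ('uncertain', None)
print(revel_criterion(0.2))   # ('benign', 'BP4')
```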

36
Q

if you have a null variant, that results in loss of function, firstly what kind of variants does this include and what classification?

what process must you consider?

A

truncated proteins, so nonsense variants or variants resulting in a frameshift.
apply PVS1 (pathogenic very strong)

You must consider NMD. Some variants are likely to escape the process and survive if:
the DNA variant is present in the last exon
the DNA variant is located in the last 50 nucleotides of the penultimate exon

If it escapes, the truncated version might function well enough to not cause disease, or to cause a milder disease at least
if it doesn’t escape = massively reduced CN = disease causing

37
Q

when a variant alters splicing what impact does this have on classification?

A

if donor or acceptor sites are altered, this is computational evidence supporting a deleterious effect on a gene (PP3)

AND - especially in donor site, inclusion of intron etc… - can get frameshift, early stop codon, truncated protein, NMD so apply PVS1

benign counterparts -
BP4 (computational evidence suggests no impact on gene/gene product)
And/or
BP7 yadyadyada

38
Q

a variant is seen in other patients with same symptoms

A

PS4

39
Q

variant is de novo (this answer involves requirements that could cause family arguments)?

A

If you test mum and dad and it looks like they don’t have it (so it is de novo) but you haven’t confirmed they are indeed the mum and dad (well really that dad is dad) you apply PM6

If testing mum and dad tells you it’s de novo, plus no family history of the disease, and you have confirmed paternity (and maternity) by e.g. QF-PCR, you apply PS2
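
The choice between the two codes boils down to whether parentage has been confirmed. A minimal sketch of that logic as described on this card (the function and argument names are hypothetical, and returning no code when there is a family history is an assumption):

```python
def de_novo_criterion(found_in_parents, parentage_confirmed, family_history=False):
    """Pick the de novo criterion code following the two cases on this card.

    found_in_parents:    True if either tested parent carries the variant
    parentage_confirmed: True if maternity and paternity have been confirmed
    family_history:      True if the disease is already known in the family
    """
    if found_in_parents:
        return None  # inherited, so neither de novo code applies
    if parentage_confirmed and not family_history:
        return "PS2"  # confirmed de novo
    if not parentage_confirmed:
        return "PM6"  # looks de novo, but parentage not confirmed
    return None


print(de_novo_criterion(found_in_parents=False, parentage_confirmed=True))   # PS2
print(de_novo_criterion(found_in_parents=False, parentage_confirmed=False))  # PM6
```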

40
Q

what is meant by family segregation and how is this used in classifying a variant?

A

Literally so simple, you test the family for the variant and ask the question “is the disease always and only seen in members with the variant?” If yes, it’s very likely that the variant causes the disease, i.e. that to have the disease one must inherit the variant

You can apply PP1 if it is the case, and BS4 if it’s not (because if family members without the disease do have the variant, it’s a pretty strong indicator that the variant does not cause the disease)

41
Q

how can the patient’s symptoms be useful in variant classification/what question should be asked?

A

Do the patient’s symptoms align with the affected gene and associated disease?
E.g. A patient referred with learning disability but the gene we found a variant in is only associated with short limbs
Does not support pathogenicity

E.g. A patient referred with learning disability and the gene with the variant is commonly reported in patients with learning disability
Supports pathogenicity, apply PP4

42
Q

what is meant by a mutational hotspot and how is this concept used in variant classification?

A

Basically is there reason to believe a mutation in that specific location is likely to cause disease - e.g. a region that is so vital to function, variants there are often pathogenic
Can use disease databases to find where pathogenic variants are in a disease and identify hotspots - often super important domains of a protein, e.g. DNA binding sites
Apply PM1