Genome sequencing + DNA sequence variants Flashcards
What are types of DNA variants described with respect to?
mRNA sequence
Protein amino acid sequence
Reading frame
Exon and intron boundaries.
What's an exon?
Regions that code for protein
Will be part of the final mature RNA produced
What's an intron?
Non-coding
Removed by RNA splicing during maturation of the final RNA product
Start codon
Triplet of nucleotides which serves as the initiation point of translation of the gene's mRNA.
This is almost always an ATG sequence (AUG in the mRNA).
What's splicing?
Removal of introns from the pre-mRNA molecule
What's a codon?
3 nucleotides together (a codon) code for an amino acid.
How does a reading frame start and end?
The ATG start codon begins the reading frame: the sequence of codons required to give the correct amino acid sequence for the protein
A stop codon ends the reading frame
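The reading frame card can be sketched in code. This is a minimal illustration only: the codon table is deliberately partial and the input sequence is a made-up example, not taken from a real gene.

```python
# Partial codon table: only the codons used in this sketch, not the full genetic code.
CODON_TABLE = {
    "ATG": "Met", "GGC": "Gly", "GAC": "Asp", "GTA": "Val",
    "TAA": "*", "TAG": "*", "TGA": "*",   # * marks a stop codon
}

def translate(cdna):
    """Read triplets from the first ATG until a stop codon is reached."""
    start = cdna.find("ATG")
    if start == -1:
        return []  # no start codon, so no reading frame
    protein = []
    for i in range(start, len(cdna) - 2, 3):
        amino_acid = CODON_TABLE.get(cdna[i:i + 3], "Xaa")  # Xaa: codon not in the partial table
        if amino_acid == "*":
            break  # the stop codon ends the reading frame
        protein.append(amino_acid)
    return protein

# Hypothetical sequence: two leading bases, then ATG GGC GTA TGA.
print(translate("CCATGGGCGTATGACC"))  # ['Met', 'Gly', 'Val']
```

The two bases before the ATG are ignored because the frame is set by the start codon, not by the first base of the sequence.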
State the types of DNA variants
STOP and START variants
Missense
Nonsense
Synonymous – does not alter amino acid
Splice site
duplications/deletions (frameshift)
May be single nucleotide change
May be multiple nucleotide changes
Stop and start variants
What happens if there is a variant in the start codon?
What happens if there is a variant in the stop codon?
What does this produce?
DNA changes occurring in the start and stop codons.
If there is a variant in the start codon, in this example a G to C change, translation will not be initiated and there will be no protein product.
If there is a variant in the stop codon, translation will continue into the normally untranslated sequence 3’ of the coding region.
This will produce a protein with extra amino acids at the C-terminal end that may interfere with structure and function and is likely disease causing.
Missense variant
Give example
Cause a change in the amino acid that the DNA codon codes for.
Example: a G in a GGC codon is changed to an A, producing a GAC codon
This changes the amino acid from glycine to aspartic acid.
This may or may not be pathogenic.
Nonsense
DNA variants that alter what the codon codes for.
Instead of a different amino acid, the altered codon is a stop codon, introducing a premature stop codon.
Translation stops prematurely
Shortened/incomplete protein.
Protein may lose function
The mRNA may be degraded by nonsense-mediated decay
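The difference between synonymous, missense and nonsense changes can be checked codon by codon. The sketch below builds the standard genetic code and compares the amino acid before and after a single-base change; the function name and the "stop-loss" label are choices made for this illustration, not terms from the cards.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def classify_substitution(codon, index, new_base):
    """Classify a single-base change at `index` (0, 1 or 2) within a codon."""
    mutated = codon[:index] + new_base + codon[index + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutated]
    if before == after:
        return "synonymous"   # amino acid unchanged
    if after == "*":
        return "nonsense"     # premature stop codon created
    if before == "*":
        return "stop-loss"    # stop codon changed into an amino acid codon
    return "missense"         # one amino acid swapped for another

print(classify_substitution("GGC", 1, "A"))  # GGC (Gly) -> GAC (Asp): missense
print(classify_substitution("GGC", 2, "T"))  # GGC (Gly) -> GGT (Gly): synonymous
print(classify_substitution("TAT", 2, "A"))  # TAT (Tyr) -> TAA (stop): nonsense
```

The classification says nothing about pathogenicity; it only describes the effect on the codon.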
Deletion variants
Loss of nucleotides, ranging from one nucleotide to a whole gene
Example: 2 nucleotides are deleted from a GTA codon, resulting in an altered reading frame (frameshift)
Frameshifts often result in a truncated protein
Any variant that causes a frameshift is almost always disease causing
Duplication variant
The addition of nucleotides alters the reading frame, causing a frameshift and an altered amino acid sequence. (new amino acid sequence)
And, as with deletions, a premature stop codon often causes premature truncation of the protein.
Example: an additional T nucleotide is added.
You can see the triplet GTA is changed to GTTA, so the reading frame is disrupted.
The GTA triplet is replaced by a GTT triplet followed by ACC, then CGC.
Amino acid sequence altered compared to WT
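The frameshift in the duplication and deletion examples can be reproduced by splitting a sequence into triplets before and after the change. The nine-base fragment below is a hypothetical stretch chosen so that the duplicated version reads GTT, ACC, CGC as in the card.

```python
def codons(seq):
    """Split a sequence into complete triplets, dropping any leftover bases."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]

wild_type = "GTACCCGCT"                             # hypothetical in-frame fragment
duplicated = wild_type[:2] + "T" + wild_type[2:]    # extra T: GTA becomes GTTA
deleted = wild_type[:1] + wild_type[3:]             # 2 bases removed from the GTA codon

print(codons(wild_type))   # ['GTA', 'CCC', 'GCT']
print(codons(duplicated))  # ['GTT', 'ACC', 'CGC']
print(codons(deleted))     # ['GCC', 'CGC']
```

Every triplet downstream of the change is different from wild type, which is why the amino acid sequence changes from that point on.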
RNA splicing
Splicing, the removal of introns from the pre-mRNA, is controlled by the splice sites that flank the intron.
The donor splice site is at the 5’ end of the intron and is almost always a GT dinucleotide.
The acceptor splice site is at the 3’ end of the intron and is almost always an AG dinucleotide.
The splicing machinery identifies the donor splice site and begins cutting out the RNA from that point on.
This will only stop once the splicing machinery identifies the acceptor splice site, which is then followed by the exonic sequence.
The resulting mRNA strand contains sequences only derived from exons.
Note that in RNA, uracil replaces thymine.
Donor splice site variants
Mutations that occur in introns
Example, the G of the GT donor splice is changed to an A.
This means that the splicing machinery will not recognise the donor splice site.
Therefore it will not begin the removal of the intronic sequence.
The resulting mRNA will include the entire intron.
This will alter the protein's structure and possibly its function.
It can also alter the reading frame, causing a frameshift and, as discussed for deletions and duplications, usually a prematurely truncated protein.
Such splice changes are often disease causing.
Alternative splicing of genes
This process uses different splice sites within one gene to generate related but different protein products.
i.e. one gene can produce a variety of different protein products depending on which splice sites are used.
Acceptor splice site variants
A mutation in the AG of an acceptor splice site has a similarly damaging effect.
However, rather than the intronic sequence being included, the result is that the exon is excluded.
The splicing machinery recognises the donor splice site and begins removing nucleotides,
but as the acceptor splice site is no longer recognised, it does not stop once it reaches the exon and continues to remove all sequence until the next active acceptor site is reached.
Removing an entire exon would most likely be disease causing.
The exon may encode vital parts of the protein, such as active sites, DNA or protein binding sites.
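The donor and acceptor splice site cards describe two different outcomes: intron retention and exon skipping. The toy model below reproduces both on a made-up gene. It is a sketch only: intron coordinates are supplied by hand, only the GT and AG dinucleotides are checked, and real splice site recognition involves much more sequence context.

```python
def splice(gene, introns):
    """Toy splicing model on a DNA-letter sequence.

    `introns` holds (start, end) half-open index pairs (hypothetical input).
    Returns the mature mRNA with T written as U.
    """
    introns = sorted(introns)
    kept = []        # pieces of sequence that stay in the mRNA
    position = 0     # first base not yet copied or removed
    i = 0
    while i < len(introns):
        start = introns[i][0]
        if gene[start:start + 2] != "GT":
            i += 1   # donor not recognised: this intron is retained
            continue
        # Donor recognised: keep cutting until the next intact AG acceptor.
        j = i
        while j < len(introns) and gene[introns[j][1] - 2:introns[j][1]] != "AG":
            j += 1
        if j == len(introns):
            break    # no acceptor found: leave the rest unspliced
        kept.append(gene[position:start])
        position = introns[j][1]
        i = j + 1
    kept.append(gene[position:])
    return "".join(kept).replace("T", "U")

# Hypothetical gene: exon 1 + intron 1 + exon 2 + intron 2 + exon 3
normal = "ATGGCC" + "GTAAAG" + "CCCGGG" + "GTCCAG" + "TTTTAA"
introns = [(6, 12), (18, 24)]
donor_variant = normal[:6] + "A" + normal[7:]        # GT -> AT in intron 1
acceptor_variant = normal[:11] + "C" + normal[12:]   # AG -> AC in intron 1

print(splice(normal, introns))            # AUGGCCCCCGGGUUUUAA
print(splice(donor_variant, introns))     # AUGGCCAUAAAGCCCGGGUUUUAA (intron 1 retained)
print(splice(acceptor_variant, introns))  # AUGGCCUUUUAA (exon 2 skipped)
```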
Why is DNA variant Nomenclature important?
To describe variants so that others can understand them
Share genetic knowledge
Chaos otherwise!
Similar to ISCN for chromosome karyotype description
What does p.Gly4Asp mean?
Amino acid position 4
Amino acid change: glycine to aspartic acid
c.11G>A
A missense change
a G to A change at position 11.
The 11th nucleotide, counting the A of the ATG start codon as position 1
> indicates a change of nucleotide, therefore G>A
How many alleles are there for each gene?
2
Variants can be…
Heterozygous - one allele has the variant
Homozygous - both alleles have the variant
Hemizygous (if X-linked and the carrier is male)
How is the nomenclature of two alleles separated?
by a ;
How are WT alleles represented?
by a =
How are alleles indicated?
[]
When both alleles are WT
c.[=];[=]
p.[=];[=]
Describe c.11G>A and p.Gly4Asp assuming that the second allele is WT
c.[11G>A];[=]
p.[Gly4Asp];[=]
G to C change at position 20
c.20G>C
c.59T>A
T to A change at position 59
Two DNA changes in one gene
c.20G>C
c.59T>A
Write as if changes on different alleles, in trans
Also, write as if changes have unknown allelic relationship
c.[20G>C];[59T>A]
We may have this information following parental testing showing that each parent has one variant.
Unknown: c.[20G>C(;)59T>A]
Two DNA changes in one gene
c.20G>C
c.59T>A
Write as if changes on same allele, in cis
c.[20G>C;59T>A];[=]
Here we know that the variants are on the same allele; again, we would only be able to know this following parental testing showing that one parent had both variants.
Two amino acid changes in one gene
p.(Asp7Gly)
p.(Ala98Phe)
Write as if changes on different alleles, in trans
Also, write as if changes have unknown allelic relationship
p.[Asp7Gly];[Ala98Phe]
Unknown: p.[Asp7Gly(;)Ala98Phe]
Two amino acid changes in one gene
p.(Asp7Gly)
p.(Ala98Phe)
Write as if changes on same allele, in cis
p.[Asp7Gly;Ala98Phe];[=]
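The bracket and semicolon rules from the allele cards can be written as a small formatter. This is a sketch that follows the conventions shown on these cards; the function names are hypothetical and it does not validate the variant descriptions themselves.

```python
def allele(changes):
    """One allele in square brackets; an empty list means wild type (=)."""
    return "[" + (";".join(changes) if changes else "=") + "]"

def two_alleles(prefix, first, second):
    """Both alleles of a gene, separated by a semicolon. prefix is 'c' or 'p'."""
    return f"{prefix}.{allele(first)};{allele(second)}"

def unknown_phase(prefix, changes):
    """Changes whose allelic relationship is unknown, joined by (;)."""
    return f"{prefix}.[{'(;)'.join(changes)}]"

print(two_alleles("c", [], []))                         # c.[=];[=]
print(two_alleles("c", ["20G>C"], ["59T>A"]))           # in trans: c.[20G>C];[59T>A]
print(two_alleles("p", ["Asp7Gly", "Ala98Phe"], []))    # in cis: p.[Asp7Gly;Ala98Phe];[=]
print(unknown_phase("c", ["20G>C", "59T>A"]))           # c.[20G>C(;)59T>A]
```

Changes listed inside one pair of brackets are in cis; changes in separate brackets are in trans.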
Deletion of nucleotides at positions 13 and 14
DNA deletion of two nucleotides
c.[13_14del];[=]
Deletion of single nucleotide at position 14
c.[14del];[=]
p.[Val5fs];[=]
Deletion = amino acid change
Changed reading frame resulting in frameshift
Frameshift occurred at valine at position 5.
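Whether a deletion shifts the reading frame depends on whether the number of deleted nucleotides is a multiple of three. The sketch below reads that length from a simple deletion description; it assumes the bare single-allele form (for example c.13_14del, without the allele brackets) and coding positions only.

```python
import re

def deletion_length(description):
    """Number of nucleotides removed by a 'c.Ndel' or 'c.N_Mdel' description."""
    match = re.fullmatch(r"c\.(\d+)(?:_(\d+))?del", description)
    if match is None:
        raise ValueError(f"not a simple deletion: {description}")
    start = int(match.group(1))
    end = int(match.group(2)) if match.group(2) else start
    return end - start + 1

def causes_frameshift(description):
    """A deletion shifts the frame unless a whole number of codons is lost."""
    return deletion_length(description) % 3 != 0

print(deletion_length("c.13_14del"), causes_frameshift("c.13_14del"))  # 2 True
print(deletion_length("c.14del"), causes_frameshift("c.14del"))        # 1 True
print(deletion_length("c.13_15del"), causes_frameshift("c.13_15del"))  # 3 False
```

The last case, a hypothetical deletion of one whole codon, removes an amino acid but leaves the downstream frame intact.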
What is stop codon indicated by?
*
Nonsense variant (change creates a new premature stop codon)
T to A change at c.21, amino acid 7 is Gly
DNA = c.[21T>A];[=]
Protein = p.[Gly7*];[=]
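The link between a c. position and an amino acid position in these cards is simple arithmetic: three nucleotides per codon, counting the A of the ATG as position 1. The sketch below assumes positive coding positions only (no intronic or untranslated positions).

```python
def codon_number(c_position):
    """Amino acid (codon) number containing a coding-DNA position."""
    return (c_position - 1) // 3 + 1

def base_in_codon(c_position):
    """Which base of that codon is affected: 1, 2 or 3."""
    return (c_position - 1) % 3 + 1

for position in (11, 14, 21):
    print(position, codon_number(position), base_in_codon(position))
# 11 4 2
# 14 5 2
# 21 7 3
```

This matches the cards: c.11 lies in codon 4, c.14 in codon 5 and c.21 in codon 7.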
What are two things to think about when interpreting DNA variants?
Is the variant pathogenic?
Is the variant benign? Many benign variants in the human genome are called polymorphisms
How are the likely effects of DNA variants analysed?
Follow the guidelines laid out by the Association for Clinical Genomic Science (ACGS).
Variant classification
1-5
1 Benign
2 Likely Benign
3 Unknown significance
4 Likely Pathogenic
5 Pathogenic
What are the types of evidence?
Variant frequency vs disease incidence
Type of variant
Splicing variants, e.g. adverse effect on splicing predicted
Functional information
Functional studies
Clinical information – do the gene’s known disease associations fit the patient’s phenotype?
Family study
How is gene inherited?
Variant frequency vs disease incidence
Variant found in normal control populations? (see the gnomAD database https://gnomad.broadinstitute.org)
If yes, less likely to be pathogenic
If not, supports pathogenicity
Evidence - Type of variant
e.g. Nonsense vs synonymous
Truncating changes more likely to be pathogenic
Missense - Is the wild-type amino acid replaced by one with different properties?
Is this amino acid change predicted to alter the shape and/or function of the protein?
Splicing variants, e.g. adverse effect on splicing predicted?
If yes, likely to be pathogenic
If not, less likely to be pathogenic
Functional information
Located in functional domains?
Protein active site?
Repeat motif?
Functional information is important; this relates to where the variant is located in the gene and protein.
Is it in a functional protein domain with an important function, or is it in an area that has only a minor role in protein function, where disruption is likely to have little adverse effect?
Variants may disrupt important repeat motifs (repeats of amino acid sequences that are important for protein structure). Disruption of such a repeat could cause loss of that function.
Functional studies
Experimental assay shows adverse effect on protein function?
Or no effect?
Clinical information – do the gene’s known disease associations fit the patient’s phenotype?
E.g. Referred with learning disability but gene only associated with short limbs
E.g. Referred with learning disability and gene commonly reported in patients with learning disability
Family study
Is variant present in affected family members only – supports pathogenic
Is variant present in unaffected family members – supports benign
Is the variant de novo in patient or does parent carry it?
De novo supports pathogenic (unless carrier parent also affected)
How is gene inherited?
If recessive, has a second variant been detected?
Does familial pattern of disease fit a recessive, dominant or X-linked inheritance?
Literature searches & Disease database
Has the variant previously been reported as pathogenic or benign?