Genome variation Flashcards

1
Q

What are the genomic variations in humans?

A

Genetic variations are differences in the DNA sequence of individuals
* Approximately 99.9% of the DNA of two unrelated individuals is the same.
* Genetic variations can be described at the level of:
- DNA
- RNA
- Protein

2
Q

Types of changes at DNA level

A

Substitution
Insertion
Deletion
Indel
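
The four change types above can be told apart from the reference and alternate alleles alone. The following is a minimal sketch, assuming VCF-style alleles in which a pure insertion or deletion keeps a shared leading anchor base; the function name `classify_dna_change` is hypothetical.

```python
def classify_dna_change(ref: str, alt: str) -> str:
    """Classify a DNA-level change from its reference and alternate alleles."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        return "no change"
    if len(ref) == 1 and len(alt) == 1:
        return "substitution"
    # Assumption: VCF-style alleles, so a pure insertion or deletion
    # keeps the shorter allele as a prefix of the longer one.
    if len(ref) < len(alt) and alt.startswith(ref):
        return "insertion"
    if len(ref) > len(alt) and ref.startswith(alt):
        return "deletion"
    # Bases are both removed and added at the same site.
    return "indel"


for ref, alt in [("A", "G"), ("A", "ATG"), ("ATG", "A"), ("ATG", "CC")]:
    print(ref, alt, classify_dna_change(ref, alt))
```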

3
Q

What are silent changes?

A

Silent changes or synonymous changes:
the nucleotide change does not affect the amino acid

4
Q

What is a missense variation?

A

the amino acid is replaced with a different amino acid e.g. p.Arg70Cys

5
Q

What is a nonsense variation?

A

The codon for an amino acid is changed to a stop codon, e.g. Arg70 is changed to a stop codon: p.Arg70*
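
The silent, missense and nonsense categories can be illustrated by translating the reference and variant codons with the standard genetic code. This is a minimal sketch: the function name `describe_codon_change` and the example codons are hypothetical, and changes that start from a stop codon are deliberately left out.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (third base fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def describe_codon_change(ref_codon: str, alt_codon: str, position: int):
    """Return (category, protein-level description) for a single codon change."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == "*":
        # Stop-loss changes are outside the scope of this sketch.
        raise ValueError("reference codon is a stop codon")
    ref3 = THREE_LETTER[ref_aa]
    if alt_aa == ref_aa:
        return "synonymous", f"p.{ref3}{position}="
    if alt_aa == "*":
        return "nonsense", f"p.{ref3}{position}*"
    return "missense", f"p.{ref3}{position}{THREE_LETTER[alt_aa]}"


print(describe_codon_change("CGT", "CGC", 70))
print(describe_codon_change("CGT", "TGT", 70))
print(describe_codon_change("CGA", "TGA", 70))
```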

6
Q

How can variations affect the phenotype?

A

* Influence traits such as hair or eye color and height
* Increase or decrease susceptibility towards a condition:
- In combination with multiple other variants of small effect and/or with the environment
- Are usually common in the population
- Common diseases, e.g. diabetes, hypertension and asthma
* Directly cause a condition:
- Cause a major phenotypic effect
- Are usually rare in the population
- Typically cause Mendelian disorders, e.g. familial hypercholesterolaemia, sickle cell disease, cystic fibrosis
* Increase or decrease response to a drug

7
Q

What can understanding variants give us?

A

Understanding the molecular basis of disease helps to identify the correct treatment and to design new drugs

8
Q

What were the goals of the 1000 Genomes Project?

A

* The discovery of single nucleotide variants with frequencies ≥ 1%
* The discovery of single nucleotide variants with frequencies of 0.1–0.5% in gene regions
* The discovery of structural variants, such as copy-number variants, other insertions and deletions, and inversions
* The estimation of the frequencies of variant alleles
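
Estimating a variant allele frequency from genotype counts can be sketched as below. This assumes a diploid autosomal site; the function name and the genotype counts are hypothetical and are not taken from the project.

```python
def alt_allele_frequency(n_hom_ref: int, n_het: int, n_hom_alt: int) -> float:
    """Frequency of the alternate allele at a diploid autosomal site."""
    n_individuals = n_hom_ref + n_het + n_hom_alt
    if n_individuals == 0:
        raise ValueError("no genotyped individuals")
    # Each individual carries two alleles; heterozygotes carry one alternate
    # allele and alternate homozygotes carry two.
    alt_alleles = n_het + 2 * n_hom_alt
    return alt_alleles / (2 * n_individuals)


# Hypothetical counts for one site in 2,500 individuals.
af = alt_allele_frequency(n_hom_ref=2440, n_het=58, n_hom_alt=2)
print(f"allele frequency = {af:.4f}; at or above 1%: {af >= 0.01}")
```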

9
Q

What is the distribution of rare variants in the world?

A

Although most common variants are shared across the world, rarer variants are typically restricted to closely related populations

10
Q

What should you consider when assessing the effect of a genetic variant?

A
  • has it been seen in someone with the phenotype I am studying?
  • has it been seen before?
  • how common is it in the general population?
11
Q

What is dbSNP?

A

dbSNP is a public archive of all short sequence variations.
* It was established in 1998.
* It is hosted by the National Center for Biotechnology Information (NCBI).
* It includes data from several organisms, not just humans
* It includes single nucleotide substitutions, short insertions/deletions, and multi-base deletions or insertions.
* Variations > 50 nucleotides in length are annotated in the Database of Genomic Structural Variation (dbVar), not dbSNP.
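
The 50-nucleotide rule above can be written as a small helper. This is only a sketch: measuring variant length as the longer of the two alleles is an assumption, and the function name `archive_for_variant` is hypothetical.

```python
SHORT_VARIANT_MAX_LENGTH = 50  # threshold stated on this card


def archive_for_variant(ref: str, alt: str) -> str:
    """Suggest which archive covers a variant, based on its length."""
    # Assumption: the variant length is the length of the longer allele.
    length = max(len(ref), len(alt))
    return "dbVar" if length > SHORT_VARIANT_MAX_LENGTH else "dbSNP"


print(archive_for_variant("A", "G"))             # single nucleotide substitution
print(archive_for_variant("A", "A" + "T" * 60))  # 60-base insertion
```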

12
Q

Why are we interested in studying the exome?

A
  • The exome includes all exons of protein coding genes
  • The exome covers ~2% of our genome
  • Whole-exome sequencing helps to identify novel disease-causing variants in patients with rare diseases
13
Q

What is the ExAC database?

A

Established in 2014
* ExAC database reports exome sequences of >60,000 unrelated individuals from different populations (African, American, Non-Finnish Europeans, Finnish Europeans, East Asians, South Asians) that were sequenced as part of several disease-specific and population genetic studies.
* Exomes are from individuals with adult-onset diseases
* No homozygous variants causing childhood-onset Mendelian diseases are present in the database
* It provides for each genetic variant:
- Global allele frequency
- Population-specific allele frequency
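
The difference between a global and a population-specific allele frequency can be sketched from per-population allele counts (AC) and allele numbers (AN). The population labels and counts below are hypothetical and are not ExAC data.

```python
def allele_frequencies(counts: dict[str, tuple[int, int]]):
    """Return (global AF, per-population AF) from {population: (AC, AN)}."""
    per_population = {
        pop: ac / an for pop, (ac, an) in counts.items() if an > 0
    }
    total_ac = sum(ac for ac, _ in counts.values())
    total_an = sum(an for _, an in counts.values())
    # The global frequency pools alleles; it is not the mean of the
    # per-population frequencies.
    global_af = total_ac / total_an if total_an > 0 else None
    return global_af, per_population


# Hypothetical counts: the variant is seen mostly in one population.
EXAMPLE_COUNTS = {"AFR": (30, 10_000), "EAS": (0, 8_000), "NFE": (10, 60_000)}

global_af, per_population = allele_frequencies(EXAMPLE_COUNTS)
print(f"global: {global_af:.5f}")
for pop, af in per_population.items():
    print(f"{pop}: {af:.5f}")
```

In this made-up example the pooled frequency hides the fact that the variant is several times more frequent in one population, which is why both values are reported.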

14
Q

gnomAD

A

Genome Aggregation Database (gnomAD) aggregates exome and genome sequence data from several large-scale sequencing projects
* It provides data for 125,748 exomes and 15,708 whole-genome sequences
* It provides data for >240 million human variants
* Data are from unrelated individuals

15
Q

What is ClinVar?

A

“ClinVar aggregates information about genomic variation and its relation to human health.”
- It includes germline and somatic variants
- The clinical significance of the variant (e.g. benign, damaging, unclassified) is reported directly from the submitters
- Clinical significance is calculated from all records submitted for the same variant. The presence of a consensus or conflict is indicated
- A clinical interpretation is present for >200,000 variants

16
Q

What is Online Mendelian Inheritance in Man (OMIM)?

A

Online Catalog of Human Genes and Genetic Disorders
* Freely available
* It contains information on all known Mendelian disorders
- specific for monogenic disorders

18
Q

Why should we do bioinformatics analysis?

A

You need to do bioinformatics before an experiment because you cannot realistically check all variants in the wet lab

19
Q

Catalogue Of Somatic Mutations In Cancer

A
  • Catalogue Of Somatic Mutations In Cancer (COSMIC)
  • Freely available
  • More than 4 million coding mutations reported
  • It combines genome-wide sequencing results from > 28,000 tumors
20
Q

What should be your next steps when you identify a variant?

A
  • Is the variant causing any change to the protein?
  • You need to look at the function of the residue
    - Is the residue conserved? Is it important for the protein function?
21
Q

What should you know about the function of the wild type residue to assess the variant better?

A

* Is the position conserved? If the wt amino acid is evolutionarily conserved, it is highly likely to be important for the protein function/structure
* What are the physicochemical properties of the wt residue?
* Is the wild type residue in a protein domain? And if so, what is the function of the domain?
* Is the wild type residue involved in a protein-protein or protein-ligand interaction?

22
Q

What does it tell us when a residue is conserved?

A

Functionally or structurally important residues are conserved across homologues
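
One common way to put a number on conservation is the Shannon entropy of an alignment column. The sketch below is illustrative only: the scoring scheme is one of several in use, it is not attributed to any tool named in these cards, and the example columns are made up.

```python
import math
from collections import Counter


def column_conservation(column: str) -> float:
    """Score one alignment column: 1.0 = fully conserved, lower = more variable."""
    residues = [r for r in column.upper() if r != "-"]  # ignore gaps
    if not residues:
        raise ValueError("column contains only gaps")
    n = len(residues)
    counts = Counter(residues)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    # Normalise by the maximum entropy for 20 amino acids.
    return 1 - entropy / math.log2(20)


# Hypothetical columns taken across six homologues.
print(column_conservation("RRRRRR"))  # invariant position
print(column_conservation("RKRQRE"))  # variable position
```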

23
Q

Why should we study domains when assessing protein variants?

A

Knowing the function of the domain can help to understand the function of the protein and the disruption caused by the genetic variant

24
Q

What is Pfam?

A

Pfam is a comprehensive database of protein families
* Members of the same Pfam family are evolutionarily related and are identified using Hidden Markov Model (HMM) profiles
* In UniProtKB, 77.0% of the amino acid sequences have at least one domain annotated in Pfam
* 53.2% of residues in UniProtKB belong to a Pfam domain
* There are still many proteins and amino acid regions without a functional annotation

25
Q

What is InterPro?

A

InterPro is a consortium of protein domain databases
Main advantage and disadvantage of InterPro:
- An advantage is that it integrates data from multiple resources
- A disadvantage is the difficulty in keeping up to date with the individual resources

26
Q

What can changes in amino acids cause?

A

* Substitution of structurally important amino acids:
- loss of cysteine (disulfide) bonds
- loss of hydrogen bonds
* Change in amino acid size:
- steric clashes
- introduction of cavities
* Substitution to proline can cause a bending of the alpha helix
* Change of polarity:
- e.g. hydrophobic to hydrophilic substitution in a core residue
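
Some of these effects can be flagged from the amino acid identities alone. The sketch below uses the Kyte-Doolittle hydropathy scale; the sign-based cut-off and the function name `substitution_flags` are assumptions for illustration, size changes are not covered, and whether a flag matters still depends on where the residue sits in the structure.

```python
# Kyte-Doolittle hydropathy values (positive = more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def substitution_flags(ref: str, alt: str) -> list[str]:
    """List sequence-level warning flags for a ref -> alt substitution."""
    ref, alt = ref.upper(), alt.upper()
    flags = []
    if ref == "C" and alt != "C":
        flags.append("cysteine lost (possible disulfide bond loss)")
    if alt == "P" and ref != "P":
        flags.append("proline introduced (possible helix bending)")
    # Assumption: the sign of the hydropathy value separates the two classes.
    if KYTE_DOOLITTLE[ref] > 0 and KYTE_DOOLITTLE[alt] < 0:
        flags.append("hydrophobic to hydrophilic")
    return flags


for ref, alt in [("L", "D"), ("C", "P"), ("R", "C")]:
    print(ref, alt, substitution_flags(ref, alt))
```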

27
Q

Can we predict the effect of genomic variations?

A

Only about 2% of human DNA codes for proteins. All the protein coding sequences in the genome collectively make up the exome.
* The non-protein-coding areas of DNA between genes can have several functions, e.g.:
- regulatory elements associated with gene expression,
- DNA elements which regulate chromosome structure,
- sequences which are involved in gene regulation and protein translation.

28
Q

Genomic variations occurring in coding regions

A

* Genetic variants occurring in the coding regions account for a large proportion of the known genetic variants responsible for human inherited disease
* They are attractive candidates for disease since they can affect protein function or structure
* Genetic variants resulting in amino acid substitutions have been widely studied
* Several in silico tools have been developed over the last 10-20 years to predict the benign or damaging effect of genetic variants causing amino acid substitutions.

29
Q

Predicting the effect of missense variants: in silico prediction methods

A

Sequence-based algorithms:
* They use alignment of homologous sequences to calculate the conservation of the wild type residue
* Evolutionarily conserved residues are unlikely to tolerate substitutions
Structure-based algorithms. These can be divided into:
* algorithms that calculate the difference in free energy (∆∆G) between the wild type and variant structures, e.g. FoldX
* algorithms that use structural features without providing a ∆∆G; these can be further divided into two groups

30
Q

Structure-based algorithms that do not provide a ∆∆G

A
  • In most cases, the 3D structural features of the residue harbouring the variant, such as surface accessibility, hydrophobicity etc., are combined with sequence-based features, e.g. PolyPhen-2.
    - These predictors generally calculate the probability of a variant being damaging but do not return information on the mechanism by which the variant affects the phenotype.
  • Some methods use 3D structure coordinates to perform an in-depth atom-based study of the effect of a missense variant, e.g. Missense3D and SAAPdap/SAAPpred.
    - These predictors provide information on the structural damage, e.g. breakage of a cysteine bond or a steric clash, thus providing the user with information on the mechanism by which a variant may disrupt protein folding and/or function.
    - In the case of SAAPdap/SAAPpred, information on sequence conservation is also included in the variant analysis.
31
Q

Sorting Tolerant From Intolerant (SIFT) algorithm

A

* It uses a sequence homology-based approach to predict the deleterious effect of an amino acid substitution.
* A group of proteins homologous to the query protein is automatically identified with PSI-BLAST and used to build the sequence alignment
* SIFT calculates the probabilities for all 20 amino acids at a specific position.
* The output score is a probability for each of the 19 possible amino acid substitutions at each position in the aligned target protein
* A score of 0.05 or less is considered indicative of a deleterious substitution.
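
The idea of scoring each amino acid at an aligned position and applying the 0.05 cut-off can be illustrated with a toy calculation. This is a deliberately simplified sketch and not the SIFT implementation: it scales raw residue counts by the count of the most frequent residue and omits the pseudocounts and sequence weighting a real tool needs. The alignment column is made up.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DELETERIOUS_THRESHOLD = 0.05  # cut-off stated on this card


def toy_scaled_scores(column: str) -> dict[str, float]:
    """Score all 20 amino acids at one position, scaled to the most frequent one."""
    residues = [r for r in column.upper() if r in AMINO_ACIDS]
    if not residues:
        raise ValueError("column contains no amino acids")
    counts = Counter(residues)
    top = max(counts.values())
    return {aa: counts.get(aa, 0) / top for aa in AMINO_ACIDS}


def toy_prediction(column: str, alt: str) -> tuple[float, str]:
    """Return (score, label) for substituting `alt` at this position."""
    score = toy_scaled_scores(column)[alt.upper()]
    label = "deleterious" if score <= DELETERIOUS_THRESHOLD else "tolerated"
    return score, label


# Hypothetical column: Arg in nine homologues, Lys in one.
COLUMN = "RRRRRRRRRK"
print(toy_prediction(COLUMN, "K"))
print(toy_prediction(COLUMN, "C"))
```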

32
Q

Polymorphism Phenotyping v2 (PolyPhen-2)

A

* It estimates the probability of an amino acid substitution being damaging based on a combination of sequence and structure-based features.
* It also assesses the substitution qualitatively (benign, possibly damaging or probably damaging)
* The properties of the wild type allele are compared to the properties of the mutant allele
* Sequence conservation is evaluated by automatically selecting and aligning a set of homologous sequences

33
Q

What can you predict with PolyPhen-2?

A

* the effect of a single amino acid substitution
* the effect of a large number of amino acid substitutions (batch mode)

34
Q

REVEL (Rare Exome Variant Ensemble Learner)

A

* REVEL is an ensemble method for predicting the pathogenicity of amino acid substitutions
* It combines predictions from several tools: MutPred, FATHMM, VEST, PolyPhen, SIFT, PROVEAN, MutationAssessor, MutationTaster, LRT, GERP, SiPhy, phyloP, and phastCons
* A set of variants (training set) was used to train a Random Forest
* Training set: ~6,000 disease variants and ~120,000 rare putative neutral exome variants (allele frequency 0.001-0.01)

35
Q

Variant Effect Predictor (VEP)

A
  • It can be used to examine the effect of variants identified in human and non-human species
  • It can be used to analyze variants in coding and non-coding regions
36
Q

What does VEP provide?

A
  • annotations for the effect of the variant on transcript, protein, and regulatory region (e.g. promoter)
  • allele frequencies of the query variant
  • disease and/or phenotype information
  • in silico predictions from tools such as SIFT and PolyPhen-2
37
Q

In silico prediction methods: SAAP

A
  • Single Amino Acid Polymorphism data analysis pipeline (SAAPdap) and predictor (SAAPpred)
  • It analyses the likely structural effect of an amino acid substitution on the 3D protein structure
  • It also considers residue conservation
  • It requires the availability of an experimental 3D structure
  • The PDB structure does not need to be provided
38
Q

What is the accuracy of variant prediction from models compared to experimental structures?

A

The accuracy of variant prediction obtained using 3D models is similar to that obtained using 3D experimental structures

39
Q

Why use 3D coordinates for variant prediction?

A

* 3D protein structure data enables one to investigate the effect of genetic variants at atomic level
* Best Practice Guidelines for Variant Classification 2020: “3D structure data can be used to upgrade evidence for pathogenicity in the variant decision making process” (PM1 criterion)