Human Genetics Flashcards

1
Q

Why pharmacogenetics?

A

- To improve clinical outcomes: fewer side effects and more effective drugs
- Cost savings: fewer ineffective treatments and fewer adverse drug events
- Personalised medicine: treatment based on a pharmacogenetic profile
- Knowing how genetic variation influences drug response also shows how genes can be targeted for novel therapies

2
Q

What can be the cause of the side effects of a drug?

A

Ineffective drug prescription, improper dosage, or negative drug-drug interactions

3
Q

What is the purpose of pharmacogenomics in general?

A
  • preventing adverse drug reactions
  • Predicting drug dose
  • enabling drug discovery/development
  • Developing targeted drugs for cancer therapy
  • Predicting the activation of prodrugs
  • Improving efficacy
4
Q

What is the consequence of being a non-metabolizer of an active drug?

A

The active drug is not efficiently converted to its inactive form, so too much active drug stays in the body → stronger and prolonged effect, which can lead to side effects and toxicity

5
Q

What does bimodal distribution indicate? And what a smooth distribution?

A

- Bimodal distribution: suggests a monogenic variation (two distinct response groups)
- Smooth (continuous) distribution: suggests polygenic variation, with many cases showing an intermediate response

6
Q

Explain the function of CYP2D6 gene

A

- Encodes cytochrome P450 isoform 2D6
- Involved in the metabolism of around 20% of commonly used drugs
- Functions in the metabolism of xenobiotics (mainly in the liver)

7
Q

What do mutations in the CYP2D6 gene contribute to?

A

- SNVs in the gene may impair enzyme function = variants associated with decreased function
- Variants that lead to a splice defect (in 15% of the European population)
- CNVs = variations that affect large stretches of DNA rather than single nucleotides (large deletions, duplications); <2 copies → reduced function and >2 copies → increased function

7
Q

Explain the categories of different metabolizers

A
  • Poor metabolizers = drugs are metabolized slowly or not at all; an active drug stays in the body too long at high levels, which leads to toxicity, so a lower dose or an alternative drug is needed
  • Intermediate metabolizers = reduced enzyme activity → slower drug breakdown
  • Extensive metabolizers = normal enzyme function → expected drug metabolism
  • Ultra-rapid metabolizers = extra enzyme copies (gene duplications) → very fast drug metabolism; active drugs are broken down very quickly, which reduces effectiveness; prodrugs are activated too quickly, which leads to stronger effects or toxicity
8
Q

What does pharmacokinetics mean?

A

-relates to ADME = absorption, distribution, metabolism and excretion
-Genetic variability in the drug-metabolizing enzymes or drug-transport proteins
-The process of the uptake of the drug by the body and the biotransformation that the drug undergoes in the body
- The distribution of the drugs and their metabolites in the tissues
- The elimination of the drugs and their metabolites from the body over a period of time

9
Q

How is pharmacokinetics seen in practice?

A
  • Dose effectively too high → adverse events
    Poor metabolizers = reduced degradation of the active drug, e.g. CYP2D6 and debrisoquine
    Ultra-rapid metabolizers = too much activation of the prodrug, e.g. CYP2D6 and codeine → morphine
  • Dose effectively too low → no efficacy
    Ultra-rapid metabolizers = too much degradation of the active drug
    Poor metabolizers = reduced activation of the prodrug
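
The four cases above can be written as a small lookup table. This is only a study sketch: the function name and the outcome labels are made up for illustration and are not a clinical decision rule.

```python
# Illustrative lookup of the four cases on this card.
# Names and labels are assumptions, not taken from any guideline.
def standard_dose_outcome(phenotype: str, drug_type: str) -> str:
    """Expected problem when a standard dose is given.

    phenotype: "poor" or "ultra-rapid"
    drug_type: "active"  (the enzyme inactivates the drug) or
               "prodrug" (the enzyme activates the drug)
    """
    outcomes = {
        ("poor", "active"): "adverse events (active drug accumulates)",
        ("ultra-rapid", "prodrug"): "adverse events (too much active metabolite)",
        ("ultra-rapid", "active"): "no efficacy (drug degraded too fast)",
        ("poor", "prodrug"): "no efficacy (prodrug not activated)",
    }
    return outcomes[(phenotype, drug_type)]


# Codeine is a prodrug that CYP2D6 converts to morphine.
print(standard_dose_outcome("ultra-rapid", "prodrug"))
```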
10
Q

What does pharmacodynamics mean?

A
  • drug and target interaction
    -Genetic variability in drug target or target pathway
    -Pharmacological actions on living systems
  • Also reactions with and binding to cell constituents
  • And physiological consequences of these actions
11
Q

Responders vs. non-responders

A

- Responders (response + no adverse event; response + target-related or off-target-related adverse event)
- Non-responders (no response + no adverse event; no response + target-related or off-target-related adverse event)

12
Q

How is pharmacodynamics seen in practice?

A
  • How the drug acts on the body
  • A beta blocker acts on its receptor; if the receptor is altered, the drug may bind better or worse
  • Variations in the genes encoding receptors can affect the response to certain drugs
  • Drug and target protein interactions: variations in the genes for the target proteins may affect the response as well
  • Drug efficacy depends on how well the drug binds to its target
  • Reduced or increased affinity can cause poor efficacy or adverse events, respectively
    Adverse events can also arise from binding of the drug to off-targets
13
Q

What is reactive testing and when is it used and which technique is most commonly used?

A

Reactive testing → specific genes:
- Most common techniques: Sanger sequencing and RT-qPCR
- Testing is (usually) done after a patient has already experienced a problem with a medication
- It is case-specific and focuses on a single drug that the patient is currently taking

14
Q

What is preventive testing and when is it used and which technique is most commonly used?

A
  • all known pharmacogenes: SNP-array, or Whole genome sequencing
  • testing is done before prescribing any medications, usually as part of a broader genetic screening
  • Results are stored in medical records and used whenever a drug is prescribed
  • Before starting medications with known genetic risks (e.g. antidepressants);
  • In patients requiring long-term medication use (e.g., cancer therapy, psychiatric drugs);
  • As part of personalized medicine programs.
15
Q

What are the pros and cons of using Sanger sequencing for studying pharmacogenetics?

A
  • Pros: detailed information about specific regions of interest in the genome; fast, and everything can be done in-house
  • Cons: not high-throughput and no information about CNVs
16
Q

What are the pros and cons of using RT qPCR for studying pharmacogenetics?

A

Pros: can detect CNVs based on expression
Cons: Measures RNA and not DNA; not high throughput

17
Q

What are the pros and cons of using SNP-array for studying pharmacogenetics?

A
  • Pros: assesses multiple genes at the same time; can detect CNVs (a higher signal on the chip spot that corresponds to a SNP → duplication; a lower or absent signal on that spot → deletion); the array can be customized for SNPs of interest; cheap
  • Cons: does not detect rare variants
18
Q

What are the pros and cons of using Whole genome sequencing for studying pharmacogenetics?

A

- Pros: assesses multiple genes at once; can detect CNVs; can detect almost all single nucleotide variants
- Cons: costly; it is unclear how to interpret variants that have not been seen before

19
Q

What is the difference between monogenic and polygenic disorders?

A

Monogenic: caused by one gene; variations with large effect size; environment plays little or no role; specific inheritance patterns; rare disorders
Complex: caused by several genes = polygenic; variations with small effect size; no clear inheritance pattern; environment plays a large role; common disorders

20
Q

What are DNA variants and where can they be localized in the genome?

A
  • DNA variants are changes in the nucleotide sequences
  • They can be localized in the non-coding or coding sequence
    – Non-coding: may change gene expression
    – Splice site: may change mRNA splicing of a gene
    – Coding: can change the protein linked to the gene:
    * Missense/nonsense/frameshift
    * This affects the size of effect that a variant has
21
Q

What are SNPs?

A

A SNP is a common genomic variant at a single base position in the DNA.
What does common mean? Usually a minor allele frequency (MAF) > 1%.
SNPs are identified by an rsID (e.g. rs1111).
The majority of SNPs are in the non-coding regions of the genome (intronic/intergenic, 5′UTR and 3′UTR); a smaller fraction are exonic.
Disease-related variants exist on a spectrum from very rare to common MAF and from small to large effect size.

22
Q

What are GWAS?

A

- They link genetic variants with diseases
- Workflow: collect samples → extract genomic DNA → genotype with SNP array → quality control → statistics → interpret findings
- Two groups are needed, diseased individuals (cases) and non-diseased individuals (controls), and they are compared to discover SNPs associated with the disease
- Genotyping is done with sequencing (next or third generation) or a SNP array
- Effect sizes of SNPs on disease are small, so thousands of samples are needed to identify associations
- The SNPs are compared between cases and controls to see which variant(s) are more common in diseased individuals; SNPs that are close together are often inherited together

23
Q

Explain SNP array

A

- Technique for capturing SNPs at many different loci simultaneously
- DNA is isolated and fragmented → the fragmented DNA is labeled with fluorescent dyes and then hybridized (bound) to a microarray chip that contains probes (short sequences complementary to known SNP sites) → PCR → the microarray scanner detects which SNP variants are present based on the fluorescence intensity indicating whether the individual has homozygous or heterozygous genotypes at specific positions → The detected SNPs are compared to a reference genome to identify genetic variations

24
Q

Can we capture the whole genome with 500k SNPs?

A

An array carries 500K SNPs, while there are about 11 million SNPs in the genome, which would suggest that only 3-5% of variants are captured. In reality, a 500K SNP array captures more than 90% of common SNPs thanks to linkage disequilibrium (LD).
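
The 3-5% figure is just the ratio of the two counts quoted on this card. A quick check, using only those two numbers:

```python
# Both counts are the figures quoted on the card.
snps_on_array = 500_000
snps_in_genome = 11_000_000

# Fraction of SNPs that are genotyped directly (before LD is taken into account).
directly_genotyped = snps_on_array / snps_in_genome
print(f"{directly_genotyped:.1%}")
```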

25
Q

Explain the concept of linkage disequilibrium

A

A diploid individual heterozygous for two genes (Aa Bb) → possible haploid combinations: AB, Ab, aB, ab = 25% chance for each if the loci are independent
But if the haploid cells show AB = 50% and ab = 50%, this means that AB is on one chromosome and ab is on the other = inherited together
- When two loci are located close together on the same chromosome they are linked; because of their physical proximity, recombination between them is rare; this leads to certain allele combinations being inherited together more frequently than expected = linkage disequilibrium
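
The two situations on this card (25% each versus 50/50) can be put into numbers. The sketch below uses D and r², two standard LD measures that the card itself does not name; the function name is made up for illustration.

```python
def ld_stats(p_AB, p_Ab, p_aB, p_ab):
    """Return (D, r2) from the four haplotype frequencies (they should sum to 1)."""
    p_A = p_AB + p_Ab          # frequency of allele A
    p_B = p_AB + p_aB          # frequency of allele B
    D = p_AB - p_A * p_B       # observed minus expected AB frequency
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = D * D / denom if denom > 0 else 0.0
    return D, r2


equilibrium = ld_stats(0.25, 0.25, 0.25, 0.25)  # all four haplotypes equally common
complete_ld = ld_stats(0.5, 0.0, 0.0, 0.5)      # only AB and ab are observed
print(equilibrium, complete_ld)
```

With equal haplotype frequencies D is 0 (no LD); when only AB and ab occur, r² reaches 1, meaning one locus fully predicts the other.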

26
Q

What is linkage equilibrium?

A

Two loci located on different chromosomes or far apart on the same chromosome, are independent and they segregate randomly during meiosis, which results in equal frequencies (25%) for all possible allele combinations = linkage equilibrium

27
Q

What are LD blocks?

A

Parts of the chromosome that are inherited together (haplotypes) because of their closeness; regions of the genome where alleles at different loci are inherited together more often than would be expected by chance.
Over time, recombination events (where sections of chromosomes are shuffled during meiosis) break up these associations, but regions with low recombination rates or strong selective pressure may maintain them for many generations.

28
Q

Why are LD blocks important?

A

LD blocks are important in genetics because they allow researchers to study genetic variation and disease associations more efficiently. Instead of needing to examine every possible variant individually, they can focus on the regions defined by LD blocks, as they are likely to contain multiple variants that are inherited together and might contribute to specific traits or diseases.
Blocks of LD in human genome = one SNP can predict another one

29
Q

How do LD blocks become smaller over time?

A

As recombination accumulates over more generations, or where recombination rates are higher, LD blocks become smaller because the variants within those blocks are separated more frequently by recombination events. Smaller LD blocks indicate more genetic variation and a greater number of recombination events between the alleles.
Less diverse populations = larger LD blocks

30
Q

What is the HapMap project?

A

A project to determine the common patterns of DNA sequence variation in the human genome and to make this information freely available in the public domain.

31
Q

What is the tagging approach in selecting SNPs?

A

- Used in genetic studies to capture genetic variation efficiently without genotyping every single SNP in the genome
- It takes advantage of LD, where certain SNPs are inherited together in LD blocks
- Many SNPs are in LD and are therefore redundant = if one SNP is known, the genotypes of many nearby SNPs can be predicted because they are almost always inherited together
- Instead of genotyping all SNPs, only a representative tag SNP is needed
- Instead of genotyping all 8 million SNPs in the human genome, only 500,000 well-selected tag SNPs are genotyped
- Using statistical methods called imputation, the missing SNPs can be predicted from known LD patterns in large reference datasets (like the 1000 Genomes Project or HapMap)
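
One way to picture tag-SNP selection is a greedy pass over pairwise r² values. This is a simplified sketch, not the method of any particular tool; the SNP ids, the r² values and the 0.8 cutoff are all assumptions chosen for the example.

```python
def pick_tag_snps(snps, r2, threshold=0.8):
    """Greedily pick tag SNPs until every SNP is a tag or is in LD with one.

    snps: list of SNP ids
    r2:   dict mapping frozenset({a, b}) -> pairwise r² (missing pairs count as 0)
    """
    def tagged_by(s):
        # A SNP always tags itself, plus every SNP in strong LD with it.
        return {t for t in snps
                if t == s or r2.get(frozenset((s, t)), 0.0) >= threshold}

    untagged = set(snps)
    tags = []
    while untagged:
        # Choose the SNP that covers the most still-untagged SNPs.
        best = max(snps, key=lambda s: len(tagged_by(s) & untagged))
        tags.append(best)
        untagged -= tagged_by(best)
    return tags


snps = ["rs1", "rs2", "rs3", "rs4"]  # made-up ids
r2 = {
    frozenset(("rs1", "rs2")): 0.95,
    frozenset(("rs2", "rs3")): 0.90,
    frozenset(("rs1", "rs3")): 0.85,
}
tags = pick_tag_snps(snps, r2)
print(tags)
```

Here rs1, rs2 and rs3 form one LD block, so a single tag covers all three, while rs4 is in LD with nothing and has to be genotyped itself.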

32
Q

How is GWAS data processed for quality control?

A

On samples:
- Clean out samples with many missing SNPs (bad-quality DNA)
- Gender check = compare the SNPs on the X chromosome with the expected gender; calculate heterozygosity for X-chromosome SNPs: it is higher in females and lower in males
- Population outliers = remove them to keep the study population genetically homogeneous
- Duplicates and relatives = pairwise comparison of all samples: if the genotypes are the same → duplicates or monozygotic twins; if 50% are the same → siblings or a parent-child pair; if 20% → close relatives
On SNPs:
* Clean out SNPs missed in many samples (bad assay)
* Are certain genotypes over- or underrepresented in the controls? (Hardy-Weinberg equilibrium check)

33
Q

Explain what allele calling is

A

- The process of determining which alleles are present at a specific SNP in a DNA sample. It is a crucial step in GWAS because errors in allele calling can lead to false associations between SNPs and diseases. Allele calling is done by software → automatic calls of AA, AB and BB genotypes
- It involves genotype clustering and error checking
- Genotype clustering: software groups similar fluorescence intensity signals into clusters representing AA, AB, and BB genotypes
- Error checking: sometimes the signal is weak or ambiguous, leading to genotyping errors; these are flagged or filtered out
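
A toy version of the idea, for intuition only. Real calling software fits the clusters from the data of many samples; the fixed cutoffs, the minimum-signal value and the function name below are invented for this sketch.

```python
def call_genotype(intensity_a, intensity_b, min_total=0.2):
    """Toy genotype call from two allele-specific fluorescence intensities.

    Returns "AA", "AB", "BB", or None when no confident call can be made.
    All cutoffs are made-up illustration values.
    """
    total = intensity_a + intensity_b
    if total < min_total:
        return None                      # weak signal: no call
    b_fraction = intensity_b / total     # share of the signal from allele B
    if b_fraction < 0.2:
        return "AA"
    if b_fraction > 0.8:
        return "BB"
    if 0.4 <= b_fraction <= 0.6:
        return "AB"
    return None                          # falls between clusters: ambiguous


print(call_genotype(1.0, 0.05), call_genotype(0.5, 0.5), call_genotype(0.7, 0.3))
```

Signals that are too weak or that fall between the clusters are returned as no-calls, which is what lowers the call rate of a SNP or a sample.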

34
Q

What is the call rate in allele calling? And what is the common practice when it comes to GWAS?

A

Call rate = the percentage of successful genotype calls for a given SNP or individual.
- Common practice: use SNPs with a call rate > 95% = at least 95% of individuals in the study have a successful genotype call for that SNP; and use individuals with a call rate > 95% = each individual has at least 95% of their SNPs successfully genotyped
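
Both call rates come from the same genotype table, once counted per row and once per column. A minimal sketch with a tiny made-up dataset (sample names and genotypes are invented; `None` stands for a failed call):

```python
def call_rates(genotypes):
    """genotypes: dict sample -> list of calls per SNP (None = failed call).

    Returns (per_sample_rate, per_snp_rate).
    """
    samples = list(genotypes)
    n_snps = len(genotypes[samples[0]])
    per_sample = {
        s: sum(g is not None for g in genotypes[s]) / n_snps for s in samples
    }
    per_snp = [
        sum(genotypes[s][i] is not None for s in samples) / len(samples)
        for i in range(n_snps)
    ]
    return per_sample, per_snp


genotypes = {                      # made-up toy data
    "s1": ["AA", "AB", "BB", None],
    "s2": ["AA", "AA", "BB", None],
    "s3": ["AB", "AB", None, "AA"],
    "s4": ["AA", "BB", "BB", "AB"],
}
per_sample, per_snp = call_rates(genotypes)

# Apply the 95% rule from the card to SNPs and to samples.
snps_kept = [i for i, rate in enumerate(per_snp) if rate > 0.95]
samples_kept = [s for s, rate in per_sample.items() if rate > 0.95]
print(snps_kept, samples_kept)
```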

35
Q

How are population outliers removed in GWAS?

A

Population outliers are identified by principal component analysis (PCA). PCA is a statistical method that reduces the high-dimensional genetic data into principal components capturing the major genetic differences between individuals = individuals with different ancestry cluster separately on PCA plots

Example: A European GWAS includes a few individuals of African ancestry. Since their genetic variation is distinct, PCA identifies them as outliers. If these individuals are not removed, SNPs common in African populations might appear to be associated with a disease just because of ancestry differences, not because they are biologically relevant.

35
Q

How is the quality control for processing GWAS data on the SNP level performed?

A

Hardy-Weinberg equilibrium (HWE) in the control population is used as a quality check:
- Compare the observed with the expected genotype frequencies and calculate chi-square
- Deviations from HWE within the control population are most often due to incorrect genotyping → HWE is used as a quality check
- Reasons for exclusion: bad genotyping, population substructure and sampling bias

  • If a SNP is relevant for disease, it may show a deviation from HWE in the cases, because we then expect different genotype frequencies between cases and controls
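
The observed-versus-expected comparison can be worked through by hand. A minimal sketch, assuming a biallelic SNP and made-up genotype counts; the function name is illustrative.

```python
def hwe_chi_square(n_AA, n_AB, n_BB):
    """Chi-square statistic comparing observed genotype counts with HWE expectations."""
    n = n_AA + n_AB + n_BB
    p = (2 * n_AA + n_AB) / (2 * n)      # frequency of allele A
    q = 1 - p                            # frequency of allele B
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_AA, n_AB, n_BB)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


in_hwe = hwe_chi_square(360, 480, 160)    # counts match p = 0.6 exactly
deviating = hwe_chi_square(500, 200, 300) # far too few heterozygotes
print(round(in_hwe, 3), round(deviating, 1))
```

The statistic has 1 degree of freedom here, so values above 3.84 correspond to p < 0.05; studies choose their own, usually stricter, cutoff for excluding a SNP.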
36
Q

What methods can be used for data analysis?

A
  • Association analysis
  • Binary trait: case/control analysis = chi-square test
  • Continuous trait: quantitative trait analysis = linear regression (logistic regression is used for binary traits)
37
Q

How is association analysis performed?

A

A method in GWAS to identify SNPs that are associated with a disease or trait:
- by comparing the allele frequency at each SNP separately between cases and controls, or by analyzing how SNPs influence continuous traits (blood pressure or height)

38
Q

Explain each method used in association analysis

A
  • Disease (binary trait): case-control analysis = used when the trait is disease-related (yes/no, affected/unaffected); compare allele/genotype frequencies at each SNP between cases and controls.
    If a SNP allele is significantly more common in cases than in controls, it may be associated with the disease;
    Statistical test: chi-square test = tests whether the distribution of genotypes/alleles differs between cases and controls; Null Hypothesis (H₀): no difference in SNP frequency between cases and controls; Alternative Hypothesis (H₁): a difference exists

  • Quantitative analysis (continuous trait) = used when the trait is measurable on a continuous scale (e.g. height, blood pressure)
    Test whether SNP genotypes correlate with trait values
    Statistical test: linear regression = models how SNP genotype influences the trait while accounting for other factors (covariates like gender, age, ancestry); Null Hypothesis (H₀): the SNP does not affect the trait; Alternative Hypothesis (H₁): the SNP influences the trait.
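
For the case-control comparison, the chi-square test on allele counts can be sketched as follows. The counts are invented; the function names are illustrative, and the p-value formula is the one for 1 degree of freedom without continuity correction.

```python
import math


def allelic_chi_square(case_a, case_b, control_a, control_b):
    """Pearson chi-square on a 2x2 table of allele counts (cases vs controls)."""
    table = [[case_a, case_b], [control_a, control_b]]
    total = case_a + case_b + control_a + control_b
    row_sums = [sum(row) for row in table]
    col_sums = [case_a + control_a, case_b + control_b]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2


def p_value_1df(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


# Made-up counts: allele A is seen 600 times in cases and 500 times in controls.
chi2 = allelic_chi_square(600, 400, 500, 500)
p = p_value_1df(chi2)
print(round(chi2, 2), p)
```

In a real GWAS this test is repeated for every SNP, which is why the resulting p-values are compared against a genome-wide threshold rather than 0.05.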

39
Q

How are false positives reduced in GWAS?

A
  • In GWAS, millions of SNPs are tested across the genome to find associations with a trait or disease
    Since so many tests are performed, some SNPs will appear significant just by chance (false positives); to control this, multiple testing correction is applied
  • The standard p-value threshold of 0.05 means that 5% of tests of SNPs with no true effect will show significance by chance; in GWAS we test ~1 million independent SNPs, so 0.05 × 1,000,000 = 50,000 SNPs could show a false positive association
  • To reduce false positives, we use a stricter significance threshold
  • Bonferroni correction (controlling for multiple tests) = adjusts the p-value threshold by dividing 0.05 by the number of independent tests (number of independent SNPs in the genome): 0.05 / 1,000,000 = 5 × 10⁻⁸
  • This means that only SNPs with p < 5 × 10⁻⁸ are considered genome-wide significant.
    = This strict cutoff greatly reduces false positives, although it cannot guarantee that every SNP that passes is truly associated.
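
The numbers on this card follow from two lines of arithmetic. A quick check using the card's own figures (0.05 and about 1 million independent tests):

```python
import math

alpha = 0.05            # standard significance threshold
n_tests = 1_000_000     # approximate number of independent SNPs, as on the card

# Without correction: how many SNPs pass by chance alone.
expected_false_positives = alpha * n_tests

# Bonferroni: divide the threshold by the number of tests.
bonferroni_threshold = alpha / n_tests

# The same threshold on the -log10 scale used in Manhattan plots.
threshold_line = -math.log10(bonferroni_threshold)

print(expected_false_positives, bonferroni_threshold, round(threshold_line, 1))
```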
40
Q

What is a Manhattan plot? Explain it

A
  • Manhattan plot = graphical way to visualize the results of a GWAS. It helps in identifying SNPs that are significantly associated with a disease or trait.
  • X-axis: Represents the chromosomes (1-22, X, Y). Each dot corresponds to a SNP at a specific location; Y-axis: Shows the -log₁₀(p-value) for each SNP.
  • Each dot represents a SNP, positioned by its genomic location on the X-axis and its statistical significance on the Y-axis.
  • Higher dots indicate lower p-values (stronger evidence for association); The higher the peak, the more likely the SNP is truly associated with the trait.
  • Bonferroni Correction sets the genome-wide significance threshold at p < 5 × 10⁻⁸, so in the plot, a horizontal threshold line is drawn at -log₁₀(5 × 10⁻⁸) ≈ 7.3 and SNPs above this line are considered significant and could be linked to the disease/trait.
40
Q

How is a Manhattan plot interpreted?

A
  • Tall peaks indicate strong associations with the trait
  • Clusters of significant SNPs suggest the presence of a causal gene or functional variant in that region
  • Different colors represent different chromosomes, making it easier to distinguish chromosome-specific associations
  • Peaks contain several dots because multiple SNPs are in LD and are therefore all significantly associated
  • Fine mapping is used to zoom in on associated loci and find the most significant SNP = leading SNP
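
Reading a Manhattan plot amounts to transforming each p-value and picking the top of each peak. The sketch below uses made-up results and, as a simplification, treats each chromosome as a single locus; real analyses define loci by distance or LD.

```python
import math


def manhattan_summary(results, threshold=5e-8):
    """results: list of (snp_id, chromosome, position, p_value).

    Returns (points, lead) where points are (chromosome, position, -log10 p)
    and lead maps chromosome -> (snp_id, p_value) of its most significant
    genome-wide significant SNP.
    """
    points = [(chrom, pos, -math.log10(p)) for _, chrom, pos, p in results]
    lead = {}
    for snp, chrom, pos, p in results:
        if p < threshold and (chrom not in lead or p < lead[chrom][1]):
            lead[chrom] = (snp, p)
    return points, lead


results = [                          # made-up ids, positions and p-values
    ("rs_a", 1, 1_000_100, 3e-9),
    ("rs_b", 1, 1_000_900, 4e-12),
    ("rs_c", 1, 1_002_000, 2e-8),
    ("rs_d", 2, 5_000_000, 1e-3),
]
points, lead = manhattan_summary(results)
print(lead)
```

Three neighbouring SNPs on chromosome 1 pass the threshold (one peak with several dots), and the one with the lowest p-value is reported as the leading SNP.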
41
Q

What is fine mapping in GWAS?

A

When GWAS identifies a region with significant SNPs, it doesn’t tell us which SNP is truly causal.
This is because many SNPs in the region are in linkage disequilibrium (LD) and are inherited together.
- Fine-mapping helps narrow down the exact variant responsible for the trait or disease.
- When GWAS finds a significant association, many SNPs in that region will appear significant just because they are correlated. But only one (or a few) of them is truly causal: the SNP(s) that directly influence the disease or trait.
- A simple approach is to pick the SNP with the lowest p-value (highest peak in a Manhattan plot) → this SNP is called the leading SNP or top SNP. However, this does not guarantee that it is the actual causal variant; it may just be highly correlated with the real causal SNP.
- Better approach: fine-mapping with more complex statistical methods

42
Q

What are the limitations of GWAS?

A

Polygenicity; Association ≠ Causation, and Population Differences
-Many complex traits (e.g., height, intelligence, diabetes risk) are not controlled by a single gene but by many SNPs across the genome; a single SNP might have a small effect, but when combined with thousands of other SNPs, they contribute to a trait; This means that most traits and diseases are polygenic (influenced by multiple genetic variants);

-GWAS detects correlations between SNPs and traits, but this does not mean the SNP causes the trait; The significant SNP could just be in Linkage Disequilibrium (LD) with the true causal variant; To confirm causation, we need functional studies (e.g., CRISPR, gene expression analysis);

  • GWAS assumes a homogeneous population, but that’s not always true; Many early GWAS were conducted in European populations, but genetic variation differs between populations; A SNP that is common in Europeans may be rare or even absent in other ethnic groups; This means GWAS findings don’t always replicate across populations.
43
Q

How do non coding SNPs alter gene expression?

A
  • Modifying transcription factor binding sites → affecting how strongly a gene is turned ON or OFF
  • Influencing enhancer or promoter activity → changing how much of a protein is produced.
  • Altering splicing patterns → leading to different isoforms of a protein.
    Example: A non-coding SNP in the promoter of the LCT gene (lactase gene) affects how much lactase is produced = this determines whether an individual is lactose tolerant or intolerant.
44
Q

What are eQTLs and what are the two types?

A

SNPs that influence gene expression levels = expression quantitative trait loci (eQTLs).
- There are two broad categories of eQTLs: cis-eQTLs (local regulation) and trans-eQTLs (distant regulation)

45
Q

Explain the cis-eQTLs

A

cis-eQTL = affects the expression of a nearby gene, typically within 1 Mb; usually located in promoters, enhancers, or untranslated regions (UTRs) of the gene; influences how much mRNA is produced from that gene

46
Q

Explain the trans-eQTLs

A

trans-eQTL = affects the expression of a gene located far away, possibly on a different chromosome
- Often affects genes indirectly through regulatory proteins, transcription factors, or signaling pathways
- Example: a SNP in a transcription factor gene can alter its activity, which then affects the expression of multiple target genes across the genome.

47
Q

What is the purpose of eQTL studies?

A
  • eQTL studies = help determine how genetic variants (SNPs) influence gene expression levels; these studies identify whether a SNP is linked to increased or decreased expression of a specific gene
    - cis-eQTL analysis looks only at genes near the SNP (the advantage is that there are fewer genes to test)
  • trans-eQTLs are tissue-dependent = for a trans-eQTL effect to be visible, both the regulator carrying the SNP and the affected gene must be active in the same tissue
    - Some genes are only expressed in specific tissues (e.g., brain, liver, immune cells), so a trans-eQTL effect in one tissue may not exist in another
48
Q

How can a SNP affect the expression of several genes?

A

-A SNP could be located in a region that regulates multiple genes; for example, it could affect the expression of several genes that are in the same biological pathway or regulatory network;
-Example: A SNP in a transcription factor binding site could change the activity of that transcription factor, which in turn may alter the expression of multiple genes that the transcription factor regulates.

49
Q

How do SNPs have tissue-specific effects?

A
  • Some SNPs have tissue-specific effects. A SNP might influence the expression of different genes depending on the tissue or cell type where the gene is expressed.
  • In one tissue, it might affect one gene, while in another tissue, it might affect a completely different gene;
  • Example: A SNP in the interleukin-6 (IL-6) gene might affect immune response in immune cells, but it could influence metabolic processes in liver cells.
50
Q

Why are eQTL studies important?

A

- Disease complexity: one SNP causing multiple changes in gene expression can help explain complex traits or diseases where multiple genes are involved
- Polygenic effects: since SNPs can affect many genes in various tissues, they can contribute to polygenic traits, where multiple genes work together to influence a particular outcome (e.g., height, risk for diseases like cancer or diabetes)

51
Q

How can we determine the eQTL effects of the SNP?

A
  • To determine the eQTL effect of a SNP we need both DNA (genotype) and RNA (gene expression) from the same individuals; by comparing RNA expression between individuals with different genotypes, we can determine whether a specific SNP leads to changes in gene expression
  • Example: for a SNP in the promoter of the LCT gene (the lactase gene), we could compare LCT mRNA expression between individuals with and without the SNP to see if it affects how much lactase is produced
  • The effect of a SNP on gene expression can vary across tissues: a SNP might affect the expression of a gene in the brain but have no impact in the liver, or vice versa → therefore, eQTL studies need to be tissue-specific
    - RNA data from the specific tissue must be used to accurately assess the SNP’s eQTL effects in that tissue
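
Comparing expression between genotypes is often summarised as a regression slope: the change in expression per extra copy of the allele. A minimal sketch with made-up numbers from a single (hypothetical) tissue; the function name is illustrative.

```python
def eqtl_slope(dosages, expression):
    """Least-squares slope of expression on allele dosage (0, 1 or 2 copies)."""
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(expression) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    return sxy / sxx


# Made-up data: six individuals, genotype dosage and mRNA level of one gene.
dosages = [0, 0, 1, 1, 2, 2]
expression = [10.0, 12.0, 15.0, 17.0, 20.0, 22.0]
slope = eqtl_slope(dosages, expression)
print(slope)
```

A slope clearly different from zero suggests the SNP acts as an eQTL for that gene in that tissue; the same calculation on RNA from another tissue may give a different answer.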
52
Q

What is meant by the enrichment of genes in gene sets?

A

Enrichment of Genes in Certain Gene Sets = In genetic studies, especially those looking at diseases like depression, researchers often find that the genes affected by associated SNPs tend to cluster into specific gene sets that share similar biological functions or pathways → This clustering can give clues about the underlying mechanisms

-Gene set enrichment refers to the process of determining whether a set of genes associated with a particular trait or disease (such as depression) are overrepresented in a specific biological pathway or functional category → If genes related to depression are concentrated in pathways like neurotransmitter signaling or stress response, this suggests those pathways are important for the disease.
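
"Overrepresented" is usually judged with a hypergeometric test: how likely is it to see this many pathway genes among the hits by chance? The card does not name a test, so treat this as one common choice; all counts below are made up.

```python
from math import comb


def enrichment_p_value(total_genes, pathway_genes, hit_genes, hits_in_pathway):
    """P(at least `hits_in_pathway` of the hit genes fall in the pathway)
    if the hit genes were drawn at random from all genes."""
    upper = min(pathway_genes, hit_genes)
    denom = comb(total_genes, hit_genes)
    numer = sum(
        comb(pathway_genes, k) * comb(total_genes - pathway_genes, hit_genes - k)
        for k in range(hits_in_pathway, upper + 1)
    )
    return numer / denom


# Made-up example: 20,000 genes, 200 in the pathway, 100 GWAS genes,
# 10 of which are in the pathway (about 1 would be expected by chance).
p = enrichment_p_value(20_000, 200, 100, 10)
print(p)
```

A very small p-value means the overlap is much larger than chance would produce, so the pathway counts as enriched.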

53
Q

Why is gene set enrichment important?

A

Understanding Disease Mechanisms:
The genes identified in a GWAS are often dispersed across the genome, and their individual roles in the disease are not always clear. Gene set enrichment helps to identify patterns and common themes in the genes, providing insights into the underlying biological mechanisms;

Identifying Target Pathways:
By determining which biological pathways are enriched with disease-related genes, researchers can pinpoint areas to focus on for therapeutic development. For example, if genes involved in dopamine regulation are enriched, this could point to a potential target for drug development.

54
Q

Can we use GWAS results in the prediction of disease risk?

A
  • Different significance for different diseases = some diseases have strong genetic associations that are easier to detect, while others have weaker or more complex genetic relationships that are harder to uncover
    - Expectations are often too high = there is often a gap between the expectations set by early GWAS findings and the actual predictive power of these studies when applied to individuals
    - While GWAS can identify associations, it is still difficult to build accurate disease prediction models because of the complexity of genetic interactions, environmental influences, and incomplete knowledge about how certain SNPs cause disease
55
Q

What is polygenic risk score?

A
  • Polygenic risk score → combination of variants can substantially change the disease risk = an attempt to combine the genetic risk of multiple small-effect SNPs into a single score that can be used to predict an individual’s risk of developing a particular disease;
56
Q

What is the benefit of polygenic risk score?

A
  • Rather than focusing on a single mutation or genetic variant, a PRS considers thousands of SNPs, each with a small effect on disease risk. These SNPs might be spread across different genes or regulatory regions.
  • Combinations of Variants: By adding together the genetic risk associated with many variants, a polygenic risk score can substantially alter the overall risk of an individual for developing a disease. This score represents the combined genetic influence of many risk factors across the genome.

How It Works:
  1. Identify Risk Variants: from GWAS, identify the SNPs associated with the disease of interest (e.g., depression, heart disease)
  2. Calculate Risk Score: for each individual, calculate a polygenic risk score by combining the effects of all the SNPs that were associated with the disease. Each SNP is weighted by its effect size, i.e. how strongly it contributes to the disease risk

Interpretation: A higher PRS means the individual carries more of the genetic variants that are associated with the disease, so they have a higher genetic risk for that disease. Conversely, a lower score means lower genetic risk.

Example of Polygenic Risk Score: If we look at diabetes, researchers might identify 100 SNPs that increase risk. Each SNP might increase the risk by a small amount. A PRS would combine the effects of all 100 SNPs into a single score for each person. This score could help predict if someone is at higher or lower risk for developing diabetes, based on their genetic profile.
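
The weighted sum described above can be sketched in a few lines. This is a minimal illustration, assuming an additive model in which each SNP contributes its effect size times the number of risk alleles carried. The SNP names, effect sizes, and genotypes are made up.

```python
def polygenic_risk_score(effect_sizes, genotype):
    """Sum of (effect size x number of risk alleles) over all SNPs in the score."""
    return sum(beta * genotype.get(snp, 0) for snp, beta in effect_sizes.items())

# Hypothetical per-allele effect sizes (e.g. log odds ratios from a GWAS).
effect_sizes = {"snp_a": 0.10, "snp_b": 0.05, "snp_c": 0.20}

# Risk-allele counts (0, 1 or 2) for two invented individuals.
person_1 = {"snp_a": 2, "snp_b": 1, "snp_c": 0}
person_2 = {"snp_a": 0, "snp_b": 0, "snp_c": 1}

score_1 = polygenic_risk_score(effect_sizes, person_1)
score_2 = polygenic_risk_score(effect_sizes, person_2)
print(score_1, score_2)
```

In this toy example person 1 gets the higher score, so under this model person 1 carries the higher genetic risk. Real scores use far more SNPs and are interpreted relative to a reference population.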

57
Q

What is considered non-coding DNA?

A

-Structural elements: telomeres and centromeres
-Residual elements: pseudogenes, transposons, tandem repeats
-Regulatory elements: promoters, enhancers, insulators, 3D organizing regions, introns
-Expressed elements: non-coding RNA families

58
Q

Explain the control of gene expression at a DNA level

A

Regulation at DNA level - epigenetic modifications
Open chromatin = transcriptionally active; DNA chromatin needs to be open in order for the transcription to start
Active = unmethylated cytosines and acetylated histones
Closed chromatin = Transcriptionally repressed; DNA methylation at promoters
Non active = silent (condensed) chromatin; methylated cytosines and deacetylated histones

59
Q

Explain the control of gene expression at a RNA level

A

Regulation of RNA expression: at transcription, RNA processing, mRNA transport, mRNA translation and mRNA degradation

Regulatory elements/ regulation of RNA expression through transcription
Promoters - 100 - 500 bp (cis) upstream of the transcription start site; activated by transcription factors; some genes have multiple promoters; usually contain binding site motif for RNA polymerase

60
Q

What are the classes of regulatory regions?

A

-promoters
-enhancers/silencers
-insulators, 3D organizing regions

61
Q

What are promoters?

A

Promoters = located 100-500 bp upstream of the transcription start site, in cis (on the same DNA molecule as the gene)
Activated by transcription factors (TFs); specific TFs bind specific promoters
A gene may have one or several promoters, each with its own specific TFs
Contain the binding sites for RNA polymerase II

62
Q

What are enhancers/silencers?

A

Enhancers/silencers = speed up or slow down the rate of transcription, depending on their interaction with regulatory proteins
Located upstream or downstream of the transcription start site, often at a large distance; they act over a distance but are still cis-acting elements (on the same DNA molecule as the gene), whereas the proteins that bind them act in trans
Contain sequences that are recognized by transcription factors; regulatory proteins bind specific enhancers/silencers depending on the DNA sequence

63
Q

What are insulators and 3D organizing elements?

A

Insulators/3D organizing region/boundary elements = can organize the 3D structure of the DNA;
long distance regulators; regulate transcription indirectly by changing the 3D structure of DNA
Regulatory regions interact through transcription factors and insulator proteins; contain binding sites for TF and insulator proteins

64
Q

Explain how TF can regulate gene expression and how the regulatory elements can interact with one another in the control of gene expression

A

Transcription factors form complexes that bind to different sites on the DNA; some TF are common in all cell types, some are very specific
Loops can form in DNA bound to TF and make contact with upstream enhancer/silencer elements
enhancer - promoter looping = activation
Insulator looping = inactivation
TFs are cell-type dependent but also exposure dependent, e.g. TFs that are activated by interferon signaling

65
Q

Explain the RNA processing control

A

RNA processing control - splicing, capping, polyadenylation
Alternative polyadenylation = determines where the poly(A) tail is added
Alternative splicing = determines which exons are included in the mature mRNA
Alternative splicing and alternative polyadenylation can happen at the same time; as a result, the same gene can encode two completely different proteins with different functions

mRNA degradation control = post-transcriptional regulation; RNA is usually unstable in the cytoplasm and is subjected to degradation
tRNAs and rRNAs are usually very stable
Stability may change in response to regulatory signals and is therefore thought to be an important regulatory control point

66
Q

Explain the post-translational control

A

Post translational control = protein degradation
Proteins can be short or long lived
Last chance for the cell to affect gene expression
A protein tagged by a signaling protein (ubiquitin) enters the proteasome to be degraded

67
Q

What are microRNAs?

A

-role in post-transcriptional regulation of gene expression;
regulate the amount of mRNA present;
found only in eukaryotes; highly conserved;
transcribed by RNA polymerase II;
usually intergenic (between genes); can be transcribed from the same promoter as a host gene or from their own miRNA-specific promoter; can also be intronic or in intronic clusters; very rarely exonic
miRNA processing happens in both the nucleus and the cytoplasm;
Originate from a 5’ capped and polyadenylated full-length primary (pri)-miRNA

68
Q

Explain the microRNA processing

A

Processing starts with transcription of the miRNA gene: in the nucleus, miRNAs are transcribed as long primary miRNA (pri-miRNA) transcripts, which contain a hairpin structure.
Still in the nucleus, the Drosha enzyme (part of the Microprocessor complex) cleaves the pri-miRNA, generating a shorter pre-miRNA (precursor miRNA).
The pre-miRNA is transported to the cytoplasm by Exportin-5, a nuclear export receptor.
In the cytoplasm the Dicer enzyme cleaves the pre-miRNA into a miRNA/miRNA* duplex (a short double-stranded RNA).
One strand (the guide strand) is loaded into the RNA-induced silencing complex (RISC), while the other strand (passenger strand or miRNA*) is degraded.
The mature miRNA is now incorporated into RISC and guides it to target mRNAs, leading to gene silencing through mRNA degradation or translational repression.
The functional miRNA is single stranded and is contained in the RISC complex; which strand becomes the guide depends on thermodynamic stability: the strand whose 5’ end is less stably paired in the duplex is usually selected as the guide strand

69
Q

How does microRNA perform its role and function?

A

The seed sequence of the miRNA recognizes the target mRNA; a perfect match along the miRNA is needed for direct cleavage (degradation) of the mRNA, whereas a match with some mismatches leads to translational repression
The seed sequence of a miRNA is a short, highly conserved nucleotide sequence (usually 6-8 nucleotides long) near the 5’ end of the mature miRNA. This sequence is crucial for target recognition and binding to messenger RNA
In humans pairing is usually imperfect: it blocks translation and leads to mRNA degradation that starts with deadenylation (breakdown of the poly(A) tail)
The seed sequence of the miRNA is essential for the recognition of the target mRNA
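
Seed-based target recognition can be sketched as a simple string search. The code below assumes the seed is nucleotides 2-8 of the mature miRNA (one common convention within the 6-8 nt range given above) and looks for the perfectly complementary site in an mRNA. The miRNA is given as the let-7a sequence as an example; the 3' UTR fragment is invented.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def seed_site(mirna, start=2, end=8):
    """mRNA sequence (5'->3') that pairs with the miRNA seed (1-based positions)."""
    seed = mirna[start - 1:end]
    # Pairing is antiparallel, so the site is the reverse complement of the seed.
    return "".join(COMPLEMENT[nt] for nt in reversed(seed))

def find_seed_matches(mirna, mrna):
    """0-based positions in the mRNA where a perfect seed match starts."""
    site = seed_site(mirna)
    return [i for i in range(len(mrna) - len(site) + 1)
            if mrna[i:i + len(site)] == site]

mirna = "UGAGGUAGUAGGUUGUAUAGUU"    # let-7a, used here only as an example
utr = "AAGCUACCUCAUUUGGCUACCUCAG"   # made-up 3' UTR fragment

print(seed_site(mirna))             # site the seed would pair with
print(find_seed_matches(mirna, utr))
```

Real target prediction also weighs pairing outside the seed, site context, and conservation, none of which is modelled here.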

70
Q

How can microRNA dysfunction lead to disease?

A
  1. Genomic Rearrangements Involving miRNA
    Structural changes in the genome lead to the deletion (allelic ablation) of specific miRNAs.
    The missing miRNA can no longer regulate its target genes, potentially leading to disease.
  2. Mutations in miRNA Mature Sequences
    Mutations occur in the mature miRNA sequence, leading to imperfect target recognition.
    The miRNA fails to bind correctly to its target mRNA, disrupting normal gene regulation.
  3. Mutations in miRNA Target Sites
    Mutations occur in the mRNA target site, preventing miRNA from binding properly.
    The mRNA remains unregulated, leading to excess protein production, which can contribute to disease.
  4. Mutations Affecting miRNA Biogenesis
    Mutations interfere with the processing of miRNAs, affecting their maturation or function.
    Disrupts the RNA-induced silencing complex (RISC), leading to dysregulation of gene silencing.
71
Q

Give an example of how aberrant miRNAs lead to the development of cancer

A

miRNAs are expressed in a tissue-specific and developmental stage-specific manner
Mendelian disease has been reported to be caused by dysregulation of a specific miRNA-mRNA target pair: point mutations in miRNA mature sequences or point mutations in miRNA target sites
Example: aberrantly overexpressed miRNAs can act as oncogenic miRNAs that block the translation of tumor suppressor genes; conversely, loss of tumor suppressor miRNAs removes the block on the translation of oncogenes, so the oncogenes are expressed more strongly

72
Q

What are lncRNAs?

A

Long non-coding RNAs (lncRNAs) are more than 200 nucleotides long and do not code for proteins

function mainly as regulatory molecules in gene expression.
Can be found in the nucleus or cytoplasm and act at the DNA, RNA, or protein level.
Benefit of being long = depending on the sequence, the RNA molecule folds into secondary and intricate 3D structures, similar to proteins, and can therefore bind and interact with multiple molecules, including DNA, RNA, and proteins
Can be within genes, in introns, exons, anywhere in the genome
Usually they function in the nucleus, sometimes in the cytoplasm or both
Their function depends on the location; categorized based on their genomic location relative to that of nearby protein-coding genes
lncRNA expression levels vary with location, time, and physiological stimuli
Their expression is developmentally regulated; and the expression is tissue and cell type specific

73
Q

Explain the functions of lncRNAs

A

Functionally heterogeneous: diverse roles and mechanisms of action within the cell; can regulate gene expression at multiple levels—transcriptional, post-transcriptional, and epigenetic—through different mechanisms.
Can act as enhancers, repressors, or scaffolds in transcriptional regulation
Can also bind specific sequences with high specificity
important regulators of gene expression in cis or in trans (gene inhibition/activation)
Example: XIST lncRNA silences one X chromosome in females (X-chromosome inactivation)

74
Q

What are the archetypes of lncRNAs?

A

-decoy
-guide
-scaffold

75
Q

Explain the decoy lncRNAs

A

Decoy = Can be used as decoy for other structures in the cell, like TF or miRNA; regulators of the regulators;
Bind to and sequester TF or miRNAs away to prevent them from binding to their targets
As an end result = regulate transcription and translation

76
Q

Explain the guide lncRNAs

A

Guide = interacts with modifying complexes or TF and directs them to specific loci or genes
Can modify chromatin; acting as guide by guiding the chromatin modifying complex; lncRNAs recruit chromatin-modifying proteins (e.g., histone modifiers) to activate or repress genes.

77
Q

Explain the scaffold lncRNAs

A

Scaffold = act as a central platform for multiple modifying complexes and/or other cofactors to assemble transiently for function
Can be used as scaffolds to assemble multiple proteins

78
Q

Explain the genomic diagnostics for miRNAs and lncRNAs

A

There are no genome diagnostics yet for miRNAs and lncRNAs = there is no single rule for classifying a variant in ncRNAs
Unlike protein-coding genes, where mutations can often be linked to functional changes in proteins, ncRNAs regulate gene expression in complex and context-dependent ways. Their functions depend on sequence, structure, and interactions with other molecules, making it difficult to predict the impact of genetic variations.
Tissue specific → tissue-specific expression makes ncRNAs valuable as biomarkers for diseases and potential therapeutic targets.

79
Q

What could we base the pathogenicity on?

A

Phenotype ontology; the type of variant (deletion, insertion, etc.), on which prediction models can be based (predicting whether the variant affects the protein); literature and database searches to see whether the variant is already known; population frequency (the rarer the variant, the more likely it is to be pathogenic); segregation of the variant in families
Clinical phenomena can hamper variant interpretation

80
Q

What can be the consequences of a variant on the protein product?

A

If a variant is in the promoter - the effect on the protein would be on the level of the expression (up or downregulation)
Variant in an intron - effect on the splicing; consequence of splice site variant: exon skipping, intron retention
Variant in exon - effect on the protein itself like misfolding; frameshifts, non functional proteins

81
Q

Explain the in vivo model

A

In vivo model - a live patient or an animal; within a live organism; to study processes in their natural context;
Advantages: reflects the full physiological context, including interactions between cells, tissues, organs, and systems; can test the effect of a variant in specific tissues
Disadvantages: ethical concerns related to animal testing, higher costs and time commitment and potential differences between species (e.g., human vs. animal models).

82
Q

Explain the in vitro model

A

In vitro models: conducted outside of a living organism, typically using cells or biological molecules in laboratory conditions (e.g., petri dishes, test tubes), eg. patient derived iPSCs, cell lines, humanized models (organ on a chip);
Advantages: Allows for highly controlled experiments with specific variables, lower cost, faster, and more ethically manageable than in vivo studies, easier to manipulate and observe molecular or cellular processes.
Disadvantages: Lacks the complexity of living organisms, so results may not fully translate to in vivo situations, cells may behave differently in culture than they would in the body; cancer background of the cell lines; often over-expression of gene construct: in vitro cell lines often use transfection or viral vectors to introduce a gene of interest, which can result in over-expression of that gene. This means the gene is produced at much higher levels than it would be in a natural biological system, potentially leading to abnormal cellular responses.

83
Q

Explain ex vivo models

A

Ex vivo model - tissue or organs taken from a patient or an animal, e.g. patient liver slices, organoids (patient derived or engineered); studied outside the living organism but still in a biologically relevant environment.
Advantages: allows for more controlled experimental conditions compared to in vivo; reduces ethical concerns by working with tissues or cells rather than whole animals; Can provide more direct access to specific tissues or cells for study
Disadvantages: Does not account for the complexity of systemic interactions in a whole organism; limited time in culture before the cells or tissues lose their function.

84
Q

Explain computational models

A

Computational models: involves the use of computer simulations to model biological processes, chemical interactions, or genetic networks.

85
Q

How to get the gene with the alternative variant in the cell?

A

Using plasmids = circular plasmid DNA is used to overexpress a gene in a cell, by transfecting cells (such as HeLa or HEK293) with the plasmid carrying the gene/variant
The gene of interest comes in the form of a cDNA (derived from mRNA, only exons)
The variant of the gene can be created with site-directed mutagenesis PCR, using two complementary primers that span the region where the mutation is to be introduced; the primers carry the desired mutation in the middle of their sequence, flanked on both sides by regions that are complementary to the template DNA:
-Point mutation: one nucleotide is changed at the center of the primer sequence
-Deletion: the primers are complementary to the sequence directly before and after the region to be deleted; the deleted region itself is missing from the primer sequence
-Insertion: the primers contain the insertion sequence in the middle, flanked by sequence complementary to the template on both sides of the insertion site
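
The primer logic for a point mutation can be sketched as follows. This is an illustration only: the flank length of 12 nt, the function names, and the template sequence are assumptions, and real primer design also checks melting temperature and GC content, which is not done here.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq):
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def point_mutation_primers(template, position, new_base, flank=12):
    """Return two complementary primers with the changed base in the centre.
    position is 0-based on the template strand written 5'->3'."""
    if template[position] == new_base:
        raise ValueError("new base is identical to the template base")
    if position < flank or position + flank >= len(template):
        raise ValueError("not enough flanking sequence around the mutation")
    left = template[position - flank:position]
    right = template[position + 1:position + 1 + flank]
    forward = left + new_base + right          # mutation sits in the middle
    return forward, reverse_complement(forward)

# Made-up 30-nt template; change the C at index 14 into a T.
template = "ATGGCTAGCTTGACCGATCGTACGGATCCA"
forward, reverse = point_mutation_primers(template, 14, "T")
print(forward)
print(reverse)
```

Deletion and insertion primers follow the same pattern: the middle part is either left out or replaced by the inserted sequence, while the flanks stay complementary to the template.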

86
Q

What is a downside of using a plasmid?

A

The promoter in the plasmid is often a general promoter that is not tightly regulated, which causes overexpression
Overexpression plasmid models - since a copy of the reference gene is also present in the cell, the endogenous protein is present as well, so the effect of the variant sometimes cannot be studied properly

87
Q

What happens after the transfection of the cell with the variant?

A

After the site-directed mutagenesis and the transfection of the plasmid into cell lines, we can compare the VUS (variant of uncertain significance) with the wild-type protein (control), which is also transfected into cells: is the variant affecting the protein?
With overexpression models you always need to check whether the protein is functioning in a normal way
For normal protein functioning:
-the expression levels need to be in homeostasis: not too much (increased stability/no degradation) but also not too little (decreased stability/too much degradation)
-the protein needs to be localized at the expected location: e.g. a surface protein should be located at the plasma membrane and not retained in the Golgi system
-binding with other proteins needs to be intact: is it part of a bigger complex, or does it interact with other proteins for a specific function (receptors)?
-it needs to have the right protein activity: e.g. enzymatic activity, kinase activity
-protein stability and post-translational modifications need to be normal

88
Q

What assays can be used to study expression levels of variants?

A
  • Western blot
  • ELISA
89
Q

Explain the western blotting

A

Protein Extraction and quantification =Cells or tissues are lysed to extract proteins; protein concentration is measured (e.g., using a Bradford or BCA assay) to ensure equal loading of samples.
SDS-PAGE electrophoresis = proteins are separated based on size using an SDS-PAGE gel. SDS (sodium dodecyl sulfate) denatures proteins and gives them a uniform negative charge.
Blotting = the separated proteins are transferred from the gel onto a PVDF or nitrocellulose membrane using an electric current; this makes the proteins more accessible for antibody binding.
Blocking = the membrane is incubated with a blocking buffer (e.g., BSA or milk) to prevent nonspecific antibody binding.
Primary Antibody Incubation = A specific primary antibody is added to bind the target protein. The antibody is chosen based on the protein of interest.
Secondary Antibody Incubation = A secondary antibody, conjugated with a detection enzyme (e.g., HRP for chemiluminescence or alkaline phosphatase for colorimetric detection), binds to the primary antibody.
Detection and visualization: A substrate (e.g., ECL for chemiluminescence or DAB for colorimetric detection) reacts with the enzyme on the secondary antibody, producing a signal. The signal is detected using imaging systems like X-ray films, chemiluminescence detectors, or colorimetric scanners.

90
Q

Explain the ELISA assay

A

ELISA (Enzyme-Linked Immunosorbent Assay) is a technique used to detect and quantify proteins, such as antibodies, antigens, hormones, or cytokines, in a sample. It relies on antibody-antigen interactions and an enzyme-linked detection system.
There are 3 types: direct sandwich ELISA (the antigen is directly detected by an enzyme-linked primary antibody), indirect sandwich ELISA (the primary antibody is detected by an enzyme-linked secondary antibody) and sandwich ELISA with streptavidin-biotin detection
Sandwich ELISA captures the antigen between two specific antibodies, which improves the specificity
Coating the plate = A plate is coated with either antibody or antigen and incubated for binding
Blocking = a blocking solution (milk, BSA) is added to prevent unwanted and nonspecific binding of antibodies to the plate
Sample addition = the sample containing the target protein is added to the wells and incubated to allow binding
Primary antibody incubation = a specific primary antibody is added to bind to the target antigen
secondary antibody incubation = antibody conjugated with an enzyme to recognize the primary antibody (or directly bind to the antigen in direct ELISA)
Substrate addition = a substrate is added to react with the enzyme to produce a colour change and the intensity of the color is proportional to the amount of the target molecule present
Detection and quantification = the color change can be measured using a spectrophotometer
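
The card stops at the absorbance reading. Converting that reading into a concentration is usually done with a standard curve of known concentrations. That step is not described above, so the sketch below is an assumption: it uses simple linear interpolation between neighbouring standards, and all numbers are invented.

```python
def concentration_from_absorbance(absorbance, standards):
    """Interpolate linearly between the two neighbouring standards.
    standards: list of (concentration, absorbance) pairs."""
    points = sorted(standards, key=lambda pair: pair[1])
    for (c_lo, a_lo), (c_hi, a_hi) in zip(points, points[1:]):
        if a_lo <= absorbance <= a_hi:
            if a_hi == a_lo:
                return c_lo
            fraction = (absorbance - a_lo) / (a_hi - a_lo)
            return c_lo + fraction * (c_hi - c_lo)
    raise ValueError("absorbance is outside the range of the standard curve")

# Hypothetical standards: (concentration in pg/mL, measured absorbance).
standards = [(0, 0.05), (50, 0.30), (100, 0.55), (200, 1.05)]

sample = concentration_from_absorbance(0.80, standards)
print(round(sample, 1))
```

Readings outside the standard range raise an error instead of being extrapolated, since the proportionality between colour and amount only holds within the measured range.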

91
Q

Compare Western Blotting and ELISA assay

A

Comparison of both techniques:
-ELISA is more sensitive and more samples can be analyzed at the same time
-Western blot can reveal off-target effects (antibody specificity): you can see band patterns and verify that the detected protein has the expected molecular weight; if the antibody binds nonspecifically, or recognizes multiple proteins instead of the intended target, this shows up as extra, unexpected bands on the blot
-Western blot can detect post-translational modifications (based on the molecular weight): you can use PTM-specific antibodies (e.g., anti-phospho, anti-acetyl, or anti-glycosyl antibodies) to detect modifications and, since Western blot provides size information, confirm that the antibody binds to the correct protein form

92
Q

How can the localization of the protein variant be measured?

A
  • microscopy: immunohistochemistry or immunocytochemistry
93
Q

Explain the immunohistochemistry method

A

Immunohistochemistry (IHC) = to detect specific proteins in tissue sections using antibodies; visualize the location, distribution, and expression of proteins in a tissue sample while preserving cellular and tissue structure
Has cellular resolution
Colouring using precipitation
Tissue preparation = the tissue is fixed, embedded in paraffin or frozen and then sectioned into thin slices and the sections are mounted on glass slides
Deparaffinization and rehydration →If necessary antigen retrieval → blocking → primary antibody → secondary antibody conjugated to a detection enzyme → enzymatic detection using a substrate that produces a colored precipitate at the site of the target protein

94
Q

Explain the immunocytochemistry method

A

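Immunocytochemistry (here as immunofluorescence) = same principle as immunohistochemistry, but fluorescently labeled antibodies are used instead of an enzymatic colour reaction; most often used for cell monolayers (e.g. grown in a culture plate) but it can also be used for tissue sections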
Gives subcellular resolution

95
Q

How can the binding of the protein with other proteins be measured?

A

-Immunoprecipitation
-Yeast Two Hybrid (Y2H)

Limitation: you have to know binding partners to determine this

96
Q

Explain the immunoprecipitation method

A

Immunoprecipitation (IP) = to isolate specific protein from a complex biological sample such as a cell lysate using antibody that specifically binds to the target protein and this protein-antibody complex is captured with protein-binding beads and is separated for further analysis
Sample preparation: cells or tissues are lysed → the lysate is centrifuged to isolate the proteins and to remove cell debris = leaving a clear protein solution
Antibody incubation: primary antibody specific for the target protein (forms antibody-protein complex)
Capturing with beads - protein A/G beads or magnetic beads which bind to the primary antibody, so the protein-antibody complex attaches to the beads
Washing = the beads are washed multiple times to remove unbound proteins which reduces the non specific binding
Elution = the target protein is removed from the beads
Analysis = western blot, mass spectrometry or enzyme activity assays

97
Q

Explain the co-immunoprecipitation (Co-IP) method

A

Co-Immunoprecipitation (Co-IP) is a technique used to study protein-protein interactions by isolating a target protein along with its interacting partners from a biological sample. It follows the same principle as immunoprecipitation (IP) but focuses on pulling down protein complexes rather than a single protein.
Co-immunoprecipitation can also be used to study the function of a protein, if you know that the protein binds to other proteins

98
Q

Explain the Y2H method

A

Yeast Two Hybrid (Y2H) - detects protein-protein interactions inside a yeast cell; is based on the modular nature of transcription factors which typically consists of DNA-binding domain = binds to the promoter of a reporter gene and activation domain = recruits RNA polymerase to activate transcription
In Y2H a bait protein (fused to DNA-binding domain = DBD) and a prey protein (fused to activation domain = AD) are introduced into yeast
If the bait protein and the prey protein interact with each other, then the two domains come together which will result in expression of the reporter gene which produces a detectable signal
However it has limitations: false positives or negatives due to the improper folding or expression in yeast; limited to binary interactions = cannot detect complexes with more than two proteins; might not detect transient or weak interactions

99
Q

How to study if a variant affects the promoter region?

A

Variant in promoter region possibly blocks transcription
Characterization of transcription regulatory sequences by exploiting reporter genes
Reporter genes are nucleic acid sequences that are encoding easily assayed proteins and they are used to replace other coding regions whose protein products are difficult to assay; produce an easily measurable signal (such as color, fluorescence) when they are expressed.
Very often the luciferase gene is used as a reporter gene = luciferase produces light by an ATP-dependent oxidation reaction; the light emission can be quantified with a luminometer and is a measure of the amount of luciferase that is expressed
The gene of interest or promoter is cloned into a plasmid and fused with the luciferase gene
The luciferase gene is placed under the control of the promoter that we wish to study, or it replaces a gene whose product is difficult to measure
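
A luciferase readout is typically reported as activity of the variant promoter relative to the wild-type promoter. The sketch below assumes that each well also contains a co-transfected control reporter used to correct for transfection efficiency. That normalization step, the function name, and all readings are assumptions for illustration.

```python
def relative_promoter_activity(variant, wild_type):
    """Each argument is a list of (luciferase, control) luminescence
    readings, one pair per replicate well."""
    def mean_normalised(readings):
        ratios = [luciferase / control for luciferase, control in readings]
        return sum(ratios) / len(ratios)
    return mean_normalised(variant) / mean_normalised(wild_type)

# Invented luminometer readings for three replicates each.
wild_type = [(1000, 100), (1200, 120), (900, 90)]
variant = [(400, 100), (500, 100), (300, 100)]

activity = relative_promoter_activity(variant, wild_type)
print(f"variant promoter activity relative to wild type: {activity:.2f}")
```

A value clearly below 1 would suggest that the variant reduces promoter activity, and a value above 1 that it increases it.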

100
Q

How to study whether a variant affects splicing?

A

Splicing reporter minigene pCAS = genetic construct that is used to study splicing mechanisms and how specific splice site mutations can affect RNA splicing; is designed to model and measure the alternative splicing of pre-mRNA in a controlled and simplified environment
We have to measure RNA
Amplification of genomic DNA = DNA is extracted from a patient’s peripheral blood → the specific segment of a cancer (disease) predisposition gene, containing the wild-type or variant exon along with 150 bp of flanking intronic sequence, is amplified using PCR → restriction enzyme sites are introduced for subsequent cloning
Cloning into the minigene vector = the amplified DNA fragment is inserted into the pCAS vector, which contains a splicing reporter system; the pCAS vector includes a CMV promoter (PCMV) for transcription, two exons flanking the inserted sequence, and the gene C1NH/SERPING1, which is used as a backbone; the inserted exon (wild type or variant) is positioned between these exons to analyze the splicing outcomes
Transfection into HeLa cells = the recombinant minigene construct is introduced into the cell line and the cells express the minigene, allowing researchers to observe the splicing pattern in a cellular environment
RT-PCR and DNA sequencing = reverse transcription PCR is performed to analyze the resulting mRNA, and the wild-type and variant products are compared by agarose gel electrophoresis
A normal splicing pattern produces a specific band size; a splicing defect (caused by the variant) may result in exon skipping or retention, producing a different band pattern.
DNA sequencing further confirms the observed splicing changes.
If a variant is close to the exon then it is more likely that it affects the splicing
Example: CD44 (cell surface glycoprotein involved in cell adhesion, migration and inflammation); immune system isoforms are important for immune cell migration and adhesion, whereas in tumors alternative splicing generates CD44 variant isoforms that promote cellular invasion, tumor metastasis and cancer progression
The minigene splicing assay is fairly old-fashioned, so transcriptomic analysis is preferable, but only if it is done on the relevant tissue
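
The band pattern on the gel can be predicted from the lengths of the pieces between the RT-PCR primers. A small sketch, assuming primers that sit in the two flanking vector exons. The lengths are hypothetical, and intron retention is left out because it depends on the intron length.

```python
def expected_rt_pcr_products(exon_a, test_exon, exon_b):
    """Expected amplicon lengths (bp) for the two main splicing outcomes.
    exon_a / exon_b: amplified parts of the flanking vector exons."""
    return {
        "exon included (normal splicing)": exon_a + test_exon + exon_b,
        "exon skipped": exon_a + exon_b,
    }

# Hypothetical lengths in bp.
products = expected_rt_pcr_products(exon_a=120, test_exon=150, exon_b=100)
for outcome, size in products.items():
    print(f"{outcome}: {size} bp")
```

A variant lane showing the shorter product where the wild-type lane shows the longer one would point to exon skipping, which sequencing can then confirm.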

101
Q

Explain the bacterial immune system and the CRISPR components

A

Bacteria have adaptive immunity against phages; bacteria carry a strand of RNA that is complementary to the DNA target on the virus;
the Cas9 protein can then cut the virus DNA at that place
=Cas9 - an RNA-guided DNA endonuclease (RGEN) - has two nuclease domains: HNH (cleaves the target DNA strand, which is complementary to the crRNA spacer) and RuvC (cleaves the non-complementary, non-target DNA strand)
=The guide RNA needs to recognize the DNA (of the virus) and bind to the Cas protein; so it needs two things: recognition of foreign DNA (spacer) and binding to Cas9, so that Cas9 is directed to the sequence matching the spacer
=Guide RNA has two components: crRNA (this part is the one that can be changed; contains the spacer) and tracrRNA (this part is always the same)
=When a bacterial host has survived a virus attack, a piece of the foreign viral DNA (spacer) is stored in the CRISPR repeat-spacer array in the genome of the host
=The CRISPR locus contains the sequences that encode the Cas proteins, the tracrRNA and the CRISPR repeat-spacer array
=CRISPR repeat-spacer array = transcribed to pre-CRISPR RNA (pre-crRNA)
=tracrRNA binds to the repeat sequence of the pre-CRISPR RNA and recruits Cas9; the dsRNA is cut by RNase III to produce mature crRNA:tracrRNA-Cas9 complexes (the natural counterpart of the engineered sgRNA-Cas9 complex)

102
Q

How can CRISPR be used in eukaryotic cells?

A

CRISPR-Cas9 can induce double-stranded breaks in any DNA sequence, as long as there is a matching sequence (complementary to the crRNA) directly preceding a PAM (protospacer adjacent motif) sequence
The PAM sequence serves as a recognition site for the Cas9 enzyme, allowing it to bind to the target DNA and perform gene editing; it is located immediately next to the target DNA sequence that the guide RNA (gRNA) is designed to recognize; it is typically a 3-5 base pair motif, and its composition depends on the Cas9 variant being used; for SpCas9 it is NGG
Always cuts at the same site: for SpCas9, 3 nucleotides upstream of the PAM (between the 3rd and 4th nucleotide counted from the PAM)
Find a DNA sequence + PAM that you want to cut → add Cas9 → add sgRNA (with matching crRNA)
The sgRNA (more specifically the crRNA) sequence is changed to target a DNA sequence of choice (20 nucleotides) = this creates 4^20 = 1 099 511 627 776 unique options
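
The "find a DNA sequence + PAM" step and the count of possible spacers can be sketched as below. This minimal example assumes SpCas9 (20-nt target followed by NGG) and scans only the strand that is given, not the reverse strand. The DNA sequence and function name are made up.

```python
import re

def find_spcas9_targets(dna):
    """Return (target, PAM, 0-based start) for every 20-nt site that is
    directly followed by an NGG PAM on the given strand."""
    # The lookahead lets overlapping candidate sites be reported as well.
    pattern = re.compile(r"(?=([ACGT]{20})([ACGT]GG))")
    return [(m.group(1), m.group(2), m.start()) for m in pattern.finditer(dna)]

dna = "TT" + "GACCTAGCATCGATCGATCA" + "TGG" + "CTA"   # invented sequence
targets = find_spcas9_targets(dna)
print(targets)

# Four possible bases at each of 20 positions:
print(4 ** 20)
```

A practical guide design tool would additionally scan the opposite strand and score each candidate for off-target matches elsewhere in the genome.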

103
Q

What are the methods of delivering CRISPR-Cas9 into cells?

A

-DNA based expression
-RNA based expression
-RNP based expression

104
Q

Explain the DNA-based expression ?

A

sgRNA plasmid encoding the sgRNA (which combines crRNA and tracrRNA) and a separate Cas9 plasmid that encodes the Cas9 gene under a promoter, are transfected into cells
The host cellular machinery transcribes the plasmids into RNA and translates the Cas9 protein
The expressed Cas9-sgRNA complex can perform the gene editing

105
Q

Explain the RNA-based expression

A

RNA-based expression (mRNA and gRNA delivery)
Components: crRNA that contains the 20 nt target sequence, tracrRNA that binds to crRNA to form the active gRNA, and Cas9 mRNA with a 5′ cap and poly(A) tail to enhance stability and translation
The Cas9 mRNA and gRNA are co-delivered into cells
The cell translates the mRNA to produce Cas9 protein which then associates with gRNA to edit the target DNA

106
Q

Explain the RNP-based expression

A

RNP-based delivery (RNA + protein)
Components: pre-assembled Cas9 protein and synthetic crRNA + tracrRNA (or pre-assembled sgRNA)
The Cas9 protein is delivered directly into the cell pre-complexed with gRNA; the complex acts immediately and is degraded rapidly, which reduces off-target effects

107
Q

What happens after the cutting and making a double stranded break (DSB)? NHEJ

A

The DSB allows for gene editing by using the cellular DNA repair mechanisms
Two mechanisms for DNA repair: non homologous end joining and homology directed repair
NHEJ: quick and error-prone repair mechanism that directly ligates the broken DNA ends together without needing a template; often leads to small insertions or deletions (indels); is used to create gene knockouts because it introduces random mutations; If CRISPR-Cas9 creates a DSB in a gene, NHEJ might introduce small mutations that lead to frameshifts or premature stop codons, effectively knocking out the gene; active in all cell cycle phases but mainly in G1 and early S phase;
However, double stranded breaks repaired by NHEJ do not always result in a KO: amino acids are encoded by 3-base codons, so only about 2 out of 3 indels (those whose length is not a multiple of 3) cause frameshifts leading to premature stop codons/truncated proteins
Nonsense mediated mRNA decay: NMD is a surveillance mechanism in cells that detects and degrades mRNAs containing premature stop codons (nonsense mutations) to prevent the production of potentially harmful truncated proteins; a key signal is if the stop codon is more than ~50–55 nucleotides upstream of an exon-junction complex (EJC)
If a premature stop codon (PTC) is too close to the original (natural) stop codon, the mRNA might escape nonsense-mediated decay (NMD), allowing some translation to occur
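
The two rules of thumb above can be written down as a small sketch. The function names are hypothetical, the 50 nt threshold is taken from the ~50–55 nt figure in the card, and positions are simple mRNA coordinates chosen for illustration; real NMD prediction is more involved.

```python
def is_frameshift(indel_length):
    """An indel shifts the reading frame unless its length is a multiple of 3."""
    return indel_length % 3 != 0

def predicted_nmd(ptc_position, last_junction_position, min_distance=50):
    """Rule of thumb: NMD is expected when the premature stop codon lies more
    than ~50 nt upstream of the last exon-exon junction (assumed threshold)."""
    return last_junction_position - ptc_position > min_distance

# Fraction of indel lengths 1..9 that shift the reading frame.
lengths = range(1, 10)
frameshift_fraction = sum(is_frameshift(n) for n in lengths) / len(lengths)
print(frameshift_fraction)

print(predicted_nmd(100, 300))  # stop codon far upstream of the last junction
print(predicted_nmd(280, 300))  # stop codon close to the last junction
```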

108
Q

What happens after the cutting and making a double stranded break (DSB)? HDR

A

HDR: a high-fidelity repair mechanism that uses a homologous DNA template to accurately repair the break; Used for gene knock-in or precise edits → Scientists can introduce a designed DNA sequence (e.g., a corrected gene or a fluorescent tag); If CRISPR-Cas9 cuts a gene, HDR can be used to insert a desired DNA sequence at the cut site, allowing for precise gene editing (e.g., correcting a disease-causing mutation); Occurs mainly in the S and G2 phases of the cell cycle, when homologous DNA (the sister chromatid) is available.

109
Q

How to fix a gene with CRISPR?

A

Create a sgRNA based on the sequence that we want to target with CRISPR so that it directs Cas9 to the target gene → Cas9 recognizes PAM and cleaves the DNA at a specific location within this sequence → this creates a DSB
One possible repair pathway would be NHEJ which is error prone; the broken DNA is glued back together and this often leads to introduction of some random indels and thereby creating gene knockouts or loss of function
A more precise method therefore is HDR where a repair template is used (synthetic DNA sequence that is supposed to repair/provide the right functioning gene) → the repair template contains homologous sequences (flanking arms) that match the DNA around the break site → the cell then uses the template to accurately repair the break and incorporate the desired genetic changes
But the efficiency of HDR is still 5-10% at best; the difficulties are that CRISPR editing is not efficient enough and is not specific enough (off targets)

110
Q

What are some alternative and more accurate ways to fix a gene with CRISPR?

A
  • By introducing restriction sites
  • Decreasing off-targets with nickase Cas9 (nCas9)
  • Dead Cas9 (dCas9)
  • Using pegRNA
111
Q

Explain the CRISPR method by introducing restriction sites

A

Silent mutations are introduced in the CRISPR target site without altering the amino acid sequence of the protein → These mutations create a unique restriction enzyme recognition site near the cut site → This helps in screening edited cells efficiently.
Cells are transfected with: Cas9 protein or plasmid, gRNA (guide RNA) targeting the gene of interest, and a donor template (if using HDR for precise editing, such as introducing the silent mutation). After transfection, CRISPR starts editing the target sequence.
If the experiment uses a fluorescent marker (e.g., GFP linked to the repair template), FACS (Fluorescence-Activated Cell Sorting) can enrich cells that were successfully transfected and this helps eliminate unedited cells, increasing the efficiency of downstream analysis.
DNA is extracted from sorted cells → PCR is performed to amplify the targeted genomic region containing the CRISPR edit.
The PCR product is incubated with the selected restriction enzyme → If the CRISPR edit successfully introduced the silent mutation, the enzyme will recognize and cut the PCR product.
Unedited DNA will not be cut, allowing for easy distinction on an agarose gel.
To verify precise editing, the PCR product is subjected to Sanger sequencing → The sequencing result will show whether the silent mutation and CRISPR edit were incorporated correctly
Disadvantage: not for all cell types = Only effective in actively dividing cells, since HDR is cell-cycle dependent (S/G2 phase), inefficient in non-dividing or primary cells
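
The restriction digest screen described above can be mimicked in code. This is a toy sketch: the two amplicon sequences are invented, the enzyme is assumed to be EcoRI (site GAATTC, cutting after the first G), and `digest_fragments` is a hypothetical helper that only returns fragment lengths, as they would appear on a gel.

```python
def digest_fragments(amplicon, site, cut_offset):
    """Return fragment lengths after cutting at every occurrence of `site`.
    `cut_offset` is the cut position counted from the start of the site."""
    fragments = []
    start = 0
    pos = amplicon.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(cut - start)
        start = cut
        pos = amplicon.find(site, pos + 1)
    fragments.append(len(amplicon) - start)
    return fragments

flank = "ACGT" * 5
unedited = flank + "GAGTTC" + flank  # toy PCR product without the site
edited = flank + "GAATTC" + flank    # toy PCR product with the new EcoRI site

print(digest_fragments(unedited, "GAATTC", 1))  # one uncut band
print(digest_fragments(edited, "GAATTC", 1))    # two smaller bands
```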

112
Q

Explain the function of nCas9

A

nCas9 is a mutant form of Cas9 that introduces single-strand breaks (nicks) instead of double-strand breaks (DSBs) in DNA.
If you want to make a double stranded break then you need two guide RNAs (gRNAs) to generate nicks on opposite DNA strands; This increases specificity, as two independent gRNAs must bind correctly
Prevents Random DSB Formation: If an off-target site binds a single gRNA, only a nick is created instead of a full DSB; Nicks are efficiently repaired by the high-fidelity base excision repair (BER) pathway, reducing unwanted mutations.
Regular Cas9 relies on Non-Homologous End Joining (NHEJ), which is error-prone, while nCas9 encourages Homology-Directed Repair (HDR), which is more accurate, if a repair template is provided.

113
Q

Explain what dCas9 is

A

Dead Cas9 (dCas9) = catalytically inactive version of Cas9 that binds DNA without cutting it. It is commonly fused with different effector domains to modulate gene expression, epigenetic modifications, or DNA visualization.
Mutations in the RuvC and HNH nuclease domains make Cas9 unable to cut DNA.
dCas9 is still guided by gRNA to a specific genomic location, but instead of cleaving DNA, it acts as a DNA-binding scaffold.
By fusing dCas9 to functional proteins, it can activate, repress, or modify genes without altering the DNA sequence.

114
Q

Explain what pegRNA is

A

pegRNA (prime editing guide RNA) = specialized guide RNA used in prime editing, a precise genome-editing technique developed as an alternative to standard CRISPR-Cas9 editing.
Unlike standard guide RNAs (gRNAs) in CRISPR, pegRNA has an extended structure that enables targeted DNA modifications without causing double-strand breaks (DSBs).
A pegRNA consists of three main parts: (1) Spacer + gRNA scaffold – the spacer directs Cas9 to the target DNA sequence and the scaffold binds Cas9; (2) Reverse Transcriptase (RT) Template – Contains the desired edit (substitution, insertion, or deletion); (3) Primer Binding Site (PBS) – Hybridizes to the nicked DNA strand so that the reverse transcriptase can initiate DNA synthesis.
Cas9 Nickase (nCas9) + Reverse Transcriptase (RT) complex binds to the target DNA → Cas9 nickase creates a single-strand break instead of a double-strand break → The PBS region of the pegRNA binds to the nicked DNA strand, whose free 3′ end acts as a primer → Reverse transcriptase uses the RT template to synthesize a new DNA sequence, incorporating the intended edit → The cell repairs the DNA using the newly synthesized strand, completing the edit.

115
Q

Explain how gene repression can be introduced with CRISPR

A

Gene Repression (CRISPRi): dCas9 + a repressor domain → Blocks transcription by physically preventing RNA polymerase binding; Can silence genes in a reversible and tunable manner; Example: Used to silence disease-related genes for functional studies.

116
Q

Explain how gene activation can be introduced with CRISPR

A

Gene Activation (CRISPRa): dCas9 + an activator domain → Recruits transcriptional machinery to boost gene expression; Can be used to upregulate genes without inserting new DNA; Example: Inducing pluripotency factors to convert somatic cells into stem cells.

117
Q

Explain how epigenetic editing can be introduced with CRISPR

A

Epigenetic Editing: dCas9 + epigenetic enzymes (e.g., DNA methyltransferases, histone acetyltransferases) → Modifies DNA methylation or histone marks to activate or silence genes epigenetically; Example: Using dCas9-DNMT3A to methylate and silence oncogenes in cancer research.

118
Q

Explain how RNA can be targeted with CRISPR

A

RNA Targeting (When Used with dCas13): A similar approach can be applied to RNAs instead of DNA using dCas13 → Used for post-transcriptional regulation of gene expression; Example: dCas13 fused to an RNA decay enzyme for targeted RNA degradation.

119
Q

How to recognize autosomal recessive inheritance?

A

Both alleles are defective -> monogenic disorder caused by 2 variants
▪ The patient’s parents often do not have the condition
▪ The patient’s parents are usually asymptomatic carriers
▪ More common when the parents are genetically related (e.g. cousins)
▪ Disorder occurs equally in men and women
▪ For each subsequent child of the same parents, recurrence risk is 25%

120
Q

How to recognize autosomal dominant inheritance?

A

1 allele is defective -> monogenic disorder caused by 1 variant
▪ affected person usually (!) has at least 1 affected parent
▪ disorder occurs equally in men and women
▪ is passed on by both men and women
▪ for each subsequent child of the same parents, recurrence risk is 50%
(if the affected parent is heterozygous)
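
The 25% and 50% recurrence risks quoted for autosomal recessive and autosomal dominant inheritance follow from a Punnett square. A minimal sketch, assuming made-up allele symbols (`A`/`a` and `D`/`d`) and hypothetical helper names:

```python
from fractions import Fraction
from itertools import product

def offspring_genotypes(parent1, parent2):
    """Punnett square: probability of each child genotype, given two parental
    genotypes written as two-letter strings such as "Aa"."""
    combos = [tuple(sorted(pair)) for pair in product(parent1, parent2)]
    return {g: Fraction(combos.count(g), len(combos)) for g in set(combos)}

def recurrence_risk(parent1, parent2, is_affected):
    """Sum the probabilities of all child genotypes that are affected."""
    genotypes = offspring_genotypes(parent1, parent2)
    return sum(p for g, p in genotypes.items() if is_affected(g))

# Autosomal recessive: two carrier parents (Aa x Aa), child affected if aa.
print(recurrence_risk("Aa", "Aa", lambda g: g == ("a", "a")))  # 1/4

# Autosomal dominant: heterozygous affected parent (Dd) x unaffected (dd).
print(recurrence_risk("Dd", "dd", lambda g: "D" in g))  # 1/2
```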

121
Q

How to recognize X-linked recessive inheritance?

A

▪ 2 alleles are defective (however men only have 1)
▪ Almost only men get symptoms of illness
▪ Parents of the patient are usually not affected
▪ Mother is often an asymptomatic carrier and may have affected male family members
▪ Women can be affected carriers

122
Q

How to recognize X-linked dominant inheritance?

A

▪ 1 allele is defective
▪ very rare
▪ disorder occurs in men and women, but more often in women (often spontaneous miscarriage of male fetuses)
▪ child of an affected mother has a 50% chance of developing the disorder
▪ all daughters, but none of the sons, of an affected father get the condition

123
Q

How to recognize Y-linked inheritance?

A

▪ only men are affected
▪ affected men always have an affected father (or there must be a de novo mutation)
▪ only inheritance from man to man
▪ all the sons of an affected man are affected

124
Q

How are the sequencing methods broadly categorized?

A
  1. First generation sequencing (targeted sequencing)
    - Sanger sequencing
  2. Next generation sequencing - Short read sequencing
    - WES
    - WGS
  3. Third generation sequencing - Long read sequencing
    - SMRT
    - Oxford Nanopore
125
Q

Explain Sanger sequencing

A

Sanger sequencing
Region of the genome that we want to analyze (only used for a specific segment)
Two peaks at the same position = two alleles (heterozygous)
Advantage: cheap, reliable results
Not sufficient anymore for diagnostics = more genes associated with a disorder and we want to screen all of them which is not possible with Sanger sequencing
For family screening; when you know the mutation
However it is considered as the gold standard sequencing technology and NGS results are often verified with sanger sequencing

126
Q

Advantages and disadvantages of Sanger sequencing

A

advantage: cheap, easy to perform, simple analysis of results, accuracy, turnaround time
disadvantage: low throughput, capacity is limited = one sample at a time; cost is higher for longer amplicons

127
Q

Explain NGS

A

Consists of 4 steps: 1. Constructing a library; 2. Clonal amplification; 3. Sequence library; 4. Data analysis
Fragmented DNA → add adaptors (adaptor ligation) = library preparation
A sequencing “library” must be created from the sample. The DNA (or cDNA) sample is processed into relatively short double-stranded fragments (100–800 bp). Depending on the specific application, DNA fragmentation can be performed in a variety of ways, including physical shearing, enzyme digestion, and PCR-based amplification of specific genetic regions. The resulting DNA fragments are then ligated to adaptor sequences, forming a fragment library. These adaptors may also have a unique molecular “barcode”, so each sample can be tagged with a unique DNA sequence. This allows for multiple samples to be mixed together and sequenced at the same time. For example, barcodes 1-20 can be used to individually label 20 samples and then analyze them in a single sequencing run. This approach, called “pooling” or “multiplexing”, saves time and money during sequencing experiments and controls for workflow variation, as pooled samples are processed together.
Paired-end libraries allow users to sequence the DNA fragment from both ends, instead of typical sequencing which occurs only in a single direction. Paired-end libraries are created like regular fragment libraries, but they have adaptor tags on both ends of the DNA insert that enable sequencing from two directions. This methodology makes it easier to map reads and can be used to improve detection of genomic rearrangements, repetitive sequence elements, and RNA gene fusions or splice variants.
On the glass plate (flow cell) there are oligos which bind to the adaptors → bridge PCR (one molecule forms a cluster of copies of the same fragment) = cluster generation
(clonal amplification so that the fluorescent signal is strong enough to be detected)
During the clonal amplification, each unique DNA molecule in the library is bound to the surface of a bead or a flow-cell and PCR amplified to create a set of identical clones.
Sequencing by synthesis; reading individual bases as they grow along a polymerized strand. This is a cycle with common steps: DNA base synthesis on single stranded DNA, followed by detection of the incorporated base, and then subsequent removal of reactants to restart the cycle.
Most sequencing instruments use optical detection to determine nucleotide incorporation during DNA synthesis
Paired end sequencing = the segment read from both sides
By giving every patient sample a different adaptor index, several patients can be tested at the same time
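
Sorting pooled reads back to their samples by barcode (demultiplexing) can be sketched as follows. This is a simplified illustration: the barcode is assumed to sit at the start of each read, the barcodes and sample names are invented, and sequencing errors in the barcode are not handled.

```python
from collections import defaultdict

def demultiplex(reads, barcode_to_sample, barcode_len):
    """Group reads by sample, using the first `barcode_len` bases as barcode.
    Reads with an unknown barcode are collected under "undetermined"."""
    by_sample = defaultdict(list)
    for read in reads:
        barcode, insert = read[:barcode_len], read[barcode_len:]
        sample = barcode_to_sample.get(barcode, "undetermined")
        by_sample[sample].append(insert)
    return dict(by_sample)

# Hypothetical 4-nt barcodes for two patients pooled in one run.
barcodes = {"AAAA": "patient_1", "CCCC": "patient_2"}
pooled_reads = ["AAAAGATTACA", "CCCCTTGACCA", "AAAACCGGTTA", "GGGGACGTACG"]

for sample, inserts in demultiplex(pooled_reads, barcodes, 4).items():
    print(sample, inserts)
```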

128
Q

How is the data from NGS analysed?

A

Analysis can be divided into three steps: primary, secondary, and tertiary analysis.
Primary analysis is the processing of raw signals from instrument detectors into digitized data or base calls. These raw data are collected during each sequencing cycle. The output of primary analysis is files containing base calls assembled into sequencing reads (FASTQ files) and their associated quality scores (Phred quality score).
Secondary analysis involves read filtering and trimming based on quality, followed by alignment of reads to a reference genome or assembly of reads for novel genomes, and finally by variant calling. The main output is a BAM file containing aligned reads.
Tertiary analysis is the most challenging step, as it involves interpreting results and extracting meaningful information from the data.
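
The Phred quality scores mentioned for FASTQ files can be decoded with a few lines of code. The sketch assumes the common Phred+33 encoding (quality character code minus 33) and uses an invented four-line record; the helper names are hypothetical.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_string]

def error_probability(q):
    """A Phred score Q corresponds to an error probability of 10^(-Q/10)."""
    return 10 ** (-q / 10)

# A made-up FASTQ record: header, bases, separator, qualities.
record = ["@read_1", "ACG", "+", "!5I"]
bases, qualities = record[1], phred_scores(record[3])

for base, q in zip(bases, qualities):
    print(base, q, error_probability(q))
```

In this toy record the three bases carry scores 0, 20 and 40, so a simple quality filter would trim the first base and keep the other two.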

129
Q

How is RNA sequencing performed using NGS?

A

RNA sequencing can be done with NGS as well; exon skipping and intron retention can be detected = splice mutations can be interpreted
Allele specific expression can be measured
Since total RNA contains a lot of rRNA (~80-90%), which is not informative, we need to remove rRNA and enrich it for mRNA or non-coding RNAs. There are two main approaches: 1. mRNA Enrichment (for polyadenylated RNAs): Uses oligo-dT beads to capture mRNAs with poly(A) tails and this is ideal for eukaryotic mRNA profiling; 2. rRNA Depletion (for total RNA or prokaryotes) - uses ribosomal RNA (rRNA) depletion kits to remove rRNA and keep mRNAs and other RNAs.
The cleaned reads are mapped to a reference genome or transcriptome to determine gene expression levels.
Genome Alignment (Align reads to the whole genome) - Helps detect splicing events, novel transcripts, mutations.
Transcriptome Alignment (Align reads to known transcripts) - Used mainly for gene expression quantification.
Once reads are mapped, the number of reads per gene is counted to measure expression levels.
Disadvantage: very tissue specific
Since NGS platforms sequence DNA, not RNA, we need to convert RNA into complementary DNA (cDNA): 1. Reverse Transcription - An enzyme (reverse transcriptase) converts mRNA → cDNA and a primer (random hexamers or oligo-dT) initiates this process; 2. Fragmentation and Adapter Ligation - cDNA is fragmented into short pieces (~200-300 bp) and adapters are ligated to both ends of each fragment for sequencing; Indexes (barcodes) are added if multiple samples will be multiplexed.

130
Q

What is read depth, coverage or sequencing depth?

A

Read depth (also called sequencing depth or coverage) refers to the number of times a specific nucleotide position in the genome is sequenced. It is measured in X-fold coverage (e.g., 10×, 30×, 100×, etc.), where 10× coverage means that, on average, each base in the genome is covered by 10 reads. If a mutation (e.g., SNP, indel) is present in only one read, it might be an error rather than a true variant. A higher depth gives more confidence: if multiple independent reads show the same mutation, it is likely real and not a sequencing error.
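
Average coverage can be estimated as (number of reads × read length) / genome size, and per-position depth is just a count of overlapping reads. A minimal sketch with invented numbers and hypothetical function names:

```python
def mean_coverage(n_reads, read_length, genome_size):
    """Average fold coverage = total sequenced bases / genome size."""
    return n_reads * read_length / genome_size

def per_base_depth(read_intervals, genome_size):
    """Depth at each position, given reads as half-open (start, end) intervals."""
    depth = [0] * genome_size
    for start, end in read_intervals:
        for pos in range(start, end):
            depth[pos] += 1
    return depth

# 1 million reads of 150 bp on a 5 Mb genome.
print(mean_coverage(1_000_000, 150, 5_000_000))  # 30.0

# Two overlapping toy reads on an 8 bp "genome".
print(per_base_depth([(0, 4), (2, 6)], 8))  # [1, 1, 2, 2, 1, 1, 0, 0]
```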

131
Q

Explain long read sequencing

A

Oxford Nanopore long read sequencing
Start with long fragments of DNA and attach adaptor proteins to the DNA ends
Motor enzyme = helicase; it unwinds the DNA and feeds it through the pore
Every nucleotide disrupts the electrical current in a specific way and is measured
Adaptive sampling
Good for detection of deletions, repeats and insertions
NGS technologies, especially Illumina, produce short reads (e.g., 100–300 bp). If a genome contains long repetitive sequences (several kilobases long), the reads may be entirely within the repetitive region and lack unique sequences that allow them to be properly aligned. Example: If a genome contains a 1,000 bp repeat and your reads are only 150 bp, you might sequence multiple identical 150 bp fragments that all map to the repeat region, making it difficult to determine where each read belongs.
Since repetitive sequences appear identically in multiple locations in the genome, NGS alignment algorithms struggle to assign the reads to their correct position. This leads to: Multi-mapping reads (reads aligning to multiple locations) and Random placement by the alignment software, which can introduce errors in variant calling or structural variation analysis.
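
The multi-mapping problem can be shown with a toy reference that contains the same repeat twice. Everything here is invented for illustration (sequences, lengths, the helper name), and exact string matching stands in for a real aligner.

```python
def exact_match_positions(reference, read):
    """Return every start position where the read matches the reference exactly."""
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        start = reference.find(read, start + 1)
    return positions

repeat = "GATTACAGATTTCCA"  # toy repeat, present twice in the reference
reference = "TTTTT" + repeat + "CCCCC" + repeat + "GGGGG"

short_read = repeat[3:11]           # lies entirely inside the repeat
long_read = "TTT" + repeat + "CC"   # spans the repeat plus unique flanks

print(exact_match_positions(reference, short_read))  # two candidate positions
print(exact_match_positions(reference, long_read))   # one unique position
```

The short read fits both repeat copies equally well, while the long read is anchored by its unique flanking sequence.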

132
Q

Explain the advantages and disadvantages of NGS

A

-advantage: cheap, high throughput, broad applicability, multiple samples at the same time
-disadvantage: coverage and mapping limitations, more complex data interpretation, limited ability to detect structural variants

133
Q

Explain single molecule sequencing

A

Adaptive sampling is a feature in long-read sequencing technologies, particularly Oxford Nanopore sequencing, that allows real-time selection of DNA or RNA molecules for sequencing based on their sequence content. This method dynamically decides whether to continue sequencing a molecule or eject it from the nanopore before full sequencing, optimizing sequencing efficiency.
The sequencing software performs rapid base calling on the first portion of the molecule as it enters the pore. The software compares the partial sequence to a reference genome or target sequence; If the molecule matches a predefined target region (e.g., a specific gene or pathogen genome), sequencing continues; If the molecule does not match the target, the system reverses the voltage, ejecting the DNA before it is fully sequenced.
This process allows enrichment of desired sequences without the need for physical enrichment (e.g., PCR or hybridization capture).
It also reduces sequencing waste, as non-target reads are quickly discarded.
Continuous incorporation of fluorescent nucleotides

134
Q

Explain SMRT technology

A

SMRT technology: Genomic DNA is fragmented into long pieces (typically 10-50 kb) → Hairpin adapters are ligated to both ends of each DNA fragment, forming a circular DNA molecule (SMRTbell) → These circular templates allow continuous, rolling-circle sequencing; The templates are placed into a specialized SMRT cell, which contains thousands of tiny nanophotonic chambers where sequencing occurs, and a single DNA polymerase enzyme is immobilized at the bottom of each chamber; The DNA polymerase incorporates fluorescently labeled nucleotides one at a time as it synthesizes the complementary strand. When a nucleotide is incorporated into the growing DNA strand, a fluorescent signal is detected in real time. After incorporation, the dye is cleaved off and diffuses away, so the process continues without interruption. Since the SMRTbell template is circular, the polymerase sequences the same molecule multiple times; this produces multiple reads of the same DNA strand, which can be combined to generate a high-accuracy consensus sequence (HiFi reads).
Very expensive
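
Why repeated passes over the same circular molecule raise accuracy can be illustrated with a majority vote per position. This is only a sketch of the idea, not the vendor's consensus algorithm: the passes are assumed to be already aligned and of equal length, and the sequences are invented.

```python
from collections import Counter

def consensus(passes):
    """Majority vote per position over several aligned reads of one molecule."""
    if len({len(p) for p in passes}) != 1:
        raise ValueError("passes must be non-empty and of equal length")
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*passes)
    )

# Three toy passes over the same molecule, each with at most one random error.
passes = ["ACGTAC", "ACGTTC", "ACCTAC"]
print(consensus(passes))  # ACGTAC
```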

135
Q

What are the opportunities of Long read sequencing?

A

* Sequencing 1 complete molecule: no mapping artefacts
* Insertions, deletions detected in 1 read
* Distinction with pseudogenes
* Translocations with higher confidence
* Phasing of variants
* Splice variants intact on transcript level
* Novel transcript identification

136
Q

What are the advantages and disadvantages of long read sequencing?

A

-advantage: high throughput, can detect structural variants with high confidence, good coverage and mapping; de novo assembly
-disadvantage: cost, accuracy not always the best, more complex library preparation

137
Q

What are the techniques used for detecting large chromosomal aberrations?

A

-karyotyping
-FISH
-Bionano optical genome mapping

138
Q

Explain how karyotyping is done

A

Large chromosomal aberrations are difficult to map by next generation sequencing
Arrest the cells in metaphase → stain → karyotype
Samples can be taken from blood (most common – white blood cells are used), Bone marrow; amniotic fluid (amniocentesis); Chorionic villus sampling (CVS) for prenatal testing
Cells are cultured and stimulated to divide. They are treated with colchicine to stop them in metaphase, when chromosomes are most condensed and visible.
Chromosomes are stained (e.g., with Giemsa stain) to produce a unique banding pattern (G-banding).
The chromosomes are imaged under a microscope and arranged in order of size, from chromosome 1 (largest) to chromosome 22, followed by the sex chromosomes X/Y.
Limitations of karyotyping: cannot detect small mutations (point mutations or single-gene disorders like cystic fibrosis); Low resolution compared to FISH (fluorescence in situ hybridization) or microarrays; Requires living, dividing cells, which are sometimes difficult to obtain

139
Q

Explain bionano optical genome mapping

A

Bionano optical genome mapping - high resolution karyotyping; you can directly image the genome in high resolution to detect variations such as deletions, duplications, inversions and translocations;
High molecular weight (HMW) DNA extraction is critical. The goal is to obtain very long DNA fragments (>150 kbp, up to megabase length)
Common sample sources can be: Blood, Bone marrow, Cultured cells, Tumor tissue
DNA is labeled at specific sequence motifs using fluorescent dyes. This process creates a unique barcode-like pattern along the DNA molecule. Unlike sequencing, this does not cut or fragment the DNA.
The labeled DNA is loaded into a Bionano Saphyr Chip, which has nanochannel arrays that align single linearized DNA molecules.
The Saphyr System uses high-resolution fluorescence microscopy to image the labeled molecules in real-time; by capturing images in repeated cycles across hundreds of thousands of nanochannels on the chip, all the images necessary to assemble the map of the entire genome are captured
The system captures millions of ultra-long DNA molecules, enabling the detection of structural variations >500 bp to megabase scale.
Algorithms convert the images into molecules; Bionano algorithms then construct consensus genome maps and align the observed barcode patterns to a reference genome

140
Q

Explain how FISH is done

A

FISH uses fluorescently labeled DNA probes to bind to complementary sequences in the genome, allowing visualization under a fluorescence microscope.
Chromosome or tissue samples (e.g., blood, biopsy tissue, or cultured cells) are prepared by fixation and permeabilization, which makes the DNA accessible for hybridization
DNA probes are short sequences of DNA that are complementary to the target sequence of interest. These probes are labeled with fluorescent dyes
Probes can be designed to target specific genes, regions, or repetitive sequences on the chromosomes.
The labeled probes are added to the sample, where they hybridize (bind) to their complementary DNA sequence on the chromosomes.
The sample is then examined under a fluorescence microscope. The fluorescent signals emitted by the probes indicate the location of the target DNA sequences.
The number, location, and intensity of the fluorescent signals are analyzed to determine if there are any structural changes, gene amplifications, or abnormalities (such as deletions, duplications, or translocations).
Advantages of FISH: High Resolution: Detects small-scale chromosomal abnormalities that other methods like karyotyping may miss; Specificity: Can target very specific regions or genes of interest; Visualization: Allows direct visualization of DNA sequences in the context of chromosomes, making it easy to spot abnormalities; Flexibility: Can be applied to both metaphase chromosomes and interphase cells
Limitations of FISH: Requires Prior Knowledge: You need to know the specific DNA sequence or region you’re interested in; Resolution Limitations: While powerful for structural changes, FISH does not detect point mutations or small deletions that are too small to be visualized; Time-Consuming: The process of preparing the samples, hybridizing probes, and analyzing the results can take several days.

141
Q

OGM advantages and limitations

A

Advantages:
* Very long “reads”
* Enables identification of structural variation across a whole genome
* Complementary to short read and long read sequencing
Limitations:
* No base resolution
* Lengthy procedure
* Costly

142
Q

What can be the reason for lower molecular diagnostic yield?

A

Lower molecular diagnostic yield can be due to the fact that the disease might not be caused by a genetic defect, or that the causal gene is not yet known. Another reason is the difficulty of sequencing certain parts of the genome such as the dark regions, repeats, introns and regulatory regions. Lastly, there are interpretation challenges.
The causal gene might be detected but it might not be recognized as pathogenic by the prediction models or segregation analysis
Segregation analysis is a genetic method used to determine how a specific trait or genetic variant is inherited within families. It helps researchers and clinicians understand whether a disease or trait follows a Mendelian inheritance pattern (dominant, recessive, etc.) or is influenced by multiple genes and environmental factors.

143
Q

What is the difference in detecting dominant compared to recessive variants?

A

The diagnostic yield differs between dominant and recessive disorders and is higher for recessive ones, because two variant copies are needed to cause a recessive disorder; Recessive mutations often involve clear loss-of-function effects; Homozygous and compound heterozygous variants are easier to detect; Dominant disorders show variable expressivity and incomplete penetrance, complicating interpretation; Many dominant disorders result from de novo mutations, requiring parental sequencing for confirmation; Recessive diseases have well-characterized genes with known pathogenic variants; Thus, WES is more effective in diagnosing recessive disorders than dominant ones.

144
Q

What are the steps of variant filtering?

A
  1. filtering based on quality
  2. exclude variants that are benign or likely benign
  3. functional prediction
  4. focusing on relevant genes
  5. validate candidate variants with Sanger sequencing
145
Q

Explain the whole process of variant filtering

A

Filtering of variants
20 000 - 50 000 variants identified in coding regions; however many of these variants are common polymorphisms (harmless variations in the population) or sequencing artifacts that need to be filtered out; remove variants already classified as (likely) benign in the diagnostics databases
Filter variants on quality criteria: Coverage of coding regions (ensuring all exons are well-sequenced) and depth of coverage (number of reads; how many times a variant is read); this reduces the number of variants 10 times
Next exclude variants that are likely benign:
Exclude variants that are detected frequently in unaffected population datasets; Common variants that frequently appear in healthy individuals are excluded, as they are unlikely to cause disease
exclude variants that do not alter the amino acid sequence of the gene = silent mutations that do not impact protein function
Sequence parental DNA (trio analysis) to exclude dominantly inherited variants from an unaffected parent
Functional prediction: Prioritizing variants that are predicted to have a significant effect on protein translation: missense, nonsense, frameshifts
Focusing on variants on splice site (disrupt normal splicing of RNA, potentially damaging the protein), frameshift and truncating variants (can create non-functional or harmful proteins)
As well as variants affecting evolutionarily conserved amino acids (if a residue is conserved across species, it’s likely functionally important).
Significant changes in amino acid chemistry (e.g., replacing a nonpolar amino acid with a charged one can drastically change protein structure and function).
Computational tools (e.g., PolyPhen, SIFT, MutationTaster) predict which variants are most likely to be pathogenic
Relevant genes: Further analysis of variants in genes that are relevant to the patient’s phenotype
Analyzing variants that have been previously reported in clinical databases (ClinVar, OMIM, HGMD) or medical literature
Variants supported by clinical evidence of pathogenicity (e.g., enzyme deficiency or other measurable biological effects).
Matches known inheritance patterns (autosomal recessive, dominant, X-linked) and segregates with disease in the family.
Candidate variants are validated using Sanger sequencing, a gold-standard method for confirming mutations.
This process ensures that the detected variant is real (not a sequencing artifact) and segregates correctly within the family (if necessary)

146
Q

What is HPO?

A

A large database that aims to provide standardized abbreviation and vocabulary for phenotype abnormalities seen in human diseases
Each term in the HPO describes one phenotypic abnormality
It helps researchers and clinicians link genetic variants to clinical symptoms in genetic disorders
example: If a patient has intellectual disability (HP:0001249) and seizures (HP:0001250), HPO helps find genes related to these phenotypes.
It is developed using the medical literature and resources such as OMIM, DECIPHER, Orphanet, ClinVar

147
Q

Example of variant filtering

A

Starting with a certain number of variants detected using sequencing:
- filtering on allele frequency (AF), with different cut-offs for dominant and recessive variants = for dominant it is <0.5% and for recessive <2%
- filtering based on HPO = choosing variants that match the phenotype
- filtering on inheritance pattern - choosing variants that match the pattern of inheritance
- filtering on the effect that the variant has at the protein level
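
The four filtering steps above can be sketched in Python. The AF cut-offs (0.5% and 2%) are taken from this card; everything else is an assumption for illustration: the variant records, gene names, field names and the filter_variants helper are hypothetical, and the inheritance check is deliberately simplified (heterozygous for dominant, homozygous for recessive, ignoring compound heterozygotes).

```python
# AF cut-offs from the card: dominant < 0.5 %, recessive < 2 %.
AF_CUTOFF = {"dominant": 0.005, "recessive": 0.02}
# Simplified inheritance check (assumption): expected zygosity per model.
EXPECTED_ZYGOSITY = {"dominant": "het", "recessive": "hom"}
# Consequences treated as affecting the protein (assumption).
PROTEIN_ALTERING = {"missense", "nonsense", "frameshift", "splice_site"}


def filter_variants(variants, model, phenotype_genes):
    """Apply the AF, phenotype (HPO), inheritance and protein-effect filters."""
    kept = []
    for v in variants:
        if v["af"] >= AF_CUTOFF[model]:
            continue  # too common in the population
        if v["gene"] not in phenotype_genes:
            continue  # gene does not match the patient's phenotype
        if v["zygosity"] != EXPECTED_ZYGOSITY[model]:
            continue  # does not fit the inheritance pattern
        if v["consequence"] not in PROTEIN_ALTERING:
            continue  # no predicted effect at the protein level
        kept.append(v["id"])
    return kept


# Hypothetical variants, invented for this example.
variants = [
    {"id": "v1", "gene": "GENE_A", "af": 0.001, "zygosity": "het", "consequence": "missense"},
    {"id": "v2", "gene": "GENE_A", "af": 0.01, "zygosity": "het", "consequence": "nonsense"},
    {"id": "v3", "gene": "GENE_X", "af": 0.0001, "zygosity": "het", "consequence": "frameshift"},
    {"id": "v4", "gene": "GENE_B", "af": 0.01, "zygosity": "hom", "consequence": "nonsense"},
    {"id": "v5", "gene": "GENE_B", "af": 0.001, "zygosity": "het", "consequence": "synonymous"},
]
phenotype_genes = {"GENE_A", "GENE_B"}

dominant_hits = filter_variants(variants, "dominant", phenotype_genes)
recessive_hits = filter_variants(variants, "recessive", phenotype_genes)
print(dominant_hits, recessive_hits)
```

Each filter removes variants independently, so the order of the steps changes the speed of the filtering but not the final list.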

148
Q

How can we decide whether a gene variant is pathogenic?

A

Type of mutation: missense, nonsense, splicing and so on
Effect at the protein level: is the variant in a known domain of the protein or protein family; is the position conserved among species
Is the variant at a position that tolerates change, or is it in an important part of the protein
Population frequency: look up the variant and check its population frequency; if it is not too high, the variant might be pathogenic
Prediction software: programmes such as PolyPhen and SIFT predict whether the variant is detrimental to the protein. When one amino acid is replaced by another, the programme assesses how different the new amino acid is and how it fits in the protein (is it damaging to the protein structure). It also shows how conserved the position is: a different amino acid at a very conserved position is most often not tolerated.
Literature and databases: looking for more families with the same variant and the same clinical phenotype
Segregation analysis

149
Q

How is the prediction of splice site done?

A

Mutations affecting splice sites can be classified as: canonical splice site mutations (affect the highly conserved GT donor and AG acceptor dinucleotides at exon-intron junctions); cryptic splice site activation (creating a new incorrect splice site within an exon or intron); exon skipping; intron retention
Bioinformatics tools use computational algorithms to predict the impact of mutations on splicing
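
A toy Python sketch of the simplest case above, a substitution that hits the canonical GT donor or AG acceptor dinucleotide. It is not a real splicing predictor: it does not detect cryptic splice sites, exon skipping or intron retention. The intron sequence and the function names are invented for illustration.

```python
def canonical_sites_intact(intron):
    """Return (donor_ok, acceptor_ok) for a DNA intron sequence (5' to 3')."""
    intron = intron.upper()
    return intron[:2] == "GT", intron[-2:] == "AG"


def classify_substitution(intron, pos, alt):
    """Apply a single-base substitution (pos is 0-based within the intron)
    and report whether a canonical dinucleotide is disrupted."""
    mutated = intron[:pos] + alt + intron[pos + 1:]
    donor_ok, acceptor_ok = canonical_sites_intact(mutated)
    if not donor_ok:
        return "canonical donor site disrupted"
    if not acceptor_ok:
        return "canonical acceptor site disrupted"
    return "canonical sites intact"


# Made-up intron that starts with GT and ends with AG.
intron = "GTAAGTCCTTTTCAG"

print(classify_substitution(intron, 0, "A"))                # first base of GT
print(classify_substitution(intron, len(intron) - 1, "C"))  # last base of AG
print(classify_substitution(intron, 5, "A"))                # deeper in the intron
```

A result of "canonical sites intact" does not mean the variant is harmless for splicing, because a deep intronic or exonic change can still create a cryptic splice site.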

150
Q

What do we do when a VUS (variant of uncertain significance) is detected?

A

Segregation analysis: a highly suspected VUS is followed up by checking whether it segregates with the disease in the family
Example (when looking at the pedigrees): if the C allele always segregates with the disease, the statistical power can be enough to conclude that C is causing the disease
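
A rough Python sketch of why more family members give more statistical power. It uses a simplified model that is not stated on the card: full penetrance, no phenocopies, and every counted meiosis fully informative. Under those assumptions the chance that an unlinked variant co-segregates with the disease in n meioses is (1/2)^n, and the LOD score at zero recombination is n × log10(2). The function names are hypothetical.

```python
import math


def chance_cosegregation(n_meioses):
    """Probability that an unlinked variant tracks with the disease by chance
    in n informative meioses (assumes full penetrance, no phenocopies)."""
    return 0.5 ** n_meioses


def lod_score(n_meioses):
    """LOD score at recombination fraction 0 with no recombinants observed."""
    return n_meioses * math.log10(2)


for n in (3, 5, 10):
    print(n, chance_cosegregation(n), round(lod_score(n), 2))
```

With only a few informative meioses, co-segregation by chance is still quite likely, so a single small family rarely gives enough evidence on its own.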

151
Q

Explain how the interpretation of variants can be hampered by clinical phenomena

A

Penetrance of the variant = the proportion of individuals carrying a particular genetic variant (mutation) who actually express the associated phenotype (trait or disease). Reduced penetrance means that not all individuals who inherit a pathogenic (disease-causing) genetic variant will develop the disease or show symptoms.
Why does reduced penetrance occur: other genes may influence the effect of the variant, either suppressing or enhancing it; lifestyle choices such as diet, exercise, smoking, and other habits may impact disease risk; exposure to toxins or infections (some diseases require an environmental trigger in addition to the genetic variant); epigenetic modifications; some conditions have late-onset penetrance, meaning symptoms appear later in life (e.g., Huntington’s disease).
Pleiotropy: one gene mutation leads to more than one phenotype; a single genetic variant can have effects on different organs, systems, or biological processes