Genetics Flashcards
CADD score
Combined annotation-dependent depletion score
- predicts deleteriousness/pathogenicity (disease-causing potential) of single-nucleotide variants (SNVs) and indels
2 scores for predicting LoF constraint and their cut-offs
- pLI (probability of being LoF intolerant): ≥ 0.9
- LOEUF (LoF observed/expected upper bound fraction): <0.35
How to judge missense constraint on gnomAD
Using missense constraint Z-score >3.1
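The cut-offs on the constraint cards above can be applied in a few lines. Minimal sketch only: the function name and dictionary keys are made up for illustration, and the thresholds are simply the ones quoted on these cards (other gnomAD releases may recommend different values).

```python
def constraint_flags(pli=None, loeuf=None, missense_z=None):
    """Apply the flashcard cut-offs; a metric left as None is reported as None."""
    return {
        # pLI >= 0.9 -> LoF intolerant
        "lof_constrained_pli": None if pli is None else pli >= 0.9,
        # LOEUF < 0.35 -> LoF constrained
        "lof_constrained_loeuf": None if loeuf is None else loeuf < 0.35,
        # missense Z > 3.1 -> missense constrained
        "missense_constrained": None if missense_z is None else missense_z > 3.1,
    }


# Hypothetical gene: strongly LoF constrained, not missense constrained
flags = constraint_flags(pli=0.99, loeuf=0.12, missense_z=1.4)
```

pLI and LOEUF can disagree for a given gene, so the two LoF flags are kept separate rather than merged into one.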
Concept of “constraint” in genetics
Constraint describes how tolerant a gene is to genetic variation (different variants), ie a gene with high constraint is intolerant to variation.
E.g. LoF constraints (measured by pLI and LOEUF) and missense constraint (z-score)
Concept of “depletion” and “enrichment” in genetics
Depletion/depleted: genetic variant observed as less common or less frequent than the expected value
Enriched: variant more common/over-represented in specific population than expected
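Depletion and enrichment both come down to comparing an observed count with an expected count. Minimal sketch: the function names are illustrative, and a real analysis would also need a confidence interval around the ratio (which is what LOEUF reports).

```python
def observed_expected_ratio(observed, expected):
    """Observed/expected (o/e) ratio for a variant class."""
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return observed / expected


def classify_oe(oe):
    """o/e < 1 = depleted, o/e > 1 = enriched."""
    if oe < 1:
        return "depleted"
    if oe > 1:
        return "enriched"
    return "as expected"


# Hypothetical counts: 5 LoF variants observed where 20 were expected
oe = observed_expected_ratio(5, 20)
label = classify_oe(oe)
```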
What is the pext score
Proportion expressed across transcripts score: per base expression pattern across transcripts and exons as well as in tissue of interest
When is the pext score useful in gnomAD?
Indicates the biological relevance of a variant. Useful when the variant is LoF with strong evidence for being disease-causing. A low pext score (<0.2) suggests the variant is not biologically relevant (as the affected region is not expressed across the transcripts or in the tissues of interest).
What is a Mendelian disease and some examples
Aka monogenic disorders. Caused by mutations in single gene.
Eg. cystic fibrosis, Huntington’s, sickle cell, Duchenne’s, Tay-Sachs, PKU, Marfan, ADPKD
Define Expressivity and Penetrance
Expressivity: Severity of the phenotype that develops in patient with the pathogenic variant
Penetrance: the proportion of individuals carrying the pathogenic variant who display a phenotype
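Penetrance as defined above is a simple proportion. Minimal sketch with a made-up function name and hypothetical counts:

```python
def penetrance(carriers_with_phenotype, carriers_total):
    """Proportion of pathogenic-variant carriers who display the phenotype."""
    if carriers_total <= 0:
        raise ValueError("need at least one carrier")
    if not 0 <= carriers_with_phenotype <= carriers_total:
        raise ValueError("affected carriers must be between 0 and total carriers")
    return carriers_with_phenotype / carriers_total


# Hypothetical cohort: 30 of 40 carriers show the phenotype
p = penetrance(30, 40)
```

Expressivity is not captured here: it describes how severe the phenotype is in the carriers who do show it.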
What is the “seed sequence” in relation to CRISPR
10-12 bps adjacent to the PAM (3’ end of the gRNA) that determines Cas9 specificity
- 1-5 bps = true seed region (from immunoprecipitation and ChIP-seq data - Zhang 2015)
What are the causes of LoF pathogenic variants appearing in gnomAD?
- Transcript error
- Sequencing error
- Mapping error
- Last exon
- Other annotation error
- Rescue
What is a homopolymer
Homopolymer refers to a stretch of DNA or RNA sequence where only one type of nucleotide is repeated consecutively, eg AAAAAAAAA
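Homopolymer runs can be found with a back-referencing regular expression. Minimal sketch: the function name and the `min_len` default of 4 are arbitrary choices for illustration, not a standard definition.

```python
import re


def find_homopolymers(seq, min_len=4):
    """Return (start, base, length) for every run of one nucleotide >= min_len."""
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    # ([ACGTU]) captures one base; \1{n,} requires n or more further copies of it
    pattern = re.compile(r"([ACGTU])\1{%d,}" % (min_len - 1))
    return [
        (m.start(), m.group(1), len(m.group(0)))
        for m in pattern.finditer(seq.upper())
    ]


# Hypothetical sequence with a 5-T run and a 9-A run
example = "GA" + "T" * 5 + "C" + "A" * 9 + "G"
runs = find_homopolymers(example)
```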
What is a “rescue splice variant”
A type of rescue mechanism in which alternative splicing of mRNA mitigates effect of LOF/pathogenic mutation, which preserves function of protein
Definition of nonsense-mediated decay
Surveillance pathway that reduces errors in gene expression by eliminating mRNA transcripts that contain premature stop codons
Types of mRNA surveillance pathways
- Nonsense mediated decay (NMD)
- Non-stop decay (NSD)
- No-go decay (NGD)
How does the location of the termination codon (from truncating mutations) in the last exon affect NMD?
What matters is the location of the stop codon relative to the last exon-exon junction (marked by the exon junction complex, EJC). If the stop codon is downstream of the final exon-exon junction, or within 50 nucleotides upstream of it, the transcript is translated normally. If it is >50 nucleotides upstream, NMD occurs.
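The 50-nucleotide rule on this card can be written as a small predicate. Minimal sketch: the function name and the coordinate convention (0-based mRNA positions, with the last junction given as the index of the first nucleotide of the last exon) are assumptions made for illustration. Real NMD prediction has exceptions this ignores.

```python
def predicted_nmd(stop_pos, last_junction_pos, threshold=50):
    """Apply the 50-nt rule to a premature stop codon.

    stop_pos          -- mRNA position of the stop codon (assumed convention)
    last_junction_pos -- mRNA position of the first base of the last exon
    """
    if stop_pos >= last_junction_pos:
        # Stop codon lies in the last exon (downstream of the final junction)
        return False
    distance_upstream = last_junction_pos - stop_pos
    # NMD only if the stop codon is MORE than `threshold` nt upstream
    return distance_upstream > threshold


# Hypothetical transcript whose last exon starts at position 1000
far_upstream = predicted_nmd(900, 1000)
near_junction = predicted_nmd(970, 1000)
in_last_exon = predicted_nmd(1010, 1000)
```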
Why are truncating mutations in the last exon generally not pathogenic
- Not subject to NMD
- mutations in 3’ UTR region
- Protein truncation tolerance - critical domains not affected
- haploinsufficiency tolerance - better tolerated if functional copy still present
Define linkage disequilibrium
Tendency of alleles at different loci to be inherited together more or less often than expected by chance alone - usually caused by close proximity of the loci on the same chromosome
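LD between two biallelic loci is usually summarised as D, D' and r². Minimal sketch using the standard textbook formulas: the function name and the input frequencies are hypothetical, and it assumes phased haplotype frequencies are already known.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, D', r^2) for alleles A and B at two loci.

    p_ab -- frequency of the haplotype carrying both A and B
    p_a  -- frequency of allele A at locus 1
    p_b  -- frequency of allele B at locus 2
    """
    # D: observed haplotype frequency minus the frequency expected by chance
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# A and B always travel together -> complete LD
complete = ld_stats(p_ab=0.5, p_a=0.5, p_b=0.5)
# Haplotype frequency equals the product of allele frequencies -> no LD
independent = ld_stats(p_ab=0.25, p_a=0.5, p_b=0.5)
```

r² is the measure that matters when choosing tag SNPs, because it reflects how well one SNP predicts the other.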
Define epistasis
Phenomenon in genetics whereby the effect of a gene mutation is dependent on the presence or absence of mutations in one or more other genes, termed modifier genes
Define heritability
The measure of proportion of the phenotypic variance of a population that can be attributed to genetic differences
What is the “missing heritability” question
- clear conclusion from multiple GWAS that highly significant hits account for only a small proportion of the heritability of disease
- amount of heritability explained by GWAS findings is much smaller than the heritability estimated from family and twin studies
Narrow sense heritability (h2)
Narrow-sense heritability (h2) is an important genetic parameter that quantifies the proportion of phenotypic variance in a trait attributable to the additive genetic variation generated by ALL causal variants
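Narrow-sense and broad-sense heritability can be contrasted as ratios of variance components. Minimal sketch: it assumes the textbook decomposition V_P = V_A + V_D + V_I + V_E (additive, dominance, epistatic, environmental) with no gene-environment covariance or interaction. Function names and numbers are hypothetical.

```python
def narrow_sense_h2(v_additive, v_dominance, v_epistatic, v_environment):
    """h2 = additive genetic variance / total phenotypic variance."""
    v_phenotypic = v_additive + v_dominance + v_epistatic + v_environment
    if v_phenotypic <= 0:
        raise ValueError("total phenotypic variance must be positive")
    return v_additive / v_phenotypic


def broad_sense_h2(v_additive, v_dominance, v_epistatic, v_environment):
    """H2 = all genetic variance / total phenotypic variance."""
    v_genetic = v_additive + v_dominance + v_epistatic
    v_phenotypic = v_genetic + v_environment
    if v_phenotypic <= 0:
        raise ValueError("total phenotypic variance must be positive")
    return v_genetic / v_phenotypic


# Hypothetical variance components
h2 = narrow_sense_h2(4.0, 1.0, 1.0, 4.0)
big_h2 = broad_sense_h2(4.0, 1.0, 1.0, 4.0)
```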
What are the explanations for the missing heritability problem?
- large number of common variants of small effects not yet discovered
- rare variants with large effect sizes not tagged on genotyping arrays
- overestimation of h2 (narrow sense heritability) in siblings/families due to environmental factors or epigenetics
Define tag SNP
A representative SNP in a genomic region with high LD that represents a group of SNPs inherited together (a haplotype)
Define fine-mapping
Process of determining the genetic variant(s), ie causal variant(s), responsible for complex traits, given evidence of association of genomic region with a trait and assuming at least one causal variant exists
Types of chromatin annotations
- Open chromatin regions (indicate regions available for TF binding)
- histone modifications (highlight enhancer and promoter regions)
- DNA methylation
What are DNase I Hypersensitive Sites (DHS)?
- DHS are regions of DNA that are particularly accessible to cleavage by DNase I, characterised by lack of nucleosomes.
- Indicate regions of high regulatory activity/regulation of gene expression. Correspond to regulatory elements eg promoters, enhancers, silencers etc.
- Used in SNP enrichment analysis (a type of chromatin mark)
What is ATAC-Seq
Assay for Transposase-Accessible Chromatin with sequencing.
Technique used to investigate chromatin accessibility at genome-wide scale. Used to identify areas of open chromatin - ie areas of high regulatory activity (promoters, enhancers and TF binding sites)
Steps of ATAC-Seq
- Transposase tagmentation: uses hyperactive Tn5 transposase enzyme that simultaneously fragments DNA and adds adapters to ends of DNA
- Selective fragmentation: Tn5 transposase selectively inserts adapters into regions of open chromatin
- Library preparation: PCR amplification of tagged sequences
- Sequencing
What is pleiotropy
A phenomenon in which one gene influences two or more seemingly unrelated phenotypic traits, ie a gene with multiple phenotypic effects
What is allelic heterogeneity
Phenomenon in which different variants (alleles) in the same gene cause the same or similar phenotype
Difference between epistasis and allelic heterogeneity (AH)
Epistasis: the effect of one gene variant depends on (or modifies the effect of) another gene variant at a DIFFERENT locus
AH: different mutations within SAME locus of SAME gene influence the particular trait
What are phenocopy conditions?
Variation in phenotype that is caused by environmental conditions and not by genotype (mimics a genetically determined phenotype)
Which domains in MYH7 are mostly affected in HCM
Globular head and hinge regions
What is the Non-stop Decay pathway and its mechanisms
NSD - targets mRNA (and peptide) for degradation if lacking a proper stop codon (ie the translation keeps going after where the stop codon should’ve been).
1. Recognition of non-stop mRNAs - ribosome stalls and signals NSD machinery
2. Ski complex (Ski2, Ski3 and Ski8) recruited to stalled ribosome, interacts with exosome (3’ to 5’ exonuclease activity)
3. Ribosome disassembled (by Pelota-Hbs1) - recycles ribosome
4. Faulty mRNA degraded by exosome assisted by Ski complex
5. Proteasomal degradation of the nascent peptide after ubiquitination
Causes of non-stop mRNAs
Errors in transcription, splicing or premature polyadenylation
Causes of ribosome stalling
- Defective mRNA: non-stop mRNA, damaged mRNA
- Secondary structures within mRNA (eg hairpins)
- Rare codons (due to low availability of corresponding tRNAs)
- Amino acid deprivation - leading to shortage of charged tRNA
- Aberrant translation events - misincorporation of amino acids or other errors during translation
- Protein quality control mechanisms - interaction with faulty nascent polypeptides that do not fold properly
Slipped Strand Mispairing
SSM, or Replication Slippage
Mutation process that occurs during DNA replication. It involves denaturation and displacement of the DNA strands, resulting in mispairing of the complementary bases. Slipped strand mispairing is one explanation for the origin and evolution of repetitive DNA sequences
Leads to expansion or contraction of dinucleotide or trinucleotide repeats
Occurs at sites of tandem repeats
Key mechanisms of ASO action
1. RNA degradation via RNase H1 - binding of ASO to mRNA = DNA-RNA duplex -> recruits RNase H1 -> cleaves RNA strand = degradation (gapmers)
2. RNA degradation using RNAi - using siRNAs -> recruited into RISC (RNA-induced silencing complex) -> cleaves target mRNA
3. Steric blocking - binding of ASO to mRNA physically blocks access to splicing factors or ribosomes = blocks splicing and/or translation
4. Modulation of splicing (eg exon skipping or exon inclusion): In exon skipping, ASOs are designed to target splice sites in pre-mRNA - inhibits binding of spliceosome -> skips over exon (carrying a frameshift mutation, eg in Duchenne’s) and removes that segment from mRNA; in exon inclusion, ASO sterically blocks an intronic splicing silencer (eg in SMA)
Monocistronic vs polycistronic mRNA
Monocistronic = mRNA encodes only a single protein chain
Polycistronic = multiple ORFs that translate into multiple peptides
Examples of gene expression roles of UTRs (untranslated regions) of mRNAs
mRNA stability, mRNA localisation and translational efficiency
What are nuclear speckles
Regions in the nucleus associated with pre-mRNA splicing and transcriptional regulation
Aka interchromatin granule clusters
What is MALAT-1
MALAT-1 = metastasis-associated lung adenocarcinoma transcript 1
- long non-coding RNA (lncRNA) widely expressed in many tissues with roles in gene expression and splicing etc
- localised in nuclear speckles
- over-expressed in many cancers (eg lung, breast, liver) - promotes proliferation
Key differences between gapmers and mixmers
2 different types of ASOs
Gapmers = central DNA gap flanked by modified nucleotides vs Mixmers = modified nucleotides interspersed with DNA nucleotides along the whole length
Gapmer = RNase-H mediated RNA degradation vs Mixmer = steric blocking
Gapmer = long-lasting vs mixmer = transient
What is an operon
Functioning unit of DNA containing a cluster of genes under the control of a single promoter.
Genes transcribed together into one mRNA and either translated together or spliced into monocistronic mRNAs
Common in prokaryotes and rare in eukaryotes
Structure of an operon
- Promoter - nucleotide sequence that enables a gene to be transcribed
- Operator - DNA segment to which a repressor binds, defined in the lac operon as between the promoter and structural genes
- Structural genes
Others:
- repressor protein (coded by regulatory gene)
- inducer - displaces repressor
Explain the lac operon mechanism
Encoded in E. coli
Mechanism by which the bacterium switches on transcription of the enzymes that process lactose when lactose is present and glucose is low. Not always fully active, as that would waste energy. There is always background expression (eg of lacY - β-galactoside permease) to enable detection of lactose in the cell
Lac operon: 3 structural genes, promoter, terminator, regulator and operator
lacZ = β-galactosidase (cleaves lactose into glucose and galactose)
lacY = β-galactoside permease - transmembrane symporter that pumps β-galactosides into cell
lacA = β-galactoside transacetylase - transfers acetyl group from acetyl-CoA to thiogalactoside
In absence of lactose, repressor is bound to the operator, repressing transcription (by blocking DNA dependent RNA polymerase), albeit imperfectly = background expression
When lactose is present (and glucose is absent), allolactose (a lactose isomer) binds to the repressor and inactivates it = transcription
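The lac operon behaves like a small piece of boolean logic on two inputs. Minimal sketch: the function name and the three output labels are invented for illustration. The "low" state for lactose plus glucose reflects catabolite repression, which this card only hints at ("when glucose is low"), so treat that branch as an added assumption.

```python
def lac_operon_expression(lactose_present, glucose_present):
    """Qualitative lac operon output for the two nutrient inputs."""
    if not lactose_present:
        # Repressor stays bound to the operator; only leaky background expression
        return "basal"
    if glucose_present:
        # Repressor released, but glucose is the preferred carbon source
        # (assumption beyond the card: catabolite repression keeps output low)
        return "low"
    # Lactose present, glucose absent: repressor inactivated, full transcription
    return "high"


states = {
    (lactose, glucose): lac_operon_expression(lactose, glucose)
    for lactose in (False, True)
    for glucose in (False, True)
}
```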
Types of RNA secondary structures
Helices
Hairpin loop
Bulge loop
Interior loop
Junction/Intersection
What is the cre-lox recombination
Gene editing (deletion/inversion and translocation) technique that allows for spatiotemporal control.
Derived from P1 bacteriophage.
Cre Recombinase = 38kDa enzyme that recognises loxP sites - catalyses recombination between them:
- if loxP sites in same direction = deletion; in opposite directions = inversion; on different chromosomes (interchromosomal recombination) = translocation
Used in lineage tracing as reporter gene (GFP) can be activated by Cre to track cell populations (using cell-specific promoters)
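The orientation rules for loxP sites reduce to a two-question decision. Minimal sketch with a made-up function name and arguments: it only encodes the rules stated on this card, not recombination efficiency or site variants.

```python
def cre_lox_outcome(same_chromosome, same_orientation):
    """Predict the rearrangement produced by Cre acting on two loxP sites."""
    if not same_chromosome:
        # loxP sites on different chromosomes -> interchromosomal recombination
        return "translocation"
    if same_orientation:
        # Flanked ("floxed") segment is excised
        return "deletion"
    # Sites face each other -> the flanked segment is flipped
    return "inversion"


floxed_allele = cre_lox_outcome(same_chromosome=True, same_orientation=True)
```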
What is the Bayesian Pairwise Analysis
Statistical approach used to model pairwise relationships between 2 variables, incorporating PRIOR KNOWLEDGE or ASSUMPTIONS about the data distribution. Involves updating PRIOR BELIEFS with new evidence to form a POSTERIOR DISTRIBUTION
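The prior-to-posterior update can be shown with the simplest conjugate case. Minimal sketch only: a Beta prior on a proportion updated with binomial counts. This illustrates Bayesian updating in general, not any specific pairwise-analysis method. Function names and numbers are hypothetical.

```python
def beta_binomial_update(prior_alpha, prior_beta, successes, failures):
    """Posterior Beta parameters after observing binomial data."""
    # Conjugacy: successes add to alpha, failures add to beta
    return prior_alpha + successes, prior_beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Flat prior Beta(1, 1), then 7 successes and 3 failures are observed
post_alpha, post_beta = beta_binomial_update(1, 1, 7, 3)
prior_mean = beta_mean(1, 1)
posterior_mean = beta_mean(post_alpha, post_beta)
```

The posterior mean sits between the prior mean and the observed proportion, and moves closer to the data as more observations arrive.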
Concept of Locus (Loci), Alleles, LD, Tag SNP and Haplotype NB!!!
A LOCUS is a specific location on the chromosome. Depending on context, it refers to either a specific base-pair position for Single-Base Pair Locus (eg rs-113488022 at chr14:23822594), or Multi-Base Pair Locus (referring to an entire gene - gene locus, structural variants - SV locus, QTLs - eg eQTL locus, or GWAS-locus). The ALLELE is the version of the sequence found at a specific LOCUS, and also depends on context: for Single-Base Pair Locus, it is the specific nucleotide (A, T, C or G) at that location; for Multi-Base Pair Locus, it refers to the entire SEQUENCE of nucleotides.
LD - Linkage Disequilibrium - refers to the NON-RANDOM association of ALLELES at different LOCI in a population, where certain ALLELES (eg SNPs) are inherited together more often than expected by chance, usually because of physical proximity on a chromosome (CROSSING OVER during meiotic recombination rarely separates loci that lie close together). The TAG-SNP is a representative SNP in a region of the genome with high LD that represents a group of SNPs called a HAPLOTYPE. A Haplotype is therefore a group of SNPs in high LD with each other that tend to be inherited together.
Concepts of fine mapping, SNP enrichment and Colocalisation studies
Fine-mapping: methods aimed to define causal VARIANTS
SNP-enrichment: prioritise disease-relevant CELL TYPES
Colocalisation: nominates likely target GENES