Genomics Flashcards

1
Q

Gene expression

A
  • Exons code for proteins
  • Not all genes are active at the same time → diversity in cells → ensured by cell-type-specific gene expression
  • Gene expressed = the RNA transcribed from the gene is actually produced
2
Q

Measuring RNA production: qPCR

A
  • RT-PCR (reverse transcription PCR) – oldest method
  • Quantitative PCR (qPCR) measures how much cDNA of a gene of interest is present in a sample
  • RNA cannot be directly amplified by PCR → synthesise cDNA → complementary to RNA → reaction w/reverse transcriptase enzyme
  • cDNA only includes exons because the RNA template is already spliced
  • A gene-specific, fluorophore-labelled probe binds the cDNA
  • When Taq polymerase synthesises the second strand → fluorophore released → qPCR quantifies how much fluorescence is in the reaction → more light and an earlier signal = more RNA = more gene expression
3
Q

qPCR Normalisation

A
  • Results need to be normalised to measure actual change in transcription level → need to compensate for initial variations in mRNA amount and technical differences between samples
  • Housekeeping genes included in qPCR → have stable expression → important for cell components → always expressed above certain threshold
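Normalisation to a housekeeping gene can be sketched with the widely used 2^-ΔΔCt calculation. The cards do not name this method, so treat it as an assumption; the Ct values (cycle at which the signal first appears) are hypothetical, and the sketch assumes the product doubles every cycle.

```python
def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene (treated vs control) by 2^-ddCt."""
    # Normalise each sample to its housekeeping (reference) gene.
    delta_ct_treated = ct_target_treated - ct_ref_treated
    delta_ct_control = ct_target_control - ct_ref_control
    # Compare the normalised values between the two conditions.
    delta_delta_ct = delta_ct_treated - delta_ct_control
    # Assumption: ~100% amplification efficiency (doubling per cycle).
    return 2 ** (-delta_delta_ct)


# Hypothetical Ct values: the target appears 2 cycles earlier in the
# treated sample, while the housekeeping gene is unchanged.
fold_change = relative_expression(22.0, 18.0, 24.0, 18.0)
print(fold_change)  # 4.0
```

An earlier signal (lower Ct) means more starting cDNA, which is why the exponent is negated.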
4
Q

qPCR Limitations

A
  • Quick, relatively accurate, cheap but…
  • Limited in how many genes can be tested at one time, ~5-10 (not feasible for thousands of genes)
5
Q

Microarray

A
  • Revolutionised gene expression analysis
  • Allows for detection/comparison of thousands of genes simultaneously
  • Relies on base-pairing hybridization with probes for each gene to be measured
  • More expensive than PCR, but still relatively cheap
  • Can measure:
    o Differing expression of genes over time, between tissues and different states
    o Co-expression of genes
    o Identification of complex genetic diseases
6
Q

Affy Gene Chip & Array Experiment

A
  • Each gene has 16-20 pairs of probes synthesized on the chip
    1. RNA extraction
    2. Make biotin-labelled cDNA (biotin later binds fluorescently labelled streptavidin, allowing detection)
    3. If gene of interest binds its probe → chip emits light → quantify it
7
Q

Affy Expression Measurements

A
  • A = absent
  • M = marginal
  • P = present (P-value gives confidence)
8
Q

Microarray Limitations

A
  • Data is very noisy
  • Probes not available for all genes; Affy probes only for ~75-80% of human genes
  • Cannot detect genes w/very low expression levels
  • Data requires lots of statistics and analysis
  • Assay does not distinguish expression from different isoforms of the same gene
9
Q

Next generation sequencing

A
  • Genome is fragmented → adaptor sequences added to both ends of each fragment → adaptors always have the same sequences → one adaptor binds to the flow cell, the fragment bends over and the second adaptor binds → bridge-like formation
  • Fragments are amplified on the flow cell by PCR (bridge amplification) → amplified fragments are sequenced → millions of reads ~75-150 base pairs long → mapped onto reference genome
  • “Seq” principle
10
Q

RNA-seq

A
  • Uses next generation sequencing to measure gene expression
  • Assumes every mRNA molecule present has the same chance of being sequenced
  • If a sample shows 2x as many reads for a particular gene as the control, then expression of that gene is 2x greater
  • Gives accurate measure of gene expression, even for genes w/v. low expression levels
  • Can identify exact transcript being expressed
  • Can potentially identify unknown transcripts with novel splice sites
  • Method:
    1. Extract all mRNA → convert to cDNA fragments
    2. Add sequencing adaptors → obtain short sequence using high-throughput sequencing
    3. Resulting sequence reads aligned w/reference genome or transcriptome
    4. Base count profile for each gene is created
  • Same procedure for control and variant
  • Read counts are proportional to gene expression level
11
Q

RNA-Seq Normalisation

A
  • Important to normalise:
    o Sequencing depth = how many reads are sequenced by the machine
    o Gene/transcript length, e.g. when comparing different genes or organisms (e.g. human vs mouse)
    o Amount of fragments in each sample
  • 2 main methods to normalise data:
    o Raw read count normalisation
    o Reads/Fragments Per Kilobase per Million mapped reads (RPKM – single-end reads; FPKM – paired-end reads)

$$\mathrm{RPKM} = \frac{10^{9} \cdot C}{N \cdot L}$$

C = raw count of reads in transcript
N = number of mappable reads in experiment
L = transcript length (bp)
Normalizes the raw count C for gene length (L) and library size (N)
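A minimal sketch of the RPKM formula, using the same symbols C, N and L. The read count, library size and transcript length in the example are hypothetical.

```python
def rpkm(count, total_mapped_reads, length_bp):
    """Reads per kilobase of transcript per million mapped reads.

    count              -- C, raw reads mapped to the transcript
    total_mapped_reads -- N, mappable reads in the experiment
    length_bp          -- L, transcript length in base pairs
    """
    return 1e9 * count / (total_mapped_reads * length_bp)


# Hypothetical example: 500 reads on a 2,000 bp transcript
# in a library of 10 million mapped reads.
value = rpkm(500, 10_000_000, 2_000)
print(value)  # 25.0
```

The factor 10^9 combines the "per kilobase" (10^3) and "per million reads" (10^6) scalings.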

12
Q

Raw Read Count Normalisation –DESeq2

A
  • Aims to make normalized counts for non-differentially expressed genes similar between samples
  • Does not aim to adjust count distributions between samples
  • Assume that:
    o Most genes are not differentially expressed
    o Differentially expressed genes divided equally between up and down
  • Relies on genes that behave like housekeeping genes (stably expressed), inferred from the data rather than from a predefined list
  • Normalisation looks at genes whose expression is assumed uniform across samples → the shift observed for these genes (a per-sample size factor) is applied to all genes
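The idea of one shared shift per sample can be sketched with a median-of-ratios size factor, the approach DESeq2 is known for. This is a simplified illustration in plain Python, not the DESeq2 implementation; the sample names and counts are hypothetical.

```python
import math
from statistics import median


def size_factors(counts):
    """counts: dict of sample name -> list of raw counts (same gene order)."""
    samples = list(counts)
    n_genes = len(counts[samples[0]])
    # Keep only genes with a non-zero count in every sample.
    usable = [g for g in range(n_genes)
              if all(counts[s][g] > 0 for s in samples)]
    # Per-gene log geometric mean across samples (a pseudo-reference sample).
    log_ref = {g: sum(math.log(counts[s][g]) for s in samples) / len(samples)
               for g in usable}
    # Size factor = median ratio of a sample's counts to the reference.
    return {s: math.exp(median(math.log(counts[s][g]) - log_ref[g]
                               for g in usable))
            for s in samples}


# Hypothetical data: sample B was sequenced twice as deeply as A,
# and only the last gene is truly differentially expressed.
raw = {"A": [100, 200, 300, 50], "B": [200, 400, 600, 1000]}
factors = size_factors(raw)
normalised = {s: [c / factors[s] for c in raw[s]] for s in raw}
print(factors)
```

Because the median is used, the one differentially expressed gene does not distort the size factors, which matches the assumption that most genes are not differentially expressed.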
13
Q

RNA-Seq Limitation

A
  • Confounded by heterogeneity of the sample:
    o Different cell types
    o Mutations
    o Different cell cycle stage
    o Epigenetic modifications
    o Stochastic gene expression
14
Q

Single Cell RNA-Seq

A
  • Allows analysis of single cells
    o Enables improvement in resolution of gene expression within samples
    o Enables identification of heterogeneity in cell populations i.e. different cell types
    o Enables gene expression within single cells/cell types to be categorised
  • Tissue → dissociation of cells → isolation of cells → single cell → RNA extraction → cDNA synthesis → single-cell sequencing → expression profile → cell type identification
  • Plots show differing gene expression, with cell types clearly distinguished; even cells that are difficult to separate (e.g. podocytes) are effectively dissociated
  • Results can be illustrated by heat maps or using dimension reduction analysis tools such as PCA or t-SNE
15
Q

Future

A

Profile gene expression in vivo w/o need of isolating cells

16
Q

DNA Methylation

A
  • Reversible
  • Symmetrical (a CpG pairs with a CpG on the opposite strand) so maintained through cell division
  • Addition of methyl group (CH3) to carbon 5 (C5) of cytosine by methyltransferases
  • In mammals, mainly occurs at CpG sites – CpG islands
  • CpG islands used for identification of potential promoter regions
  • Methylation of CpG island = silencing of gene expression
  • Represses gene expression by:
    o Preventing binding of transcription factors
    o Modifies chromatin structure to repress transcription
  • Methylation is major factor in epigenetic modifications
  • De novo methyltransferases in mammals: DNMT3a and DNMT3b
  • During DNA replication, hemi-methylated DNA is created → copied strand is unmethylated → recognised by DNMT1 → methylates new strand to maintain methylation state
  • Methylation of histone → chromatin repressed → cannot be transcribed
17
Q

DNA methylation and disease

A
  • Methylation patterns in disease tissue ≠ from normal tissue → aids in identification of disease-causing genes
    o Especially in cancer and neurodegeneration → disease correlates with loss of methylation
    o E.g Alzheimer’s disease (NEP gene); Colorectal cancer (MGMT gene); breast cancer (PRLR)
  • Abnormal methylation silences tumour suppressor genes
18
Q

Where does methylation occur

A
  • Intergenic regions = usually methylated
    o Maintains genomic integrity
    o Methylated DNA forms compacted chromatin → less accessible for recombination and translocation
    o DNMT1 deficient cells display genomic instability
  • Repetitive elements = usually methylated
    o Transposable elements are highly mutagenic if they can transpose within genome → methylation protects genome from TEs
    o Methylated C mutates to T over evolutionary time → prevent transposition
    o Methylation prevents recombination
  • Gene upstream regions = usually unmethylated
  • Promoter regions = usually unmethylated so create CpG islands
  • Lack of methylation creates relatively higher density of CpG due to lower rate of mutation to T
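The relative CpG density mentioned above can be sketched as an observed/expected CpG ratio. This ratio is a commonly used measure rather than something stated in the cards, and the sequences are hypothetical.

```python
def cpg_obs_exp(seq):
    """Observed/expected CpG ratio of a DNA sequence."""
    seq = seq.upper()
    n_c = seq.count("C")
    n_g = seq.count("G")
    n_cpg = seq.count("CG")  # CpG dinucleotides on this strand
    if n_c == 0 or n_g == 0:
        return 0.0
    # Expected CpG count if C and G were placed independently.
    expected = n_c * n_g / len(seq)
    return n_cpg / expected


print(cpg_obs_exp("CGCGCGCG"))  # 2.0 (CpG-rich, island-like)
print(cpg_obs_exp("CATGCATG"))  # 0.0 (CpG-depleted)
```

Methylated regions lose CpGs through C → T mutation, so they score low; unmethylated promoter regions retain them and score high.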
19
Q

Avoiding methylation

A
  • When a region is methylated at all times, methylated cytosines tend to mutate to T over evolutionary time → cytosines are lost → methylation is avoided
  • If a transposon accumulates mutations it loses functionality; ~50% of human genome is made of transposons → 99% of them have lost their ability to act as a parasite → cannot move anymore
20
Q

Identifying DNA-Methylation

A

MeDIP-Seq
* Antibody recognizes methylated cytosine → binds meth DNA → immunoprecipitation → retain only antibody bound DNA →fragmented → next gen sequencing → sequences mapped back onto genome to identify methylated regions
Bisulphite sequencing
* Samples treated w/bisulphite → converts unmethylated C to U → sequence and compare samples to determine methylation e.g. cancer vs normal cells
* PCR only able to amplify U-containing DNA (non-methylated); with other primers can amplify all fragments that contain methylated DNA
Both
* Expensive
* Great resolution
* MeDIP-Seq requires antibody
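How bisulphite reads reveal methylation can be sketched by comparing a converted read with the reference. This assumes the read is already aligned to the same strand with no sequencing errors; the sequences are hypothetical.

```python
def call_methylation(reference, converted_read):
    """Return {position: is_methylated} for every C in the reference."""
    calls = {}
    for pos, (ref_base, read_base) in enumerate(zip(reference, converted_read)):
        if ref_base != "C":
            continue
        if read_base == "C":
            # Methylated C is protected from conversion and stays C.
            calls[pos] = True
        elif read_base == "T":
            # Unmethylated C was converted to U, which is read as T.
            calls[pos] = False
    return calls


reference = "ACGTCCGA"
read = "ACGTTTGA"  # bisulphite-treated, then sequenced
print(call_methylation(reference, read))  # {1: True, 4: False, 5: False}
```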

21
Q

X inactivation

A
  • The silencing of one of the X chromosomes in all female mammals
  • Required for dosage compensation to avoid over expression of genes on X chromosome
  • Inactivated X chromosome packaged as compacted heterochromatin
    o Compaction by chromosome-wide histone methylation – H3K27me3
  • Inactivation by Xist gene (long non-coding RNA)
22
Q

Long non-coding RNA sequences

A
  • Longer than 200 nucleotides
  • Thousands identified but function largely unknown
    o Target different aspects of gene transcription mechanism
    o Can function as co-regulators or transcription factors
  • Act in ‘cis’ (same chromosome they are transcribed from) or ‘trans’ (different chromosome)
  • ncRNA Evf-2 = a co-activator for homeobox transcription factor Dlx2, involved in forebrain development and neurogenesis
23
Q

Xist

A
  • 17kb long; acts in cis
  • Expressed from only one of 2 X chromosomes → first detectable event in X inactivation
  • Xist contains many repeats → 6 identified so far
  • Repeat A (RepA) mediates the silencing function of Xist → binds to PRC2 (Polycomb repressive complex 2 – a histone methyltransferase complex) → lays down histone methylation along chromosome at Lys27 of H3
24
Q

HOTAIR

A
  • Long ncRNA expressed from HOXC locus on chromosome 12 → represses HOXD on chrom.2
  • ‘HOX’ = important developmental genes
  • Acts in trans
  • Binds to PRC2 and LSD1 → PRC2 adds repressive H3K27me → LSD1 removes active H3K4me → combined function produces repressive chromatin structure
  • In cancer, HOTAIR acts on regions other than HOXD
25
Q

Transcription factors

A
  • Up- or down-regulate a gene
  • Recognise specific short DNA motifs, ~5-7 bases long
  • Use several mechanisms to regulate gene expression:
    o Stabilising or blocking the binding of RNA polymerase to DNA
    o Recruit coactivator or corepressor proteins to the transcription factor DNA complex
    o Catalyse acetylation/deacetylation of histone proteins:
    i) Histone acetyltransferase (HAT) activity – weakens association of DNA w/histones making DNA more accessible to transcription (up-reg)
    ii) Histone deacetylase (HDAC) activity –strengthens association of DNA w/histones making DNA less accessible to transcription (down-reg)
  • ~95% of Tf can only bind motifs when chromatin is in active methylated state (non-condensed)
26
Q

Variability of TF

A
  • Actual sequence of particular Tf binding site →variable → harder to identify → so described by general motif not fixed sequence
  • E.g. Neurod family recognise small sequences; HOX recognise dimers
  • P53 recognises dimers → most important tumour suppressor gene in mammalian cells → if mutation → cancer
  • Very specific motifs; even single mismatch would affect function
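Describing a binding site by a general motif rather than a fixed sequence can be sketched with IUPAC degenerate codes translated into a regular expression. The E-box consensus CANNTG is used purely as an illustrative motif, and the DNA sequence is hypothetical.

```python
import re

# IUPAC degenerate nucleotide codes -> regular-expression fragments.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}


def find_motif(motif, sequence):
    """Return start positions (0-based) of every match, including overlaps."""
    pattern = "".join(IUPAC[base] for base in motif)
    # A lookahead lets overlapping occurrences be reported too.
    return [m.start() for m in re.finditer(f"(?=({pattern}))", sequence)]


hits = find_motif("CANNTG", "GGCAGCTGTTCATATGAA")
print(hits)  # [2, 10]
```

Both CAGCTG and CATATG match the same motif, which is why a single fixed sequence would miss real binding sites.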
27
Q

Pioneer transcription factors

A
  • Able to go to region of inactive repressed chromatin → bind it → signal for active machinery to arrive
28
Q

Experimental Method (Tf) –ChIP-Seq

A
  • Chromatin immunoprecipitation followed by sequencing
  • Used to identify binding regions if binding protein is known
  • Based on antibodies
  • ChIP-Seq directly sequences the bound DNA → can then be mapped back onto genome for precise localization
  • Consensus/variability of binding sites can be determined from sequence:
    o Map reads back to reference genome
    o Most frequently sequenced fragments form coverage peaks at specific locations
  • Same approach as used for DNA methylation
29
Q

ChIP-Seq Method

A
  1. Chromatin in nucleus is cross-linked (Tf cannot detach) → fragment it
  2. DNA fragments include those w/target protein bound → incubate w/antibody specific for Tf of interest
  3. Immunoprecipitation → everything bound to Tf will precipitate (normally use magnetic beads that recognise the antibody to keep fragments bound to Tf stuck in tube)
  4. Reverse cross-linking
  5. Use specific proteases to degrade Tf → have DNA only
  6. Sequence using next-gen sequencing
  7. Mapped back onto genome → reads only map in regions where Tf was binding
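The final mapping step can be sketched as piling up mapped reads into a coverage track and reporting regions above a depth threshold. The read coordinates and the threshold are hypothetical, and real peak callers use statistical models and control samples rather than a fixed cut-off.

```python
def coverage(reads, genome_length):
    """Per-base read depth from mapped reads given as (start, end), end exclusive."""
    depth = [0] * genome_length
    for start, end in reads:
        for pos in range(start, min(end, genome_length)):
            depth[pos] += 1
    return depth


def call_peaks(depth, min_depth):
    """Return (start, end) regions where depth >= min_depth, end exclusive."""
    peaks = []
    start = None
    for pos, d in enumerate(depth):
        if d >= min_depth and start is None:
            start = pos                 # a peak opens
        elif d < min_depth and start is not None:
            peaks.append((start, pos))  # the peak closes
            start = None
    if start is not None:
        peaks.append((start, len(depth)))
    return peaks


mapped_reads = [(2, 8), (4, 10), (5, 11), (20, 26)]
depth = coverage(mapped_reads, 30)
print(call_peaks(depth, min_depth=3))  # [(5, 8)]
```

The lone read at 20-26 is ignored as background, while the three overlapping reads form a peak where the Tf was bound.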
30
Q

Identifying histone modifications

A
  • Also use ChIP-Seq
  • Fractionate DNA → use antibody that binds to modified histone being studied → separated by immunoprecipitation →sequence isolated fraction using next gen seq → map sequence back onto genome to identify regions w/modified histones ie. genes under regulation
31
Q

ENCODE

A
  • Encyclopaedia of DNA Elements
  • 400 scientists involved
  • 5-year project completed in 2012 → identify all functional elements in human genome sequence
    o Goal: to identify and characterise everything in the genome that is non-coding →the DNA/RNA regions regulated and the factors regulating them
  • 2003- human genome was sequenced → now have a reference genome allowing development of this project
32
Q

What they did in ENCODE

A
  • Looked at how many different genes are expressed in different cell types
  • Profiled binding of Tf across ~56 different cell types using ChIP-Seq
  • Profile DNA methylation across ~56 different cell types
  • Looked at chromatin conformation
    o Promoters & enhancers need to be in contact w/each other → contacts can be profiled by different techniques → look at 3D genome & chromatin
  • Profile all accessible regions of chromatin (euchromatin state)
    o Euchromatin = less condensed, gene-rich, more easily transcribed; nucleosomes are depleted; DNA is accessible for binding of Tf
    o Heterochromatin = highly condensed, gene poor, transcriptionally silent
33
Q

Complexity of gene expression

A
  • Regulating gene expression → complex → involves many regulatory factors → Tf, enhancers, silencers, methylation patterns
  • Transcripts regulated by splicing factors e.g. exon and intron splicing enhancers and silencers
  • Functionality often cell/tissue and time specific
34
Q

DNA Hypersensitive Sites (HS)

A
  • Hypersensitive sites = regions of chromatin highly sensitive to DNase 1
    o Nucleosome-depleted → less compact → enables proteins e.g. Tf to bind DNA
  • DNase-Seq = technique where DNase 1 cuts DNA only where accessible → isolate fragments → add adaptors by ligation → amplify → cluster on flow cell → sequence (next gen seq) → reads mapped onto genome → they pile up in regions where chromatin was open and accessible
  • Mapping HS sites → identify location of genetic regulatory elements e.g. promoters, enhancers, silencers, locus control regions →(ChIP-Seq can only identify Tfs)
35
Q

DNase-Seq Overview

A
  • Open chromatin regions sequenced and mapped to reference genome
  • Nuclear extraction → DNase 1 digestion → library preparation → PCR amplification → high-throughput sequencing
36
Q

DNase-Seq Footprinting

A
  • Number of fragments that map to a sequence is a measure of regulatory activity
  • Sites bound by some Tfs show highly specific patterns of DNase I cleavage = ‘DNase footprints’
  • Footprints used to identify binding of specific Tfs; advantage over ChIP-seq
    o w/ChIP-seq need to know the transcription factor for immunoprecipitation
    o DNase-seq identifies the binding sites de novo
  • DNase-seq is a genome-wide version of the DNA footprinting method
  • Footprint = prediction of what Tf could bind
37
Q

FAIRE-Seq

A
  • Similar method to DNase-Seq + addition of formaldehyde for cross-linking
    o Cross-linking is more efficient in nucleosome-bound DNA than in nucleosome-depleted regions of genome
  • Phenol chloroform extraction of DNA → treatment to isolate nucleic acid from solutions
    o Cross-linked chromatin will go to bottom of tube (organic phase)
    o Condensed chromatin at bottom; active chromatin at top of tube
    o These increase resolution and reduce noise
  • DNA extracted and mapped to reference genome to identify open DNA regions
  • FAIRE-Seq higher coverage at enhancer regions over promoter regions
  • DNase-Seq higher sensitivity towards promoter regions
38
Q

ATAC-Seq

A
  • Uses mutated hyperactive transposase Tn5 instead of DNase1
  • Tn5 enzyme derived from transposons → cuts into open, active euchromatin efficiently and inserts sequencing adaptors
  • DNA fragments then isolated, sequenced, and mapped
  • Advantages:
    o Requires smaller sample than DNase-seq and FAIRE-seq (which require ~1000x more cells)
    o V. fast → completed in 3 hours
  • Disadvantages:
    o V. expensive because monopoly
39
Q

MNase-Seq

A
  • Uses micrococcal nuclease
  • Cuts v. near to nucleosomes
40
Q

ChIP-Seq vs all others

A
  • ChIP-seq requires antibody
    o Profiling something specific e.g. one Tf of interest
  • All others don’t need antibody
    o Less specific → profile all active chromatin regions
41
Q

Chromatin Interaction

A
  • Each chromosome is organised into TADs (topologically associating domains) – portions of chromosome isolated from each other
    o One TAD does not interact w/other TADs → enhancers/promoters can only regulate genes in same TAD (v. few exceptions)
  • Lowest possible level = looping
  • Knowing interactions is essential for understanding mechanisms of gene regulation in health and disease
  • All possible interactions can be profiled by different techniques (e.g. ChIA-PET, 3C, 4C, Hi-C, etc)
    o All based on same approach w/one variation
    o Use restriction enzymes: fragmenting genome → religation→ profile what is ligated
  • ChIA-PET → studies genome wide long range chromatin interactions involving protein factors
    o Involves additional step: antibody precipitation → to identify chromatin interactions that are regulated by a specific transcription factor, between distal and proximal regulatory sites and their associated promoters
  • Diff-linker PETs (paired-end tags carrying two different linkers) are used to identify non-specific ligation noise, i.e. ligations between different ChIP complexes
42
Q

RIP-seq and CLIP-seq

A
  • V. accurate; can be reproduced
  • Methods similar to DNA-protein interaction identification
  • RIP-seq involves immunoprecipitation of RNA-binding protein (RBP) of interest → has to be done non-stringently
    o Low stringency = low specificity
  • Developed to CLIP-seq → includes cross-linking step using UV light (irreversible)
  • Final steps:
    o Digestion with proteinase K leaving peptide at binding site that modifies nucleotides to create cross-link-induced mutation sites (CIMS)
    o Reverse transcription to make cDNA → identify RNA-binding sites
    o Sequenced and mapped to transcript
43
Q

CLIP-seq variants

A
  • PAR-CLIP = improves crosslinking w/photoreactive RNA nucleotides
  • iCLIP = uses reverse transcriptase stalling to map individual nucleotide-protein interactions
  • miCLIP = modifies RNA methylase to map its binding sites
44
Q

-seq key features

A
  • All ‘seq’ methods use similar approaches and same final step to identify regions of interest (protein binding site, open chromatin, etc)
  • Method:
    o Isolate sequence e.g. fragmentation and immunoprecipitation, phenol/chloroform etc
    o Sequence fragment and map back to genome
45
Q

ENCODE controversy

A
  • Most of the genome is “functional” → controversial statement from ENCODE
    o ENCODE considers anything transcribed must be functional → but many transcripts are non-functional e.g. pseudogenes
  • ENCODE emphasized sensitivity over specificity → leads to false positives
  • Criticism: arbitrary choice of cell lines and transcription factors; lack of appropriate control experiments
46
Q

ENCODE limitations

A
  • Conducted in immortalised cell lines (derived from human cancers → v. easy to manipulate but v. unstable)
    o Want something closer to real healthy cells
    o E.g. HeLa cells sometimes have 3 or 4 copies of a chromosome → not diploid
  • Roadmap consortium
    o Profiled histone modifications across 25 human primary tissues (mark different chromatin states)
47
Q

What is a SNP

A
  • DNA sequence variations → occur when single nucleotide (A, T, C, or G) in genome sequence is altered; must occur in at least 1% of the population
  • SNPs make up ~90% of all human genetic variation; occur approx. every 1000 bases; ~4-5 million SNPs in an individual human genome
48
Q

Why are SNPs important

A
  • Can affect how humans develop diseases
  • Can affect how an individual responds to pathogens
  • Can affect how an individual responds to chemicals
  • Can affect how an individual responds to drugs, etc
  • Potentially their greatest importance in biomedical research is for comparing regions of the genome between cohorts
  • Comparing cohorts with and without a disease – GWAS
49
Q

SNP location

A
  • Intergenic region → possibly transcription enhancer/regulatory region
  • Within promoter or transcription factor binding region
  • Within exon → Could affect protein coding
  • Within intron → Possibly regulatory region e.g. affecting splicing
50
Q

Disease SNPs

A
  • Can be v. dangerous or neutral
  • SNPs may be direct cause of disease or signal for increased likelihood of disease
  • Disease associated SNPs:
    o Monogenic → one nucleotide change leads to disease; relatively easy to detect/analyze; simple traits
    o Polygenic → many nucleotide changes affect probability of disease; hard to detect/analyze; complex traits
51
Q

Coding SNPs

A
  • Coding SNPs = potentially disease causing as they can affect the protein
  • Types: Synonymous (silent) & Non- synonymous
  • Synonymous mutation: change base but AA is the same
    o May still affect Exon Splicing Enhancers (ESE) or Exon Splicing Silencers (ESS) site so cannot always be ignored
  • Non-synonymous – change in base changes AA → mutation could be detrimental
52
Q

Transition and transversion

A
  • Transition (Ti) - most common substitution
    o Replacing purine by purine i.e. A → G or pyrimidine by pyrimidine i.e. T → C
  • Transversion (Tv) - less common
    o Replacing purine by pyrimidine or vice versa i.e. A → C
  • Ti/Tv ratio – varies within genome; used to assess GWAS data quality
    o Across entire genome averages around 2
    o In protein coding regions typically higher, often above 3 due to transversions in third base of codon being more likely to change the encoded amino acid
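Classifying substitutions and computing a Ti/Tv ratio can be sketched as follows. The list of (reference, alternate) base pairs is hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify(ref, alt):
    """Label a single-base substitution as a transition or a transversion."""
    if ref == alt:
        raise ValueError("ref and alt must differ")
    # Transition: purine <-> purine or pyrimidine <-> pyrimidine.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def ti_tv_ratio(snps):
    """Ratio of transitions to transversions for (ref, alt) pairs."""
    kinds = [classify(ref, alt) for ref, alt in snps]
    ti = kinds.count("transition")
    tv = kinds.count("transversion")
    return ti / tv if tv else float("inf")


snps = [("A", "G"), ("T", "C"), ("C", "T"), ("A", "C"), ("G", "A"), ("G", "T")]
print(ti_tv_ratio(snps))  # 2.0
```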
53
Q

Sickle Cell Anaemia

A
  • Inherited blood disorder due to mutations in beta globin HBB
  • Found primarily in African and related populations
  • Fragile, sickle-shaped cells deliver less oxygen to the body’s tissues
  • Get stuck more easily in small blood vessels; break into pieces that interrupt healthy blood flow
  • Symptoms: shortness of breath; infections (bone, gall bladder etc); joint pain
  • Causes:
    o Mutation of β-globin gene at AA position 6 (HbS) - GAG → GTG: Glutamic acid → Valine
    o Only individuals homozygous for allele (T:T genotype) have sickle cell anaemia
    o Autosomal recessive mutation
54
Q

Alzheimer’s Disease

A
  • Early onset familial
    o Hereditary; ~40yo
    o V. rare ~5% of all cases
    o Caused by mutation in amyloid precursor protein (APP) or presenilin-1 (PS1)
  • Sporadic late onset
    o ~70yo
    o Associated w/many genes → e.g. Apolipoprotein E (ApoE)
  • ApoE contains 2 SNPs resulting in 3 possible alleles for the gene: E2, E3, E4
    o Protein product of each allele differs by one amino acid
    o E3 no effect regarding Alzheimer’s
    o 1x E4 allele → greater chance of developing Alzheimer’s
    o 1x E2 allele → person is less likely to develop Alzheimer’s
    o Even with 2x E4 alleles → person may never develop Alzheimer’s
    o Even with 2x E2 alleles → person may still develop Alzheimer’s
55
Q

Non-coding SNPs

A
  • Studies report: disease associated SNPs are enriched in regulatory DNA regions
    o Enhancers (more than ~1 million enhancers in human genome)
    o Silencers
    o Locus control regions
    o Promoters
    o Long non-coding RNAs maintaining higher order structure of 3D genome
  • 98% of T2 diabetes associated SNPs were non-coding
  • Changing 1 nucleotide in motif → Tf does not bind anymore → enhancer is not activated
56
Q

SNPs disrupt splice sites

A
  • Splice-site SNPs account for ~10% of all mutations causing human inherited disease
  • Splice sites found in proximity to exon → provide signal for proteins to cut RNA
  • If SNP in splice site → cannot cut anymore
  • SNP most likely causes total loss of associated exon; or introduces cryptic splice site
  • OAS1 gene –associated w/T1 diabetes
  • Can have synonymous mutation in coding DNA but affects splicing machinery
57
Q

Insertion or deletion

A
  • Can cause:
    o Disrupted start codon
    o Disrupted stop codon
    o Disrupted splice site
    o Frame shift
  • In a frameshift mutation, base is inserted/deleted, altering codon in which insertion or deletion took place, but also changing the reading frame so that all codons downstream are read out of frame → produces string of amino acid substitutions before a stop codon is reached (stop codons are frequent in coding sequences read out of frame)
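The effect of a single-base insertion on the reading frame can be sketched by splitting a sequence into codons before and after the insertion. The coding sequence and the inserted base are hypothetical.

```python
def codons(seq):
    """Split a coding sequence into complete codons, dropping leftover bases."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def insert_base(seq, pos, base):
    """Insert one base before position pos (0-based)."""
    return seq[:pos] + base + seq[pos:]


original = "ATGGCCGAGTTCTAA"
mutant = insert_base(original, 4, "T")

print(codons(original))  # ['ATG', 'GCC', 'GAG', 'TTC', 'TAA']
print(codons(mutant))    # ['ATG', 'GTC', 'CGA', 'GTT', 'CTA']
```

Every codon downstream of the insertion changes, and the original stop codon TAA is no longer read in frame.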
58
Q

Genome Wide Association Studies (GWAS)

A
  • GWAS = if 500 people with the same disease all share half a dozen SNPs, but a group of 500 healthy people don’t share those SNPs, the mutations behind the disease are probably near those SNPs (studies now use more like 10,000 people)
    o GWAS has identified genetic variations that contribute to risk of: T2 diabetes, Parkinson’s, Heart disorders, Obesity, Crohn’s disease, Prostate cancer…
  • GWAS is a very strong area of research → look at common SNPs → statistics
  • The difficulty is accurately identifying the SNPs
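The statistics behind a case/control comparison can be sketched with a chi-square test on allele counts for one SNP. The allele counts are hypothetical, and the sketch returns only the test statistic (1 degree of freedom), not a p-value.

```python
def allele_chi_square(case_alt, case_ref, control_alt, control_ref):
    """Chi-square statistic for a 2x2 table of allele counts."""
    table = [[case_alt, case_ref], [control_alt, control_ref]]
    total = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected if the allele were unrelated to disease status.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2


# Hypothetical SNP: 500 cases and 500 controls, 2 alleles per person.
stat = allele_chi_square(case_alt=600, case_ref=400,
                         control_alt=400, control_ref=600)
print(stat)  # 80.0
```

In a real study this test is repeated for every SNP, so the significance threshold has to be corrected for the very large number of tests.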
59
Q

100,000 Genomes Project

A
  • Genomics England project in collaboration w/NHS
  • Aim: to sequence 100,000 genomes from approximately 70,000 people
    o Longer term aim → research on new & more effective treatments.
  • Participants are NHS patients w/cancer or rare disease
  • Genomes of families of patients w/rare diseases also sequenced → identify variants associated w/different conditions
  • Objective: to create a new genomic medicine service for the NHS
  • Patients may be offered a diagnosis where this wasn’t possible before
60
Q

Identifying Variants

A
  • Sequence multiple genomes from a population at low coverage and pool the data
  • Align to reference and identify variants
  • Pooling works as most of the genome will be the same; some individuals will also share variants
  • Variant prediction software identifies which variants are real and which are sequencing errors
61
Q

Limitations of GWAS

A
  • Mostly genomes of white people (UK or US) → little diversity → not a representative pool
  • Not easy to do functional validation
    o SNP may have effect in one cell type not in the other
  • Coding mutations alter AA sequence of protein → effect is clear
    o Effects of non-coding mutations that disrupt regulatory elements are less clear
  • Regulation of gene expression is dependent on multiple factors:
    o Cell-type
    o Temporal patterns such as circadian clock
    o Cell-tissue development
  • GWAS identifies candidate SNPs but confirmation requires additional work
  • Biggest limitation: linkage disequilibrium
    o Linkage disequilibrium = the non-random association of alleles at two or more loci within a population → haplotypes don’t occur at expected frequencies
    o Can be used to improve genetic association studies, e.g. for cancer
    o Enables identification of genetic markers for the associated disease
    o E.g. 6 SNPs found in people w/Alzheimer’s → 3 found in linkage disequilibrium → cannot figure out which SNP is causative of disease
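Linkage disequilibrium between two SNPs can be sketched with the standard measures D and r², computed from haplotype and allele frequencies. These formulas are not given in the cards, and the frequencies are hypothetical.

```python
def linkage_disequilibrium(p_ab, p_a, p_b):
    """Return (D, r_squared) for alleles A and B at two loci.

    p_ab -- frequency of the haplotype carrying both A and B
    p_a  -- frequency of allele A at the first locus
    p_b  -- frequency of allele B at the second locus
    """
    # D = observed haplotype frequency minus that expected under independence.
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r_squared = d * d / denom if denom else 0.0
    return d, r_squared


# A and B always travel together: the two SNPs cannot be told apart.
print(linkage_disequilibrium(0.5, 0.5, 0.5))   # (0.25, 1.0)
# A and B are inherited independently: no linkage disequilibrium.
print(linkage_disequilibrium(0.25, 0.5, 0.5))  # (0.0, 0.0)
```

With r² close to 1, a disease association with one SNP is carried equally by the other, which is why the causative SNP cannot be identified from association alone.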
62
Q

Expression Quantitative Trait Loci (eQTL)

A
  • First DNA sequencing → then RNA sequencing to quantify gene expression → involves mapping variants which alter gene expression
  • eQTL = non-coding SNPs known to affect expression of specific gene ; variants associate w/RNA levels
  • eQTL mapping enables identification of the regulated genes → these are not necessarily the genes closest to the disease-associated SNP
  • Cis eQTL = affect expression of nearby gene
  • Trans = does not map close to gene; could be other chromosome
63
Q

GWAS and eQTL

A
  • 1000 people → get DNA → find SNPs → get RNA → quantify gene expression → leads to identification of disease causing genes
64
Q

SNP genotyping

A
  • To only find out if pre-defined SNPs are present:
    o Microarrays w/probes for specific SNP → DNA only binds to probe if there is SNP
65
Q

Epigenetics

A
  • Epigenetics = ‘above’ genetics → external modifications to chromatin that turn genes on/off
  • Modifications do not change DNA sequence → they affect how cells read genes
  • Epigenetic changes alter physical structure of DNA
  • E.g. DNA and histone methylation
  • Epigenetic modifications can be inherited → “An epigenetic system should be heritable, self-perpetuating, and reversible (Bonasio et al. - Science 29 October 2010: 612-616)”
  • Depends on environment, diet, smoking
66
Q

Nucleosomes

A
  • Genome is condensed and compacted into nucleosomes
  • 146bp of DNA wrapped around histone octamer (8 histones → 2 copies each of H2A, H2B, H3, H4)
  • Space between nucleosomes = linker DNA
  • Nucleosomes disassemble then reform during replication
  • Nucleosomes = repeating units of chromatin
  • Interaction DNA-histones = sequence independent (H-bonding & ionic interactions w/sugar-phosphate backbone)
67
Q

H1

A
  • Sits outside each nucleosome
  • Structural function to keep nucleosome together
68
Q

Histones

A
  • Histone tails (protruding from the globular ‘spheres’) can be methylated and acetylated = histone modifications
  • Modifications are covalent
69
Q

Euchromatin vs heterochromatin

A
70
Q

2 main ways to repress chromatin

A
  1. Constitutive heterochromatin
    - H3K9me2/me3
    - H3 = histone; K9 = lysine 9; di- or tri- methylation
    - With age → weaker methylation of histones → heterochromatic regions get aberrantly activated, e.g. in Alzheimer’s
    - Permanent
  2. Facultative heterochromatin
    - Regions can turn on/off when necessary (e.g. gene promoters that don’t need to be active at all times i.e. developmental genes)
    - H3K27me2/me3 → di- or tri- methylation of Lys27 of H3
    - Non-permanent → deposited by PRC2
    * Need heterochromatin → otherwise DNA too big
    * Histone modifications repress transposons
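A small sketch that decodes the mark notation used above (histone, residue, position, modification), so that H3K9me3 reads as tri-methylation of lysine 9 on histone H3. The regular expression covers only the methylation and acetylation marks in these cards; the function and dictionary names are illustrative.

```python
import re

# Histone name, one-letter residue, position, then me1/me2/me3 or ac
MARK_PATTERN = re.compile(r"^(H1|H2A|H2B|H3|H4)([A-Z])(\d+)(me[123]|ac)$")
RESIDUES = {"K": "lysine", "R": "arginine"}
MODIFICATIONS = {
    "me1": "mono-methylation",
    "me2": "di-methylation",
    "me3": "tri-methylation",
    "ac": "acetylation",
}


def parse_mark(mark):
    """Split a histone mark such as 'H3K9me3' into its named parts."""
    match = MARK_PATTERN.match(mark)
    if match is None:
        raise ValueError(f"unrecognised histone mark: {mark}")
    histone, residue, position, modification = match.groups()
    return {
        "histone": histone,
        "residue": RESIDUES.get(residue, residue),
        "position": int(position),
        "modification": MODIFICATIONS[modification],
    }


print(parse_mark("H3K9me3"))
print(parse_mark("H3K27me2"))
```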
71
Q

PRC2

A
  • EZH2 = catalytic subunit (enzyme)
  • Other subunits in complex: EED + SUZ12
72
Q

Typical signature of active enhancers

A
  • Always acetylated at Lys27 of H3 (H3K27ac); mono-methylated at Lys4 (H3K4me1)
  • Methylation is not always repressive → sometimes it is a marker of activity
  • Signature used by TFs (transcription factors) to understand where to go and whether chromatin is active or not
  • Active promoters have trimethylation of H3 Lys4 (H3K4me3)
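A toy lookup that turns the signatures from these cards into a chromatin state label. It is a sketch under the assumption that the presence or absence of a mark is enough; real chromatin state calling works on quantitative signal across many marks. The function name is hypothetical.

```python
def chromatin_state(marks):
    """Label a region from the set of histone marks observed on it."""
    marks = set(marks)
    # Active enhancer: H3K27ac together with H3K4me1
    if {"H3K27ac", "H3K4me1"} <= marks:
        return "active enhancer"
    # Active promoter: H3K4me3
    if "H3K4me3" in marks:
        return "active promoter"
    # Repressive marks described for the two kinds of heterochromatin
    if marks & {"H3K9me2", "H3K9me3"}:
        return "constitutive heterochromatin"
    if marks & {"H3K27me2", "H3K27me3"}:
        return "facultative heterochromatin"
    return "unclassified"


print(chromatin_state(["H3K27ac", "H3K4me1"]))  # active enhancer
print(chromatin_state(["H3K4me3"]))             # active promoter
print(chromatin_state(["H3K27me3"]))            # facultative heterochromatin
```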
73
Q

Acetylation and methylation enzymes

A
  • Histone acetyltransferases (HATs) and deacetylases (HDACs)
  • Methyltransferases and demethylases
  • If these are impaired → histone code is aberrant → have inappropriate labels → e.g. gene that should be repressed is labelled w/active markers → leads to disease, e.g. cancer
74
Q

Epigenetics and Cancer

A
  • Epigenetic change that silences tumour suppressor gene → can lead to uncontrolled cellular growth
  • Turning off genes that help repair damaged DNA → increase in DNA damage → higher cancer risk
  • Prostate cancer associated w/gene silencing by CpG island hypermethylation within promoter region of GSTP1 gene
  • If issue upstream in epigenetics → leads to cascade of problems
75
Q

X inactivation

A
  • Example of epigenetics
  • Marsupials: paternal X chromosome always silenced
  • Tortoiseshell cat:
    o Almost all female (two X chromosomes needed; rare XXY males)
    o Black/orange alleles of fur coloration are on the X chromosome
    o If heterozygous → resulting colour of each patch depends on which X is inactivated
    o Tortoiseshell pattern (phenotype) determined by X inactivation
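A minimal simulation of the tortoiseshell idea, assuming each patch of fur descends from a cell that silenced one of its two X chromosomes at random, so a heterozygous cat shows whichever colour allele sits on the X that stayed active. The function name, patch count and 50/50 choice are illustrative assumptions.

```python
import random


def simulate_coat(n_patches, seed=None):
    """Return the colour shown by each patch of a heterozygous (orange/black) female."""
    rng = random.Random(seed)
    # One X is silenced per patch; the colour comes from the X that remains active
    return [rng.choice(["orange", "black"]) for _ in range(n_patches)]


coat = simulate_coat(12, seed=1)
print(coat)
print("orange patches:", coat.count("orange"), "black patches:", coat.count("black"))
```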
76
Q

Agouti Mice

A
  • Genetically identical mice
  • Mother 1 → skinny, brown mouse → methyl-rich diet → methylated agouti gene repressed
  • Mother 2 → obese, yellow mouse prone to diabetes/cancer → unmethylated agouti gene expressed
  • Agouti gene common to all mammals
77
Q

Epigenetics and Twins

A
  • Identical twins = identical genes → differences due to epigenetic changes → lead to different disease susceptibility for example
  • Label different histone modifications w/fluorescent probes → chromosome pairs in twins digitally superimposed → one tag red, other green → overlapping shown as yellow
78
Q

Difficulty identifying inheritance

A
  • Histone modifications can be inherited → environmental factors impact them (e.g. smoking mother)
  • Changes in epigenome inherited due to mechanisms (still under study) that allow cells of offspring to remember epigenome of parents
  • Complicated to understand where inheritance is coming from → always present in family? → something environmentally driven?
  • Need to show that epigenetic effect can pass through enough generations to rule out possibility of direct exposure → exposing a pregnant female potentially exposes 3 generations at once (mother, fetus, fetus’s germ cells) → proving epigenetic inheritance requires epigenetic change in 4th generation
  • WW2 in Netherlands (Dutch Hunger Winter) → extreme lack of food → diet poor in methyl groups → people exposed in the womb still show these effects decades later → e.g. hypomethylation of IGF2 (insulin-like growth factor 2) involved in diabetes and cardiovascular diseases
79
Q

Erasing of methylation

A
  • DNA of human sperm is highly methylated; eggs less
  • Egg fertilized → methylation/acetylation in chromatin largely erased esp. from paternal genome
  • As embryo develops, methylation marks continue to be lost from maternal genome up to blastocyst stage
  • Not all methylation erased, hence inheritance possible
  • After this → cells differentiate → DNA becomes methylated again for specialized cell types
80
Q

How is methylation inherited

A
  • Methylation patterns normally erased in primordial germ cells
  • Methylation marks converted to hydroxymethylation then progressively diluted as cells divide
    o V. efficient mechanism → resets methylation patterns of genes for each generation
  • Research found: some rare methylation can escape this reprogramming process → can be passed on to offspring → enables inheritance of epigenetic traits
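A short worked sketch of the dilution step, assuming a purely passive model in which converted marks are not copied onto the newly made DNA strand, so the marked fraction halves at every cell division. The function name and starting value are illustrative.

```python
def remaining_methylation(initial_fraction, divisions):
    """Fraction of marks left after passive dilution over a number of cell divisions."""
    if divisions < 0:
        raise ValueError("divisions must be zero or positive")
    # Each replication pairs one old (marked) strand with one new (unmarked) strand
    return initial_fraction * 0.5 ** divisions


for n in range(5):
    print(n, remaining_methylation(1.0, n))
```

Under this assumption only about 3% of the original marks remain after 5 divisions, which illustrates why the reset is efficient and why marks that escape it are rare.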
81
Q

Imprinting

A
  • Most genes inherit 2 working alleles from mother & father
  • Imprinted genes inherit only one working copy → other is epigenetically silenced by addition of methyl groups → repressed forever but reset during egg/sperm formation
  • Expression of gene comes from only 1 allele → less protein or less variation
  • Almost all imprinted genes associated w/growth
82
Q

Genetic conflict hypothesis

A
  • Theory of why imprinting happens
  • Males can have multiple offspring w/multiple partners simultaneously at low cost
  • Females can have only one set of offspring at a time, at high cost of personal resources
  • Many imprinted genes involved in growth and metabolism
  • Paternal imprinting favours production of larger offspring (more maternal resources per offspring); maternal imprinting favours smaller offspring (conserves resources for mother and later offspring)
83
Q

Epigenetics in plants

A
  • Plants depend on epigenetic processes for proper function
  • Flowering controlled by set of genes affected by environmental conditions through alteration in expression pattern → ensures production of flowers even when plants are growing under adverse conditions
  • Epigenetic modifications include DNA methylation, histone modifications, and production of micro RNAs (miRNAs)