Genomics Flashcards

Question

Transcription factors

Answer 1

* Up- or down-regulate a gene
* Recognise specific DNA motifs, typically ~5-7 bases long
* Use several mechanisms to regulate gene expression:
  o Stabilising or blocking the binding of RNA polymerase to DNA
  o Recruiting coactivator or corepressor proteins to the Tf-DNA complex
  o Catalysing acetylation/deacetylation of histone proteins:
    i) Histone acetyltransferase (HAT) activity – weakens association of DNA w/histones, making DNA more accessible to transcription (up-reg)
    ii) Histone deacetylase (HDAC) activity – strengthens association of DNA w/histones, making DNA less accessible to transcription (down-reg)
* ~95% of Tf can only bind motifs when chromatin is in active methylated state (non-condensed)

Answer 2

* Actual sequence of a particular Tf binding site → variable → harder to identify → so described by a general motif, not a fixed sequence
* E.g. Neurod family recognise small sequences; HOX recognise dimers
* P53 recognises dimers → most important tumour suppressor gene in mammalian cells → if mutation → cancer
* Very specific motifs; even a single mismatch would affect function
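A minimal sketch of how a "general motif" can be summarised from variable binding sites. The site sequences, the function names and the 0.75 cut-off are illustrative assumptions, not real data or a standard tool.

```python
from collections import Counter

# Hypothetical aligned binding sites for one Tf (made-up sequences).
sites = ["CAGCTG", "CAGATG", "CATCTG", "CAGCTG", "CACCTG"]

def position_counts(seqs):
    """Count each base at every position of equal-length aligned sites."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sites must have the same length")
    return [Counter(s[i] for s in seqs) for i in range(length)]

def consensus(seqs, min_fraction=0.75):
    """Most common base per position; 'N' where no base reaches min_fraction."""
    out = []
    for counts in position_counts(seqs):
        base, n = counts.most_common(1)[0]
        out.append(base if n / len(seqs) >= min_fraction else "N")
    return "".join(out)

motif = consensus(sites)
print(motif)  # variable positions are shown as N
```

Positions where the sites disagree come out as N, which is why the binding site is reported as a motif rather than one fixed sequence.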

Answer 3

* Able to go to region of inactive repressed chromatin → bind it → signal for active machinery to arrive

Answer 4

* Chromatin immunoprecipitation followed by sequencing
* Used to identify binding regions if binding protein is known
* Based on antibodies
* ChIP-Seq directly sequences the bound DNA → can then be mapped back onto genome for precise localisation
* Consensus/variability of binding sites can be determined from sequence:
  o Map reads back to reference genome
  o Most frequently sequenced fragments form coverage peaks at specific locations
* Same approach as used for DNA methylation
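The "coverage peaks" idea can be sketched as follows. This is a toy illustration assuming reads are already mapped to (start, end) positions on one chromosome; the read coordinates, function names and depth threshold are made up, and real peak callers also model background noise.

```python
from collections import Counter

def coverage(reads):
    """Per-base read depth from (start, end) intervals, end exclusive."""
    depth = Counter()
    for start, end in reads:
        for pos in range(start, end):
            depth[pos] += 1
    return depth

def call_peaks(depth, min_depth):
    """Merge consecutive positions with depth >= min_depth into (start, end) peaks."""
    peaks = []
    for pos in sorted(p for p, d in depth.items() if d >= min_depth):
        if peaks and pos == peaks[-1][1]:
            peaks[-1][1] = pos + 1      # extend the current peak
        else:
            peaks.append([pos, pos + 1])  # start a new peak
    return [tuple(p) for p in peaks]

# Hypothetical mapped reads: two pile-ups and one isolated read.
reads = [(100, 110), (102, 112), (104, 114), (105, 115),
         (300, 310),
         (500, 510), (502, 512), (503, 513)]
depth = coverage(reads)
peaks = call_peaks(depth, min_depth=3)
print(peaks)
```

The isolated read at 300 never reaches the threshold, so only the two pile-ups are reported as candidate binding regions.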

Answer 5

1. Chromatin in nucleus is cross-linked (Tf cannot detach) → fragment it
2. DNA fragments include those w/target protein bound → incubate w/antibody specific for Tf of interest
3. Immunoprecipitation → everything bound to Tf will precipitate (normally use magnetic beads that recognise the antibody to keep fragments bound to Tf stuck in tube)
4. Reverse cross-linking
5. Use specific proteases to degrade Tf → have DNA only
6. Sequence using next-gen sequencing
7. Map reads back onto genome → reads only map in regions where Tf was binding

Answer 6

* Also use ChIP-Seq
* Fractionate DNA → use antibody that binds to modified histone being studied → separate by immunoprecipitation → sequence isolated fraction using next-gen seq → map sequence back onto genome to identify regions w/modified histones, i.e. genes under regulation

Answer 7

* Encyclopaedia of DNA Elements
* 400 scientists involved
* 5-year project completed in 2012 → identify all functional elements in human genome sequence
  o Goal: to identify and characterise everything in the genome that is non-coding → the DNA/RNA regions regulated and the factors regulating them
* 2003: human genome was sequenced → now have a reference genome allowing development of this project

Answer 8

* Looked at how many different genes are expressed in different cell types
* Profiled binding of Tf across ~56 different cell types using ChIP-Seq
* Profiled DNA methylation across ~56 different cell types
* Looked at chromatin conformation
  o Promoters & enhancers need to be in contact w/each other → contacts can be profiled by different techniques → look at 3D genome & chromatin
* Profiled all accessible regions of chromatin (euchromatin state)
  o Euchromatin = less condensed, gene-rich, more easily transcribed; nucleosomes are depleted; DNA is accessible for binding of Tf
  o Heterochromatin = highly condensed, gene-poor, transcriptionally silent

Answer 9

* Regulating gene expression → complex → involves many regulatory factors → Tf, enhancers, silencers, methylation patterns
* Transcripts regulated by splicing factors, e.g. exon and intron splicing enhancers and silencers
* Functionality often cell/tissue and time specific

Answer 10

* Hypersensitive (HS) sites = regions of chromatin highly sensitive to DNase I
  o Nucleosome-depleted → less compact → enables DNA to bind to proteins, e.g. Tf
* DNase-Seq = technique where DNase I cuts DNA only where it is accessible → isolate fragments → ligate adaptors → amplify → cluster on flow cell → sequence (next-gen) → reads mapped onto genome → they pile up in regions where chromatin was open and accessible
* Mapping HS sites → identify location of genetic regulatory elements, e.g. promoters, enhancers, silencers, locus control regions (ChIP-Seq only identifies binding sites of the protein targeted by the antibody, e.g. a Tf)

Answer 11

* Open chromatin regions sequenced and mapped to reference genome
* Nuclear extraction → DNase I digestion → library preparation → PCR amplification → high-throughput sequencing

Answer 12

* Number of fragments that map to a sequence is a measure of regulatory activity
* Sites bound by some Tfs show highly specific patterns of DNase I cleavage = 'DNase footprints'
* Footprints used to identify binding of specific Tfs; advantage over ChIP-seq:
  o w/ChIP-seq need to know the transcription factor for immunoprecipitation
  o DNase-seq identifies the binding sites de novo
* DNase-seq is a genome-wide version of the DNA footprinting method
* Footprint = prediction of what Tf could bind

Answer 13

* Similar method to DNase-Seq, with addition of formaldehyde for cross-linking
  o Cross-linking is more efficient in nucleosome-bound DNA than in nucleosome-depleted regions of genome
* Phenol-chloroform extraction of DNA → treatment to isolate nucleic acid from solutions
  o Cross-linked chromatin will go to bottom of tube (organic phase)
  o Condensed chromatin at bottom; active chromatin at top of tube (aqueous phase)
  o These increase resolution and reduce noise
* DNA extracted and mapped to reference genome to identify open DNA regions
* FAIRE-Seq: higher coverage at enhancer regions than at promoter regions
* DNase-Seq: higher sensitivity towards promoter regions

Answer 14

* Uses mutated hyperactive transposase Tn5 instead of DNase I
* Tn5 enzyme derived from transposons → cuts open, active euchromatin efficiently
* DNA fragments then isolated, sequenced, and mapped
* Advantages:
  o Requires smaller sample than DNase-seq and FAIRE-seq (which require ~1000x more cells)
  o V. fast → completed in 3 hours
* Disadvantages:
  o V. expensive because of monopoly

Answer 15

* Uses micrococcal nuclease
* Cuts v. near to nucleosomes

Answer 16

* ChIP-seq requires antibody
  o Profiling something specific, e.g. one Tf of interest
* All others don't need antibody
  o Less specific → profile all active chromatin regions

Answer 17

* Within each chromosome: TADs (topologically associating domains) = portions of chromosome isolated from each other
  o One TAD does not interact w/other TADs → enhancers/promoters can only regulate genes in same TAD (v. few exceptions)
* Lowest possible level = looping
* Knowing interactions is essential for understanding mechanisms of gene regulation in health and disease
* All possible interactions can be profiled by different techniques (e.g. ChIA-PET, 3C, 4C, Hi-C, etc.)
  o All based on same approach w/one variation
  o Use restriction enzymes: fragmenting genome → religation → profile what is ligated
* ChIA-PET → studies genome-wide long-range chromatin interactions involving protein factors
  o Involves additional step: antibody precipitation → to identify chromatin interactions that are regulated by a specific transcription factor, between distal and proximal regulatory sites and their associated promoters
* Different linkers on the PETs (paired-end tags) are used to identify non-specific ligation noise, i.e. ligations between different ChIP complexes

Answer 18

* V. accurate; can be reproduced
* Methods similar to DNA-protein interaction identification
* RIP-seq involves immunoprecipitation of RNA-binding protein (RBP) of interest → has to be done non-stringently
  o Low stringency = low specificity
* Developed into CLIP-seq → includes cross-linking step using UV light (irreversible)
* Final steps:
  o Digestion with proteinase K, leaving a peptide at the binding site that modifies nucleotides to create cross-linking induced mutation sites (CIMS)
  o Reverse transcription to make cDNA → identify RNA-binding sites
  o Sequenced and mapped to transcriptome

Answer 19

* PAR-CLIP = improves cross-linking w/photoreactive RNA nucleotides
* iCLIP = uses reverse transcriptase stalling to map individual nucleotide-protein interactions
* miCLIP = uses a modified RNA methylase to map its binding sites

Answer 20

* All 'seq' methods use similar approaches and the same final step to identify regions of interest (protein binding site, open chromatin, etc.)
* Method:
  o Isolate sequence, e.g. fragmentation and immunoprecipitation, phenol/chloroform, etc.
  o Sequence fragment and map back to genome

Answer 21

* Most of the genome is "functional" → controversial statement from ENCODE
  o ENCODE considers anything transcribed must be functional → but many transcripts are non-functional, e.g. pseudogenes
* ENCODE emphasised sensitivity over specificity → led to false positives
* Criticism: arbitrary choice of cell lines and transcription factors; lack of appropriate control experiments

Answer 22

* Conducted in immortalised cell lines (derived from human cancers → v. easy to manipulate but v. unstable)
  o Want something closer to real healthy cells
  o E.g. HeLa cells sometimes have 3 or 4 copies of a chromosome → not diploid
* Roadmap consortium
  o Profiled histone modifications (which mark different chromatin states) across 25 human primary tissues

Answer 23

* DNA sequence variations → occur when a single nucleotide (A, T, C, or G) in genome sequence is altered; must occur in at least 1% of the population
* SNPs make up ~90% of all human genetic variation; occur approx. every ~1000 bases; ~4-5 million SNPs in an individual human genome
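The "at least 1% of the population" rule can be sketched as a minor allele frequency check. The genotype counts and function name below are made-up assumptions for illustration, and the 1% cut-off is applied here to allele frequency.

```python
def minor_allele_frequency(genotypes):
    """genotypes: list of two-letter strings like 'AG' (one per diploid person).

    Returns (minor_allele, frequency); (None, 0.0) if only one allele is seen.
    """
    counts = {}
    for genotype in genotypes:
        for allele in genotype:
            counts[allele] = counts.get(allele, 0) + 1
    if len(counts) < 2:
        return None, 0.0
    minor = min(counts, key=counts.get)
    return minor, counts[minor] / sum(counts.values())

# Hypothetical cohort of 100 people at one genomic position.
cohort = ["AA"] * 96 + ["AG"] * 3 + ["GG"] * 1
allele, freq = minor_allele_frequency(cohort)
is_snp = freq >= 0.01  # the 1% threshold from the card above
print(allele, freq, is_snp)
```

Each person contributes two alleles, so 100 people give 200 alleles in the denominator.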

Answer 24

* Can affect how humans develop diseases
* Can affect how an individual responds to pathogens
* Can affect how an individual responds to chemicals
* Can affect how an individual responds to drugs, etc.
* Potentially their greatest importance in biomedical research is for comparing regions of the genome between cohorts
* Comparing cohorts with and without a disease – GWAS

Answer 25

* Intergenic region → possibly transcription enhancer/regulatory region
* Within promoter or transcription factor binding region
* Within exon → could affect protein coding
* Within intron → possibly regulatory region, e.g. affecting splicing

Answer 26

* Can be v. dangerous or neutral
* SNPs may be direct cause of disease or signal for increased likelihood of disease
* Disease-associated SNPs:
  o Monogenic → one nucleotide change leads to disease; relatively easy to detect/analyse; simple traits
  o Polygenic → many nucleotide changes affect probability of disease; hard to detect/analyse; complex traits

Answer 27

* Coding SNPs = potentially disease causing as they can affect the protein
* Types: synonymous (silent) & non-synonymous
* Synonymous mutation: base changes but AA is the same
  o May still affect Exon Splicing Enhancer (ESE) or Exon Splicing Silencer (ESS) sites so cannot always be ignored
* Non-synonymous: change in base changes AA → mutation could be detrimental
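A small sketch of how a coding SNP could be classified as synonymous or non-synonymous using the standard genetic code. The function name and example codons are illustrative; the table is built in TCAG order from the standard code.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def classify_coding_snp(ref_codon, alt_codon):
    """Return (kind, ref_aa, alt_aa) for a single-codon change."""
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    kind = "synonymous" if ref_aa == alt_aa else "non-synonymous"
    return kind, ref_aa, alt_aa

silent = classify_coding_snp("GAG", "GAA")    # third-base change, Glu stays Glu
missense = classify_coding_snp("GAG", "GTG")  # Glu -> Val
print(silent, missense)
```

Note that a "synonymous" result only says the amino acid is unchanged; as the card notes, splicing signals could still be affected.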

Answer 28

* Transition (Ti) - most common substitution
  o Replacing purine by purine, e.g. A → G, or pyrimidine by pyrimidine, e.g. T → C
* Transversion (Tv) - less common
  o Replacing purine by pyrimidine or vice versa, e.g. A → C
* Ti/Tv ratio – varies within genome; used to assess GWAS data quality
  o Across entire genome averages around 2
  o In protein-coding regions typically higher, often above 3, due to transversions in third base of codon being more likely to change the encoded amino acid
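The Ti/Tv ratio can be computed from a list of (reference, alternate) base pairs. This is a minimal sketch; the variant list is made up and the function names are illustrative.

```python
PURINES = {"A", "G"}  # pyrimidines are C and T

def is_transition(ref, alt):
    """True for purine<->purine or pyrimidine<->pyrimidine substitutions."""
    return (ref in PURINES) == (alt in PURINES)

def ti_tv_ratio(snps):
    """snps: iterable of (ref, alt) single-base substitutions with ref != alt."""
    ti = tv = 0
    for ref, alt in snps:
        if ref == alt:
            raise ValueError("ref and alt must differ")
        if is_transition(ref, alt):
            ti += 1
        else:
            tv += 1
    if tv == 0:
        raise ValueError("no transversions; ratio undefined")
    return ti / tv

# Hypothetical call set: 4 transitions, 2 transversions.
calls = [("A", "G"), ("C", "T"), ("G", "A"), ("T", "C"), ("A", "C"), ("G", "T")]
ratio = ti_tv_ratio(calls)
print(ratio)
```

A ratio far from the expected genome-wide value would be one hint that a call set contains many errors.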

Answer 29

* Inherited blood disorder due to mutations in beta globin gene (HBB)
* Found primarily in African and related populations
* Fragile, sickle-shaped cells deliver less oxygen to the body's tissues
* Get stuck more easily in small blood vessels; break into pieces that interrupt healthy blood flow
* Symptoms: shortness of breath; infections (bone, gall bladder, etc.); joint pain
* Causes:
  o Mutation of β-globin gene at AA position 6 (HbS) - GAG → GTG: Glutamic acid → Valine
  o Only individuals homozygous for allele (T:T genotype) have sickle cell anaemia
  o Autosomal recessive mutation

Answer 30

* Early onset familial
  o Hereditary; ~40yo
  o V. rare, ~5% of all cases
  o Caused by mutation in amyloid precursor protein (APP) or presenilin-1 (PS1)
* Sporadic late onset
  o ~70yo
  o Associated w/many genes → e.g. Apolipoprotein E (ApoE)
* ApoE contains 2 SNPs resulting in 3 possible alleles for the gene: E2, E3, E4
  o Protein product of each allele differs by one amino acid
  o E3 no effect regarding Alzheimer's
  o 1x E4 allele → greater chance of developing Alzheimer's
  o 1x E2 allele → person is less likely to develop Alzheimer's
  o SNPs only shift risk, not absolute indicators: someone w/2x E4 alleles may never develop Alzheimer's, while someone w/2x E2 alleles may still develop it

Answer 31

* Studies report: disease-associated SNPs are enriched in regulatory DNA regions
  o Enhancers (~1 million or more enhancers in human genome)
  o Silencers
  o Locus control regions
  o Promoters
  o Long non-coding RNAs maintaining higher order structure of 3D genome
* 98% of T2 diabetes associated SNPs were non-coding
* Changing 1 nucleotide in motif → Tf does not bind anymore → enhancer is not activated

Answer 32

* Accounts for ~10% of all mutations causing human inherited disease
* Splice sites found in proximity to exon → provide signal for proteins to cut RNA
* If SNP in splice site → cannot cut anymore
* SNP most likely causes total loss of associated exon, or introduces cryptic splice site
* OAS1 gene – associated w/T1 diabetes
* Can have synonymous mutation in coding DNA that still affects splicing machinery

Answer 33

* Can cause:
  o Disrupted start codon
  o Disrupted stop codon
  o Disrupted splice site
  o Frame shift
* In a frameshift mutation, a base is inserted/deleted, altering the codon in which the insertion or deletion took place, but also changing the reading frame so that all codons downstream are read out of frame → produces string of amino acid substitutions before a stop codon is reached (stop codons are frequent in coding sequences read out of frame)
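The frameshift effect described above can be seen by translating a short sequence before and after deleting one base. The sequence is a made-up example and the helper names are illustrative; the codon table is the standard genetic code in TCAG order.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(dna):
    """Translate from the first base, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

def delete_base(dna, index):
    """Return dna with the base at the given 0-based index removed."""
    return dna[:index] + dna[index + 1:]

original = "ATGGCATTGACCGGTAAATAA"   # ATG GCA TTG ACC GGT AAA TAA
mutant = delete_base(original, 3)    # ATG CAT TGA ... now read out of frame
print(translate(original), translate(mutant))
```

Here the single deletion changes the second amino acid and then exposes an out-of-frame stop codon, so the protein is truncated.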

Answer 34

* GWAS = if 500 people with the same disease all share a half dozen SNPs in common, but a group of 500 healthy people don't share those SNPs, the mutations behind the disease are probably around those SNPs (now more like 10,000 people)
  o GWAS has identified genetic variations that contribute to risk of: T2 diabetes, Parkinson's, heart disorders, obesity, Crohn's disease, prostate cancer…
* GWAS is a very strong area of research → look at common SNPs → statistics
* The difficulty is accurately identifying the SNPs
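The "statistics" step can be sketched for one SNP as a 2x2 allele-count test between cases and controls. The counts below are invented, the function name is illustrative, and a real GWAS would also correct for testing very many SNPs and for confounders such as ancestry.

```python
import math

def allelic_association(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Chi-square test (1 degree of freedom) and odds ratio on allele counts."""
    a, b, c, d = case_alt, case_ref, ctrl_alt, ctrl_ref
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom the chi-square tail probability is erfc(sqrt(x/2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (a * d) / (b * c)
    return chi2, p_value, odds_ratio

# Hypothetical SNP: 500 cases and 500 controls, i.e. 1000 alleles per group.
chi2, p_value, odds_ratio = allelic_association(
    case_alt=300, case_ref=700, ctrl_alt=200, ctrl_ref=800
)
print(round(chi2, 2), p_value, round(odds_ratio, 2))
```

An odds ratio above 1 means the alternate allele is more common in cases; the p-value says how surprising that difference would be by chance for this one SNP.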

Answer 35

* Genomics England project in collaboration w/NHS
* Aim: to sequence the genomes from approximately 70,000 people
  o Longer term aim → research on new & more effective treatments
* Participants are NHS patients w/cancer or rare disease
* Genomes of families of patients w/rare diseases also sequenced → identify variants associated w/different conditions
* Objective: to create a new genomic medicine service for the NHS
* Patients may be offered a diagnosis where this wasn't possible before

Answer 36

* Sequence multiple genomes from a population at low coverage and pool the data
* Align to reference and identify variants
* Pooling works as most of the genome will be the same; some individuals will also share variants
* Variant prediction software identifies which variants are real and which are sequencing errors

Answer 37

* Mostly white people's genomes (UK or US) → no diversity → no representative pool
* Not easy to do functional validation
  o SNP may have effect in one cell type but not in another
* Coding mutations alter AA sequence of protein → effect is clear
  o Effect of non-coding mutations that disrupt regulatory elements is less clear
* Regulation of gene expression is dependent on multiple factors:
  o Cell type
  o Temporal patterns such as circadian clock
  o Cell/tissue development
* GWAS identifies candidate SNPs but confirmation requires additional work
* Biggest limitation: linkage disequilibrium and GWAS
  o Linkage disequilibrium = the association of alleles at two or more loci within a population → haplotypes don't occur at expected frequencies - not random
  o Can be used to improve genetic association studies, e.g. of cancer
  o Enables identification of genetic markers for the associated disease
  o E.g. 6 SNPs found in people w/Alzheimer's → 3 found in linkage disequilibrium → cannot figure out which SNP is causative of disease
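Linkage disequilibrium between two SNPs is commonly summarised by D (observed minus expected haplotype frequency) and r². A minimal sketch, assuming phased haplotypes are available; the haplotype counts and function name are made up.

```python
def linkage_disequilibrium(haplotypes, allele_a, allele_b):
    """haplotypes: list of (allele_at_locus_1, allele_at_locus_2) tuples.

    Returns (D, r_squared) for allele_a at locus 1 and allele_b at locus 2.
    """
    n = len(haplotypes)
    p_a = sum(h[0] == allele_a for h in haplotypes) / n
    p_b = sum(h[1] == allele_b for h in haplotypes) / n
    p_ab = sum(h == (allele_a, allele_b) for h in haplotypes) / n
    d = p_ab - p_a * p_b  # zero if the two loci are independent
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d, d * d / denom

# Hypothetical population of 100 haplotypes.
haps = [("A", "C")] * 40 + [("A", "T")] * 10 + [("G", "C")] * 10 + [("G", "T")] * 40
d, r2 = linkage_disequilibrium(haps, "A", "C")

# Two SNPs that are always inherited together cannot be told apart (r² = 1).
linked = [("A", "C")] * 50 + [("G", "T")] * 50
d_linked, r2_linked = linkage_disequilibrium(linked, "A", "C")
print(d, r2, r2_linked)
```

The second case is the situation in the Alzheimer's example: with r² near 1 the association signal is the same for both SNPs, so the data alone cannot say which one is causal.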

Answer 38

* First DNA sequencing → then RNA sequencing to quantify gene expression → involves mapping variants which alter gene expression
* eQTL = non-coding SNPs known to affect expression of a specific gene; variants associated w/RNA levels
* eQTL mapping enables identification of regulated genes → unlikely to be close to disease-associated SNP
* Cis eQTL = affects expression of nearby gene
* Trans eQTL = does not map close to gene; could be on another chromosome
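One simple way to test whether a variant is associated with RNA levels is to regress expression on the number of alternate alleles each person carries (0, 1 or 2). A minimal sketch with invented values; the function name is illustrative, and real eQTL pipelines add covariates and significance testing.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical data: alternate-allele dosage and expression of one gene.
dosage = [0, 0, 1, 1, 2, 2]
expression = [10.0, 12.0, 15.0, 17.0, 20.0, 22.0]

slope, intercept = fit_line(dosage, expression)
print(slope, intercept)  # slope = change in expression per extra alternate allele
```

A slope clearly different from zero suggests the SNP behaves as an eQTL for that gene; whether it is cis or trans depends on where the SNP sits relative to the gene.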

Answer 39

* 1000 people → get DNA → find SNPs → get RNA → quantify gene expression → leads to identification of disease causing genes

Answer 40

* To only find out if pre-defined SNPs are present:
  o Microarrays w/probes for specific SNP → DNA only binds to probe if SNP is present

Answer 41

* Epigenetics = 'above' genetics → external modifications to chromatin that turn genes on/off
  - Modifications do not change DNA sequence → they affect how cells read genes
* Epigenetic changes alter physical structure of DNA
  - E.g. DNA and histone methylation
* Epigenetic modifications can be inherited → "An epigenetic system should be heritable, self-perpetuating, and reversible" (Bonasio et al. - Science 29 October 2010: 612-616)
* Depends on environment, diet, smoking

Answer 42

* Genome is condensed and compacted into nucleosomes
* 146bp of DNA wrapped around histone octamer (8 → 2x each of H2A, H2B, H3, H4)
* Space between nucleosomes = linker DNA
* Nucleosomes disassemble then reform during replication
* Nucleosomes = repeating units of chromatin
* Interaction DNA-histones = sequence independent (H-bonding & ionic interactions w/sugar-phosphate backbone)

Answer 43

* Sits outside each nucleosome
* Structural function to keep nucleosome together

Answer 44

* Histone tails (the tails of the spheres) can be methylated and acetylated = histone modifications
* Covalently modified

Answer 45

1. Constitutive heterochromatin
   - H3K9me2/me3
   - H3 = histone H3; K9 = lysine 9; me2/me3 = di- or tri-methylation
   - Older → weaker methylation of histones → heterochromatic regions get aberrantly activated, e.g. in Alzheimer's
   - Permanent
2. Facultative heterochromatin
   - Regions can turn on/off when necessary (e.g. gene promoters that don't need to be active at all times, i.e. developmental genes)
   - H3K27me2/me3 → di- or tri-methylation of Lys27 on H3
   - Non-permanent → deposited by PRC2
* Need heterochromatin → otherwise DNA too big
* Histone modifications repress transposons

Answer 46

* EZH2 = catalytic subunit (enzyme)
* Other subunits in complex: EED + SUZ12

Answer 47

* Always acetylated at Lys27 of H3; mono-methylated (1x) at Lys4
* Sometimes methylation is a marker of activity
* Signature used by Tf to understand where to go, and whether chromatin is active or not
* Active promoters have trimethylation of H3 Lys4

Answer 48

* Histone acetylases and deacetylases
* Methyltransferases and demethylases
* If these are impaired → histone code is aberrant → have inappropriate labels → e.g. gene that should be repressed is labelled w/active markers → leads to disease, e.g. cancer

Answer 49

* Epigenetic change that silences tumour suppressor gene → leads to uncontrolled cellular growth
* Turning off genes that help repair damaged DNA → leads to increase in DNA damage → cancer risk
* Prostate cancer associated w/gene silencing by CpG island hypermethylation within promoter region of GSTP1 gene
* If issue upstream in epigenetics → leads to cascade of problems

Answer 50

* Example of epigenetics
* Marsupials: paternal X chromosome always silenced
* Tortoiseshell cat:
  o All female
  o Black/orange alleles of fur coloration are on X chromosome
  o If heterozygous → resulting colour depends on which X is inactivated
  o Tortoiseshell pattern (phenotype) determined by X inactivation

Answer 51

* Genetically identical mice
* Mother 1 → skinny, brown mouse → methyl-rich diet → methylated agouti gene repressed
* Mother 2 → obese, yellow mouse prone to diabetes/cancer → unmethylated agouti gene expressed
* Agouti gene common to all mammals

Answer 52

* Identical twins = identical genes → differences due to epigenetic changes → lead to different disease susceptibility, for example
* Label different histone modifications w/fluorescent probes → chromosome pairs in twins digitally superimposed → one tag red, other green → overlapping shown as yellow

Answer 53

* Histone modifications can be inherited → environmental factors impact it (e.g. smoking mother)
* Changes in epigenome inherited due to mechanisms (still under study) that allow cells of offspring to remember epigenome of parents
* Complicated to understand where inheritance is coming from → always present in family? → something environmentally driven?
* Need to show that epigenetic effect can pass through enough generations to rule out possibility of direct exposure → potentially 3 generations at once exposed to same environmental conditions → proving epigenetic inheritance requires epigenetic change in 4th generation
* WW2 in Netherlands → extreme lack of food → diet poor in methyl groups → kids born now show these effects → e.g. hypomethylation of IGF2 (insulin-like growth factor 2) involved in diabetes and cardiovascular diseases

Answer 54

* DNA of human sperm is highly methylated; eggs less
* Egg fertilised → methylation/acetylation in chromatin largely erased, esp. from paternal genome
* As embryo develops, methylation marks continue to be lost from maternal genome up to blastocyst stage
* Not all methylation erased, hence inheritance possible
* After this → cells differentiate → DNA becomes methylated again for specialised cell types

Answer 55

* Methylation patterns normally erased in primordial germ cells
* Methylation marks converted to hydroxymethylation then progressively diluted as cells divide
  o V. efficient mechanism → resets methylation patterns of genes for each generation
* Research found: some rare methylation can escape this reprogramming process → can be passed on to offspring → enables inheritance of epigenetic traits

Answer 56

* For most genes, 2 working alleles are inherited (one from mother, one from father)
* For imprinted genes, only one working copy is inherited → other is epigenetically silenced by addition of methyl groups → repressed throughout life but reset during egg/sperm formation
* Expression of gene comes from only 1 allele → less protein or less variation
* Almost all imprinted genes associated w/growth

Answer 57

* Theory of why imprinting happens
* Males can have multiple offspring w/multiple partners simultaneously at low cost
* Females can have only one set of offspring at a time, at high cost of personal resources
* Many imprinted genes involved in growth and metabolism
* Paternal imprinting favours production of larger offspring; maternal imprinting favours smaller offspring

Answer 58

* Plants depend on epigenetic processes for proper function
* Flowering controlled by set of genes affected by environmental conditions through alteration in expression pattern → ensures production of flowers even when plants are growing under adverse conditions
* Epigenetic modifications include DNA methylation, histone modifications, and production of micro RNAs (miRNAs)
