Lecture #3 (Transcription Factors and Chromatin) Flashcards
Chromatin (Overall)
Chromatin = Everything co-associated with DNA
- When you spin down DNA, there are proteins and RNA that come with it (weight of proteins = weight of DNA)
Chromatin = DNA + RNA + Histone proteins + Non-histone proteins
- Chromatin = Nucleosomes (DNA wrapped around histones) + Non-histone proteins
Two types of proteins that come with DNA
- Histones - Form the fundamental units of chromatin organization
- Histones = basic, positively charged proteins; there are 5 histone proteins
- Non-histone proteins = acidic proteins
- There are thousands of non-histone proteins (DNA/RNA polymerases, DNA-binding proteins, etc.)
There is about as much histone protein as non-histone protein by weight
Epigenome
Really just another term for chromatin
Two components of epigenome:
1. Transcription factors/Regulatory DNA elements
2. Chromatin
Genome + Epigenome
Genome + epigenome work together to make organisms
Genome + Epigenome = molecular blueprint for everything (Ex. growth + adaptation + differentiation)
- Also affects hormonal signals between organs
There is NOTHING in biology that is not affected by gene expression via the epigenome
Mutations in epigenome
Mutations that affect DNA and RNA in the epigenome –> lead to diseases
There are many pathologies related to dysfunction of the epigenome
Nucleosome
Nucleosome = fundamental unit of chromatin
Conservation of Histone proteins
Histone proteins = the most conserved proteins on the planet
- H4 is the most conserved of the histones
Histones = small –> 100-120 amino acids –> H4 in mammals vs. plants has a 1 amino acid difference (VERY conserved)
- Other histones have more divergence but are still conserved
Conservation = allows you to study histones in model organisms –> the principles will apply to other organisms, including humans, because they have similar proteins
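To put "1 amino acid difference" into numbers: a minimal sketch, assuming the two protein sequences are already aligned with no gaps, that computes percent identity. The sequences below are made-up placeholders, not real H4 sequences.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of identical positions in two pre-aligned, equal-length sequences (no gaps)."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and the same length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# Hypothetical 100-residue sequences differing at one position (placeholders only).
protein_a = "MSGRGK" + "A" * 94
protein_b = "MSGRGR" + "A" * 94
print(percent_identity(protein_a, protein_b))
```

For a ~100-residue histone, a single substitution still leaves about 99% identity.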
Underlying principles between organisms
Underlying principles of cell and tissue differentiation in Eukaryotes
Example - the principles of turning on genes in model organisms are not very different from turning on genes in humans
- Can pick whichever organism is easiest to study and translate the findings to humans
Genotype-Tissue Expression project (GTEx)
Mapped expression of every gene across tissues of the human body
Take homes:
1. Human genome has 20,000 protein coding genes
2. Human genome has 10,000-13,000 non-coding RNAs
3. In any cell type only about HALF the genome is expressed (a little above half) –> ONLY 10,000-13,000 of the protein-coding genes are expressed; 7,000-10,000 of the protein-coding genes are silenced
- Need to make sure the right genes are silenced or expressed (different in different cell types)
4. 8,000 of the expressed protein-coding genes carry out core activities (ubiquitous) - what every cell needs to survive and replicate
5. 2,000-5,000 of the PCGs are preferentially expressed in certain cell types (expressed everywhere but enhanced in some cells and not others)
6. 200 of the PCGs are tissue specific
- Want to look at the 200 master proteins to look at cell type and cell fate
Ubiquitous genes in genome
Housekeeping genes = ubiquitous genes = ALWAYS ON
How do they get turned on?
- Example - Insulin gene –> HAS transcription factors that are specific to the cell type where it is expressed
What regulates eukaryotic transcription
Eukaryotic transcription is regulated by proximal (promoter) and distal (enhancer) DNA elements
- Instructions for initiating transcription are near the transcription start site AND can be far away
Enhancers = located away from the promoters (1 kb-1 Mb)
- Enhancers are genetically validated as transcriptional control elements –> mutations of enhancers affect gene expression
- Enhancers = can be upstream or downstream of the promoter –> feed information to the transcription start site where RNA polymerase binds
- Enhancers = orientation-independent (still work if the sequence is flipped around)
- Enhancers = regulate a target that is far away
- Enhancers = enhance transcription
- Enhancers = just as important as promoters
- Can be affected by environmental signals (gives complexity to gene regulation)
Regulatory DNA Elements
Regulatory DNA elements = short DNA sequences (10-20 BP) –> elements are recognized by transcription factors
- Transcription factor proteins recognize 10-20 BP
- More BP in the sequence = more specific (lower probability that the sequence exists by chance genome-wide) –> TFs have evolved to recognize specific sequences by making them longer (Ex. recognition sites are longer than restriction enzyme sites)
Most transcription factors recognize 100-1,000 sites, but some recognize only 1 site
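Rough numbers behind "more BP = more specific": a minimal sketch, assuming a random genome of ~3.1 billion BP with equal A/C/G/T frequencies (real genomes are not uniform), that estimates how often one exact site of a given length shows up by chance on either strand.

```python
def expected_random_sites(site_length: int, genome_length: float = 3.1e9) -> float:
    """Expected chance matches of one exact site in a random, uniform-composition genome.

    Both strands are searched (~2 * genome_length positions), and each position
    matches with probability (1/4) ** site_length.
    """
    return 2 * genome_length * 0.25 ** site_length


for k in (6, 10, 15, 20):
    print(k, round(expected_random_sites(k), 3))
```

Under these assumptions a 10-BP site turns up by chance thousands of times, a 15-BP site only a handful of times, and a 20-BP site well under once. Real TFs also tolerate mismatches, which is one reason they still hit 100-1,000 sites.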
Heat shock protein transcription factors
Heat shock proteins = protein chaperones that keep the proteome from denaturing at high temperature –> there is a set of heat shock protein-coding genes, and each has an upstream regulatory element (a 15-nucleotide element) –> this element is unique to the set and binds the heat shock transcription factor, which switches on the whole set of genes at once because they share the same upstream regulatory sequence
WATCH VIDEO
Promoters
Promoters = where transcription begins
Promoters and enhancers have TF binding sites
Discovery of promoters/enhancers
Discovered promoters and enhancers by knocking out promoters/enhancers
Can do systematic deletions –> detect the enhancer –> see if the gene stops working or decreases in expression
- Find enhancers/promoters by knockout
Enhancer DNA elements
Enhancer = DNA elements that are similar to elements at the promoter –> bind TFs that may or may not be shared with the TFs that bind the promoter (may or may not be the same TF)
IN IMAGE - enhancers farther away have different TFs
Sequence-specific TFs are different from general transcription factors
- General transcription factors = found at the promoter
DNA-protein and protein-protein interactions need to communicate to get RNA polymerase to the promoter and signal that it is time to move
Model for enhancer-promoter interaction
Model for how enhancers can work despite being far away from the promoter = enhancer-promoter interaction by looping, facilitated by the cohesin ring (physical proximity)
Before - thought the signal oozes down the chromatin fiber
NOW - think DNA can bend, allowing the enhancer to contact the promoter and lock together with the cohesin ring
- IF you mutate a cohesin protein –> destroys the ring structure = impairs enhancer-promoter communication (doesn't REALLY prove that the model is right)
- Promoter recruits RNA polymerase –> polymerase can load onto the promoter
STILL DEBATED - some think it could be an indirect protein cluster (condensate) between the enhancer and promoter instead of direct contact
Evidence that distant DNA elements are close to each other
Evidence that DNA sequences far apart linearly are close spatially - uses chromatin conformation capture
Chromatin conformation capture - take cells –> crosslink DNA and proteins using formaldehyde (crosslinks the lysine residues) –> digest the DNA; crosslinks keep the two fragments connected –> ligate the DNA (if two things are close, they will ligate) –> sequence –> if DNA is ligated to something that should be far away, something must have brought them together to be able to ligate
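The sequencing readout above boils down to counting ligation junctions. A minimal sketch, assuming each sequenced read has already been mapped to a pair of restriction-fragment IDs (hypothetical integers here), that tallies each unordered pair: the raw count behind a Hi-C-style contact map. Dropping same-fragment reads is a simplification of this sketch.

```python
from collections import Counter


def contact_counts(junctions):
    """Count ligation junctions per unordered pair of restriction fragments.

    `junctions` holds (fragment_a, fragment_b) IDs from chimeric ligation
    products. Each pair is sorted so A-B and B-A count as the same contact;
    reads joining a fragment to itself are skipped.
    """
    counts = Counter()
    for a, b in junctions:
        if a == b:
            continue
        counts[tuple(sorted((a, b)))] += 1
    return counts


# Hypothetical junctions between fragments numbered along one chromosome.
reads = [(1, 40), (40, 1), (1, 2), (2, 3), (40, 1), (5, 5)]
print(contact_counts(reads))
```

A high count for a linearly distant pair like fragments 1 and 40 is what gets read as "close in 3D", but strictly it is a ligation frequency, not a measured contact.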
Issue with Chromatin Conformation Capture
Issue = measuring ligation NOT contact
- Looking at the genome shows TADs (TADs = regions that are ligatable)
In chart - peak of the triangle shows ligation –> means that the places at the two bottom corners of the triangle (regions far away on the chromosome) are interacting at the top point of the triangle
- Interpret this as the DNA sequences being close, BUT really they are just ligatable
The sequences COULD be far apart and still be ligatable –> they could be brought together by a cytoskeletal protein (if things are far away and moving randomly, you get a certain low ligation frequency, BUT if there is a cytoskeletal cable between the two, they will be connected and can have a higher ligation frequency)
- Connected but far apart, yet still have increased ligation (flaw in Hi-C)
2nd way to see 3D chromosomal interactions
Use Genome Architecture Mapping (GAM) –> high-resolution sectioning of the cell and part of the nucleus
Microdissect individual slices –> put slices in wells for genome analysis –> measure co-segregation frequency of two parts of the genome
GAM often confirms Hi-C results (main results between GAM and Hi-C are similar)
GAM = has single-cell sensitivity (1 nucleus at a time) vs. a population of cells in a test tube for the ligation reaction in Hi-C
Can detect multiple interactions in 1 section (Ex. 10 things coming together) vs. Hi-C only sees interactions between 2 things
Can detect interactions of super-enhancers and active genes as triplets across Mbp distances (see enhancers and proximal genes)
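The co-segregation idea can be sketched as: record which loci were detected in each slice, then the co-segregation frequency of a pair (or a triplet) is the fraction of slices in which all of them show up together. A minimal sketch with made-up detections for a hypothetical enhancer (E) and two genes (G1, G2); real GAM analyses also normalize for how often each locus is detected on its own, which this skips.

```python
from itertools import combinations


def cosegregation(slices, loci):
    """Fraction of nuclear slices that contain every locus in `loci`.

    `slices` is a list of sets; each set holds the loci detected in one
    microdissected nuclear slice.
    """
    if not slices:
        raise ValueError("need at least one slice")
    wanted = set(loci)
    hits = sum(1 for detected in slices if wanted <= detected)
    return hits / len(slices)


# Hypothetical detections (not real data).
slices = [
    {"E", "G1", "G2"},
    {"E", "G1"},
    {"G2"},
    {"E", "G1", "G2"},
]
for pair in combinations(["E", "G1", "G2"], 2):
    print(pair, cosegregation(slices, pair))
print(("E", "G1", "G2"), cosegregation(slices, ["E", "G1", "G2"]))
```

The same function handles pairs and triplets, which is the point of the "multiple interactions in 1 section" card: a ligation readout only ever joins two fragments at a time.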
Issue with GAM
Resolution - issue with how thin you can slice
Have a proximity limit of 220 nm BUT a nucleosome is 10 nm –> means you could have 22 nucleosomes in each slice
Super enhancers
A subset of human genes are regulated by super-enhancers (common in pluripotency genes and oncogenes)
Idea of what a super-enhancer is:
1. Super-enhancer is its own thing
- Ex. Cell cycle-associated genes + tumor suppressor genes + oncogenes = regulated by 10-20 kb super-enhancers
- Enhancer = only a few 100 BP (super-enhancer is longer)
2. Super-enhancers are clusters of standard enhancers
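The "clusters of standard enhancers" view is roughly how super-enhancers are often called computationally: stitch nearby enhancers into one region, then rank regions by total signal. A simplified sketch of the stitching step, assuming enhancers on one chromosome given as (start, end, signal) with made-up values; the 12.5 kb default follows the gap commonly used for stitching, and published methods also pick a cutoff on the ranked curve, which this omits.

```python
def stitch_enhancers(enhancers, max_gap=12_500):
    """Merge enhancer intervals on one chromosome that lie within `max_gap` BP.

    `enhancers` is a list of (start, end, signal) tuples, e.g. signal = a
    ChIP-seq read count. Returns (start, end, total_signal, n_constituents)
    per stitched cluster, strongest cluster first.
    """
    clusters = []
    for start, end, signal in sorted(enhancers):
        if clusters and start - clusters[-1][1] <= max_gap:
            s, e, total, n = clusters[-1]
            clusters[-1] = (s, max(e, end), total + signal, n + 1)
        else:
            clusters.append((start, end, signal, 1))
    return sorted(clusters, key=lambda c: c[2], reverse=True)


# Hypothetical enhancers: three close together, one far away.
peaks = [(0, 500, 5), (5_000, 5_600, 7), (9_000, 9_400, 6), (100_000, 100_400, 4)]
print(stitch_enhancers(peaks))
```

Here the three nearby elements collapse into one ~9.4 kb region carrying most of the signal, while the isolated element stays a single standard enhancer.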
What type of genes use super-enhancers
Very important genes use super-enhancers to regulate genetic activity (Ex. oncogenes)
- Super-enhancer = has chromatin signatures + has RNA pol there + enriched for Mediator
Example 2 - Globulin or insulin genes that are simpler (might not have a super-enhancer?)
THINGS that receive more signals = need an extra regulatory circuit = have many things that affect one promoter = have kb of sequence (super-enhancer) regulating that 1 promoter
- Regulation is not just 1-3 elements BUT a large cluster of elements over 10 kb (ALL 10 kb is important)
- Ex. pluripotency gene = responding to a lot of signals = needs a lot of enhancers
Promoters + Enhancer DNA elements
Promoter and enhancer DNA elements interact with sequence-specific transcription factors to recruit general transcription factors + Mediator + RNA polymerase II
Promoter
Promoter = part of the gene where transcription starts
- Usually only 1 transcription start site, BUT can have multiple (separated by 10s of BP)
- Start = where RNA Polymerase is initiated
Image:
- Have nucleosomes at the end (downstream) of promoter
- Have RNA polymerase II at start site
Underlying DNA sequence of promoter = 70-90 BP (DNA that includes the transcription start site - common feature of every gene)
Complex at the promoter
Form the transcription factor IID (TFIID) complex –> recruits RNA polymerase along with Mediator
- 60 proteins involved
TFs that bind to enhancers and proximal sites bring RNA pol to promoters with high specificity = have domains that recruit and concentrate co-regulators and Mediator (communicates information to get things started)
Core promoter
Includes:
1. TATA box (-30 from start site)
- Most common feature
- TATA-binding protein (TBP) binds to this
2. Inr (initiator, -2 to +4)
3. DPE (downstream promoter element, +32)
Where general transcription factors and RNA Pol II bind
ALL together - core promoter = 60-70 BP
- Not EVERY core promoter has all of the elements but most do
Core promoter = common to ALL 20,000 genes (the 10,000 ubiquitous genes expressed in every cell have the same core promoter)
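Core promoter positions are counted relative to the TSS (+1) with no position 0, which is easy to get off by one when working with sequence strings. A minimal sketch, assuming a 0-based Python string, a known TSS index, and the commonly cited TATA consensus TATAWAWR; the promoter sequence is made up, and placing the 8-BP box at -31 to -24 is an assumption (spacing varies between promoters).

```python
import re

# Only the IUPAC codes used here: W = A/T, R = A/G, Y = C/T, N = any base.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "W": "[AT]", "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}


def to_index(tss_index: int, position: int) -> int:
    """TSS-relative position (+1 = TSS, -1 = base just upstream, no 0) -> 0-based index."""
    if position == 0:
        raise ValueError("promoter coordinates skip 0")
    return tss_index + position - 1 if position > 0 else tss_index + position


def window(seq: str, tss_index: int, start: int, end: int) -> str:
    """Bases from `start` to `end`, both inclusive and TSS-relative."""
    return seq[to_index(tss_index, start): to_index(tss_index, end) + 1]


def matches_consensus(site: str, consensus: str) -> bool:
    """True if `site` fits an IUPAC consensus of the same length."""
    pattern = "".join(IUPAC[code] for code in consensus)
    return re.fullmatch(pattern, site) is not None


# Made-up promoter: TATA-like box at -31..-24, "A" at +1 (index 41).
promoter = "G" * 10 + "TATAAAAG" + "C" * 23 + "A" + "C" * 40
tss = 41
tata = window(promoter, tss, -31, -24)
print(tata, matches_consensus(tata, "TATAWAWR"))  # TATAAAAG True
print(window(promoter, tss, -2, 4))               # CCACCC (Inr window)
```

Skipping 0 means -1 and +1 are neighbors, so the Inr window -2 to +4 is 6 BP, not 7.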
What distinguishes between promoters
Transcription factor binding sites that are upstream of the core sequences distinguish between promoters
General transcription factors are DIFFERENT from sequence-specific transcription factors
- General transcription factors recognize the core promoter + do not recognize DNA the same way as sequence-specific TFs
Sequence-specific transcription factors bind to enhancers + proximal sequences (upstream of core promoter)
- Not THAT specific (bind to 1000s of genes, BUT more specific than general TFs)
- Combination of TFs that bind to enhancer/proximal sites brings RNA polymerase to the promoter with high specificity
Discovery of GTF and SSTF
Discovered by biochemistry and then proven by genetics
Start with biochemistry using "grind and find" –> use radioactive nucleotides and a DNA template –> grind up cells and see what happens when you add ribonucleotides (does the activity match a protein?)
TFs came out of biochemistry BUT were then verified with genetics (mostly in yeast, because genetic screens can reach saturation)
- Yeast geneticists confirmed biochemistry findings
Sequence specific transcription factors
Sequence-specific transcription factors = include master developmental regulators and programming factors
TFs can take a differentiated cell (ex. a fibroblast) and reprogram it to a different differentiated state by changing the TFs
- Ex. MyoD = fibroblast –> muscle OR C/EBP takes B cell –> macrophage OR OSKM factors reprogram cells to pluripotent stem cells
IF you know how the master genes work = can do a lot of reprogramming
How many sequence specific TF do humans have?
Humans have 1,600 sequence-specific transcription factors –> MEANS of the 20,000 genes, only 8% of the genome is dedicated to sequence-specific TFs recognizing promoter (proximal?) and enhancer elements
- ONLY 200 genes are tissue specific (master regulators) - Ex. fibroblast- or B cell-specific
Issue with OSKM story (inducing pluripotent stem cells)
Issue = frequency of reprogramming is small
In a population of 1 million cells, <1% is reprogrammed
- Means that reorganization of the entire epigenome must be complicated
Sequence Specific TF + DNA binding
Sequence-specific TFs have DNA-binding and other modular functional domains
Sequence-specific TF = modular protein –> MEANS you can chop up and mix modules
ALL sequence-specific TFs have:
1. DNA binding domain (reading sequence) - recognizes the sequence logos
2. Transcription activation domain (must bring transcription machinery down to the DNA)
IF the TF responds to signals THEN it needs a ligand-binding domain (Ex. if it responds to estrogen THEN it needs a domain that binds estrogen)
- When the ligand binds, it exposes the DNA-binding domain or transactivating domain
The different domains (modules) make the transcription factor dimerize, trimerize, etc. –> expands recognition (can recognize more BP)
- One module recognizes 5 BP –> trimer has 3 modules = can recognize 15 BP
- Example - Heat shock TF = trimer that recognizes 15 BP, but each module recognizes 5 BP
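The heat shock element is commonly described as repeats of a 5-BP unit, nGAAn, in alternating orientation, so the trimer reads three units (15 BP). A minimal regex sketch, assuming that consensus ("." = any base) and scanning only the strand given; the test sequence is made up.

```python
import re

# One 5-BP unit read by a single HSF DNA-binding domain: nGAAn.
UNIT_FORWARD = ".GAA."
UNIT_REVERSE = ".TTC."  # the same unit read on the opposite strand
# Three units in alternating orientation = the 15-BP site read by the trimer.
TRIMER_SITE = re.compile(UNIT_FORWARD + UNIT_REVERSE + UNIT_FORWARD)


def find_sites(pattern: re.Pattern, seq: str) -> list:
    """0-based start of every (possibly overlapping) match on this strand only."""
    return [i for i in range(len(seq)) if pattern.match(seq, i)]


# Made-up sequence with one three-unit site starting at index 2.
print(find_sites(TRIMER_SITE, "CC" + "AGAAGCTTCTAGAAC" + "CC"))  # [2]
```

Each module alone only constrains 3 fixed bases, while the trimer site fixes 9, which is the "more BP = more specific" logic applied to multimers.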
Doing a selection for DNA sequences that TFs bind to
Logos = DNA sequences that are most often found where a transcription factor binds
- Example - SP1 binds to C rich elements
- Logo = binding site of TF
- Sequence specific TF = all look for short DNA sequence
THEN look at whether the 1,600 sequence-specific TFs share domain characteristics –> creates families
Chart - looking at a restricted region of DNA within an enhancer or promoter that the sequence-specific TF recognizes
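A logo is built from aligned binding sites: count each base at each position, and the stack height shows how informative that position is. A minimal sketch, assuming equal-length pre-aligned sites and a uniform A/C/G/T background; the sites are made up for illustration and are not real SP1 data.

```python
import math
from collections import Counter

BASES = "ACGT"


def position_frequency_matrix(sites):
    """Fraction of each base at each position of equal-length, pre-aligned sites."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("sites must be aligned to the same length")
    matrix = []
    for i in range(length):
        counts = Counter(site[i] for site in sites)
        matrix.append({base: counts[base] / len(sites) for base in BASES})
    return matrix


def information_content(column):
    """Bits at one position vs. a uniform background (max 2 for DNA) = stack height in a logo."""
    return 2.0 + sum(p * math.log2(p) for p in column.values() if p > 0)


def consensus(matrix):
    """Most frequent base at each position (tallest letter in each logo column)."""
    return "".join(max(column, key=column.get) for column in matrix)


# Made-up aligned binding sites, for illustration only.
sites = ["GGGCGG", "GGGCGG", "GAGCGG", "GGGCGT"]
pfm = position_frequency_matrix(sites)
print(consensus(pfm))
print([round(information_content(col), 2) for col in pfm])
```

Fully conserved positions get the maximum 2 bits; positions where the TF tolerates a second base get shorter stacks.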
Prevalent DNA binding domain
Zinc Finger
- Most prevalent DNA-binding domain (engages with DNA promoter)
- TF can have multiple zinc fingers (Ex. CTCF has 11 zinc fingers); proteins that bind to 1 site on the genome have many zinc fingers
- Zinc fingers can also read RNA and protein
- Longer BP recognition sequence = more unique, so it can be better recognized by zinc fingers
Cysteine and histidine residues coordinate a zinc ion
- zinc finger is coordinated by zinc
Zinc finger reaches into the major groove of DNA, where amino acid side chains recognize the DNA bases = reads the bases (HOW it reads the DNA code)
60% of the 1,600 sequence-specific TFs are zinc finger proteins
Transcription factor transactivating domains
Transcription factor binds to promoters and enhancers –> the transactivating domain on the TF will recruit components of the regulatory or transcription machinery
Structured motifs of TF
There are a limited number of structured motifs that transcription factors use to engage DNA promoters
C2H2 (cysteine-histidine) zinc finger = most dominant, THEN helix-turn-helix, THEN helix-loop-helix
ALSO have bZIP = basic leucine zipper DNA-binding domain
DNA binding domains of TF
DNA binding domains include:
1. Zinc Finger
2. bZIP - 2 alpha helices (each helix goes into the major groove on opposite sides)
- Ex. GCN4 in yeast or AP-1 in humans
- Long helix interacts with another long helix through coiled-coil interactions –> presents the DNA-binding domains –> reaches into the major groove of the target
3. bHLH
4. HTH families (helix-turn-helix –> helix goes into the major groove)
IMAGE - different colors = different recognition helices –> go into the major groove and read the bases (different orientations of the recognition helix in the major groove)
- Helix reads the DNA by inserting into the major groove at different orientations
- Main element of specificity = reading the sequence in the major groove
Zinc Finger proteins (depth)
Zinc finger proteins = comprise multiple copies of a small beta-beta-alpha domain stabilized by a zinc ion
- Zinc coordinates the finger
Zinc finger protein = type of TF that has zinc fingers (I THINK)
Each finger (domain) in the protein functions independently and recognizes 3-4 BP of DNA
Select residues in each domain are responsible for interacting with DNA (the rest provide a structural scaffold)
Reading the DNA sequence depends on which side chains are used (fingers insert into the major groove and read off different DNA bases)
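A toy model of "each finger reads 3-4 BP independently": assume every finger reads exactly 3 BP and that, as described for classic C2H2 arrays, fingers bind antiparallel (the N-terminal finger sits at the 3' end of the site). The finger triplets below are hypothetical.

```python
def zinc_finger_array_site(finger_triplets):
    """Assemble the 5'->3' DNA site read by a tandem C2H2 zinc-finger array.

    `finger_triplets` lists each finger's preferred 3-BP subsite from the
    N-terminal finger to the C-terminal finger. Because fingers bind
    antiparallel, the list is reversed before joining.
    """
    if any(len(t) != 3 for t in finger_triplets):
        raise ValueError("this sketch assumes each finger reads exactly 3 BP")
    return "".join(reversed(finger_triplets))


# Hypothetical three-finger protein (triplets chosen for illustration).
print(zinc_finger_array_site(["AAA", "CCC", "GGG"]))  # GGGCCCAAA
print(len(zinc_finger_array_site(["ACG"] * 11)))      # 33
```

With 11 fingers (like CTCF) this idealization gives a 33-BP site; real arrays do not necessarily engage every finger, so treat it as a rough estimate of why many fingers = a long, unique recognition sequence.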