Sudbery Flashcards
Why do we care about transcriptomics?
- 98.5% of protein coding seq is the same in humans and mice
- only 1-2% of the genome is coding, so most of the diffs between us and mice must lie in the non-coding seq
- metazoan genomes are not under strong selection to stay small → much is repetitive seq or decaying pseudogenes
- every cell has the same DNA, but cells are diff –> dep on which genes are active and their exp levels
What did ENCODE find about ‘junk’ DNA?
- most of what was thought to be junk has a function in controlling something
How much of the genome did ENCODE claim was functional?
- ~80% (80.4%) assigned at least 1 biochemical function (eg. transcription, TF binding, CRM activity)
What are cis-regulatory modules?
- regions of DNA that bind DNA-binding proteins (DNA BPs, eg. TFs) and reg gene exp
- include promoters, enhancers, silencers and insulators
What seq motifs do DNA BPs bind, and what is the result of this?
- bind degenerate sequence motifs
- binding sites vary, but certain seqs are more likely
- but just because the seq is present doesn’t mean the prot will bind
- eg. ~8 mil GATA1 motif matches in the genome, but ChIP-seq shows only ~0.2% are bound by GATA1
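A minimal Python sketch of scanning for a degenerate motif, assuming the commonly used GATA consensus WGATAR (W = A/T, R = A/G) and a made-up sequence. It only scans one strand (the reverse complement, YTATCW, is ignored) and also shows the arithmetic behind the GATA1 example: 0.2% of ~8 million sites is ~16,000 bound sites.

```python
import re

# IUPAC degenerate base codes -> regex character classes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}


def find_sites(seq: str, motif: str) -> list:
    """Return 0-based start positions of every match to a degenerate motif.

    Only the given strand is scanned; a lookahead lets overlapping matches count.
    """
    body = "".join(IUPAC[base] for base in motif.upper())
    pattern = re.compile(f"(?=({body}))")
    return [m.start() for m in pattern.finditer(seq.upper())]


# Made-up sequence: two WGATAR matches plus a near miss (CGATAC)
seq = "TTGATAAGCCAGATAGTTCGATAC"
print(find_sites(seq, "WGATAR"))  # -> [1, 10]

# Arithmetic from the GATA1 example: 0.2% of ~8 million motif matches
bound = 8_000_000 * 2 // 1000
print(bound)  # -> 16000
```

Every match is only a potential site; which ones are actually bound has to come from an experiment such as ChIP-seq.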
Does all DNA exist as heterochromatin or euchromatin?
- no, sliding scale
What is hetero and euchromatin?
- heterochromatin = tightly packed
- euchromatin = loosely packed
What regions tend to be nucleosome free, or have v few nucleosomes?
- CRMs
How tightly packed is chromatin in transcribing genes?
- intermediate
How can nucleosome free regions of genome be mapped?
- map w/ DNase-seq
- DNase I preferentially cuts accessible DNA where there are no (or few) nucleosomes, so sequencing the cut sites builds up a genome-wide picture of nucleosome free regions (DNase I hypersensitive sites, DHSs)
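A rough sketch of the DNase-seq idea, assuming each read's 5' end marks a DNase I cut and using an arbitrary bin size and count threshold (both hypothetical parameters). Real DHS callers use statistical background models rather than a flat cut-off.

```python
from collections import Counter


def hypersensitive_bins(cut_sites, bin_size=100, min_cuts=5):
    """Bin DNase I cut positions and keep bins with many cuts.

    cut_sites: 0-based genomic positions of read 5' ends (assumed cut sites).
    bin_size and min_cuts are arbitrary illustration values.
    Returns sorted (bin_start, bin_end, n_cuts) tuples.
    """
    counts = Counter(pos // bin_size for pos in cut_sites)
    return [
        (b * bin_size, (b + 1) * bin_size, n)
        for b, n in sorted(counts.items())
        if n >= min_cuts
    ]


# Made-up data: a dense cluster of cuts around 1000-1100 in a sparse background
cuts = [12, 480, 1003, 1010, 1022, 1050, 1071, 1098, 2500, 3999]
print(hypersensitive_bins(cuts))  # -> [(1000, 1100, 6)]
```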
What did ENCODE measure and how?
- RNA expression –> RNA-seq, CAGE-seq and RNA-PET
- DNA/protein interactions –> ChIP-seq
- chromatin accessibility –> DNase-seq and FAIRE-seq
- 3D structure –> ChIA-PET and 5C
- methylation –> RRBS
How does ChIP-seq work?
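- crosslink prots to the DNA they are bound to (usually w/ formaldehyde)
- chop up (shear) the DNA
- use an Ab against the prot of interest to pull down only the DNA crosslinked to it (immunoprecipitation)
- reverse the crosslinks, seq this DNA and map the reads to the genome → regions w/ many reads (peaks) show where in the genome the prot binds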
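A simplified sketch of the last step (finding peaks), assuming ChIP and input (control) reads have already been mapped and counted in fixed windows, and that both libraries have the same sequencing depth. Each window's ChIP count is tested against a Poisson background set by the input count; this is loosely the idea behind peak callers such as MACS, not their actual algorithm, and `call_peaks` and its cut-off are hypothetical.

```python
import math


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), summing the upper tail directly.

    Terms are computed in log space so large counts do not overflow.
    """
    total, i = 0.0, k
    while True:
        term = math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))
        total += term
        # past the mode the terms only shrink, so stop once they are negligible
        if i > lam and term <= 1e-17 * total:
            return total
        i += 1


def call_peaks(chip_counts, input_counts, p_cutoff=1e-5):
    """Return (window_index, chip_count, p_value) for enriched windows.

    The input count (floored at 1 so the background is never zero) is used as
    the expected Poisson rate; assumes equal sequencing depth in both libraries.
    """
    peaks = []
    for i, (chip, ctrl) in enumerate(zip(chip_counts, input_counts)):
        p = poisson_sf(chip, max(ctrl, 1))
        if p < p_cutoff:
            peaks.append((i, chip, p))
    return peaks


# Made-up read counts per window; only window 2 is strongly enriched over input
chip = [3, 4, 40, 5]
control = [2, 5, 4, 3]
for window, count, p in call_peaks(chip, control):
    print(window, count, f"{p:.1e}")
```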
What assays were carried out on what cells in ENCODE?
- tier 1 = all assays
- tier 2 = a selected subset of assays
- tier 3 = everything else, eg. a specific assay or combination
What did ENCODE prod?
- lots of data sets and continues to gen new data
What did ENCODE claim?
- vast majority (80.4%) of human genome participates in at least 1 biochemical RNA and/or chromatin-assoc event in at least 1 cell type
- 19.4% covered by at least 1 DHS or TF ChIP-seq peak across all cell lines
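The "covered by at least 1 peak" figure is a union-of-intervals calculation: overlapping DHSs and TF peaks must be merged so no base is counted twice. A minimal sketch, assuming BED-style half-open intervals on a single sequence and made-up coordinates:

```python
def covered_fraction(intervals, genome_length):
    """Fraction of bases covered by at least one interval.

    intervals: (start, end) half-open, 0-based coordinates on one sequence.
    Overlapping intervals are merged so no base is counted twice.
    """
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # gap before this interval: close off the current merged block
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # overlaps (or touches) the current block: extend it
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / genome_length


# Made-up peaks on a 1,000 bp "genome"
dhs = [(100, 300), (250, 400)]
tf_peaks = [(390, 450), (800, 900)]
print(covered_fraction(dhs + tf_peaks, 1000))  # -> 0.45
```

Summing peak lengths naively would give 610 bp (61%) here, overstating coverage because the overlaps are double counted.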
What was wrong w/ ENCODEs claim that 80% of the genome is functional?
- by ENCODE's logic 100% of the genome would be functional, since all of it participates in replication
- most of the 80% comes from transcription (about 60% of the genome is transcribed), and about half of this is introns (which are NOT coding and are much bigger than exons, so account for a signif proportion of the genome); we would not necessarily say introns have a function
Was ENCODE's claim that 19.4% of the genome is covered by at least 1 DHS or TF ChIP-seq peak more sensible, and why?
- yes
- assuming ENCODE only sampled about half of the elements (due to limited TF and cell-type diversity), could estimate a min of 20% of the genome participates in these specific functions, w/ the likely figure signif higher
What is the question at the debate over ENCODEs finding?
- what do we mean by functional?
What are the definitions of function?
- causal role = a seq has a function if it causes an effect (eg. a biochemical activity)
- selected effect = a seq has a function if that seq exists (is maintained by selection) because of this function
- also the genetic role = a seq has a function if it is req for that function (doesn’t have to be visible to natural selection)
How is biological function created?
- evolution and selection for that function
What definition of function did ENCODE use, and what is the problem w/ this?
- causal definition
- but surely if a seq is important it will be selected for, and if it has no effect on fitness then it is not important as far as evolution is concerned
- yet ENCODE estimates of functionality were way above the amount of DNA known to be under selection (at the time the best estimate was around 5% under -ve selection)
How can the amount of the genome under selection be investigated?
- compare seqs from 2 species and find regions w/ fewer diffs than expected
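A minimal sketch of the "fewer diffs than expected" test, assuming a gap-free pairwise alignment, non-overlapping windows and a hypothetical neutral substitution rate (in practice this rate has to be estimated, which is exactly the "what is expected?" problem). Each window's mismatch count is compared with a binomial expectation; tools such as phastCons or GERP use phylogenetic models instead.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def conserved_windows(seq_a, seq_b, window=20, neutral_rate=0.3, alpha=0.01):
    """Flag non-overlapping windows of a gap-free pairwise alignment that have
    significantly fewer mismatches than a neutral substitution rate predicts.

    neutral_rate and alpha are hypothetical illustration values.
    Returns (window_start, mismatches, p_value) tuples.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    hits = []
    for start in range(0, len(seq_a) - window + 1, window):
        a, b = seq_a[start:start + window], seq_b[start:start + window]
        mismatches = sum(x != y for x, y in zip(a, b))
        p = binom_cdf(mismatches, window, neutral_rate)  # chance of this few or fewer
        if p < alpha:
            hits.append((start, mismatches, p))
    return hits


# Made-up alignment: first 20 bp identical, second 20 bp with 8 mismatches
human = "ACGT" * 10
other = human[:20] + "CATGCATG" + human[28:]
print(conserved_windows(human, other))
```

Only the first window is flagged: 0 mismatches in 20 bp is unlikely (p ≈ 0.0008) if 30% of sites were expected to differ, whereas 8 mismatches is unremarkable.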
What is the problem w/ comparing seqs of 2 species to see how much of the genome is under selection?
- what is expected?
- if distant species used, only find function conserved across long time
- if close species used, not enough mutations to find conserved regions
How can problems in investigating how much of the genome is under selection by comparing seqs be overcome?
- use indels –> compare 2 species, look at the gaps in alignment, are there regions w/ fewer gaps than expected?
- aim is to work out how much seq is conserved (constrained) now in humans: estimate this for human-mouse, human-horse, human-chimp etc., plot against divergence, fit a line and extrapolate to 0 divergence
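The extrapolation step can be sketched as an ordinary least-squares line through (divergence, constrained fraction) points, with the intercept at zero divergence taken as the estimate for humans today. The numbers below are hypothetical placeholders, not published estimates; in this made-up data more distant species share less constrained seq w/ humans (presumably because functional elements turn over), and a straight line is itself an assumption.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical values for illustration only (NOT real estimates):
# x = neutral divergence between human and each other species
# y = fraction of the human genome inferred as constrained from that pair
divergence = [0.1, 0.2, 0.3, 0.4]
constrained = [0.11, 0.10, 0.09, 0.08]
slope, intercept = fit_line(divergence, constrained)
print(f"constrained fraction extrapolated to zero divergence: {intercept:.3f}")  # -> 0.120
```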
If just look at -ve selection to see how much of the genome is under selection, then what’s missing?
- +ve selection
- non coding seqs
- compensatory evolution (applies to non-coding seq)
How does +ve selection affect genome selection?
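- seq changes are selected for because they give a new function, so the seq is not conserved
- coding seq: dN/dS (compares nonsynonymous changes per nonsynonymous site to synonymous changes per synonymous site; if lots of nonsynonymous changes (dN/dS > 1) then prob selecting for new function)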
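A simplified sketch of dN/dS in the style of Nei-Gojobori counting, assuming two aligned in-frame coding seqs whose codons differ at no more than one position, and skipping the multiple-hit correction real tools apply. dN/dS > 1 suggests +ve selection, ~1 neutral evolution, < 1 -ve (purifying) selection.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop)
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def syn_sites(codon):
    """Synonymous sites in a codon: each of its 9 possible point mutations
    adds 1/3 of a site if the amino acid is unchanged."""
    sites = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    sites += 1 / 3
    return sites


def dn_ds(seq1, seq2):
    """Uncorrected pN/pS for two aligned, in-frame coding seqs.

    Simplifications: codons may differ at <= 1 position, mutations to stop
    codons count as nonsynonymous, and no multiple-hit correction is applied.
    """
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("need aligned in-frame seqs of equal length")
    s_sites = s_diffs = n_diffs = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        s_sites += (syn_sites(c1) + syn_sites(c2)) / 2  # average over both seqs
        n_changed = sum(a != b for a, b in zip(c1, c2))
        if n_changed > 1:
            raise ValueError(f"codon {i // 3} differs at >1 position")
        if n_changed == 1:
            if CODE[c1] == CODE[c2]:
                s_diffs += 1
            else:
                n_diffs += 1
    n_sites = len(seq1) - s_sites  # every site is either syn or nonsyn
    if s_diffs == 0:
        return float("inf") if n_diffs else float("nan")
    return (n_diffs / n_sites) / (s_diffs / s_sites)


# Made-up example: GCT->GCC is synonymous (Ala), AAA->GAA is not (Lys->Glu)
print(round(dn_ds("GCTAAACTG", "GCCGAACTG"), 3))  # -> 0.421
```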
How can selection of noncoding seqs in genome be investigated?
- intra-species diversity comp to inter-species diversity
- using 1000 Genomes data, approx 4% of the genome estimated to be under selection
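A toy version of comparing intra- to inter-species diversity, loosely in the spirit of the HKA test, assuming counts of polymorphic sites (within humans) and divergent sites (vs an outgroup) for an element and for a neutral reference region; the function name and numbers are hypothetical. Dividing by divergence controls for local mutation rate, so a value well below 1 suggests within-species diversity is depleted by selection. It ignores demography and linked selection, so it is an illustration, not the method behind the ~4% figure.

```python
def relative_diversity(element, neutral):
    """Polymorphism/divergence ratio of an element relative to a neutral region.

    element, neutral: dicts with 'poly' (sites polymorphic within the species)
    and 'div' (sites fixed-different from an outgroup species).
    A value well below 1 suggests within-species diversity is depleted.
    """
    element_ratio = element["poly"] / element["div"]
    neutral_ratio = neutral["poly"] / neutral["div"]
    return element_ratio / neutral_ratio


# Made-up counts: relative to its divergence, the element has half the
# polymorphism of the neutral region
element = {"poly": 5, "div": 50}
neutral = {"poly": 100, "div": 500}
print(relative_diversity(element, neutral))  # -> 0.5
```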
What is the effect of compensatory evolution on genome selection?
- TF sites control exp of gene, if lose a binding site then fitness reduced from 1 to eg. 0.8 (80% exp)
- isn’t fatal so allows time to get another mutation to gain binding site and fitness again increased to 1
- over years of evolution get many diff seqs that perform same function and give same fitness, but look quite diff
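A toy model of this scenario, assuming a hypothetical enhancer whose fitness is 1.0 w/ at least one WGATAR site and 0.8 w/o one (0.8 is the example figure above). The "regained" seq has the same fitness as the ancestor but only 55% identity to it.

```python
import re

# WGATAR used as a stand-in TF binding site (lookahead so overlaps are found)
SITE = re.compile(r"(?=[AT]GATA[AG])")


def fitness(enhancer: str) -> float:
    """Toy model: 1.0 with >= 1 binding site, 0.8 with none."""
    return 1.0 if SITE.search(enhancer) else 0.8


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length seqs."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


ancestor = "CCAGATAGCCTTCCGGCCTT"  # one site at position 2
lost = "CCAGCTAGCCTTCCGGCCTT"      # A->C point mutation destroys the site
regained = "CCAGCTAGCCTTAGATAACC"  # later mutations create a new site downstream

for name, seq in [("ancestor", ancestor), ("lost", lost), ("regained", regained)]:
    print(name, fitness(seq), identity(ancestor, seq))
```

Fitness goes 1.0 → 0.8 → 1.0 while identity to the ancestor drops to 0.55, ie. function conserved w/o conservation of seq.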
What evidence is there for compensatory evolution?
- mainly anecdotal, eg. from fly embryos
- systematic evidence –> took 4 species of yeast, estimated the amount of TF binding (binding energy) in the regulatory regions of genes, and looked at how much the seq changed between pairs of species
- plotted against T (a measure of evolutionary time), the seq changes much more than the binding energy, so conservation of function w/o conservation of seq
Are ENCODE elements conserved?
- some signal of selection, but quite weak
- melanocyte DHSs are depleted of somatic mutations in whole cancer genomes –> fewer somatic mutations found in these ENCODE elements than expected
- cancer function = unselected function?
- so cancer cells may need the seq even though the body doesn't; does this make it functional?
What are some eg.s of function w/o conservation?
- eye colour genes –> genetic, but not under selection
- disease-causing mutations, eg. for AD –> little selection against these mutations, as they act after reproductive age
What did an experiment looking at enhancer and promoter evolution do and find?
- experiment took livers from 20 mammals and mapped where enhancers and promoters are
- promoters generally in conserved location
- enhancers move between species, not v conserved between species
- are they under seq constraint? promoters and enhancers have some conservation, but much less than exons
What evidence is there for function of promoters and enhancers?
- 98% of DHSs are linked to a promoter in ChIA-PET experiments
- genes closer to predicted enhancers tend to have higher expression levels in correct cell type
- ENCODE tested a no. of elements in enhancer reporter assays “over half of the elements showing activity, often in the corresponding tissue type”
- 65% of predicted human heart enhancers drove heart expression in mice
What is the significance of evidence that enhancers and promoters contain TF binding sites?
- does not imply TF binding
- does not imply enhancer state
- does not imply contact w/ promoter
- does not imply regulation of a promoter
- does not imply phenotypic consequence for the cell
- does not imply phenotypic consequence for the organism