Week 1.2 The ENCODE Project Flashcards
What does ENCODE stand for?
The Encyclopedia of DNA Elements
What is the ENCODE project?
The project aims to work out how much of the genome is functional and what it does. A pilot study was launched in 2003, and the main project, launched in 2007, looked at the whole genome. It is a large international public consortium, and its major results were published in 30 papers in September 2012.
What did the ENCODE project claim?
It claims to assign a biochemical function to 80% of the human genome. Much non-coding DNA is involved in regulating the expression of coding genes.
ENCODE by the numbers
440 researchers; 32 research groups; $123 million cost; 5 countries (United States, United Kingdom, Spain, Singapore and Japan); 1600 sets of experiments; 147 types of tissues; 15 trillion bytes of raw data; 11,972 files analysed; 17,000 promoter regions discovered; 4 million human regulatory regions mapped.
What was experimental method 1?
RNA-seq. Isolation of RNA sequences from different cell and tissue types, using different purification techniques to isolate different fractions of RNA, followed by high-throughput sequencing.
CAGE (Cap analysis of gene expression). Capture of the methylated cap at the 5′ end of RNA, followed by high-throughput sequencing of a small tag adjacent to the 5′-methylated cap. Locates transcription start sites.
RNA-PET (RNA paired-end tags). Simultaneous capture of RNAs with both a 5′ methyl cap and a poly(A) tail, which is indicative of a full-length mRNA. This is followed by high-throughput sequencing of a short tag from each end. Demarcates the boundaries of transcription units. This shows where an mRNA starts and finishes, and because it is much cheaper and easier it can be carried out more often.
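As a rough illustration of how CAGE tags can locate a transcription start site, the sketch below counts where the 5′ ends of mapped tags fall and reports the most frequent position. The coordinates and the function name `call_tss` are made up for this example; this is not ENCODE data or the ENCODE pipeline.

```python
from collections import Counter


def call_tss(tag_starts):
    """Return (position, count) for the most frequent 5' tag start."""
    counts = Counter(tag_starts)
    # most_common(1) returns the single position with the highest tag count
    position, count = counts.most_common(1)[0]
    return position, count


# Hypothetical genomic coordinates of CAGE tag 5' ends near one gene
tag_starts = [1002, 1000, 1000, 1001, 1000, 1450, 1000, 1001]
tss_position, tss_support = call_tss(tag_starts)
print(tss_position, tss_support)
```

Real analyses usually cluster nearby positions rather than taking a single base, so treat this as the simplest possible version of the idea.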
What was experimental method 2?
ChIP-seq (Chromatin immunoprecipitation followed by sequencing). Specific regions of cross-linked chromatin, which is genomic DNA in complex with its bound proteins, are selected by using a specific antibody. The sample is then sequenced to determine the regions of the genome most often bound by the target of the antibody, such as a transcription factor, a chromatin-binding protein or a specific chemical modification on histone proteins.
DNase-seq. The DNase I enzyme preferentially cuts live chromatin preparations at sites close to specific (non-histone) proteins. The resulting cut points are then sequenced using high-throughput sequencing to determine the sites ‘hypersensitive’ to DNase I, which correspond to open chromatin. When DNA is packaged up with protein it cannot do much in terms of function, but when it is opened up it becomes available for function; DNase I shows which parts of the DNA are accessible at any point.
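To make the idea of ‘hypersensitive’ sites concrete, here is a minimal sketch that bins DNase I cut positions into fixed windows and keeps the windows with many cuts. The window size, the threshold, the cut coordinates and the name `hypersensitive_windows` are all assumptions chosen for illustration, not values used by ENCODE.

```python
from collections import Counter


def hypersensitive_windows(cut_sites, window=100, min_cuts=3):
    """Return sorted start coordinates of windows with at least min_cuts cuts."""
    # Assign each cut position to the fixed-size window that contains it
    per_window = Counter((pos // window) * window for pos in cut_sites)
    return sorted(start for start, n in per_window.items() if n >= min_cuts)


# Hypothetical mapped cut positions along one chromosome
cut_sites = [105, 110, 150, 199, 420, 810, 815, 820, 990]
open_windows = hypersensitive_windows(cut_sites)
print(open_windows)
```

Windows that collect many cuts stand in for open chromatin; windows with few or no cuts stand in for DNA that is packaged and inaccessible.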
What was experimental method 3?
FAIRE-seq (Formaldehyde assisted isolation of regulatory elements). This isolates nucleosome-depleted genomic regions by exploiting the difference in crosslinking efficiency between nucleosomes (high) and sequence-specific regulatory factors (low). FAIRE consists of crosslinking, phenol extraction, and sequencing the DNA fragments in the aqueous phase.
Crosslinking takes living tissue in which a protein is interacting with DNA and freezes that interaction in place, making the protein bind more tightly.
RRBS (Reduced representation bisulphite sequencing). Bisulphite treatment of DNA converts unmethylated cytosines to uracil. To focus the assay and save costs, specific restriction enzymes that cut around CpG dinucleotides are used to reduce the genome to a portion specifically enriched in CpGs. This enriched sample is then sequenced to determine the methylation status of individual cytosines quantitatively.
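The bisulphite step can be sketched in a few lines. The example below simulates conversion on a single strand (unmethylated C becomes U, which is read as T after amplification and sequencing, while methylated C stays C) and then estimates the methylation level at one cytosine from a handful of reads. The sequence, the methylation patterns and the function names are hypothetical.

```python
def bisulfite_convert(seq, methylated_positions):
    """Simulate bisulphite treatment and sequencing of one DNA strand.

    Unmethylated C is converted to U and read as T; methylated C stays C.
    """
    out = []
    for i, base in enumerate(seq):
        if base == "C" and i not in methylated_positions:
            out.append("T")
        else:
            out.append(base)
    return "".join(out)


def methylation_level(reference, reads, position):
    """Fraction of reads still showing C at a reference cytosine."""
    if reference[position] != "C":
        raise ValueError("position is not a cytosine in the reference")
    calls = [read[position] for read in reads]
    c, t = calls.count("C"), calls.count("T")
    return c / (c + t)


reference = "ACGTCGA"  # cytosines at index 1 and index 4
# Four hypothetical DNA molecules with different methylation patterns
patterns = [{1, 4}, {1}, {1}, {1}]
reads = [bisulfite_convert(reference, p) for p in patterns]
print(reads)
print(methylation_level(reference, reads, 1), methylation_level(reference, reads, 4))
```

Counting C against T at each reference cytosine is what makes the readout quantitative: the more reads that keep a C, the more methylated that site is in the sample.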
What were the key findings of the ENCODE project?
Key findings
80.4% of the genome participates in at least one biochemical RNA- and/or chromatin-associated event in at least one cell type.
95% of the genome lies within 8 kb of a DNA–protein interaction.
99% is within 1.7 kb of at least one of the biochemical events measured by ENCODE.
399,124 regions have enhancer-like features (enhancers enhance gene expression).
70,292 regions have promoter-like features (promoters promote gene expression).
The ENCODE project got loads of publicity:
The human genome is more active than previously thought, and non-coding DNA is no longer regarded as ‘junk DNA’. The ENCODE project has changed the perception that most DNA is obsolete.
Debate over ENCODE:
The debate focused on several issues, such as?
What does “functional” mean?
“We define a functional element as a discrete genome segment that encodes a defined product (for example, protein or non-coding RNA) or displays a reproducible biochemical signature (for example, protein binding, or a specific chromatin structure).” p.57 ENCODE paper
Functional non-conserved elements
“Many of the regulatory elements are not constrained across mammalian evolution, which so far has been one of the most reliable indications of an important biochemical event for the organism.” p. 71 ENCODE paper
A lot of the debate is about junk DNA distinguishing us from many other animals, from an evolutionary point of view. Much of the controversy has been around the evolutionary perspective on what is going on.
C-value paradox
The death of ‘junk’ DNA
Was it over-hyped?
Critics of the ENCODE project were?
Dan Graur is one of many people suggesting that ENCODE is wrong.
1. If the human genome is indeed devoid of junk DNA, as implied by the ENCODE project, then a long undirected evolutionary process cannot explain the human genome. If, on the other hand, organisms are designed, then all DNA, or as much as possible, is expected to exhibit function. If ENCODE is right then evolution is wrong.
On the immortality of television sets: “function” in the human genome according to the evolution-free gospel of ENCODE
From an evolutionary perspective: “8.2% of the human genome is constrained: variation in rates of turnover across functional element classes in the human lineage”. Non-functional DNA should be a random sequence.
Ford Doolittle also wrote a critique of ENCODE; his main argument concerns the C-value paradox.