ENCODE Project Flashcards
Aims
aims to identify all genetic regulatory regions
- DNA regions regulated and factors regulating them
ENCODE project
Encyclopedia of DNA Elements
- wanted to identify all functional elements in genome
Complexity of Expression
- TF, enhancers, silencers, methylation patterns
- splicing factors regulate transcripts
- control and interaction of these factors not well understood
Identification of Regulatory Features
- RNA sequencing: identifies regions that are transcribed
- ChIP-seq: reveals where protein binds
- open chromatin: areas accessible to regulatory proteins
- chromatin interaction: disparate regions brought together to regulate a gene
- DNA methylation: positions of methyl groups on the DNA
- RNA binding: positions where regulatory proteins bind RNA
RNA-seq
- RNA fragmented
- sequenced
- mapped to reference genome
- exons identified
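
A toy sketch of the map-then-identify-exons idea above. It assumes reads match the reference exactly and never span a splice junction, which real data does not guarantee; the function names and sequences are made up for illustration.

```python
def map_reads(reference, reads):
    """Return half-open (start, end) intervals for reads found in the reference.

    Assumption: exact matching only, first occurrence only. Reads that do not
    match anywhere are simply dropped as unmapped.
    """
    hits = []
    for read in reads:
        pos = reference.find(read)
        if pos != -1:
            hits.append((pos, pos + len(read)))
    return hits


def covered_blocks(intervals):
    """Merge overlapping or touching intervals into contiguous covered blocks."""
    blocks = []
    for start, end in sorted(intervals):
        if blocks and start <= blocks[-1][1]:
            # Extend the previous block instead of starting a new one.
            blocks[-1] = (blocks[-1][0], max(blocks[-1][1], end))
        else:
            blocks.append((start, end))
    return blocks


# Hypothetical reference and reads, for illustration only.
reference = "AAAACCCCGGGGTTTTACGTACGTAAAATTTT"
reads = ["AAAACC", "ACCCCG", "ACGTACGT", "GGGGGG"]

# Contiguous covered blocks stand in for candidate exons.
candidate_exons = covered_blocks(map_reads(reference, reads))
```

Here the two overlapping reads merge into one block, the third forms a second block, and the last read stays unmapped.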
ChIP-seq
- DNA with interacting proteins sheared
- immunoprecipitation with TF antibodies
- purify the DNA away from the TF
- map to reference genome
- find TF binding site location
- only able to identify binding sites of known TFs (an antibody is needed)
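
A minimal sketch of how mapped ChIP-seq reads might be turned into binding site locations: count reads per fixed-size bin and keep bins enriched over a control sample. The control sample, bin size, fold threshold and pseudocount are all assumptions not stated in these notes, and real peak callers use proper statistical models.

```python
from collections import Counter


def bin_counts(read_starts, bin_size):
    """Count reads per bin, keyed by bin index."""
    return Counter(pos // bin_size for pos in read_starts)


def enriched_bins(ip_starts, control_starts, bin_size=100,
                  min_fold=4.0, pseudocount=1.0):
    """Return (start, end, fold) for bins enriched in the IP sample.

    Assumption: fold enrichment with a pseudocount stands in for the
    statistical test a real peak caller would apply.
    """
    ip = bin_counts(ip_starts, bin_size)
    ctrl = bin_counts(control_starts, bin_size)
    peaks = []
    for b in sorted(ip):
        fold = (ip[b] + pseudocount) / (ctrl[b] + pseudocount)
        if fold >= min_fold:
            peaks.append((b * bin_size, (b + 1) * bin_size, fold))
    return peaks


# Hypothetical read start positions on one chromosome.
ip_reads = [105, 110, 120, 130, 150, 160, 170, 420, 900]
control_reads = [50, 140, 430, 880]

peaks = enriched_bins(ip_reads, control_reads)
```

Only the bin covering positions 100 to 200 passes the threshold in this example; the isolated reads elsewhere look like background.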
DNase Hypersensitive Sites
- chromatin regions highly sensitive to DNase I
- nucleosomal structure less compacted, allowing proteins to bind the DNA
- mapping these sites identifies locations of regulatory elements
- can reveal novel binding sites
DNase-seq
- DNase digestion
- library preparation
- PCR amplification
- sequencing
- map back to show open regions
DNase Footprinting
- number of fragments that map to a sequence is a measure of regulatory activity
- sites bound by TFs show highly specific patterns of DNase I cleavage
- genome wide footprinting method
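
One way to picture a footprint: a bound TF protects its site, so per-base cleavage counts dip in a short window flanked by heavily cut positions. The sketch below scores that dip. The scoring rule, window sizes and counts are illustrative assumptions, not a published footprinting algorithm.

```python
def footprint_score(cuts, start, width, flank):
    """Mean cleavage in the two flanks minus mean cleavage in the centre."""
    left = cuts[start - flank:start]
    centre = cuts[start:start + width]
    right = cuts[start + width:start + width + flank]
    flank_mean = (sum(left) + sum(right)) / (len(left) + len(right))
    centre_mean = sum(centre) / len(centre)
    return flank_mean - centre_mean


def best_footprint(cuts, width=4, flank=3):
    """Return the start index of the window with the strongest dip."""
    # Only consider windows that have a full flank on both sides.
    candidates = range(flank, len(cuts) - width - flank + 1)
    return max(candidates, key=lambda s: footprint_score(cuts, s, width, flank))


# Hypothetical per-base DNase I cleavage counts across a small region.
cuts = [2, 3, 9, 8, 9, 1, 0, 0, 1, 9, 8, 9, 3, 2]

site_start = best_footprint(cuts)
```

The protected window in this toy profile starts at index 5, where counts drop between two heavily cut flanks.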
FAIRE-seq
- alternative to DNase-seq
- formaldehyde cross linking (more efficient in nucleosome bound DNA)
- phenol extraction
- identifies open regions
- higher coverage at enhancers
ATAC-seq
- assay for transposase accessible chromatin
- alternative to DNase-seq that uses a mutated, hyperactive transposase instead
- cuts exposed DNA
- isolated, sequenced, mapped
- small sample size and fast
Chromatin Interaction
- genes regulated by regions distant from promoter
- need a way to identify long range interactions involving protein factors
ChIA-PET
- isolates chromatin complexes
- identify DNA sequence
- PET sequences mapped back
- shows self ligation: one fragment (TF binding site)
- interligation: two fragments (DNA coming together)
- clusters of overlapping PET sequences identifies enriched protein binding sites
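
A rough sketch of how mapped PETs could be sorted into the two classes above, using the genomic distance between the two tags. The 3000 bp cutoff and the tuple layout are hypothetical choices for illustration; real pipelines derive the cutoff from the data.

```python
def classify_pet(tag1, tag2, self_ligation_cutoff=3000):
    """Label a paired-end tag as self-ligation or inter-ligation.

    Each tag is a (chromosome, position) tuple. Assumption: tags on the same
    chromosome within the cutoff come from one fragment.
    """
    chrom1, pos1 = tag1
    chrom2, pos2 = tag2
    if chrom1 == chrom2 and abs(pos1 - pos2) <= self_ligation_cutoff:
        return "self-ligation"
    return "inter-ligation"


# Hypothetical mapped PETs.
pets = [
    (("chr1", 1000), ("chr1", 1400)),     # tags close together
    (("chr1", 1000), ("chr1", 250000)),   # same chromosome, far apart
    (("chr1", 1000), ("chr2", 1200)),     # different chromosomes
]

labels = [classify_pet(a, b) for a, b in pets]
```

Self-ligation PETs mark binding sites, while clusters of inter-ligation PETs linking the same two regions suggest a long range interaction.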
ChIA-PET method
- separate out bound chromatin
- ChIP enrichment via antibodies
- linkers added to DNA ends
- proximity ligation: DNA ends from the same chromatin complex carry the same linker type and, being closer together, are more likely to ligate to each other
- digest to make tags
- separate tags with same-type linker pairs from those with different-type pairs
- sequencing and analysis
Long Range ChIA-PET
- chromatin cross linked and fragmented
- isolated with immunoprecipitation
- DNA ligated (self-ligation and inter-ligation)
- fragmentation and sequencing adaptors added
- linker DNA isolated
- mapping/sequencing
Reduced Representation Bisulphite Sequencing
- reduces the number of nucleotides that need to be sequenced to about 1% of the genome
- doesn’t identify all CG sites
- uses a restriction enzyme (RE) that cuts at CG-containing sites, producing fragments with CG ends
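
A small in-silico digest showing why the fragments end in CG. These notes do not name the enzyme; the sketch assumes MspI, which recognises CCGG and cuts between the two Cs. The sequence is made up.

```python
def mspi_digest(sequence):
    """Cut a sequence at every CCGG site, between the first and second C.

    Assumption: MspI (C^CGG) is the enzyme; single strand only, no size
    selection step.
    """
    site = "CCGG"
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + 1  # cut falls after the first C of the site
        fragments.append(sequence[start:cut])
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(sequence[start:])
    return fragments


# Hypothetical sequence with two CCGG sites.
seq = "AATTCCGGATATCCGGTTAA"

fragments = mspi_digest(seq)
```

Every fragment produced by a cut starts with CGG, so sequencing from fragment ends reads straight into a CG site.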
Spliceosome
U1 snRNP complex recognises 5’ splice site
U2AF proteins recognise the 3’ splice site/polypyrimidine tract
U2 snRNP complex recognises the branch site
- SNPs in these sites can have a significant effect on splicing
RIP-seq
- immunoprecipitation of RNA binding protein of interest
- RNA bound to RBP isolated for sequencing
CLIP-Seq
- additional crosslinking with UV
- crosslinking causes tighter binding and allows identification of binding sites when cDNA mapped back
1. immunoprecipitation
2. digestion
3. reverse transcription
4. sequencing/mapping
-Seq Key Features
- identify region of interest
- isolate sequence
- sequence fragment and map back to genome
ENCODE Controversy
- states most of the genome is functional
- considers anything transcribed as functional (many transcribed elements, such as pseudogenes, are not)
- emphasized sensitivity over specificity (false positives)
- lack of appropriate controls
- arbitrary choice of cell lines
- doesn’t detract from the value of the data