5.1 ChIP-seq Flashcards
What are some applications of sequencing?
- WGS: variant calling
- WGS: metagenomes
- RNA-seq: transcriptomes
- ChIP-seq: protein-DNA interactions
What are histone modifications?
a type of epigenetic modification that acts to reinforce open or closed chromatin conformations
What do enhancers do?
Regulate gene activity; they are defined and controlled by epigenetic state
They can be distal or proximal to the genes they regulate
What are ways genomes can be sheared?
- sonication
- enzymes
- transposons
What is ChIPseq?
A sequencing-based approach used to map protein-DNA interactions, such as histone modification patterns, across the genome
What are each of these marks characteristic of:
- H3K4me3
- H3K4me1
- H3K27ac
- H3K36me3
- H3K9me3
- H3K27me3
- H3K4me3: active promoters
- H3K4me1: enhancers
- H3K27ac: active enhancers
- H3K36me3: transcription elongation
- H3K9me3: repressive/heterochromatin
- H3K27me3: repressive
What are the 4 main steps in the ChIP-seq analysis workflow?
- Generate 2 libraries: (1) a ChIP library made using antibodies (abs) against histones and (2) a control library called the input
file: .fastq
- Align and pair sequence reads
tool: BWA mem
file: .sam
- Convert to binary format and sort
tool: Sambamba or SAMtools
file: .sorted.bam
- Generate density tracks and call peaks
tool: MACS2
file: .bedgraph, .bed
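The tool chain above can be sketched as a small Python helper that only builds the shell commands, without running them. This is a rough illustration, not a definitive pipeline: the file names (ref.fa, the _R1/_R2 FASTQ suffixes, input.sorted.bam) are hypothetical, `-g hs` assumes a human genome, and `-f BAMPE` assumes paired-end data.

```python
def chipseq_commands(sample, reference="ref.fa", control_bam="input.sorted.bam"):
    """Build the shell commands for one ChIP library (all file names are hypothetical)."""
    sam = f"{sample}.sam"
    bam = f"{sample}.sorted.bam"
    return [
        # Align paired-end reads; bwa mem writes SAM to stdout, so redirect it.
        f"bwa mem {reference} {sample}_R1.fastq {sample}_R2.fastq > {sam}",
        # Convert to binary (BAM) and coordinate-sort in one step.
        f"samtools sort -o {bam} {sam}",
        # Call peaks against the input control; -B also writes bedGraph tracks.
        # Assumptions: human genome (-g hs) and paired-end reads (-f BAMPE).
        f"macs2 callpeak -t {bam} -c {control_bam} -f BAMPE -g hs -n {sample} -B",
    ]


if __name__ == "__main__":
    for command in chipseq_commands("H3K27ac"):
        print(command)
```

For broad marks, MACS2 also offers a `--broad` option on `callpeak`; whether to use it depends on the mark being profiled.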
What are BigWigs?
binary, indexed version of BEDgraph files that are loaded directly into the browser for visualization
AKA a compressed binary version of BEDgraph (or wiggle) files; the equivalent binary format for BED files is bigBed
What file formats are used to visualize ChIP data?
Bed and BedGraph files
What are the 3 required fields in a BED file?
- chrom: name of chromosome
- chromStart: start position of the feature in the chromosome or scaffold, 0-indexed
- chromEnd: end position of the feature in the chromosome or scaffold
How are the coordinates in a BED file reported?
Half-open: the start is included and the end is excluded, [A, B)
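A short Python sketch may make the half-open convention concrete. The function names are hypothetical helpers written for this card, not part of any library.

```python
def parse_bed_line(line):
    """Return (chrom, chromStart, chromEnd) from the 3 required BED fields."""
    chrom, start, end = line.split()[:3]
    return chrom, int(start), int(end)


def feature_length(start, end):
    """With a 0-indexed, half-open interval [start, end), length is end - start."""
    return end - start


def to_one_based_closed(start, end):
    """Convert BED coordinates to the 1-based, closed form [start + 1, end]."""
    return start + 1, end


if __name__ == "__main__":
    chrom, start, end = parse_bed_line("chr1\t0\t100")
    print(chrom, feature_length(start, end), to_one_based_closed(start, end))
```

The first 100 bases of chr1 are written as start 0 and end 100 in BED, which is why the length needs no "+1" correction.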
What is a BEDgraph file?
A 4-column BED-like file with a track header; the track definition line sets browser parameters
The columns are the same 3 as in a BED file plus a 4th dataValue column
It encodes quantitative data such as signal amplitude
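As an illustration of the 4-column layout, here is a minimal Python sketch that skips the track definition line and reads the data rows. The example track content and the helper names are made up for illustration.

```python
def read_bedgraph(lines):
    """Parse BEDgraph rows into (chrom, start, end, value), skipping header lines."""
    records = []
    for line in lines:
        line = line.strip()
        # Skip blank lines, comments, and track/browser definition lines.
        if not line or line.startswith(("track", "browser", "#")):
            continue
        chrom, start, end, value = line.split()[:4]
        records.append((chrom, int(start), int(end), float(value)))
    return records


def mean_signal(records):
    """Length-weighted mean of dataValue over all covered bases."""
    total_bases = sum(end - start for _, start, end, _ in records)
    if total_bases == 0:
        return 0.0
    weighted = sum((end - start) * value for _, start, end, value in records)
    return weighted / total_bases


if __name__ == "__main__":
    example = [
        'track type=bedGraph name="example"',  # hypothetical track line
        "chr1\t0\t100\t2.0",
        "chr1\t100\t200\t4.0",
    ]
    print(mean_signal(read_bedgraph(example)))
```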
What are key considerations in ChIP-seq?
- antibody specificity and sensitivity
- Which marks to profile
- required sequencing depth
What is a wiggle format?
A way of encoding single-nucleotide information; a single-base-resolution, BED-like format
Why might the required sequencing depth differ in ChIP-seq?
- Feature prominence (AKA mark occupancy): the more of the genome a mark covers, the more sequencing reads are needed to cover those events
- 50M read pairs for punctate marks
- 100M read pairs for broad marks
What is the input/control used in ChIP-seq?
Input library consists of the sheared DNA prior to IP; the control is used for background correction
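One simplified way to picture background correction is a per-bin fold enrichment of ChIP over depth-scaled input. The sketch below is only an assumption-laden illustration (fixed bins, a pseudocount of 1, total-count scaling); it is not the statistical model that MACS2 or any other peak caller actually uses.

```python
def fold_enrichment(chip_counts, input_counts, pseudocount=1.0):
    """Per-bin ChIP / input ratio after scaling input to the ChIP library depth."""
    chip_total = sum(chip_counts)
    input_total = sum(input_counts)
    if input_total == 0:
        raise ValueError("input library has no reads")
    # Scale the input so both libraries have the same total read count.
    scale = chip_total / input_total
    return [
        (chip + pseudocount) / (inp * scale + pseudocount)
        for chip, inp in zip(chip_counts, input_counts)
    ]


if __name__ == "__main__":
    chip = [10, 100, 10]   # hypothetical read counts per genomic bin
    inp = [20, 20, 20]     # hypothetical input counts for the same bins
    print(fold_enrichment(chip, inp))
```

Bins with a ratio well above 1 carry more ChIP signal than the background predicts; bins near or below 1 look like background.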
What is iChIP and what are its advantages?
Indexes are added at the ligation step, allowing multiple experiments to be pooled together before IP
Advantage: works with a smaller number of cells, which is useful when the cells are rare
How is the quality of a ChIP-seq sequencing run evaluated?
- sequence quality
- Library quality
- IP quality
How is sequence quality determined?
By FastQC
How is library quality determined?
By the diversity of IP'd fragments in the library; the more diverse the fragments present, the more representative the library is of the target
What is the relationship between library diversity and PCR duplicate rate?
inversely correlated
What are PCR duplicates and how are they created?
Reads that have identical start and stop positions relative to a reference
NOT created by clonal amplification, but by the PCR cycles used to amplify the library after adapter ligation
What does the presence of PCR duplicates mean?
That the library generated is less diverse and therefore lower quality
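The inverse relationship between diversity and duplicate rate can be shown with a small Python sketch that treats fragments with identical chromosome, start, stop, and strand as duplicates. This is a simplification for illustration; real tools such as Picard MarkDuplicates or samtools markdup apply more detailed rules, and the example fragments are made up.

```python
from collections import Counter


def duplicate_stats(fragments):
    """Return (duplicate_rate, diversity) for (chrom, start, end, strand) tuples."""
    counts = Counter(fragments)
    total = sum(counts.values())
    if total == 0:
        return 0.0, 0.0
    # Diversity: fraction of fragments that are distinct.
    diversity = len(counts) / total
    # Every extra copy of a fragment is a duplicate, so the two sum to 1.
    duplicate_rate = 1 - diversity
    return duplicate_rate, diversity


if __name__ == "__main__":
    fragments = [("chr1", 100, 300, "+")] * 3 + [("chr1", 500, 700, "-")]
    print(duplicate_stats(fragments))
```

With 4 fragments of which 2 are distinct, half the library is duplicated and diversity is one half.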
Why might duplicates be bad?
If the initial starting material is low, the material can be overamplified before sequencing, producing duplicates. Any biases in PCR will compound this problem and can lead to artificial peak calls
In ChIP we are trying to measure event frequencies, and duplicates distort them