5.1 ChIP-SEQ Flashcards

1
Q

What are some applications of sequencing?

A
  • WGS: variant calling
  • WGS: metagenomes
  • RNA-seq: transcriptomes
  • ChIP-seq: protein-DNA interactions
2
Q

What are histone modifications?

A

Covalent modifications of histone proteins (e.g., methylation, acetylation); a type of epigenetic modification that acts to reinforce open or closed chromatin conformations

3
Q

What do enhancers do?

A

They regulate gene activity; their activity is defined and controlled by epigenetic state

Enhancers can be distal or proximal to the genes they regulate

4
Q

What are ways genomes can be sheared?

A
  1. sonication (mechanical shearing)
  2. enzymatic digestion (e.g., micrococcal nuclease, MNase)
  3. transposons (e.g., Tn5 tagmentation)
5
Q

What is ChIPseq?

A

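Chromatin immunoprecipitation followed by sequencing: a sequencing-based approach used to measure histone modification patterns (and, more generally, protein-DNA binding) across the genome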

6
Q

What is each of these marks characteristic of?

  1. H3K4me3
  2. H3K4me1
  3. H3K27ac
  4. H3K36me3
  5. H3K9me3
  6. H3K27me3
A
  1. active promoters
  2. enhancers
  3. active enhancers
  4. transcription elongation (bodies of actively transcribed genes)
  5. repression / heterochromatin
  6. repression (Polycomb)
7
Q

What are the 4 main steps in the ChIP-seq analysis workflow?

A
  1. Generate 2 libraries: (1) the ChIP library, made by immunoprecipitation with antibodies against histones, and (2) a control library called the input
    file: .fastq
  2. Align paired-end sequence reads to the reference genome
    tool: BWA-MEM
    file: .sam
  3. Convert to binary format and sort
    tool: Sambamba or SAMtools
    file: .sorted.bam
  4. Generate density tracks and call peaks
    tool: MACS2
    file: .bedgraph, .bed
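Steps 2 to 4 above can be sketched as a small Python helper that only assembles the shell commands; it does not run them. This is a sketch under stated assumptions: the reference is assumed to be already indexed with `bwa index`, all file names are hypothetical, reads are paired-end (hence MACS2's `-f BAMPE`), and `-g hs` assumes a human genome. Step 1 happens at the bench and yields the FASTQ files.

```python
def chip_seq_commands(ref, sample, r1, r2, control_bam=None, genome_size="hs"):
    """Return (not run) the shell commands for steps 2-4 of the workflow.

    All file names are hypothetical and assumed to be shell-safe.
    """
    sam = f"{sample}.sam"
    bam = f"{sample}.sorted.bam"
    cmds = [
        # Step 2: align paired-end FASTQ reads with BWA-MEM, writing SAM
        f"bwa mem {ref} {r1} {r2} > {sam}",
        # Step 3: convert to binary (BAM) while sorting by coordinate, then index
        f"samtools sort -o {bam} {sam}",
        f"samtools index {bam}",
    ]
    # Step 4: call peaks with MACS2; -B also writes bedGraph pile-up tracks
    macs = f"macs2 callpeak -t {bam} -f BAMPE -g {genome_size} -n {sample} -B"
    if control_bam is not None:
        # Input library used as the control for background correction
        macs += f" -c {control_bam}"
    cmds.append(macs)
    return cmds


for cmd in chip_seq_commands("hg38.fa", "chip", "chip_R1.fastq", "chip_R2.fastq",
                             control_bam="input.sorted.bam"):
    print(cmd)
```

The input library would go through steps 2 and 3 the same way before its sorted BAM is passed as the control.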
8
Q

What are BigWigs?

A

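A binary, indexed version of bedGraph (or wiggle) data that a genome browser can load directly, fetching only the region currently being viewed

BigWig is the compressed binary counterpart of bedGraph/wiggle files; the equivalent for plain BED files is BigBed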

9
Q

What file formats are used to visualize ChIP data?

A

BED and bedGraph files

10
Q

What are the 3 required fields in a BED file?

A
  1. chrom: name of the chromosome or scaffold
  2. chromStart: start position of the feature in the chromosome or scaffold, 0-based
  3. chromEnd: end position of the feature in the chromosome or scaffold (not included in the feature)
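A minimal Python sketch of reading the three required fields from one BED line. It assumes tab-separated columns and simply ignores any optional columns (name, score, strand and so on); `BedRecord` and `parse_bed_line` are illustrative names, not from any library.

```python
from typing import NamedTuple


class BedRecord(NamedTuple):
    chrom: str        # chromosome or scaffold name
    chrom_start: int  # 0-based start (included in the feature)
    chrom_end: int    # end position (excluded from the feature)


def parse_bed_line(line: str) -> BedRecord:
    """Parse the three required BED fields; extra optional columns are ignored."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise ValueError("a BED line needs at least chrom, chromStart and chromEnd")
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    if start > end:
        raise ValueError("chromStart must not be greater than chromEnd")
    return BedRecord(chrom, start, end)


rec = parse_bed_line("chr1\t100\t200\tpeak1\n")
print(rec.chrom, rec.chrom_end - rec.chrom_start)  # feature length in bases
```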
11
Q

How are the coordinates in a BED file reported?

A

Half-open intervals: the start is included and the end is excluded, [start, end)
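Half-open coordinates make interval arithmetic simple: a feature's length is `end - start`, and two features that merely touch do not overlap. A small sketch; the helper names are illustrative, not from any library.

```python
from typing import Tuple


def bed_length(start: int, end: int) -> int:
    # With half-open [start, end) coordinates the length is end - start (no +1).
    return end - start


def bed_to_one_based(start: int, end: int) -> Tuple[int, int]:
    # 0-based half-open [start, end) -> 1-based closed [start + 1, end]
    return start + 1, end


def overlaps(a: Tuple[int, int], b: Tuple[int, int]) -> bool:
    # Two half-open intervals overlap iff each one starts before the other ends.
    return a[0] < b[1] and b[0] < a[1]
```

For example, `chr1 0 100` covers the first 100 bases of chr1, which a 1-based, closed coordinate system (as shown in genome browser position boxes) writes as chr1:1-100.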

12
Q

What is a bedGraph file?

A

A 4-column BED-like file, usually preceded by a track definition line that sets browser display parameters

The columns are the same 3 as in a BED file plus a 4th dataValue column

It encodes quantitative data such as signal amplitude
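As a sketch of how a density track ends up in bedGraph form, the function below collapses a per-base coverage array for one chromosome into bedGraph lines. Assumptions: index 0 of the array is the chromosome's first base, runs of equal values are merged into one half-open interval, and zero-coverage runs are skipped (a choice made here, not a format requirement). The function name is hypothetical.

```python
def coverage_to_bedgraph(chrom, coverage, name="signal"):
    """Collapse per-base counts (index 0 = first base) into bedGraph lines."""
    lines = [f'track type=bedGraph name="{name}"']  # track definition line
    start = 0
    for i in range(1, len(coverage) + 1):
        # Close the current run at the end of the array or when the value changes.
        if i == len(coverage) or coverage[i] != coverage[start]:
            if coverage[start] != 0:
                # chrom, chromStart, chromEnd (half-open), dataValue
                lines.append(f"{chrom}\t{start}\t{i}\t{coverage[start]}")
            start = i
    return lines


for row in coverage_to_bedgraph("chr1", [0, 0, 3, 3, 3, 1, 0]):
    print(row)
```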

13
Q

What are key considerations in ChIP-seq?

A
  1. antibody specificity and sensitivity
  2. Which marks to profile
  3. required sequencing depth
14
Q

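What is the wiggle (WIG) format?

A

A format for encoding dense, continuous quantitative data along the genome, down to single-base resolution; unlike bedGraph, data lines follow a fixedStep or variableStep declaration line instead of each carrying its own BED-style coordinates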
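The difference from bedGraph is easiest to see by converting a `fixedStep` wiggle block into bedGraph rows. A minimal sketch, assuming only `fixedStep` blocks (no `variableStep`) and keeping in mind that wiggle positions are 1-based; the function name is hypothetical.

```python
def fixedstep_to_bedgraph(wig_lines):
    """Convert fixedStep wiggle lines to bedGraph rows (chrom, start, end, value).

    Wiggle positions are 1-based; bedGraph starts are 0-based and half-open,
    so a 1-based position p covering `span` bases becomes [p - 1, p - 1 + span).
    variableStep blocks are not handled in this sketch.
    """
    rows = []
    chrom = pos = step = span = None
    for raw in wig_lines:
        line = raw.strip()
        if not line or line.startswith(("track", "browser", "#")):
            continue
        if line.startswith("fixedStep"):
            # Declaration line, e.g. "fixedStep chrom=chr1 start=11 step=10 span=5"
            params = dict(field.split("=", 1) for field in line.split()[1:])
            chrom = params["chrom"]
            pos = int(params["start"])
            step = int(params["step"])
            span = int(params.get("span", "1"))  # span defaults to 1 base
            continue
        if chrom is None:
            raise ValueError("data line before any fixedStep declaration")
        rows.append((chrom, pos - 1, pos - 1 + span, float(line)))
        pos += step  # the next value sits `step` bases further along
    return rows
```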

15
Q

Why might the required sequencing depth differ in ChIP-seq?

A
  • Mark occupancy (feature prominence): the more of the genome a mark covers, the more sequencing reads are needed to cover those events
  • 50M read pairs for punctate marks
  • 100M read pairs for broad marks
16
Q

What is the input/control used in ChIP-seq?

A

The input library consists of the sheared DNA taken before immunoprecipitation (IP); this control is used for background correction

17
Q

What is iChIP and what are its advantages?

A

Indexes (barcodes) are added at the ligation step, allowing multiple experiments to be pooled together before IP
Advantage: fewer cells are needed per sample, which helps when the cells of interest are rare

18
Q

How is the quality of a ChIP-seq sequencing run evaluated?

A
  1. sequence quality
  2. Library quality
  3. IP quality
19
Q

How is sequence quality determined?

A

With FastQC

20
Q

How is library quality determined?

A

By the diversity (complexity) of IP’d fragments in the library: the more distinct fragments present, the better the target is represented

21
Q

What is the relationship between library diversity and PCR duplicate rate?

A

They are inversely correlated: the higher the PCR duplicate rate, the lower the library diversity

22
Q

What are PCR dups and how are they created?

A

Reads that have identical start and stop positions relative to the reference

They are NOT created by clonal amplification but by the PCR cycles used to amplify the library after adapter ligation
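Because PCR duplicates are defined by identical start and stop positions, a first-pass duplicate rate can be computed by counting fragments with identical coordinates. A minimal sketch, assuming paired-end fragments already reduced to hypothetical `(chrom, start, end, strand)` tuples; note that coordinate matching alone cannot tell PCR duplicates from biological duplicates.

```python
from collections import Counter


def duplicate_stats(fragments):
    """Return (duplicate_rate, distinct_fraction) for paired-end fragments.

    Each fragment is (chrom, start, end, strand). Fragments with identical
    coordinates are counted as PCR-duplicate candidates: one copy is kept and
    every extra copy counts as a duplicate.
    """
    total = len(fragments)
    if total == 0:
        return 0.0, 0.0
    counts = Counter(fragments)
    duplicates = sum(n - 1 for n in counts.values())
    # distinct_fraction is a simple diversity measure: 1 - duplicate_rate
    return duplicates / total, len(counts) / total


frags = [("chr1", 100, 300, "+")] * 3 + [("chr1", 150, 350, "+")]
print(duplicate_stats(frags))
```

The two numbers move in opposite directions, which is the inverse relationship between duplicate rate and library diversity.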

23
Q

What does the presence of PCR duplicates mean?

A

A high PCR duplicate rate means the library is less diverse and therefore of lower quality

24
Q

Why might duplicates be bad?

A

If the initial starting material is low, PCR can over-amplify the material before sequencing, producing many duplicates. Any PCR bias compounds this problem and can lead to artificial peak calls

This matters because ChIP-seq measures event frequencies (read counts), which duplicates inflate

25
Q

What are good duplicates?

A

Biological duplicates: reads from independent fragments that happen to share the same positions. They are expected because IP enriches small parts of the genome, and those regions are then sequenced deeply

26
Q

What happens when you remove biological duplicates?

A

The dynamic range of the CHIP-seq signal for that region is compressed.

27
Q

How can biological duplicates be identified?

A
  1. Add 3 random bases (NNN) to the adapter; the rest of the adapter is a standard adapter sequence
  2. Because they are attached before PCR, these random bases tag each original fragment (a unique molecular identifier, UMI)
  3. Use these identifiers to tell biological duplicates (same position, different tag) from PCR duplicates (same position, same tag)
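The UMI logic above can be sketched as deduplication on coordinates plus tag. Assumptions: the 3-base tag has already been extracted from each read into a hypothetical `(chrom, start, end, strand, umi)` tuple, and tags are matched exactly with no correction for sequencing errors. With only 3 random bases (64 possible tags), two independent fragments at the same position can share a tag by chance, so this slightly undercounts biological duplicates.

```python
def dedup_with_umi(fragments):
    """Remove PCR duplicates using a random adapter tag (UMI).

    Each fragment is (chrom, start, end, strand, umi). Same coordinates and
    same UMI -> PCR copy of one molecule (dropped). Same coordinates but a
    different UMI -> biological duplicate from an independent molecule (kept).
    """
    seen = set()
    kept = []
    for frag in fragments:
        if frag not in seen:
            seen.add(frag)
            kept.append(frag)  # first occurrence of this molecule is kept
    return kept


reads = [
    ("chr1", 100, 300, "+", "ACG"),
    ("chr1", 100, 300, "+", "ACG"),  # PCR duplicate of the read above
    ("chr1", 100, 300, "+", "TTA"),  # biological duplicate: different molecule
]
print(len(dedup_with_umi(reads)))
```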
28
Q

How does the appearance of biological duplicates compare to PCR duplicates?

A

Biological duplicates appear as multiple overlapping reads with offset start positions

PCR duplicates look like read stacks that align perfectly

29
Q

What are two common measures of IP quality?

A

FRiP: fraction of reads in peaks, i.e., the fraction of sequencing reads aligning within called peaks

Domain reads
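Both measures are the same calculation with a different region set: called peaks give FRiP, while genes or promoters give a domain-reads fraction. A minimal sketch, assuming each aligned read is summarized by a single 0-based position (for example its 5' end or shifted centre) and regions are half-open BED intervals; a linear scan is fine for illustration, whereas real data would call for an interval index. The function name is hypothetical.

```python
def fraction_in_regions(reads, regions):
    """Fraction of reads whose position falls inside any region.

    reads:   list of (chrom, pos) with 0-based read positions
    regions: list of (chrom, start, end) half-open BED intervals
    """
    if not reads:
        return 0.0
    by_chrom = {}
    for chrom, start, end in regions:
        by_chrom.setdefault(chrom, []).append((start, end))
    inside = sum(
        1
        for chrom, pos in reads
        if any(start <= pos < end for start, end in by_chrom.get(chrom, []))
    )
    return inside / len(reads)


peaks = [("chr1", 100, 200)]
print(fraction_in_regions([("chr1", 150), ("chr1", 250)], peaks))
```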

30
Q

What are domain reads?

A
  • fraction of sequencing alignments within a defined set of genomic regions
  • e.g., genes, promoters
  • the region set is specific to the IP’d target
31
Q

Why do you want to look at the number of reads that align to a peak?

A

It tells you the signal-to-background ratio: how much of the sequencing is true IP signal rather than background

32
Q

What is the purpose of MACS2?

A

identify regions of enrichment comparing a treatment (IP) to a control (input)

33
Q

What are the first 4 general steps in MACS2?

A
  1. Taking the BAM file, scan along each chromosome to find significantly enriched windows (2x the average fragment size) with counts mfold higher than the random genome-wide average
  2. For 1000 randomly chosen enriched windows, calculate the distance between the maxima of the read distributions on the + and - strands = d
  3. Shift + strand reads by +d/2 and - strand reads by -d/2
  4. Scale the control experiment to the same number of reads as the ChIP-seq sample
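A toy Python version of steps 2 to 4 for a single enriched region. This is a simplification, not MACS2's code: MACS2 pools about 1000 regions and works with smoothed strand distributions, whereas here d is simply the distance between the most common + strand and - strand 5' read positions. `minus_5p` holds the 5' ends of - strand reads, i.e., their rightmost aligned coordinates. All function names are hypothetical.

```python
from collections import Counter


def estimate_d(plus_5p, minus_5p):
    """Distance between the + strand and - strand read pile-up maxima."""
    plus_mode = Counter(plus_5p).most_common(1)[0][0]
    minus_mode = Counter(minus_5p).most_common(1)[0][0]
    return minus_mode - plus_mode


def shift_reads(plus_5p, minus_5p, d):
    # Move + strand reads downstream and - strand reads upstream by d/2
    # (integer division here), so both pile-ups centre on the same site.
    half = d // 2
    return [p + half for p in plus_5p], [m - half for m in minus_5p]


def scale_control(control_count, control_total, chip_total):
    # Linearly rescale a control count to the ChIP library's read total.
    return control_count * chip_total / control_total


plus, minus = [100, 100, 100, 102], [300, 300, 298]
d = estimate_d(plus, minus)
print(d, shift_reads(plus, minus, d))
```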
34
Q

What are extremely highly enriched regions in ChIP-seq?

A

Usually artifacts, such as stacks of PCR duplicates

35
Q

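Why is d calculated and reads shifted by d/2 in ChIP-seq?

A

The bound nucleosome protects DNA from shearing at the true peak, so fragment ends, and therefore reads, fall on either side of it: + strand reads pile up upstream and - strand reads downstream. Shifting each by d/2 toward the centre recentres the signal on the true site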

36
Q

What kind of distribution does MACS2 use for read count?

A

Poisson distribution

37
Q

In MACS2 what does lambda represent?

A

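Lambda is the expected number of reads in a window under the background model; it is calculated from the input (control)

MACS2 uses a local lambda: the maximum of several estimates (the genome-wide background and the control read counts in surrounding windows), so lambda = the max value obtained from the control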
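A sketch of the "take the maximum" idea behind the local lambda. Assumptions: the surrounding-window sizes (e.g., 1 kb, 5 kb, 10 kb) and the exact set of estimates are illustrative, each control count is rescaled to the peak window length and to the ChIP library size, and the function name is hypothetical rather than MACS2's internal code.

```python
def local_lambda(window_len, genome_len, chip_total, control_counts, control_total):
    """Expected reads in a candidate-peak window: max over several estimates.

    control_counts maps a surrounding-window size in bp to the control read
    count in that window centred on the candidate peak.
    """
    depth_ratio = chip_total / control_total  # rescale control to ChIP depth
    # Genome-wide background: ChIP reads spread uniformly across the genome
    estimates = [chip_total * window_len / genome_len]
    for size, count in control_counts.items():
        # Control reads per bp in the surrounding window, times window length
        estimates.append(count * depth_ratio * window_len / size)
    # Taking the max makes the test conservative against local biases
    return max(estimates)


print(local_lambda(200, 1_000_000, 10_000, {1000: 50, 5000: 100, 10000: 150}, 10_000))
```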

38
Q

How is a p-value calculated in MACS2?

A

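The p-value is calculated from the IP read count in the window using a Poisson distribution with mean lambda: it is the probability of seeing at least that many reads from background alone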
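The p-value is the upper tail of the Poisson distribution, P(X >= k) with X ~ Poisson(lambda). A stdlib-only sketch; for very small p-values, `scipy.stats.poisson.sf(k - 1, lam)` is more accurate than subtracting the CDF from 1. MACS2 reports p-values on a -log10 scale.

```python
import math


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam): chance the background gives >= k reads."""
    if k <= 0:
        return 1.0
    # 1 - P(X <= k - 1), building each PMF term from the previous one
    # to avoid computing large factorials directly.
    term = math.exp(-lam)
    cdf = term
    for i in range(1, k):
        term *= lam / i
        cdf += term
    return max(0.0, 1.0 - cdf)


p = poisson_sf(20, 5.0)  # 20 IP reads where lambda = 5 are expected
print(p, -math.log10(p))
```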

39
Q

What is the False Discovery Rate?

A

The expected proportion of discoveries (rejected null hypotheses) that are false positives
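A common way to control the FDR is the Benjamini-Hochberg procedure, which turns p-values into q-values; to our understanding, MACS2's q-values are computed this way. A minimal stdlib sketch of the standard procedure:

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # ascending p-values
    q = [0.0] * m
    running_min = 1.0  # also caps q-values at 1
    # Walk from the largest p-value down so q-values stay monotone in p.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


print(bh_qvalues([0.01, 0.04, 0.03, 0.005]))
```

Peaks with a q-value below a cutoff such as 0.05 are then reported, so on average at most 5% of them are expected to be false.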

40
Q

What are the inputs for MACS?

A
  1. treatment file (required): BAM or BED
  2. control/input file (optional)

41
Q

How can ChIP-seq data be visualized?

A

bedGraph files can be visualized in a genome browser such as IGV (Integrative Genomics Viewer)