Chaudhuri Flashcards

Question

How many reading frames are there for DNA?

Answer 1

- 6 - triplet genetic code, so 3 distinct ways DNA strand can encode a protein and 3 more in reverse direction on complementary strand
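
The six frames above can be enumerated directly. A minimal sketch, assuming an uppercase ACGT string; `reverse_complement` and `six_frames` are illustrative names, not from any library.

```python
# Sketch only: helper names are hypothetical, input assumed to be uppercase ACGT.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def six_frames(seq):
    """Return codon lists for frames +1, +2, +3, then -1, -2, -3."""
    frames = []
    for strand in (seq, reverse_complement(seq)):
        for offset in range(3):
            # Step through the strand 3 bases at a time, keeping whole codons only.
            codons = [strand[i:i + 3] for i in range(offset, len(strand) - 2, 3)]
            frames.append(codons)
    return frames


frames = six_frames("ATGAAATAG")
for name, codons in zip(["+1", "+2", "+3", "-1", "-2", "-3"], frames):
    print(name, codons)
```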

Answer 2

- longest ORF likely to be gene
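
A rough sketch of the "longest ORF" idea, assuming DNA input (so the start codon is written ATG) and the standard stop codons. `longest_orf` is a hypothetical helper; real gene finders use many more signals than ORF length.

```python
# Sketch only: scans all six frames for the longest ATG...stop stretch.
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def longest_orf(seq):
    """Return the longest ORF (start codon through stop codon), or '' if none."""
    best = ""
    for strand in (seq, seq.translate(COMPLEMENT)[::-1]):
        for offset in range(3):
            start = None  # index of the current open start codon, if any
            for i in range(offset, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOPS:
                    orf = strand[start:i + 3]
                    if len(orf) > len(best):
                        best = orf
                    start = None
    return best


print(longest_orf("CCATGAAACCCTGATTATGTAA"))
```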

Answer 3

- long ORFs
- AUG start codon
- Shine-Dalgarno site
- Pribnow box
- characteristic base composition due to biases in codon usage

Answer 4

- introns so harder
- algorithms exist but are hit and miss
- look for
  - Kozak seq
  - euk terminator consensus
  - polyadenylation signal
  - splice donor and acceptor sites, which can indicate intron presence
- RNA seq data can reveal which regions present in mature RNAs so assist w/ identification of genes and introns

Answer 5

- used to be manual
- pipelines, eg. Prokka and MAKER, provide automated annotation
- apply no. programs to predict positions of protein coding genes, tRNA and rRNA genes
- also use BLAST to identify homologues from which functional annotation can be transferred

Answer 6

- simpler to reseq from species already seq than seq new genome
- investigate genomic variation w/in pop of species
- understanding single gene, complex disorders and cancer
- identifying variants for diagnosis --> may allow personalised medicine or genome editing based cures in future
- relied on by functional genomic techs, like RNA seq and ChIP seq

Answer 7

- each read compared w/ sorted index derived from reference genome seq to identify short identical matches ("seed seqs")
- don't use whole read as seed seq, as looking for differences, only small chunks will match
- alignment from all seed matches extended to inc rest of read
- alignment scoring system used to identify best mapping position, accounting for no. matches, mismatches and base qualities (mismatches at low quality bases penalised less than high quality mismatches)
- each mapped read given mapping score, indicating confidence that read is derived from that position in genome (uniquely mapped reads have high score and ambiguously mapped have low score)
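
The seed-and-extend steps above can be sketched in a few lines. This is a toy under stated assumptions: a plain dictionary of k-mers stands in for the sorted index, extension is ungapped, and base qualities are ignored. `build_index` and `map_read` are hypothetical names, not the BWA or Bowtie2 API.

```python
from collections import defaultdict


def build_index(ref, k):
    """Map every k-mer in the reference to the positions where it occurs."""
    index = defaultdict(list)
    for i in range(len(ref) - k + 1):
        index[ref[i:i + k]].append(i)
    return index


def map_read(read, ref, index, k):
    """Return (position, mismatches, is_unique) for the best placement, or None."""
    candidates = set()
    # Seed step: every k-mer of the read that matches the index exactly
    # proposes a start position for the whole read.
    for offset in range(len(read) - k + 1):
        for hit in index.get(read[offset:offset + k], []):
            start = hit - offset
            if 0 <= start <= len(ref) - len(read):
                candidates.add(start)
    if not candidates:
        return None
    # Extend step: score each candidate by counting mismatches over the full read.
    scored = sorted(
        (sum(a != b for a, b in zip(read, ref[s:s + len(read)])), s)
        for s in candidates
    )
    best_mismatches, best_pos = scored[0]
    # A tie for best score means the read maps ambiguously (low mapping score).
    is_unique = len(scored) == 1 or scored[1][0] > best_mismatches
    return best_pos, best_mismatches, is_unique


ref = "ACGTTGCAAGGCTTACCGAT"
index = build_index(ref, 4)
print(map_read("GCAATGCTTA", ref, index, 4))  # read with one mismatch to ref
```

The `is_unique` flag is a crude stand-in for a mapping score: a read from a repeat ties across several positions and would be reported as ambiguous.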

Answer 8

- BWA and Bowtie2

Answer 9

- usually BAM file
- contains details of which position on which chromosome mapped to and how good alignment is

Answer 10

- no. reads which overlap particular position
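
A minimal sketch of computing coverage depth, assuming each mapped read is given as a 0-based, half-open `(start, end)` interval. `coverage_depth` is a hypothetical helper name.

```python
def coverage_depth(reads, genome_length):
    """Return a list with the number of reads overlapping each position."""
    # Difference array: +1 where a read starts, -1 just past where it ends.
    diff = [0] * (genome_length + 1)
    for start, end in reads:
        diff[start] += 1
        diff[end] -= 1
    depth = []
    running = 0
    for change in diff[:genome_length]:
        running += change
        depth.append(running)
    return depth


print(coverage_depth([(0, 4), (2, 6), (3, 5)], 8))
```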

Answer 11

- bar chart showing variation in coverage depth

Answer 12

- reads in pair mapped independently, but if 1 maps to multiple positions, position of other can be used to determine correct position
- if both reads w/in repeat, then mapping ambiguous (then 1 usually chosen and given low mapping score)
- for some apps, low mapping score reads excluded

Answer 13

- look at overview of whole chromosome, w/ focused region highlighted and ruler to show genomic coords
- pile up plot of read depth at each position
- positions w/ mapped reads highlighted
- can also show equivalent info from 2nd highlighted biological sample

Answer 14

- if only 1x coverage would be imposs to distinguish errors from real diffs between seq sample and ref genome
- so seq each base many times

Answer 15

- errors random
- broadly but not always

Answer 16

- uses probabilistic models to distinguish errors from homozygous and heterozygous SNPs
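
To illustrate what a probabilistic model can look like here, a simplified sketch follows. The source does not say which model is used; the binomial likelihood, the fixed 1% error rate, and the function names are all assumptions, and real callers also use base and mapping qualities.

```python
from math import comb


def genotype_likelihoods(n_reads, n_alt, error_rate=0.01):
    """Binomial likelihood of seeing n_alt non-reference reads under each genotype."""
    # Assumed chance that a single read shows the alternative base:
    p_alt = {
        "hom_ref": error_rate,      # only sequencing errors give alt bases
        "het": 0.5,                 # about half the reads carry the SNP
        "hom_alt": 1 - error_rate,  # nearly all reads carry the SNP
    }
    return {
        genotype: comb(n_reads, n_alt) * p ** n_alt * (1 - p) ** (n_reads - n_alt)
        for genotype, p in p_alt.items()
    }


def call_genotype(n_reads, n_alt, error_rate=0.01):
    """Pick the genotype with the highest likelihood."""
    likelihoods = genotype_likelihoods(n_reads, n_alt, error_rate)
    return max(likelihoods, key=likelihoods.get)


print(call_genotype(30, 1))   # a single odd read looks like an error
print(call_genotype(30, 14))  # about half the reads differ
print(call_genotype(30, 30))  # every read differs
```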

Answer 17

- coloured lines mean varies from ref genome
- homozygous SNPs present in all reads, seq errors only in 1
- heterozygous SNPs ≈ half match ref and ≈ half show SNP

Answer 18

- programs, such as SnpEff, predict SNP effects
- eg. intergenic, intron, regulatory, synonymous (doesn't change encoded protein), non synonymous

Answer 19

- random or systematic seq errors
- mapping errors (mapper placed read in incorrect position)
- sample contamination (contains DNA from another source w/ diff seq)
- seq contamination (reads from another sample mislabelled)
- errors or omissions in ref genome

Answer 20

- insertions
- deletions
- duplications
- inversions
- translocations

Answer 21

- using coverage depth
  - duplication where coverage depth higher and deletion where no reads
- using read pairs
  - no structural variation if map to same place on ref and sample
  - deletion if further apart on ref than expected
  - mobile element insertion if only one maps to ref
- using split reads (for deletions)
  - look for indiv reads that overlap deletion
  - can see if read matches 2 positions, so see where gap is
- using assembly
  - might get read w/ 1 end matching ref genome
  - then look for overlapping read pairs to assemble and see what insertion could be
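
The coverage-depth approach can be sketched as a simple threshold rule. The 1.5x median cut-off for duplications and the name `classify_depth` are assumptions for illustration, not values from the source.

```python
from statistics import median


def classify_depth(depth, dup_ratio=1.5):
    """Label each position as 'deletion', 'duplication' or 'normal' from read depth."""
    typical = median(depth)
    calls = []
    for d in depth:
        if d == 0:
            calls.append("deletion")      # no reads map here
        elif d >= dup_ratio * typical:
            calls.append("duplication")   # clearly more reads than usual
        else:
            calls.append("normal")
    return calls


print(classify_depth([30, 31, 29, 0, 0, 60, 62, 30]))
```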

Answer 22

- simplest form of functional genomics
- identifies regions of genome pot assoc w/ particular phenotype by statistical assoc
- look for variants over represented
- datasets often shown in Manhattan plot
- can be misleading, correlation NOT causation and further studies req to establish mol mechanism underlying phenotype

Answer 23

- tells us which genes expressed under particular conditions

Answer 24

- traditionally done gene by gene, using Northern (N) blotting or RT-qPCR
- N blots use radiolabeled probes to detect specific seqs from whole RNA cell extracts
- RT-qPCR uses fluorescently labelled primers to quantify levels of specific transcript during PCR amplification

Answer 25

- measuring transcrip levels for whole genome

Answer 26

- like reverse N blot
- can be rep 100s times on slide w/ ordered array of diff probes at known positions on slide surface
- probes can be designed for every gene, so poss to measure global gene expression of sample
- fluorescently labelled RNA sample added to surface of array
- if transcript complementary to particular probe, will hybridise and spot lights up
- measure fluorescence level of particular spot to assess abundance of transcript in sample

Answer 27

- same principles as any other
- techs more expensive, so mistakes cost more, temptation to cut corners
- as always approp controls and rep essential
- simplest case is comparing 2 conditions (eg. treatment and control)
- can be done on 2 separate microarrays or use 2 colour microarrays, to allow direct comparison
- more complex designs such as time courses also poss
- can use microarrays in which red spot indicates transcript more abundant in sample, green indicates less abundant in sample and yellow indicates similar levels

Answer 28

- common to focus on top few upreg and downreg genes by choosing arbitrary fold change cut off

Answer 29

- low resolution sequencing tech
- if get signal for particular probe, know that seq present in sample
- don't usually know if that is exact seq
- don't know if any seqs present not covered by microarray probes

Answer 30

- reverse transcrip, fragmentation and amplification to make cDNA library
- high throughput seq
- map reads to ref genome

Answer 31

- both well dev w/ min technical variation
- RNA-seq has larger dynamic range (greater ability to distinguish diff levels of expression)
- microarrays only give info for pre selected regions of genome, RNA-seq genome wide and can detect novel transcripts
- microarrays can have dye bias effects (diff intensity for diff colour dyes)
- RNA-seq allows detection of diff from ref genome
- RNA-seq can be done w/o ref genome

Answer 32

- introns spliced out of pre-mRNA to get exonic seq in mRNA

Answer 33

- map to ref genome to quantify expression of each gene
- if bacterial genome, can be done using standard read-mapping software
- not for euk, due to mRNA splicing --> need splice aware read mapper, eg. TopHat

Answer 34

- poss in absence of ref genome
- similar to genome assembly but more complex, as not all transcripts present at same level and some genes may prod multiple diff transcripts
- most popular software package is Trinity

Answer 35

- rep essential, typically 3x biological replicates
- important to isolate variable of interest --> not diff person doing all WT samples and someone else mutant, as could be diffs in procedure

Answer 36

- ratio of sample to control signal = fold change
- usually expressed as log2(fold change) or logFC (fold change of 1/2 = 1/2 as much mutant as WT)
- adv is that it's symmetrical --> so genes upreg or downreg 2 fold have ratio of 2 and 1/2, respectively, but log2(fold change) of +1 and -1
- interested if logFC signif diff from 0 --> t test
- usually relatively few replicates, so t test lacks power
- modern analysis programs, eg. limma (microarray) and DESeq2 (RNA-seq) solve this by taking adv of large no. parallel experiments
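
A quick worked example of the symmetry point, with made-up signal values; `log_fc` is a hypothetical helper name.

```python
from math import log2


def log_fc(sample, control):
    """log2 of the sample-to-control ratio."""
    return log2(sample / control)


# Illustrative numbers only: control signal of 100 in each case.
print(log_fc(200, 100))  # 2-fold up
print(log_fc(50, 100))   # 2-fold down
print(log_fc(100, 100))  # unchanged
```

The raw ratios 2 and 1/2 sit at unequal distances from 1, while their log2 values sit at equal distances either side of 0.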

Answer 37

- technical = doing experiment once and arraying/seq extracted RNA multiple times (tests reproducibility of techniques, not important now, as know good)
- biological = doing experiment multiple times

Answer 38

- if use signif threshold of 5%, expect to see signif effects in 5% of experiments by chance
- can lead to many false +ves when performing 1000s tests in parallel

Answer 39

- usually "false discovery rate" adjustment made to P values, control % of false +ves to be equal to chosen P value cut off
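
The source does not name the adjustment method; a common choice is the Benjamini-Hochberg procedure, sketched below as an assumption. `benjamini_hochberg` is an illustrative name and the P values are invented.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest P first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, so each adjusted value can be capped
    # by the adjusted values of all larger P values (keeps them monotonic).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(pvals))
```

Genes whose adjusted value falls below the chosen cut-off are then reported as significant.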

Answer 40

- mostly involves add of methyl group to C5 position of cytosine
- cat by DNA methyltransferases
- eg. of epigenetics

Answer 41

Involved in many processes:
- reg of gene expression
- imprinting
- X chromosome inactivation
- silencing of germline specific genes and repeat genes

Answer 42

- distinguish self DNA from non-self
- non-self can be digested w/ REs, acting as IS
- also important role in controlling bacterial DNA rep, limiting it to single rep per cell cycle

Answer 43

- downstream context of cytosine critical for determining its methylation status
- CpG (C adj to G)
- CHG (C, any base but G, G)
- CHH (C, 2x not G bases)
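
The three contexts above depend only on the two bases downstream of the cytosine, which a short sketch makes concrete. `cytosine_context` is a hypothetical helper; it assumes an uppercase same-strand sequence.

```python
def cytosine_context(seq, pos):
    """Classify the cytosine at seq[pos] as 'CpG', 'CHG' or 'CHH' (None if not classifiable)."""
    if seq[pos] != "C":
        return None
    downstream = seq[pos + 1:pos + 3]
    if downstream[:1] == "G":
        return "CpG"          # C directly followed by G
    if len(downstream) < 2:
        return None           # too close to the end of the sequence to tell
    return "CHG" if downstream[1] == "G" else "CHH"


print(cytosine_context("ACGT", 1))
print(cytosine_context("CAGT", 0))
print(cytosine_context("CATT", 0))
```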

Answer 44

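- methylated cytosine easily deaminates to thymine, a normal base, so change often escapes repair (unmethylated cytosine deaminates to uracil, which is recognised and repaired)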

Answer 45

- bisulphite treatment converts unmethylated cytosine to uracil
- methylated cytosines protected and not converted, so detected
- can be targeted to CpG islands
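
A toy simulation of the readout, assuming the converted uracils are read as T after amplification and sequencing. Both function names and the example positions are hypothetical.

```python
def bisulphite_convert(seq, methylated_positions):
    """Simulate sequencing after bisulphite treatment of one strand."""
    methylated = set(methylated_positions)
    # Unmethylated C is converted (read as T); methylated C is protected.
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


def infer_methylation(reference, converted):
    """Positions where a reference C is still read as C, so was methylated."""
    return [
        i for i, (ref_base, read_base) in enumerate(zip(reference, converted))
        if ref_base == "C" and read_base == "C"
    ]


reference = "ACGTCCGA"
converted = bisulphite_convert(reference, [1, 5])
print(converted)
print(infer_methylation(reference, converted))
```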

Answer 46

- targets BS-seq analysis to regions of genome likely to have high CpG content
- allows us to make most of sequencing run, particularly using lower-yield sequencing platforms
- exploits REs w/ recognition site containing CpG

Answer 47

- SMRT-seq allows methylated bases to be distinguished, as their presence delays progress of pol

Answer 48

- directly detects disruption in electrical current caused by base passing through pore in membrane
- methylated bases give distinct signal from unmethylated

Answer 49

- isolate DNA bound by specific protein

Answer 50

- proteins covalently crosslinked to DNA by treating w/ formaldehyde (get protein physically attached to DNA)
- chromatin sheared by sonication or using endonuclease --> use of exonuclease allows bound DNA to be trimmed to binding site
- immunoprecipitation and purification of bound DNA using antibody specific to protein of interest

Answer 51

- involves identification of ChIP purified DNA using microarray
- usually tiling microarray, w/ probes designed at regular intervals across region or whole genome

Answer 52

- DNA purified from ChIP can be identified using Illumina
- reads mapped to ref genome and binding sites identified as peaks in signal
- 5' --> 3' exonuclease used to trim DNA before binding site
- means offset between reads on forward and reverse strand, allows exact boundaries of binding site to be determined
- binding site is overlap between forward and reverse peaks

Answer 53

- 3C similar to ChIP-seq, but cross-links remote regions of DNA instead of DNA and protein
- allows investigation of long range interactions between diff genomic regions, such as interaction of enhancer elements w/ target gene

Answer 54

- 3C enhanced by self-circularisation (4C) = seq info only req from 1 of interacting loci (need to know seqs of regions of gene interested in)
- chromosome conformation capture carbon copy (5C) = allows massively parallel analysis of ligation junction through incorp of universal primer seq
- Hi-C = uses biotin capture of ligation junctions followed by high throughput seq

Answer 55

- 3C = 1 to 1
- 4C = 1 to all
- 5C = many to many
- Hi-C = all to all

Answer 56

- aimed at identifying all functional elements in human genome

Answer 57

- ≈80% of human genome has some biochemical function
- controversial as liberal definition of function

Answer 58

- essentially same as ChIP, but w/ RNA

Answer 59

- identify genetic changes assoc w/ particular phenotype
- esp in bacteria, common to exploit transposons to gen random insertion mutations

Answer 60

- if transposase gene removed, transposon can still move if transposase supplied, but stable otherwise
- inc antibiotic resistance gene allows mutants to be selected
- poss to insert at random into target genome by supplying transposase
- once transposase removed, mutant strains will harbour stable antibiotic resistance gene at random position w/in genome
- if gene disrupted by transposon, will be inactivated
  - if gene essential, mutant won't survive
  - if make millions of mutants, transposons found in every gene poss to disrupt
  - so genes w/o insertions likely to be essential
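
The final inference (genes w/o insertions are likely essential) can be sketched as a count of insertion sites per gene. Gene names, coordinates and insertion sites below are invented, and `candidate_essential_genes` is a hypothetical helper; real analyses also account for gene length and insertion density.

```python
def candidate_essential_genes(genes, insertion_sites):
    """Return names of genes with no transposon insertions.

    genes: dict of name -> (start, end), 0-based half-open coordinates.
    insertion_sites: list of genome positions where insertions were mapped.
    """
    counts = {name: 0 for name in genes}
    for site in insertion_sites:
        for name, (start, end) in genes.items():
            if start <= site < end:
                counts[name] += 1
    return sorted(name for name, count in counts.items() if count == 0)


# Made-up example: geneB has no insertions, so it is a candidate essential gene.
genes = {"geneA": (0, 100), "geneB": (150, 300), "geneC": (320, 400)}
sites = [10, 55, 310, 330, 399]
print(candidate_essential_genes(genes, sites))
```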

Answer 61

- primers recognise inverted repeat seq and seq outwards into flanking chromosomal DNA

Answer 62

- in C. difficile polA gene
  - insertions not found w/in 5' --> 3' exonuclease domain but tolerated w/in rest of gene
- TraDIS can provide info about essential regions w/in gene

Answer 63

- get input pool of random transposon mutants
- inoculate
- put through screen to recover = output pool
- get TraDIS data, showing which genes essential
