Sequencing and Bioinformatics Flashcards

Question

What did Celera demonstrate with shotgun sequencing?

Answer 1

That shotgun sequencing was feasible for even large and repetitive genomes

Answer 2

Doesn't scale well: when reading, each capillary can only produce 1 sequence at a time. The IHGSC had 'factories' with hundreds of sequencers.

Answer 3

Price to sequence human genome dropped drastically from 2007 onwards due to Next-Generation Sequencing

Answer 4

Lots of molecules are sequenced at the same time as opposed to one; reduced costs. Illumina is the main platform for this technology.

Answer 5

≈8 billion sequence reads; ≈100bp sequence reads

Answer 6

Sequencing by synthesis. Fluorescently labelled nucleotides are used to detect each base.

Answer 7

Optical sensors are not sensitive enough to detect the signal from a single template molecule. PCR amplification of template molecules is done via a process called bridge amplification.

Answer 8

Fragment DNA and size select fragments of ≈500bp

Answer 9

Generate millions of separate clusters, each with sequence data from a different region of the genome. Clusters are large enough to be detected when they fluoresce. Some clusters overlap, potentially resulting in the loss of a few reads; this doesn't matter as there are so many clusters.

Answer 10

Reversible terminator nucleotides: they block chain extension, but the block and dye can be removed. Once the block is removed, the nucleotide acts like a normal dNTP.

Answer 11

Single molecule sequencing – no amplification required. Real time sequencing – data is generated during the run. Ultra-long read lengths – up to 50kb (PacBio) or >2Mb (nanopore). Can directly identify base modifications such as methylation.

Answer 12

Fewer reads per run than Illumina. More expensive per base. Individual reads have a high error rate, although consensus accuracy is good (a high error rate is inevitable as you are sequencing a single molecule at a time).

Answer 13

Zero mode waveguides are wells that cover the aluminium surface; 150,000 wells.

Answer 14

DNA is bound to a DNA polymerase and a sequencing primer. The polymerase is immobilised at the bottom of a well; 1 polymerase per well.

Answer 15

Only allows the light to penetrate a small distance into the well, exciting a very small volume. This allows the signal from a single fluorescently labelled nucleotide to be detected.

Answer 16

When a fluorescent nucleotide is bound by the polymerase, it remains within the illuminated zone and gives a detectable signal (unbound nucleotides diffuse in and out quickly, so they do not give a consistent signal). The label is then cleaved away, and the next nucleotide is incorporated and fluoresces in turn, revealing the sequence.

Answer 17

Hairpin loops are attached to either end of a double-stranded template DNA, connecting the two strands to make a single continuous loop of DNA

Answer 18

Reads from larger fragments are of poorer quality, with low accuracy. Smaller inserts are sequenced several times as the polymerase goes round the circular template. These passes can be combined to reduce the error rate, giving circular consensus sequences (CCS). CCS reads can be used to correct the long but lower quality reads, giving corrected long reads (CLR).
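
The idea of combining several passes over the same insert can be sketched as a per-position majority vote. This is only an illustration, assuming the passes are already aligned and of equal length; the function name `consensus` is hypothetical, and real circular consensus callers use alignment and probabilistic models rather than a simple vote.

```python
from collections import Counter

def consensus(passes):
    """Majority-vote consensus over pre-aligned, equal-length passes (toy model)."""
    if not passes:
        raise ValueError("need at least one pass")
    length = len(passes[0])
    if any(len(p) != length for p in passes):
        raise ValueError("this sketch assumes equal-length, pre-aligned passes")
    # For each column, keep the base seen most often across the passes.
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*passes))

# Three passes over the same insert, each with one error at a different position.
passes = ["ACCTACGT", "ACGTACGA", "ACGTTCGT"]
print(consensus(passes))
```

Because the errors fall at different positions in each pass, every column still has a correct majority, which is why the consensus is more accurate than any single pass.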

Answer 19

15kb; >99.999% accuracy

Answer 20

Read lengths of up to 100kb. Error rates of ≈1%. Direct sequencing of DNA, RNA and protein. No library prep; sequence directly from biological samples. Small and portable models.

Answer 21

Pore proteins are embedded within an artificial membrane which is electrically insulating. A motor protein pushes a single strand of DNA through the pore, resulting in a change in the electrical current flowing through the pore.

Answer 22

Several bases can pass through the pore at once. Short sequences (e.g. 5 bases) have their own characteristic signal.

Answer 23

Genomic DNA is fragmented and paired reads are obtained from either end of each fragment. The original chromosome sequences are computationally reconstructed; this is called de novo assembly.

Answer 24

Computationally determining the most likely position in the reference genome that each sequence read derives from. Repetitive regions of the genome are a problem for mapping. Once the reads are mapped to the reference genome, it is possible to identify positions that differ from the reference, a process called "variant detection".
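
As a rough illustration of mapping followed by variant detection, the sketch below places a read at the reference position with the fewest mismatches and then lists the differences. It assumes ungapped comparison on a toy reference; the names `map_read` and `call_variants` are hypothetical, and real aligners use indexed, gapped alignment with quality scores.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def map_read(read, reference):
    """Return (best_position, mismatches) by exhaustive ungapped comparison."""
    best_pos, best_dist = None, None
    for pos in range(len(reference) - len(read) + 1):
        dist = hamming(read, reference[pos:pos + len(read)])
        # Strict '<' keeps the first best hit; a read from a repeat would tie.
        if best_dist is None or dist < best_dist:
            best_pos, best_dist = pos, dist
    return best_pos, best_dist

def call_variants(read, reference, pos):
    """List (reference_position, ref_base, read_base) where the mapped read differs."""
    return [(pos + i, reference[pos + i], base)
            for i, base in enumerate(read) if reference[pos + i] != base]

reference = "GATTACAGGCTTAACCGGTA"   # toy reference sequence
read = "GGCTAAAC"                    # toy read carrying one substitution
pos, mismatches = map_read(read, reference)
print(pos, mismatches, call_variants(read, reference, pos))
```

A read drawn from a repeated region would score equally well at several positions, which is the mapping problem the card describes.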

Answer 25

Individuals of a species are not all identical; resequencing allows us to understand genetic variation within a population. For human populations, this is of particular interest for studying single-gene and complex genetic disorders. Cancers are effectively evolving organisms which are genetically different from the patient; sequencing allows us to understand the genetic changes which occur as the cancer progresses. Functional genomic (identifying function of genomes) technologies such as RNA-seq and ChIP-seq involve resequencing.

Answer 26

Northern blotting. Radiolabelled probes are used to detect the presence of a particular transcript within a whole cell RNA extract. The level of expression can be assessed (semi)quantitatively by the intensity of the band.

Answer 27

Reverse transcription quantitative PCR. It uses reverse transcriptase to make cDNA from transcripts. The cDNA corresponding to a transcript can be PCR amplified; fluorescent primers allow the transcript level to be quantified relative to a reference gene.

Answer 28

A glass slide onto which spots can be attached, each consisting of lots of copies of a probe sequence. Can be done in parallel; thousands of probes at a time.

Answer 29

A microarray scanner detects the average intensity of each spot on the microarray and uses it as a measure of the transcript level associated with each gene

Answer 30

Technical replicates involve assessing the same biological sample on multiple microarrays. Biological replication requires us to repeat the entire experiment independently.

Answer 31

Typically biological variation is larger than technical variation, so it is usually appropriate to perform multiple biological replicates

Answer 32

They are expensive. By looking at many genes in parallel, we can estimate the “normal” level of variation between biological replicates; this allows us to identify genes where the difference in expression is greater than would be expected by chance.

Answer 33

Differential expression between the experimental sample and control. Expressed as 'fold change': level of expression in the experimental / level of expression in the control.

Answer 34

Log2 of fold change calculation

Answer 35

0 = Expression remains at the same level in the experimental; +1 = Expression doubles in the experimental; -1 = Expression halves in the experimental
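
A short worked example of the log2 fold change calculation, using made-up expression values. The function name `log2_fold_change` is hypothetical, and the sketch assumes both expression levels are positive.

```python
import math

def log2_fold_change(experimental, control):
    """log2 of (expression in experimental / expression in control)."""
    if experimental <= 0 or control <= 0:
        raise ValueError("this sketch assumes positive expression levels")
    return math.log2(experimental / control)

# Made-up expression levels against a control level of 100.
for experimental in (100, 200, 50):
    print(experimental, log2_fold_change(experimental, 100))
```

Taking the log makes doubling and halving symmetric around zero, whereas the raw fold change would give 2 and 0.5.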

Answer 36

Microarrays are a low-resolution sequencing technology: if we get a signal for a particular probe, we know that the sequence is present in our sample; however, we usually don’t know if that is the exact sequence that is present. We also don’t know if there are any sequences present which are not covered by our microarray probes. There is a limit to how much RNA can hybridise to a particular spot on the microarray; this can limit our ability to distinguish the expression levels of highly expressed genes.

Answer 37

Fragment input RNA. Reverse transcribe it to cDNA. Attach adaptor molecules to it. Sequence them to produce many sequence reads.

Answer 38

The reads are mapped to a reference genome. The number of reads mapping to each gene is used as a measure of the expression level of each gene.
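
Counting reads per gene can be sketched as below, assuming each read has already been mapped to a single start coordinate and each gene is a half-open interval on one chromosome. The gene names, coordinates and the function `count_reads_per_gene` are all illustrative; real pipelines also normalise for gene length and sequencing depth.

```python
def count_reads_per_gene(read_positions, genes):
    """Count mapped read positions falling inside each gene interval [start, end)."""
    counts = {name: 0 for name in genes}
    for pos in read_positions:
        for name, (start, end) in genes.items():
            if start <= pos < end:
                counts[name] += 1
    return counts

# Hypothetical gene coordinates and mapped read start positions.
genes = {"geneA": (0, 1000), "geneB": (2000, 2500)}
read_positions = [10, 500, 999, 1500, 2000, 2100]
print(count_reads_per_gene(read_positions, genes))
```

The read at position 1500 falls between the two genes and is not counted for either.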

Answer 39

mRNA sequence does not exactly correspond to the sequence of the reference genome. Processed mRNAs consist of adjacent exon sequences, but the exons are separated by introns in the genome sequence.

Answer 40

They are split during mapping using a splice-aware aligner (see image on notes)

Answer 41

De novo assembly of transcripts - Not all transcripts are present at the same level - Same gene may produce multiple different transcripts - Assembled transcripts can be annotated in a similar way to genome sequences

Answer 42

RNA-seq has a larger dynamic range than microarrays (greater ability to distinguish different levels of expression). Microarrays only give information for pre-selected regions of the genome; RNA-seq is genome-wide, and can detect novel transcripts. RNA-seq allows us to detect differences from the reference genome, such as SNPs in transcribed regions. RNA-seq can be done without a reference genome – de novo assembly of the transcriptome is possible.

Answer 43

Alternative splicing allows for a single gene to produce many different transcripts and proteins. Adds to the complexity of RNA-seq analysis.

Answer 44

Addition of a methyl group to C5 of cytosine. Acts to regulate (typically downregulate) gene expression and is an example of epigenetics.

Answer 45

X chromosome inactivation. Silencing of germline-specific genes and repeat regions. Imprinting (distinguish maternal and paternal alleles).

Answer 46

To distinguish 'self' DNA from 'non-self'. Non-self DNA can be digested by enzymes that act as a form of immune system. They also use methylation to control bacterial DNA replication, limiting it to a single replication per cell cycle.

Answer 47

CpG - C linked to G by phosphate backbone. CHG - C followed by 'not G' followed by a G. CHH - C followed by 2 non-G bases.
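
The three contexts can be told apart by looking at the two bases downstream of the cytosine. This is a minimal sketch assuming an unambiguous A/C/G/T sequence read 5' to 3'; the function name `cytosine_context` is hypothetical.

```python
def cytosine_context(seq, i):
    """Classify the cytosine at index i as 'CpG', 'CHG' or 'CHH' (H = not G)."""
    seq = seq.upper()
    if seq[i] != "C":
        raise ValueError("position i is not a cytosine")
    downstream = seq[i + 1:i + 3]
    if len(downstream) >= 1 and downstream[0] == "G":
        return "CpG"
    if len(downstream) < 2:
        return None  # too close to the end of the sequence to classify
    return "CHG" if downstream[1] == "G" else "CHH"

print(cytosine_context("ACGT", 1))   # C followed directly by G
print(cytosine_context("ACAGT", 1))  # C, then a non-G base, then G
print(cytosine_context("ACATT", 1))  # C followed by two non-G bases
```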

Answer 48

CpG methylation persists whereas CHG and CHH methylation do not

Answer 49

CpG islands. The gene is expressed.

Answer 50

Chemically inducing deamination of cytosine. Methylated cytosine does not undergo this change.

Answer 51

Bisulphite sequencing. A sample is sequenced before and after bisulphite conversion. These can be compared, as methylated cytosine will remain as cytosine and unmethylated cytosine will be changed to uracil and read as thymine.
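
The before/after comparison can be illustrated with a toy simulation on a single strand. The sequence, the methylated positions and both function names are made up for the example; real analyses use dedicated bisulphite-aware aligners and handle both strands.

```python
def bisulphite_convert(seq, methylated_positions):
    """Simulate the read-out after conversion: unmethylated C is read as T."""
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq)
    )

def call_methylation(untreated, treated):
    """Map each cytosine position to True (methylated) or False (unmethylated)."""
    calls = {}
    for i, (before, after) in enumerate(zip(untreated, treated)):
        if before == "C":
            calls[i] = (after == "C")  # a C that survives conversion was methylated
    return calls

untreated = "ACGTCCGA"                               # toy sequence
treated = bisulphite_convert(untreated, {1, 5})      # positions 1 and 5 are methylated
print(treated)
print(call_methylation(untreated, treated))
```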

Answer 52

RRBS is a method of targeting BS-seq to regions which are likely to have a high CpG content (e.g. CpG islands). This allows us to make the most of a sequencing run. RRBS exploits restriction enzymes which have a recognition site containing CpG. By digesting with this enzyme and selecting small fragments, we target regions of high CpG density.
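
The digest-and-size-select idea can be sketched in silico. The card does not name an enzyme; the example assumes MspI (recognition site CCGG, cutting after the first C), which is commonly used for RRBS, and the sequence, size window and function names are purely illustrative.

```python
def digest(seq, site="CCGG", cut_offset=1):
    """Cut seq at every occurrence of site, cut_offset bases into the site."""
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]

def size_select(fragments, min_len, max_len):
    """Keep only fragments within the chosen length window."""
    return [f for f in fragments if min_len <= len(f) <= max_len]

# Toy sequence: two sites close together, then a long CpG-poor stretch.
seq = "ATATAT" + "CCGG" + "ACGCG" + "CCGG" + "A" * 30 + "CCGG" + "TT"
fragments = digest(seq)
print([len(f) for f in fragments])
print(size_select(fragments, 8, 20))
```

The short fragment that survives selection lies between two nearby recognition sites, so it comes from the CpG-dense part of the toy sequence, while the long CpG-poor fragment is discarded.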

Answer 53

Allows methylated bases to be distinguished from unmethylated ones (adenine as well as cytosine). The presence of a methylated base delays the progress of the polymerase; this can be detected by analysis of the polymerase kinetics.

Answer 54

It detects a disruption in electrical current caused by a base passing through a pore in a membrane. Methylated bases give a distinct signal from unmethylated ones; this allows methylation to be measured directly.

Answer 55

It is a method that can be used to isolate DNA bound by a specific protein

Answer 56

1. Proteins are covalently crosslinked to DNA by treating with formaldehyde
2. Chromatin is sheared by sonication or using an endonuclease; in ChIP-exo, an exonuclease allows the bound DNA to be trimmed to the binding site
3. Immunoprecipitation and purification of bound DNA using an antibody specific to the protein of interest

Answer 57

ChIP-on-chip involves identification of the ChIP-purified DNA using a microarray. The purified binding sites are labelled and hybridised to a tiling microarray to determine the genomic regions where the protein is bound.

Answer 58

Sequencing of the ChIP-purified binding sites directly using high throughput sequencing platforms (e.g. Illumina). Reads are mapped to the reference genome, and binding sites are identified as peaks in the signal. There is an offset between reads on the forward and reverse strand, which allows the exact boundaries of the binding site to be determined; this is due to the DNA being trimmed to the binding site.

Answer 59

Uses formaldehyde to form cross-links between long-range interacting regions of the genome so that they can be identified. The cross-linked chromatin is digested, the loose ends are ligated, and the cross-link is removed to form a single continuous piece of DNA containing sequence from the 2 interacting regions.

Answer 60

3C - Look for specific interaction between 2 known partners
4C - Identify remote regions which interact with region of interest
5C - Discovery of novel interactions
Hi-C - Allows comprehensive genome-wide characterisation of all of the interactions between remote chromosomal regions

Answer 61

3C uses 2 specific primers, so is good for targeting interactions between 2 known loci

Answer 62

4C introduces a circularisation step, meaning that only one of the interaction partners needs to be pre-selected

Answer 63

5C uses amplification using primers with a universal “tail” sequence. PCR using primers which recognise this overhanging tail sequence can be used to amplify interactions between many interacting regions.

Answer 64

Biotin is incorporated into the cross-link between interacting loci. The protein streptavidin has a high affinity for biotin, and is used to purify out the biotin-labelled DNA containing interacting loci. This is followed by high-throughput sequencing to get a genome-wide view of all long range chromosomal interactions.

Answer 65

Long range chromosomal interactions; regions of open chromatin; DNA methylation sites; transcription factor binding sites; enhancers and promoters; coding and non-coding transcribed regions

Answer 66

ENCyclopedia Of DNA Elements. Aims to identify all the functional elements in the human genome. RNA-seq, 5C and ChIP-seq were used. Genes were identified with RT-PCR or computational prediction.

Answer 67

They identify RNA-binding proteins, whereas ChIP-seq identifies DNA-binding proteins

Answer 68

Regions of open chromatin - DNase-seq exploits open chromatin's hypersensitivity to DNase I digestion - FAIRE-seq uses formaldehyde crosslinking of DNA to nucleosomes and purifies unbound DNA

Answer 69

Both identify long range chromosomal interactions. It does this through ChIP-seq analysis of DNA-nucleosome interactions, instead of direct ligations of interacting DNA regions.

Answer 70

methyl450k is a microarray-based method of identifying DNA methylation

Answer 71

They were able to assign biochemical functions for 80% of the human genome. This conflicted with the previous view that much of the genome was “junk DNA” with no function.

Answer 72

Mutations are more likely to cause problems if they are within exons. Functionally important regions of the genome tend to be evolutionarily conserved.

Answer 73

The rate of evolution of the genome is not uniform, and functionally important regions tend to evolve more slowly. Changes in important regions are more likely to be "deleterious" (have a negative impact on fitness), which means they tend to be removed from the population through natural selection.

Answer 74

TRansposon Directed Insertion-site Sequencing. It is used to understand bacterial gene function.

Answer 75

Mobile genetic elements. Transposons can move around the genome through a “cut and paste” mechanism.

Answer 76

They consist of a transposase gene, flanked by inverted repeat sequences that are recognised by the transposase. If the transposase gene is removed, the transposon can still move and be inserted into a bacterial genome if transposase is supplied. Inclusion of an antibiotic resistance gene allows mutants to be selected.

Answer 77

If a gene is disrupted by the transposon, it will be inactivated. If the disrupted gene is essential, the mutant will not survive. Genes without insertions are likely to be essential. In transposon mutagenesis we do not see mutants in essential regions of the genome.
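
The reasoning that genes without insertions are likely to be essential can be sketched as a simple interval check. The gene coordinates, insertion sites and the function name `genes_without_insertions` are hypothetical; a real analysis would also account for gene length and insertion density before calling a gene essential.

```python
def genes_without_insertions(genes, insertion_sites):
    """Return genes (half-open intervals [start, end)) containing no insertion site."""
    candidates = []
    for name, (start, end) in genes.items():
        if not any(start <= site < end for site in insertion_sites):
            candidates.append(name)
    return candidates

# Hypothetical gene coordinates and mapped transposon insertion sites.
genes = {"geneA": (0, 300), "geneB": (300, 900), "geneC": (900, 1200)}
insertion_sites = [15, 120, 250, 950, 1100]
print(genes_without_insertions(genes, insertion_sites))
```

Only the gene with no surviving insertion mutants is reported, making it a candidate essential gene.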

Answer 78

It can identify not just important genes, but important regions of genes

Answer 79

Have an input pool of random transposon mutants. Run them through some form of stress. Compare the input and output pools to see which mutants survived, and therefore which genes were disrupted in the survivors. This will tell us which genes are essential and non-essential under that stress.

(109 cards)