Chaudhari Flashcards

1
Q

give 2 examples of first generation sequencing and whether they are used or not

A
Maxam-Gilbert (not used)
Sanger (used)
2
Q

give 2 examples of second generation sequencing technologies

A

Illumina

Ion Torrent

3
Q

give 2 examples of 3rd gen sequencing technologies

A

Pacific Biosciences

Oxford Nanopore

4
Q

what is sanger good at sequencing

A

low-volume targeted sequences (not good for genome sequencing but useful for smaller reads)

5
Q

what kind of reads does illumina produce

A

many short (50-300bp) reads

6
Q

what kind of reads do oxford nanopore generate compared to illumina

A

fewer reads but very long reads (>10kb)

7
Q

what are the 3 advantages of second gen sequencing (illumina) over sanger

A

massively parallel - a single run generates millions of sequences
much cheaper per base
built-in shotgun sequencing without a cloning step

8
Q

what are the disadvantages of second gen sequencing technologies compared to sanger

A

library prep is expensive and slow
amplification of DNA fragments is required which can introduce biases
read lengths are quite short

9
Q

what is good about third gen

A

good for finished genomes (gives no gaps) and for detecting base modifications (epigenetics)

10
Q

individual reads in third gen have a quite high error rate T/F

A

T

11
Q

what are the applications of illumina

A

draft genome sequencing,
resequencing
functional genomics

12
Q

what are the applications of Pac bio

A

complete genome sequencing,

detection of DNA methylation

13
Q

what are the applications of oxford nanopore

A

complete genome sequencing,
epigenetics
direct RNA-seq
metagenomics

14
Q

what are the key steps to illumina sequencing

A
  1. extract genomic DNA
  2. fragment genomic DNA
  3. add linkers
  4. add input library to flow cell
  5. Amplification (bridge amplification)
  6. sequencing
  7. image taken
15
Q

each flow cell cluster corresponds to what

A

a separate read

16
Q

patterned flow cells make the system less prone to what

A

overclustering

17
Q

what is a patterned flow cell

A

there are individual nanowells that contain the primers for bridge amplification. A cluster is generated within each well and can't spread outside the well, so overlapping clusters do not form

18
Q

what is meant by phasing when referring to illumina

A

In sequencing-by-synthesis chemistry such as Illumina (originally Solexa), phasing is the rate at which single molecules within a cluster lose sync with each other. Phasing is falling behind, pre-phasing is jumping ahead, and together they describe how well the chemistry is performing
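
As a rough illustration of why phasing limits read length, the sketch below assumes every molecule independently falls behind with probability `p_phase` or jumps ahead with probability `p_prephase` at each cycle. Both rates are made-up illustrative values, not Illumina specifications, and molecules that drift and later return to sync are ignored.

```python
def in_sync_fraction(cycles, p_phase=0.002, p_prephase=0.001):
    """Fraction of molecules in a cluster that have never lost sync after `cycles` cycles.

    p_phase and p_prephase are hypothetical per-cycle rates chosen for illustration.
    """
    return (1.0 - p_phase - p_prephase) ** cycles


# The in-sync fraction shrinks with every cycle, so the signal gets noisier late in the read.
for n in (50, 100, 150, 300):
    print(n, round(in_sync_fraction(n), 3))
```

Under these assumptions the in-sync fraction only ever decreases, which is consistent with quality dropping towards the end of a read.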

19
Q

once you get above 100 bases in illumina there is a lot of confusion and phasing. this is why illumina reads are limited to being short T/F

A

T

20
Q

phasing problems result in a reduction of quality towards the end of reads T/F

A

T

21
Q

what is 2 colour illumina sequencing

A
4 bases are sequenced using only 2 colours eg red and green
1st base - green
2nd base - red
3rd base - both
4th base - none
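
The colour scheme above can be sketched as a lookup from the two channel signals to a base. Which base gets which colour combination is an assumption made here for illustration; the card only states that one base is green, one red, one both and one dark.

```python
# Assumed channel assignment (illustrative only): keys are (red_on, green_on).
TWO_COLOUR_CODE = {
    (True, True): "A",    # signal in both channels
    (True, False): "C",   # red only
    (False, True): "T",   # green only
    (False, False): "G",  # no signal in either channel
}


def call_bases(signals):
    """Translate one (red_on, green_on) pair per cycle into a base string."""
    return "".join(TWO_COLOUR_CODE[(bool(r), bool(g))] for r, g in signals)


print(call_bases([(1, 1), (1, 0), (0, 1), (0, 0)]))
```

Two images per cycle are enough to distinguish four bases, which is why the optics can be simpler than in a four-colour system.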
22
Q

what are the benefits of 2 colour illumina sequencing

A

it allows simplified optics so has lower costs

23
Q

how do oxford nanopore sequence

A

a DNA strand is threaded through the pore; the few bases inside the pore at any moment change the electrical current flowing through it, and the sequence is determined from these changes

24
Q

what is the highly portable version of the oxford nanopore known as

A

minION

25
what is the promethION
essentially lots of minIONs in a suitcase
26
what is GridION
somewhere in between minION and promethION - 5 minION flow cells
27
when was the first bacterial genome sequence published
1995
28
what was the first bacterial genome sequencing project to be INITIATED
E.coli K-12
29
which were the first bacterial genomes to be sequenced and how was this done
Haemophilus influenzae and Mycoplasma genitalium, by shotgun sequencing
30
what does shotgun sequencing rely on
computational assembly of sequence from random clone libraries: random parts of the genome are sequenced and then pieced together into contigs
31
what is de novo assembly
process of merging overlapping sequence reads into contiguous sequences (contigs) without the use of any reference genome as a guide
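
The idea of merging overlapping reads into contigs can be sketched with a toy greedy assembler. This is a minimal illustration under simplifying assumptions (error-free reads, one strand, no repeats); real assemblers use overlap or de Bruijn graphs and are far more involved.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the longest overlap until none remain."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            return contigs  # nothing left to merge: these are the contigs
        merged = contigs[i] + contigs[j][n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]


print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTTA"]))
```

When two regions share no sufficiently long overlap, the function returns more than one contig, which mirrors the gaps seen in draft genomes.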
32
what is the best reference to use to order contigs
usually the most closely related bacterium with a ‘finished’ genome
33
Due to evolutionary differences between the reference and novel genome, the presence of (often mobile) repeat elements such as prophages, and the very nature of short-read assemblers, there will almost certainly be assembly errors present within the contigs.
T
34
Once the ordered set of contigs has been obtained, what is the next step
to annotate the draft genome: the process of 'gene' finding
35
Multi-locus sequence typing (MLST) is a widely used, sequence-based method for typing of bacterial species and plasmids T/F
T
36
what are gaps in the draft genome from shotgun sequencing due to
- repeats
37
how can you fill in the gaps in a draft genome
PCR across the gap, using primers designed from the contigs on either side
38
how were genes that are toxic to bacteria identified
Data from earlier Sanger clone-by-clone sequencing projects for multiple microbial genomes were re-examined. The gaps in those sequences must have been there for a reason: a clone containing that part of the genome may be toxic to the cell the plasmid is grown up in. Products encoded in these gaps that were toxic to E. coli were then identified
39
where was e.coli k12 originally isolated from
a convalescent diphtheria patient in 1922 (it was part of the gut flora)
40
how many protein coding genes does e.coli k12 have
4288
41
how were regions of low GC content acquired in e.coli k12
horizontal transfer
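
Regions of unusually low GC content can be located with a sliding-window scan. The sketch below is an assumption-laden illustration: the window size, step and `margin` threshold are arbitrary choices, and low GC alone is only a hint of horizontal transfer, not proof.

```python
def gc_content(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def low_gc_windows(genome, window=1000, step=500, margin=0.10):
    """Return (start, gc) for windows whose GC fraction is at least `margin` below the genome-wide value."""
    overall = gc_content(genome)
    hits = []
    for start in range(0, max(len(genome) - window + 1, 1), step):
        gc = gc_content(genome[start:start + window])
        if gc <= overall - margin:
            hits.append((start, gc))
    return hits


# Toy genome: a GC-rich backbone with an AT-only island in the middle.
toy = "GC" * 50 + "AT" * 50 + "GC" * 50
print(low_gc_windows(toy, window=100, step=100))
```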
42
A girl on the Isle of Wight became infected by which strain of E. coli
O157:H7, an emergent human pathogen associated with haemorrhagic colitis (blood in diarrhoea) and haemolytic uraemic syndrome (HUS), which can lead to kidney failure and is sometimes fatal
43
how much bigger is the genome of O157 compared to K12
about 1 Mb
44
when was the genome of O157 sequenced
2001
45
what is CFT073
a strain of uropathogenic E. coli (UPEC); not as dangerous as O157
46
what is CFT073 an example of
an extraintestinal pathogenic E. coli (ExPEC), associated with UTIs
47
ExPEC can be harmless when in the intestines but become pathogens when they invade where
urinary tract, blood or cerebrospinal fluid
48
the genome of CFT073 is similar in size to O157 but the extra sequences relative to K12 are not the same as O157
T
49
how big is the core genome between K12, O157 and CFT073
3000 genes
50
how do you estimate the size of the E.coli core genome
randomise the order in which the strains are sequenced and count the number of genes conserved across all of them. The number of conserved genes decreases as the number of strains increases
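
A minimal sketch of this estimate, assuming each strain has already been reduced to a set of gene-family names (the strain contents below are invented toy data, not real gene counts):

```python
import random


def core_genome_curve(strain_genes, seed=0):
    """Core-genome size after adding each strain in a random order.

    strain_genes: dict mapping strain name -> set of gene-family names.
    """
    order = list(strain_genes)
    random.Random(seed).shuffle(order)  # randomise the order of strains
    core = None
    sizes = []
    for strain in order:
        genes = strain_genes[strain]
        core = set(genes) if core is None else core & genes  # keep only shared genes
        sizes.append(len(core))
    return sizes


toy_strains = {
    "K12": {"a", "b", "c", "d"},
    "O157": {"a", "b", "c", "e"},
    "CFT073": {"a", "b", "f"},
}
print(core_genome_curve(toy_strains))
```

Repeating this with different seeds and averaging the curves smooths out the effect of any one ordering.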
51
how big is the core genome of E.coli expected to be
around 2229 genes
52
how do you predict the pangenome of E.coli
sequence strains and count the total number of distinct genes. The E. coli pangenome is effectively infinite: you will always find new genes
53
when looking for the number of unique genes across E.coli what was found
each newly sequenced strain still contributes around 300 genes that have not been seen before
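
The pangenome size and the number of previously unseen genes per added strain can be tallied as below. This is a toy sketch on invented gene sets; the names and counts are hypothetical.

```python
def pangenome_growth(gene_sets):
    """Track pangenome size and newly seen genes as genomes are added.

    gene_sets: list of sets of gene-family names, in sequencing order.
    Returns (pan_sizes, new_gene_counts), one entry per genome.
    """
    pan = set()
    pan_sizes, new_counts = [], []
    for genes in gene_sets:
        new = set(genes) - pan  # genes not seen in any earlier genome
        pan |= new
        new_counts.append(len(new))
        pan_sizes.append(len(pan))
    return pan_sizes, new_counts


print(pangenome_growth([{"a", "b", "c"}, {"a", "b", "d"}, {"a", "e", "f"}]))
```

If the new-gene count never falls to zero as more strains are added, the pangenome is described as open.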
54
sequencing to the draft stage is far cheaper than completion
T
55
what did the GWAS study find about host specificity in campylobacter
genes involved in vitamin B5 biosynthesis are present in strains that infect cows but not chickens
56
who discovered E.coli
Theodor Escherich
57
who discovered shigella
Kiyoshi Shiga
58
e.coli and shigella are distinguished on the basis of what?
motility, metabolic profile and clinical manifestation
59
E.coli will grow on lactose whereas shigella would not
T
60
E.coli are usually commensal whereas shigella are what?
obligate pathogens
61
what is the process of serotyping
• Take a particular antigen and raise antibodies against it
• See if these cross-react with antigen from other strains
• If they cross-react, then those strains are of the same serotype
Typing is based on immune recognition of cell surface antigens
62
how were strains distinguished after serotyping
cross hybridisation
63
Milkman (1973) began the quantitative study of E. coli population genetics by measuring the electrophoretic mobility of enzymes derived from different E. coli strains. This was known as...
multi locus enzyme electrophoresis (MLEE)
64
what is the ECOR collection
A set of 72 phylogenetically diverse E. coli strains, chosen based on MLEE data
65
shigella has arisen on multiple occasions from E.coli, how do we know
Shigella is split into different serotypes, and these are scattered across the E. coli phylogenetic tree rather than grouped together
66
what does multi locus sequence typing (MLST) do
involves obtaining the nucleotide sequences of 450 bp fragments derived from (typically) 6–8 housekeeping genes at distinct loci around the bacterial chromosome. Housekeeping genes are chosen since they are likely to be under strong purifying selection, and the observed variations are likely to be selectively neutral.
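
The typing step can be sketched as follows: each distinct fragment sequence at a locus gets an allele number, and each distinct combination of allele numbers gets a sequence type. The locus names and sequences here are hypothetical, and numbering alleles in order of discovery is a simplification; real schemes rely on curated allele databases.

```python
def assign_alleles(sequences, known_alleles):
    """Return the allele profile for one isolate.

    sequences: dict locus name -> fragment sequence for this isolate.
    known_alleles: dict locus -> {sequence: allele number}, updated in place.
    """
    profile = []
    for locus in sorted(sequences):
        alleles = known_alleles.setdefault(locus, {})
        seq = sequences[locus]
        if seq not in alleles:
            alleles[seq] = len(alleles) + 1  # new allele: next free number
        profile.append((locus, alleles[seq]))
    return tuple(profile)


def sequence_type(profile, known_types):
    """Map an allele profile to a sequence type number, creating a new one if unseen."""
    if profile not in known_types:
        known_types[profile] = len(known_types) + 1
    return known_types[profile]


alleles, types = {}, {}
isolate1 = {"geneA": "ATGC", "geneB": "GGCC"}
isolate2 = {"geneA": "ATGC", "geneB": "GGCT"}  # differs at one locus
for isolate in (isolate1, isolate2, isolate1):
    print(sequence_type(assign_alleles(isolate, alleles), types))
```

Isolates with identical fragments at every locus share a sequence type, while a single changed locus yields a new one.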
67
what was the first application of MLST
investigated the relationship between a number of EPEC and EHEC strains by analysing 7 house keeping genes
68
the radiation of clones began about 9 million years ago and the highly virulent pathogen responsible for epidemics of food poisoning, E. coli O157:H7, separated from a common ancestor of E. coli K-12 as long as .... years ago
4.5 million
69
Phylogenetic analysis reveals that old lineages of E. coli have acquired the same virulence factors in parallel, including a pathogenicity island involved in intestinal adhesion, a plasmid-borne haemolysin, and phage-encoded Shiga toxins. T/F
T
70
what type of phylogenetics is the gold standard
core genome phylogenetics: phylogeny based on the core genome
71
what is 16s ribosomal RNA profiling good for investigating
diversity of organisms ACROSS different species
72
16S ribosomal RNA is present in all bacteria and archaea T/F
T
73
16S ribosomal RNA has both constant and variable regions. how are these used for its analysis
constant regions are used as primer binding sites. V loops can be used to determine evolutionary relationships between strains
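
Constant and variable positions can be told apart by measuring conservation column by column across aligned sequences. The sketch below uses a tiny invented alignment and assumes the sequences are already aligned to equal length.

```python
from collections import Counter


def column_conservation(aligned_seqs):
    """Fraction of sequences sharing the most common character at each alignment column."""
    n = len(aligned_seqs)
    return [Counter(col).most_common(1)[0][1] / n for col in zip(*aligned_seqs)]


def conserved_columns(aligned_seqs, threshold=1.0):
    """Indices of columns whose conservation is at least `threshold`."""
    return [i for i, c in enumerate(column_conservation(aligned_seqs)) if c >= threshold]


toy_alignment = ["ACGTAC", "ACGAAC", "ACGCAC"]  # column 3 varies between sequences
print(conserved_columns(toy_alignment))
```

Runs of fully conserved columns are candidate primer sites, while the low-scoring columns carry the signal used to separate taxa.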
74
what are the caveats of 16S rRNA profiling
- primers may not be truly universal
- contamination can be an issue
- sequencing errors can result in overestimation of the diversity of organisms present
- some organisms have multiple copies of 16S rRNA genes which vary in sequence (can result in overestimation of taxa present)
- PCR bias may result in incorrect quantification of species
75
the first real application of 16S rRNA to understand the diversity was seen in what
Carl Woese's three-domain structure of life
76
what is meant by microbial "dark matter"
the up to 99% of bacteria that we cannot culture
77
what is metagenomics
the study of genetic material recovered directly from environmental samples
78
To comprehensively sample an environment what is required
deep sequencing
79
individual sequences can be derived from metagenomic studies how
software such as Kraken can assign individual reads to taxa, and de novo assembly can be used to piece together larger contigs, up to whole bacterial genomes. However, assembly of metagenomic data is complex, computationally intensive and error prone
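
To give a feel for read classification, here is a toy k-mer voting classifier. It is a deliberately simplified sketch and not how Kraken itself is implemented; the reference sequences, taxon names and `k=4` are all hypothetical.

```python
from collections import Counter


def kmers(seq, k):
    """Set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(references, k=4):
    """Map each k-mer to the set of taxa whose reference sequence contains it."""
    index = {}
    for taxon, seq in references.items():
        for kmer in kmers(seq, k):
            index.setdefault(kmer, set()).add(taxon)
    return index


def classify(read, index, k=4):
    """Assign a read to the taxon with the most taxon-specific k-mer hits, or None."""
    votes = Counter()
    for kmer in kmers(read, k):
        taxa = index.get(kmer, ())
        if len(taxa) == 1:  # ignore k-mers shared between taxa
            votes.update(taxa)
    return votes.most_common(1)[0][0] if votes else None


refs = {"taxonA": "ATGCGTACGTTAGC", "taxonB": "TTGACCAGGATCCA"}
index = build_index(refs)
print(classify("GCGTACGT", index), classify("CCAGGATC", index))
```

Reads whose k-mers match nothing in the index stay unclassified, which is the fate of much of the sequence from uncultured organisms.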
80
what was found to critically impact microbiome analysis
reagent and laboratory contamination
81
the reagents used for DNA extraction are routinely contaminated with microbial DNA sequence T/F
T
82
A paper reported anthrax in samples taken from the subway. what was wrong with this conclusion.
they sequenced the sample and reported the most common hit. Pathogenic and non-pathogenic strains are very similar, and the proper toxins were not found to be present, so it wasn't genuine anthrax
83
in single cell genomics how can individual cells be isolated
optical tweezers, cell sorting (FACS), micropipetting or laser microdissection
84
how is the genome in single cells processed before sequencing
single copy of the genome is PCR amplified, sequenced and assembled
85
what is iChip a method of
culturing bacteria so that there is enough material for DNA sequencing
86
how is iChip carried out
* Samples are diluted so on average one bacterial cell ends up in each chamber in the iChip
* The chambers are filled with molten agar and covered with a semi-permeable membrane
* The iChip is placed back in the native environment
* This allows a colony to grow from a single cell, which provides enough material for DNA sequencing
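
Diluting to an average of one cell per chamber does not put exactly one cell in every chamber. Assuming cells land in chambers independently (a Poisson model, which is an assumption of this sketch rather than something stated in the notes), the split between empty, single-cell and multi-cell chambers can be estimated:

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k cells in a chamber when the average is `mean`."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


mean_cells = 1.0  # dilution target: one cell per chamber on average
p_empty = poisson_pmf(0, mean_cells)
p_single = poisson_pmf(1, mean_cells)
p_multiple = 1.0 - p_empty - p_single
print(f"empty: {p_empty:.3f}, single: {p_single:.3f}, multiple: {p_multiple:.3f}")
```

Under this model roughly 37% of chambers hold exactly one cell, about the same share are empty, and the remainder hold more than one.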
87
what was the new antibiotic identified by iChip called
Teixobactin
88
early studies used what to investigate bacterial diversity
serotyping, DNA-DNA hybridisation and MLEE
89
sequencing of what is useful for identifying the bacterial species within a sample
16S ribosomal RNA
90
metagenomics, single-cell sequencing and iChip allow additional sequence data to be obtained from unculturable microorganisms T/F
T
91
what was the first pathogenic E.coli to be characterised
enteropathogenic E. coli (EPEC)
92
what type of secretion system does EPEC encode
Type III, which it uses to inject host cells with effector molecules, allowing it to attach to the gut wall
93
whats the difference between EPEC and EHEC
EHEC encode a Shiga toxin, which is associated with more severe disease
94
what is an example of an EHEC
E.coli O157:H7
95
what are enteroaggregative E.coli (EAEC)
mild pathogens associated with diarrhoea that stick together in a characteristic stacked brick conformation to aid survival in the gut
96
what country did the e.coli outbreak in 2011 occur in that mainly affected young women
Germany
97
what did the young women in germany in 2011 mainly suffer from
haemolytic uremic syndrome
98
what did they find when they looked for commonalities among the young women in Germany affected by the E. coli strain
they had all eaten cucumbers recently
99
what mistake did german health authorities make
they linked the O104 outbreak to cucumbers imported from Spain; the outbreak strain was actually traced to beansprouts
100
what did genomic sequencing by BGI confirm
O104:H4 serotype has some enteroaggregative E. coli (EAEC or EAggEC) properties, presumably acquired by horizontal gene transfer
101
what caused the outbreak in germany and surrounding countries in 2011
a strain of E. coli of serotype O104:H4 that was unusual in having characteristics of both enteroaggregative E. coli and enterohaemorrhagic E. coli. The strain has a number of virulence genes typical of enteroaggregative E. coli, including aatA, aggR, aap, aggA, and aggC, in addition to the Shiga toxin variant 2
102
all sequence data from the german 2011 outbreak was available during the course of the outbreak T/F
T
103
the two outbreak strains in germany 2011 were identical to what at the core genome level showing it was an enteroaggregative e.coli
E. coli strain 55989
104
PacBio sequencing of the O104:H4 was not possible during the outbreak T/F
T