SAQ Flashcards
when thinking about sequencing platforms what is normally the trade off between the different generations
trade off between producing lots of reads (short) or long reads but not many
read depth vs length of reads
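A rough way to see the depth vs length trade-off is the usual mean-coverage estimate (total sequenced bases divided by genome size). The read counts, read lengths and genome size below are illustrative assumptions, not platform specifications.

```python
def mean_depth(n_reads, read_length, genome_size):
    """Expected mean depth = total sequenced bases / genome size."""
    return n_reads * read_length / genome_size


# Assumed, illustrative numbers for a roughly E. coli-sized genome.
GENOME_SIZE = 4_600_000

# Many short reads vs few long reads.
short_read_depth = mean_depth(n_reads=1_000_000, read_length=150, genome_size=GENOME_SIZE)
long_read_depth = mean_depth(n_reads=10_000, read_length=10_000, genome_size=GENOME_SIZE)

print(f"short reads: {short_read_depth:.1f}x, long reads: {long_read_depth:.1f}x")
```

With these made-up numbers the short-read run gives the higher depth, while each long read spans far more of the genome.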
what is a flow cell cluster in Illumina
each cluster corresponds to a separate read.
has been amplified by bridge amplification
what can be used to increase cluster density and how
patterned flow cells
flowcells with nanowells in a distinct pattern. Each nanowell contains DNA probes to capture DNA strands for amplification, but the regions between wells do not contain probes and thus are free of reads.
+ it reduces the problem of adjacent clusters overlapping
+ allows you to control the sizing of clusters
+ position of well is known so cluster can be easily identified
+ packed very densely so can get out more sequence data
- generates duplicated sequences
Why does the quality of a read decrease over its length in Illumina?
PHASING
- illumina relies on sequence by synthesis approach in which errors can occur
- in each cycle all 4 labelled dNTPs are washed over; one is incorporated and its terminator blocks further extension. After imaging, the terminator is removed so another dNTP can be incorporated
- phasing occurs when this terminator is not successfully removed. The next nucleotide cannot be incorporated, so from now on this DNA strand will be one base behind the rest of the cluster in the sequencing.
- Over time these errors accumulate and pollute the fluorescence signal
- also prephasing - terminator cap is defective, so one fragment can go ahead and incorporate 2 nucleotides in one cycle
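The accumulation of phasing and prephasing can be sketched with a toy model. It assumes a fixed per-cycle chance of a strand falling behind or jumping ahead, and that strands never get back in step. The rates used are made-up illustrative values, not measured ones.

```python
def in_phase_fraction(cycles, phasing=0.002, prephasing=0.001):
    """Fraction of strands in a cluster still in step after `cycles` cycles.

    Toy model: each cycle a strand lags with probability `phasing` or jumps
    ahead with probability `prephasing`, and never returns to phase.
    """
    return (1 - phasing - prephasing) ** cycles


for cycle in (1, 50, 150, 300):
    print(cycle, round(in_phase_fraction(cycle), 3))
```

The in-phase fraction only ever falls, so the signal from the correct base gets weaker relative to the out-of-phase background towards the end of the read.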
2 colour illumina sequencing
also known as 2 channel sequence by synthesis
generates data faster than 4 colour whilst maintaining quality and accuracy
- only 2 images per cycle are required
- CONS:
incorrect base calls because of phasing lead to a rising pollution of the light signals over time, making it more difficult to differentiate the bases and to interpret the base quality
No colour could also mean that no base has been incorporated
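A minimal sketch of how two images can encode four bases. The colour assignment used here (A = both channels, C = red only, T = green only, G = no signal) is the mapping usually described for 2-channel chemistry; treat it as an assumption and check the platform documentation. The function name is hypothetical.

```python
def call_base(red, green):
    """Decode one cluster in one cycle from its two channel signals.

    Assumed mapping: A = red + green, C = red only, T = green only,
    G = no signal in either image.
    """
    if red and green:
        return "A"
    if red:
        return "C"
    if green:
        return "T"
    # Dark in both images: called as G, but indistinguishable from
    # "no base incorporated", which is the con noted above.
    return "G"


signals = [(True, True), (True, False), (False, True), (False, False)]
read = "".join(call_base(r, g) for r, g in signals)
print(read)
```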
list the major properties of the E.coli K12 genome
most frequent strain in labs
- E. coli is normally a commensal organism that survives in the lower intestine
- K12 is lab-adapted: able to survive in culture only under very specific conditions
- K12 itself is unable to survive in the gut
- 4288 protein coding genes
- regions of low GC content
describe whole genome shotgun sequencing and how gaps in sequence can be informative of regions with potential biotechnological applications
- shotgun sequencing requires shearing of DNA, selection for fragments of a specific size and cloning them into a vector that is inserted into bacteria. Inserted fragments can then be sequenced using Sanger with primers that anneal to the vector backbone, then mapped onto a reference genome and assembled into contigs
- Gaps in the genome following sequencing suggest that inserting these sequences into a plasmid in the bacteria leads to their death (toxic)
- so these gaps can identify sequences/genes that are toxic to E. coli, for example by interacting with the replication initiator DnaA
what was found when the EHEC O157:H7 genome was compared to K12
- genome was 1Mb bigger than K12
- O islands - only present in O157 eg type III secretion system/ shiga toxin
- K - islands - regions only present in K12
what kind of E. coli is CFT073
UPEC - uropathogenic E. coli
associated with UTIs
harmless in intestines but become pathogens when they invade urinary tract, blood or CSF
genome similar size to O157 but the extra sequences relative to K12 are not the same as O157
Define the terms “pangenome” and “core genome” and how they can be estimated
Core genome represents all the genes present in all the strains of a species. Typically estimated by comparing WGS of multiple genomes. As more genomes are compared this number decreases. Rasko paper estimated the E.coli core genome to be around 2200 genes which are mainly involved in metabolic processes
pangenome is the entire gene set of all the strains of a species including the core genome and the variable/accessory genome
estimated from a broad sample of the diverse pathogens that comprise this species. The pangenome doesn't come to a plateau as more genomes are added, showing that the E. coli pangenome is effectively infinite (open) - suggests it must still be evolving
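A toy sketch of how core and pangenome sizes can be estimated from gene presence per strain. The gene names and gene sets are invented for illustration. Real analyses first cluster genes into orthologous groups and usually average the curve over many orderings of the genomes.

```python
def core_and_pan(genomes):
    """Core = genes in every strain; pan = genes in any strain."""
    gene_sets = [set(genes) for genes in genomes.values()]
    return set.intersection(*gene_sets), set.union(*gene_sets)


def accumulation_curve(genomes):
    """(core size, pangenome size) after adding each genome in turn."""
    core, pan, curve = None, set(), []
    for genes in genomes.values():
        genes = set(genes)
        core = genes if core is None else core & genes
        pan |= genes
        curve.append((len(core), len(pan)))
    return curve


# Hypothetical gene content, for illustration only.
toy_genomes = {
    "K12": {"a", "b", "c", "d"},
    "O157": {"a", "b", "c", "e", "f"},
    "CFT073": {"a", "b", "g", "h"},
}

core, pan = core_and_pan(toy_genomes)
curve = accumulation_curve(toy_genomes)
print(sorted(core), len(pan), curve)
```

As genomes are added the core count shrinks and the pan count keeps growing, which is the behaviour described for an open pangenome.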
what was the estimate of the number of unique genes per e.coli genome sequenced
300 genes
since the advent of Illumina more draft genome sequences have become available - why aren't they finished
bc finishing and annotation remains a laborious process
what can you do when you have lots of genomes for a bacterial species
bacterial genome wide association study
eg study looking at campylobacter. Sequenced genomes of campylobacter from a whole range of different hosts (chickens, cows, birds)
looked at phylogeny between strains
not one evolutionary lineage is associated with one host
GWAS found genes from vitamin B5 synthesis pathway (PanBCD) are present in bacterial strains that infect cows but not chickens
- (isolates from cattle grew better, on average, in a low vitamin B5 environment than isolates from chickens)
** gene Cj0299 which encodes an enzyme giving resistance to beta lactam antibiotics found at highest frequency in cattle and was rarest in bird isolates **
Explain how multiple genome sequences can provide an insight into genome evolution and horizontal gene transfer
By comparing multiple genomes of the same species a core genome can be derived. Each strain's accessory genome can then be identified, which gives rise to its overall phenotype. Some of this accessory genome may have been acquired recently by horizontal gene transfer, in which case those regions of the genome would have abnormal GC content because they haven't gone through amelioration yet.
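One simple way to look for regions of abnormal GC content is a sliding window compared against the genome-wide average. This is a minimal sketch: the window size, step and threshold are arbitrary assumptions, and the sequence is a toy string, not real data.

```python
def gc_content(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def atypical_windows(seq, window=1000, step=500, threshold=0.1):
    """Return (start, gc) for windows whose GC differs from the genome mean."""
    genome_gc = gc_content(seq)
    hits = []
    for start in range(0, len(seq) - window + 1, step):
        gc = gc_content(seq[start:start + window])
        if abs(gc - genome_gc) > threshold:
            hits.append((start, gc))
    return hits


# Toy sequence: 50% GC background with a 20 bp AT-only "island" in the middle.
toy = "ATGC" * 25 + "AT" * 10 + "ATGC" * 25
hits = atypical_windows(toy, window=20, step=20, threshold=0.2)
print(hits)
```

Only the window covering the AT-only stretch is flagged here. On a real genome, flagged windows would be candidates for recent horizontal transfer rather than proof of it.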
Describe how the development of new technologies has influenced our understanding of E.coli/Shigella diversity
before the advent of molecular biology: serotyping: based on the immune recognition of cell surface antigens - bacteria of the same serotype cross react to the same antibodies (doesnt correlate v well with similarity on genetic level)
hybridisation of different strains to see how similar they are at a molecular level - found that E.coli and shigella are comparable - shigella tended to be more diverse
MLEE - multilocus enzyme electrophoresis: characterises organisms by the electrophoretic mobility of their enzymes. - allowed construction of the ECOR collection
comparison of gene sequences found that shigella is within the evolutionary diversity of e.coli and has arisen on multiple occasions from e.coli - some properties shared were examples of convergent evolution
MLST - multi locus sequence typing- sequence multiple genes and compare (usually housekeeping 400bp chunks)
- found lineages of e.coli have acquired the same virulence factors in parallel including a pathogenicity island involved in intestinal adhesion and phage-encoded Shiga toxins.
- Sequenced 8 HK genes in 46 Shigella strains representing each of the 4 Shigella species
Shigella strains are well distributed within the diversity of E. coli
presence of three major clusters and five forms not closely related to any other suggests that the Shigella phenotype has arisen eight times
what can be used to compare the diversity across species
16S rRNA sequencing
how can 16S rRNA profiling be used to investigate microbial diversity
16S rRNA can be used to investigate microbial phylogeny because all bacteria and archaea have it, owing to its importance in translation. It evolves slowly due to its fundamental function in the cell, but has variable regions that change more quickly and can be compared to build a picture of the phylogeny
- design primers to conserved regions that span these variable regions, amplify up and sequence
- can be used to determine the evolutionary relationships between strains/species of bacteria
what are the caveats of 16s rRNA profiling
- primers may not be truly universal
- contamination may be an issue (link to paper about contamination in DNA extraction kits)
- sequencing errors can result in overestimation of diversity of organisms present
- some organisms have multiple copies of the 16s rRNA gene which vary in sequence (overestimation of diversity)
- PCR bias may result in incorrect quantification of species
what was the first major application of the use of 16s rRNA to study the diversity of organisms
Carl Woese's 3 domain tree of life
what can the microbial dark matter problem also be called
great plate count anomaly - observation that most of the microbes seen in the microscope cannot currently be grown in culture
what is metagenomics
study of genetic material recovered directly from environmental samples
Tells you which genes are encoded, and by which bacteria, in your sample
what did Craig Venter do to double the size of GenBank
sample from sea, genome extracted, fragmented and sequenced using sanger. Revealed some “dark matter”
how can metagenomic techniques be used to study the human microbiome
swabs can be taken from different individuals and sequenced for different areas of the body
Revealed link between health and bacteria in body
Obesity: reduced ratio of Bacteroidetes to firmicutes
what is an example of when metagenomic sequencing results can be misinterpreted
swabs were collected around New York and sequenced using shotgun sequencing. The first paper claimed plague was present, but that was a mistake: no reads mapped to the yMT gene (toxin), and they were actually looking at related plasmids
when sequencing one strain, they found the most closely related genome in the database was anthrax, so concluded anthrax was present - it wasn't (no evidence of the plcR SNP - a defining feature of anthrax)