18. Genome-wide association studies and how we can use them to better understand bacteria Flashcards
What is a GWAS?
- Genome wide association study
- Studies the entire genome of a large group of people.
- Searching for small variations in the population (SNPs)
What is a GWAS used for?
- To identify genetic variants associated with a specific trait.
- This is done by comparing the genomes of people with and without the trait.
- These genetic variants can include mutations, insertions, deletions, modifications.
- The variants are present at higher frequency in cases than the controls.
What sequencing technology does GWAS use?
- They used to use DNA microarray analysis.
- Now GWAS uses whole genome sequencing
What are the variants identified through GWAS linked to?
- Variants are linked to a disease
- Or they are in the same haplotype as a variant associated with a disease.
- These are present at higher frequency in the cases than the controls.
What is a haplotype?
A group of polymorphisms inherited together.
How is statistical analysis used in GWAS?
- It is used to determine how likely a variant is associated with a trait.
- The P-value indicates the significance of the difference in frequency of the tested allele between cases and controls.
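As a rough illustration of the case/control comparison, the sketch below runs a two-sided Fisher exact test on a 2x2 table of allele counts. The test choice, the function name `fisher_exact_two_sided` and the counts are assumptions made for the example; real GWAS software uses its own (often regression-based) tests.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    a, b: counts of the tested allele and the other allele in cases
    c, d: the same counts in controls
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x copies of the tested allele in cases
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of every table at least as unlikely as the observed one
    p = sum(prob(x) for x in range(lowest, highest + 1)
            if prob(x) <= p_obs * (1 + 1e-9))
    return min(p, 1.0)

# Hypothetical counts: allele seen 30/100 times in cases, 10/100 times in controls
p_value = fisher_exact_two_sided(30, 70, 10, 90)
print(p_value)
```

A small P-value here suggests the allele frequency differs between cases and controls more than chance alone would explain.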
What is the output of a GWAS?
- A Manhattan plot
- This plots -log10 P values against position in the genome.
What did one of the first successful GWAS studies show?
- It identified the genetic basis of macular degeneration.
- It found a tyrosine to histidine change at position 402 in the complement factor H gene.
- This factor H variant has weaker affinity for oxidised lipids, so there is less C3b-iC3b generation.
- This causes constant background complement activation.
- This causes retinal epithelial cell damage.
How can we use GWAS to investigate bacterial virulence, antibiotic resistance and predict outcomes of infectious disease?
- GWAS requires large collections of genome sequenced bacterial isolates.
- We can use these to interrogate bacterial phenotypes.
- These phenotypes could be pathogenic or to do with resistance.
Is there enough whole genome bacterial data to use in GWAS?
- There has been a massive increase in sequencing due to advances in technology and reducing costs.
- The amount of sequence data is doubling every 18 months.
- There is a lot of sequence data available, but only 20 bacterial species make up 90% of it. The other 10% is made up of hundreds of species.
- So most GWAS are restricted to these pathogens as they are the ones with enough data for a GWAS.
- These 20 are all human pathogens.
- The data is skewed though, as some pathogens like E. coli have a lot more data than N. gonorrhoeae.
What are the prerequisites for a successful bacterial GWAS?
- The trait you are looking at needs to have a testable phenotype like toxicity, virulence or AMR.
- Whole-genome-sequenced bacterial isolates. The more closely related the strains are, the easier it is to identify genetic mutations associated with the phenotype.
- Understanding the genetic variation and population structure in bacterial strains.
What bacterial phenotypes can be tested for in GWAS?
- A continuously varying quantitative phenotype or a binary phenotype.
- Whatever the phenotype there needs to be a good high throughput assay and enough isolates to test.
- GWAS can also detect the effect size of a variant.
What is the effect size of a variant in GWAS?
- It is a measure of the correlation of the variant with phenotype.
- Mutations/variants in a key regulator could completely cause the effect seen.
- Some mutations in other genes could have a more modest effect on the phenotype.
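One simple way to put a number on effect size for a quantitative phenotype is the difference in mean phenotype between isolates that carry the variant and those that do not. This is only one of several possible measures, and the function name `mean_difference` and the toxicity values below are made up for illustration.

```python
from statistics import mean

def mean_difference(genotypes, phenotypes):
    """Mean phenotype of carriers (1) minus mean phenotype of non-carriers (0)."""
    carriers = [p for g, p in zip(genotypes, phenotypes) if g == 1]
    non_carriers = [p for g, p in zip(genotypes, phenotypes) if g == 0]
    return mean(carriers) - mean(non_carriers)

# Hypothetical data: 1 = isolate carries the variant, 0 = it does not
genotypes = [1, 1, 1, 0, 0, 0]
# Made-up toxicity scores for a variant in a key regulator (large effect)
regulator_toxicity = [0.9, 0.8, 1.0, 0.1, 0.2, 0.0]
# Made-up toxicity scores for a variant in another gene (modest effect)
other_gene_toxicity = [0.6, 0.5, 0.7, 0.4, 0.5, 0.3]

large_effect = mean_difference(genotypes, regulator_toxicity)
modest_effect = mean_difference(genotypes, other_gene_toxicity)
print(large_effect, modest_effect)
```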
What is linkage disequilibrium?
- In humans, genetic recombination and chromosome segregation cause newly occurring mutations to be linked to neighbouring alleles as part of a haplotype.
- This linkage lasts until recombination breaks the linkage.
- Linkage disequilibrium is the extent to which any 2 alleles within a population are contained on the same ancestral haplotype block of DNA.
- Mixing alleles between different genetic backgrounds is important for distinguishing causal loci from linked mutations.
- Linkage disequilibrium has a stronger effect on bacteria due to asexual reproduction and the clonal nature of the population.
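Linkage disequilibrium between two biallelic sites is often summarised as r squared. The sketch below computes it from a list of haplotypes; the function name `ld_r_squared` and both example populations are invented for illustration.

```python
def ld_r_squared(haplotypes):
    """r^2 between two biallelic loci.

    haplotypes: list of (allele_at_locus_A, allele_at_locus_B), each coded 0 or 1
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # deviation from independent assortment
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d * d / denom

# Hypothetical clonal population: the two mutations always travel together
clonal = [(1, 1)] * 5 + [(0, 0)] * 5
# Hypothetical recombining population: all four combinations equally common
mixed = [(1, 1), (1, 0), (0, 1), (0, 0)] * 3

print(ld_r_squared(clonal))
print(ld_r_squared(mixed))
```

In the clonal example the two sites are in complete linkage (r squared of 1), so an association test cannot tell them apart. In the mixed example they are independent (r squared of 0).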
What are homoplasious mutations?
- Mutations that occur repeatedly at the same site.
- Bacterial strains could share the same mutation at a particular genomic location not through common ancestry but because the variant arose independently.
- This introduces variability into the population
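One simple way to flag a homoplasious site is to count the minimum number of changes needed to explain it on a tree, using Fitch parsimony. More than one change suggests the variant arose more than once. This is a swapped-in textbook method rather than anything named in these notes, and the tree, isolate names and allele states below are made up.

```python
def fitch_changes(tree, states):
    """Return (possible ancestral states, minimum number of changes).

    tree: nested 2-tuples with isolate names (strings) at the tips
    states: dict mapping isolate name -> allele (0 or 1)
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch_changes(left, states)
    right_set, right_changes = fitch_changes(right, states)
    shared = left_set & right_set
    if shared:
        # Children agree on at least one state: no extra change needed here
        return shared, left_changes + right_changes
    # Children disagree: one change must have happened on a branch below
    return left_set | right_set, left_changes + right_changes + 1

# Hypothetical tree of 8 isolates
tree = ((("A", "B"), ("C", "D")), (("E", "F"), ("G", "H")))

# Mutation shared by sister isolates A and B: explained by common ancestry
inherited = {"A": 1, "B": 1, "C": 0, "D": 0, "E": 0, "F": 0, "G": 0, "H": 0}
# Same mutation in distantly related isolates A and E: must have arisen twice
homoplasious = {"A": 1, "B": 0, "C": 0, "D": 0, "E": 1, "F": 0, "G": 0, "H": 0}

inherited_changes = fitch_changes(tree, inherited)[1]
homoplasious_changes = fitch_changes(tree, homoplasious)[1]
print(inherited_changes, homoplasious_changes)
```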
How can homoplasious mutations be introduced into bacteria?
- Horizontal gene transfer.
- Recombination
- Recurrent mutations.
What are recurrent mutations?
- The same mutation arising independently in different lineages.
- These are usually due to a selection pressure, such as antibiotic exposure selecting for AMR.
Why is considering population structure important in bacterial GWASs?
- Bacterial populations are clonal.
- In the absence of recombination all fixed genetic variants will be passed onto descendants and they will be in linkage disequilibrium with other mutations in that lineage.
- This means linkage disequilibrium has a strong effect on bacterial populations.
- Separation of causative variants and passively linked loci is a problem in association studies.
What is linear mixed modelling?
- A bioinformatics approach to deal with the bacterial population structure issue.
- It can control for the effects of relatedness as it captures population structures more accurately.
- Helps identify mutations associated with the phenotype and not the mutations from linkage disequilibrium.
- It pinpoints locus-specific effects where possible and identifies lineage-level differences.
What is VISA?
- Vancomycin-intermediate S. aureus.
- These are S. aureus with a raised vancomycin MIC that are not fully resistant.
- It is caused by changes in multiple genetic loci to make the cell wall thicker.
- Tricky to define
How have GWASs been used to find genetic associations with VISAs?
- The GWAS examined 49 VSSA and 26 VISA isolates.
- The phenotype of vancomycin resistance was determined with microdilution tests, E-test and PAP-AUC.
- It found around 55,000 SNPs across the strains.
- Lots of the SNPs were fixed in linkage disequilibrium so did not help with VISA associations.
- 1 SNP in rpoB was highly significantly associated with increased vancomycin MIC.
- This SNP was at codon 481 of rpoB (H481Y/L/N).
- It had previously been associated with vancomycin resistance in an independent study.
What is PAP-AUC?
- Population analysis profile-area under the curve.
- Plot a curve of how many bacteria are left at increasing amounts of vancomycin.
- Calculate the area under the curve as a measure of vancomycin resistance.
- Bigger area = higher MIC
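The area under a survival curve can be estimated with the trapezoid rule. The sketch below assumes surviving bacteria are counted as CFU/ml and plotted on a log10 scale. The function name `pap_auc`, the concentrations and all counts are made up for illustration.

```python
from math import log10

def pap_auc(concentrations, cfu_per_ml):
    """Trapezoidal area under log10(CFU/ml) against vancomycin concentration."""
    logs = [log10(count) for count in cfu_per_ml]
    area = 0.0
    for i in range(1, len(concentrations)):
        width = concentrations[i] - concentrations[i - 1]
        area += width * (logs[i] + logs[i - 1]) / 2
    return area

# Hypothetical vancomycin concentrations (µg/ml) and surviving counts
vancomycin = [0, 1, 2, 4, 8]
susceptible_counts = [1e8, 1e6, 1e3, 1e1, 1e0]   # dies off quickly
intermediate_counts = [1e8, 1e8, 1e7, 1e5, 1e2]  # survives higher doses

susceptible_area = pap_auc(vancomycin, susceptible_counts)
intermediate_area = pap_auc(vancomycin, intermediate_counts)
print(susceptible_area, intermediate_area)
```

The isolate that survives higher vancomycin concentrations gives the larger area, matching the "bigger area" rule on the card.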
Why do Manhattan plots use -log10 P?
- -log10 P is plotted on the y axis and genome location on the x axis.
- It is used to display significant association in an easier way.
- You transform the P value using -log10 so that a larger value indicates a more significant association.
- For example, if a significant P value following GWAS was P < 5 x 10^-8, the -log10(P) would be 7.3.
- This makes it easier to interpret the P value
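The transformation can be checked with a few lines of code. The threshold is the one quoted on the card; the function name `neg_log10` and the other P values are arbitrary examples.

```python
from math import log10

def neg_log10(p_value):
    """Transform a P value so that larger numbers mean stronger evidence."""
    return -log10(p_value)

threshold_height = neg_log10(5e-8)
print(round(threshold_height, 2))  # 7.3

# Arbitrary example P values: smaller P gives a taller point on the plot
for p in (0.05, 1e-4, 1e-10):
    print(p, round(neg_log10(p), 2))
```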
What is a quantile-quantile (QQ) plot?
- It compares the distribution of observed P values with the distribution expected under the null hypothesis.
- If the 2 distributions are similar the points should fall on the diagonal line.
- If a SNP lies above the significance line then its P value is below the chosen significance threshold (for example P<0.05).
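The points of a QQ plot can be built by sorting the observed P values and pairing each with the value expected at that rank under the null hypothesis. The sketch below assumes the common convention of expected P = rank / (n + 1). The function name `qq_points` and the P values are made up.

```python
from math import log10

def qq_points(p_values):
    """Return (expected, observed) pairs on the -log10 scale, smallest P last."""
    n = len(p_values)
    observed = sorted(p_values)
    # Under the null, P values are uniform, so the i-th smallest is about i / (n + 1)
    expected = [(rank + 1) / (n + 1) for rank in range(n)]
    return [(-log10(e), -log10(o)) for e, o in zip(expected, observed)]

# Hypothetical P values that match the null expectation exactly
null_like = qq_points([0.8, 0.2, 0.6, 0.4])
# Hypothetical P values with one strongly associated SNP
with_signal = qq_points([0.8, 1e-6, 0.6, 0.4])

for expected, observed in with_signal:
    print(round(expected, 2), round(observed, 2))
```

Points from `null_like` sit on the diagonal. In `with_signal` the smallest P value rises well above it.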
What are Bonferroni corrections?
- A Bonferroni correction is applied to the significance threshold.
- It is used to reduce the incidence of false positives.
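A Bonferroni correction divides the chosen significance level by the number of tests. The sketch below assumes a starting level of 0.05 and one million tests, which gives the 5 x 10^-8 threshold mentioned elsewhere in these cards. The function name `bonferroni_threshold` is made up.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests

# Assumed values: overall alpha of 0.05 spread over 1,000,000 tested variants
threshold = bonferroni_threshold(0.05, 1_000_000)
print(threshold)

# A variant is only called significant if its P value is below the threshold
example_p_values = [0.01, 3e-9]  # made-up P values
significant = [p for p in example_p_values if p < threshold]
print(significant)
```

A P value of 0.01 would pass an uncorrected 0.05 cut-off but fails here, which is how the correction reduces false positives.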
What did the identification of rpoB’s association with vancomycin resistance through GWAS show?
It was a proof of concept that GWAS can be used to identify mutations associated with resistance in bacteria.
How was GWAS used to find genetic associations with AMR in M. tuberculosis?
- The mechanisms of resistance in M. tb are not well understood.
- The aim was to identify biomarkers of drug resistance.
- Bacterial genomes with different resistance phenotypes were sequenced and GWAS, phylogenetic methods and statistical tests applied to identify variants in the genome that are consistently associated with resistance.
- 116 newly sequenced and 7 previously sequenced M. tb isolates were tested.
- Many known resistance genes were identified and 4 novel genetic associations were found.
What new genetic associations with resistance in M. tb were identified using GWAS?
- These were novel genes that were associated with resistance
- ponA
- rpoC
- murD
- ppsA
How were phylogenetic convergence tests used to identify resistance genes in M. tuberculosis?
- Evolutionary convergence was used to develop a phylogenetic convergence test using the genomes of the 123 M. tb isolates.
- It was used to look for specific mutations with a higher frequency in the resistant branches vs the sensitive branches.
- For each candidate target of independent mutation, the significance of the difference between its observed distribution and the distribution expected from the mutations observed across the phylogeny was examined.
- It found mostly C to G mutations in resistance genes.
- These occurred mostly independently and were associated with resistance.
- It detected all 11 known resistance determinants as significant targets for independent mutation.
- It also found evidence of positive selection in an additional 39 genomic regions in resistant isolates.
- 11 of these had annotated function including ponA1.
What is evolutionary convergence?
The repeated and independent emergence of resistance-associated mutations at specific loci or genes.
What does evidence of positive selection in a phylogenetic test show?
- It shows that the mutation is providing a selective advantage.
- This leads to a fitness advantage in the presence of an antibiotic.
What is ponA1 and how can it cause resistance in M. tb?
- A gene that was found to be associated with resistance in M. tb through GWAS and phylogenetic tests.
- It is important in peptidoglycan homeostasis.
- 1095G>T in ponA1 confers a fitness advantage in the presence of the anti-TB drug rifampicin.
- The mutation is close to the PonA1 transpeptidase domain catalytic site.
- This SNP may inactivate the enzyme, as a ponA1 deletion shows similar rifampicin resistance.
How can genetic association like ponA1 lead to rifampicin resistance in M. tb?
- Stable drug resistance may evolve through a complex process to remodel the cell wall.
- This remodelling means rifampicin cannot enter or cross the cell wall.
What are the 3 main virulence phenotypes of S. aureus?
- Adhesion like fibronectin binding proteins
- Toxicity like alpha toxin
- Immune evasion like protein A
How is S. aureus pathogenicity regulated?
- Two-component systems like Agr.
- Transcription factors like Sar family
- ClpXP protease system
- Small regulatory RNAs in the RNome
How does the RNome regulate S. aureus pathogenicity?
It regulates virulence genes quickly in response to environmental cues.
How can GWAS be used to understand toxicity in S. aureus?
- There are lots of toxins in S. aureus.
- We wanted to look at the gross cytolytic activity of all these toxins in clinical isolates.
- Use GWAS to understand toxin regulation and mutations in these genes.
What clinical isolates were used in the GWAS WGS for understanding toxicity?
- There were 2 sets of data.
- 1 was a broad set of 134 clinical isolates from bloodstream infection, skin and soft tissue infection (SSTI) or skin/nasal colonisation.
- The second set was from a single patient
How was the toxicity phenotype determined in the S. aureus toxicity GWAS?
- The S. aureus was grown in broth.
- The toxins are secreted and are in the broth.
- Incubate the broth supernatant with host cells that express the receptors that toxins interact with.
- This includes RBC, THP-1 cells and neutrophils.
- Measure cell death as a read out of toxin production.
What are THP-1 cells?
A human monocytic leukaemia cell line
What did the cytotoxicity of the different S. aureus isolates show?
- High T cell death = high toxicity
- The samples from the single patient showed high T cell death in the nasal and skin isolates and low T cell death in the bloodstream isolates.
- The broad set of samples showed high T cell death in SSTI and carriage isolates and lower T cell death in the bloodstream isolates.
- This shows there is a shift in toxin production and bloodstream S. aureus is less toxic.
- This was unexpected.
How did GWAS identify genetic associations with regulating toxicity in different S. aureus infections?
- There were lots of SNPs associated with toxicity
- Using transposon mutant libraries you can quickly identify which of these SNPs are actually associated with toxicity.
- These transposons inactivate targeted genes to see if this has an effect on toxin regulation.
- 6 novel genes were found that, when inactivated, led to reduced toxin production.
What is a big limitation of GWAS?
- If there are lots of SNPs there can be lots of false positives due to linkage disequilibrium.
- This means a mutation associated with toxicity could be in linkage disequilibrium with 10-15 other mutations.
- This makes the causative association hard to identify.
What 6 novel genes were found in S. aureus through GWAS that were associated with toxin regulation?
- agrB
- rsp
- flaK
- clpC
- sucD
- rpsA
What did GWAS into S. aureus toxicity show?
- Toxins are neutralised by some serum components but are still expressed in serum.
- The relative fitness of these isolates in media and in serum were examined.
- It was found that low toxin bacteria are more fit in serum.
- This is because high toxin bacteria overproduce toxins in neutralising serum to keep themselves alive.
- This has a fitness cost so they survive less well in the blood.
How can GWAS be used to predict toxicity of S. aureus in clinical isolates?
- Use MRSA and MSSA bacteraemia isolates.
- The isolates belonged to 2 clonal complexes, CC22 and CC30.
- All genomes were sequenced and toxicity and biofilm formation phenotypes were measured.
- CC22 was found to be extremely toxic with little variation
- CC30 had a much broader range of toxicity with interclonal and intraclonal variation.
- Biofilm formation followed a similar pattern.
- GWAS identified 5 novel genes that regulate biofilm formation and 3 that regulate toxicity.
- GWAS also found previously identified genes (good evidence it works)
Why is biofilm formation clinically relevant?
Biofilms have a higher resistance to immunity and treatment
Does biofilm formation or toxicity levels of S. aureus predict patient outcomes?
- Yes
- It has a misclassification rate of 22%.
- Using GWAS and all data is good at predicting outcomes for patients.
- In CC22 toxicity and biofilm formation are principal predictors of outcome but in CC30 they are not.
- These can be identified in patients using GWAS.
How is capA associated with patient outcomes from S. aureus infection?
- There are 16 loci selected by the GWAS model that can predict mortality.
- Only 1, capA, is a known virulence factor.
- Polymorphisms in the capA gene of the capsule biosynthetic locus were associated with patient mortality.
- It was shown these strains have lower killing but higher reactivity.
- Out of 24 patients 12 survived.
- A P to S mutation turns off capsule expression, which is associated with mortality.
How can GWAS help us understand bacteria?
- It can be used to understand the genetic basis of important phenotypes like antibiotic resistance.
- It can be used to identify novel effectors of known virulence phenotypes.
- It can be used to examine how known virulence factors contribute to disease.
- It can be used to predict patient outcomes.
- It can be used to identify novel virulence traits.
What are the benefits of GWAS?
- They have been very successful in identifying novel variant-trait associations.
- They can lead to the discovery of novel biological mechanisms.
- Findings of GWAS can have diverse clinical applications
- They can provide insight into ethnic variation of complex traits.
- Good to study low frequency and rare variants
What are the limitations of GWAS?
- Due to the multiple tests, GWAS requires a high level of significance. This is typically done with Bonferroni corrections.
- GWAS don’t necessarily pinpoint causal variants and genes.
- GWAS cannot identify all genetic determinants of complex traits.
- They have largely been unsuccessful in detecting epistasis in humans
What is the MIC of a VISA?
4-8 µg/ml