GWAS and How We Can Use It To Better Understand Bacteria Flashcards
What is a Genome-Wide Association Study (GWAS)
A GWAS tests genetic variants across the genomes of many individuals to identify associations between specific genetic loci (mutations, insertions, deletions) and phenotypic traits, including diseases
What is the key principle behind GWAS
Genetic variants that contribute to a trait or disease will occur more frequently in individuals with that trait (cases) than in those without (controls)
What technologies are used in GWAS
- Whole Genome sequencing
- DNA microarrays
Both are used to identify common variants across individuals
How is data analysed in GWAS
Statistical analysis compares allele frequencies between cases and controls. Variants significantly more frequent in cases are flagged as potentially linked to the trait
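As a rough illustration of this comparison, the sketch below works out how often a variant appears in cases and in controls, and the odds ratio from the resulting 2x2 table. The counts and the function name `variant_frequency` are invented for illustration; a real analysis repeats this for every variant and adds a significance test.

```python
def variant_frequency(carriers: int, total: int) -> float:
    """Fraction of a group that carries the variant."""
    if total <= 0:
        raise ValueError("total must be positive")
    return carriers / total

# Hypothetical counts, made up for this example.
cases_with_variant, n_cases = 30, 100
controls_with_variant, n_controls = 10, 100

freq_cases = variant_frequency(cases_with_variant, n_cases)
freq_controls = variant_frequency(controls_with_variant, n_controls)

# 2x2 table: rows = cases/controls, columns = variant present/absent.
a, b = cases_with_variant, n_cases - cases_with_variant
c, d = controls_with_variant, n_controls - controls_with_variant
odds_ratio = (a * d) / (b * c)  # > 1 means the variant is enriched in cases

print(f"cases: {freq_cases:.2f}, controls: {freq_controls:.2f}, OR: {odds_ratio:.2f}")
```

A variant like this one, more frequent in cases than in controls, would be flagged for follow-up rather than declared causal.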
What is the significance of the p-value in GWAS
The p-value is the probability of seeing an association at least as strong as the one observed if the variant had no real link to the trait (i.e. by chance alone). Lower p-values indicate stronger evidence for an association, not necessarily a larger effect
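One common way to obtain such a p-value for a yes/no trait is Fisher's exact test on the 2x2 table of cases/controls against variant present/absent. This is an assumption for illustration, not necessarily the test used in the studies covered in these cards. The sketch below is a one-sided version written with the standard library only; the function name and the counts are hypothetical.

```python
from math import comb

def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test for the table [[a, b], [c, d]].

    Rows are cases and controls; columns are variant present and absent.
    Returns the probability of seeing `a` or more variant-carrying cases
    if the variant were unrelated to case/control status.
    """
    n_cases = a + b
    n_variant = a + c
    n_total = a + b + c + d
    denominator = comb(n_total, n_variant)
    p_value = 0.0
    for k in range(a, min(n_cases, n_variant) + 1):
        # Hypergeometric probability of exactly k variant carriers among cases.
        p_value += comb(n_cases, k) * comb(n_total - n_cases, n_variant - k) / denominator
    return p_value

# Hypothetical counts: 30 of 100 cases and 10 of 100 controls carry the variant.
p_value = fisher_exact_greater(30, 70, 10, 90)
print(f"one-sided p-value: {p_value:.2e}")
```

Because a GWAS tests many variants at once, the threshold for calling a p-value significant is usually made much stricter than 0.05.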
What is a haplotype and its role in GWAS
A haplotype is a set of alleles (variants) on the same stretch of DNA that are inherited together. A variant within a haplotype associated with a disease may appear frequently in cases even if it's not the direct cause, because it is inherited along with the causal variant.
How can GWAS be used for investigating bacterial virulence, antibiotic resistance, and outcome prediction in infectious diseases
Can be used to investigate:
- What makes some bacterial strains more dangerous (virulent)
- Why some are resistant to antibiotics
- Whether we can predict clinical outcomes based on bacterial genetics
How are these genome sequences first collected
Many bacterial isolates from different infections have their whole genomes sequenced - these are the genotypes (the genetic content of each bacterial strain)
How is this genetic data then defined
By its phenotype:
- The pathogenic phenotype/genetic basis of pathogenicity. This helps identify virulence factors - the genes responsible for the ability to cause disease.
- Antibiotic resistance/genetic basis of resistance. Which strains are resistant to antibiotics? Which genes or mutations are responsible? Can we predict resistance from the genome without growing the bacteria in the lab?
How does GWAS play a role in bacterial infections
GWAS takes the genotype and phenotype data and runs statistical comparisons to find associations between specific genetic features:
- SNPs
- Genes
- Plasmids
and the traits of interest:
- Resistance
- Severity
- Immune evasion
Why is GWAS powerful
Discovery of new resistance or virulence genes
Real-time surveillance during outbreaks - tracking the spread of a strain or a resistant clone
Personalised treatment decisions - choosing the best antibiotic based on the bacteria’s genome
How is the genome database for these bacteria developing
The pace of sequencing is still increasing: the amount of sequence deposited roughly doubles every 18 months, while continued technological development keeps reducing the cost per sequence
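To get a feel for what doubling every 18 months means, the sketch below computes the growth factor after a given number of months as 2^(months / 18). The function name is hypothetical, and it assumes the doubling time stays constant, which is a simplification.

```python
def growth_factor(months: float, doubling_time_months: float = 18.0) -> float:
    """How many times larger the database is after `months`,
    assuming a constant doubling time (a simplifying assumption)."""
    return 2.0 ** (months / doubling_time_months)

for years in (1.5, 3, 10):
    print(f"after {years} years: x{growth_factor(years * 12):.1f}")
```

Under this assumption, three years gives a four-fold increase and ten years gives roughly a hundred-fold increase.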
What are the prerequisites for a successful GWAS
- A testable phenotype: this could be binary (yes/no) or quantitative (e.g. MIC, or how much toxin a strain produces)
- Whole-genome sequenced (WGS) bacterial isolates: the more closely related the strains are, the less interference there is from population structure
- Phenotype must be scalable: it's hard to test a phenotype on thousands of strains, so GWAS tends to focus on high-throughput phenotypes - things you can measure quickly and automatically in the lab
- Effect size: a measure of how strongly a genetic variant is associated with a trait (a single mutation that completely explains a trait has a large effect size, which is ideal for GWAS)
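For a quantitative phenotype such as MIC, one simple way to express effect size is the difference in mean log2(MIC) between isolates with and without the variant. This is a sketch under that assumption; the MIC values and the function name are invented for illustration.

```python
from math import log2
from statistics import mean

def log2_mic_effect(mic_with_variant, mic_without_variant) -> float:
    """Difference in mean log2(MIC) between carriers and non-carriers.
    A value of 1.0 corresponds to one doubling dilution (a 2-fold MIC increase)."""
    with_mean = mean(log2(m) for m in mic_with_variant)
    without_mean = mean(log2(m) for m in mic_without_variant)
    return with_mean - without_mean

# Hypothetical MICs in mg/L, made up for this example.
carriers = [4, 8, 4, 8]
non_carriers = [1, 2, 1, 2]

effect = log2_mic_effect(carriers, non_carriers)
print(f"effect: {effect:.1f} log2 units, i.e. about {2 ** effect:.0f}-fold higher MIC")
```

MICs are usually measured in two-fold dilution steps, which is why the log2 scale is a natural choice here.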
What is meant by less interference from population structure
There are fewer false associations in the GWAS results caused by genetic relatedness between bacterial strains
You would minimise background noise from unrelated mutations that have nothing to do with the trait you’re studying
You are more likely to find the true genetic causes of the phenotypes you’re testing
What is linkage disequilibrium
LD is the non-random association of variants at different sites: it describes how often two genetic variants are inherited together on the same stretch of DNA (haplotype block)
In humans, recombination shuffles genes during reproduction, breaking up these haplotypes over time
So in human GWAS, a causal variant can often be distinguished from non-causal variants that sit physically close to it, because recombination has separated them in at least some individuals.
Why are GWAS results more challenging to interpret in bacteria than in humans
Bacteria reproduce asexually, so their DNA is not shuffled by recombination every generation as it is in humans.
This means new mutations stay linked to large chunks of the genome, making it harder to pinpoint the actual causal mutation.
Without frequent mixing (recombination), many traits appear genetically linked just due to shared ancestry, not actual causation.
How does recombination help identify the true cause of a trait in GWAS?
Recombination breaks up long DNA blocks into smaller ones, allowing scientists to separate the causal variant from other nearby, non-causal mutations.
In humans, this happens naturally and frequently.
In bacteria, recombination is rare, so it’s harder to rule out false positives (mutations that are inherited together but not functionally linked to the trait).
How does population structure interfere with GWAS in bacteria
Closely related bacteria share large chunks of identical DNA.
If a trait is common in one lineage, you may mistakenly associate it with many shared mutations, even if only one is responsible.
This is why understanding genetic backgrounds and controlling for population structure is crucial in bacterial GWAS.
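The effect of population structure can be seen in a toy example. In the sketch below, a SNP has no effect on resistance within either lineage, yet it looks strongly associated once the two lineages are pooled, simply because it is common in the lineage that happens to be resistant. All counts, names, and the two-lineage setup are hypothetical.

```python
from collections import defaultdict

def resistance_rate(records):
    """Return {snp_state: fraction resistant} from
    (has_snp, is_resistant, number_of_isolates) records."""
    resistant = defaultdict(int)
    total = defaultdict(int)
    for has_snp, is_resistant, count in records:
        total[has_snp] += count
        if is_resistant:
            resistant[has_snp] += count
    return {state: resistant[state] / total[state] for state in total}

# Hypothetical counts: (has_snp, is_resistant, number_of_isolates).
# Lineage A is mostly resistant and mostly carries the SNP.
lineage_a = [(1, 1, 72), (1, 0, 18), (0, 1, 8), (0, 0, 2)]
# Lineage B is mostly sensitive and rarely carries the SNP.
lineage_b = [(1, 1, 2), (1, 0, 8), (0, 1, 18), (0, 0, 72)]

pooled = resistance_rate(lineage_a + lineage_b)
within_a = resistance_rate(lineage_a)
within_b = resistance_rate(lineage_b)

print("pooled:   ", pooled)    # carriers look far more resistant
print("lineage A:", within_a)  # same rate with and without the SNP
print("lineage B:", within_b)  # same rate with and without the SNP
```

Comparing within lineages, as done here by hand, is the intuition behind the statistical corrections used in bacterial GWAS.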
What is a homoplasious mutation
A mutation that occurs repeatedly and independently at the same site; e.g., bacterial strains could share the same mutation at a particular genomic location not through common ancestry but because the variant arose independently in each of them
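One way to spot a candidate homoplasious site is to count the minimum number of state changes needed to explain it on a phylogenetic tree, for example with Fitch parsimony. A site that needs two or more changes cannot be explained by a single ancestral event. The sketch below assumes a rooted binary tree written as nested tuples; the tree, strain names, and states are hypothetical, and the count cannot tell independent gains apart from losses or from HGT.

```python
def fitch_changes(tree, states):
    """Fitch parsimony on a rooted binary tree given as nested 2-tuples.

    Leaves are strain names; `states` maps each name to 0 or 1.
    Returns (possible_states_at_this_node, minimum_number_of_changes).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch_changes(left, states)
    right_set, right_changes = fitch_changes(right, states)
    changes = left_changes + right_changes
    shared = left_set & right_set
    if shared:
        return shared, changes
    # No state in common: at least one change happened below this node.
    return left_set | right_set, changes + 1

# Hypothetical tree of eight strains.
tree = ((("s1", "s2"), ("s3", "s4")), (("s5", "s6"), ("s7", "s8")))
strains = ["s1", "s2", "s3", "s4", "s5", "s6", "s7", "s8"]

# Mutation present in two distant strains: needs two independent changes.
scattered = {s: 1 if s in ("s1", "s5") else 0 for s in strains}
# Mutation present in two sister strains: one change explains it.
clonal = {s: 1 if s in ("s1", "s2") else 0 for s in strains}

print("scattered site changes:", fitch_changes(tree, scattered)[1])
print("clonal site changes:", fitch_changes(tree, clonal)[1])
```

Sites like the scattered one are useful in bacterial GWAS because the repeated, independent appearance is harder to explain by shared ancestry alone.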
What are the 3 mechanisms by which homoplasious mutations can be introduced into the genomes of bacterial populations
- HGT
- Recombination
- Recurrent mutations
Why is population structure important in bacterial GWAS
Because bacteria reproduce clonally, meaning all genetic variants in a lineage are inherited together
As a result, it is difficult to tell whether a mutation causes a trait or is just linked to it through shared ancestry
Without proper control for population structure, you risk identifying false associations
How does lack of recombination affect GWAS
Due to the lack of recombination in bacteria, all fixed mutations in a lineage are passed on together, in linkage disequilibrium.
If a phenotype is present in a lineage, many linked mutations may appear associated with it, even if only one is causal
What is a Linear Mixed Model (LMM) and how does it help in GWAS
LMMs are statistical models that account for relatedness between bacterial strains
They help control for population structure by modelling the background genetic similarities across strains
This improves the ability to detect true associations between specific loci and phenotypes
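In one textbook form, an LMM is written as y = Xb + u + e, where u is a random effect whose covariance follows a strain-by-strain similarity matrix K. The sketch below does not fit an LMM; it only builds a very simple K, the fraction of SNP sites at which two strains carry the same allele. Real tools typically use a more refined relatedness or phylogeny-based matrix, and the genotypes and function name here are hypothetical.

```python
def similarity_matrix(genotypes):
    """Pairwise fraction of SNP sites at which two strains share an allele.

    `genotypes` maps strain name -> list of 0/1 allele calls, all the same length.
    Returns a dict keyed by (strain_a, strain_b).
    """
    names = list(genotypes)
    n_sites = len(genotypes[names[0]])
    return {
        (a, b): sum(x == y for x, y in zip(genotypes[a], genotypes[b])) / n_sites
        for a in names
        for b in names
    }

# Hypothetical genotypes at four SNP sites.
genotypes = {
    "s1": [0, 0, 1, 1],
    "s2": [0, 0, 1, 0],  # close relative of s1
    "s3": [1, 1, 0, 0],  # distant from s1
}

K = similarity_matrix(genotypes)
print("s1 vs s2:", K[("s1", "s2")])
print("s1 vs s3:", K[("s1", "s3")])
```

Strains with high similarity are expected to share phenotypes for reasons of ancestry, and the model discounts associations that this background similarity already explains.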
What can LMMs help identify in GWAS
- Locus specific effects: mutations truly linked to the phenotype
- Lineage-level differences: broader patterns seen in entire strain groups
- Helps separate trait-causing mutations from those just carried by related bacteria
What is Vancomycin-Intermediate S. aureus (VISA)
VISA is a form of S. aureus with intermediate resistance to vancomycin, a last-line antibiotic
Resistance evolves gradually through mutations in multiple genes
What did the VISA GWAS investigate
- It aimed to identify genetic variants associated with vancomycin resistance
- It compared 49 vancomycin-sensitive (VSSA) strains with 26 VISA strains
- It analysed over 50,000 high-quality SNPs across the 75 strains (49 + 26).
Why were many SNPs in the VISA GWAS not useful
Many SNPs were ‘fixed’ and lineage specific, meaning they were shared among closely related bacteria, not necessarily related to resistance
These are structured by population, so they confound association analysis - they don’t distinguish between resistant and sensitive strains
What was the key SNP identified in VISA GWAS
A non-synonymous mutation at codon 481 of the rpoB gene
Strongly associated with increased vancomycin MIC
Previously shown to contribute to vancomycin resistance in other studies
Why is it important to study resistance mechanisms in M. tb
Mechanisms of resistance are still incompletely understood
Identifying biomarkers of drug resistance can help with faster diagnosis and better treatment
Genome sequencing and GWAS can link genetic mutations to drug resistance
What was the design of the GWAS for M. tb drug resistance
Compared genomes of resistant vs. sensitive strains
Applied GWAS, phylogenetic analysis, and statistical testing to find variants associated with resistance
What is evolutionary convergence in antibiotic resistance
- It refers to the independent emergence of the same resistance mutation in different M. tb lineages
- Suggests that certain mutations provide a strong selective advantage under antibiotic pressure
What is the purpose of the phylogenetic convergence test (phyC)
A statistical method used to detect repeated mutations that appear more often in resistant strains than sensitive ones
Helps identify true resistance mutations versus background variation
What novel findings did the GWAS in M. tb reveal
Found positive selection in 39 additional genomic regions among resistant strains
Of these, 11 had known functions, potentially involved in resistance mechanisms
What role does the ponA1 gene mutation play in rifampicin resistance
ponA1 is involved in peptidoglycan homeostasis
The mutation in ponA1 conferred a fitness advantage in the presence of rifampicin
The mutation is located near the transpeptidase catalytic site, suggesting it may disrupt enzymatic activity
What do these results suggest about the evolution of M. tb resistance
- Drug resistance can evolve through a complex, stepwise process
- May involve cell wall remodelling and compensatory mutations for fitness under drug pressure
- Shows that resistance is not just about target mutations but may involve multiple pathways
What are the three main virulence phenotypes of S. aureus
- Adhesion
- Toxicity
- Immune evasion
What is the principle behind S. aureus’ pathogenicity
Its pathogenicity is tightly regulated by multiple layers of regulatory systems. These systems make sure that virulence factors are only expressed when needed, helping the bacteria survive in different environments (skin, bloodstream, or inside immune cells)
What is the Two-Component System (TCS)
- These are signal transduction systems that allow bacteria to sense and respond to environmental changes
What does a typical TCS have
1) A sensor kinase (membrane bound, detects an external signal)
2) A response regulator (activates or represses gene expression)
What do the different TCSs in S. aureus control
It has about 16 TCSs that control different aspects of behaviour, such as:
- Virulence
- Antibiotic resistance
- Biofilm formation
- Metabolism
What is an example of a virulence TCS
Agr system - accessory gene regulator
- Master regulator of virulence in S. aureus
- Controls expression of toxins, enzymes, and surface proteins
- Works through quorum sensing - detecting population density and activating virulence at high cell density
What are transcription factors
Proteins that bind DNA and regulate the transcription of specific genes - can either activate or repress gene expression
What are the 3 transcription factor families in S. aureus:
- Sar family
- CodY
- SigB
What does the Sar family include
SarA, SarR, SarS
Regulates agr and many virulence genes directly
Can fine tune gene expression in response to environmental cues
What does the CodY transcription family include
- Represses virulence genes under nutrient-rich conditions
- Acts as a metabolic sensor - when nutrients are scarce, repression is lifted and virulence genes are expressed.
What does the SigB family include
An alternative sigma factor that helps the cell respond to stress
Promotes survival under harsh conditions
Also controls genes involved in virulence and biofilm formation
What is the ClpXP Protease System
This is a protein degradation system
ClpX is an ATP-dependent chaperone that unfolds proteins, and ClpP is the protease that degrades them
What is the role of ClpXP in S. aureus
It helps maintain protein quality control, especially under stress
It also degrades regulatory proteins, affecting the stability and activity of transcription factors and other regulators
Impacts stress response, virulence, and antibiotic resistance
What are sRNAs (Small regulatory RNAs)
These are short non-coding RNA molecules that regulate gene expression at the post-transcriptional level.
What is the function of sRNAs in S. aureus
Bind to mRNA to block translation or promote degradation
Can also stabilize certain mRNAs depending on the situation
Important for fine-tuning gene expression in response to stress, growth stage, or environmental signals
What is an example of an sRNA
RNAIII: this is the effector of the Agr system - it regulates toxin and adhesin expression by base pairing with target mRNAs