GWAS in prokaryotes Flashcards
GWAS
- define
association analysis performed w/ panel of polymorphic markers adequately spaced to capture most of linkage disequilibrium info in entire genome
Study designs
Family based
OR
Case control
Linkage disequilibrium
Non-random association of alleles at different loci
- closer the loci are in genome = less recombination between them = higher the linkage
= more likely to be inherited together
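Worked sketch (not from the original cards): linkage disequilibrium between two biallelic loci is commonly summarised with D and r^2. The function name and the frequencies below are hypothetical, for illustration only.

```python
def linkage_disequilibrium(p_ab, p_a, p_b):
    """Return (D, r2) for two biallelic loci.

    p_ab: frequency of haplotypes carrying allele A and allele B together
    p_a, p_b: frequencies of allele A and allele B on their own
    """
    d = p_ab - p_a * p_b  # deviation from independent inheritance
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, r2


# Hypothetical frequencies: A and B always travel together
d, r2 = linkage_disequilibrium(p_ab=0.5, p_a=0.5, p_b=0.5)
```

D = 0 (and r^2 = 0) means the alleles are inherited independently; r^2 = 1 means one allele perfectly predicts the other.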
Study design
- cases
- controls
> those with the phenotype of interest e.g. disease
presumed to have high prevalence of susceptibility alleles
> those w/out phenotype
presumed to have lower prevalence of such susceptibility alleles
cases + controls ideally similar in majority of other factors
Misclassification bias
study participant categorised into incorrect category
-> alters the observed association or research outcome
what can identification of susceptibility variants lead to?
*novel biological insights
-> clinical advances
-> therapeutic targets
OR
biomarkers
OR prevention
*improved measure of individual aetiological processes
-> personalised medicine
-> diagnostics
OR
prognostics
OR
therapeutic optimisation
Genotyping
- what is it?
Looking for a nt variation associated w/ a given phenotype
e.g. GC change associated with a disease
AT means you don’t have disease
Genotyping
- process
- Extract, amplify + fragment DNA
- Either:
Microarray
OR
Sequencer
- Genotype calling
- SNP genotype
Genotype calling
Determining genotype for each individual
- typically only done for positions in which a SNP or a ‘variant’ has already been called (=estimated)
Significance of hits
- Contingency tables (Fisher’s Exact Test)
Gives a p-value for the significance of the SNP being associated w/ disease
Sum probabilities of observed table + all more extreme tables with same marginal totals = probability of seeing an association at least this strong if the null hypothesis is true
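Minimal sketch of the summing step for a 2x2 contingency table, standard library only. The two-sided rule used here (count every table no more probable than the observed one) is one common convention; the allele counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the observed table plus every table that is no more probable than it
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_obs * (1 + 1e-9)  # small tolerance for float ties
    )


# Hypothetical counts: rows = cases / controls, columns = allele G / allele A
p_value = fisher_exact_two_sided(8, 2, 1, 5)
```

A small p-value means an association this strong would rarely arise by chance with these marginal totals.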
Does the affected or control group exhibit Population Stratification?
- what is this?
- what can it cause?
- how is this controlled?
When subpopulations exhibit allelic variation because of ancestry
Can cause false +ves if there are SNP differences in the case + control population structures
Control for this by testing control SNPs for general elevation in χ² statistics between cases + controls
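Sketch of one common way to quantify that elevation, assuming the card refers to genomic control: take the median chi-squared statistic over the control SNPs and divide by the null median for 1 degree of freedom (about 0.4549). Function names and table layout are hypothetical; every table margin is assumed non-zero.

```python
from statistics import median

CHI2_1DF_MEDIAN = 0.4549  # median of a chi-squared distribution with 1 df


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared statistic (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def inflation_factor(control_snp_tables):
    """Genomic-control lambda: median chi-squared over control SNPs / null median."""
    return median(chi2_2x2(*table) for table in control_snp_tables) / CHI2_1DF_MEDIAN


# Hypothetical control SNPs: (case allele 1, case allele 2, control allele 1, control allele 2)
lam = inflation_factor([(10, 10, 10, 10), (12, 8, 9, 11), (20, 10, 10, 20)])
```

Lambda near 1 suggests little stratification; lambda well above 1 suggests test statistics are inflated across the genome.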
Associated haploblocks
Linkage disequilibrium organises genome into haplotype blocks
Haplotype block
region of genome where there’s little evidence of a history of genetic recombination
contain only a small number of distinct haplotypes (group of alleles inherited together from 1 parent)
Bottom-up approach
Starts w/ DNA sequence
-> tests effect on phenotype
Top-down approach
Starts w/ phenotype + associates it w/ particular genomic elements
(by 2010 had large bacterial collections)
Campylobacter
- causes
most common cause of bacterial food poisoning in developed countries
meat gets contaminated
-> people eat meat
-> get ill
Campylobacter
- who carries it?
humans don’t
- get ill and then bacteria go away
chickens, sheep, cows + birds carry the bacteria
Old signals vs New signals of adaptation
Old
= host-associated clonal complexes
New
= host-associated mobile elements
GWAS study method
- sample lots of isolates from cows + chickens
- sequence genomes
- divide into 30bp fragments
(have 1 for every position in genome)
- sort by cow or chicken origin by:
looking for fragments overrepresented in 1 host
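Rough sketch of the fragment-counting step: split each genome into overlapping k-bp words (one per start position) and count how many isolates from each host carry each word. These are naive counts only; no correction for shared descent is attempted here. Function names and the toy sequences are hypothetical, and k=3 is used so the example stays readable (the cards use 30 bp).

```python
from collections import Counter


def words(genome, k=30):
    """Set of all k-bp words in a genome (one word starting at every position)."""
    return {genome[i:i + k] for i in range(len(genome) - k + 1)}


def word_counts_by_host(isolates, k=30):
    """isolates: list of (host, genome). Returns {host: Counter(word -> n isolates with it)}."""
    counts = {}
    for host, genome in isolates:
        # Each isolate contributes at most 1 to the count of any given word
        counts.setdefault(host, Counter()).update(words(genome, k))
    return counts


# Toy sequences, for illustration only
isolates = [("cow", "ACGTAC"), ("cow", "ACGTTT"), ("chicken", "TTTGGA")]
counts = word_counts_by_host(isolates, k=3)
```

Here "ACG" is carried by both cow isolates and no chicken isolate, so it would be a candidate host-associated word.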
Difference in overrepresentation of a gene fragment in 1 species
- compare to…?
- shows…?
compare observed vs expected by descent
which bit of DNA might be adaptive
ST-45 complex
- how many host-associated words?
9034
Vitamin B5 synthesis gene
Present in cow campylobacter
NOT in chicken campylobacter
B5 found in grain not grass
Cows eat grass
-> Campylobacter in cows needs to synthesise own B5
Campylobacter survival through the food chain
ST-45 most prevalent in chicken carcass or meat
ST-21 increasing prevalence from farm -> meat -> clinical
ST-45 meat + ST-21 clinical have same prevalence of 8%
Campylobacter survival through the food chain
- deletion mutants have altered functions
nuoK
(associated in ST-21)
- involved in NADH dehydrogenase activity
+ switching from anaerobic to aerobic (O2) environment
mutants grow better in enhanced O2
Staphylococcus epidermidis
Coagulase-negative staphylococci
= commensal on human skin
major cause of nosocomial infection associated with surgery
Evolutionary models of infection
Pathogenic clones
- only pathogenic sub-population can cause disease
True opportunistic pathogenicity
- all strains can cause disease if they get into blood
(same virulence elements in both strains)
Divided genome
- any clone can cause disease (pathogenicity elements can move between strains)
Evolutionary models of infection
- associated variation in genomic data
Pathogenic clones
- all clones related
True opportunistic pathogenicity
+ divided genome
- identical evolutionary tree
GWAS method 1 = matching isolate pairs
- findings
61 genes
containing infection-associated genetic elements that correlate with in vitro variation in known pathogenicity traits
e.g. biofilm formation
pan genome
set of all genes found across all strains of a species
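Small sketch: treating each strain as a set of genes, the pan genome is the union of those sets. The core genome (genes shared by every strain) is included for contrast and is not on the original card. Strain and gene names are hypothetical; at least one strain is assumed.

```python
def pan_and_core(strain_genes):
    """strain_genes: {strain: set of gene names}. Returns (pan genome, core genome)."""
    gene_sets = list(strain_genes.values())
    pan = set().union(*gene_sets)        # genes found in at least one strain
    core = set.intersection(*gene_sets)  # genes found in every strain
    return pan, core


# Hypothetical gene content, for illustration only
pan, core = pan_and_core({
    "strain_1": {"geneA", "geneB", "geneC"},
    "strain_2": {"geneA", "geneB"},
    "strain_3": {"geneA", "geneD"},
})
```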
GWAS method 2 = correlating with in vitro phenotype
- what is it?
Infection associated SNP prevalence correlated w/ pathogenicity phenotypes
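Illustrative sketch only: the card does not say which statistic was used. One simple option is a Pearson correlation between SNP presence (0/1) and a measured in vitro phenotype such as biofilm formation. All values below are hypothetical; both inputs are assumed to vary (otherwise the denominator is zero).

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical isolates: SNP present (1) or absent (0), and a biofilm measurement
snp_present = [0, 0, 1, 1]
biofilm = [1.0, 1.0, 3.0, 3.0]
r = pearson(snp_present, biofilm)
```

A value near +1 means isolates carrying the SNP tend to score higher on the phenotype; a value near 0 means no linear relationship.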