Using the Results of Genetic Studies Flashcards
What function do GWAS and RNA-seq have?
Large data genetic analysis techniques
GWAS:
- Identifies associations across whole genome
- Large number of loci so you need to prioritise
- Doesn’t identify causal variants or genes
- Doesn’t identify cell type/tissue/developmental stage
RNA sequencing:
- Transcriptome of single cell/tissue type
- Large number of differentially expressed genes
- Misses changes in other cell types or stages of development
- Doesn’t identify reason for differential gene expression
What are the difficulties with using GWAS data?
Large GWAS for complex diseases detect many loci
- You have an issue of prioritisation
90% of GWAS SNPs are in non-coding regions of the genome
- Causal variant? Causal genes? The associated SNP may only be in linkage disequilibrium with the actual causal variant
What is the mechanism of action explaining the association?
- Tissue/cell type?
- Molecular mechanism?
What does RNAseq data look like?
Relative expression data for every gene
To make this comparison you need to set significance threshold:
- P-value: the lower the value, the stronger the evidence that the expression change is real
- Fold change: the degree by which the expression of a gene has changed
Novel transcripts, allele-specific expression, and alternative transcripts may also be identified
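A minimal sketch of the thresholding step above: keep only genes that pass both a p-value cut-off and a fold-change cut-off. The gene names, values and cut-offs (adjusted p < 0.05, absolute log2 fold change of at least 1) are illustrative assumptions, not values from the source.

```python
# Toy differential-expression results: (gene, log2 fold change, adjusted p-value).
# All names and numbers are invented for illustration.
results = [
    ("GENE_A", 2.3, 0.001),
    ("GENE_B", 0.2, 0.0004),  # significant, but the change is tiny
    ("GENE_C", -1.8, 0.03),
    ("GENE_D", 3.1, 0.40),    # large change, but not significant
]


def significant_genes(rows, max_p=0.05, min_abs_log2fc=1.0):
    """Return genes passing both the p-value and the fold-change threshold."""
    return [
        gene
        for gene, log2fc, p in rows
        if p < max_p and abs(log2fc) >= min_abs_log2fc
    ]


hits = significant_genes(results)
print(hits)
```

Using both thresholds together removes genes that are statistically significant but barely change, as well as genes with large but unreliable changes.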
What are the difficulties of using RNAseq data?
Many expression changes likely to be found
Difficult to distinguish real changes from methodological artefacts
Transcriptome is a snapshot of expression in a specific cell/tissue and at a specific time
Identification of differential expression does not provide biological reasoning
What is pathway analysis?
What types of genes are differentially expressed/implicated?
Generate a gene set, and compare to database
Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG)
- Must have been previously annotated
Allows you to identify new biology by determining the type of genes with association/differential expression
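One common way to "compare to a database" is an over-representation test: is a pathway's annotated gene set hit more often than chance would predict? The sketch below uses a hypergeometric tail probability. The function name and the toy counts are assumptions for illustration, and the source does not say which test was used.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_in_pathway, n_gene_set, n_overlap):
    """P(overlap >= n_overlap) when n_gene_set genes are drawn at random from
    n_background genes, of which n_in_pathway are annotated to the pathway."""
    total = comb(n_background, n_gene_set)
    max_k = min(n_in_pathway, n_gene_set)
    tail = sum(
        comb(n_in_pathway, k) * comb(n_background - n_in_pathway, n_gene_set - k)
        for k in range(n_overlap, max_k + 1)
    )
    return tail / total


# Toy numbers (assumed): 20,000 annotated genes, 100 in the pathway,
# 200 differentially expressed genes, 8 of which are in the pathway.
# Only about 1 would be expected by chance.
p_value = hypergeom_enrichment_p(20_000, 100, 200, 8)
print(p_value)
```

In practice the test is repeated for every pathway or GO term, so the p-values need multiple-testing correction.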
Why is it so difficult to link loci to a gene?
Linkage disequilibrium makes it difficult to distinguish the causal variant
90% of GWAS SNPs are in non-coding regions
- More likely to be in regulatory elements such as promoters, enhancers, TF binding sites
May act at a distance from the affected gene(s)
Need to determine relevant cells/tissues
What are the ways you can assign causal genes?
Proximity- the closest gene to any fine mapped causal SNP
If the gene body overlapped with the causal SNP
If the SNP directly caused coding change in a gene
Any gene related to a causal SNP that fell into an ATAC-seq peak in the human osteoblast cell lines
These are regions in the cell lines where the chromatin has been shown to be open and accessible, so they are likely to be actively regulating expression
Non-synonymous exonic change
Chromatin conformation capture
Lastly, we performed a Hi-C (high-throughput chromatin conformation capture) study in our human osteoblast cell lines
This gives you a 3D map of the chromatin conformation in those cells, so you can identify any genes that might be in contact at a distance with that SNP location
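The first two rules above (proximity and gene body overlap) can be sketched as a simple coordinate comparison. The gene names and coordinates are hypothetical, and the sketch covers a single chromosome only.

```python
# Hypothetical gene coordinates on one chromosome: (name, start, end).
genes = [
    ("GENE_A", 1_000, 5_000),
    ("GENE_B", 20_000, 30_000),
    ("GENE_C", 50_000, 52_000),
]


def distance_to_gene(pos, start, end):
    """0 if the SNP lies inside the gene body, else distance to the nearer edge."""
    if start <= pos <= end:
        return 0
    return min(abs(pos - start), abs(pos - end))


def assign_gene(pos, gene_list):
    """Return (gene, distance, rule) for the gene closest to a SNP position."""
    name, start, end = min(
        gene_list, key=lambda g: distance_to_gene(pos, g[1], g[2])
    )
    distance = distance_to_gene(pos, start, end)
    rule = "gene body overlap" if distance == 0 else "proximity"
    return name, distance, rule


print(assign_gene(25_000, genes))  # SNP inside GENE_B
print(assign_gene(44_000, genes))  # intergenic SNP, nearest gene is GENE_C
```

The ATAC-seq and Hi-C rules follow the same pattern, with open-chromatin peaks or chromatin contact pairs taking the place of gene bodies.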
How could you find the relevant cell type?
It is often unclear which cells are causal
You can combine GWAS with functional genomic data from individual cell types or tissues (like the ATAC-seq or Hi-C studies mentioned previously)
You can then assign every region a cell type specific regulatory activity score, based on how likely that genomic region is to be active in that cell type or tissue
These locations can then be compared to the GWAS loci to determine whether a higher than expected number of GWAS-significant SNPs fall into active regions for that cell type
SNP enrichment analysis
- Gene expression
- Regulatory elements
- Open chromatin
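A minimal sketch of the "higher than expected" comparison: count how many GWAS SNPs fall in a cell type's active regions, then ask how surprising that count is if SNPs landed at random. The binomial model, the region coordinates and the 2% genome coverage are simplifying assumptions. The sketch ignores LD between SNPs, which real enrichment tools have to handle.

```python
from math import comb


def snps_in_regions(snp_positions, regions):
    """Count SNPs that fall inside any (start, end) region, ends inclusive."""
    return sum(
        any(start <= pos <= end for start, end in regions)
        for pos in snp_positions
    )


def binomial_enrichment_p(n_snps, n_in_regions, fraction_covered):
    """P(at least n_in_regions of n_snps land in the regions by chance)."""
    return sum(
        comb(n_snps, k)
        * fraction_covered ** k
        * (1 - fraction_covered) ** (n_snps - k)
        for k in range(n_in_regions, n_snps + 1)
    )


# Toy data (assumed): open-chromatin peaks covering about 2% of the sequence.
open_chromatin = [(100, 200), (500, 600)]
gwas_snps = [150, 180, 550, 3_000, 7_000, 9_000]

observed = snps_in_regions(gwas_snps, open_chromatin)
p_value = binomial_enrichment_p(len(gwas_snps), observed, fraction_covered=0.02)
print(observed, p_value)
```

Repeating this for each cell type and comparing the p-values is one way to rank candidate causal cell types.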
Can we use gene expression data to help annotate our GWAS loci?
eQTL - A locus that explains a fraction of the genetic variance of a gene expression phenotype
Colocalisation analysis - Compare the GWAS and eQTL at a locus to determine if they are due to the same causal variants
TWAS - Transcriptome-wide association studies
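The eQTL definition above can be sketched as a regression of expression on genotype: the slope is the effect per allele copy, and r squared is the fraction of expression variance the SNP explains. The genotype dosages and expression values are invented toy data.

```python
def eqtl_fit(genotypes, expression):
    """Least-squares fit of expression on allele dosage (0, 1 or 2).

    Returns (slope, r_squared).
    """
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_e = sum(expression) / n
    cov = sum((g - mean_g) * (e - mean_e) for g, e in zip(genotypes, expression))
    var_g = sum((g - mean_g) ** 2 for g in genotypes)
    var_e = sum((e - mean_e) ** 2 for e in expression)
    slope = cov / var_g
    r_squared = cov ** 2 / (var_g * var_e)
    return slope, r_squared


# Toy data (assumed): six subjects, expression rises with each extra allele copy.
dosage = [0, 0, 1, 1, 2, 2]
expr = [5.0, 5.2, 6.1, 5.9, 7.0, 7.2]

slope, r_squared = eqtl_fit(dosage, expr)
print(slope, r_squared)
```

A real eQTL scan repeats this for many SNP-gene pairs and adjusts for covariates and multiple testing.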
What is co-localisation analysis?
You can combine eQTL and GWAS results through colocalisation analysis
To do this you identify your eQTLs and your GWAS loci that have an overlapping position in the genome
You then compare the results of GWAS fine mapping and eQTL analysis at the locus to determine whether they are caused by the same or separate signals
If the peaks overlap, they are colocalising signals; if they do not show the same pattern, they are non-colocalising signals
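A rough sketch of how "same or separate signals" can be scored, modelled on the approximate Bayes factor approach used by colocalisation tools. The prior probabilities (p1, p2, p12), the prior effect SD of 0.15, and the z-scores are assumptions chosen for illustration. The model also assumes at most one causal SNP per trait in the region.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_abf(z, se, prior_sd=0.15):
    """Wakefield log approximate Bayes factor for one SNP (association vs null)."""
    r = prior_sd ** 2 / (prior_sd ** 2 + se ** 2)
    return 0.5 * (math.log(1 - r) + r * z ** 2)


def coloc_posteriors(z_gwas, z_eqtl, se_gwas, se_eqtl,
                     p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities for H0..H4 over the same ordered SNPs (>= 2).

    H3 = two distinct causal SNPs, H4 = one shared causal SNP.
    """
    l1 = [log_abf(z, s) for z, s in zip(z_gwas, se_gwas)]
    l2 = [log_abf(z, s) for z, s in zip(z_eqtl, se_eqtl)]
    s1, s2 = log_sum_exp(l1), log_sum_exp(l2)
    s12 = log_sum_exp([a + b for a, b in zip(l1, l2)])
    # All SNP pairs minus the same-SNP pairs leaves the distinct-SNP pairs.
    ratio = math.exp(s12 - (s1 + s2))
    log_distinct = (s1 + s2) + math.log1p(-ratio) if ratio < 1.0 else float("-inf")
    log_h = {
        "H0": 0.0,
        "H1": math.log(p1) + s1,
        "H2": math.log(p2) + s2,
        "H3": math.log(p1) + math.log(p2) + log_distinct,
        "H4": math.log(p12) + s12,
    }
    norm = log_sum_exp(list(log_h.values()))
    return {h: math.exp(v - norm) for h, v in log_h.items()}


se = [0.1] * 5
shared = coloc_posteriors([0, 0, 6, 0, 0], [0, 0, 5, 0, 0], se, se)
distinct = coloc_posteriors([0, 6, 0, 0, 0], [0, 0, 0, 6, 0], se, se)
print(shared["H4"], distinct["H3"])
```

When both traits peak at the same SNP the posterior mass goes to H4 (colocalising); when they peak at different SNPs it shifts towards H3 (non-colocalising).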
What are the explanations for locus overlap?
GWAS and eQTL loci can overlap for 3 reasons:
Independent causal variants in LD
A single causal SNP- you have identified the mechanism explaining your GWAS loci
Pleiotropy- a SNP could have a pleiotropic effect, influencing a specific gene in a specific cell type (explaining the eQTL) while also having a different effect on a different gene or tissue that leads to the GWAS phenotypic trait
What is a transcriptome-wide association study (TWAS)?
This is the idea that you can directly combine the eQTL and GWAS studies instead of overlapping separate maps; combining GWAS and RNA-seq experiments directly
You take the thousands of subjects required for a GWAS. Instead of comparing SNP genotypes and looking for association with the phenotypic trait, you obtain samples of all the possibly relevant cell types from each subject and perform RNA-seq analysis to generate gene expression profiles for each cell type from each subject
You can then directly look for associations between gene expression levels and the overall phenotypes
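A minimal sketch of the last step as described here: test each gene's expression level for association with the trait across subjects, then rank the genes. Pearson correlation stands in for the association test, and the trait values, gene names and expression values are invented toy data. A real analysis would use regression with covariates and multiple-testing correction.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Toy data (assumed): one quantitative trait value per subject, and one
# expression value per subject for each gene in a single cell type.
phenotype = [1.0, 1.5, 2.1, 2.9, 3.6, 4.2]
expression = {
    "GENE_A": [2.0, 2.4, 3.1, 3.8, 4.5, 5.1],  # tracks the trait
    "GENE_B": [3.0, 2.9, 3.1, 3.0, 2.9, 3.1],  # essentially flat
}

# Rank genes by strength of association with the trait.
ranked = sorted(
    expression,
    key=lambda gene: abs(pearson_r(expression[gene], phenotype)),
    reverse=True,
)
print(ranked)
```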
Once you have prioritised genes for investigation, what do you do?
- Cell Studies
- Functional Phenotyping
- High throughput screens
What do cell studies look to find?
What cell type?
Here we looked at osteoblasts because that’s where we found it was expressed
Is it expressed?
Where is it located in the cell?
Immunofluorescence experiments found the location
Does it affect the functional activity of the cell?
What do we use model animals for in functional analysis?
Knockout animals in several ways:
- Total knockout
- Cell specific
- Inducible
- Gene editing
Before you determine that, you must ask:
- What is the functional phenotype of interest?
- What is the appropriate model?