W3 - Using The Result Of Genetic Studies Flashcards
What are the two large data analysis techniques?
GWAS
– Identify loci in the genome associated with a phenotype/disease: genotype millions of SNPs, then test each one statistically for association with the disease to pinpoint loci.
RNAseq
– Take a tissue sample and sequence all the RNA to provide a snapshot of gene expression levels
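The per-SNP statistical test behind GWAS can be sketched as a chi-square test on allele counts in cases versus controls. This is only a minimal illustration: the function name and the example counts are hypothetical, and the 5e-8 genome-wide threshold is the conventional value, not something stated in these notes. Real studies typically use regression models with covariates.

```python
import math

# Conventional genome-wide significance threshold (an assumption, not from the notes).
GENOME_WIDE_THRESHOLD = 5e-8

def allelic_chi_square(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Chi-square test (1 degree of freedom) on a 2x2 table of allele counts.

    Returns (chi2, p_value). Assumes every expected count is non-zero.
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_sums = [sum(row) for row in table]
    col_sums = [case_alt + ctrl_alt, case_ref + ctrl_ref]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts for one SNP: alternative/reference alleles in cases and controls.
chi2, p = allelic_chi_square(60, 40, 40, 60)
is_hit = p < GENOME_WIDE_THRESHOLD
```

Repeating this test for every SNP gives the list of associated loci; the strict threshold is there because millions of tests are run.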
What are the difficulties of using GWAS data?
Large GWAS for complex diseases detect many loci
– Prioritisation - so thinking which area we need to focus on.
90% of GWAS SNPs are in non-coding regions of the genome
– Causal genes?
What is the mechanism of action explaining the association?
– Tissue/cell type?
– Molecular mechanism?
What is Linkage Disequilibrium?
Where alleles (DNA markers) occur together more often than can be accounted for by chance because of their physical proximity on a chromosome.
This comes down to crossing over and recombination. The further apart two loci are, the greater the chance of a cross-over between them; the closer together they are, the greater the chance of being inherited together. If two loci were in linkage equilibrium, there would be a 50% chance of their alleles being inherited together.
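Linkage disequilibrium is usually quantified with D and r², which compare the observed haplotype frequency with what independence would predict. A minimal sketch, assuming two biallelic loci (alleles A/a and B/b) and hypothetical haplotype counts:

```python
def linkage_disequilibrium(hap_counts):
    """Return (D, r2) from haplotype counts keyed 'AB', 'Ab', 'aB', 'ab'.

    D  = p(AB) - p(A) * p(B)
    r2 = D^2 / (p(A) * (1 - p(A)) * p(B) * (1 - p(B)))
    Assumes both loci are polymorphic (no allele frequency of 0 or 1).
    """
    total = sum(hap_counts.values())
    p_ab = hap_counts["AB"] / total
    p_a = (hap_counts["AB"] + hap_counts["Ab"]) / total
    p_b = (hap_counts["AB"] + hap_counts["aB"]) / total

    d = p_ab - p_a * p_b
    r2 = d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, r2

# Alleles always travel together: complete LD.
complete = linkage_disequilibrium({"AB": 50, "Ab": 0, "aB": 0, "ab": 50})
# All four haplotypes equally common: linkage equilibrium.
equilibrium = linkage_disequilibrium({"AB": 25, "Ab": 25, "aB": 25, "ab": 25})
```

An r² near 1 means one SNP almost perfectly predicts the other, which is why an associated SNP may only be tagging the true causal variant.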
What is RNAseq data?
Looking to see which genes have more or fewer mRNA copies present.
Relative expression data for every gene
Need to set significance threshold
– P-value
– Fold change
Novel transcripts, allele-specific expression, and alternative transcripts may be identified
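The two significance thresholds above (p-value and fold change) can be applied together as a simple filter. This is a sketch only: the gene names, expression values, cut-offs and pseudocount are all hypothetical, and in practice the p-values would normally be corrected for multiple testing first.

```python
import math

def log2_fold_change(treated_mean, control_mean, pseudocount=1.0):
    """log2 ratio of mean expression; the pseudocount avoids division by zero."""
    return math.log2((treated_mean + pseudocount) / (control_mean + pseudocount))

def significant_genes(results, p_cutoff=0.05, min_abs_log2fc=1.0):
    """Keep genes passing BOTH the p-value and the fold-change threshold.

    results maps gene -> (treated_mean, control_mean, p_value).
    Returns a list of (gene, log2_fold_change) tuples.
    """
    hits = []
    for gene, (treated, control, p_value) in results.items():
        lfc = log2_fold_change(treated, control)
        if p_value < p_cutoff and abs(lfc) >= min_abs_log2fc:
            hits.append((gene, lfc))
    return hits

# Hypothetical results for three genes.
example = {
    "GENE_A": (399, 99, 0.001),   # large change, significant
    "GENE_B": (110, 100, 0.001),  # significant but tiny change
    "GENE_C": (399, 99, 0.2),     # large change but not significant
}
hits = significant_genes(example)
```

Only GENE_A passes both filters, which shows why a single threshold on its own can be misleading.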
What is the osteocyte transcriptome?
Same cell type at different locations shows different transcriptomes
>100 novel transcripts identified
Distinct patterns of differentially expressed genes in different tissues/cell types
What are the other applications of RNAseq?
Cell populations' response to treatments
How gene expression changes through development or under disease conditions
Single cell transcriptome analysis
What are the difficulties of using RNAseq data?
Many expression changes likely to be found
– Difficult to differentiate real from methodological artifacts
Transcriptome is a snapshot of expression in a specific cell/tissue and at a specific time
Identification of differential expression does not provide biological reasoning
How useful are GWAS and RNA sequencing when it comes to complex diseases and large data sets?
GWAS
– Identifies associations across whole genome
– Large number of loci
– Doesn’t identify causal variants or genes
– Doesn’t identify cell type/tissue/developmental stage
RNA Sequencing
– Transcriptome of single cell/tissue type
– Large number of differentially expressed genes
– Misses changes in other cell types or stages of development
– Doesn’t identify reason for differential gene expression
What is a pathway analysis?
Take a gene list from a large data set and test which biological pathways are over-represented (enriched) in it, showing how the genes are related.
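One common way to test enrichment is an over-representation test based on the hypergeometric distribution. The sketch below is an illustration under that assumption, not necessarily the method meant in these notes; the function name and numbers are hypothetical.

```python
from math import comb

def enrichment_p_value(n_background, n_pathway, n_list, n_overlap):
    """P(overlap >= n_overlap) when n_list genes are drawn at random.

    n_background: all genes measured
    n_pathway:    background genes annotated to the pathway
    n_list:       genes in the list of interest (e.g. differentially expressed)
    n_overlap:    genes in the list that belong to the pathway
    """
    upper = min(n_pathway, n_list)
    total = comb(n_background, n_list)
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_list - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

# Hypothetical: 10 genes measured, 5 in the pathway, 5 in our list, all 5 overlap.
p_enriched = enrichment_p_value(10, 5, 5, 5)
```

A small p-value suggests the pathway appears in the list more often than chance would predict. With many pathways tested, the p-values would need multiple-testing correction.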
What is personalised medicine?
Applying the results of genetic studies to the healthcare management of an individual
– Predict and prevent disease
– Diagnosis
– Personalised interventions
How are GWAS loci related to biological mechanisms?
Links loci to disease traits.
– Causal mutation/gene for each locus
– Genes or pathways associated with disease
– Prioritising what to investigate further
– Validating findings
What are the difficulties of linking loci to genes?
Linkage disequilibrium makes it difficult to distinguish the causal variant
90% of GWAS SNPs are in non-coding regions
– Regulatory elements
* Promoters, Enhancers, TF binding sites
May act at a distance from affected gene(s)
Need to determine relevant cells/tissues
What is fine mapping?
High-resolution study of loci attempting to pinpoint the individual variants directly affecting a trait
Statistical and probabilistic methods
or comparison to a SNP correlation reference panel
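One probabilistic approach to fine mapping turns per-SNP evidence into posterior probabilities and then builds a credible set. The sketch below assumes exactly one causal variant at the locus, equal prior probability for each SNP, and that a log Bayes factor per SNP is already available. The SNP names and values are hypothetical.

```python
import math

def posterior_probabilities(log_bayes_factors):
    """Normalise per-SNP log Bayes factors into posterior probabilities.

    Assumes a single causal variant and equal priors across SNPs.
    """
    max_log = max(log_bayes_factors.values())  # subtract the max for numerical stability
    weights = {snp: math.exp(lbf - max_log) for snp, lbf in log_bayes_factors.items()}
    total = sum(weights.values())
    return {snp: weight / total for snp, weight in weights.items()}

def credible_set(posteriors, coverage=0.95):
    """Smallest set of top-ranked SNPs whose posteriors sum to >= coverage."""
    ranked = sorted(posteriors.items(), key=lambda item: item[1], reverse=True)
    chosen, cumulative = [], 0.0
    for snp, prob in ranked:
        chosen.append(snp)
        cumulative += prob
        if cumulative >= coverage:
            break
    return chosen

# Hypothetical locus with three SNPs in LD.
lbf = {"snp1": math.log(96), "snp2": math.log(3), "snp3": math.log(1)}
post = posterior_probabilities(lbf)
top_95 = credible_set(post, coverage=0.95)
```

A small credible set means the data narrow the locus down to a few candidate variants. Strong LD tends to spread the probability over many SNPs.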
How are causal genes assigned?
Proximity
Non-synonymous exonic change
Chromatin conformation capture
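Assignment by proximity is the simplest of these: pick the gene whose start is closest to the SNP. A minimal sketch with hypothetical gene names and coordinates. Because regulatory variants can act at a distance, the nearest gene is only a first guess.

```python
def nearest_gene(snp_position, gene_starts):
    """Return (gene, distance) for the gene whose start is closest to the SNP.

    gene_starts maps gene name -> start coordinate on the same chromosome.
    """
    gene = min(gene_starts, key=lambda name: abs(gene_starts[name] - snp_position))
    return gene, abs(gene_starts[gene] - snp_position)

# Hypothetical locus.
genes = {"GENE_A": 1000, "GENE_B": 1800, "GENE_C": 5000}
assigned = nearest_gene(1500, genes)
```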
What are the relevant cell types for fine mapping?
Often unclear which cells are causal
SNP enrichment analysis
– Gene expression
– Regulatory elements
– Open chromatin: likely to indicate actively expressed genes, so each gene can be assigned a regulatory activity score.
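SNP enrichment analysis asks whether trait-associated SNPs fall in a cell type's open chromatin more often than expected by chance. A rough sketch using a simple fold enrichment, assuming non-overlapping regions on a single hypothetical chromosome. All positions are made up, and real analyses also account for LD and SNP density.

```python
def fold_enrichment(snp_positions, open_regions, genome_length):
    """Observed fraction of SNPs in open chromatin / fraction of genome that is open.

    open_regions: list of (start, end) half-open intervals, assumed non-overlapping.
    """
    in_open = sum(
        any(start <= pos < end for start, end in open_regions)
        for pos in snp_positions
    )
    observed = in_open / len(snp_positions)
    covered = sum(end - start for start, end in open_regions)
    expected = covered / genome_length
    return observed / expected

# Hypothetical: 2 of 4 SNPs fall in open chromatin covering 10% of the region.
enrichment = fold_enrichment([10, 20, 150, 900], [(0, 100)], 1000)
```

Comparing this value across cell types points to the ones where the associated SNPs are most likely to be acting.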