Data analysis Flashcards
What does data analysis consist of?
Turning raw image data into useful sequence data
What has happened following the development of bioinformatic tools?
The importance of data analysis has increased
What is bioinformatics?
Broad, interdisciplinary field that integrates principles from computer science, mathematics and statistics
In order to manage, mine, visualize and analyze biological data
Which field has bioinformatics co-evolved with?
Genomics
What is the role of a bioinformatician?
Develop analytical methods
Construct and curate computational tools and databases
Data mining, interpretation and analysis
What are the aims of a bioinformatician?
Identify differentially expressed genes
Identify epigenetic changes
Analyse pathways
Examples of ways genes can be differentially expressed
Somatic mutations
Copy number alterations
What type of bioinformatician is the best bioinformatician?
One with relevant background knowledge of the biology behind the data
One who can distinguish technical artifacts from biologically relevant variation
How can we increase genomic understanding of disease?
Pairing information obtained through genomic technologies with clinical data
This entails integrating omic and expression data
What are basic genome browsers?
Curated databases
Allow annotation of human DNA and the DNA of other species
These are references or working drafts
What are the two main hosts of genome browsers?
UCSC
ENSEMBL
Where is UCSC based?
University of California, Santa Cruz
Where is ENSEMBL based?
Europe - UK and Germany
What do basic genome browsers bring together?
Genomic annotation from multiple species, together with other data such as transcription factor binding sites
What can we measure using microarrays?
DNA
Gene expression
What aspects of DNA can we measure using microarrays?
SNPs
Copy number variations
Methylation
Chromosome conformation
What aspects of gene expression can be measured using microarrays?
mRNA
miRNA
lncRNA
What type of variation can be detected through SNPs, somatic mutations and CNVs?
Genetic variation
What type of variation can be detected through DNA methylation and chromatin analysis?
Epigenetic variation
What type of information can be detected through RNA expression and gene structure?
Expression variation
Describe the 6 steps of array processing
- Experimental design
- Image analysis
- Normalisation to clean the data
- More low level analysis (fold change, ANOVA and data filtering)
- Data mining
- Validation
Why is it important to normalise the data?
Cleaning the data allows us to compare data across arrays without altering the interpretation of changes in gene expression
Why does the data need to be normalised?
The intensity of fluorescent markers might be different from one batch to another
Technical variation can hide real data
Unavoidable systematic bias
What is the main reason we normalise data?
Because the experimental goal is to identify biological variation and expression changes between samples
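To make the idea of normalisation concrete, here is a minimal sketch of quantile normalisation, one common way of putting several arrays on the same intensity scale. The cards above do not name a specific method, so the choice of quantile normalisation, the function name `quantile_normalise` and the toy intensities are all assumptions made for illustration.

```python
def quantile_normalise(arrays):
    """Quantile-normalise a list of arrays (hypothetical helper).

    arrays: list of equal-length lists, one list of probe intensities per array.
    Returns new lists in which every array shares the same value distribution.
    Ties are broken by position here rather than averaged, which is a simplification.
    """
    n_probes = len(arrays[0])
    n_arrays = len(arrays)

    # Sort each array, then average across arrays at each rank.
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [
        sum(s[i] for s in sorted_arrays) / n_arrays for i in range(n_probes)
    ]

    normalised = []
    for a in arrays:
        # order[r] is the index of the r-th smallest intensity in this array.
        order = sorted(range(n_probes), key=lambda i: a[i])
        out = [0.0] * n_probes
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]  # replace each value by the mean for its rank
        normalised.append(out)
    return normalised


# Toy data: two arrays measuring the same three probes (made-up intensities).
arrays = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
normalised = quantile_normalise(arrays)
```

After normalisation both arrays contain the same set of values, so any remaining difference for a given probe reflects its rank within each array rather than a batch-wide shift in intensity.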
What is the most appropriate test used to analyse data?
Pairwise analysis using t-tests or ANOVA is the most appropriate
What is the goal of data analysis?
Determining the fold-change cutoffs (up or down) that identify which changes are truly significant
What is a common theme to measure differences in gene expression in arrays and NGS?
Ranking genes according to the evidence of difference in gene expression
Score the differences using fold changes, t-statistics or a combination
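The ranking idea above can be sketched as follows: each gene is scored by its log2 fold change and a t-statistic, then genes are ordered by the strength of the evidence. This is only an illustration. The gene names, the intensities, the helper `score_gene` and the use of Welch's unequal-variance t-statistic are assumptions, and the values are assumed to be normalised log2 intensities.

```python
import math
from statistics import mean, variance


def score_gene(control, treated):
    """Return (log2 fold change, Welch t-statistic) for one gene (hypothetical helper).

    Inputs are assumed to be log2-scale intensities, so a difference of
    means is already a log2 fold change.
    """
    log2_fc = mean(treated) - mean(control)
    # Standard error of the difference, allowing unequal variances (Welch).
    se = math.sqrt(
        variance(treated) / len(treated) + variance(control) / len(control)
    )
    if se > 0:
        t_stat = log2_fc / se
    elif log2_fc == 0:
        t_stat = 0.0
    else:
        t_stat = math.copysign(math.inf, log2_fc)
    return log2_fc, t_stat


# Made-up log2 intensities: (control replicates, treated replicates).
expression = {
    "GENE_A": ([8.0, 8.2, 7.9], [10.1, 10.0, 9.8]),
    "GENE_B": ([6.0, 6.1, 5.9], [6.0, 6.2, 5.9]),
    "GENE_C": ([9.0, 9.5, 8.5], [8.0, 7.6, 8.3]),
}

scores = {gene: score_gene(c, t) for gene, (c, t) in expression.items()}

# Rank genes by the absolute t-statistic, strongest evidence first.
ranked = sorted(scores, key=lambda g: abs(scores[g][1]), reverse=True)
```

Ranking by the t-statistic rewards genes whose change is large relative to replicate noise; ranking by fold change alone would ignore that noise, which is why the two scores are often combined.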
What are ways to interpret the changes in signal intensity?
Heat maps
Volcano plots
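A volcano plot places each gene at (log2 fold change, -log10 p-value), so genes with large and statistically significant changes sit in the upper left and upper right corners. The sketch below only computes those coordinates and a label per gene; drawing is left to a plotting library. The function name `volcano_points`, the cutoffs (|log2 fold change| of at least 1, p below 0.05) and the input values are assumptions chosen for illustration.

```python
import math


def volcano_points(results, fc_cutoff=1.0, p_cutoff=0.05):
    """Compute volcano-plot coordinates and labels (hypothetical helper).

    results: dict mapping gene name to (log2 fold change, p-value), with p > 0.
    Returns a dict mapping gene name to (x, y, label).
    """
    points = {}
    for gene, (log2_fc, p_value) in results.items():
        y = -math.log10(p_value)  # small p-values become tall points
        if p_value < p_cutoff and log2_fc >= fc_cutoff:
            label = "up"
        elif p_value < p_cutoff and log2_fc <= -fc_cutoff:
            label = "down"
        else:
            label = "not significant"
        points[gene] = (log2_fc, y, label)
    return points


# Made-up per-gene results: (log2 fold change, p-value).
results = {
    "GENE_A": (2.0, 0.001),
    "GENE_B": (0.1, 0.8),
    "GENE_C": (-1.5, 0.01),
    "GENE_D": (3.0, 0.2),
}
points = volcano_points(results)
```

GENE_D shows why both axes matter: its fold change is the largest in the toy data, but its p-value does not pass the cutoff, so it is not labelled as differentially expressed.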